Re: Create RDD from output of unix command

2015-07-18 Thread Gylfi
You may want to look into the RDD pipe operation (PipedRDD):
http://blog.madhukaraphatak.com/pipe-in-spark/
http://spark.apache.org/docs/0.6.0/api/core/spark/rdd/PipedRDD.html
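
A minimal PySpark sketch of the pipe approach, for illustration only. It
assumes an already-created SparkContext is passed in as sc; the helper name
pipe_lines is hypothetical and does not come from the linked pages. RDD.pipe
starts the command once per partition, writes that partition's elements to the
command's standard input one per line, and returns the command's standard
output lines as a new RDD of strings.

```python
def pipe_lines(sc, lines, command):
    """Send `lines` through an external command and return its output as an RDD.

    `sc` is assumed to be an existing pyspark.SparkContext (not created here).
    `pipe_lines` is a hypothetical helper name used only for this sketch.
    """
    # Distribute the input; each partition becomes the stdin of one process.
    rdd = sc.parallelize(lines)
    # RDD.pipe launches `command` per partition and yields its stdout lines.
    return rdd.pipe(command)
```

For example, pipe_lines(sc, ["spark", "rdd"], "tr a-z A-Z").collect() would
return the upper-cased lines, assuming tr is installed on every worker. Because
the command runs once per partition, a command that ignores its input would
produce its full output once for each partition.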




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723p23895.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Create RDD from output of unix command

2015-07-14 Thread Igor Berman
Have you thought about Spark Streaming? There is a thread that could help:
https://www.mail-archive.com/user%40spark.apache.org/msg30105.html

On 14 July 2015 at 18:20, Hafsa Asif  wrote:

> Your question is very interesting. What I suggest is to write your command
> output to a text file, read that file in your code, and create an RDD from
> it. Consider Spark's word count example; I love that example with the Java
> client. Spark is an analytical engine built to analyze very large data
> sets, so from my point of view your assumption is wrong.
>
> You can also save your data in any repository in some structured form.
> This will give you more exposure to Spark's behavior.


Re: Create RDD from output of unix command

2015-07-14 Thread Hafsa Asif
Your question is very interesting. What I suggest is to write your command
output to a text file, read that file in your code, and create an RDD from
it. Consider Spark's word count example; I love that example with the Java
client. Spark is an analytical engine built to analyze very large data
sets, so from my point of view your assumption is wrong.

You can also save your data in any repository in some structured form. This
will give you more exposure to Spark's behavior.
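
To make the word count suggestion concrete, here is a rough PySpark sketch
(Python rather than the Java client mentioned above). It assumes the command
output has already been saved to a text file and that an existing SparkContext
is passed in as sc; the helper name word_counts is hypothetical.

```python
from operator import add


def word_counts(sc, path):
    """Count whitespace-separated words in the text file at `path`.

    `sc` is assumed to be an existing pyspark.SparkContext; `word_counts`
    is a hypothetical helper name used only for this sketch.
    """
    # textFile gives an RDD with one element per line of the file.
    lines = sc.textFile(path)
    # Split every line into words and flatten the result into one RDD.
    words = lines.flatMap(lambda line: line.split())
    # Pair each word with 1, then sum the ones per word.
    return words.map(lambda word: (word, 1)).reduceByKey(add)
```

Calling word_counts(sc, path).collect() would bring the (word, count) pairs
back to the driver; for large outputs, saving the result with saveAsTextFile
is usually the safer choice.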



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723p23830.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Create RDD from output of unix command

2015-07-08 Thread Richard Marscher
As a distributed data processing engine, Spark should be fine with millions
of lines; it is built with massive data sets in mind. Do you have more
details on how you anticipate the output of a Unix command interacting with
a running Spark application? Do you expect Spark to run continuously and
somehow observe the command's output? Or are you thinking more along the
lines of running a Unix command, capturing its output, and then running a
Spark job against whatever format that is? If it's the latter, it should be
as simple as writing the command output to a file and then loading the file
into an RDD in Spark.
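
The file-based approach described above could be sketched roughly as follows
in Python. This is an illustration under assumptions, not a definitive recipe:
the helper names command_output_to_file and load_lines are hypothetical, and
sc is assumed to be an existing SparkContext.

```python
import subprocess


def command_output_to_file(command, path):
    """Run `command` (a list of arguments) and write its stdout to `path`.

    Hypothetical helper for this sketch. Raises CalledProcessError if the
    command exits with a non-zero status.
    """
    with open(path, "wb") as out:
        # stdout goes straight to the file, so the driver never holds
        # the full output in memory.
        subprocess.run(command, stdout=out, check=True)
    return path


def load_lines(sc, path):
    """Load the captured output as an RDD with one element per line.

    `sc` is assumed to be an existing pyspark.SparkContext.
    """
    return sc.textFile(path)
```

On a cluster, the file has to live somewhere every worker can read it (for
example HDFS or another shared file system); a local path is only enough when
Spark runs in local mode.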

On Wed, Jul 8, 2015 at 2:02 PM, foobar  wrote:

> What's the best practice for creating an RDD from the output of an external
> Unix command? I assume that if the output is large (say, millions of lines),
> creating the RDD from an array of all the lines is not a good idea? Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723.html


-- 
*Richard Marscher*
Software Engineer
Localytics