Re: Create RDD from output of unix command
You may want to look into RDD.pipe, which feeds the elements of each partition to an external command on its standard input and returns the lines the command writes to its standard output as a new RDD of strings:

http://blog.madhukaraphatak.com/pipe-in-spark/
http://spark.apache.org/docs/0.6.0/api/core/spark/rdd/PipedRDD.html

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723p23895.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
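A minimal sketch of that idea in PySpark, assuming an already running SparkContext is passed in as sc. The helper name rdd_from_command is made up for illustration. It relies on the command ignoring its standard input, so a one-element, single-partition seed RDD makes the command run exactly once. Note that the command runs on an executor rather than on the driver, and that PySpark splits the command string into arguments without going through a shell, so shell pipes and redirects will not work here.

```python
def rdd_from_command(sc, command):
    """Return an RDD whose elements are the stdout lines of `command`.

    `sc` is assumed to be an existing pyspark.SparkContext.
    `rdd_from_command` is a hypothetical helper name, not a Spark API.
    """
    # One dummy element in one partition, so the command is launched once.
    seed = sc.parallelize([""], numSlices=1)
    # pipe() writes each element to the command's stdin and yields
    # each line the command prints on stdout as one RDD element.
    return seed.pipe(command)
```

For example, rdd_from_command(sc, "seq 1 1000000") should give an RDD of one million strings without the driver ever holding them in an array. The result lives in a single partition, so a repartition() before heavy processing is probably worthwhile.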
Re: Create RDD from output of unix command
Have you considered Spark Streaming? There is a thread that could help:
https://www.mail-archive.com/user%40spark.apache.org/msg30105.html

On 14 July 2015 at 18:20, Hafsa Asif wrote:
> Your question is very interesting. I suggest writing your command's output
> to a text file, then reading that file in your code and creating an RDD
> from it. Consider Spark's word count example, which reads a text file in
> just this way; I love the Java version of it. Spark is an analytics engine
> whose whole purpose is to analyze very large data sets, so from my point
> of view your assumption is wrong.
>
> You can also save your data in any repository in some structured form.
> This will give you more exposure to Spark's behavior.
Re: Create RDD from output of unix command
Your question is very interesting. I suggest writing your command's output to a text file, then reading that file in your code and creating an RDD from it. Consider Spark's word count example, which reads a text file in just this way; I love the Java version of it. Spark is an analytics engine whose whole purpose is to analyze very large data sets, so from my point of view your assumption is wrong.

You can also save your data in any repository in some structured form. This will give you more exposure to Spark's behavior.
Re: Create RDD from output of unix command
As a distributed data processing engine, Spark should be fine with millions of lines. It's built with the idea of massive data sets in mind. Do you have more details on how you anticipate the output of a Unix command interacting with a running Spark application?

Do you expect Spark to be continuously running and somehow observe the command's output? Or are you thinking more along the lines of running a Unix command, taking its output in whatever format that is, and running a Spark job against it? If it's the latter, it should be as simple as writing the command output to a file and then loading the file into an RDD in Spark.

On Wed, Jul 8, 2015 at 2:02 PM, foobar wrote:
> What's the best practice for creating an RDD from the output of an
> external Unix command? I assume that if the output is large (say,
> millions of lines), creating the RDD from an array of all the lines is
> not a good idea? Thanks!

--
Richard Marscher
Software Engineer
Localytics
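The file-based approach suggested in this thread could look roughly like the following PySpark sketch. The helper names command_output_to_file and lines_rdd_from_file are invented for illustration, and sc is assumed to be an existing SparkContext. The command's stdout is attached directly to the file, so the lines never pass through a Python list on the driver. On a real cluster the path would have to be somewhere every executor can read, such as HDFS or a shared mount.

```python
import subprocess


def command_output_to_file(command, path):
    """Run `command` (a list of arguments) and send its stdout to `path`.

    Hypothetical helper. The child process writes straight to the file,
    so the output is never collected in memory by this script.
    """
    with open(path, "wb") as out:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(command, stdout=out, check=True)
    return path


def lines_rdd_from_file(sc, path, min_partitions=None):
    """Load the file as an RDD with one element per line.

    `sc` is assumed to be an existing pyspark.SparkContext.
    """
    return sc.textFile(path, minPartitions=min_partitions)
```

A call such as lines_rdd_from_file(sc, command_output_to_file(["ls", "-l", "/var/log"], "/tmp/listing.txt")) would then give an RDD that Spark can split across partitions itself; the command and paths here are only placeholders.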
Create RDD from output of unix command
What's the best practice for creating an RDD from the output of an external Unix command? I assume that if the output is large (say, millions of lines), creating the RDD from an array of all the lines is not a good idea? Thanks!