Re: Getting Output From a Cluster

2015-01-12 Thread Su She
Okay, thanks Akhil!

Suhas Shekar
University of California, Los Angeles
B.A. Economics, Specialization in Computing 2014

On Mon, Jan 12, 2015 at 1:24 PM, Akhil Das wrote:
> There is no direct way of doing that. If you need a single file for every
> batch duration, then you can repartition the data …

Re: Getting Output From a Cluster

2015-01-12 Thread Akhil Das
There is no direct way of doing that. If you need a single file for every
batch duration, then you can repartition the data to 1 partition before
saving. Another way would be to use Hadoop's copyMerge command/API
(available from the 2.0 versions).

On 13 Jan 2015 01:08, "Su She" wrote:
> Hello Everyone,
>
> Quick follow-up, …
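
A minimal Scala sketch of both suggestions, assuming a word-count DStream;
the namenode URL and the directory names are placeholders, not from the
thread:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    // Option 1: one part file per batch, by collapsing each RDD
    // to a single partition before the save.
    wordCounts.repartition(1)
      .saveAsTextFiles("hdfs://namenode:8020/user/spark/wordcounts")

    // Option 2: merge one batch directory's part files afterwards
    // with Hadoop's FileUtil.copyMerge (present in the 2.x API).
    val conf = new Configuration()
    val fs   = FileSystem.get(conf)
    FileUtil.copyMerge(
      fs, new Path("/user/spark/wordcounts-1421100000000"), // one batch dir (assumed)
      fs, new Path("/user/spark/wordcounts-merged.txt"),    // single merged file
      false, conf, null)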

Re: Getting Output From a Cluster

2015-01-12 Thread Su She
Hello Everyone,

Quick follow-up: is there any way I can append output to one file rather
than create a new directory/file every X milliseconds? Thanks!

Suhas Shekar
University of California, Los Angeles
B.A. Economics, Specialization in Computing 2014

On Thu, Jan 8, 2015 at 11:41 PM, Su She wrote:
> …
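
One way to get a single, growing file is to append on the driver inside
foreachRDD. This is only a sketch, under the assumption that each batch is
small enough to collect to the driver; the output path is hypothetical:

    import java.io.{BufferedWriter, FileWriter}

    // Append every batch's counts to one driver-local file.
    // Only safe when per-batch output is small enough to collect.
    wordCounts.foreachRDD { rdd =>
      val out = new BufferedWriter(new FileWriter("/tmp/wordcounts.txt", true)) // append mode
      try rdd.collect().foreach { case (word, count) => out.write(s"$word\t$count\n") }
      finally out.close()
    }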

Re: Getting Output From a Cluster

2015-01-08 Thread Su She
1) Thank you everyone for the help once again... the support here is really
amazing and I hope to contribute soon!

2) The solution I actually ended up using was from this thread:
http://mail-archives.apache.org/mod_mbox/spark-user/201310.mbox/%3ccafnzj5ejxdgqju7nbdqy6xureq3d1pcxr+i2s99g5brcj5e...@m…

Re: Getting Output From a Cluster

2015-01-08 Thread Akhil Das
saveAsHadoopFiles requires you to specify the output format, which I believe
you are not specifying anywhere, and hence the program crashes. You could try
something like this:

    Class<? extends OutputFormat<?, ?>> outputFormatClass =
        (Class<? extends OutputFormat<?, ?>>) (Class<?>) SequenceFileOutputFormat.class;
    yourStream.saveAsNewAPIHadoopFiles(hdfsUrl, …
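
Filled out in Scala, the same idea might look like this; the stream type,
path prefix, and suffix are assumptions, and SequenceFileOutputFormat here is
the new (mapreduce) API to match saveAsNewAPIHadoopFiles:

    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

    // wordCounts is assumed to be a DStream[(Text, IntWritable)] so the
    // key/value classes line up with the sequence-file output format.
    wordCounts.saveAsNewAPIHadoopFiles(
      "hdfs://namenode:8020/user/spark/wordcounts", // path prefix (assumed)
      "seq",                                        // suffix for each batch directory
      classOf[Text],
      classOf[IntWritable],
      classOf[SequenceFileOutputFormat[Text, IntWritable]])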

Re: Getting Output From a Cluster

2015-01-08 Thread Su She
Yes, I am calling saveAsHadoopFiles on the DStream. However, when I call
print on the DStream it works? If I had to use foreachRDD for
saveAsHadoopFile, then why does it work for print? Also, if I am doing
foreachRDD, do I need connections, or can I simply put the saveAsHadoopFiles
inside the foreachRDD …
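
For reference, print and saveAsHadoopFiles are both DStream output
operations, so neither strictly needs a foreachRDD wrapper; the difference is
that print has no output format to configure. A small illustration, with
assumed types and a placeholder path:

    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapred.TextOutputFormat

    // Both lines are DStream output operations and run once per batch.
    // print "just works"; the save additionally needs the key, value,
    // and output format classes spelled out (old mapred API here).
    wordCounts.print()
    wordCounts.saveAsHadoopFiles(
      "hdfs://namenode:8020/out/wordcounts", "txt",
      classOf[Text], classOf[IntWritable],
      classOf[TextOutputFormat[Text, IntWritable]])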

Re: Getting Output From a Cluster

2015-01-08 Thread Yana Kadiyska
Are you calling saveAsTextFiles on the DStream? It looks like it. Look at the
section called "Design Patterns for using foreachRDD" in the link you sent --
you want to do dstream.foreachRDD(rdd => rdd.saveAs…)

On Thu, Jan 8, 2015 at 5:20 PM, Su She wrote:
> Hello Everyone,
>
> Thanks in advance for the help! …
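
A sketch of that design pattern, assuming dstream holds strings and using a
placeholder HDFS path; the batch Time gives each save a distinct directory:

    // The pattern from the programming guide: save at the RDD level,
    // once per batch, inside foreachRDD.
    dstream.foreachRDD { (rdd, time) =>
      rdd.saveAsTextFile(s"hdfs://namenode:8020/user/spark/output-${time.milliseconds}")
    }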

Getting Output From a Cluster

2015-01-08 Thread Su She
Hello Everyone,

Thanks in advance for the help! I successfully got my Kafka/Spark WordCount
app to print locally. However, I want to run it on a cluster, which means
that I will have to save the output to HDFS if I want to be able to read it.
I am running Spark 1.1.0, which means according to th…
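
For context, a minimal Spark 1.1-era Kafka word count that saves each batch
to HDFS instead of printing; the ZooKeeper quorum, consumer group, topic, and
output path are all placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val sparkConf = new SparkConf().setAppName("KafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Receiver-based Kafka stream (the API available in Spark 1.1.0).
    val lines = KafkaUtils.createStream(
      ssc, "zkhost:2181", "wordcount-group", Map("mytopic" -> 1)).map(_._2)
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // Locally, wordCounts.print() was enough; on a cluster, persist
    // each batch to HDFS so the output can be read back.
    wordCounts.saveAsTextFiles("hdfs://namenode:8020/user/spark/wordcounts")

    ssc.start()
    ssc.awaitTermination()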