Also, ssc.textFileStream(dataDir) monitors a directory and picks up every new file that appears in it, so as far as I can see there's no need to merge the files. Just write all five of them to the same HDFS directory.
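Here is a rough sketch of what I mean, in Python since that is what you are using. The path hdfs:///data, the app name, the 300-second batch interval, and the parse_row/run_stream helpers are my own assumptions for illustration, and the "report" is just a row count standing in for your real analysis.

```python
import csv


def parse_row(line):
    """Split one CSV line into a list of fields (copes with quoted commas)."""
    return next(csv.reader([line]))


def run_stream(data_dir="hdfs:///data", batch_seconds=300):
    """Watch data_dir and print a per-batch row count (placeholder report)."""
    # Imported here so parse_row can be tried without a Spark installation.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CsvDirectoryStream")  # hypothetical app name
    ssc = StreamingContext(sc, batch_seconds)        # one batch every 5 minutes

    # Each new file that lands in data_dir during a batch interval becomes
    # part of that batch; there is no manual merge step.
    rows = ssc.textFileStream(data_dir).map(parse_row)

    # Stand-in for the real analysis: number of rows seen in each batch.
    rows.count().pprint()

    ssc.start()
    ssc.awaitTermination()
```

You would call run_stream() from a script submitted with spark-submit. One thing to watch: as far as I know the files need to show up in the directory atomically (write them somewhere else, then move or rename them into /data), otherwise a half-written file may be picked up.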
On Thu, Mar 31, 2016 at 8:04 AM, Femi Anthony <femib...@gmail.com> wrote:

> I don't think you need to do it this way.
>
> Take a look here:
> http://spark.apache.org/docs/latest/streaming-programming-guide.html
> in this section:
> Level of Parallelism in Data Receiving
> Receiving multiple data streams can therefore be achieved by creating
> multiple input DStreams and configuring them to receive different
> partitions of the data stream from the source(s)....
> These multiple DStreams can be unioned together to create a single
> DStream. Then the transformations that were being applied on a single input
> DStream can be applied on the unified stream.
>
>
> On Wed, Mar 30, 2016 at 11:08 PM, kramer2...@126.com <kramer2...@126.com>
> wrote:
>
>> Hi
>>
>> My environment is as follows:
>>
>> 5 nodes, each node generates a big csv file every 5 minutes. I need Spark
>> Streaming to analyze these 5 files every five minutes to generate some
>> report.
>>
>> I am planning to do it this way:
>>
>> 1. Put those 5 files into an HDFS directory called /data
>> 2. Merge them into one big file in that directory
>> 3. Use the Spark Streaming constructor textFileStream('/data') to generate my
>> inputDStream
>>
>> The problem with this approach is that I do not know how to merge the 5 files in HDFS.
>> It seems very difficult to do it in Python.
>>
>> So my questions are:
>>
>> 1. Can you tell me how to merge files in HDFS with Python?
>> 2. Do you know some other way to input those files into Spark?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-design-the-input-source-of-spark-stream-tp26641.html

--
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre minds." - Albert Einstein.