Hi Anthony

Thanks. You are right: the API will read all the files in the directory, so there is no need to merge them.






At 2016-03-31 20:09:25, "Femi Anthony" <femib...@gmail.com> wrote:

Also, ssc.textFileStream(dataDir) picks up every new file that appears in a 
directory, so as far as I can see there's no need to merge the files. Just 
write them to the same HDFS directory.
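
A minimal PySpark sketch of this, assuming all five nodes write into one HDFS 
directory and a 5-minute batch interval to match the file cadence. The path, 
the app name, and the parse_line helper are illustrative and not from this 
thread. Note that textFileStream only sees files that appear after the stream 
starts, and the Spark guide asks that files be moved or renamed into the 
directory atomically.

```python
import csv

DATA_DIR = "hdfs:///data"   # assumed path; the thread only says "/data"
BATCH_SECONDS = 300         # one batch per 5-minute file drop


def parse_line(line):
    """Split one CSV line into a list of fields (handles quoted commas)."""
    return next(csv.reader([line]))


def build_stream(ssc, data_dir=DATA_DIR):
    """Return a DStream of parsed rows from every new file in data_dir."""
    return ssc.textFileStream(data_dir).map(parse_line)


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CsvDirectoryStream")  # hypothetical app name
    ssc = StreamingContext(sc, BATCH_SECONDS)
    rows = build_stream(ssc)
    rows.count().pprint()   # number of rows seen in each batch
    ssc.start()
    ssc.awaitTermination()
```

Calling main() under spark-submit would start the job; each batch then covers 
whatever files landed in the directory during that interval, with no merge step.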


On Thu, Mar 31, 2016 at 8:04 AM, Femi Anthony <femib...@gmail.com> wrote:

I don't think you need to do it this way.

Take a look here: 
http://spark.apache.org/docs/latest/streaming-programming-guide.html
in this section:

Level of Parallelism in Data Receiving
 Receiving multiple data streams can therefore be achieved by creating multiple 
input DStreams and configuring them to receive different partitions of the data 
stream from the source(s)....
These multiple DStreams can be unioned together to create a single DStream. 
Then the transformations that were being applied on a single input DStream can 
be applied on the unified stream.
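
As a rough sketch of the unioning step quoted above, assuming each node writes 
to its own subdirectory (the /data/node1 ... /data/node5 layout and both helper 
names are hypothetical, not from this thread), one input DStream per directory 
can be combined with StreamingContext.union:

```python
def node_dirs(base="hdfs:///data", nodes=5):
    """Build the per-node directory names (hypothetical layout)."""
    return ["{}/node{}".format(base, i) for i in range(1, nodes + 1)]


def unified_stream(ssc, dirs):
    """Create one file stream per directory and union them into one DStream."""
    streams = [ssc.textFileStream(d) for d in dirs]
    return ssc.union(*streams)
```

With a real StreamingContext this would be used as 
lines = unified_stream(ssc, node_dirs()), and any transformation applied to 
lines then sees the data from all five directories.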




On Wed, Mar 30, 2016 at 11:08 PM, kramer2...@126.com<kramer2...@126.com> wrote:
Hi

My environment is as follows:

I have 5 nodes, and each node generates a big CSV file every 5 minutes. I need
Spark Streaming to analyze these 5 files every five minutes to generate a
report.

I am planning to do it this way:

1. Put those 5 files into an HDFS directory called /data
2. Merge them into one big file in that directory
3. Use the Spark Streaming constructor textFileStream('/data') to generate my
inputDStream

The problem with this approach is that I do not know how to merge the 5 files
in HDFS. It seems very difficult to do in Python.

So my questions are:

1. Can you tell me how to merge files in HDFS using Python?
2. Do you know of another way to feed those files into Spark?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-design-the-input-source-of-spark-stream-tp26641.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.








--

http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre minds." 
- Albert Einstein.





