Also, ssc.textFileStream(dataDir) picks up every new file that appears in
a directory, so as far as I can see there's no need to merge the files.
Just write them to the same HDFS directory.
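
For illustration, here is a rough PySpark sketch of that idea. It is only a
sketch under assumptions, not a tested job: the hdfs:///data URI, the app
name, the helper names (parse_line, build_report, main) and the CSV layout
(first column used as the report key) are all made up for the example. The
300-second batch interval mirrors the 5-minute cadence from the question. As
far as I know, files should be moved or renamed into the directory atomically
so that a half-written file is never picked up.

```python
def parse_line(line):
    """Turn one CSV line into a (key, 1) pair.

    Assumption: the first column identifies what the report groups by.
    The real CSV layout is not given in the thread.
    """
    fields = line.split(",")
    return (fields[0].strip(), 1)


def build_report(lines):
    """Count records per key within each batch of a DStream of text lines."""
    return lines.map(parse_line).reduceByKey(lambda a, b: a + b)


def main(data_dir="hdfs:///data", batch_seconds=300):
    # pyspark is imported here so the helpers above stay usable without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CsvDirectoryReport")  # hypothetical app name
    ssc = StreamingContext(sc, batch_seconds)

    # One stream over the whole directory: new files that land in data_dir
    # during a batch interval are read together, so no merge step is needed.
    lines = ssc.textFileStream(data_dir)
    build_report(lines).pprint()

    ssc.start()
    ssc.awaitTermination()
```

To try it, call main() from a script submitted with spark-submit, then have
each node move its finished CSV into the watched directory.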

On Thu, Mar 31, 2016 at 8:04 AM, Femi Anthony <femib...@gmail.com> wrote:

> I don't think you need to do it this way.
>
> Take a look here:
> http://spark.apache.org/docs/latest/streaming-programming-guide.html
> in this section:
> Level of Parallelism in Data Receiving
> Receiving multiple data streams can therefore be achieved by creating
> multiple input DStreams and configuring them to receive different
> partitions of the data stream from the source(s)....
> These multiple DStreams can be unioned together to create a single
> DStream. Then the transformations that were being applied on a single input
> DStream can be applied on the unified stream.
>
>
> On Wed, Mar 30, 2016 at 11:08 PM, kramer2...@126.com <kramer2...@126.com>
> wrote:
>
>> Hi
>>
>> My environment is as follows:
>>
>> 5 nodes; each node generates a big CSV file every 5 minutes. I need Spark
>> Streaming to analyze these 5 files every five minutes to generate a
>> report.
>>
>> I am planning to do it this way:
>>
>> 1. Put those 5 files into an HDFS directory called /data
>> 2. Merge them into one big file in that directory
>> 3. Use the Spark Streaming method textFileStream('/data') to generate my
>> inputDStream
>>
>> The problem with this approach is that I do not know how to merge the 5
>> files in HDFS. It seems very difficult to do in Python.
>>
>> So my questions are:
>>
>> 1. Can you tell me how to merge files in HDFS using Python?
>> 2. Do you know of some other way to feed those files into Spark?
>>
>
>
> --
> http://www.femibyte.com/twiki5/bin/view/Tech/
> http://www.nextmatrix.com
> "Great spirits have always encountered violent opposition from mediocre
> minds." - Albert Einstein.
>



