val ssc = new StreamingContext(sc, Seconds(4))

val dStream = ssc.textFileStream(pathOfDirToStream)

dStream.foreachRDD { eventsRdd => /* How to get the file name */ }
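`textFileStream` itself does not expose file names. One workaround, sketched below and untested here, is to switch to `ssc.fileStream` with an explicit input format and read the path off each partition's Hadoop input split. Assumptions: `sc` and `pathOfDirToStream` are as in the snippet above, and each batch RDD is a union over one `NewHadoopRDD` per file — that internal layout is not a public contract and may differ across Spark versions.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, TextInputFormat}
import org.apache.spark.rdd.{NewHadoopRDD, UnionRDD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(4))

// fileStream (unlike textFileStream) keeps the Hadoop input splits,
// which carry the path of each file picked up in the batch.
val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](pathOfDirToStream)

val withFileNames = stream.transform { rdd =>
  // Each batch RDD is typically a UnionRDD over one NewHadoopRDD per new file.
  new UnionRDD(rdd.context,
    rdd.dependencies.map { dep =>
      dep.rdd.asInstanceOf[NewHadoopRDD[LongWritable, Text]]
        .mapPartitionsWithInputSplit { (split, iter) =>
          val fileName = split.asInstanceOf[FileSplit].getPath.toString
          iter.map { case (_, line) => (fileName, line.toString) }
        }
    })
}

withFileNames.foreachRDD { rdd =>
  rdd.foreach { case (file, line) => println(s"$file: $line") }
}
```

Note that `mapPartitionsWithInputSplit` is a developer API on `NewHadoopRDD`, so treat this as a best-effort sketch rather than a stable recipe.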
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Thursday, September 15, 2016 11:02 PM
To: Kappaganthu, Sivaram (ES)
Cc: user@spark.apache.org
Subject: Re: Spark Streaming-- for each new file in HDFS
On 16 Sep 2016, at 01:03, Peyman Mohajerian <mohaj...@gmail.com> wrote:
You can listen to files in a specific directory using:
Take a look at:
http://spark.apache.org/docs/latest/streaming-programming-guide.html
streamingContext.fileStream
Yes, this works. Here's an example I'm using:
On Thu, Sep 15, 2016 at 10:31 AM, Jörn Franke wrote:
Hi,
I recommend that the third party application puts an empty file with the same
filename as the original file, but the extension ".uploaded". This is an
indicator that the file has been fully (!) written to the fs. Otherwise you
risk only reading parts of the file.
Then, you can have a file sy
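Jörn's marker-file idea can be wired directly into `fileStream`, which accepts a path filter. Below is a minimal sketch, assuming the third-party application writes an empty `<name>.uploaded` marker next to each finished file (that naming convention is an assumption from this thread), and that `ssc` and `pathOfDirToStream` are as defined earlier:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Accept a data file only once its ".uploaded" marker exists,
// i.e. the writer has finished with it.
def fullyUploaded(path: Path): Boolean = {
  if (path.getName.endsWith(".uploaded")) {
    false // never process the marker files themselves
  } else {
    val fs = path.getFileSystem(ssc.sparkContext.hadoopConfiguration)
    fs.exists(new Path(path.toString + ".uploaded"))
  }
}

val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
  pathOfDirToStream, fullyUploaded _, newFilesOnly = true)
```

One caveat worth noting: `fileStream` only considers files whose modification time falls in the current window, so if a marker appears well after its data file, the data file can be skipped; having the writer touch (or rename) the data file after dropping the marker is one way around this.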
Hello,

I am a newbie to Spark and I have the following requirement.

Problem statement: A third-party application is dumping files continuously on a server. Typically the count is 100 files per hour, and each file is less than 50 MB in size. My application has to process those files.
Here
1)