> On 21 Oct 2016, at 15:53, Nkechi Achara <nkach...@googlemail.com> wrote:
> 
> Hi, 
> 
> I am using Spark 1.5.0 to read gz files with textFileStream, but when new 
> files are dropped in the specified directory. I know this is only the case 
> with gz files as when i extract the file into the directory specified the 
> files are read on the next window and processed.
> 
> My code is here:
> 
> val comments = ssc.fileStream[LongWritable, Text, 
> TextInputFormat]("file:///tmp/", (f: Path) => true, newFilesOnly=false).
>       map(pair => pair._2.toString)
>     comments.foreachRDD(i => i.foreach(m=> println(m)))
> 
> any idea why the gz files are not being recognized.
> 
> Thanks in advance,
> 
> K

Are the files being written in the directory or renamed in? As you should be 
using rename() against a filesystem (not an object store) to make sure that the 
file isn't picked up

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Reply via email to