Hi,
I'm think I may have encountered some kind of bug that at the moment prevents 
the correct running of my application on a EC2 Cluster.
I'm saying that because the same exact code works wonderfully locally but has a 
really strange behaviour on the cluster.
val uri = ssc.textFileStream(args(1) + "/inputData/newData/")
uri.print() // prints perfectly the data
uri.saveAsTextFiles((args(1) + "/uri/textFiles/"), "") // saves the file as 
intended
val downloaded = uri.map(s => download(s))
      .flatMap(t => t).map(t => createKey(t))
val downloadedAndFiltered = downloaded.filter(t => filterEcho(t))
downloadedAndFilteredAndEchoFiltered(t => t._2).saveAsTextFiles((args(1) + 
"/dissected/textFiles/"), "")
// saves an empty file(why???)
downloadedAndFilteredAndEchoFiltered.print() // prints perfectly
// from now all data gets lost, any further call on 
downloadedAndFilteredAndEchoFiltered DStream receive empty input

I have no idea of what it could be that breaks down my application, someone 
knows if there are some known bug?

Gianluca

Reply via email to