Spark Streaming with multiple Kafka topics doesn't work at-least-once

2017-01-23 Thread hakanilter
Hi everyone, I have a Spark (1.6.0-cdh5.7.1) Streaming job that receives data from multiple Kafka topics. After the job starts, everything works fine at first (around 700 req/sec), but after a while (a couple of days or a week) it starts processing only part of the data (around 350 req/sec). When

Re: Problem with loading files: Loss was due to java.io.EOFException java.io.EOFException

2014-05-21 Thread hakanilter
The problem was solved after adding the hadoop-core dependency. But I think there is a misunderstanding about local files. I found this note: "Note that if you've connected to a Spark master, it's possible that it will attempt to load the file on one of the different machines in the cluster, so make sure