Re: SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver

2015-09-12 Thread Jörn Franke
I am not sure what you are trying to achieve here. Have you thought about
using Flume? Additionally, maybe something like rsync?


SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver

2015-09-11 Thread Varadhan, Jawahar
Hi all,
I have coded a custom receiver which receives Kafka messages. These Kafka
messages contain FTP server credentials. The receiver opens each message and
uses the FTP credentials in it to connect to the FTP server. It then streams a
huge text file (3.3 GB), reads the stream line by line with a BufferedReader,
and pushes each line to Spark Streaming via the receiver's "store" method. The
Spark Streaming process receives all of these lines and stores them in HDFS.
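
For reference, here is a minimal sketch of a receiver along those lines, using
Spark Streaming's Receiver API; openFtpStream is a hypothetical placeholder for
whatever FTP client the real receiver uses, and the storage level is an
assumption:

import java.io.{BufferedReader, InputStream, InputStreamReader}

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch only: openFtpStream is a placeholder, not a real API; the actual
// receiver would use an FTP client library (e.g. Apache Commons Net) here.
class FtpFileReceiver(host: String, user: String, password: String, remotePath: String)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Read on a separate thread so onStart() returns immediately
    new Thread("ftp-file-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = {
    // Nothing to clean up; the reading thread checks isStopped() and exits
  }

  private def receive(): Unit = {
    try {
      val in: InputStream = openFtpStream(host, user, password, remotePath)
      val reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))
      var line = reader.readLine()
      while (!isStopped() && line != null) {
        store(line)            // push one record at a time to Spark Streaming
        line = reader.readLine()
      }
      reader.close()
      stop("Finished streaming " + remotePath)
    } catch {
      case t: Throwable => restart("Error while reading " + remotePath, t)
    }
  }

  // Placeholder: implementation depends on the FTP library actually used
  private def openFtpStream(host: String, user: String, password: String, path: String): InputStream = ???
}
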
With this process I can ingest small files (50 MB), but I cannot ingest the
3.3 GB file; I get a YARN exception of SIGTERM 15 in the Spark Streaming
process. However, when I read that 3.3 GB file directly (without the custom
receiver) in Spark Streaming using ssc.textFileStream, everything works fine
and the file ends up in HDFS.
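
For comparison, a minimal sketch of that receiver-less path with
ssc.textFileStream; the directory names and the 30-second batch interval are
assumptions, not from the original setup:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirectFileIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirectFileIngest")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Picks up new files as they appear in the monitored directory
    val lines = ssc.textFileStream("hdfs:///data/incoming")

    // Write each non-empty batch to a timestamped directory in HDFS
    lines.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile("hdfs:///data/ingested/" + time.milliseconds)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
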
Please let me know what I might have to do to get this working with the
receiver. I know there are better ways to ingest the file, but we need to use
Spark Streaming in our case.
Thanks.