Hi all, I have coded a custom receiver that receives Kafka messages. These
Kafka messages contain FTP server credentials. The receiver opens each
message and uses the FTP credentials in it to connect to the FTP server. It
then streams a huge text file (3.3 GB). Finally, this stream is read line by
line using a BufferedReader and pushed to Spark Streaming via the receiver's
"store" method. The Spark Streaming process receives all these lines and
stores them in HDFS.
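
To make the read loop concrete, here is a minimal sketch of the line-by-line
step described above. It is not the actual receiver code: the class name
LineForwarder and the method forwardLines are hypothetical, and a
Consumer<String> stands in for the receiver's "store" method so that the
sketch runs without Spark, Kafka, or an FTP server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
class LineForwarder {
    // Reads `in` line by line and hands each line to `sink`.
    // Assumption: in the real receiver, `in` would be the FTP file stream
    // and `sink` would call the receiver's store(line).
    static long forwardLines(InputStream in, Consumer<String> sink) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // Only one line is held by this loop at a time; whatever the
            // sink buffers downstream is outside the scope of this sketch.
            while ((line = reader.readLine()) != null) {
                sink.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count; // number of lines forwarded
    }
}
```

In the receiver, the call would look something like
forwardLines(ftpStream, line -> store(line)), where ftpStream is whatever
InputStream the FTP client returns.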
With this process I can ingest small files (50 MB) but can't ingest this 3.3 GB
file. I get a YARN exception of SIGTERM (signal 15) in the Spark Streaming
process. Also, I tried reading that 3.3 GB file directly (without the custom
receiver) in Spark Streaming using ssc.textFileStream, and everything works
fine and the file ends up in HDFS.
Please let me know what I might have to do to get this working with the
receiver. I know there are better ways to ingest the file, but we need to use
Spark Streaming in our case.
Thanks.