I am not sure what you are trying to achieve here. Have you thought about using Flume? Additionally, maybe something like rsync?
On Sat, 12 Sep 2015 at 0:02, Varadhan, Jawahar <varad...@yahoo.com.invalid> wrote:

> Hi all,
> I have coded a custom receiver which receives Kafka messages. These
> Kafka messages have FTP server credentials in them. The receiver then opens
> the message and uses the FTP credentials in it to connect to the FTP
> server. It then streams this huge text file (3.3 GB). Finally, this stream is
> read line by line using a buffered reader and pushed to Spark Streaming
> via the receiver's "store" method. The Spark Streaming process receives all
> these lines and stores them in HDFS.
>
> With this process I could ingest small files (50 MB) but can't ingest this
> 3.3 GB file. I get a YARN exception of SIGTERM 15 in the Spark Streaming
> process. Also, I tried reading that 3.3 GB file directly (without the custom
> receiver) in Spark Streaming using ssc.textFileStream and everything works
> fine and that file ends up in HDFS.
>
> Please let me know what I might have to do to get this working with the
> receiver. I know there are better ways to ingest the file but we need to
> use Spark Streaming in our case.
>
> Thanks.
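
For reference, the read loop described in the quoted message (read the stream line by line, hand the lines to "store") might be sketched roughly as below. This is plain Python, not the Spark receiver API: `batched_lines`, `pump`, and the `store` callback are hypothetical names used only for illustration, and the batch size is an arbitrary assumption. It only shows how the loop can hold a bounded number of lines in memory at a time; whether that has any bearing on the SIGTERM is not something the description lets me say.

```python
import io
from typing import Callable, Iterable, Iterator, List


def batched_lines(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size lines, keeping one batch in memory at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for line in stream:
        batch.append(line.rstrip("\n"))
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so the yielded batch is not mutated
    if batch:
        yield batch  # final, possibly shorter, batch


def pump(stream: Iterable[str],
         store: Callable[[List[str]], None],
         batch_size: int = 1000) -> int:
    """Pass every line of stream to store in batches; return the number of lines passed."""
    total = 0
    for batch in batched_lines(stream, batch_size):
        store(batch)  # hypothetical stand-in for the receiver's "store" call
        total += len(batch)
    return total


if __name__ == "__main__":
    # A small in-memory stream stands in for the FTP download.
    sample = io.StringIO("a\nb\nc\nd\ne\n")
    received: List[List[str]] = []
    count = pump(sample, received.append, batch_size=2)
    print(count, received)
```

Any text stream that yields lines can be passed as `stream`, so the same loop works for a file object wrapped around a network download.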