Hi All,

I am using Spark Streaming with Kafka. I receive messages and, after some minor processing, write them to HDFS. As of now I am using the saveAsTextFiles() / saveAsHadoopFiles() Java methods.
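For reference, my current write path looks roughly like the sketch below (the ZooKeeper quorum, group id, topic name, and HDFS paths are placeholders, and the foreachRDD variant is only something I was considering, not code I have in production):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaToHdfs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
        // 1-second batch interval, matching how often I pull from the topic
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        Map<String, Integer> topics = new HashMap<>();
        topics.put("my-topic", 1); // placeholder topic name

        JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, "zk-host:2181", "my-group", topics);

        // keep only the message payload
        JavaDStream<String> lines = messages.map(tuple -> tuple._2());

        // What I was considering instead of saveAsTextFiles(): guard each
        // batch with isEmpty() so empty batches do not produce empty files.
        lines.foreachRDD((JavaRDD<String> rdd) -> {
            if (!rdd.isEmpty()) {
                // placeholder output path, one directory per non-empty batch
                rdd.saveAsTextFile("hdfs:///data/out-" + System.currentTimeMillis());
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```

I am not sure whether the per-batch foreachRDD + isEmpty() guard above is the idiomatic way to do this, or whether there is a built-in/configurable sink that handles it.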
Two questions:

- Is there a default way of writing a stream to Hadoop, similar to the HDFS sink concept in Flume? That is, is there a configurable way in Spark Streaming to write out a DStream after processing?
- How can I check whether a DStream is empty, so that I can skip the HDFS write when no messages are present? I pull from the Kafka topic every 1 second, and when no messages are available Spark sometimes writes empty files to HDFS.

Please suggest. TIA

--
Anish Sneh
"Experience is the best teacher."
+91-99718-55883
http://in.linkedin.com/in/anishsneh