Re: Best practice for transforming and storing from Spark to Mongo/HDFS

2015-07-25 Thread Cody Koeninger
Use foreachPartition and batch the writes

On Sat, Jul 25, 2015 at 9:14 AM, wrote:
> Hello,
> I am new user of Spark, and need to know what could be the best practice
> to do the following scenario :
>
> - Spark Streaming receives XML messages from Kafka
> - Spark transforms each message of the R
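The suggestion in the reply (foreachPartition plus batched writes) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not code from the thread: `batched`, `make_partition_writer`, and `open_sink` are hypothetical names, and the default batch size of 500 is arbitrary. The idea is to open one connection per partition and send records in groups rather than one at a time.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def make_partition_writer(open_sink, batch_size=500):
    """Build a function that can be passed to rdd.foreachPartition.

    open_sink is a hypothetical factory (an assumption of this sketch) that
    returns a pair (write_batch, close). For MongoDB, write_batch could wrap
    pymongo's collection.insert_many and close could wrap client.close.
    """
    def write_partition(records):
        # One sink (connection) per partition, not per record.
        write_batch, close = open_sink()
        try:
            for batch in batched(records, batch_size):
                write_batch(batch)  # one round trip per batch
        finally:
            close()  # release the connection even if a write fails
    return write_partition
```

In a streaming job this would presumably be wired up as `stream.foreachRDD(lambda rdd: rdd.foreachPartition(make_partition_writer(open_sink)))`, where `stream` is the DStream of already transformed messages. Creating the sink inside `write_partition` keeps the connection on the executor, so it never has to be serialized from the driver.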

Best practice for transforming and storing from Spark to Mongo/HDFS

2015-07-25 Thread nibiau
Hello,
I am a new user of Spark and would like to know the best practice for the following scenario:
- Spark Streaming receives XML messages from Kafka
- Spark transforms each message of the RDD (xml2json + some enrichments)
- Spark stores the transformed/enriched messages in MongoDB