Concurrent inserts into the same table are not supported. I can try to make this clearer in the documentation.
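Since the jobs already use unique, timestamp-suffixed names, one workaround is to skip the shared table entirely and have each job write its own Parquet directory. A minimal sketch, assuming the Spark 1.2-era SchemaRDD API (`jsonRDD` / `saveAsParquetFile`); the names `baseDir`, `jobName`, and `uniqueOutputPath` are hypothetical, not from the original code:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Build a per-job output path so parallel jobs never write to the same directory.
def uniqueOutputPath(baseDir: String, jobName: String): String =
  s"$baseDir/$jobName-${System.currentTimeMillis()}"

// Each job converts its JSON RDD to Parquet in its own HDFS directory,
// avoiding concurrent inserts into one table altogether.
def writeJsonAsParquet(sc: SparkContext, results: RDD[String],
                       baseDir: String, jobName: String): String = {
  val sqlContext = new SQLContext(sc)
  val jsonRdd = sqlContext.jsonRDD(results) // infer schema from the JSON strings
  val out = uniqueOutputPath(baseDir, jobName)
  jsonRdd.saveAsParquetFile(out)            // one directory per job, no shared table
  out
}
```

The separate directories can still be read back together later (e.g. by pointing `parquetFile` at each path), without any two jobs ever racing on the same table.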
On Tue, Feb 17, 2015 at 8:01 PM, Vasu C <vasuc.bigd...@gmail.com> wrote:
> Hi,
>
> I am running a Spark batch-processing job using the spark-submit command,
> and below is my code snippet. Basically it converts a JSON RDD to Parquet
> and stores it in an HDFS location.
>
> The problem I am facing is that if multiple jobs are triggered in parallel,
> even though each job executes properly (as I can see in the Spark web UI),
> not every Parquet file is created in the HDFS path. If 5 jobs are executed
> in parallel, only 3 Parquet files get created.
>
> Is this a data-loss scenario, or am I missing something here? Please help
> me with this.
>
> Here tableName is unique, with a timestamp appended to it.
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val jsonRdd = sqlContext.jsonRDD(results)
> val parquetTable = sqlContext.parquetFile(parquetFilePath)
> parquetTable.registerTempTable(tableName)
> jsonRdd.insertInto(tableName)
>
> Regards,
> Vasu C