Also, the Spark SQL insert seems to run only two tasks per stage, which might be why it runs out of memory. Is there a way to increase the number of tasks when doing the SQL insert?
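One knob that might help, assuming Spark 1.x here: spark.sql.shuffle.partitions controls how many tasks run after a shuffle, and an explicit repartition() on the data behind recordsTemp would force more write tasks even when no shuffle is involved. A minimal sketch, with 200 and 50 as placeholder values to tune:

    // Raise the post-shuffle task count for SQL stages.
    sqlContext.setConf("spark.sql.shuffle.partitions", "200")

    // Re-register the source with more partitions so the insert stage
    // runs more, smaller tasks (repartition triggers a shuffle).
    val repartitioned = sqlContext.table("recordsTemp").repartition(50)
    repartitioned.registerTempTable("recordsTemp")

Here is the stage from the UI, for reference: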
Stage Id | Description                       | Submitted           | Duration | Tasks: Succeeded/Total | Input
12       | save at SaveUsersToHdfs.scala:255 | 2016/05/20 16:32:47 | 5.0 min  | 0/2                    | 21.4 MB

On Fri, May 20, 2016 at 3:43 PM, SRK <swethakasire...@gmail.com> wrote:

> Hi,
>
> I see some memory issues when trying to insert the data in the form of ORC
> using Spark SQL. Please find the query and exception below. Any idea as to
> why this is happening?
>
> sqlContext.sql(" CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING,
>   record STRING) PARTITIONED BY (datePartition STRING, idPartition STRING)
>   stored as ORC LOCATION '/user/users' ")
> sqlContext.sql(" orc.compress= SNAPPY")
> sqlContext.sql(
>   """ from recordsTemp ps insert overwrite table users
>   partition(datePartition , idPartition ) select ps.id, ps.record ,
>   ps.datePartition, ps.idPartition """.stripMargin)
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 13.0: org.apache.hadoop.hive.ql.metadata.HiveException:
> parquet.hadoop.MemoryManager$1: New Memory allocation 1048575 bytes is
> smaller than the minimum allocation size of 1048576 bytes.
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
>     at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:240)
>     at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:249)
>     at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:249)
>     at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
>     at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
>     at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:249)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:112)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:104)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: parquet.hadoop.MemoryManager$1: New Memory allocation 1048575
> bytes is smaller than the minimum allocation size of 1048576 bytes.
>     at parquet.hadoop.MemoryManager.updateAllocation(MemoryManager.java:125)
>     at parquet.hadoop.MemoryManager.addWriter(MemoryManager.java:82)
>     at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:104)
>     at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
>     at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:267)
>     at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:65)
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:125)
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:114)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Memory-issues-when-trying-to-insert-data-in-the-form-of-ORC-using-Spark-SQL-tp26988.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
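One more thing worth checking: the DDL says STORED AS ORC, but the trace fails inside parquet.hadoop.MemoryManager via MapredParquetOutputFormat, so the rows seem to be written as Parquet rather than ORC. Also, sqlContext.sql(" orc.compress= SNAPPY") is not a complete statement on its own. A minimal sketch of one way to pin down both the format and the compression, as a suggestion rather than the original poster's code (TBLPROPERTIES with orc.compress is the usual Hive table property for ORC):

    sqlContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING)
      PARTITIONED BY (datePartition STRING, idPartition STRING)
      STORED AS ORC
      LOCATION '/user/users'
      TBLPROPERTIES ("orc.compress" = "SNAPPY")
    """)

If a session-level setting was intended instead, the usual form would be sqlContext.sql("SET orc.compress=SNAPPY").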