The log explicitly says "java.lang.OutOfMemoryError: Java heap space", so you need to allocate more JVM memory to Spark.
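For example, a minimal sketch assuming local mode (the "localhost" in your trace suggests the tasks run inside the driver JVM) and that you launch through the pyspark shell; the 4g values are illustrative, so size them to your data:

# Raise the driver heap at launch. In local mode the driver JVM also
# runs the tasks, so this is the heap the Parquet reader allocates from.
pyspark --driver-memory 4g

# Or set it persistently in conf/spark-defaults.conf:
spark.driver.memory 4g

# On a cluster, raise the executor heap as well
# (your_app.py is a placeholder for your script):
spark-submit --driver-memory 4g --executor-memory 4g your_app.py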
// maropu

On Mon, Jul 11, 2016 at 11:59 AM, Javier Rey <[email protected]> wrote:
> The problem also appears when I use the unionAll clause.
>
> 2016-07-10 21:58 GMT-05:00 Javier Rey <[email protected]>:
>
>> This is part of the trace log:
>>
>> WARN TaskSetManager: Lost task 4.0 in stage 2.0 (TID 13, localhost):
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:755)
>>     at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:494)
>>     at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader.checkEndOfRowGroup(UnsafeRowParquetRecord
>>
>> 2016-07-10 21:47 GMT-05:00 Takeshi Yamamuro <[email protected]>:
>>
>>> Hi,
>>>
>>> What's the schema of the parquet files?
>>> Also, could you show us the stack trace from when the error happens?
>>>
>>> // maropu
>>>
>>> On Mon, Jul 11, 2016 at 11:42 AM, Javier Rey <[email protected]> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> I installed Spark 1.6.1. I have two parquet files, but when I try to
>>>> show records after a unionAll, Spark crashes and I don't understand
>>>> what happens.
>>>>
>>>> When I call show() on only one parquet file, it works correctly.
>>>>
>>>> Code that fails:
>>>>
>>>> path = '/data/train_parquet/'
>>>> train_df = sqlContext.read.parquet(path)
>>>> train_df.take(1)
>>>>
>>>> Code that works:
>>>>
>>>> path = '/data/train_parquet/0_0_0.parquet'
>>>> train0_df = sqlContext.read.load(path)
>>>> train0_df.take(1)
>>>>
>>>> Thanks in advance.
>>>>
>>>> Samir
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro

--
---
Takeshi Yamamuro
