The log explicitly says "java.lang.OutOfMemoryError: Java heap space", so you need to allocate more JVM memory to Spark.
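For example, a minimal sketch assuming local mode (the "localhost" in your trace suggests the tasks run inside the driver JVM) and that you launch through the pyspark shell; the 4g values are illustrative, so size them to your data:

# Raise the driver heap at launch. In local mode the driver JVM also
# runs the tasks, so this is the heap the Parquet reader allocates from.
pyspark --driver-memory 4g

# Or set it persistently in conf/spark-defaults.conf:
spark.driver.memory 4g

# On a cluster, raise the executor heap as well
# (your_app.py is a placeholder for your script):
spark-submit --driver-memory 4g --executor-memory 4g your_app.py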
// maropu

On Mon, Jul 11, 2016 at 11:59 AM, Javier Rey <[email protected]> wrote:
> The problem also appears when I use the unionAll clause.
>
> 2016-07-10 21:58 GMT-05:00 Javier Rey <[email protected]>:
>
>> This is part of the trace log:
>>
>> WARN TaskSetManager: Lost task 4.0 in stage 2.0 (TID 13, localhost):
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:755)
>>     at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:494)
>>     at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader.checkEndOfRowGroup(UnsafeRowParquetRecord
>>
>> 2016-07-10 21:47 GMT-05:00 Takeshi Yamamuro <[email protected]>:
>>
>>> Hi,
>>>
>>> What's the schema of the parquet files?
>>> Also, could you show us the stack trace from when the error happens?
>>>
>>> // maropu
>>>
>>> On Mon, Jul 11, 2016 at 11:42 AM, Javier Rey <[email protected]> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> I installed Spark 1.6.1. I have two parquet files, but when I try to
>>>> show records after a unionAll, Spark crashes and I don't understand
>>>> what happens.
>>>>
>>>> When I call show() on only one parquet file, it works correctly.
>>>>
>>>> Code that fails:
>>>>
>>>> path = '/data/train_parquet/'
>>>> train_df = sqlContext.read.parquet(path)
>>>> train_df.take(1)
>>>>
>>>> Code that works:
>>>>
>>>> path = '/data/train_parquet/0_0_0.parquet'
>>>> train0_df = sqlContext.read.load(path)
>>>> train0_df.take(1)
>>>>
>>>> Thanks in advance.
>>>>
>>>> Samir
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro

--
---
Takeshi Yamamuro
