Thank you for your replies.

@Mich, adding LIMIT 100 to the query prevents the exception, but given that
there is enough memory available, I don't think this should happen even
without the LIMIT.
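For reference, the variant that completes is roughly this (just a sketch;
`field` and MY_TABLE are the same placeholders as in my original message):

// Same ORDER BY query, capped at 100 rows -- this one finishes without
// hitting the page-allocation error.
sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC LIMIT 100").show()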

@Vadim, here's the full stack trace:

Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
        at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
        at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I'm running Spark in local mode, so there is only one executor (the driver),
and spark.driver.memory is set to 64g. Changing the driver's memory doesn't
help.
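To be explicit about the setup, it looks roughly like this (a sketch only,
assuming the job is launched as a standalone app; the exact launch command
and app name below are not from my earlier messages):

// Local mode in a single JVM: the driver doubles as the only executor, so
// spark.driver.memory is the relevant setting rather than spark.executor.memory.
// It has to be given at launch time (e.g. spark-submit --driver-memory 64g),
// since changing it from code after the driver JVM has started has no effect.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sort-single-field")   // hypothetical app name, just for the sketch
  .getOrCreate()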

*Babak Alipour ,*
*University of Florida*

On Fri, Sep 30, 2016 at 2:05 PM, Vadim Semenov <vadim.seme...@datadoghq.com>
wrote:

> Can you post the whole exception stack trace?
> What are your executor memory settings?
>
> Right now I assume that it happens in UnsafeExternalRowSorter ->
> UnsafeExternalSorter:insertRecord
>
> Running more executors with lower `spark.executor.memory` should help.
>
>
> On Fri, Sep 30, 2016 at 12:57 PM, Babak Alipour <babak.alip...@gmail.com>
> wrote:
>
>> Greetings everyone,
>>
>> I'm trying to read a single field of a Hive table stored as Parquet in
>> Spark (~140GB for the entire table; this single field should be just a few
>> GB) and look at the sorted output using the following:
>>
>> sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
>>
>> But this simple line of code gives:
>>
>> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page
>> with more than 17179869176 bytes
>>
>> Same error for:
>>
>> sql("SELECT " + field + " FROM MY_TABLE).sort(field)
>>
>> and:
>>
>> sql("SELECT " + field + " FROM MY_TABLE).orderBy(field)
>>
>>
>> I'm running this on a machine with more than 200GB of RAM, in local mode
>> with spark.driver.memory set to 64g.
>>
>> I do not know why it cannot allocate a big enough page, or why it is
>> trying to allocate such a big page in the first place.
>>
>> I hope someone with more knowledge of Spark can shed some light on this.
>> Thank you!
>>
>>
>> *Best regards,*
>> *Babak Alipour ,*
>> *University of Florida*
>>
>
>
