Thank you for your replies. @Mich, using LIMIT 100 in the query prevents the exception, but given that there is enough memory, I don't think this should happen even without LIMIT.
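For reference, the two variants side by side (field holds the column name; the LIMIT placement is how I read Mich's suggestion):

sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")            // throws the exception
sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC LIMIT 100")  // completes fine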
@Vadim, here's the full stack trace:

Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
        at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
        at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I'm running Spark in local mode, so there is only one executor (the driver), and spark.driver.memory is set to 64g. Changing the driver's memory doesn't help.
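For completeness, a minimal sketch of my setup (table and column names are placeholders; I'm assuming launch via spark-submit --driver-memory 64g, since in local mode driver memory has to be set before the JVM starts):

import org.apache.spark.sql.SparkSession

// Minimal sketch of the failing job. Driver memory is given at launch
// (e.g. spark-submit --driver-memory 64g); setting it via .config() has
// no effect in local mode because the driver JVM is already running.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("single-field-sort")
  .enableHiveSupport()   // the table lives in the Hive metastore as Parquet
  .getOrCreate()

val field = "my_field"   // placeholder for the real column name

// Reads one column of the ~140GB table and sorts it; this is the line
// that triggers the page-allocation exception:
spark.sql(s"SELECT $field FROM MY_TABLE ORDER BY $field DESC").show()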
*Babak Alipour,*
*University of Florida*

On Fri, Sep 30, 2016 at 2:05 PM, Vadim Semenov <vadim.seme...@datadoghq.com> wrote:

> Can you post the whole exception stack trace?
> What are your executor memory settings?
>
> Right now I assume that it happens in UnsafeExternalRowSorter ->
> UnsafeExternalSorter:insertRecord
>
> Running more executors with lower `spark.executor.memory` should help.
>
> On Fri, Sep 30, 2016 at 12:57 PM, Babak Alipour <babak.alip...@gmail.com> wrote:
>
>> Greetings everyone,
>>
>> I'm trying to read a single field of a Hive table stored as Parquet in
>> Spark (~140GB for the entire table; this single field should be just a
>> few GB) and look at the sorted output using the following:
>>
>> sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
>>
>> But this simple line of code gives:
>>
>> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page
>> with more than 17179869176 bytes
>>
>> The same error occurs for:
>>
>> sql("SELECT " + field + " FROM MY_TABLE").sort(field)
>>
>> and:
>>
>> sql("SELECT " + field + " FROM MY_TABLE").orderBy(field)
>>
>> I'm running this on a machine with more than 200GB of RAM, in local
>> mode with spark.driver.memory set to 64g.
>>
>> I do not know why it cannot allocate a big enough page, or why it is
>> trying to allocate such a big page in the first place.
>>
>> I hope someone with more knowledge of Spark can shed some light on this.
>> Thank you!
>>
>> *Best regards,*
>> *Babak Alipour,*
>> *University of Florida*