While exploring memory challenges on some queries, I noticed a few things, had some questions, and thought I'd summarize here for the group (and hopefully get some questions answered).
Basically, I was trying to run a query with a few aggregations, and it kept failing. I tried upping the memory per node and reducing the width per node, and still couldn't get it to run, so I started reading and dove into a few subjects.

First of all, I found https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/ and by setting multiphase_agg to false, my query worked as is. This was a bit "magical" to me, so I am hoping that some folks can explain. I will say the uniqueness of the grouped field was high, so that may help in the explanation.

Ok, that worked, so I set it back to true and wanted to play with sort based aggregation vs. hash based: https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/ What I read there was that hash based operators will fail when they run out of memory, and sort based operators will spill to disk.

*Side note: I have set up my nodes to spill to MapR FS local volumes, basically setting things up similar to how MapR configures temp space in Hadoop. I create the local volumes in drill-env.sh, and I believe there is another mailing list post where I share my magic there. Also note, we had tried lowering memory to get it to spill, and I never saw it spill... well, I had to enable sort based aggregations for it to spill! Lessons learned!

Anywho, I expected this to work, but I am getting a different error, and I am trying to figure out if it's a Drill error or a MapR error. Basically the query now fails again, but with a spill to disk error (see below). It looks to be set up correctly and trying to spill, but the No data available (61) error seems to suggest that the data being sent to the writer has been truncated, so I can't tell if Drill is the culprit or if MapR is.
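For anyone who wants to reproduce the two configurations described above, here is a rough sketch of the session settings involved. I am assuming the full option names are planner.enable_multiphase_agg and planner.enable_hashagg (the post only says "multiphase_agg" and "sort based aggregations"), so check them against sys.options on your version before relying on this.

```sql
-- Configuration 1: turn off multiphase aggregation (the query ran as is).
-- Option name assumed from the Drill planner options; the post only says "multiphase_agg".
ALTER SESSION SET `planner.enable_multiphase_agg` = false;

-- Configuration 2: multiphase back on, but disable hash aggregation so the
-- planner falls back to sort based (streaming) aggregation, which can spill.
ALTER SESSION SET `planner.enable_multiphase_agg` = true;
ALTER SESSION SET `planner.enable_hashagg` = false;

-- Check what is currently in effect for the planner options.
SELECT * FROM sys.options WHERE name LIKE 'planner.enable%';
```

With hash aggregation disabled, the plan uses an external sort feeding a streaming aggregate, which matches the ExternalSortBatch and StreamingAggBatch frames in the stack trace below.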
Raw Error:

Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk

Fragment 2:83

[Error Id: 6277fb9b-2cc6-497e-91c4-b708ad208bb2 node3.mesos:20001]

  (java.io.IOException) Create failed for file: /var/mapr/local/node3.mesos/drillspill/292a2bf8-32bf-3413-3eac-c468b20e134d/major_fragment_2/minor_fragment_83/operator_2/0, error: No data available (61)
    com.mapr.fs.MapRClientImpl.create():173
    com.mapr.fs.MapRFileSystem.create():730
    com.mapr.fs.MapRFileSystem.create():772
    org.apache.hadoop.fs.FileSystem.create():812
    org.apache.drill.exec.physical.impl.xsort.BatchGroup.addBatch():90
    org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill():573
    org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():400
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():137
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():145
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():415
    org.apache.hadoop.security.UserGroupInformation.doAs():1595
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745 (state=,code=0)
