Hi Ashish,

In general, you should give Drill less memory than your total system memory, 
else swapping will occur and performance will suffer greatly.

In your case, you should *decrease* memory so that the total (along with the 1 
GB for the code cache) stays under your 8 GB of system memory, while also 
leaving room for your OS, etc.
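
For example, something along these lines in drill-env.sh would keep the total 
comfortably inside 8 GB (just a sketch; adjust the heap/direct split for your 
workload, and remember the ~1 GB code cache on top of heap + direct):

    # example values only - tune for your workload; heap + direct + ~1 GB code
    # cache stays within the 6 GB cap, leaving ~2 GB for the OS
    export DRILLBIT_MAX_PROC_MEM=6G
    export DRILL_HEAP=1G
    export DRILL_MAX_DIRECT_MEMORY=4G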

Note that Drill is a big data query engine and is designed to run on large 
servers in a cluster; running it on a laptop with 8 GB is handy for testing and 
development, but is not really the primary use case for doing real queries on 
large data sets.

You did not mention the size of your files. If you are running out of memory, 
it may be because your inputs are larger than the memory available to Drill and 
so there is insufficient space in memory to hold the data when doing the join.

If your data is significantly smaller than, say, 4 GB, then there may be some 
other issue. You can perhaps look at the query profile (in embedded mode the 
Drill web console, usually at http://localhost:8047, lists profiles for 
completed queries), or give us a bit more detail about your data such as file 
sizes, etc.


Also, in your join condition, how many of the rows are expected to join? If, 
say, every columns[0] value in both files is "Fred", then you are effectively 
doing a cartesian join in which every row in one table is matched with every 
row in the other. Can you provide a bit more detail about your data?
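
One quick check (just a sketch, untested) is to count how often each join key 
value repeats in one of your files, for example:

    -- rough check of join-key skew (sketch; run against each input file)
    SELECT columns[0] AS join_key, COUNT(*) AS cnt
    FROM dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz`
    GROUP BY columns[0]
    ORDER BY cnt DESC
    LIMIT 10;

If a handful of key values account for most of the rows, the join output (and 
the memory the join needs) can blow up very quickly.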

A question for others on this list: isn't the join now supposed to spill if 
memory is constrained? What might Ashish look for to see why spilling is not 
happening? Might the plan be using a form of join that does not spill? How 
would we diagnose such an issue? With an EXPLAIN PLAN?
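
For instance (again, just a sketch), something like:

    -- sketch: inspect which join operator the planner chose
    EXPLAIN PLAN FOR
    SELECT * FROM
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz` AS Table0
    INNER JOIN
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA1-20180924181406.tar.gz` AS Table1
    ON Table0.columns[0] = Table1.columns[0];

would show whether the plan uses a hash join, merge join, or nested loop join.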

Thanks,
- Paul

 

    On Thursday, October 25, 2018, 12:40:18 AM PDT, Ashish Pancholi 
<apanch...@chambal.com> wrote:  
 
I am using `Apache Drill` version `1.14` on a `Windows` system and running
Drill using the command:

    sqlline.bat -u "jdbc:drill:zk=local"


I am trying to execute a `join` query on two compressed and archived `CSV`
(`.tar.gz`) files.

Query:

    SELECT * FROM
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz` AS Table0
    INNER JOIN
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA1-20180924181406.tar.gz` AS Table1
    ON Table0.columns[0] = Table1.columns[0]


But an out-of-memory error occurred:

    org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR:
    One or more nodes ran out of memory while executing the query. Unable to
    allocate buffer of size 131072 (rounded from 86104) due to memory limit
    (630194176). Current allocation: 630108434
    Fragment 0:0
    [Error Id: 585c0644-5fd5-446e-b9b3-d48e0771eb2a on DESKTOP-SM3E3KM:31010]


To resolve the issue, I tried updating the `config\drill-env.sh` file, but the
issue remains the same, and it looks like the changes to the script are not
taking effect: even when I set the DIRECT MEMORY beyond the system memory
(RAM), Drill starts up peacefully every time, without complaining that the
memory has been exceeded.

    export DRILLBIT_MAX_PROC_MEM=12G
    export DRILL_HEAP=2G
    export DRILL_MAX_DIRECT_MEMORY=10G


whereas my system's main memory is only 8 GB.

*Please help me to resolve the out-of-memory error*. I have even run the
commands below, following the troubleshooting instructions, but the issue
remains the same.


  -    alter session set `planner.enable_hashagg` = false;
  -    alter session set `planner.enable_hashjoin` = false;
  -    alter session set `planner.width.max_per_node` = 3;
  -    alter system set `planner.width.max_per_query` = 100;

-- 
Thanks
Ashish Pancholi
  
