On Fri, Oct 26, 2018 at 5:02 AM Paul Rogers <par0...@yahoo.com.invalid> wrote:
> Hi Ashish,
>
> In general, you should give Drill less memory than your total system
> memory, else swapping will occur and performance will suffer greatly.
> In your case, you should *decrease* memory so that the total (along with
> the 1 GB for code cache) is under your 8 GB total, while also allowing
> memory for your OS, etc.

Yes, I understand that, but my point was that even when I increased the memory
in the script beyond my laptop's physical memory, Drill still started up
peacefully, i.e. without complaining. So my concern was simply to make sure
that I am updating the right file.

> Note that Drill is a big data query engine and is designed to run on large
> servers in a cluster; running it on a laptop with 8 GB is handy for testing
> and development, but is not really the primary use case for doing real
> queries on large data sets.

So are you saying it is not even designed to join two files of 1 GB each and
show the result on 8 GB of RAM?

> You did not mention the size of your files. If you are running out of
> memory, it may be because your inputs are larger than the memory available
> to Drill and so there is insufficient space in memory to hold the data when
> doing the join.

The files are 1 GB each.

> If your data is significantly smaller than, say, 4 GB, then there may be
> some other issue. You can perhaps look at the query profile, or give us a
> bit more detail about your data such as file sizes, etc.
>
> Also, in your join condition, how many of the rows are expected to join?
> If both column[0] values are "Fred", say, then you are doing a cartesian
> join in which every row in one table is matched with every row in the
> other. Can you provide a bit more detail about your data?

FYI: when I ran the same query on the plain CSV files (i.e. after untarring
and uncompressing them), it was successful.

> A question for others on this list: isn't the join now supposed to spill
> if memory is constrained? What might Ashish look for to see why spilling is
> not happening? Might the plan be using a form of join that does not spill?
> How would we diagnose such an issue? With an EXPLAIN PLAN?
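I can run an EXPLAIN PLAN and share the output if that helps. This is roughly
what I am planning to run (a sketch, assuming I have the 1.14 syntax right):

EXPLAIN PLAN FOR
SELECT * FROM
dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz`
AS Table0 INNER JOIN
dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA1-20180924181406.tar.gz`
AS Table1 ON Table0.columns[0]=Table1.columns[0];

That should show whether the plan uses a HashJoin or a MergeJoin operator.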
>
> Thanks,
> - Paul
>
> On Thursday, October 25, 2018, 12:40:18 AM PDT, Ashish Pancholi <
> apanch...@chambal.com> wrote:
>
> I am using `Apache Drill` version `1.14` on a Windows system and running
> Drill using the command:
>
> > sqlline.bat -u "jdbc:drill:zk=local"
>
> I am trying to execute a join query on two compressed and archived
> CSV files.
>
> Query:
>
> > SELECT * FROM
> > dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz`
> > AS Table0 INNER JOIN
> > dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA1-20180924181406.tar.gz`
> > AS Table1 ON Table0.columns[0]=Table1.columns[0]
>
> But an out of memory error occurred:
>
> > org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR:
> > One or more nodes ran out of memory while executing the query. Unable to
> > allocate buffer of size 131072 (rounded from 86104) due to memory limit
> > (630194176). Current allocation: 630108434 Fragment 0:0 [Error Id:
> > 585c0644-5fd5-446e-b9b3-d48e0771eb2a on DESKTOP-SM3E3KM:31010]
>
> To resolve the issue, I tried to update the `config\drill-env.sh` file, but
> the issue remains the same, and it looks as if updating the script does not
> take effect: even when I set the direct memory beyond the system memory
> (RAM), Drill starts up peacefully, without even complaining that I have
> exceeded the memory. Therefore it looks like the changes are not being
> picked up.
>
> > export DRILLBIT_MAX_PROC_MEM=12G
> > export DRILL_HEAP=2G
> > export DRILL_MAX_DIRECT_MEMORY=10G
>
> whereas my system's main memory is only 8 GB.
>
> *Please help me to resolve the out of memory error*. I have even run the
> queries below, following the troubleshooting instructions, but the issue
> remains the same:
>
> - alter session set `planner.enable_hashagg` = false;
> - alter session set `planner.enable_hashjoin` = false;
> - alter session set planner.width.max_per_node = 3;
> - alter system set planner.width.max_per_query = 100;
>
> --
> Thanks
> Ashish Pancholi

--
Thanks
Ashish Pancholi
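P.S. To double-check whether the drill-env.sh values are actually being picked
up, I am thinking of asking the Drillbit itself how much memory it believes it
has. A rough sketch of what I would run, assuming the sys.memory system table
(and its direct_max column) exists in 1.14:

SELECT hostname, heap_max, direct_max FROM sys.memory;

If direct_max still reports the old limit instead of the 10 GB I configured,
that would confirm the script changes are not taking effect.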