Please let me know what further information I can provide to get this resolved. I am also experiencing a separate issue:
RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Unable to allocate sv2 for 8501 records, and not enough batchGroups to spill.
batchGroups.size 1 spilledBatchGroups.size 0
allocated memory 42768000
allocator limit 41943040

Current settings are:
planner.memory.max_query_memory_per_node = 10GB
HEAP = 12G
Direct memory = 32G
Perm = 1024M

What is the issue here?

Thanks,
Yun

-----Original Message-----
From: Yun Liu [mailto:[email protected]]
Sent: Thursday, November 2, 2017 3:52 PM
To: [email protected]
Subject: RE: Drill Capacity

Yes - I increased:
planner.memory.max_query_memory_per_node to 10GB
HEAP to 12G
Direct memory to 16G
Perm to 1024M

It didn't have any schema changes. The same file format with less data works perfectly. I am unable to tell if there is corruption.

Yun

-----Original Message-----
From: Andries Engelbrecht [mailto:[email protected]]
Sent: Thursday, November 2, 2017 3:35 PM
To: [email protected]
Subject: Re: Drill Capacity

Which memory setting did you increase? Have you tried 6 or 8GB? How much memory is allocated to Drill heap and direct memory for the embedded Drillbit?

Also, did you check that the larger document doesn't have any schema changes or corruption?

--Andries

On 11/2/17, 12:31 PM, "Yun Liu" <[email protected]> wrote:

Hi Kunal and Andries,

Thanks for your reply. We need JSON in this case because Drill only supports up to 65,536 columns in a CSV file. I also tried increasing the memory size to 4GB but am still experiencing the same issues. Drill is installed in embedded mode.

Thanks,
Yun

-----Original Message-----
From: Kunal Khatua [mailto:[email protected]]
Sent: Thursday, November 2, 2017 2:01 PM
To: [email protected]
Subject: RE: Drill Capacity

Hi Yun,

Andries' solution should address your problem.
However, do understand that, unlike CSV files, a JSON file cannot be processed in parallel, because there is no clear record delimiter (CSV data usually has a new-line character to indicate the end of a record). So the larger a file gets, the more work a single minor fragment has to do in processing it, including maintaining the internal data structures that represent the complex JSON document. The preferable approach is to create more, smaller JSON files so that they can be processed in parallel.

Hope that helps.

~ Kunal

-----Original Message-----
From: Andries Engelbrecht [mailto:[email protected]]
Sent: Thursday, November 02, 2017 10:26 AM
To: [email protected]
Subject: Re: Drill Capacity

How much memory is allocated to the Drill environment? Embedded or in a cluster?

I don't think there is a particular limit, but a single JSON file will be read by a single minor fragment; in general it is better to match the number and size of files to the Drill environment. In the short term, try bumping up planner.memory.max_query_memory_per_node in the options and see if that works for you.

--Andries

On 11/2/17, 7:46 AM, "Yun Liu" <[email protected]> wrote:

Hi,

I've been using Apache Drill actively and am wondering what the capacity of Drill is. I have a JSON file which is 390MB, and it keeps throwing a DATA_READ ERROR. I have another JSON file with the exact same format but only 150MB, and it processes fine. When I run a select on the large JSON file, it returns successfully for some of the fields. None of the documented error causes really apply to me, so I am trying to understand what size of JSON file Drill supports, or whether there is something else I missed.

Thanks,

Yun Liu
Solutions Delivery Consultant
321 West 44th St | Suite 501 | New York, NY 10036
+1 212.871.8355 office | +1 646.752.4933 mobile

CAST, Leader in Software Analysis and Measurement
Achieve Insight. Deliver Excellence.
Join the discussion http://blog.castsoftware.com/ LinkedIn<http://www.linkedin.com/companies/162909> | Twitter<http://twitter.com/onquality> | Facebook<http://www.facebook.com/pages/CAST/105668942817177>
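[Editor's note] Kunal's suggestion above, splitting the data into multiple files so that several minor fragments can scan them in parallel, is only mechanical if each record sits on its own line (JSON Lines). A minimal sketch under that assumption; the filenames (large.json, part_) are illustrative, not from the thread, and this will NOT work on a single large JSON array or pretty-printed documents, since cutting those mid-object produces invalid JSON:

```shell
# Split a JSON Lines file (one complete JSON object per line -- an
# assumption, see note above) into 100,000-record chunks:
split -l 100000 large.json part_

# Drill picks its JSON reader by file extension, so give each chunk
# a .json suffix:
for f in part_*; do mv "$f" "$f.json"; done
```

Pointing a query at the directory of chunks instead of the single large file then lets Drill assign the chunks to parallel fragments.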

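[Editor's note] For reference, the memory knobs discussed throughout the thread live in two places in a stock Drill install: heap and direct memory in conf/drill-env.sh, and planner.memory.max_query_memory_per_node as a system/session option. Incidentally, the allocator limit in the error above is 41943040 bytes, i.e. exactly 40MB, far below the configured 10GB, which suggests the budget may not be reaching the failing sort operator. A hedged sketch (variable names match the stock drill-env.sh template in Drill 1.x; the sizes are examples only, not recommendations):

```shell
# Hypothetical excerpt from conf/drill-env.sh -- size to your machine:
export DRILL_HEAP="12G"
export DRILL_MAX_DIRECT_MEMORY="32G"

# The per-query sort budget is set from SQL (sqlline or the Web UI),
# not from this file; the value is in bytes (10737418240 = 10GB):
#   ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 10737418240;
```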