Hi All,

Was able to reduce the memory impact of REST queries a bit by avoiding some 
excessive copies and duplicate in-memory objects. The changes will show up in a 
PR for Drill 1.18.

The approach still buffers the entire result set on the heap, which is the next 
thing to fix. Looks feasible to stream the results to the browser as they 
arrive, while keeping the same JSON structure as the current version. The 
current implementation sends column names, then all the data, then column 
types. Might make more sense to send the names and types, followed by the rows. 
That way, the client knows what to do with the rows as they arrive. As long as 
the fields are identical, changing field order should not break existing 
clients (unless someone implemented a brittle do-it-yourself JSON parser).
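
For illustration, the reordered response might look roughly like this (the 
field names and values here are just a sketch, not a final format):

{
  "columns": [ "name", "balance" ],
  "metadata": [ "VARCHAR", "FLOAT8" ],
  "rows": [
    { "name": "fred", "balance": "100.0" },
    { "name": "wilma", "balance": "200.0" }
  ]
}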


With streaming, Drill should be able to deliver any number of rows with no 
memory overhead due to REST. However, the current JSON-based approach is 
awkward for that amount of data.

We briefly mentioned some possible alternatives. For those of you who want to 
use REST to consume large data sets, do you have a favorite example of a tool 
that does a good job of sending such data? Might as well avoid reinventing the 
wheel; it would be great if Drill could just adopt the solution that works for 
"Tool X." Suggestions?
 

Thanks,
- Paul

 

On Friday, May 1, 2020, 4:15:01 PM PDT, Dobes Vandermeer <dob...@gmail.com> wrote:

I think an okay approach is to use CTAS to dump your result into a 
folder/bucket of your choice instead of trying to receive the result directly 
from Drill.
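
Roughly something like the following, where the workspace, table, and path are 
placeholders, and the target workspace must be configured as writable:

CREATE TABLE dfs.tmp.`my_query_results` AS
SELECT col1, col2, col3  -- the columns you actually need
FROM dfs.`/path/to/source.parquet`;

I believe the output format (Parquet, CSV, JSON, etc.) follows the store.format 
option, so the consuming code can pick whichever is easiest to read back.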

The user can run a cron job or use lifecycle policies to clean up old query 
results if they fail to delete them manually in the code that consumes them.

However, in my own experimentation I found that when I tried to do this using 
the REST API, it still complained about running out of memory, even though it 
doesn't need to buffer any results.

I think the query just used a lot of memory to perform the operation, 
regardless of whether it needed to serialize the results as JSON.

On 5/1/2020 2:51:49 PM, Paul Rogers <par0...@yahoo.com.invalid> wrote:
Hi All,

TL;DR: Your use case is too large for the REST API as it is currently 
implemented. Three alternatives:

1. Switch to JDBC/ODBC.
2. Write the results to a file rather than sending them to your web client; 
the web client can then read the file.
3. Help us improve the scalability of the REST API.

The REST API is increasingly popular. Unfortunately, its current 
implementation has significant limitations. All results are held in memory 
until the end of the query, after which they are translated to JSON. This model 
was fine when the REST API was used to run a few small sample queries in the 
Drill Web Console, but it is not well suited to larger, production use cases.


Let's roughly estimate the memory needs for your query with the current design. 
A 400 MB Parquet file, with compression, might translate to 4 GB uncompressed. 
As it turns out, none of that will be buffered in direct memory unless you also 
have an ORDER BY clause (where we need to hold all the data in memory to do the 
sort).


The real cost is the simple design of the REST API. As your query runs, the 
REST handler stores all rows in an on-heap map of name/string pairs: one for 
each column in each row of your table. This is 15 M rows * 16 cols/row, or 
roughly 240 million keys and another 240 million string values. A quick check 
of the code suggests it does not do string "interning", so it is likely that 
each of the 15 million occurrences of each column name is a separate heap 
object. Verifying and fixing this would be a good short-term improvement.


If your data is 4 GB uncompressed, then when expanded as above, it could easily 
take, say, 10 GB of heap to encode as key/string pairs. The code does monitor 
heap size and gives you the error you reported as heap use grows too large. 
This obviously is not a good design, but it is how things work today. It was 
done quickly many years ago and has only been slightly improved since then.
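
Back-of-the-envelope, with the per-object overheads being guesses rather than 
measurements:

  15 M rows * 16 cols                       ~ 240 million cells
  ~240 M key Strings + ~240 M value Strings + one map entry per cell
  ~4 GB of raw character data, plus per-object and per-entry overhead
  =>  easily 10 GB or more of heap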

For your query, with a single Parquet file, the query will run in a single 
minor fragment. None of the tuning parameters you mentioned will solve your 
REST problem because the query itself is quite simple; it is the REST handler 
that is causing this particular problem.

Here is a simple way to verify this. Take your query and wrap it in:

SELECT COUNT(*) FROM (<your query>)

This will do all the work to run your query, count the results, and return a 
single row using the REST API, giving you a sense of how fast the query should 
run if the REST API were out of the picture.
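
For example (the path and column names below are placeholders for your actual 
query):

SELECT COUNT(*)
FROM (
  SELECT col1, col2, col3  -- your original 16-column projection
  FROM dfs.`/path/to/your/file.parquet`
) AS t;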


As Rafael noted, the ODBC and JDBC interfaces are designed for scale: they 
incrementally deliver results so that Drill need not hold the entire result set 
in memory. They also transfer results in a compact binary format.

It may be useful to take a step back. It is unclear what use case you are 
trying to solve. If your client intends to work with all 15 M rows and 16 
columns, then it needs sufficient memory to buffer these results. No human or 
dashboard can consume that much data, so you must be doing additional 
processing. Consider pushing that processing into SQL and Drill. Or, consider 
writing the results to a file using a CREATE TABLE AS (CTAS) statement to avoid 
buffering the large result set in your client. Big data tools often transform 
data from one set of files to another, since the data is too large to buffer in 
memory. The REST API is perfectly suitable for running that CTAS statement.


That said, should the REST API be extended to be more scalable? Absolutely. 
Drill is open source. The community is encouraged to help expand Drill's 
capabilities. We've often discussed the idea of a session-oriented REST API so 
clients can fetch blocks of results without the need for server or client-side 
buffering. Easy enough to code.
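
Purely as a hypothetical sketch (none of these endpoints exist today), such an 
API might look something like:

  POST   /query                                  -> returns a query ID
  GET    /query/{id}/results?offset=0&limit=1000 -> returns one block of rows
  DELETE /query/{id}                             -> cancels the query and releases resources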


The key challenge is state. With the current design, the entire query runs in a 
single message. If the user abandons the query, the HTTP connection closes and 
Drill immediately releases all resources. With a REST client using multiple 
messages to transfer results, how do we know when the client has abandoned the 
query? Impala struggled with this. Some commercial tools dump results to disk 
and eventually delete them after X minutes or hours of inactivity. Does anyone 
know how other tools solve this problem?

Thanks,
- Paul



On Friday, May 1, 2020, 10:49:36 AM PDT, Rafael Jaimes III wrote:

Hi Sreeparna,

I know your dataset is 15 million rows and 16 columns, but how big is the
result set you are expecting from that query?

I think that result set is too large for Drill's REST interface to handle,
especially with only 16 GB heap. I try to keep REST queries in Drill to
about 10k rows with a limited number of columns. JDBC or ODBC can handle MUCH
larger volumes without issue.

Best,
Rafael

On Fri, May 1, 2020 at 1:39 PM sreeparna bhabani <bhabani.sreepa...@gmail.com> wrote:

> Hi Team,
>
> Kindly suggest on the problem below, which we are facing in Apache Drill
> while running a query in the web interface. When we execute the query from
> Fiddler, it fails after 2 minutes with "There is not enough heap
> memory to run this query using the web interface". We have tried a few
> memory settings, but the issue persists. The query performance is not as
> expected either. Please find the details below-
>
> Dataset details-
> File type - Parquet
> Size of file - 401 MB
> Number of selected columns - 16
> Number of rows - 15136976
> There is no sort, no joins.
>
> Drill Setup-
> 2 Drillbits are used
> DRILL_HEAP - 16 G
> DRILL_MAX_DIRECT_MEMORY - 32 G
> planner.memory.max_query_memory_per_node - 31147483648
> planner.width.max_per_node - 0 (default value; can you please share some
> details on why the default is 0?)
> drill.exec.memory.operator.output_batch_size - 16777216
>
> Operator profile-
> Type                    Max Process Time  Max Wait Time  Max Peak Memory
> SCREEN                  0.048s            2m40s          4MB
> PROJECT                 0.068s            0              4MB
> UNORDERED_RECEIVER      0.088s            0.020s         418MB
> SINGLE_SENDER           0.011s            2m31s          8MB
> PROJECT                 4.187s            0              8MB
> PARQUET_ROW_GROUP_SCAN  1.765s            0.020s         12MB
> How can we find the reason for the wait time of SINGLE_SENDER? It is not
> clear from the profile.
> I found one Jira regarding the batch sizing of SINGLE_SENDER; not sure
> whether it is related to this issue or not.
> https://issues.apache.org/jira/browse/DRILL-7093
>
> Please let me know if any other information is required.
>
> Thanks n Regards,
> *Sreeparna Bhabani*
>  
