Paul, 
Nice work!
--C

> On May 5, 2020, at 7:27 PM, Paul Rogers <[email protected]> wrote:
> 
> Hi All,
> 
> One more update. Went ahead and implemented the streaming solution for REST 
> JSON queries. The result is that REST queries run almost as fast as native or 
> JDBC queries: the results stream directly from the query DAG out to the HTTP 
> client with no buffering at all.
> 
> Tested with a 1 GB file: 1 M rows of 20 fields, each of 50 bytes. 
> Tests run in a debugger, single threaded. A COUNT(*) query took about 10 
> seconds. Running a SELECT * to JSON took about 18 seconds, presumably due to 
> the cost of encoding data as JSON and streaming 1+ GB over the network.
> 
> This will help clients that use the REST JSON query API -- but only if the 
> client itself handles the data in a streaming way (parses rows as they 
> arrive, processes them, and disposes of them). If the client buffers the 
> entire result set into a data structure, then the client will run out of 
> memory as result set sizes increase.
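> 
> For illustration, here is a minimal sketch of such a streaming client in 
> Java, using Jackson's streaming parser. The endpoint and request body follow 
> the usual Drill REST conventions, but treat this as a sketch, not an 
> official client; the query is just an example:
> 
> import com.fasterxml.jackson.core.JsonParser;
> import com.fasterxml.jackson.core.JsonToken;
> import com.fasterxml.jackson.databind.ObjectMapper;
> import java.io.OutputStream;
> import java.net.HttpURLConnection;
> import java.net.URL;
> import java.nio.charset.StandardCharsets;
> import java.util.Map;
> 
> public class StreamingRestClient {
>   public static void main(String[] args) throws Exception {
>     HttpURLConnection conn = (HttpURLConnection)
>         new URL("http://localhost:8047/query.json").openConnection();
>     conn.setRequestMethod("POST");
>     conn.setRequestProperty("Content-Type", "application/json");
>     conn.setDoOutput(true);
>     String body =
>         "{\"queryType\":\"SQL\",\"query\":\"SELECT * FROM cp.`employee.json`\"}";
>     try (OutputStream out = conn.getOutputStream()) {
>       out.write(body.getBytes(StandardCharsets.UTF_8));
>     }
> 
>     ObjectMapper mapper = new ObjectMapper();
>     try (JsonParser parser =
>              mapper.getFactory().createParser(conn.getInputStream())) {
>       // Scan forward to the "rows" array, then read one row object at a
>       // time, letting each be garbage collected after it is processed.
>       while (parser.nextToken() != null) {
>         if (parser.getCurrentToken() == JsonToken.FIELD_NAME
>             && "rows".equals(parser.getCurrentName())) {
>           parser.nextToken(); // START_ARRAY
>           while (parser.nextToken() == JsonToken.START_OBJECT) {
>             Map<?, ?> row = mapper.readValue(parser, Map.class);
>             System.out.println(row); // process the row, then discard it
>           }
>           break;
>         }
>       }
>     }
>   }
> }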
> 
> As noted earlier, the current JSON structure is awkward for this. A better 
> format might be a stream of "JSON lines" in which each line is an 
> independent JSON object. An even better format would be binary-encoded rows 
> of some sort, to avoid repeating the field names a million times.
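> 
> For illustration, a hypothetical JSON-lines response might look like this, 
> one self-contained object per line (field names made up):
> 
> {"first_name": "Bob", "last_name": "Smith", "age": "30"}
> {"first_name": "Sue", "last_name": "Jones", "age": "25"}
> 
> A client can then parse and discard one line at a time, with no framing 
> needed beyond the newline.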
> 
> FWIW, it turns out that the current design assumes uniform rows. The list of 
> column names is emitted at the start. If a schema change occurs, the set of 
> fields will change, but there is no way to go back and amend the column name 
> list. Not sure if anyone actually uses schema changes, but just something to 
> be aware of if you do.
> 
> The Web query feature (display the results in a web page) still uses the 
> buffering approach, which is probably fine because you don't want to load a 
> 1GB result set in the browser anyway.
> 
> See DRILL-7733 for the details.
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Monday, May 4, 2020, 12:20:15 AM PDT, Paul Rogers <[email protected]> 
> wrote:  
> 
> Hi All,
> 
> Was able to reduce the memory impact of REST queries a bit by avoiding some 
> excessive copies and duplicate in-memory objects. The changes will show up in 
> a PR for Drill 1.18.
> 
> The approach still buffers the entire result set on the heap, which is the 
> next thing to fix. Looks feasible to stream the results to the browser as 
> they arrive, while keeping the same JSON structure as the current version. 
> The current implementation sends column names, then all the data, then column 
> types. Might make more sense to send the names and types, followed by the 
> rows. That way, the client knows what to do with the rows as they arrive. As 
> long as the fields are identical, changing field order should not break 
> existing clients (unless someone implemented a brittle do-it-yourself JSON 
> parser).
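> 
> Schematically, the change would be from the current layout:
> 
> { "columns": [...names...], "rows": [...data...], "metadata": [...types...] }
> 
> to:
> 
> { "columns": [...names...], "metadata": [...types...], "rows": [...data...] }
> 
> (field names here are illustrative), so a streaming client has the full 
> schema in hand before the first row arrives.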
> 
> 
> With streaming, Drill should be able to deliver any number of rows with no 
> memory overhead due to REST. However, the current JSON-based approach is 
> awkward for that amount of data.
> 
> We briefly mentioned some possible alternatives. For those of you who want to 
> use REST to consume large data sets, do you have a favorite example of a tool 
> that does a good job of sending such data? Might as well avoid reinventing the 
> wheel; it would be great if Drill could just adopt the solution that works for 
> "Tool X." Suggestions?
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Friday, May 1, 2020, 4:15:01 PM PDT, Dobes Vandermeer 
> <[email protected]> wrote:  
> 
> I think an okay approach to take is to use CTAS to dump your result into a 
> folder / bucket of your choice instead of trying to receive the result 
> directly from Drill.
> 
> The user can run a cron job or use lifecycle policies to clean up old query 
> results if they fail to delete them manually in the code that consumes them.
> 
> However, in my own experimentation I found that when I tried to do this using 
> the REST API, it still complained about running out of memory, even though 
> it didn't need to buffer any results.
> 
> I think it just used a lot of memory to perform the operation, regardless of 
> whether it needed to serialize the results as JSON.
> 
> On 5/1/2020 2:51:49 PM, Paul Rogers <[email protected]> wrote:
> Hi All,
> 
> TL;DR: Your use case is too large for the REST API as it is currently 
> implemented. Three alternatives:
> 
> 1. Switch to JDBC/ODBC.
> 2. Write the results to a file rather than sending to your web client. The 
> web client can then read the file.
> 3. Help us improve the scalability of the REST API.
> 
> The REST API is increasingly popular. Unfortunately, its current 
> implementation has significant limitations. All results are held in memory 
> until the end of the query, after which they are translated to JSON. This 
> model was fine when the REST API was used to run a few small sample queries 
> in the Drill Web Console, but it is not well suited to larger, production use 
> cases.
> 
> 
> Let's roughly estimate the memory needs for your query with the current 
> design. A 400 MB Parquet file, with compression, might translate to 4 GB 
> uncompressed. As it turns out, none of that will be buffered in direct memory 
> unless you also have an ORDER BY clause (where we need to hold all data in 
> memory to do the sort).
> 
> 
> The real cost is the simple design of the REST API. As your query runs, the 
> REST handler stores all rows in an on-heap map of name/string pairs: one for 
> each column in each row of your table. This is 15 M rows * 16 cols/row = 
> roughly 240 million keys and another 240 million string values. A quick check 
> of the code suggests it does not do string "interning", so it is likely that 
> each of the 15 million occurrences of each name is a separate heap object. 
> Verifying and fixing this would be a good short-term improvement.
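> 
> Something like this self-contained sketch (not the actual Drill code) shows 
> the accumulation pattern and where interning would help:
> 
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> 
> public class HeapCostSketch {
>   public static void main(String[] args) {
>     List<String> columnNames = List.of("col1", "col2", "col3");
>     List<Map<String, String>> results = new ArrayList<>();
>     for (int rowNum = 0; rowNum < 1_000_000; rowNum++) {
>       Map<String, String> cells = new HashMap<>();
>       for (String name : columnNames) {
>         // Mimic the duplicated-name case: a fresh String per occurrence,
>         // as when names are re-materialized for every row.
>         String key = new String(name); // rows x cols name objects on heap
>         // String key = name.intern(); // would share one object per column
>         cells.put(key, "value-" + rowNum);
>       }
>       results.add(cells);
>     }
>     System.out.println("Buffered " + results.size() + " rows on the heap");
>   }
> }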
> 
> 
> If your data is 4 GB uncompressed, then when expanded as above, it could 
> easily take, say, 10 GB of heap to encode as key/string pairs. The code does 
> monitor heap size and gives you the error you reported as heap use grows too 
> large. This obviously is not a good design, but it is how things work today. 
> It was done quickly many years ago and has only been slightly improved since 
> then.
> 
> For your query, with a single Parquet file, the query will run in a single 
> minor fragment. None of the tuning parameters you mentioned will solve your 
> REST problem because the query itself is quite simple; it is the REST handler 
> which is causing this particular problem.
> 
> Here is a simple way to verify this. Take your query and wrap it in:
> 
> SELECT COUNT(*) FROM (<your query>)
> 
> This will do all the work to run your query, count the results, and return a 
> single row using the REST API. This will give you a sense of how fast the 
> query should run if the REST API were out of the picture.
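> 
> For example, if the original query were (path and column names illustrative):
> 
> SELECT col1, col2 FROM dfs.`/data/myfile.parquet`
> 
> then the wrapped version would be:
> 
> SELECT COUNT(*) FROM (SELECT col1, col2 FROM dfs.`/data/myfile.parquet`)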
> 
> 
> As Rafael noted, the ODBC and JDBC interfaces are designed for scale: they 
> incrementally deliver results so that Drill need not hold the entire result 
> set in memory. They also transfer results in a compact binary format.
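> 
> As a minimal sketch of what this looks like from Java (assuming the Drill 
> JDBC driver is on the classpath and a Drillbit is running locally):
> 
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
> 
> public class JdbcStreamingExample {
>   public static void main(String[] args) throws Exception {
>     try (Connection conn =
>              DriverManager.getConnection("jdbc:drill:drillbit=localhost");
>          Statement stmt = conn.createStatement();
>          ResultSet rs = stmt.executeQuery(
>              "SELECT * FROM cp.`employee.json`")) {
>       // Rows arrive incrementally; the client never holds the full
>       // result set in memory.
>       while (rs.next()) {
>         System.out.println(rs.getString(1));
>       }
>     }
>   }
> }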
> 
> It may be useful to take a step back. It is unclear what use case you are 
> trying to solve. If your client intends to work with all 15 M rows and 16 
> columns, then it needs sufficient memory to buffer these results. No human or 
> dashboard can consume that much data. So, you must be doing additional 
> processing. Consider pushing that processing into SQL and Drill. Or, consider 
> writing the results to a file using a CREATE TABLE AS (CTAS) statement to 
> avoid buffering the large result set in your client. Big data tools often 
> transform data from one set of files to another since data is too large to 
> buffer in memory. The REST API is perfectly suitable to run that CTAS 
> statement.
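> 
> For example (the workspace and path are illustrative; the target workspace 
> must be writable):
> 
> ALTER SESSION SET `store.format` = 'parquet';
> CREATE TABLE dfs.tmp.`my_results` AS
> SELECT col1, col2 FROM dfs.`/data/myfile.parquet`;
> 
> The client can then read the resulting files directly, at its own pace.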
> 
> 
> That said, should the REST API be extended to be more scalable? Absolutely. 
> Drill is open source. The community is encouraged to help expand Drill's 
> capabilities. We've often discussed the idea of a session-oriented REST API 
> so clients can fetch blocks of results without the need for server or 
> client-side buffering. Easy enough to code.
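> 
> Purely as a hypothetical illustration (none of these endpoints exist today), 
> such an API might look like:
> 
> POST /query/start                             --> {"queryId": "abc123"}
> GET  /query/abc123/rows?offset=0&limit=10000  --> next block of rows
> POST /query/abc123/close                      --> release server resources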
> 
> 
> The key challenge is state. With the current design, the entire query runs in 
> a single message. If the user abandons the query, the HTTP connection closes 
> and Drill immediately releases all resources. With a REST client using 
> multiple messages to transfer results, how do we know when the client has 
> abandoned the query? Impala struggled with this. Some commercial tools dump 
> results to disk and eventually delete them after x minutes or hours of 
> inactivity. Anyone know how other tools solve this problem?
> 
> Thanks,
> - Paul
> 
> 
> 
> On Friday, May 1, 2020, 10:49:36 AM PDT, Rafael Jaimes III wrote:
> 
> Hi Sreeparna,
> 
> I know your dataset is 15 million rows and 16 columns, but how big is the
> result set you are expecting from that query?
> 
> I think that result set is too large for Drill's REST interface to handle,
> especially with only 16 GB of heap. I try to keep REST queries in Drill to
> about 10k rows with a limited number of columns. JDBC or ODBC can handle MUCH
> larger volumes without issue.
> 
> Best,
> Rafael
> 
> On Fri, May 1, 2020 at 1:39 PM sreeparna bhabani <[email protected]> wrote:
> 
>> Hi Team,
>> 
>> Kindly advise on the problem below, which we are facing in Apache Drill
>> while running a query in the Web interface. When we execute the query from
>> Fiddler, it fails after 2 minutes saying "There is not enough heap
>> memory to run this query using the web interface". We have tried a few
>> memory settings, but the issue persists. The query performance is not as
>> expected either. Please find the details below-
>> 
>> Dataset details-
>> File type - Parquet
>> Size of file - 401 MB
>> Number of selected columns - 16
>> Number of rows - 15136976
>> There is no sort, no joins.
>> 
>> Drill Setup-
>> 2 Drillbits are used
>> DRILL_HEAP - 16 G
>> DRILL_MAX_DIRECT_MEMORY - 32 G
>> planner.memory.max_query_memory_per_node - 31147483648
>> planner.width.max_per_node - 0 (the DEFAULT value; can you please share
>> some details on why the default is 0?)
>> drill.exec.memory.operator.output_batch_size - 16777216
>> 
>> Operator profile-
>> Type                    Max Process Time  Max Wait Time  Max Peak Memory
>> SCREEN                  0.048s            2m40s          4MB
>> PROJECT                 0.068s            0              4MB
>> UNORDERED_RECEIVER      0.088s            0.020s         418MB
>> SINGLE_SENDER           0.011s            2m31s          8MB
>> PROJECT                 4.187s            0              8MB
>> PARQUET_ROW_GROUP_SCAN  1.765s            0.020s         12MB
>> How can we find the reason for the wait time of SINGLE_SENDER? It is not
>> clear from the profile. I found one JIRA regarding batch sizing in
>> SINGLE_SENDER, but I am not sure whether it is related to this issue:
>> https://issues.apache.org/jira/browse/DRILL-7093
>> 
>> Please let me know if any other information is required.
>> 
>> Thanks n Regards,
>> *Sreeparna Bhabani*
