Paul,
Nice work!
--C
> On May 5, 2020, at 7:27 PM, Paul Rogers <[email protected]> wrote:
>
> Hi All,
>
> One more update. Went ahead and implemented the streaming solution for REST
> JSON queries. The result is that REST queries run almost as fast as native or
> JDBC queries: the results stream directly from the query DAG out to the HTTP
> client with no buffering at all.
>
> Tested with a file of 1 GB size: 1 M rows of 20 fields, each of 50 bytes.
> Tests run in a debugger, single threaded. A COUNT(*) query took about 10
> seconds. Running a SELECT * to JSON took about 18 seconds, presumably for the
> cost of encoding data as JSON and streaming 1+ GB over the network.
>
> This will help clients that use the REST JSON query API -- but only if the
> client itself handles the data in a streaming way (parses rows as they
> arrive, processes them, and disposes of them.) If the client buffers the
> entire result set into a data structure, then the client will run out of
> memory as result set sizes increases.
>
> As noted earlier, the current JSON structure is awkward for this. A better
> format might be as a stream of "JSON lines" in which each line is an
> independent JSON object. An even better format would be binary-encoded rows
> of some sort to avoid repeating the field names a million times.
>
> FWIW, it turns out that the current design assumes uniform rows. The list of
> column names is emitted at the start. If a schema change occurs, the set of
> fields will change, but there is no way to go back and amend the column name
> list. Not sure if anyone actually uses schema changes, but just something to
> be aware of if you do.
>
> The Web query feature (display the results in a web page) still uses the
> buffering approach, which is probably fine because you don't want to load a
> 1GB result set in the browser anyway.
>
> See DRILL-7733 for the details.
>
>
> Thanks,
> - Paul
>
>
>
> On Monday, May 4, 2020, 12:20:15 AM PDT, Paul Rogers <[email protected]>
> wrote:
>
> Hi All,
>
> Was able to reduce the memory impact of REST queries a bit by avoiding some
> excessive copies and duplicate in-memory objects. The changes will show up in
> a PR for Drill 1.18.
>
> The approach still buffers the entire result set on the heap, which is the
> next thing to fix. Looks feasible to stream the results to the browser as
> they arrive, while keeping the same JSON structure as the current version.
> The current implementation sends column names, then all the data, then column
> types. Might make more sense to send the names and types, followed by the
> rows. That way, the client knows what to do with the rows as they arrive. As
> long as the fields are identical, changing field order should not break
> existing clients (unless someone implemented a brittle do-it-yourself JSON
> parser.)
>
>
> With streaming, Drill should be able to deliver any number of rows with no
> memory overhead due to REST. However, the current JSON-based approach is
> awkward for that amount of data.
>
> We briefly mentioned some possible alternatives. For those of you who want to
> use REST to consume large data sets, do you have a favorite example of a tool
> that does a good at sending such data? Might as well avoid reinventing the
> wheel; would be great if Drill can just adopt the solution that works for
> "Tool X." Suggestions?
>
>
> Thanks,
> - Paul
>
>
>
> On Friday, May 1, 2020, 4:15:01 PM PDT, Dobes Vandermeer
> <[email protected]> wrote:
>
> I think an okay approach to take is to use CTAS to dump your result into a
> folder / bucket of your choice instead of trying to receive the result
> directly from Drill.
>
> The user can run a cron job or use lifecycle policies to clean up old query
> results if they fail to delete them manually in the code that consumes them.
>
> However, in my own experimentation I found that when I try to do this using
> the REST API it will still complain about running out of memory, even though
> it doesn't need to buffer any results.
>
> I think it just used a lot of memory to perform the operation regardless of
> whether it needs to serialize the results as JSON.
>
> On 5/1/2020 2:51:49 PM, Paul Rogers <[email protected]> wrote:
> Hi All,
>
> TL;DR: Your use case is too large for the REST API as it is currently
> implemented. Thee alternatives:
>
> 1. Switch to JDBD/ODBC,
> 2. Write the results to a file rather than sending to your web client. The
> web client can then read the file.
> 3. Help us improve the scalability of the REST API.
>
> The REST API is increasingly popular. Unfortunately, it's current
> implementation has significant limitations. All results are held in memory
> until the end of the query, after which they are translated to JSON. This
> model was fine when the REST API was used to run a few, small, sample queries
> in the Drill Web Console, but is not well suited to larger, production use
> cases.
>
>
> Let's roughly estimate the memory needs for your query with the current
> design. A 400 MB Parquet file, with compression, might translate to 4 GB
> uncompressed. As it turns out, none of that will be buffered in direct memory
> unless you also have an ORDER BY clause (where we need to hold all data in
> memory to do the sort.)
>
>
> The real cost is the simple design of the REST API. As your query runs, the
> REST handler stores all rows in an on-heap map of name/string pairs: one for
> each column in each row of your table. This is 15 M rows * 16 cols/row = 250
> million keys and another 250 million string values. A quick check of the code
> suggests it does not do string "interning", so it is likely that each of the
> 15 million occurrences of each name is a separate heap object. Verifying, and
> fixing this would be a good short-term improvement.
>
>
> If your data is 4 GB uncompressed, then when expanded as above, it could easy
> take, say, 10 GB of help to encode as key/string pairs. The code does monitor
> heap size and gives you the error you reported as heap use grows too large.
> This obviously is not a good design, but it is how things work today. It was
> done quickly many years ago and has only been slightly improved since then.
>
> Four your query, with a single Parquet file, the query will run in a single
> minor fragment. None of the tuning parameters you mentioned will solve your
> REST problem because the query itself is quite simple; it is the REST handler
> which is causing this particular problem.
>
> Here is a simple way to verify this. Take your query and wrap it in:
>
> SELECT COUNT(*) FROM ()
>
> This will do all the work to run your query, count the results, and return a
> single row using the REST API. This will give you a sense of how fast the
> query should run if the REST API were out of the picture.
>
>
> As Rafael noted, the ODBC and JDBC interfaces are designed for scale: they
> incrementally deliver results so that Drill need not hold the entire result
> set in memory. They also transfer results in a compact binary format.
>
> It may be useful to take a step back. It is unclear the use case you are
> tying to solve. If your client intends to work with all 15 M rows and 16
> columns, then it needs sufficient memory to buffer these results. No human or
> dashboard can consume that much data. So, you must be doing additional
> processing. Consider pushing that processing into SQL and Drill. Or, consider
> writing the results to a file using a CREATE TABLE AS (CTAS) statement to
> avoid buffering the large result set in your client. Big data tools often
> transform data from one set of files to another since data is too large to
> buffer in memory. The REST API is perfectly suitable to run that CTAS
> statement.
>
>
> That said, should the REST API be extended to be more scalable? Absolutely.
> Drill is open source. The community is encouraged to help expand Drill's
> capabilities. We've often discussed the idea of a session-oriented REST API
> so clients can fetch blocks of results without the need for server or
> client-side buffering. Easy enough to code.
>
>
> The key challenge is state. With the current design, the entire query runs in
> a single message. If the user abandons the query, the HTTP connection closes
> and Drill immediately releases all resources. With a REST client using
> multiple messages to transfer results, how do we know when the client has
> abandoned the query? Impala struggled with this. Some commercial tools dump
> results to disk and eventually delete them after x minutes or hours of
> inactivity. Anyone know of how other tools solve this problem?
>
> Thanks,
> - Paul
>
>
>
> On Friday, May 1, 2020, 10:49:36 AM PDT, Rafael Jaimes III wrote:
>
> Hi Sreeparna,
>
> I know your dataset is 15 million rows and 16 columns, but how big is the
> result set you are expecting from that query?
>
> I think that result set is too large for Drill's REST interface to handle
> especially with only 16G heap. I try to keep the REST queries in Drill to
> about 10k rows with limited number of columns. JDBC or ODBC can handle MUCH
> larger volumes without issue.
>
> Best,
> Rafael
>
> On Fri, May 1, 2020 at 1:39 PM sreeparna bhabani <>
> [email protected]> wrote:
>
>> Hi Team,
>>
>> Kindly suggest on the below problem which we are facing in Apache Drill
>> while running query in Web interface. When we are executing query from
>> Fiddler, the query is failing after 2 mins saying "There is not enough heap
>> memory to run this query using the web interface" . We have tried few
>> memory settings. But the issue persists. The query performance is not as
>> expected either. Please find the details below-
>>
>> Dataset details-
>> File type - Parquet
>> Size of file - 401 MB
>> Number of selected columns - 16
>> Number of rows - 15136976
>> There is no sort, no joins.
>>
>> Drill Setup-
>> 2 Drillbits are used
>> DRILL_HEAP - 16 G
>> DRILL_MAX_DIRECT_MEMORY - 32 G
>> planner.memory.max_query_memory_per_node - 31147483648
>> planner.width.max_per_node- 0 (DEFAULT value. Can you please share some
>> details why the default is 0)
>> drill.exec.memory.operator.output_batch_size - 16777216
>>
>> Operator profile-
>> Type Max process Time Max wait time Max Peak Memory
>> SCREEN 0.048s 2m40s 4MB
>> PROJECT 0.068s 0 4MB
>> UNORDERED_RECEIVER 0.088s 0.020s 418MB
>> SINGLE_SENDER 0.011s 2m31s 8MB
>> PROJECT 4.187s 0 8MB
>> PARQUET_ROW_GROUP_SCAN 1.765s 0.020s 12MB
>> How to find the reason of the wait time of SINGLE_SENDER. It is not clear
>> from the profile.
>> I found one Jira regarding the Batch sizing of SINGLE_SENDER. Not sure
>> whether it is related to this issue or not.
>> https://issues.apache.org/jira/browse/DRILL-7093
>>
>> Please let me know if any other information is required.
>>
>> Thanks n Regards,
>> *Sreeparna Bhabani*