Re: Documentation for maths operations between different types?

2019-08-16 Thread Dave Challis
Thanks Paul, I hadn't seen sqlTypeOf before, that looks perfect for
checking this sort of thing.

Dave

On Thu, 15 Aug 2019 at 18:04, Paul Rogers wrote:

> Hi Dave,
>
> As it turns out, improving the detail in function documentation is a
> long-standing request. The historical answer has been to either 1) read the
> code, or 2) try it with a test query.
>
> You can use the sqlTypeOf() function to learn the answer to your question:
>
> SELECT sqlTypeOf(cast(1 AS INT) / cast(2 AS INT)) FROM values(1)
>
> Charles Givre patiently tracked down and documented all the Drill
> functions in his appendix to our book "Learning Apache Drill." But, even
> there, the level of detail you request is missing.
>
> Maybe, once you do the research to find the answers you want, you could
> submit a Documentation JIRA ticket with the results so that it can be added
> to the documentation.
>
> Thanks,
> - Paul
>
>
>
> On Thursday, August 15, 2019, 03:55:11 AM PDT, Dave Challis <
> dave.chal...@cipher.ai> wrote:
>
> Is there any documentation out there on how mathematical functions are
> handled when operating on different types?
>
> E.g.:
>
> * would integer division of 1 / 2 produce a float or double of 0.5? Or an
> integer of the same type set to 0?
>
> * if two INT are multiplied and produce a result larger than INT can
> support, is the result returned as a BIGINT?
>
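
The sqlTypeOf() approach suggested above can be extended to cover many
type combinations at once. The following is a minimal Python sketch that only
generates the probe queries; it does not contact Drill. The helper names
(probe_query, probe_queries) are hypothetical, and the `FROM (VALUES(1))`
form is an assumption about the syntax Drill accepts, so try one query by
hand before relying on the rest.

```python
from itertools import product


def probe_query(left_type, right_type, op, left_val=1, right_val=2):
    """Build one sqlTypeOf() probe for `left_val op right_val` with the given casts."""
    return (
        f"SELECT sqlTypeOf(CAST({left_val} AS {left_type}) {op} "
        f"CAST({right_val} AS {right_type})) FROM (VALUES(1))"
    )


def probe_queries(types, ops):
    """Build a probe for every ordered pair of types and every operator."""
    return [
        probe_query(left, right, op)
        for left, right in product(types, repeat=2)
        for op in ops
    ]


if __name__ == "__main__":
    # Print the queries so they can be pasted into any Drill client.
    for query in probe_queries(["INT", "BIGINT", "FLOAT", "DOUBLE"], ["/", "*"]):
        print(query)
```

The overflow question could be probed the same way by passing a large
left_val (for example 2147483647) and selecting the expression itself in
addition to its type.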


Documentation for maths operations between different types?

2019-08-15 Thread Dave Challis
Is there any documentation out there on how mathematical functions are
handled when operating on different types?

E.g.:

* would integer division of 1 / 2 produce a float or double of 0.5? Or an
integer of the same type set to 0?

* if two INT are multiplied and produce a result larger than INT can
support, is the result returned as a BIGINT?


Re: Web console responsiveness under heavy load

2018-07-03 Thread Dave Challis
Many thanks, that's good to know, will move to doing that.

On 3 July 2018 at 05:29, Kunal Khatua wrote:

> Yes. The reason you see the sluggishness is because the embedded webserver
> within the Drillbit also acts as a proxy client, which receives the entire
> result set, before converting it into a JSON response for your browser.
>
> This can lead to rather high memory requirements.
>
> I'd recommend using tools like DBeaver (which has a nice feature of
> automatically downloading the latest JDBC drivers) or Squirrel.
>
> You can then use the WebUI to monitor the queries in flight, etc.
>
> On 6/30/2018 12:18:23 AM, Kunal Khatua wrote:
> Are you running the query through the WebUI? What are the memory settings
> of your Drillbit?
>
> On 6/26/2018 7:45:30 AM, Dave Challis wrote:
> Are there any recommended Drill settings to configure in order to ensure
> that the web console (running on 8047) remains responsive even under heavy
> load?
>
> Currently, if I execute a large/complex query (that e.g. takes 5m to
> complete), all queries to 8047 just block until the query completes.
>
> I'd like to use it to keep an eye on the query (on the profiles page).
>
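
If the query is submitted through a JDBC client as suggested above, the
web server on 8047 stays free for monitoring. The following is a minimal
Python sketch of polling it from a script instead of the browser. The
`/profiles.json` path and the field names (`runningQueries`, `queryId`,
`state`, `query`) are assumptions about Drill's REST API and should be
checked against the REST documentation for the version in use; the function
names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_profiles(base_url="http://localhost:8047"):
    """Fetch the profile listing from a Drillbit (assumed endpoint: /profiles.json)."""
    with urlopen(f"{base_url}/profiles.json", timeout=10) as resp:
        return json.load(resp)


def summarize_running(profiles):
    """Return (queryId, state, first 60 chars of the query text) per running query."""
    return [
        (q.get("queryId"), q.get("state"), (q.get("query") or "")[:60])
        for q in profiles.get("runningQueries", [])
    ]


# Example use against a live Drillbit (not executed here):
#   for row in summarize_running(fetch_profiles()):
#       print(row)
```

Keeping the parsing separate from the HTTP call makes it possible to check
the summary logic against a saved response without a running cluster.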


Re: Web console responsiveness under heavy load

2018-07-02 Thread Dave Challis
Yup, running the query through the web UI. If I avoid doing that, would the
UI be more responsive?

Drillbit settings are:

DRILL_HEAP=8G
DRILL_MAX_DIRECT_MEMORY=16G
DRILLBIT_CODE_CACHE_SIZE=2G

The query itself uses a peak of ~3.5 GB of memory while it's running.

On 30 June 2018 at 08:18, Kunal Khatua wrote:

> Are you running the query through the WebUI? What are the memory settings
> of your Drillbit?
>
> On 6/26/2018 7:45:30 AM, Dave Challis wrote:
> Are there any recommended Drill settings to configure in order to ensure
> that the web console (running on 8047) remains responsive even under heavy
> load?
>
> Currently, if I execute a large/complex query (that e.g. takes 5m to
> complete), all queries to 8047 just block until the query completes.
>
> I'd like to use it to keep an eye on the query (on the profiles page).
>


Web console responsiveness under heavy load

2018-06-26 Thread Dave Challis
Are there any recommended Drill settings to configure in order to ensure
that the web console (running on 8047) remains responsive even under heavy
load?

Currently, if I execute a large/complex query (that e.g. takes 5m to
complete), all queries to 8047 just block until the query completes.

I'd like to use it to keep an eye on the query (on the profiles page).


Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-15 Thread Dave Challis
One gotcha with JSON vs Parquet is that there's a longstanding bug that
causes errors when trying to read from Parquet files containing 0 rows.

For cases where we're converting from datasets that might be empty, we use
JSON, and for everything else, Parquet.


How to deal with Parquet files containing no rows without Drill errors?

2018-05-24 Thread Dave Challis
We've got some processes that dump some reporting data as a bunch of
Parquet files, then run queries involving joins with those tables (i.e. we
have a main table which is always non-empty, then a number of link tables
which it joins against, any of which can be empty).

The Parquet files contain schema metadata, but some contain no row data.

Trying to join against them in Drill using e.g.

SELECT *
FROM dfs.`a.parquet` AS A
JOIN dfs.`b.parquet` AS B ON (A.id=B.id)
JOIN dfs.`c.parquet` AS C ON (A.id=C.id);

This fails with: "SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 0 has
no read entries assigned" if either b.parquet or c.parquet contains no rows.

It looks like it might have been reported as an issue here
https://issues.apache.org/jira/browse/DRILL-4517 , but as it hasn't been
fixed since 2016, I'm wondering if there are any suggested workarounds for
the above, rather than waiting for a fix.

In MySQL/Postgres etc., joining against empty tables is fine, so this
behaviour was a bit unexpected, and is a major blocker for a project I'm
using Drill for.

Thanks,
Dave
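
One possible workaround, not something confirmed in this thread: the joins
above are inner joins, so an empty b.parquet or c.parquet means the result
has no rows anyway, and a pre-check could skip the Drill query in that case.
The following is a minimal Python sketch of that idea. It assumes the
third-party pyarrow package is installed for reading row counts from the
Parquet footer, and the helper names are hypothetical. It would not help
with outer joins, where the non-empty side still has to be returned.

```python
def parquet_row_count(path):
    """Read the row count from a Parquet file's footer metadata."""
    import pyarrow.parquet as pq  # third-party; imported lazily on purpose

    return pq.read_metadata(path).num_rows


def empty_tables(row_counts):
    """Return the names of tables whose row count is zero, sorted by name."""
    return sorted(name for name, count in row_counts.items() if count == 0)


def should_run_inner_join(row_counts):
    """An inner join over these tables can only produce rows if none is empty."""
    return not empty_tables(row_counts)


# Example use (not executed here):
#   counts = {p: parquet_row_count(p) for p in ["a.parquet", "b.parquet", "c.parquet"]}
#   if should_run_inner_join(counts):
#       ...submit the query to Drill...
#   else:
#       ...treat the result as empty...
```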