Re: Does drill support variable arguments in custom UDAF

2020-11-22 Thread Vova Vysotskyi
Hi,

Yes, it is implemented for both UDFs and UDAFs. Please take a look at these 
examples of var-arg UDAF functions:
https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/testing/CountArgumentsAggFunctions.java

On 2020/10/28 19:34:25, Paul Rogers  wrote: 
> Variable arg UDAF might be a bit harder than var arg UDFs as support is
> needed in the agg operators.
> 
> - Paul
> 
> On Wed, Oct 28, 2020 at 5:21 AM Charles Givre  wrote:
> 
> > Hi there,
> > Drill does support VARARG UDFs.  Take a look at this PR for an example:
> > https://github.com/apache/drill/pull/1835 <
> > https://github.com/apache/drill/pull/1835>
> > -- C
> >
> > > On Oct 28, 2020, at 4:22 AM, wingc.s...@qq.com 
> > wrote:
> > >
> > > I have seen the mailing list history. The same problem was mentioned
> > in 2015 and 2017. I wonder whether Drill supports passing variable
> > arguments in a custom UDAF such as my_sum(1, 2, 3, ...). I'm looking forward
> > to your reply, thanks.
> >
> >
> 


Re: drill about hbase 2.x support

2020-05-15 Thread Vova Vysotskyi
Hello,

Drill supports HBase 2.x starting from version 1.15. It was
updated in the scope of DRILL-6349.

Kind regards,
Volodymyr Vysotskyi


On Fri, May 15, 2020 at 10:09 AM 陈炳新  wrote:

> hello:
>  We have been using Apache Drill for more than a year, and it works quite
> well in our company.
>  Now our cluster has been upgraded to HBase 2.x.
>  When will HBase 2.x be supported? Do you have any plans for this?
>


Re: DRILL-7671: Verify Drill with cdh and hdp profiles

2020-04-03 Thread Vova Vysotskyi
Hi all,

Looks like no one found any issues.

I'll proceed with creating a pull request in this case.

Kind regards,
Volodymyr Vysotskyi


On Wed, Apr 1, 2020 at 8:35 PM Vova Vysotskyi  wrote:

> Hi all,
>
> Some time ago I noticed that Apache Drill has *cdh* and *hdp*
> profiles, but it fails to build when either of them is enabled.
>
> I have created DRILL-7671
> <https://issues.apache.org/jira/browse/DRILL-7671> to fix it. One of the
> major unresolved issues I faced is verifying that Drill works
> correctly on both distributions.
> Unfortunately, I don't have a cluster with either of those distributions,
> and the corresponding sandboxes contain slightly outdated versions of
> ZooKeeper and other projects, so I cannot ensure that there are no issues.
>
> Could someone who has access to a cluster with either of these
> distributions help verify this fix, so we can be sure that there is
> nothing left to do before merging it to master?
>
> Here is the branch with changes: DRILL-7671
> <https://github.com/vvysotskyi/drill/tree/DRILL-7671>.
>
> To verify the fix, please follow these steps:
>
> *git clone g...@github.com:vvysotskyi/drill.git*
> *git checkout DRILL-7671*
>
> For the Cloudera distribution, please build Drill with the *cdh* profile:
> *mvn clean install -DskipTests -Pcdh*
>
> For the Hortonworks distribution, please build Drill with the *hdp* profile:
> *mvn clean install -DskipTests -Phdp*
>
> I think it would be good to do the following sanity checks:
> - Configure and run Drill in distributed mode;
> - Query some files from HDFS;
> - If possible, configure Hive plugin and query Hive tables;
> - If possible, configure HBase plugin and query HBase tables;
> - If possible, configure any of JDBC clients and query Drill;
> - Something else.
>
> The branch above uses the following versions (latest available in
> corresponding repositories):
>
> *cdh* profile:
> hadoop 3.1.1.7.0.3.0-79
> hbase 2.2.0.7.0.3.0-79
> hive 3.1.2000.7.0.3.0-79
> zookeeper 3.5.5.7.0.3.0-79
>
> *hdp* profile:
> hadoop 3.1.1.3.1.5.6-1
> hbase 2.0.2.3.1.0.6-1
> hive 3.1.0.3.1.0.6-1
> zookeeper 3.4.6.3.3.1.6-1
>
> Kind regards,
> Volodymyr Vysotskyi
>


DRILL-7671: Verify Drill with cdh and hdp profiles

2020-04-01 Thread Vova Vysotskyi
Hi all,

Some time ago I noticed that Apache Drill has *cdh* and *hdp*
profiles, but it fails to build when either of them is enabled.

I have created DRILL-7671
<https://issues.apache.org/jira/browse/DRILL-7671> to fix it. One of the
major unresolved issues I faced is verifying that Drill works correctly
on both distributions.
Unfortunately, I don't have a cluster with either of those distributions,
and the corresponding sandboxes contain slightly outdated versions of
ZooKeeper and other projects, so I cannot ensure that there are no issues.

Could someone who has access to a cluster with either of these
distributions help verify this fix, so we can be sure that there is
nothing left to do before merging it to master?

Here is the branch with changes: DRILL-7671
<https://github.com/vvysotskyi/drill/tree/DRILL-7671>.

To verify the fix, please follow these steps:

*git clone g...@github.com:vvysotskyi/drill.git*
*git checkout DRILL-7671*

For the Cloudera distribution, please build Drill with the *cdh* profile:
*mvn clean install -DskipTests -Pcdh*

For the Hortonworks distribution, please build Drill with the *hdp* profile:
*mvn clean install -DskipTests -Phdp*

I think it would be good to do the following sanity checks:
- Configure and run Drill in distributed mode;
- Query some files from HDFS;
- If possible, configure Hive plugin and query Hive tables;
- If possible, configure HBase plugin and query HBase tables;
- If possible, configure any of JDBC clients and query Drill;
- Something else.

The branch above uses the following versions (latest available in
corresponding repositories):

*cdh* profile:
hadoop 3.1.1.7.0.3.0-79
hbase 2.2.0.7.0.3.0-79
hive 3.1.2000.7.0.3.0-79
zookeeper 3.5.5.7.0.3.0-79

*hdp* profile:
hadoop 3.1.1.3.1.5.6-1
hbase 2.0.2.3.1.0.6-1
hive 3.1.0.3.1.0.6-1
zookeeper 3.4.6.3.3.1.6-1

Kind regards,
Volodymyr Vysotskyi


Re: Drill + parquet

2020-02-06 Thread Vova Vysotskyi
Hi Vishal,

A pull request with the fix for DRILL-5733
<https://issues.apache.org/jira/browse/DRILL-5733> has been opened and will
be merged soon.

Kind regards,
Volodymyr Vysotskyi


On Tue, Feb 4, 2020 at 11:11 PM Vishal Jadhav (BLOOMBERG/ 731 LEX) <
vjad...@bloomberg.net> wrote:

> It works fine on my local file system, but fails on HDFS.
> Not sure, I am running into the issue mentioned here -
> https://issues.apache.org/jira/browse/DRILL-5733
>
> From: user@drill.apache.org At: 02/04/20 15:48:23To:  Vishal Jadhav
> (BLOOMBERG/ 731 LEX ) ,  user@drill.apache.org
> Subject: Re: Drill + parquet
>
> Please look into the logs for more details.
> I'm not sure why you see these errors, but Drill can query single files,
> subsets of files, and directories without problems.
>
> select * from dfs.tmp.`*.parquet` limit 4;
> select * from dfs.tmp.`0_0_0.parquet`;
>
> Kind regards,
> Arina
>
> > On Feb 4, 2020, at 7:10 PM, Nitin Pawar  wrote:
> >
> > As the error says, it expects a directory to query.
> > Also, the documentation has not been modified for more than 3 years, so I
> > am not sure if it is up to date.
> >
> > On Tue, Feb 4, 2020 at 10:30 PM Vishal Jadhav (BLOOMBERG/ 731 LEX) <
> > vjad...@bloomberg.net> wrote:
> >
> >> I was following the help pages from here.
> >> https://drill.apache.org/docs/querying-parquet-files/
> >> As per it, I can query an individual Parquet file, so why is it failing
> >> with the 'not a directory' error?
> >>
> >>
> >> From: user@drill.apache.org At: 02/04/20 11:28:25To:  Vishal Jadhav
> >> (BLOOMBERG/ 731 LEX ) ,  user@drill.apache.org
> >> Subject: Re: Drill + parquet
> >>
> >> Parquet is the default file format for Apache Drill,
> >> so you do not need to give a Parquet file for a Drill query. Instead,
> >> give the path of the folder which contains the files.
> >>
> >> eg: select * from hdfs_storage>..`folder1` will query all the
> >> parquet files in folder1
> >>
> >> On Tue, Feb 4, 2020 at 9:55 PM Vishal Jadhav (BLOOMBERG/ 731 LEX) <
> >> vjad...@bloomberg.net> wrote:
> >>
> >>> Hello Drillers,
> >>>
> >>> Need some help with the hdfs + parquet files.
> >>>
> >>> I have configured the HDFS storage with parquet & csv format plugins.
> >>>
> >>> I can query the - ..`*.csv` correctly.
> Also, I
> >>> have a similar directory structure for the parquet files (in a
> different
> >>> directory), But, not able to query it.
> >>>
> >>> Show files works fine.
> >>> (1) The following query works fine -
> >>> show files from .
> >>>
> >>> (2) select * from ..`*.parquet` limit 4
> >>> Fails with -
> >>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> >>> NoSuchElementException
> >>>
> >>> (3) select * from hdfs_storage>..`xyz.parquet`;
> >>> fails with -
> >>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> >>> RemoteException:/path/xyz.parquet (is not a directory)
> >>>
> >>> Please let me know, if I am doing something wrong here.
> >>>
> >>> Thank you!
> >>> - Vishal
> >>
> >>
> >> --
> >> Nitin Pawar
> >>
> >>
> >>
> >
> > --
> > Nitin Pawar
>
>
>


Re: 1.17.0 updated protobuf but mapr ODBC drivers have not been updated

2020-01-13 Thread Vova Vysotskyi
Hi Bob,

Could you please create a Jira and share more details on how to reproduce
this issue, so QA will be able to verify that it is fixed in a newer
version of the driver?

Kind regards,
Volodymyr Vysotskyi


On Mon, Jan 13, 2020 at 8:16 PM Bob Rudis  wrote:

> HNY folks,
>
> This is more of an FYI vs anything else.
>
> I realize the intrepid/awesome Drill team has little control over MapR's
> speed of catching up to the latest releases but just in case others haven't
> upgraded to 1.17.0 (I just did today) and use ODBC, you'll see something
> like this in logs:
>
> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse
> message of type "exec.user.GetServerMetaResp" because it is missing
> required fields: server_meta.convert_support[506].to,
> server_meta.convert_support[545].from, server_meta.convert_support[545].to
>
> until MapR updates their side of this (this happened with previous
> releases and is 100% on MapR to publish new ODBC drivers).
>
> It seems to only impact Drill instance metadata retrieval (all my usual
> queries are working fine).
>
> REST API & JDBC (which I use via the {sergeant} & {sergeant.caffeinated} R
> pkgs) are fine (as expected).
>
> -Bob


Re: Drill Hangout

2020-01-13 Thread Vova Vysotskyi
Hi Charles,

I won't be able to join the Drill Hangout this week.

Kind regards,
Volodymyr Vysotskyi


On Mon, Jan 13, 2020 at 4:26 PM Charles Givre  wrote:

> Hello Drill Community,
> I'd like to propose a reinstitution of Drill Hangouts, perhaps every other
> week.  Given that the bulk of the development at the moment has shifted
> from the US to Europe, I'd like to propose 0930 ET / 1630 EET / 0630 PT for
> the first one.  Depending on interest, we can rotate times.  Date TBD.
>
> Topics could include:
> - Drill - Arrow integration
> - Future work
> - Anything else of interest
>
> Thoughts?
> -- C


Re: Querying CockroachDB from Apache Drill

2020-01-08 Thread Vova Vysotskyi
Hi Marc,

I haven't tried querying CockroachDB, but if the driver is the same as the
Postgres one, or compliant with the JDBC spec, Drill should be able to query it.
For the instructions on how to configure it, please refer to
https://drill.apache.org/docs/rdbms-storage-plugin/.

Kind regards,
Volodymyr Vysotskyi
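
For anyone who wants to try it, here is a rough sketch of what the storage
plugin configuration could look like. The host, port, database name, and
credentials below are placeholders; CockroachDB speaks the PostgreSQL wire
protocol, so the stock PostgreSQL JDBC driver is assumed:

```json
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://localhost:26257/defaultdb",
  "username": "root",
  "password": "",
  "enabled": true
}
```

The PostgreSQL JDBC driver jar would also need to be on Drill's classpath
(e.g. in the jars/3rdparty directory) for the plugin to load.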


On Wed, Jan 8, 2020 at 12:31 PM Marc Sole Fonte  wrote:

> Hello,
>
> Has anybody here tried querying CockroachDB? I am pretty sure that the
> driver should be the same as for PostgreSQL, but I have found no related
> info.
>
> Thank you for your help,
> Marc
>


Re: [ANNOUNCE] New Committer: Denys Ordynskiy

2019-12-30 Thread Vova Vysotskyi
Congrats Denys, well deserved!

Kind regards,
Volodymyr Vysotskyi


On Mon, Dec 30, 2019 at 2:25 PM Arina Ielchiieva  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Denys
> Ordynskiy to become a committer, and we are pleased to announce that he has
> accepted.
>
> Denys has been contributing to Drill for more than a year. He has made many
> contributions as a QA engineer: he found, tested, and verified important
> bugs and features. Recently he actively participated in the Hadoop 3
> migration verification and tested current and previous releases. He also
> contributed to drill-test-framework to automate Drill tests.
>
> Welcome Denys, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: How to force Drill to offload performing LIMIT onto the target storage?

2019-12-18 Thread Vova Vysotskyi
From the code, it looks like this happens to all functions that are present
in Drill, even if they are part of the SQL standard (this was done because
function behavior may not be strictly described by the standard and can
therefore be implementation-dependent).
But functions that are absent in Drill and present in SqlStdOperatorTable
<https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java>
(if such exist) will be pushed to JDBC.

Kind regards,
Volodymyr Vysotskyi


On Wed, Dec 18, 2019 at 8:09 PM Andrew Pashkin 
wrote:

> You are right! Group by is being pushed to the storage if I remove
> TO_DATE(). Does it happen with all functions or only with some (perhaps
> the ones that are not in the SQL standard)?
>
> On 18.12.19 20:08, Vova Vysotskyi wrote:
> > Hi Andrew,
> >
> > In the general case, Drill pushes aggregations into JDBC storage, but
> > Drill doesn't push functions into JDBC storage, since some databases may
> > not have specific function implementations.
> >
> > In the plan you have provided, the grouping is performed on top of a
> > TO_DATE() function call. If possible, please rewrite the query to do the
> > aggregation first and then call the function. In this case, Drill should
> > be able to push the aggregation into JDBC storage.
> >
> > Regarding pushing the limit into JDBC storage, that is a bug which should
> > be fixed; I have created a Jira ticket for it: DRILL-7490
> > <https://issues.apache.org/jira/browse/DRILL-7490>.
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Wed, Dec 18, 2019 at 4:26 PM Andrew Pashkin  >
> > wrote:
> >
> >> Hello!
> >>
> >> I have a setup where I'm querying Impala server through JDBC connection
> >> using Drill and I noticed that when I'm doing a simple GROUP BY query
> >> with LIMIT with a small value the query runs for a very long time. And
> >> in the query profile I see that hundreds of thousands of rows are being
> >> fetched from the Impala server.
> >>
> >> The plan looks like this:
> >>
> >> 00-00Screen : rowType = RecordType(DATE payment_date, ANY
> total_sum):
> >> rowcount = 10.0, cumulative cost = {331.0 rows, 2671.0 cpu, 0.0 io, 0.0
> >> network, 1760.2 memory}, id = 67521
> >> 00-01  Project(payment_date=[$0], total_sum=[$1]) : rowType =
> >> RecordType(DATE payment_date, ANY total_sum): rowcount = 10.0,
> cumulative
> >> cost = {330.0 rows, 2670.0 cpu, 0.0 io, 0.0 network, 1760.2
> >> memory}, id = 67520
> >> 00-02SelectionVectorRemover : rowType = RecordType(DATE
> >> payment_date, ANY total_sum): rowcount = 10.0, cumulative cost = {320.0
> >> rows, 2650.0 cpu, 0.0 io, 0.0 network, 1760.2 memory}, id =
> >> 67519
> >> 00-03  Limit(fetch=[10]) : rowType = RecordType(DATE
> payment_date,
> >> ANY total_sum): rowcount = 10.0, cumulative cost = {310.0 rows, 2640.0
> cpu,
> >> 0.0 io, 0.0 network, 1760.2 memory}, id = 67518
> >> 00-04HashAgg(group=[{0}], total_sum=[SUM($1)]) : rowType =
> >> RecordType(DATE payment_date, ANY total_sum): rowcount = 10.0,
> cumulative
> >> cost = {300.0 rows, 2600.0 cpu, 0.0 io, 0.0 network, 1760.2
> >> memory}, id = 67517
> >> 00-05  Project(payment_date=[TO_DATE($0)],
> >> payed_in_usd_amt=[$11]) : rowType = RecordType(DATE date_field, REAL
> >> value_field): rowcount = 100.0, cumulative cost = {200.0 rows, 600.0
> cpu,
> >> 0.0 io, 0.0 network, 0.0 memory}, id = 67516
> >> 00-06Jdbc(sql=[SELECT * FROM `Impala`.``.` >> table>` ]) : rowType = RecordType(): rowcount = 100.0,
> >> cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0
> memory},
> >> id = 67435
> >>
> >> It seems like Drill issues SELECT * FROM  query to perform
> >> group-by and limit on the host which seems terribly inefficient and
> >> could be performed by Impala itself with much less effort.
> >>
> >> I wonder - is it possible to tune Drill somehow to perform
> >> limit/group-by using the storage capabilities (Impala in my case)? Or if
> >> such optimization is something to be developed - is it absent only for
> >> JDBC connections or for all storage types?
> >>
> >> --
> >> With kind regards, Andrew Pashkin.
> >> cell phone - +375 (44) 492-16-85
> >> Skype - waves_in_fluids
> >> e-mail - andrew.pash...@gmx.co.uk
> >>
> >>
>


Re: How to force Drill to offload performing LIMIT onto the target storage?

2019-12-18 Thread Vova Vysotskyi
Hi Andrew,

In the general case, Drill pushes aggregations into JDBC storage, but Drill
doesn't push functions into JDBC storage, since some databases may not have
specific function implementations.

In the plan you have provided, the grouping is performed on top of a
TO_DATE() function call. If possible, please rewrite the query to do the
aggregation first and then call the function. In this case, Drill should be
able to push the aggregation into JDBC storage.
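
Using the column names from the plan in this thread (the table name is a
placeholder), the suggested rewrite could be sketched as a two-level
aggregation: the inner GROUP BY on the raw column is eligible for JDBC
push-down, and the outer level re-aggregates after TO_DATE(). This is valid
for SUM because a sum of partial sums equals the total sum:

```sql
SELECT TO_DATE(payment_date) AS payment_date,
       SUM(partial_sum) AS total_sum
FROM (
  SELECT payment_date,
         SUM(payed_in_usd_amt) AS partial_sum
  FROM impala_table  -- placeholder for the actual Impala table
  GROUP BY payment_date
) t
GROUP BY TO_DATE(payment_date)
LIMIT 10
```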

Regarding pushing the limit into JDBC storage, that is a bug which should
be fixed; I have created a Jira ticket for it: DRILL-7490
<https://issues.apache.org/jira/browse/DRILL-7490>.

Kind regards,
Volodymyr Vysotskyi


On Wed, Dec 18, 2019 at 4:26 PM Andrew Pashkin 
wrote:

> Hello!
>
> I have a setup where I'm querying Impala server through JDBC connection
> using Drill and I noticed that when I'm doing a simple GROUP BY query
> with LIMIT with a small value the query runs for a very long time. And
> in the query profile I see that hundreds of thousands of rows are being
> fetched from the Impala server.
>
> The plan looks like this:
>
> 00-00Screen : rowType = RecordType(DATE payment_date, ANY total_sum):
> rowcount = 10.0, cumulative cost = {331.0 rows, 2671.0 cpu, 0.0 io, 0.0
> network, 1760.2 memory}, id = 67521
> 00-01  Project(payment_date=[$0], total_sum=[$1]) : rowType =
> RecordType(DATE payment_date, ANY total_sum): rowcount = 10.0, cumulative
> cost = {330.0 rows, 2670.0 cpu, 0.0 io, 0.0 network, 1760.2
> memory}, id = 67520
> 00-02SelectionVectorRemover : rowType = RecordType(DATE
> payment_date, ANY total_sum): rowcount = 10.0, cumulative cost = {320.0
> rows, 2650.0 cpu, 0.0 io, 0.0 network, 1760.2 memory}, id =
> 67519
> 00-03  Limit(fetch=[10]) : rowType = RecordType(DATE payment_date,
> ANY total_sum): rowcount = 10.0, cumulative cost = {310.0 rows, 2640.0 cpu,
> 0.0 io, 0.0 network, 1760.2 memory}, id = 67518
> 00-04HashAgg(group=[{0}], total_sum=[SUM($1)]) : rowType =
> RecordType(DATE payment_date, ANY total_sum): rowcount = 10.0, cumulative
> cost = {300.0 rows, 2600.0 cpu, 0.0 io, 0.0 network, 1760.2
> memory}, id = 67517
> 00-05  Project(payment_date=[TO_DATE($0)],
> payed_in_usd_amt=[$11]) : rowType = RecordType(DATE date_field, REAL
> value_field): rowcount = 100.0, cumulative cost = {200.0 rows, 600.0 cpu,
> 0.0 io, 0.0 network, 0.0 memory}, id = 67516
> 00-06Jdbc(sql=[SELECT * FROM `Impala`.``.` table>` ]) : rowType = RecordType(): rowcount = 100.0,
> cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id = 67435
>
> It seems like Drill issues SELECT * FROM  query to perform
> group-by and limit on the host which seems terribly inefficient and
> could be performed by Impala itself with much less effort.
>
> I wonder - is it possible to tune Drill somehow to perform
> limit/group-by using the storage capabilities (Impala in my case)? Or if
> such optimization is something to be developed - is it absent only for
> JDBC connections or for all storage types?
>
> --
> With kind regards, Andrew Pashkin.
> cell phone - +375 (44) 492-16-85
> Skype - waves_in_fluids
> e-mail - andrew.pash...@gmx.co.uk
>
>


Re: Optional fields in Avro files.

2019-10-09 Thread Vova Vysotskyi
Hi Dan,

Thanks for bringing up this question.
This behavior is intentional, since Avro files already have a schema and
Drill is able to use it during query validation. It was done in the scope
of DRILL-3810 <https://issues.apache.org/jira/browse/DRILL-3810>.
We have similar behavior for Hive tables: the query fails if a non-existent
column is queried.

But I think it would be good to optionally allow returning null for Avro
tables to be consistent with other file storage formats.
Could you please log a Jira for this improvement?

Kind regards,
Volodymyr Vysotskyi
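
For reference while that Jira is pending: on the Avro side, the usual
schema-evolution-friendly way to make a field optional is to declare it as a
union with null and give it a default, so readers with either schema version
stay compatible. A minimal sketch (the record name is hypothetical;
"someField" is the field name from this thread):

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "someField", "type": ["null", "string"], "default": null}
  ]
}
```

This does not change Drill's current validation behavior — querying a column
absent from the schema would still fail — but it lets the field be absent in
the data while remaining present in the schema.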


On Tue, Oct 8, 2019 at 10:14 PM Dan Schmitt  wrote:

> Similar to the "Check presence of field in json file" of last week, I was
> hoping to be able to fire drill off at avro files with evolving and
> variant schemas.
> (For the case where a field gets added over time, or a field is optional.)
>
> One of the suggestions for the json version was
>
> sqlTypeOf(someField)
>
> to help filter out files that don't have that value.   The avro parser
> chokes
> with
>
> Error: VALIDATION ERROR: From line 1, column 18 to line 1, column 21:
> Column 'someField' not found in any table
>
> instead of returning null (and the suggested EXISTS/ IS NOT NULL where
> clauses
> are also failing.)
>
> Is there a JIRA issue to support where clauses to sift out missing
> fields for Avro files?
>
>  Dan S.
>


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2019-09-11 Thread Vova Vysotskyi
Hi Jiang,

Thanks for catching this issue. It was caused by the fix for DRILL-6524
<https://issues.apache.org/jira/browse/DRILL-6524>, and this additional
structure is required to track method execution flow so that some useful
analysis can be done.

I have created DRILL-7372 <https://issues.apache.org/jira/browse/DRILL-7372>
to address this issue; I'll provide more details after the investigation.

But for now, until this issue is fixed, you can disable scalar replacement
using the following option:
*set
`org.apache.drill.exec.compile.ClassTransformer.scalar_replacement`='off';*

Kind regards,
Volodymyr Vysotskyi


On Wed, Sep 11, 2019 at 12:29 AM Jiang Wu 
wrote:

> Hi Paul, thanks for the tip. I will set it up to see if that makes a
> difference.
>
> Looking at the heap dump, it appears that there are 19,971 (~20k)
> AssignmentTrackingFrame objects, and the first AssignmentTrackingFrame has
> a localVariablesSet of size 1,260.  And each localVariablesSet is made up
> of many individual Integer objects.  So in total, we have
>
>   O(20k) AssignmentTrackingFrame x
>   O(100) localVariablesSet per AssignmentTrackingFrame
>   O(10)   Integer objects per localVariablesSet
>
> Total ~O(20M) objects!
>
> Looking at the code on AssignmentTrackingFrame, this class has been updated
> in 1.16.0 to introduce a new member variable:
>
> *private* *final* Deque<Set<Integer>> localVariablesSet;
>
>
> The memory dump seems to indicate this new internal data structure is
> consuming a lot of memory.
>
> We have been running the same queries in Drill 1.14 multiple times a day
> over many months without memory issues.
>
> -- Jiang
>
>
>
> On Tue, Sep 10, 2019 at 1:09 PM Paul Rogers 
> wrote:
>
> > Hi Jiang,
> >
> > Many factors influence memory usage; the trick is to tease them apart.
> >
> > An obvious question is the memory use of your custom storage plugin. This
> > will depend on any buffering done by the plugin, the number of threads
> > (minor fragments) per node, and the number of concurrent queries. Since
> > this is your code, you presumably understand these issues.
> >
> > In the dump, it looks like you have many objects associated with Drill's
> > byte code optimization mechanism. Byte code size will be based on query
> > complexity. As a rough measure of query size, about how many K in size is
> > the SELECT statement you are trying to run? Very large expressions, or
> > large numbers of projected columns, could drive the generated code to be
> > large.
> >
> > If the problem is, indeed, related to the byte code rewrite, there is a
> > trick you can try: you can switch to using the "Plain Java" mechanism.
> > Briefly, this mechanism generates Java source code, then lets the
> compiler
> > generate byte codes directly without the usual Drill byte code rewrite.
> > This works because modern Java compilers are at least as good as Drill
> > when doing scalar replacement.
> >
> > Here are the options:
> >
> > drill.exec: {
> > compile: {
> >   compiler: "JDK",
> >   prefer_plain_java: true
> >   },
> >
> > This forces use of the JDK compiler (instead of Janino) and bypasses the
> > byte code rewrite step.
> >
> > No guarantee this will work, but something to try.
> >
> > Thanks,
> >
> > - Paul
> >
> >
> >
> > On Tuesday, September 10, 2019, 12:28:07 PM PDT, Jiang Wu
> >  wrote:
> >
> >  While doing testing against Apache Drill 1.16.0, we are running into
> this
> > error:  java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> > In our use case, Apache Drill is using a custom storage plugin and no
> other
> > storage plugins like PostgreSQL, MySQL, etc.  Some of the queries are
> very
> > large involving many subquery, join, functions, etc.  And we are running
> > through the same set of queries that work without issue in Drill version
> > 1.14.0.
> >
> > We generated a heap dump at the time of out of memory exception. Heap
> dump
> > file is about 5.8 GB.  Opening the dump showed:
> >
> > Heap:
> >   Size: 3.1 GB
> >   Classes: 21.1k
> >   Objects: 82.4m
> >   Class Loader: 538
> >
> > Showing the dominator tree for the allocated heap indicate two threads,
> > both with similar ownership stack for the bulk of the memory allocated.
> > E.g.
> >
> > Class Name
> > | Shallow Heap | Retained Heap | Percentage
> >
> >
> 
> > java.lang.Thread @ 0x73af9b238
> >  2288c900-b265-3988-1524-8e920a884075:frag:4:0 Thread|
> > 120 | 1,882,336,336 |56.51%
> > |- org.apache.drill.exec.compile.bytecode.MethodAnalyzer @ 0x73cc3fe88
> > |  56 | 1,873,674,888 |56.25%
> > |  |- org.objectweb.asm.tree.analysis.Frame[33487] @ 0x73d19b570
> > |  133,968 | 1,873,239,392 |56.24%
> > |  |  |-
> >
> >
> org.apache.drill.exec.compile.bytecode.MethodAnalyzer$AssignmentTrackingFrame
> > @ 

Re: Blocker on drill upgrade path

2019-04-22 Thread Vova Vysotskyi
Hi Nitin,

The behavior of allowing aliases in a GROUP BY clause is driven by Calcite
and is common in other projects.
I think the workaround proposed by Aman is the best solution for this
problem since, for example, if you have several aggregate functions over
the same columns in the project, such naming will cause problems.

Kind regards,
Volodymyr Vysotskyi


On Sat, Apr 20, 2019 at 8:44 AM Nitin Pawar  wrote:

> Right now the aliases are derived programmatically, and we use the same
> name in the group by as the alias; these are already defined in the jobs,
> so we cannot change them now.
> That's one reason this became a blocker: these jobs were configured, were
> running fine, and suddenly started breaking.
>
> On Sat, Apr 20, 2019 at 5:24 AM Aman Sinha  wrote:
>
> > Interesting that it ran on 1.13, but I still think the new behavior is
> > the right one.  Several changes went into Calcite between Drill's 1.13
> > and 1.15 releases, so I cannot identify when this behavior changed.  Can
> > you use a slightly different alias name?  The following should work:
> > select max(last_name) *max_last_name* from cp.`employee.json` group
> by
> > last_name limit 5;
> >
> > On Fri, Apr 19, 2019 at 2:24 PM Nitin Pawar 
> > wrote:
> >
> > > Sorry, my bad. I meant the query that was failing was the one with an
> > > alias; following is the output on Drill 1.13.0:
> > >
> > > bash-3.2$ bin/drill-embedded
> > > Apr 20, 2019 2:46:45 AM org.glassfish.jersey.server.ApplicationHandler
> > > initialize
> > > INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29
> > > 01:25:26...
> > > apache drill 1.13.0-SNAPSHOT
> > > "a drill in the hand is better than two in the bush"
> > > 0: jdbc:drill:zk=local> select max(last_name) last_name from
> > > cp.`employee.json` group by
> > > . . . . . . . . . . . > last_name limit 5;
> > > ++
> > > | last_name  |
> > > ++
> > > | Nowmer |
> > > | Whelply|
> > > | Spence |
> > > | Gutierrez  |
> > > | Damstra|
> > > ++
> > >
> > >
> > > On Sat, Apr 20, 2019 at 1:40 AM Aman Sinha 
> wrote:
> > >
> > > > This is legal:
> > > >   select max(last_name)  from cp.`employee.json` group by last_name
> > limit
> > > > 5;
> > > > But this is not:
> > > >   select max(last_name) last_name from cp.`employee.json` group by
> > > > last_name limit 5;
> > > >
> > > > The reason is the second query is aliasing the max() output to
> > > 'last_name'
> > > > which is being referenced in the group-by clause.  Referencing an
> > > aggregate
> > > > expr in the group-by is not allowed by SQL standards, hence Calcite
> > > (which
> > > > does the parsing and validation, not Drill) throws this error during
> > > > validation phase.  Detailed error stack is below.  I don't think this
> > > would
> > > > have worked in 1.13 either.  My guess is you may have run the first
> > query
> > > > in 1.13 and that should still continue to work.
> > > >
> > > > Validation error thrown by Calcite:
> > > >
> > > > Caused By (org.apache.calcite.sql.validate.SqlValidatorException)
> > > Aggregate
> > > > expression is illegal in GROUP BY clause
> > > >
> > > > sun.reflect.NativeConstructorAccessorImpl.newInstance0():-2
> > > >
> > > > sun.reflect.NativeConstructorAccessorImpl.newInstance():62
> > > >
> > > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
> > > >
> > > > java.lang.reflect.Constructor.newInstance():423
> > > >
> > > > org.apache.calcite.runtime.Resources$ExInstWithCause.ex():463
> > > >
> > > > org.apache.calcite.runtime.Resources$ExInst.ex():572
> > > >
> > > > org.apache.calcite.sql.SqlUtil.newContextException():787
> > > >
> > > > org.apache.calcite.sql.SqlUtil.newContextException():772
> > > >
> > > >
> > > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():4788
> > > >
> > > >
> > > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateGroupClause():3941
> > > >
> > > >
> > >  org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3306
> > > >
> > > > org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
> > > >
> > > > org.apache.calcite.sql.validate.AbstractNamespace.validate():84
> > > >
> > > >
> > > >
> > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():977
> > > >
> > > >
> >  org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():953
> > > >
> > > > org.apache.calcite.sql.SqlSelect.validate():216
> > > >
> > > >
> > > >
> > > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():928
> > > >
> > > > org.apache.calcite.sql.validate.SqlValidatorImpl.validate():632
> > > >
> > > > org.apache.drill.exec.planner.sql.SqlConverter.validate():207
> > > >
> > > > On Fri, Apr 19, 2019 at 12:39 PM Nitin Pawar <
> nitinpawar...@gmail.com>
> > > > wrote:
> > > >
> > > > > I think the error is not with the storage plugin but with query parsing.
> > > > >

Re: Newbie: problem launching Drill 1.15.0 on Windows

2019-02-05 Thread Vova Vysotskyi
Hi,

Please remove the *$HOME/sqlline/history* file and try starting Drill again.

Kind regards,
Volodymyr Vysotskyi
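
For context: sqlline uses JLine's DefaultHistory, which (when timestamps are
enabled) stores each history entry as `<epoch-millis>:<command>` and parses
the prefix with Long.parseLong — the stack trace quoted below shows that
parse failing on a line that lacks the prefix. A rough Python sketch of the
failure mode (an illustration only, not JLine's actual code):

```python
def parse_history_line(line: str) -> tuple[int, str]:
    # JLine-style timestamped entries look like "1549382400000:select 1;".
    # The text before the first ':' (or the whole line, if there is no
    # ':') is parsed as an epoch-millis timestamp.
    ts_part, sep, command = line.partition(":")
    return int(ts_part), command  # int() fails here like Long.parseLong


print(parse_history_line("1549382400000:select 1;"))  # (1549382400000, 'select 1;')

try:
    parse_history_line("select * from dfs.'d")  # corrupted entry, no timestamp
except ValueError:
    print("unparsable history line")
```

Deleting the history file removes the corrupted entries, which is why
sqlline starts cleanly afterwards.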


On Tue, Feb 5, 2019 at 7:34 PM Leyne, Sean 
wrote:

> All,
>
> I am getting an error trying to open drill on Windows 10 (from an Admin
> instance of CMD.exe) using the instructions from the Drill in 10 Minutes
> tutorial.
>
>   C:\Drill\apache-drill-1.15.0\bin>sqlline -u "jdbc:drill:zk=local"
>   DRILL_ARGS - " -u jdbc:drill:zk=local"
>   HADOOP_HOME not detected...
>   HBASE_HOME not detected...
>   Calculating Drill classpath...
>   Exception in thread "main" java.lang.NumberFormatException: For
> input string: "select * from dfs.'d"
>   at java.lang.NumberFormatException.forInputString(Unknown
> Source)
>   at java.lang.Long.parseLong(Unknown Source)
>   at java.lang.Long.parseLong(Unknown Source)
>   at
> org.jline.reader.impl.history.DefaultHistory.addHistoryLine(DefaultHistory.java:108)
>   at
> org.jline.reader.impl.history.DefaultHistory.lambda$load$0(DefaultHistory.java:86)
>   at java.util.Iterator.forEachRemaining(Unknown Source)
>   at
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Unknown Source)
>   at java.util.stream.ReferencePipeline$Head.forEach(Unknown
> Source)
>   at
> org.jline.reader.impl.history.DefaultHistory.load(DefaultHistory.java:86)
>   at
> org.jline.reader.impl.history.DefaultHistory.attach(DefaultHistory.java:69)
>   at sqlline.SqlLine.getConsoleReader(SqlLine.java:614)
>   at sqlline.SqlLine.begin(SqlLine.java:510)
>   at sqlline.SqlLine.start(SqlLine.java:264)
>   at sqlline.SqlLine.main(SqlLine.java:195)
>
> Would appreciate all assistance.
>
>
> Sean
>
>
>
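For context on why deleting the history file helps: JLine's DefaultHistory expects each persisted history line to start with an epoch-millis timestamp followed by a colon. A stale line beginning with the raw command (here, likely a query containing a Windows drive-letter colon such as `dfs.'d:...`) makes the leading Long.parseLong fail. A minimal Python sketch of that parsing — a hypothetical simplification, not JLine's actual code:

```python
def parse_history_line(line: str):
    """Split 'epochmillis:command' the way a timestamped history loader
    would; int() raises ValueError much like Long.parseLong throws
    NumberFormatException when the prefix is not a number."""
    timestamp, sep, command = line.partition(":")
    if not sep:
        raise ValueError("missing timestamp separator")
    return int(timestamp), command

# A well-formed entry parses cleanly...
ts, cmd = parse_history_line("1549300000000:!quit")
assert cmd == "!quit"

# ...but a line that starts with the command itself does not,
# which is the failure mode seen in the stack trace above.
try:
    parse_history_line("select * from dfs.'d:/data'")
except ValueError:
    pass
```

Removing the file simply discards the malformed entries so the loader starts from an empty history.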


Re: to_date() string to date conversion ERROR

2018-10-03 Thread Vova Vysotskyi
Hello Herman,

I tried to reproduce this error, but all queries passed on my machine.
Could you please add more details about your env? Which version of Drill is
used, which timezone is set?
Is it reproduced with UTC timezone?

Kind regards,
Volodymyr Vysotskyi


On Mon, Oct 1, 2018 at 10:58 AM Herman Tan  wrote:

> Hi,
>
> I have a very puzzling error.
> Try the following SQL statements.
>
> What is the problem with '1982/01/01 00:01:00.0'?
> Error message: Illegal instant due to time zone offset transition
>
> select to_date('1981/12/31 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
> from (values(1))
>
> select to_date('1981/12/31 11:59:59.0','yyyy/MM/dd HH:mm:ss.S') -- pass
> from (values(1))
>
> select to_date('1982/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
> from (values(1))
>
> select to_date('1982/01/01 00:00:01.0','yyyy/MM/dd HH:mm:ss.S') -- fail
> from (values(1))
>
> select to_date('1982/01/01 00:01:00.0','yyyy/MM/dd HH:mm:ss.S') -- fail
> from (values(1))
>
> select to_date('1982/01/01 01:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
> from (values(1))
>
> select to_date('1982/01/02 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
> from (values(1))
>
> select to_date('1983/01/01 00:00:00.0','yyyy/MM/dd HH:mm:ss.S') -- pass
> from (values(1))
>
> Herman
>
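For what it's worth, the failing range matches a real offset transition: Asia/Singapore jumped from UTC+7:30 to UTC+8:00 at midnight on 1982-01-01, so local times 00:00–00:29 on that date never existed — which would explain why 00:00:00 through 00:01:00 fail while 01:00:00 passes. That the reporter's JVM runs in that zone is only an assumption, but the gap itself can be checked with Python's zoneinfo:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+, requires system/installed tzdata

sgt = ZoneInfo("Asia/Singapore")

# UTC offset just before and just after the 1982-01-01 transition:
before = datetime(1981, 12, 31, 23, 0, tzinfo=sgt).utcoffset()
after = datetime(1982, 1, 1, 1, 0, tzinfo=sgt).utcoffset()

# The clocks jumped forward half an hour, so 00:00-00:29 local time on
# 1982-01-01 is a gap that strict parsers (such as Joda-Time) reject.
assert before == timedelta(hours=7, minutes=30)
assert after == timedelta(hours=8)
```

Running Drill with -Duser.timezone=UTC sidesteps the problem because UTC has no such gaps.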


Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-28 Thread Vova Vysotskyi
Congratulations! Well deserved!

Kind regards,
Volodymyr Vysotskyi


On Fri, Sep 28, 2018 at 12:17 PM Arina Ielchiieva  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Chunhui
> Shi to become a committer, and we are pleased to announce that he has
> accepted.
>
> Chunhui Shi has become a contributor since 2016, making changes in various
> Drill areas. He has shown profound knowledge in Drill planning side during
> his work to support lateral join. He is also one of the contributors of the
> upcoming feature to support index based planning and execution.
>
> Welcome Chunhui, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: [IDEAS] Drill start up quotes

2018-09-12 Thread Vova Vysotskyi
Two things are infinite: the universe and drill; and I'm not sure about the
universe. (Albert Einstein)
If drill hasn't profoundly shocked you, you haven't understood it yet.
(Niels Bohr)
Drill gives you meaning and purpose, and life is empty without it. (Stephen
Hawking)
Drill must go on. (Queen)

Kind regards,
Volodymyr Vysotskyi


On Tue, Sep 11, 2018 at 11:36 PM Oleksandr Kalinin 
wrote:

> Some random ideas, not sure how appropriate - feel free to choose/modify
> as necessary :)
>
> In use for 37 thousand years (according to Wikipedia, the initial use of
> rotary instruments by Homo sapiens is dated to about 35000 BC)
> It’s not just Bosch or Black+Decker (Two major elec. drill brands)
> This drill bit is made of bits
> This product is free of steel, cobalt and titanium (Typical drill bit
> materials)
> Let’s drill something more solid than concrete
> Eye and hearing protection are not required when using this drill
> If only Mr Arnot knew ... (elec. drill inventor)
>
> Cheers,
> Alex
>
> > On 11 Sep 2018, at 19:27, Arina Yelchiyeva 
> wrote:
> >
> > Some quotes ideas:
> >
> > drill never goes out of style
> > everything is easier with drill
> >
> > Kunal,
> > regarding config, sounds reasonable, I'll do that.
> >
> > Kind regards,
> > Arina
> >
> >
> > On Tue, Sep 11, 2018 at 12:17 AM Benedikt Koehler  >
> > wrote:
> >
> >> You told me to drill sergeant! (Forrest Gump)
> >>
> >> Benedikt
> >> @furukama
> >>
> >>
> >> Kunal Khatua  schrieb am Mo. 10. Sep. 2018 um 21:01:
> >>
> >>> +1 on the suggestion.
> >>>
> >>> I would also suggest that we change the backend implementation of the
> >>> quotes to refer to a properties file (within the classpath) rather than
> >>> have it hard coded within the SqlLine package.  This will ensure that
> new
> >>> quotes can be added with every release without the need to touch the
> >>> SqlLine fork for Drill.
> >>>
> >>> ~ Kunal
> >>> On 9/10/2018 7:06:59 AM, Arina Ielchiieva  wrote:
> >>> Hi all,
> >>>
> >>> we are close to SqlLine 1.5.0 upgrade which now has the mechanism to
> >>> preserve Drill customizations. This one does include multiline support
> >> but
> >>> the next release might.
> >>> You all know that one of the Drill customizations is quotes at
> startup. I
> >>> was thinking we might want to fresh up the list a little bit.
> >>>
> >>> Here is the current list:
> >>>
> >>> start your sql engine
> >>> this isn't your grandfather's sql
> >>> a little sql for your nosql
> >>> json ain't no thang
> >>> drill baby drill
> >>> just drill it
> >>> say hello to my little drill
> >>> what ever the mind of man can conceive and believe, drill can query
> >>> the only truly happy people are children, the creative minority and
> drill
> >>> users
> >>> a drill is a terrible thing to waste
> >>> got drill?
> >>> a drill in the hand is better than two in the bush
> >>>
> >>> If anybody has new serious / funny / philosophical / creative quotes
> >>> ideas, please share and we can consider adding them to the existing
> list.
> >>>
> >>> Kind regards,
> >>> Arina
> >>>
> >> --
> >>
> >> --
> >> Dr. Benedikt Köhler
> >> Kreuzweg 4 • 82131 Stockdorf
> >> Mobil: +49 170 333 0161 • Telefon: +49 89 857 45 84
> >> Mail: bened...@eigenarbeit.org
> >>
>


Re: Apache Drill Automatically Converts Large Numbers to Exponents

2018-05-08 Thread Vova Vysotskyi
Hi Peter,

If the problem is only with displaying the numbers, you may convert them
to strings with the desired format using the TO_CHAR(expression, format) UDF.

For more details please see
https://drill.apache.org/docs/data-type-conversion/#other-data-type-conversions

Kind regards,
Volodymyr Vysotskyi


On Tue, May 8, 2018 at 18:36 Peter Edike 
wrote:

> Hello everyone,
>
> How can I prevent Apache Drill from displaying large numbers as exponents,
> as this is not acceptable for my use case?
>
>
> Kind regards
> Peter Edike
>
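The stored value is unchanged — only the default rendering uses scientific notation, and TO_CHAR controls the rendering. The same idea outside Drill, illustrated with Python's fixed-point formatting (illustrative only, not Drill code):

```python
x = 1.23456789e18

# The default string conversion of a large float uses an exponent...
assert "e" in str(x)

# ...while an explicit fixed-point format spells out all the digits.
plain = f"{x:,.0f}"
assert "e" not in plain
```

In both cases the numeric value is identical; only its textual presentation differs.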


Re: Exception While Querying Decimal Fields in Apache Drill

2018-04-30 Thread Vova Vysotskyi
Thanks for the stack trace, it helped to find the root cause of this
problem.
Decimal values in the Parquet table are stored using the BINARY primitive
type, but currently Drill does not support decimals stored as binary.

The good news is that it will be fixed in DRILL-6094.

Kind regards,
Volodymyr Vysotskyi
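For reference, Parquet's DECIMAL logical type over BINARY stores the unscaled integer as big-endian two's-complement bytes — the representation Drill could not yet decode here. A sketch of that encoding in Python, based on the Parquet logical-type specification rather than Drill's reader:

```python
from decimal import Decimal

def decimal_to_parquet_binary(value: Decimal, scale: int) -> bytes:
    """Encode the unscaled value as big-endian two's-complement bytes,
    as Parquet's DECIMAL logical type does for a BINARY column."""
    unscaled = int(value.scaleb(scale))
    nbytes = max(1, (unscaled.bit_length() + 8) // 8)  # room for the sign bit
    return unscaled.to_bytes(nbytes, byteorder="big", signed=True)

# DECIMAL(6,2) value 1234.56 -> unscaled 123456 -> 3 bytes
raw = decimal_to_parquet_binary(Decimal("1234.56"), scale=2)
assert int.from_bytes(raw, "big", signed=True) == 123456
```

A reader must reverse this (bytes to unscaled int, then apply the scale), which is the support DRILL-6094 adds.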


On Mon, Apr 30, 2018 at 11:44 Peter Edike 
wrote:

>
>
> Hi
>
> Here is the stacktrace on the server side
>
> error_type: SYSTEM
> message: "SYSTEM ERROR: ClassCastException:
> org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast
> to org.apache.drill.exec.vector.VariableWidthVector\n\nFragment
> 2:8\n\n[Error Id: b144b650-99c3-4305-880f-d3a6794a2eab on
> BGDTEST3.INTERSWITCH.COM:31010]"
> exception {
>   exception_class:
> "org.apache.drill.common.exceptions.DrillRuntimeException"
>   message: "Error in parquet record reader.\nMessage: Failure in
> setting up reader\nParquet Metadata: ParquetMetaData{FileMetaData{schema:
> message postillion_data_schema_generator.postillion_super_switch_schema
> {\n  optional int64 post_tran_id;\n  optional int64 post_tran_cust_id;\n
> optional int32 settle_entity_id;\n  optional int32 batch_nr;\n  optional
> int64 prev_post_tran_id;\n  optional int64 next_post_tran_id;\n  optional
> binary sink_node_name (UTF8);\n  optional binary tran_postilion_originated
> (DECIMAL(1,0));\n  optional binary tran_completed (DECIMAL(1,0));\n
> optional binary message_type (UTF8);\n  optional binary tran_type
> (UTF8);\n  optional int64 tran_nr;\n  optional binary system_trace_audit_nr
> (UTF8);\n  optional binary rsp_code_req (UTF8);\n  optional binary
> rsp_code_rsp (UTF8);\n  optional binary abort_rsp_code (UTF8);\n  optional
> binary auth_id_rsp (UTF8);\n  optional binary auth_type (DECIMAL(1,0));\n
> optional binary auth_reason (DECIMAL(1,0));\n  optional binary
> retention_data (UTF8);\n  optional binary acquiring_inst_id_code (UTF8);\n
> optional binary message_reason_code (UTF8);\n  optional binary sponsor_bank
> (UTF8);\n  optional binary retrieval_reference_nr (UTF8);\n  optional int64
> datetime_tran_gmt (TIMESTAMP_MILLIS);\n  optional int64 datetime_tran_local
> (TIMESTAMP_MILLIS);\n  optional int64 datetime_req (TIMESTAMP_MILLIS);\n
> optional int64 datetime_rsp (TIMESTAMP_MILLIS);\n  optional int64
> realtime_business_date (TIMESTAMP_MILLIS);\n  optional int64
> recon_business_date (TIMESTAMP_MILLIS);\n  optional binary
> from_account_type (UTF8);\n  optional binary to_account_type (UTF8);\n
> optional binary from_account_id (UTF8);\n  optional binary to_account_id
> (UTF8);\n  optional binary tran_amount_req (DECIMAL(16,0));\n  optional
> binary tran_amount_rsp (DECIMAL(16,0));\n  optional binary
> settle_amount_impact (DECIMAL(16,0));\n  optional binary tran_cash_req
> (DECIMAL(16,0));\n  optional binary tran_cash_rsp (DECIMAL(16,0));\n
> optional binary tran_currency_code (UTF8);\n  optional binary
> tran_tran_fee_req (DECIMAL(16,0));\n  optional binary tran_tran_fee_rsp
> (DECIMAL(16,0));\n  optional binary tran_tran_fee_currency_code (UTF8);\n
> optional binary tran_proc_fee_req (DECIMAL(16,0));\n  optional binary
> tran_proc_fee_rsp (DECIMAL(16,0));\n  optional binary
> tran_proc_fee_currency_code (UTF8);\n  optional binary settle_amount_req
> (DECIMAL(16,0));\n  optional binary settle_amount_rsp (DECIMAL(16,0));\n
> optional binary settle_cash_req (DECIMAL(16,0));\n  optional binary
> settle_cash_rsp (DECIMAL(16,0));\n  optional binary settle_tran_fee_req
> (DECIMAL(16,0));\n  optional binary settle_tran_fee_rsp (DECIMAL(16,0));\n
> optional binary settle_proc_fee_req (DECIMAL(16,0));\n  optional binary
> settle_proc_fee_rsp (DECIMAL(16,0));\n  optional binary
> settle_currency_code (UTF8);\n  optional binary icc_data_req (UTF8);\n
> optional binary icc_data_rsp (UTF8);\n  optional binary pos_entry_mode
> (UTF8);\n  optional binary pos_condition_code (UTF8);\n  optional binary
> additional_rsp_data (UTF8);\n  optional binary structured_data_req
> (UTF8);\n  optional binary structured_data_rsp (UTF8);\n  optional binary
> tran_reversed (UTF8);\n  optional binary prev_tran_approved
> (DECIMAL(1,0));\n  optional binary issuer_network_id (UTF8);\n  optional
> binary acquirer_network_id (UTF8);\n  optional binary extended_tran_type
> (UTF8);\n  optional binary ucaf_data (UTF8);\n  optional binary
> from_account_type_qualifier (UTF8);\n  optional binary
> to_account_type_qualifier (UTF8);\n  optional binary bank_details
> (UTF8);\n  optional binary payee (UTF8);\n  optional binary
> card_verification_result (UTF8);\n  optional int32 online_system_id;\n
> optional int32 participant_id;\n  optional int32 opp_participant_id;\n
> optional binary receiving_inst_id_code (UTF8);\n  optional int32
> routing_type;\n  optional binary pt_pos_operating_environment (UTF8);\n
> optional binary pt_pos_card_input_mode (UTF8);\n  optional binary
> pt_pos_cardholder_auth_method (UTF8);\n  optional binary
> 

Re: Exception While Querying Decimal Fields in Apache Drill

2018-04-30 Thread Vova Vysotskyi
This part of the stack trace relates to the client side. Could
you please share a stack trace that corresponds to the error on the server
side?

Kind regards,
Volodymyr Vysotskyi


On Mon, Apr 30, 2018 at 10:44 Peter Edike <peter.ed...@interswitchgroup.com>
wrote:

> Hi,
>
> Stacktrace as requested
>
> java.sql.SQLException: [MapR][DrillJDBCDriver](500165) Query execution
> error. Details: SYSTEM ERROR: ClassCastException:
> org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast
> to org.apache.drill.exec.vector.VariableWidthVector
> Fragment 2:18
> [Error Id: dba9df08-fb1d-4bd2-93e6-d08fb6f79ff1 on
> BGDTEST3.INTERSWITCH.COM:31010].
> at
> com.mapr.drill.drill.dataengine.DRQryResultListener.checkAndThrowException(Unknown
> Source)
> at
> com.mapr.drill.drill.dataengine.DRQryResultListener.getNextBatch(Unknown
> Source)
> at
> com.mapr.drill.drill.dataengine.DRJDBCResultSet.doLoadRecordBatchData(Unknown
> Source)
> at
> com.mapr.drill.drill.dataengine.DRJDBCResultSet.doMoveToNextRow(Unknown
> Source)
> at
> com.mapr.drill.drill.dataengine.DRJDBCQueryExecutor.execute(Unknown Source)
> at com.mapr.drill.jdbc.common.SStatement.executeNoParams(Unknown
> Source)
> at com.mapr.drill.jdbc.common.SStatement.execute(Unknown Source)
> at
> org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
> at
> org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
> at
> org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:581)
> at
> org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
> at
> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Caused by: com.mapr.drill.support.exceptions.GeneralException:
> [MapR][DrillJDBCDriver](500165) Query execution error. Details: SYSTEM
> ERROR: ClassCastException:
> org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast
> to org.apache.drill.exec.vector.VariableWidthVector
> Fragment 2:18
> [Error Id: dba9df08-fb1d-4bd2-93e6-d08fb6f79ff1 on
> BGDTEST3.INTERSWITCH.COM:31010].
> ... 21 more
>
>
> This message has been marked as CONFIDENTIAL on Monday, April 30, 2018 @
> 8:44:22 AM
>
> -----Original Message-----
> From: Vova Vysotskyi <vvo...@gmail.com>
> Sent: Friday, April 27, 2018 5:29 PM
> To: user@drill.apache.org
> Subject: Re: Exception While Querying Decimal Fields in Apache Drill
>
> Hi Peter,
>
> Could you please also share a stacktrace?
>
> Does the specified table
> contain pan, terminal_id, source_node_name, tran_completed, tran_reversed
> and tran_type columns? If it contains them, which types do they have?
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
On Fri, Apr 27, 2018 at 17:47 Peter Edike <peter.ed...@interswitchgroup.com>
wrote:
>
> >
> >
> > Drill Version  1.12.0
> >
> >
> >
> > planner.enable_decimal_data_type is set to true on the system.
> >
> >
> >
> > --peter
> >
> >
> >
> >
> >
> -----Original Message-----
> >
> > From: Andries Engelbrecht <aengelbre...@mapr.com>
> >
> > Sent: Friday, April 27, 2018 3:41 PM
> >
> > To: user@drill.apache.org
> >
> > Subject: Re: Exception While Querying Decimal Fields in Apache Drill
> >
> >
> >
> > What version of Drill are you using?
> >
> > Also is planner.enable_decimal_data_type set to true on the system?
> >
> >
> >
> > --Andries
> >
> >
> >
> > On 4/27/18, 7:24 AM, "Peter Edike" <peter.ed...@interswitchgroup

Re: Exception While Querying Decimal Fields in Apache Drill

2018-04-27 Thread Vova Vysotskyi
Hi Peter,

Could you please also share a stacktrace?

Does the specified table
contain pan, terminal_id, source_node_name, tran_completed, tran_reversed
and tran_type columns? If it contains them, which types do they have?

Kind regards,
Volodymyr Vysotskyi


On Fri, Apr 27, 2018 at 17:47 Peter Edike 
wrote:

>
>
> Drill Version  1.12.0
>
>
>
> planner.enable_decimal_data_type is set to true on the system.
>
>
>
> --peter
>
>
>
>
>
> -----Original Message-----
>
> From: Andries Engelbrecht 
>
> Sent: Friday, April 27, 2018 3:41 PM
>
> To: user@drill.apache.org
>
> Subject: Re: Exception While Querying Decimal Fields in Apache Drill
>
>
>
> What version of Drill are you using?
>
> Also is planner.enable_decimal_data_type set to true on the system?
>
>
>
> --Andries
>
>
>
> On 4/27/18, 7:24 AM, "Peter Edike" 
> wrote:
>
>
>
> Tried that, it did not work. It still fails with the exception. Let me
> add that even if the query is a simple SELECT FROM statement, as long
> as any of the fields is of decimal type, the statement will fail with
> the stated exception.
>
>
>
> Please Help a lot depends on this
>
>
>
> Best regards,
>
> Peter Edike
>
>
>
> This message has been marked as CONFIDENTIAL on Friday, April 27, 2018
> @ 3:24:36 PM
>
>
>
> -----Original Message-----
>
> From: Andries Engelbrecht 
>
> Sent: Friday, April 27, 2018 3:07 PM
>
> To: user@drill.apache.org
>
> Subject: Re: Exception While Querying Decimal Fields in Apache Drill
>
>
>
> Perhaps try to convert the predicate and select operations involving
> the decimal types to float or similar.
>
>
>
> i.e tran_completed = 1.0   and ((cast(SETTLE_AMOUNT_IMPACT as double)
> *(-1.0))/100.0)
>
>
>
> Alternatively you may have to cast the decimals as float, but that
> will be more cumbersome.
>
>
>
> --Andries
>
>
>
> On 4/27/18, 5:18 AM, "Peter Edike" 
> wrote:
>
>
>
> I am trying to run the following query in Apache Drill; I am
> querying data stored in Parquet files:
>
>
>
>
>
> select pan, count(*) as number_of_transactions ,
>
> terminal_id,SUM((cast(SETTLE_AMOUNT_IMPACT as double) *-1)/100) AS
> settle_amount_impact
>
>
>
>
>
> from
> dfs.`/iswdata/storage/products/superswitch/parquet/transactions`
>
>
>
> where pan like '506126%' and terminal_id like '1%' and
>
> sink_node_name like ('SWTDB%')
>
>and source_node_name not like ('SWTDBLsrc')
>
> and tran_completed=1
>
> and tran_reversed = 0
>
> and tran_postilion_originated = 1
>
> AND tran_type = '01'
>
> --and pan like '506126%0011'
>
> group by pan,terminal_id
>
>
>
> The Schema for the data I am querying is as follows
>
>
>
>
>
> post_tran_id LONG
>
>
>
>  post_tran_cust_id :LONG
>
>
>
>  settle_entity_id :INTEGER
>
>
>
>  batch_nr : INTEGER
>
>
>
>  prev_post_tran_id : LONG
>
>
>
>  next_post_tran_id : LONG
>
>
>
>  sink_node_name : STRING
>
>
>
>  tran_postilion_originated : DECIMAL
>
>
>
>  tran_completed : DECIMAL
>
>
>
>  tran_amount_req : DECIMAL
>
>
>
>  tran_amount_rsp : DECIMAL
>
>
>
>  settle_amount_impact : DECIMAL
>
>
>
>  tran_cash_req : DECIMAL
>
>
>
>  tran_cash_rsp : DECIMAL
>
>
>
> tran_currency_code : STRING
>
>
>
> tran_tran_fee_req : DECIMAL
>
>
>
> tran_tran_fee_rsp : DECIMAL
>
>
>
> tran_tran_fee_currency_code : STRING
>
>
>
> tran_proc_fee_req : DECIMAL
>
>
>
> tran_proc_fee_rsp : DECIMAL
>
>
>
> tran_proc_fee_currency_code : STRING
>
>
>
> settle_amount_req : DECIMAL
>
>
>
> settle_amount_rsp : DECIMAL
>
>
>
> settle_cash_req : DECIMAL
>
>
>
> settle_cash_rsp : DECIMAL
>
>
>
> settle_tran_fee_req : DECIMAL
>
>
>
> settle_tran_fee_rsp : DECIMAL
>
>
>
> settle_proc_fee_req : DECIMAL
>
>
>
> settle_proc_fee_rsp : DECIMAL
>
>
>
> settle_currency_code : STRING
>
>
>
> However When I run the query against the dataset, I get the
> following exception
>
>
>
>
>
> SYSTEM ERROR: ClassCastException:
> org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast
> to org.apache.drill.exec.vector.VariableWidthVector
>
>
>
>
>
> Moreover, the same error occurs when I include a decimal field in
> the select clause. Is there something I am missing or doing wrong?
> Any pointer will be deeply appreciated.
>
>
>
> Kind Regards
>
>
>
> Peter
>
> 
>
>
>
>
>
>
>
>
>
>
>


Avro storage format behaviour

2018-02-28 Thread Vova Vysotskyi
Hi all,

I am working on DRILL-4120: dir0 does not work when the directory structure
contains Avro files.

DRILL-3810 added validation of the query against the Avro schema before
the query starts executing.
With these changes, Drill throws an exception when a query on an Avro
table references a non-existent column. Other storage formats, such as
JSON or Parquet, allow the use of non-existent fields.

So here is my question: should we continue to treat Avro as a format with
a fixed schema, or should we start treating it as a dynamic format, to be
consistent with the other storage formats?

-- 
Kind regards,
Volodymyr Vysotskyi
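The two behaviors under discussion fit in a few lines. This is a hypothetical sketch, not Drill code: a strict (Avro-style, post-DRILL-3810) reader rejects an unknown projected column up front, while a dynamic (JSON/Parquet-style) reader projects it as NULL:

```python
def resolve_column(schema: set, column: str, strict: bool):
    """Resolve a projected column name against a table schema."""
    if column in schema:
        return f"read({column})"
    if strict:
        # Fixed-schema behavior: fail validation before execution.
        raise KeyError(f"Column '{column}' not found in table schema")
    # Dynamic behavior: unknown columns come back as NULL.
    return None

avro_schema = {"id", "name"}
assert resolve_column(avro_schema, "name", strict=True) == "read(name)"
assert resolve_column(avro_schema, "dir0", strict=False) is None
```

The dir0 case in DRILL-4120 is exactly an implicit column that exists outside the file's schema, which is why strict validation breaks it.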


Re: Decimal Support Target Date & Workarounds?

2018-02-06 Thread Vova Vysotskyi
There are also problems with aggregate functions. Some of them, for
example, return results of double type instead of decimal.

Queries on decimal columns may fail with exceptions, return wrong
results, or run slower than similar queries with other types.

For more details please see document [1]. There were analyzed major
problems, connected with the decimal data type and ways to improve current
approach and fix mentioned bugs.

[1]
https://docs.google.com/document/d/1kfWUZ_OsEmLOJu_tKZO0-ZUJwc52drrm15fyif7BmzQ/edit?usp=sharing

2018-02-06 16:28 GMT+02:00 <john.humphr...@nomura.com>:

> Oh cool, thank you so much for the information.
>
> Any chance you can elaborate on the nature of the problems?  Assuming you
> stayed away from decimal UDFs, would doing regular selects and aggregates
> (like max) on decimal columns cause failures?
>
> Also, are we talking about drill-bits actually crashing, or decimal
> related queries just throwing normal exceptions/errors?
>
> -----Original Message-----
> From: Vova Vysotskyi [mailto:vvo...@gmail.com]
> Sent: Tuesday, February 06, 2018 5:24 AM
> To: user@drill.apache.org
> Subject: Re: Decimal Support Target Date & Workarounds?
>
> Hi John,
>
> Enabling decimal support does not make Drill unstable as long as no
> decimal types are processed.
> Problems may appear only when you try to use decimal columns or some
> problematic decimal UDFs.
> Currently, we are actively working on completing decimal support, and I
> expect it will be available in the 1.13 release.
>
>
> 2018-02-05 23:17 GMT+02:00 <john.humphr...@nomura.com>:
>
> > Hey,
> >
> > I’m on the MapR platform using Drill 1.10 at the minute.
> >
> > I have numerous parquet files that I need to query, but they have
> > decimal data types in them.
> >
> > I do not actually need to query the decimal columns in general, but
> > drill prevents me from querying the files as a whole even when I avoid
> > those columns.
> >
> > I know I can “enable” decimal support.  So I guess my questions are:
> >
> >
> > 1.   Will enabling decimal support actually make drill unstable, or
> > does it just slow some queries down?
> >
> > 2.   Is there any risk in enabling the support as long as I avoid
> > querying the decimal columns?
> >
> > 3.   Is decimal support being actively worked on and will it be
> > available soon-ish?
> >
> > Thanks,
> >
> > -John
> >
> >
> > PLEASE READ: This message is for the named person's use only. It may
> > contain confidential, proprietary or legally privileged information.
> > No confidentiality or privilege is waived or lost by any
> > mistransmission. If you receive this message in error, please delete
> > it and all copies from your system, destroy any hard copies and notify
> > the sender. You must not, directly or indirectly, use, disclose,
> > distribute, print, or copy any part of this message if you are not the
> > intended recipient. Nomura Holding America Inc., Nomura Securities
> > International, Inc, and their respective subsidiaries each reserve the
> > right to monitor all e-mail communications through its networks. Any
> > views expressed in this message are those of the individual sender,
> > except where the message states otherwise and the sender is authorized
> > to state the views of such entity. Unless otherwise stated, any
> > pricing information in this message is indicative only, is subject to
> > change and does not constitute an offer to deal at any price quoted.
> > Any reference to the terms of executed transactions should be treated as
> preliminary only and subject to our formal written confirmation.
> >
>
>
>
> --
> Kind regards,
> Volodymyr Vysotskyi
>
>



-- 
Kind regards,
Volodymyr Vysotskyi


Re: Decimal Support Target Date & Workarounds?

2018-02-06 Thread Vova Vysotskyi
Hi John,

Enabling decimal support does not make Drill unstable as long as no
decimal types are processed.
Problems may appear only when you try to use decimal columns or some
problematic decimal UDFs.
Currently, we are actively working on completing decimal support, and
I expect it will be available in the 1.13 release.


2018-02-05 23:17 GMT+02:00 :

> Hey,
>
> I’m on the MapR platform using Drill 1.10 at the minute.
>
> I have numerous parquet files that I need to query, but they have decimal
> data types in them.
>
> I do not actually need to query the decimal columns in general, but drill
> prevents me from querying the files as a whole even when I avoid those
> columns.
>
> I know I can “enable” decimal support.  So I guess my questions are:
>
>
> 1.   Will enabling decimal support actually make drill unstable, or
> does it just slow some queries down?
>
> 2.   Is there any risk in enabling the support as long as I avoid
> querying the decimal columns?
>
> 3.   Is decimal support being actively worked on and will it be
> available soon-ish?
>
> Thanks,
>
> -John
>
>
>



-- 
Kind regards,
Volodymyr Vysotskyi


Re: Invalid storage plugin name

2018-02-05 Thread Vova Vysotskyi
Hi Flavio,

Default identifier quotes[1] should be used to avoid this error:
SELECT * from `mysql-test`.mydb.mytable

Also, you may configure the type of identifier quotes as described in[1].

[1] https://drill.apache.org/docs/lexical-structure/#identifier-quotes
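The rule generalizes: any identifier that is not a plain word (here, one containing '-') must be wrapped in the configured quote character. A small sketch of that quoting logic in Python — illustrative only; Drill's SQL parser applies the equivalent rule internally:

```python
def quote_ident(name: str, quote: str = "`") -> str:
    """Wrap non-plain identifiers in quotes, doubling embedded quotes."""
    if name.isidentifier():
        return name  # plain identifiers need no quoting
    return quote + name.replace(quote, quote * 2) + quote

assert quote_ident("mydb") == "mydb"
assert quote_ident("mysql-test") == "`mysql-test`"

query = f"SELECT * FROM {quote_ident('mysql-test')}.mydb.mytable"
assert query == "SELECT * FROM `mysql-test`.mydb.mytable"
```

Single or double quotes would only work here if the identifier-quote character were reconfigured as described in [1].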

2018-02-05 11:11 GMT+02:00 Flavio Pompermaier :

> Hi to all,
> I'm trying to play around with storage plugins (on Drill 1.12), and I
> discovered that while I can successfully create a storage plugin whose name
> contains a minus character ('-'), I can't query against it (Error: PARSE
> ERROR: Encountered "-" at line 1, column XX).
>
> For example, I can create a MySQL storage plugin mysql-test, but then I
> can't do
> SELECT * from mysql-test.mydb.mytable (because of the aforementioned parse
> error).
>
> I've tried the following queries as well but none of them works:
> SELECT * from 'mysql-test'.mydb.mytable
> SELECT * from "mysql-test".mydb.mytable
>
> Best,
> Flavio
>



-- 
Kind regards,
Volodymyr Vysotskyi


Re: Issue with time zone

2017-12-14 Thread Vova Vysotskyi
Hi Kostyantyn,

I just checked this issue:
1) With timezone America/New_York query fails as it was described:
0: jdbc:drill:zk=local> select to_timestamp('2015-03-08
02:58:51','-MM-dd HH:mm:ss') from sys.version;
Error: SYSTEM ERROR: IllegalInstantException: Cannot parse "2015-03-08
02:58:51": Illegal instant due to time zone offset transition
(America/New_York)

2) When I set the timezone in drill-env.sh using
export DRILL_JAVA_OPTS="-Duser.timezone=UTC"
timezone is set correctly, and the query returned the correct result:
0: jdbc:drill:zk=local> select to_timestamp('2015-03-08
02:58:51','-MM-dd HH:mm:ss') from sys.version;
+------------------------+
|         EXPR$0         |
+------------------------+
| 2015-03-08 02:58:51.0  |
+------------------------+
1 row selected (1.697 seconds)

Perhaps you used the wrong character instead of the double quote at the
end of the export string.
Please confirm whether this helps to avoid the issue.

When I set the timezone in drill-env.sh, it was seen with ps.

Option user.timezone was deleted from drill options, so you could not see
it when running select * from sys.options where name like '%timezone%'

Also, drillbit could not be started on a local computer without zookeeper.



2017-12-13 18:53 GMT+02:00 Kostyantyn Krakovych :

> Hi Team,
>
> I faced the issue described in
> http://www.openkb.info/2015/05/understanding-drills-timestamp-and.html
>
> Drill 1.11
>
> I run sqlline -u jdbc:drill:zk=local on local computer.
> Meantime I do not see user.timestamp option neither in sys.options nor in
> sys.boot.
> And the issue is not resolved when I set the parameter in drill-env.sh as
> export DRILL_JAVA_OPTS="-Duser.timezone=UTC”
> I do not see the option with ps -ef.
>
> N.B. I do not start drillbit. Though for the tool I confirm I see
> -Duser.timestamp=UTC with ps -ef | grep “user.timestamp” IF I start it, so
> it fails with other reason on local computer (Failure to connect to the
> zookeeper cluster service within the allotted time of 1 milliseconds.).
>
> Could you please advice on the issue.
>
>
> Best regards,
> Kostyantyn




-- 
Kind regards,
Volodymyr Vysotskyi


Re: Illegal Argument Exception while convert unix date format to drill timestamp

2017-12-14 Thread Vova Vysotskyi
Hi Divya,

The default timestamp format for Drill is 'YYYY-MM-DD HH:MI:SS'.
For the cases, when you want to get a timestamp from a string with another
format, date pattern should be specified.

Drill allows using both Jodatime patterns (to_date, to_time, to_timestamp
functions; see allowed patterns at [1])
and SQL-like patterns (sql_to_date, sql_to_time, sql_to_timestamp
functions; see allowed patterns at [2]).

[1]
http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html
[2] https://issues.apache.org/jira/browse/DRILL-4864
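As a sketch of both styles, assuming a Drill version that includes DRILL-4864 (the input strings are taken from this thread, and the SQL-style pattern letters follow the Postgres-like conventions described in that ticket):

```sql
-- Jodatime pattern with to_timestamp:
SELECT TO_TIMESTAMP('Thu Dec 14 03:40:50 UTC 2017',
                    'E MMM d HH:mm:ss z yyyy') FROM (VALUES(1));

-- SQL-like pattern with sql_to_timestamp:
SELECT SQL_TO_TIMESTAMP('2017-12-14 03:40:50',
                        'YYYY-MM-DD HH24:MI:SS') FROM (VALUES(1));
```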

2017-12-14 10:17 GMT+02:00 Arjun kr :

> Pls see if this works for you.
>
>
> 0: jdbc:drill:schema=dfs> SELECT TO_TIMESTAMP('Sun Apr 1 00:00:01 UTC
> 2018', 'E MMM d HH:mm:ss z yyyy') FROM (VALUES(1));
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2018-04-01 00:00:01.0  |
> +------------------------+
> 1 row selected (0.165 seconds)
> 0: jdbc:drill:schema=dfs>
>
>
>
> Thanks,
>
> Arjun kr
>
> 
> From: Divya Gehlot 
> Sent: Thursday, December 14, 2017 9:12 AM
> To: user@drill.apache.org
> Subject: Illegal Argument Exception while convert unix date format to
> drill timestamp
>
> Hi,
> Does Drill support converting the Unix date format to a Drill timestamp?
>
> Unix TimeStamp : Thu Dec 14 03:40:50 UTC 2017
> When I cast it to a Drill timestamp, I get an IllegalArgumentException.
>
>
> Thanks,
> Divya
>



-- 
Kind regards,
Volodymyr Vysotskyi


Re: Querying Data with period in name

2017-08-11 Thread Vova Vysotskyi
Hi John,

Fix for DRILL-4264 (https://issues.apache.org/jira/browse/DRILL-4264) should
solve this issue. This error appears when you try to do select *. But
while DRILL-4264 is not fixed, you can try to do select `id.orig_h`. It
should not throw the error.
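For example, a minimal sketch of the workaround; the file path here is a hypothetical placeholder:

```sql
-- Back-quotes make Drill treat the whole identifier, dot included,
-- as a single column name instead of a table.column reference.
-- The dfs path below is a placeholder, not from the original thread:
SELECT `id.orig_h` FROM dfs.`/data/conn.json`;
```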

Kind regards,
Volodymyr Vysotskyi

2017-08-11 21:07 GMT+03:00 John Omernik :

> Hey all,
>
> I am querying some json and parquet data that has dots in the name. Not all
> the data I may be querying will come from Drill, thus dot is a valid
> character... when I go to initially explore my data, Drill throws the error
> below when I run a select * query.
>
> I understand the error, and I can create a view, selecting each column out
> and renaming it for easier select *  in the future. However, as a user, if
> I get a new data set, this could (unless I am informed of another way here)
> force me to leave drill to explore my data.
>
> I get how using periods as field qualifiers causes issues... but if we
> had a way to read a file to get the schema, to either produce all the
> fields in a select query for easy view creation or a way to query with
> periods in the name that would be awesome! It would keep users IN drill
> instead of going elsewhere to explore their data.
>
> I am open to ideas!
>
>
>
>
>
>
>
>
>
>
> Error Returned - Code: 500
> Error Text:
> SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference
> "id.orig_h"; a field reference identifier must not have the form of a
> qualified name (i.e., with ".").
>
> Fragment 0:0
>
> [Error Id: 88acd3d8-4e44-49f6-b587-24bf26f89a3b on
> zeta4.brewingintel.com:20005]
>