CVE-2023-48362: Apache Drill: XXE Vulnerability in XML Format Reader

2024-07-23 Thread James Turton
Severity: moderate

Affected versions:

- Apache Drill 1.19.0 before 1.21.2

Description:

XXE in the XML Format Plugin in Apache Drill version 1.19.0 and greater allows 
a user to read any file on a remote file system or execute commands via a 
malicious XML file.
Users are recommended to upgrade to version 1.21.2, which fixes this issue.

This issue is being tracked as DRILL-8461 

Credit:

Yuzhe Huang (finder)

References:

https://drill.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-48362
https://issues.apache.org/jira/browse/DRILL-8461



[ANNOUNCE] Apache Drill 1.21.2 Released

2024-06-23 Thread James Turton
On behalf of the Apache Drill community, I am happy to announce the 
release of Apache Drill 1.21.2.


Drill is an Apache open-source SQL query engine for Big Data exploration.
Drill is designed from the ground up to support high-performance analysis
on the semi-structured and rapidly evolving data coming from modern Big
Data applications, while still providing the familiarity and ecosystem of
ANSI SQL, the industry-standard query language. Drill provides
plug-and-play integration with existing Apache Hive and Apache HBase
deployments.

For information about Apache Drill, and to get involved, visit the 
project website [1].


A total of 44 JIRA's are resolved in this bugfix release of Drill. For 
the full list please see release notes [2].


The binary and source artifacts are available here [3].

Thanks to everyone in the community who contributed to this release!


1. https://drill.apache.org/
2. https://drill.apache.org/docs/apache-drill-1-21-2-release-notes/
3. https://drill.apache.org/download/


This user agent is already supported !
[http://www.mozilla.com/thunderbird/]
[Options]

[RESULT] [VOTE] Release Apache Drill 1.21.2 RC1

2024-06-22 Thread James Turton
The vote passes. Thanks to everyone who has tested the release candidate 
and given their comments and votes. Final tally:


3x +1 (binding): Charles, James, Maksym

5x +1 (non-binding): none

No 0s or -1s.

I'll start process for pushing the release artifacts and send an 
announcement once propagated.


Kind regards
James



Re: ChannelClosedException

2024-05-17 Thread James Turton
It sounds like the query over the full set is crashing your Drillbit, 
perhaps due to an OOM error.


On 2023/10/12 13:58, muhl...@ntokoto.co.za wrote:

Hi,
I am running directly from the drill. The main application I want to run from 
is QuerySurge.

Is there any solution you can advise?

Regards,
Muhluri

-Original Message-
From: James Turton 
Sent: Thursday, 12 October 2023 13:43
To: muhl...@ntokoto.co.za
Subject: Re: ChannelClosedException

Also, please look in the Drillbit logs. The query may be encountering invalid 
data and failing when you remove your LIMIT clause. Are you using DBeaver to 
submit your queries btw? That error looks familiar...

On 2023/10/10 15:58, Charles Givre wrote:

Hi Desmond,
Can you share a bit more?  What version of Drill are you running?  Java?  etc.  
 What data are you trying to query?
Best,
-- C




On Oct 10, 2023, at 2:02 AM,   
wrote:

Hi Team,



When I run a Query for a limited number of records it returns the results.
When I remove the limit on the Query it fails with below error. May
you please assist.





Exception: SQL_EXCEPTION java.sql.SQLException: CONNECTION ERROR:
Connection <-->  (user client) closed unexpectedly. Drillbit down? [Error Id:
fba8461f-c896-4c56-b617-1625361e577d ]



Regards,

Muhluri









Re: Data Fragments

2024-05-17 Thread James Turton
In order to scale query execution horizontally, Drill divides it up over 
over all (well, 70% by default) of the CPUs available to the cluster by 
slicing physical plans up into "major fragments" and then those into 
"minor fragments". You can roughly think of a minor fragment as a single 
thread of execution at runtime. I've included some further reading below.


1. https://drill.apache.org/architecture/
2. Learning Apache Drill (O'Reilly)

On 2022/12/06 00:05, marc nicole wrote:

Hi,
Could somebody explain the notion of fragments in Drill and why would a
query be executed on each of the data fragments?
Thanks.





Re: ANALYZE TABLE giving error

2024-05-17 Thread James Turton

Hi Prabhakar

I don't believe that the metastore supports tables that have a varying 
schema.


Regards
James

On 2023/09/25 14:40, Prabhakar Bhosale wrote:

Hi Team,
I am getting the following error when I run the analyze query command on
parquet tables.

The sequence of activities are as follows

1. Run analyze table command on a parquet table (folder having one parquet
file) with 3 columns
2. add another parquet file in that folder but add one additional column in
it.
3. Run analyze table command

It gives the following error. Please advise.

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
NullPointerException: Cannot invoke
"org.apache.drill.metastore.statistics.ColumnStatistics.get(org.apache.drill.metastore.statistics.StatisticsKind)"
because the return value of
"org.apache.drill.metastore.metadata.BaseMetadata.getColumnStatistics(org.apache.drill.common.expression.SchemaPath)"
is null

REgards
Prabhakar





Re: ODBC Drivers

2024-05-17 Thread James Turton

I've updated the Drill docs to reflect the information below. E.g.

https://drill.apache.org/docs/installing-the-driver-on-windows/#step-1-download-the-drill-odbc-driver

On 2024/04/10 18:57, Loiseau, Valery wrote:

Hi,

You must create a HPE passport account with your mail or you already have 
maybe, then you must obtain a token :
https://docs.ezmeral.hpe.com/datafabric-customer-managed/75/AdvancedInstallation/Obtaining_a_Token.html

With your token and your passport mail then you can connect to 
https://package.ezmeral.hpe.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.5.1.1002/

Valéry Loiseau
HPE Ezmeral



-Original Message-
From: James Turton 
Sent: Wednesday, April 10, 2024 4:37 PM
To: user@drill.apache.org; Romain Bugey 
Cc: Matthias Fröhlich 
Subject: Re: ODBC Drivers

Sadly, HPE relicensed the Drill ODBC previously distributed by MapR so that 
it's no longer freely available. Until an open source driver comes along the 
only option I know of is to contact HPE to request a license.

On 2024/04/09 17:13, Romain Bugey wrote:

Dear Team,

We’re using Tableau from Salesforce and for a new installation we need
the ODBC drivers for windows 2022.

When we follow the procedure on the tableau website we arrive on your
download page but when we click on the drivers it’s asking for a user
and password that I don’t have.

Can you help us ?

Thanks

Romain

**

*Romain Bugey*

IT Coordinator – Global Infrastructure

Esplanade de Pont-Rouge 4

1212 Grand-Lancy

Switzerland

Tel (41) 22 544 46 19

Mobile (41) 79 432 60 16

romain.bu...@alvean.com <mailto:romain.bu...@alvean.com>

The information contained in this email and attachments are privileged
and confidential. It is intended solely for the addressee(s). If you
are not an intended recipient, any disclosure, copying, distribution
of the contents of this email is strictly prohibited and unlawful. If
you are not the intended addressee, please notify the sender
immediately by replying to this email and delete it from your computer.




[VOTE] Release Apache Drill 1.21.2 - RC1

2024-05-17 Thread James Turton

Hi all

I'd like to propose the second release candidate (RC1) of Apache Drill, 
version 1.21.2.


The release candidate covers a total of 44 resolved Jira issues [1]. 
Thanks to everyone who contributed to this release.


The tarball artefacts are hosted at [2] and the Maven artefacts are 
hosted at [3].


This release candidate is based on 
6b6a90fb6a03576d8d2b6a1858f72fb01b86d877 located at [4].


CI runs for the release candidate are viewable at [5].

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12353550

[2] https://dist.apache.org/repos/dist/dev/drill/1.21.2-rc1/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1109/
[4] https://github.com/jnturton/drill/commits/drill-1.21.2
[5] https://github.com/apache/drill/actions/runs/9132581568


Re: [VOTE] Release Apache Drill 1.21.2 - RC0

2024-04-26 Thread James Turton
To keep the mailing list appraised of goings on, some regressions are 
being fixed (DRILL-8493, DRILL-8329) and another RC will be prepared. 
Here's my -1 for RC0.


On 2024/04/15 11:26, James Turton wrote:

Hi all

I'd like to propose the first release candidate (RC0) of Apache Drill, 
version 1.21.2.


The release candidate covers a total of 38 resolved Jira issues [1]. 
Thanks to everyone who contributed to this release.


The tarball artefacts are hosted at [2] and the Maven artefacts are 
hosted at [3].


This release candidate is based on 
a98e5f50405437a2fd670fdcb5840796fadfa6b2 located at [4].


CI runs for the release candidate are viewable at [5].

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12353550&projectId=12313820 
<https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12353550&projectId=12313820> 


[2] https://dist.apache.org/repos/dist/dev/drill/1.21.2-rc0/
[3] 
https://repository.apache.org/content/repositories/orgapachedrill-1108/

[4] https://github.com/jnturton/drill/commits/drill-1.21.2
[5] 
https://github.com/apache/drill/actions/runs/8685131686/job/23813981058




[VOTE] Release Apache Drill 1.21.2 - RC0

2024-04-15 Thread James Turton

Hi all

I'd like to propose the first release candidate (RC0) of Apache Drill, 
version 1.21.2.


The release candidate covers a total of 38 resolved Jira issues [1]. 
Thanks to everyone who contributed to this release.


The tarball artefacts are hosted at [2] and the Maven artefacts are 
hosted at [3].


This release candidate is based on 
a98e5f50405437a2fd670fdcb5840796fadfa6b2 located at [4].


CI runs for the release candidate are viewable at [5].

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12353550&projectId=12313820 


[2] https://dist.apache.org/repos/dist/dev/drill/1.21.2-rc0/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1108/
[4] https://github.com/jnturton/drill/commits/drill-1.21.2
[5] https://github.com/apache/drill/actions/runs/8685131686/job/23813981058


Re: ODBC Drivers

2024-04-10 Thread James Turton

Thanks for clarifying this here!

On 2024/04/10 18:57, Loiseau, Valery wrote:

Hi,

You must create a HPE passport account with your mail or you already have 
maybe, then you must obtain a token :
https://docs.ezmeral.hpe.com/datafabric-customer-managed/75/AdvancedInstallation/Obtaining_a_Token.html

With your token and your passport mail then you can connect to 
https://package.ezmeral.hpe.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.5.1.1002/

Valéry Loiseau
HPE Ezmeral



-Original Message-
From: James Turton 
Sent: Wednesday, April 10, 2024 4:37 PM
To: user@drill.apache.org; Romain Bugey 
Cc: Matthias Fröhlich 
Subject: Re: ODBC Drivers

Sadly, HPE relicensed the Drill ODBC previously distributed by MapR so that 
it's no longer freely available. Until an open source driver comes along the 
only option I know of is to contact HPE to request a license.

On 2024/04/09 17:13, Romain Bugey wrote:

Dear Team,

We’re using Tableau from Salesforce and for a new installation we need
the ODBC drivers for windows 2022.

When we follow the procedure on the tableau website we arrive on your
download page but when we click on the drivers it’s asking for a user
and password that I don’t have.

Can you help us ?

Thanks

Romain

**

*Romain Bugey*

IT Coordinator – Global Infrastructure

Esplanade de Pont-Rouge 4

1212 Grand-Lancy

Switzerland

Tel (41) 22 544 46 19

Mobile (41) 79 432 60 16

romain.bu...@alvean.com <mailto:romain.bu...@alvean.com>

The information contained in this email and attachments are privileged
and confidential. It is intended solely for the addressee(s). If you
are not an intended recipient, any disclosure, copying, distribution
of the contents of this email is strictly prohibited and unlawful. If
you are not the intended addressee, please notify the sender
immediately by replying to this email and delete it from your computer.




Re: ODBC Drivers

2024-04-10 Thread James Turton
Sadly, HPE relicensed the Drill ODBC previously distributed by MapR so 
that it's no longer freely available. Until an open source driver comes 
along the only option I know of is to contact HPE to request a license.


On 2024/04/09 17:13, Romain Bugey wrote:


Dear Team,

We’re using Tableau from Salesforce and for a new installation we need 
the ODBC drivers for windows 2022.


When we follow the procedure on the tableau website we arrive on your 
download page but when we click on the drivers it’s asking for a user 
and password that I don’t have.


Can you help us ?

Thanks

Romain

**

*Romain Bugey*

IT Coordinator – Global Infrastructure

Esplanade de Pont-Rouge 4

1212 Grand-Lancy

Switzerland

Tel (41) 22 544 46 19

Mobile (41) 79 432 60 16

romain.bu...@alvean.com 

The information contained in this email and attachments are privileged 
and confidential. It is intended solely for the addressee(s). If you 
are not an intended recipient, any disclosure, copying, distribution 
of the contents of this email is strictly prohibited and unlawful. If 
you are not the intended addressee, please notify the sender 
immediately by replying to this email and delete it from your computer. 


Re: How to config Drill in Squirrel SQL with the host and port where the database is

2023-12-06 Thread James Turton
There no doubt is but I think that already requires one to write some 
code, at which point I'd personally rather try to work closer to what I 
see as the source of the trouble.


On 2023/12/07 08:04, Charles Givre wrote:

James,
I wonder if there would be a way to simply ignore or remove the leading DRILL?  Of 
course, if someone names a plugin "drill" that would then cause problems.
--C


On Dec 7, 2023, at 00:41, James Turton  wrote:

This is what I consider to be a misfeature in Drill. At some point it was 
decided that rooting the INFORMATION_SCHEMA hierarchy in a fictitious catalog 
named DRILL would increase Drill's compatibility with BI tools. What breaks as 
a result is the correspondence between INFORMATION_SCHEMA and schema paths in 
queries. Clients like Squirrel generate queries like

SELECT * FROM DRILL.dfs.tmp.a_table

which go on to fail because of the leading 'DRILL.'.

For now you can write SELECTs by hand to view table content. I have got a Jira 
open to adda toggle tro Drill that switches off the DRILL catalog in 
INFORMATION_SCHEMA but I haven't started development yet.

On 2023/12/05 23:58, yu sun wrote:

Hi James,
Thank you so much for your help!
It works! I query successfully in drill bash. But in Squirrel SQL. I cannot get 
the table content show up due to the following error:
2023-12-05 10:20:24,573 [1a90a426-e547-8df5-db2a-be595f708cb9:foreman] INFO  
o.a.d.e.p.s.conversion.SqlConverter - User Error Occurred: From line 1, column 
19 to line 1, column 60: Object 'DRILL' not found (From line 1, column 19 to 
line 1, column 60: *Object 'DRILL' not found*)
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line 
1, column 19 to line 1, column 60: Object 'DRILL' not found
[Error Id: d10ed29c-6559-4805-8405-2cdfb0db3350 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:688)
at 
org.apache.drill.exec.planner.sql.conversion.SqlConverter.validate(SqlConverter.java:220)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:662)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:198)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:172)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:298)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:179)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:129)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:594)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, 
column 19 to line 1, column 60: Object 'DRILL' not found
at sun.reflect.GeneratedConstructorAccessor110.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:930)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:5464)
at 
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:183)
at 
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:188)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:88)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1135)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1106)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3429)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3408)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3766)
at 
org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:61)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:88)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1135)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1106)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:282)
at

Re: How to config Drill in Squirrel SQL with the host and port where the database is

2023-12-06 Thread James Turton
reflect.GeneratedConstructorAccessor109.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)

at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:599)
... 31 common frames omitted

The database structure looks like that. And when I click on one table, 
the query log show:

{"schema":"jdbc.schema.oauser","start":1701796824382,"finish":1701796824574,"outcome":"FAILED","remoteAddress":"127.0.0.1:59371 
<http://127.0.0.1:59371>","id":"1a90a426-e547-8df5-db2a-be595f708cb9","query":"select tbl.* from `DRILL`.`jdbc.schema.oauser`.`AG_MANIFEST` 
tbl","user":"t1"}
image.png

Do you know how to configure 'DRILL' so it can be recognize in Squirrel SQL?

Thank you!
Yu


James Turton mailto:dz...@apache.org>> 于2023年12月1 
日周五 23:16写道:


__

jdbc:drill:drillbit=:31010

The above should work for connecting to embedded Drill running on a remote 
host. If it doesn't then I'd test whether Drill has bound to port 31010 on a 
reachable IP address and whether any firewall is interfering.

Related:
https://drill.apache.org/docs/ports-and-bind-addresses-used-by-drill/ 
<https://drill.apache.org/docs/ports-and-bind-addresses-used-by-drill/>

On 2023/12/01 20:38, yu sun wrote:

Hi there,

I'm trying to replace the previous JDBC driver with Drill. However, I had a
hard time configuring Squirrel SQL after going through all the tutorials.

My previous alias URL has the format below:
jdbc:dataaccess://:;ServerDataSource=

I want to use the embedded mode with Drill. The below URL works in Squirrel
SQL.
jdbc:drill:drillbit=localhost:31010
But how can I visit the database in the specific host and port, rather than
my localhost?
Where shall I put the information of :;ServerDataSource=?

I tried to use jdbc:drill:drillbit=:;ServerDataSource= but it won't work.

Thank you so much for your help!
Yu Sun





Re: How to config Drill in Squirrel SQL with the host and port where the database is

2023-12-01 Thread James Turton

jdbc:drill:drillbit=:31010

The above should work for connecting to embedded Drill running on a remote 
host. If it doesn't then I'd test whether Drill has bound to port 31010 on a 
reachable IP address and whether any firewall is interfering.

Related: 
https://drill.apache.org/docs/ports-and-bind-addresses-used-by-drill/


On 2023/12/01 20:38, yu sun wrote:

Hi there,

I'm trying to replace the previous JDBC driver with Drill. However, I had a
hard time configuring Squirrel SQL after going through all the tutorials.

My previous alias URL has the format below:
jdbc:dataaccess://:;ServerDataSource=

I want to use the embedded mode with Drill. The below URL works in Squirrel
SQL.
jdbc:drill:drillbit=localhost:31010
But how can I visit the database in the specific host and port, rather than
my localhost?
Where shall I put the information of :;ServerDataSource=?

I tried to use jdbc:drill:drillbit=:;ServerDataSource= but it won't work.

Thank you so much for your help!
Yu Sun



Re: 揭秘:如何自救?

2023-11-30 Thread James Turton

I've checked and this is spam, ignore.

On 2023/11/30 07:33, Zhang Bohai wrote:

你好!

我有多种病:胆囊炎,肝叶肿大,胃下垂,骨质酥松,免疫力低下,眩晕症。虽然不是要命的病,可是折腾的我死去活来,药没少吃,针也没少打,可就是治不好我的病,几次想死,都没死成。
就在我人生走到尽头的时候,我遇到了这万古不遇的高德,从此以后,我走上一条返本归真的路。


不长时间,师父给我净化了身体,我无病一身轻让我体悟到了没有病的幸福感,在中修炼,身体健康,心胸宽阔,平静而祥和,心中很少有苦恼,我很庆幸自己能修炼。

了解更多,请看附件


Re: elasticsearch connection

2023-11-29 Thread James Turton
I've got a local ES storage config which I assume from some testing with 
Drill 1.21.1 I did a while back. It looks this after some obfuscation, 
in case this helps.


{
  "type" : "elastic",
  "hosts" : [ 
"https://47f0d48fa8846d___.us-central1.gcp.cloud.es.io"; ],

  "authMode" : "SHARED_USER",
  "disableSSLVerification" : false,
  "credentialsProvider" : {
    "credentialsProviderType" : "PlainCredentialsProvider",
    "credentials" : {
  "username" : "elastic",
  "password" : "_"
    },
    "userCredentials" : { }
  },
  "enabled" : false
}


On 2023/11/28 04:41, 河村裕太 wrote:

I am having trouble configuring the elasticsearch connection from apachedrill.
I can't set "hosts" when configuring plugIn, so I would like to see a sample 
configuration.





Re: Deployment architecture of drill

2023-10-26 Thread James Turton

Hi Prabhakar

ZooKeeper. It will make no meaningful difference to Drill which nodes 
you run it on since it is only used for configuration and control. I'd 
probably put it on the two namenodes since it is similar in spirit and 
has no need of the more serious storage that is likely to be installed 
on the datanodes.


Drill. To take advantage of data locality in Hadoop you would install 
Drill on the datanodes of your Hadoop cluster. Note, however, that for a 
small cluster like this, and given modern LAN speeds, data locality can 
often just be ignored. Nevertheless I'd personally start by installing 
Drillbits on each of the three datanodes.


Regards
James

On 2023/10/26 10:12, Prabhakar Bhosale wrote:

Hi Team,
I am looking for deployment architecture of drill  on production server. I
have following configurations

HAdoop cluster
2 name nodes - Faiiover
3 data nodes

I have following questions
1. should zookeeper to be installed only on name node or both datanode and
namenodes?
2. Should drill be installed on all nodes or only namenodes or only
datanodes?

Regards
Prabhakar





Re: ChannelClosedException

2023-10-16 Thread James Turton
I've looked at the log files and what I see is that Drillbit becomes 
unable to maintain its end of the JDBC connection to CLI that you're 
working in.


You can try to set drill.exec.rpc.user.timeout to 0 in 
drill-override.conf, which will disable the relevant timeout. Disabling 
this timeout isn't any kind of solution, just a way to keep the query 
running while you look at what's making the Drillbit unresponsive. My 
bet is that physical RAM is being exhausted causing the OS to start 
heavy paging of data to swap space. If you look at OS RAM and swap 
metrics while the query is running it should be clear if this is this case.


If it is the case then it is probably better to let Drill's hash join 
spill data to disk itself than it is to oversubscribe physical memory 
and force the OS to swap (e.g. by setting too big a number in a variable 
like DRILLBIT_MAX_PROC_MEM).


Regards
James

On 2023/10/13 10:31, muhl...@ntokoto.co.za wrote:

Hi,
Let me regenerate the error again I will share the log file once done.
I have 3 log file:
Which one of those I must share:
sqlline.log
sqlline.log.1
sqlline_queries.log

Files are a bit big, Which email can I use to share the bigger files?
Regards,
Muhluri

-Original Message-
From: James Turton 
Sent: Friday, 13 October 2023 10:16
To: user@drill.apache.org; muhl...@ntokoto.co.za
Subject: Re: ChannelClosedException

Going further, it looks like your entire Drillbit is crashing so this is 
something severe, perhaps an out-of-memory situation. The sqlline.log file 
should provide some clues.

On 2023/10/13 09:53, James Turton wrote:

The errors that reach the surface in the embedded Drill CLI do not
include all of the errors logged by Drill server processes. Please
look for the file sqlline.log which by default will be in the log/
subdirectory of your Drill installation...

On 2023/10/12 15:03, muhl...@ntokoto.co.za wrote:

Hi,
I am running from the drill console, The application I want to use it
on is QuerySurge. I am getting the same error message in both platforms.

Regards,
Muhluri

-Original Message-
From: James Turton 
Sent: Thursday, 12 October 2023 13:43
To: muhl...@ntokoto.co.za
Subject: Re: ChannelClosedException

Also, please look in the Drillbit logs. The query may be encountering
invalid data and failing when you remove your LIMIT clause. Are you
using DBeaver to submit your queries btw? That error looks familiar...

On 2023/10/10 15:58, Charles Givre wrote:

Hi Desmond,
Can you share a bit more?  What version of Drill are you running?
Java?  etc.   What data are you trying to query?
Best,
-- C




On Oct 10, 2023, at 2:02 AM, 
 wrote:

Hi Team,



When I run a Query for a limited number of records it returns the
results.
When I remove the limit on the Query it fails with below error. May
you please assist.





Exception: SQL_EXCEPTION java.sql.SQLException: CONNECTION ERROR:
Connection <-->  (user client) closed unexpectedly. Drillbit down?
[Error Id:
fba8461f-c896-4c56-b617-1625361e577d ]



Regards,

Muhluri












Re: ChannelClosedException

2023-10-13 Thread James Turton
sqlline.log will contain the most recent information. You can zip it and 
send it directly to me at this address. The log file should not usually 
contain any sensitive information but please look over it and remove 
anything you might want to nonetheless.


On 2023/10/13 10:31, muhl...@ntokoto.co.za wrote:

Hi,
Let me regenerate the error again I will share the log file once done.
I have 3 log file:
Which one of those I must share:
sqlline.log
sqlline.log.1
sqlline_queries.log

Files are a bit big, Which email can I use to share the bigger files?
Regards,
Muhluri

-Original Message-
From: James Turton 
Sent: Friday, 13 October 2023 10:16
To: user@drill.apache.org; muhl...@ntokoto.co.za
Subject: Re: ChannelClosedException

Going further, it looks like your entire Drillbit is crashing so this is 
something severe, perhaps an out-of-memory situation. The sqlline.log file 
should provide some clues.

On 2023/10/13 09:53, James Turton wrote:

The errors that reach the surface in the embedded Drill CLI do not
include all of the errors logged by Drill server processes. Please
look for the file sqlline.log which by default will be in the log/
subdirectory of your Drill installation...

On 2023/10/12 15:03, muhl...@ntokoto.co.za wrote:

Hi,
I am running from the drill console, The application I want to use it
on is QuerySurge. I am getting the same error message in both platforms.

Regards,
Muhluri

-Original Message-
From: James Turton 
Sent: Thursday, 12 October 2023 13:43
To: muhl...@ntokoto.co.za
Subject: Re: ChannelClosedException

Also, please look in the Drillbit logs. The query may be encountering
invalid data and failing when you remove your LIMIT clause. Are you
using DBeaver to submit your queries btw? That error looks familiar...

On 2023/10/10 15:58, Charles Givre wrote:

Hi Desmond,
Can you share a bit more?  What version of Drill are you running?
Java?  etc.   What data are you trying to query?
Best,
-- C




On Oct 10, 2023, at 2:02 AM, 
 wrote:

Hi Team,



When I run a Query for a limited number of records it returns the
results.
When I remove the limit on the Query it fails with below error. May
you please assist.





Exception: SQL_EXCEPTION java.sql.SQLException: CONNECTION ERROR:
Connection <-->  (user client) closed unexpectedly. Drillbit down?
[Error Id:
fba8461f-c896-4c56-b617-1625361e577d ]



Regards,

Muhluri












Re: ChannelClosedException

2023-10-13 Thread James Turton
Going further, it looks like your entire Drillbit is crashing so this is 
something severe, perhaps an out-of-memory situation. The sqlline.log 
file should provide some clues.


On 2023/10/13 09:53, James Turton wrote:
The errors that reach the surface in the embedded Drill CLI do not 
include all of the errors logged by Drill server processes. Please 
look for the file sqlline.log which by default will be in the log/ 
subdirectory of your Drill installation...


On 2023/10/12 15:03, muhl...@ntokoto.co.za wrote:

Hi,
I am running from the drill console, The application I want to use it 
on is QuerySurge. I am getting the same error message in both platforms.


Regards,
Muhluri

-Original Message-
From: James Turton 
Sent: Thursday, 12 October 2023 13:43
To: muhl...@ntokoto.co.za
Subject: Re: ChannelClosedException

Also, please look in the Drillbit logs. The query may be encountering 
invalid data and failing when you remove your LIMIT clause. Are you 
using DBeaver to submit your queries btw? That error looks familiar...


On 2023/10/10 15:58, Charles Givre wrote:

Hi Desmond,
Can you share a bit more?  What version of Drill are you running?  
Java?  etc.   What data are you trying to query?

Best,
-- C



On Oct 10, 2023, at 2:02 AM,  
 wrote:


Hi Team,



When I run a Query for a limited number of records it returns the 
results.

When I remove the limit on the Query it fails with below error. May
you please assist.





Exception: SQL_EXCEPTION java.sql.SQLException: CONNECTION ERROR:
Connection <-->  (user client) closed unexpectedly. Drillbit down? 
[Error Id:

fba8461f-c896-4c56-b617-1625361e577d ]



Regards,

Muhluri











Re: ChannelClosedException

2023-10-13 Thread James Turton
The errors that reach the surface in the embedded Drill CLI do not 
include all of the errors logged by Drill server processes. Please look 
for the file sqlline.log which by default will be in the log/ 
subdirectory of your Drill installation...


On 2023/10/12 15:03, muhl...@ntokoto.co.za wrote:

Hi,
I am running from the drill console, The application I want to use it on is 
QuerySurge. I am getting the same error message in both platforms.

Regards,
Muhluri

-Original Message-
From: James Turton 
Sent: Thursday, 12 October 2023 13:43
To: muhl...@ntokoto.co.za
Subject: Re: ChannelClosedException

Also, please look in the Drillbit logs. The query may be encountering invalid 
data and failing when you remove your LIMIT clause. Are you using DBeaver to 
submit your queries btw? That error looks familiar...

On 2023/10/10 15:58, Charles Givre wrote:

Hi Desmond,
Can you share a bit more?  What version of Drill are you running?  Java?  etc.  
 What data are you trying to query?
Best,
-- C




On Oct 10, 2023, at 2:02 AM,   
wrote:

Hi Team,



When I run a Query for a limited number of records it returns the results.
When I remove the limit on the Query it fails with below error. May
you please assist.





Exception: SQL_EXCEPTION java.sql.SQLException: CONNECTION ERROR:
Connection <-->  (user client) closed unexpectedly. Drillbit down? [Error Id:
fba8461f-c896-4c56-b617-1625361e577d ]



Regards,

Muhluri









Re: ChannelClosedException

2023-10-11 Thread James Turton
Also, please look in the Drillbit logs. The query may be encountering 
invalid data and failing when you remove your LIMIT clause. Are you 
using DBeaver to submit your queries btw? That error looks familiar...


On 2023/10/10 15:58, Charles Givre wrote:

Hi Desmond,
Can you share a bit more?  What version of Drill are you running?  Java?  etc.  
 What data are you trying to query?
Best,
-- C




On Oct 10, 2023, at 2:02 AM,   
wrote:

Hi Team,



When I run a Query for a limited number of records it returns the results.
When I remove the limit on the Query it fails with below error. May you
please assist.





Exception: SQL_EXCEPTION java.sql.SQLException: CONNECTION ERROR: Connection
<-->  (user client) closed unexpectedly. Drillbit down? [Error Id:
fba8461f-c896-4c56-b617-1625361e577d ]



Regards,

Muhluri







Re: column with default value not working for parquet files

2023-09-06 Thread James Turton
I don't think this feature is actually supported for Parquet files. How 
about defining a SQL view that includes a COALESCE(TRANAMT, 1.77) column 
expression?


On 2023/09/06 09:23, Prabhakar Bhosale wrote:

Hi Team,
I am trying to add a column with default into a parquet file by way of
defining a schema for the folder after enabling the metostore. I used the
following query to provide a schema. but the new column always returns null
value instead of giving default value. I am using drill 1.21.1

ANALYZE TABLE   table(dfs.tmp.`TESTCUST_1`
(type=>'parquet',schema=>'inline=(`TRANID`
VARCHAR,
 `CUST_ID` VARCHAR,
 `ACTID` VARCHAR,
 `TRANAMT` FLOAT NOT NULL properties {`DEFAULT` = `1.77`})')) REFRESH
METADATA;

REgards
Prabhakar





Re: table schema for parquet file is not working

2023-08-08 Thread James Turton
Okay I get the same result as you when I try with Drill 1.20.3 so I 
guess that there's a JSON reader bug that got fixed somewhere in between 
1.20.3 and 1.21.1. Do you need to stay on Drill 1.20 or can you upgrade 
to 1.21?


On 2023/08/07 08:32, Prabhakar Bhosale wrote:

hi James,
I re-tried the steps once again after looking at the output you shared with
me. But still I am not able to get the expected output on drill 1.20.1.
Then I downloaded drill 1.21.1 and tried the same steps. This time I got
the expected output.
So can you please try the same on drill 1.20.1 once if possible? thx

REgards
Prabhakar

On Wed, Aug 2, 2023 at 3:11 PM James Turton  wrote:


Hi!

I just got back from travelling. I ran a test and Drill did do what I
believe you're after.

Here's the test JSON file

➜  ~ cat /tmp/foo/bar.json
{"id":"T06125309","cust_id":"A20","num":"VAB6169028"}


And here's my Drill session.

apache drill> use dfs.tmp;
ok   true
summary  Default schema changed to [dfs.tmp]

apache drill (dfs.tmp)> select * from foo;
id   T06125309
cust_id  A20
num  VAB6169028

apache drill (dfs.tmp)> create schema (id varchar, cust_id varchar, num
varchar, tranamt double not null default '1.11'
) for table foo;
ok   true
summary  Created schema for [foo]

1 row selected (0.257 seconds)
apache drill (dfs.tmp)> select * from foo;
id   T06125309
cust_id  A20
num  VAB6169028
tranamt  1.11


On 2023/07/27 07:38, Prabhakar Bhosale wrote:

Hi James,
Any advice on the problem on schema for json file as mentioned in
my previous mail?

Regards
Prabhakar

On Mon, Jul 24, 2023 at 3:41 PM Prabhakar Bhosale  

wrote:


Hi James,

I tried the same on the JSON file and it is still not working.

Below is JSON file content

*{"id":"T06125309","cust_id":"A20","num":"VAB6169028"}*

The contents of the ".query.schema" file are as follows































*{  "table" : "mystore.`TEST_MOD`",  "schema" : {"type" :
"tuple_schema","columns" : [  {"name" : "ID","type"
: "VARCHAR","mode" : "OPTIONAL"  },  {"name" :
"CUST_ID","type" : "VARCHAR","mode" : "OPTIONAL"  },
   {"name" : "NUM","type" : "VARCHAR","mode" :
"OPTIONAL"  },  {"name" : "TRANAMT","type" :
"DOUBLE","mode" : "REQUIRED","properties" : {
"drill.default" : "1.11"}  }]  },  "version" : 1}*

When I fire the query below, I expect the value of TRANAMT column to be
*1.11* but it gives out value as *NULL*

*select  A.ID <http://A.ID> <http://A.ID>, A.CUST_ID, A.NUM, A.TRANAMT from
table(mystore.`TEST_MOD`(schema =>
'path=`/archived_files_nw/TEST_MOD/.drill.schema`')) A*

So essentially it is not considering the schema given at the query
execution time. Please let me know if I am doing anything incorrectly.


Thanks and Regards
Prabhakar


On Wed, Jul 12, 2023 at 4:33 PM James Turton  
 wrote:


Hi Prabhakar

  From what I recall, Drill won't consider a provided schema when
querying Parquet because Parquet files bundle their own schema. You
might need to use a SQL function like COALESCE(TRAN_AMOUNT, 1.11) and
possibly put that in a SQL view for reuse.

Regards
James

On 2023/07/11 18:40, Prabhakar Bhosale wrote:

Hi Team,
I am using drill 1.20.1 with parquet files.

I have two parquet files in a directory with one column missing in one
file. When I query the directory it gives me NULL values for all those

rows

which are from the file where that column is missing.

But I want a specific value for that column instead of NULL. So I
have created the schema as given below. But even after creating it is

still

returning the NULL value. Please let me know what is going wrong.

I have also ensured that storage.table.user_schema_file=true at system
level.

The files are stored on linux mount point.
The name of the missing column is "TRAN_AMOUNT".



The schema is as below

{
"table" : "archive.default.`executions`",
"schema" : {
  "type" : "tuple_schema",
  "columns" : [
{
  "name" : "EXEC_ID",
  "type" : "VARCHAR",
  "mode" : "OPTIONAL"
},
{
  "name" : "CUST_ID",
  "type" : "VARCHAR",
  "mode" : "OPTIONAL"
},
{
  "name" : "CELL_ID",
  "type" : "VARCHAR",
  "mode" : "OPTIONAL"
},
{
  "name" : "TRAN_AMOUNT",
  "type" : "FLOAT",
  "mode" : "REQUIRED",
  "properties" : {
"drill.default" : "1.11"
  }
}
  ]
},
"version" : 1
}








Re: table schema for parquet file is not working

2023-08-02 Thread James Turton

Hi!

I just got back from travelling. I ran a test and Drill did do what I 
believe you're after.


Here's the test JSON file

   |➜  ~ cat /tmp/foo/bar.json|||
   |||{"id":"T06125309","cust_id":"A20","num":"VAB6169028"}|||
   ||


And here's my Drill session.

   |apache drill> use dfs.tmp;|||
   |||ok   true|||
   |||summary  Default schema changed to [dfs.tmp]|||
   
   |||apache drill (dfs.tmp)> select * from foo;|||
   |||id   T06125309|||
   |||cust_id  A20|||
   |||num  VAB6169028|||
   
   |||apache drill (dfs.tmp)> create schema (id varchar, cust_id
   varchar, num varchar, tranamt double not null default '1.11'|||
   |||) for table foo;|||
   |||ok   true|||
   |||summary  Created schema for [foo]|||
   
   |||1 row selected (0.257 seconds)|||
   |||apache drill (dfs.tmp)> select * from foo;|||
   |||id   T06125309|||
   |||cust_id  A20|||
   |||num  VAB6169028|||
   |||tranamt  1.11|||
   ||


On 2023/07/27 07:38, Prabhakar Bhosale wrote:

Hi James,
Any advice on the problem on schema for json file as mentioned in
my previous mail?

Regards
Prabhakar

On Mon, Jul 24, 2023 at 3:41 PM Prabhakar Bhosale
wrote:


Hi James,

I tried the same on the JSON file and it is still not working.

Below is JSON file content

*{"id":"T06125309","cust_id":"A20","num":"VAB6169028"}*

The contents of the ".query.schema" file are as follows































*{  "table" : "mystore.`TEST_MOD`",  "schema" : {"type" :
"tuple_schema","columns" : [  {"name" : "ID","type"
: "VARCHAR","mode" : "OPTIONAL"  },  {"name" :
"CUST_ID","type" : "VARCHAR","mode" : "OPTIONAL"  },
   {"name" : "NUM","type" : "VARCHAR","mode" :
"OPTIONAL"  },  {"name" : "TRANAMT","type" :
"DOUBLE","mode" : "REQUIRED","properties" : {
"drill.default" : "1.11"}  }]  },  "version" : 1}*

When I fire the query below, I expect the value of TRANAMT column to be
*1.11* but it gives out value as *NULL*

*select  A.ID<http://A.ID>, A.CUST_ID, A.NUM, A.TRANAMT from
table(mystore.`TEST_MOD`(schema =>
'path=`/archived_files_nw/TEST_MOD/.drill.schema`')) A*

So essentially it is not considering the schema given at the query
execution time. Please let me know if I am doing anything incorrectly.


Thanks and Regards
Prabhakar


On Wed, Jul 12, 2023 at 4:33 PM James Turton  wrote:


Hi Prabhakar

  From what I recall, Drill won't consider a provided schema when
querying Parquet because Parquet files bundle their own schema. You
might need to use a SQL function like COALESCE(TRAN_AMOUNT, 1.11) and
possibly put that in a SQL view for reuse.

Regards
James

On 2023/07/11 18:40, Prabhakar Bhosale wrote:

Hi Team,
I am using drill 1.20.1 with parquet files.

I have two parquet files in a directory with one column missing in one
file. When I query the directory it gives me NULL values for all those

rows

which are from the file where that column is missing.

But I want a specific value for that column instead of NULL. So I
have created the schema as given below. But even after creating it is

still

returning the NULL value. Please let me know what is going wrong.

I have also ensured that storage.table.user_schema_file=true at system
level.

The files are stored on linux mount point.
The name of the missing column is "TRAN_AMOUNT".



The schema is as below

{
"table" : "archive.default.`executions`",
"schema" : {
  "type" : "tuple_schema",
  "columns" : [
{
  "name" : "EXEC_ID",
  "type" : "VARCHAR",
  "mode" : "OPTIONAL"
},
{
  "name" : "CUST_ID",
  "type" : "VARCHAR",
  "mode" : "OPTIONAL"
},
{
  "name" : "CELL_ID",
  "type" : "VARCHAR",
  "mode" : "OPTIONAL"
},
{
  "name" : "TRAN_AMOUNT",
  "type" : "FLOAT",
  "mode" : "REQUIRED",
  "properties" : {
"drill.default" : "1.11"
  }
}
  ]
},
"version" : 1
}





Re: table schema for parquet file is not working

2023-07-12 Thread James Turton

Hi Prabhakar

From what I recall, Drill won't consider a provided schema when 
querying Parquet because Parquet files bundle their own schema. You 
might need to use a SQL function like COALESCE(TRAN_AMOUNT, 1.11) and 
possibly put that in a SQL view for reuse.


Regards
James

On 2023/07/11 18:40, Prabhakar Bhosale wrote:

Hi Team,
I am using drill 1.20.1 with parquet files.

I have two parquet files in a directory with one column missing in one
file. When I query the directory it gives me NULL values for all those rows
which are from the file where that column is missing.

But I want a specific value for that column instead of NULL. So I
have created the schema as given below. But even after creating it is still
returning the NULL value. Please let me know what is going wrong.

I have also ensured that storage.table.user_schema_file=true at system
level.

The files are stored on linux mount point.
The name of the missing column is "TRAN_AMOUNT".



The schema is as below

{
   "table" : "archive.default.`executions`",
   "schema" : {
 "type" : "tuple_schema",
 "columns" : [
   {
 "name" : "EXEC_ID",
 "type" : "VARCHAR",
 "mode" : "OPTIONAL"
   },
   {
 "name" : "CUST_ID",
 "type" : "VARCHAR",
 "mode" : "OPTIONAL"
   },
   {
 "name" : "CELL_ID",
 "type" : "VARCHAR",
 "mode" : "OPTIONAL"
   },
   {
 "name" : "TRAN_AMOUNT",
 "type" : "FLOAT",
 "mode" : "REQUIRED",
 "properties" : {
   "drill.default" : "1.11"
 }
   }
 ]
   },
   "version" : 1
}





Re: Apache Drill - Unable to start Drill in distributed mode (In GCP Dataproc)

2023-05-30 Thread James Turton

Hi!

Please see my comment on your StackOverflow post. After looking at the 
provisioning scripts that you're using, I don't think you should be 
trying to start Drillbits manually.


Regards
James

On 2023/05/29 16:28, Vigneswaran S wrote:

Dear Apache Drill Team,

I am trying to run Apache Drill in distributed mode on Google Cloud 
Dataproc, but unable to start drillbit on each node in the cluster.


I have created a basic cluster (1 master, 2 worker) with GCP Dataproc 
service, using the initialization scripts and instructions provided in 
the Apache Drill website.


https://drill.apache.org/docs/installing-drill-in-distributed-mode-with-gcp-dataproc/ 



Apache Drill 1.19.0 and Apache Zookeeper 3.6.3 versions were 
configured in the setup script. The cluster provisioning in Dataproc 
was successful and I am able to connect with each node using SSH. When 
I tried to check the status of Zookeeper using telnet localhost 2181 
and entering stats, it is showing the following


zookeeper.png

Then, I try to start drillbit service on each node using the command 
bin/drillbit.sh start as mentioned here


https://drill.apache.org/docs/starting-drill-in-distributed-mode/

then it shows

Starting drillbit, logging to /opt/drill/log/drillbit.out

When I check the status of drill using bin/drillbit.sh status, it displays

/opt/drill/drillbit.pid file is present but drillbit is not running.

When I try to access Drill web UI public_ip_addr:8047 using public ip 
address of any node, it gives "can’t establish a connection to the 
server". So it is unclear whether drill is running or not. Note: I 
have opened port 8047 under firewall rules


Kindly provide help on how to resolve the issue and set up Apache 
Drill in distributed mode on GCP.


Regards,
Vigneswaran S
vigneswaran@gmail.com



Re: Apache drill query plan: cumulative cost

2023-05-04 Thread James Turton
The cost you've found is just an estimate produced by the planner and 
not a good number to performance tune with. I recommend reviewing the 
fragment and operator metrics written to the query profile after 
execution. If you are further able to isolate the query so that it's the 
only thing running then OS and JVM counters can also be recorded.


On 2023/05/04 11:09, Prabhakar Bhosale wrote:

Hi Team,
I am trying to understand what is the IO and network cost of the query? I
could find cumulative cost but was not able to interpret it.
Please advise or any documentation pointer will help. thx

REgards
Prabhakar





Re: [ANNOUNCE] Apache Drill 1.21.1 Released

2023-05-03 Thread James Turton
Thanks, that's right: we had to upgrade Calcite to fix a regression 
affecting date functions. Doing that brought in support for the QUALIFY 
clause in window functions.


On 2023/04/30 16:40, Charles Givre wrote:

Great work everyone!
One thing to note is that hidden in this bug fix release is the update to 
Calcite 1.34, which also includes bug fixes to the date functions as well as 
adding some new functionality to Drill!  James, I couldn't remember what new 
operators it added but would you mind please sending a note to the list?  I can 
work on updating the docs.
Best,
-- C




On Apr 30, 2023, at 6:37 AM, James Turton  wrote:

On behalf of the Apache Drill community, I am happy to announce the release of 
Apache Drill 1.21.1.

Drill is an Apache open-source SQL query engine for Big Data exploration.
Drill is designed from the ground up to support high-performance analysis
on the semi-structured and rapidly evolving data coming from modern Big
Data applications, while still providing the familiarity and ecosystem of
ANSI SQL, the industry-standard query language. Drill provides
plug-and-play integration with existing Apache Hive and Apache HBase
deployments.

For information about Apache Drill, and to get involved, visit the project 
website [1].

A total of 12 JIRA's are resolved in this bugfix release of Drill. For the full 
list please see release notes [2].

The binary and source artifacts are available here [3].

Thanks to everyone in the community who contributed to this release!

1. https://drill.apache.org/
2. https://drill.apache.org/docs/apache-drill-1-21-1-release-notes/
3. https://drill.apache.org/download/





[ANNOUNCE] Apache Drill 1.21.1 Released

2023-04-30 Thread James Turton
On behalf of the Apache Drill community, I am happy to announce the 
release of Apache Drill 1.21.1.


Drill is an Apache open-source SQL query engine for Big Data exploration.
Drill is designed from the ground up to support high-performance analysis
on the semi-structured and rapidly evolving data coming from modern Big
Data applications, while still providing the familiarity and ecosystem of
ANSI SQL, the industry-standard query language. Drill provides
plug-and-play integration with existing Apache Hive and Apache HBase
deployments.

For information about Apache Drill, and to get involved, visit the 
project website [1].


A total of 12 JIRA's are resolved in this bugfix release of Drill. For 
the full list please see release notes [2].


The binary and source artifacts are available here [3].

Thanks to everyone in the community who contributed to this release!

1. https://drill.apache.org/
2. https://drill.apache.org/docs/apache-drill-1-21-1-release-notes/
3. https://drill.apache.org/download/



[RESULT] [VOTE] Release Apache Drill 1.21.1 RC0

2023-04-28 Thread James Turton
The vote passes. Thanks to everyone who has tested the release candidate 
and given their comments and votes. Final tally:


3x +1 (binding): Charles, James, Vova

1x +1 (non-binding): Maksym

No 0s or -1s.

I'll complete the release this weekend.

Regards
James


Re: [VOTE] Release Apache Drill 1.21.1 - RC0

2023-04-28 Thread James Turton

Thanks Maksym, and good catch!

On 2023/04/28 15:38, Rumar, Maksym wrote:

Hello all,

Started Drill from the tarball (RC0) and ran different queries. LGTM +1

By the way, the link in the James mail leads to an incorrect RC version - RC1, 
but we are voting for RC0 now.

Regards,
Maksym

Від: Charles Givre 
Надіслано: 25 квітня 2023 р. 15:09
Кому: d...@drill.apache.org 
Копія: user@drill.apache.org 
Тема: Re: [VOTE] Release Apache Drill 1.21.1 - RC0

Hello all,
Built from source, ran various queries. LGTM +1 (Binding)
-- C




On Apr 19, 2023, at 5:26 AM, James Turton  wrote:

Hi all

I'd like to propose the first release candidate (RC0) of Apache Drill, version 
1.21.1.

The release candidate covers a total of 12 resolved Jira issues [1]. Thanks to 
everyone who contributed to this release.

The tarball artefacts are hosted at [2] and the Maven artefacts are hosted at 
[3].

This release candidate is based on commit 
6da4b77ceb1510df4de4a2cfcc43a536b99f37ab located at [4].

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12352949
[2] https://dist.apache.org/repos/dist/dev/drill/1.21.1-rc1/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1105/
[4] https://github.com/jnturton/drill/commits/drill-1.21.1






[VOTE] Release Apache Drill 1.21.1 - RC0

2023-04-19 Thread James Turton

Hi all

I'd like to propose the first release candidate (RC0) of Apache Drill, 
version 1.21.1.


The release candidate covers a total of 12 resolved Jira issues [1]. 
Thanks to everyone who contributed to this release.


The tarball artefacts are hosted at [2] and the Maven artefacts are 
hosted at [3].


This release candidate is based on commit 
6da4b77ceb1510df4de4a2cfcc43a536b99f37ab located at [4].


[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12352949

[2] https://dist.apache.org/repos/dist/dev/drill/1.21.1-rc1/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1105/
[4] https://github.com/jnturton/drill/commits/drill-1.21.1


Re: Improve performance for "PARQUET_ROW_GROUP_SCAN"

2023-03-22 Thread James Turton
To increase the minor fragment count set the option 
planner.cpu_load_average. You can also increase the number of concurrent 
Parquet reader threads using store.parquet.reader.columnreader.async.


However, since your tests with faster compression codecs showed no 
improvements I think that your query is probably memory bandwidth bound, 
a common state of affairs for a single node cluster. To add more memory 
bandwidth to your cluster you'll need to scale horizontally e.g. 2x 16Gb 
Drillbits instead of 1x 32Gb Drillbit.


What do you see for TIME_DISK_SCAN? If that's also small then

On 2023/03/18 06:36, Prabhakar Bhosale wrote:

Hi James,
Thanks for your detail guidance. Please see my findings below

*You wrote*: GZip compresses very well but uses a lot of CPU during 
compression and

decompression. Try running a test with store.parquet.compression =
'zstd' (introduced in Drill 1.20.0). You can use CTAS statements in
Drill to create Parquet files compressed with Zstandard.
*Me*: I tried both lz4 and zstd also, but none of them seems to be 
giving any better results. Lz4 give some improvement but not considerable.
In Operator metrics, lz4 is faster in decompression but that time is 
nullified by time_load_datepage and time_to_decode_datapage

For zstd the decompression time is same as that of gzip
On CPU utilization - querying to all 3 types of compressed files 
almost utilizes similar CPU,


*You Wrote:* If some columns or row groups need not be scanned, ensure 
that they are

being excluded by the query.
*Me: *Yes, this I had already tried and it improved the performance 
considerably. I had to sort the data while creating parquet files


*You Wrote: *Ensure that your Parquet files have been partitioned to a 
suitable size,

normally somewhere between 250 and 1000Mb.
*Me: *No changes made to drill defaults

*You Wrote:* For some data, setting store.parquet.use_new_reader = 
false will be

significantly faster.
*Me: *I am using drill 1.20.1, in this version, for this options it is 
written that "NOt supported in this version" and the value is false. I 
tried after making it true and the query could not complete even after 
3 times of duration taken for gzip. so I think, this is not useful for 
my data.


*You Wrote: *If profiling the Drillbits doing the scans reveals that 
they are waiting

for data due to limited I/O throughput then consider faster storage.
E.g. Data locality in HDFS can be exploited by Drill to achieve higher
throughput.
*Me: *Operator Metrics "TIME_DISK_SCAN_WAIT" is less than 0.2 sec, so 
i don't think disk I/O bottleneck here.


*My additional observations are*
1. The operator metrics "TIME_VARCOLUMN_READ" is taking 19+ seconds as 
most of the columns query reads are of VARCHAR. is there any way to 
improve upon this?
2. the "numfiles" reported in physical plan is different for all 3 
compression for same exact data and same query. The numfiles for 
gzip-109, lz4 - 161 and zstd-152. I was expecting this should be same 
for all 3 compression formats. Same is the case with NUM_ROWGROUPS 
operator
3. The minor fragments created under "PARQUET_ROW_GROUP_SCAN" are 6. I 
assume these are the number of parallel threads created to select 
data. Is there any setting that can allow me to create more minor 
fragments for this operator?


Thanks for reading this long email.

Regards
Prabhakar



On Mon, Mar 13, 2023 at 7:44 PM James Turton  wrote:

GZip compresses very well but uses a lot of CPU during compression
and
decompression. Try running a test with store.parquet.compression =
'zstd' (introduced in Drill 1.20.0). You can use CTAS statements in
Drill to create Parquet files compressed with Zstandard.

If some columns or row groups need not be scanned, ensure that
they are
being excluded by the query.

Ensure that your Parquet files have been partitioned to a suitable
size,
normally somewhere between 250 and 1000Mb.

For some data, setting store.parquet.use_new_reader = false will be
significantly faster.

If profiling the Drillbits doing the scans reveals that they are
waiting
for data due to limited I/O throughput then consider faster storage.
E.g. Data locality in HDFS can be exploited by Drill to achieve
higher
throughput.


On 2023/03/07 08:17, Prabhakar Bhosale wrote:
> hi team,
> I have compressed (gzip) parquet files created with apache
drill. the total
> folder size is 7.8gb and the number of rows are 116,249,263. the
query
> takes 2min 18sec.
> Most of the time is spent on "PARQUET_ROW_GROUP_SCAN". Is there
any way to
> improve this performance?
> i am using
> Drill - 1.20
> CPU - 8 core
> mem - 16gb
>
> I also tried increasing memory to 32GB but no much difference. I
also tried
> certain recommendat

Re: Improve performance for "PARQUET_ROW_GROUP_SCAN"

2023-03-13 Thread James Turton
GZip compresses very well but uses a lot of CPU during compression and 
decompression. Try running a test with store.parquet.compression = 
'zstd' (introduced in Drill 1.20.0). You can use CTAS statements in 
Drill to create Parquet files compressed with Zstandard.


If some columns or row groups need not be scanned, ensure that they are 
being excluded by the query.


Ensure that your Parquet files have been partitioned to a suitable size, 
normally somewhere between 250 and 1000Mb.


For some data, setting store.parquet.use_new_reader = false will be 
significantly faster.


If profiling the Drillbits doing the scans reveals that they are waiting 
for data due to limited I/O throughput then consider faster storage. 
E.g. Data locality in HDFS can be exploited by Drill to achieve higher 
throughput.



On 2023/03/07 08:17, Prabhakar Bhosale wrote:

hi team,
I have compressed (gzip) parquet files created with apache drill. the total
folder size is 7.8gb and the number of rows are 116,249,263. the query
takes 2min 18sec.
Most of the time is spent on "PARQUET_ROW_GROUP_SCAN". Is there any way to
improve this performance?
i am using
Drill - 1.20
CPU - 8 core
mem - 16gb

I also tried increasing memory to 32GB but no much difference. I also tried
certain recommendations given in drill documentation but with no success.

Any pointer/help is highly appreciated. thx

REgards
Prabhakar





[ANNOUNCE] Apache Drill 1.21.0 Released

2023-02-21 Thread James Turton
On behalf of the Apache Drill community, I am happy to announce the 
release of Apache Drill 1.21.0.


Drill is an Apache open-source SQL query engine for Big Data exploration.
Drill is designed from the ground up to support high-performance analysis
on the semi-structured and rapidly evolving data coming from modern Big
Data applications, while still providing the familiarity and ecosystem of
ANSI SQL, the industry-standard query language. Drill provides
plug-and-play integration with existing Apache Hive and Apache HBase
deployments.

For information about Apache Drill, and to get involved, visit the 
project website [1].


Total of 110 JIRAs are resolved in this release of Drill with following
new features and improvements [2]:

 * A major upgrade of the parsing and planning library Calcite from
   1.21 to 1.33 enabled by the elimination of Drill’s fork of Calcite.
 * Upgrades of most format plugins to the internal EVF2 reader
   framework included support for provided schemas.
 * A new native Drill storage plugin enabling “Drill-on-Drill”
   federated deployments.
 * INSERT support, currently in the JDBC, Splunk and Google Sheets plugins.
 * New SQL syntax including filtered aggregates, PIVOT, UNPIVOT,
   INTERSECT and EXCEPT.
 * Support for new authentication modes in storage plugins including
   user translation for using different external credentials for
   different Drill users.
 * An overhaul of the implicit type casting logic for a more consistent
   user experience.
 * New functions and storage plugins including Delta Lake, Google
   Sheets, MS Access, threat hunting functions and statistical
   distribution functions.


For the full list please see release notes [3].

The binary and source artifacts are available here [4].

Thanks to everyone in the community who contributed to this release!

1. https://drill.apache.org/
2. https://drill.apache.org/blog/2023/02/21/drill-1.21.0-released/
3. https://drill.apache.org/docs/apache-drill-1-21-0-release-notes/
4. https://drill.apache.org/download/

[RESULT] [VOTE] Release Apache Drill 1.21.0 RC0

2023-02-20 Thread James Turton
The vote passes. Thanks to everyone who has tested the release candidate 
and given their comments and votes. Final tally:


3x +1 (binding): Charles, James, Vova

1x +1 (non-binding): Jingchuan

No 0s or -1s.

I got a bit delayed preparing needed new docs for 1.21 by some sort of 
cold or flu that went through my family, but as soon as I've caught up 
I'll start process for pushing the release artifacts and send an 
announcement once propagated. 1.21 RC0, which will be promoted to 1.21 
without modification, remains available for testing and experimentation.


Kind regards
James


[VOTE] Release Apache Drill 1.21.0 - RC0

2023-02-15 Thread James Turton

Hi all,

I'd like to propose the first release candidate (RC0) of Apache Drill, 
version 1.21.0.


The release candidate covers a total of 106 resolved JIRAs [1]. Thanks 
to everyone who contributed to this release.


The tarball artifacts are hosted at [2] and the maven artifacts are 
hosted at [3].


This release candidate is based on commit 
ff86c29b125776c56f361fc9187fb96458de37a6 located at [4].


Please download and try out the release, especially using JDK 8 because 
this release was built with JDK 17.


The vote ends at Sat 18 Feb 2023 12:00:00 UTC.

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12335462

[2] https://github.com/jnturton/drill/releases/tag/drill-1.21.0
[3] https://repository.apache.org/content/repositories/orgapachedrill-1104/
[4] https://github.com/jnturton/drill/commits/drill-1.21.0


Re: Advice execution plan

2023-01-31 Thread James Turton
If you frequently need to operate on a particular subset of records that 
is much smaller than the total dataset (such as might be selected by 
ORG_ID = '01' and CUST_CUS_TYPE = 'DD') and that is weakly correlated 
with your partitioning scheme then you should consider a storage back 
end that supports the creation of secondary indexes. Parquet itself 
doesn't, but you do have the option of maintaining a derivative Parquet 
dataset that is already sorted and perhaps even already aggregated by 
the columns of interest. Alternatively, take a look at the secondary 
indexes provided by systems like Apache Phoenix[1]. Drill has strong 
support for pushing SQL down to Phoenix.


1. https://phoenix.apache.org/secondary_indexing.html

On 2023/01/28 07:52, Prabhakar Bhosale wrote:

Dear Team,
Any advice?


Regards
Prabhakar

On Thu, Jan 26, 2023 at 5:28 PM Prabhakar Bhosale 
wrote:


Dear Team,

I am querying around 250 parquet files with a total size 23gb . total
records are 140 million.
The query is taking around 3.5 minutes to retrieve certain records. From
the execution plan it is very obvious that the cost is very high.
Any advice on performance improvement is highly appreciated. Below is the
query execution plan from profile. Please guide

Drill -.1.20.1
OS - RHEL

{
   "head" : {
 "version" : 1,
 "generator" : {
   "type" : "ExplainHandler",
   "info" : ""
 },
 "type" : "APACHE_DRILL_PHYSICAL",
 "options" : [ {
   "kind" : "LONG",
   "accessibleScopes" : "ALL",
   "name" : "exec.query.max_rows",
   "num_val" : 0,
   "scope" : "QUERY"
 } ],
 "queue" : 0,
 "hasResourcePlan" : false,
 "resultMode" : "EXEC"
   },
   "graph" : [ {
 "pop" : "jdbc-scan",
 "@id" : 327683,
 "sql" : "SELECT \"CUSTCODE\", \"CUST_DEC\"\nFROM
\"CUSTD\".\"CUST_CODE_T\"\nWHERE \"CUST_CUS_TYPE\" = 'DD' AND \"ORG_ID\" =
'01' AND (\"DEL_FLG\" <> 'Y' AND \"CUSTCODE\" IS NOT NULL)",
 "columns" : [ "`CUSTCODE`", "`CUST_DEC`" ],
 "config" : {
   "type" : "jdbc",
   "driver" : "oracle.jdbc.OracleDriver",
   "url" : "jdbc:oracle:thin:X_X",
   "caseInsensitiveTableNames" : true,
   "writerBatchSize" : 1,
   "enabled" : true
 },
 "userName" : "",
 "cost" : {
   "memoryCost" : 1.6777216E7,
   "outputRowCount" : 1.0125
 }
   }, {
 "pop" : "external-sort",
 "@id" : 327682,
 "child" : 327683,
 "orderings" : [ {
   "order" : "ASC",
   "expr" : "`CUSTCODE`",
   "nullDirection" : "LAST"
 } ],
 "reverse" : false,
 "initialAllocation" : 2000,
 "maxAllocation" : 100,
 "cost" : {
   "memoryCost" : 16.2,
   "outputRowCount" : 1.0125
 }
   }, {
 "pop" : "streaming-aggregate",
 "@id" : 327681,
 "child" : 327682,
 "keys" : [ {
   "ref" : "`CUSTCODE`",
   "expr" : "`CUSTCODE`"
 } ],
 "exprs" : [ {
   "ref" : "`$f1`",
   "expr" : "single_value(`CUST_DEC`) "
 } ],
 "initialAllocation" : 100,
 "maxAllocation" : 100,
 "cost" : {
   "memoryCost" : 1.6777216E7,
   "outputRowCount" : 0.50625
 }
   }, {
 "pop" : "broadcast-exchange",
 "@id" : 196611,
 "child" : 327681,
 "initialAllocation" : 100,
 "maxAllocation" : 100,
 "cost" : {
   "memoryCost" : 1.6777216E7,
   "outputRowCount" : 1.0
 }
   },
   {
 "pop" : "parquet-scan",
 "@id" : 196618,
 "userName" : "R",
 "entries" : [ {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~122/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~75/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~300~2019~JULY~118/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~85/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~6/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~300~2019~JULY~27/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~56/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~29/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~80/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~300~2019~JULY~50/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~50/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~300~2019~JULY~2/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_files/UAT/JOB_TABLE/2019/7/JOB_TABLE~256~270~2019~JULY~92/0_0_0.parquet"
 }, {
   "path" :
"/llog/log_fi

[ANNOUNCE] Apache Drill 1.20.3 Released

2023-01-07 Thread James Turton
On behalf of the Apache Drill community, I am happy to announce the 
release of Apache Drill 1.20.3.


Drill is an Apache open-source SQL query engine for Big Data exploration.
Drill is designed from the ground up to support high-performance analysis
on the semi-structured and rapidly evolving data coming from modern Big
Data applications, while still providing the familiarity and ecosystem of
ANSI SQL, the industry-standard query language. Drill provides
plug-and-play integration with existing Apache Hive and Apache HBase
deployments.

For information about Apache Drill, and to get involved, visit the 
project website [1].


A total of 30 JIRA's are resolved in this bugfix release of Drill. For 
the full list please see release notes [2].


The binary and source artifacts are available here [3].

Thanks to everyone in the community who contributed to this release!

1. https://drill.apache.org/
2. https://drill.apache.org/docs/apache-drill-1-20-3-release-notes/
3. https://drill.apache.org/download/



[RESULT] [VOTE] Release Apache Drill 1.20.3 RC0

2023-01-07 Thread James Turton
The vote passes. Thanks to everyone who has tested the release candidate 
and given their comments and votes. Final tally:


3x +1 (binding): Charles, James, Vova

1x +1 (non-binding): Jingchuan

No 0s or -1s.

I'll start process for pushing the release artifacts and send an 
announcement once propagated.


Kind regards
James


Re: About a "DRILL-5033" fix

2023-01-02 Thread James Turton

Hi Marc

Pushing to the main repo is locked down and the procedure to follow is 
to fork the main repo to your own GitHub account, then push to your 
fork, then open a PR from your fork to the main repo.


On 12/29/22 14:39, marc nicole wrote:

Hi,

Thanks for your reply,
I actually want to submit my changes, but I am being denied to push any
changes to the Drill repo. How to do the pull request in Git ? Are there
any permissions required to get beforehand pushing to the repo ?


Le mer. 28 déc. 2022 à 15:46, Charles Givre  a écrit :


Hi Marc,
Thanks for this.  Here's the thing... Let's say you have json that looks
like this:

{
 "foo":null
},{
 "foo": 3.5
}

If you take the approach that `null` is treated like a string, you will
get a schema change exception when you read the next row.  Our current
approach is to basically ignore fields that Drill cannot figure out what
they are in terns of data type.  Once Drill encounters a data type, it will
then assign a data type to that column.  See the example below which is
from DRILL-5033.  I added a second row to demonstrate what happens once
Drill is able to determine a data type.  Note that for the columns with a
defined value in the second row, Drill returns 'null' as the value.


[{
"intKey" : null,
"bgintKey": null,
"strKey": null,
"boolKey": null,
"fltKey": null,
"dblKey": null,
"timKey": null,
"dtKey": null,
"tmstmpKey": null,
"intrvldyKey": null,
"intrvlyrKey": null
},
{
"intKey" : 1,
"bgintKey": 3666565464,
"strKey": "hithere",
"boolKey": true,
"fltKey": 3.5,
"dblKey": 4.2,
"timKey": null,
"dtKey": null,
"tmstmpKey": null,
"intrvldyKey": null,
"intrvlyrKey": null
}]


select * from dfs.test.`nulls.json`;

++---+-+-++++---+---+-+-+
| intKey |   bgintKey| strKey  | boolKey | fltKey | dblKey | timKey |
dtKey | tmstmpKey | intrvldyKey | intrvlyrKey |

++---+-+-++++---+---+-+-+
| null   | null  | null| null| null   | null   | [] |
[]| []| []  | []  |
| 1.0| 3.666565464E9 | hithere | true| 3.5| 4.2| [] |
[]| []| []  | []  |

++---+-+-++++---+---+-+-+
2 rows selected (0.232 seconds)

You are definitely welcome to submit a pull request, however this area is
extremely complex, and I'd suspect that what you propose will break other
unit tests.  Another option which you might not be aware of is providing a
schema.  If you do that from the beginning, then Drill will know what data
types to expect.

Best,
-- C



On Dec 28, 2022, at 8:57 AM, marc nicole  wrote:

Hello Drillers :)

I came across the aforementioned bug (DRILL-5033) and wanted to

contribute.

My attempt is to consider a *null *token as a *string *and print the

"null"

as the column value instead of omitting the key in the output
resultset, details
of the fix attempt is below:


*1)* In JsonReader.java (java-exec/drill-exec/vector/complex/fn/) at line
283 i add the following:


...
case VALUE_NULL:
  // handle null as string
  handleString(parser, map, fieldName);
  break;
...


*2)* then at line 415 the handleString() becomes:

private void handleString(JsonParser parser, MapWriter writer, String

fieldName) throws IOException {
try {
 // added the following if
  if (parser.nextToken() == VALUE_NULL)
writer.varChar(fieldName)
  .writeVarChar(0, workingBuffer.prepareVarCharHolder("null"),
workingBuffer.getBuf());
  else
  writer.varChar(fieldName)
  .writeVarChar(0,
workingBuffer.prepareVarCharHolder(parser.getText()),
workingBuffer.getBuf());
} catch (IllegalArgumentException e) {
  if (parser.getText() == null || parser.getText().isEmpty()) {
   // return;
  }
  throw e;
}
  }



Is this a possible fix to the mentioned bug?
If yes should i pull request ?

Thanks.






Re: JDBC driver compliance

2023-01-02 Thread James Turton

Hi Prabhakar

You can check the JDBC version that a driver supports using the methods 
getMetaData().getJDBCMajorVersion() and 
getMetaData().getJDBCMinorVersion(). Doing this with Drill driver that 
will show you that it supports version 4.1 of the specification. Note 
that parts of the JDBC spec are optional and Drill only supports those 
of them that are relevant for it.


Regards
James

On 1/2/23 08:59, Prabhakar Bhosale wrote:

Dear Team,
Does Apache drill JDBC driver comply with JDBC 4.0 specification?

Regards
Prabhakar





Re: [VOTE] Release Apache Drill 1.20.3 - RC0

2022-12-27 Thread James Turton

Herewith an attempt to unmangle the formatting of my original email :(.

Hi all,

I'd like to propose the first release candidate (RC0) of Apache Drill, 
version 1.20.3. This is very likely to
be the final update to the 1.20.x series given that 1.21 is expected in 
the near future.


The release candidate covers a total of 30 resolved JIRAs [1]. Thanks to 
everyone who contributed to this release

and to Jingchuan Hu for some nontrivial backporting of fixes.

The tarball artifacts are hosted at [2] and the maven artifacts are 
hosted at [3]. The unit test run based on the

final merge into the 1.20 branch may be viewed here [4].

This release candidate is based on commit 
f49d553d71db553fda2fc330f9a3f05d7dcdd17f located at [5].


Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12352165

[2] https://dist.apache.org/repos/dist/dev/drill/1.20.3-rc0
[3] https://repository.apache.org/content/repositories/orgapachedrill-1103/
[4] https://github.com/apache/drill/actions/runs/3780949700
[5] https://github.com/jnturton/drill/releases/tag/drill-1.20.3

Season's greetings
James


[VOTE] Release Apache Drill 1.20.3 - RC0

2022-12-27 Thread James Turton
|Hi all, I'd like to propose the first release candidate (RC0) of Apache 
Drill, version 1.20.3. This is very likely to be the final update to the 
1.20.x series given that 1.21 is expected in the near future. The 
release candidate covers a total of 30 resolved JIRAs [1]. Thanks to 
everyone who contributed to this release and to Jingchuan Hu for some 
nontrivial backporting of fixes. The tarball artifacts are hosted at [2] 
and the maven artifacts are hosted at [3]. The unit test run based on 
the final merge into the 1.20 branch may be viewed here [4]. This 
release candidate is based on commit ||f49d553d71db553fda2fc330f9a3f05d7dcdd17f located at [5]. Please 
download and try out the release. [ ] +1 [ ] +0 [ ] -1 Here's my vote: 
+1 [1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12352165 
[2] https://dist.apache.org/repos/dist/dev/drill/1.20.3-rc0 [3] 
https://repository.apache.org/content/repositories/orgapachedrill-1103/ 
[4] https://github.com/apache/drill/actions/runs/3780949700 [5] 
https://github.com/jnturton/drill/releases/tag/drill-1.20.3 Season's 
greetings James |


Re: How to query a Json file without using JDBC connector?

2022-11-27 Thread James Turton
The first hop in the sequence Java App -> Drill -> JSON File is your 
application communicating with Drill using the Drill JDBC driver and 
standard JDBC API calls like createStatement() and execute() to send SQL 
queries to Drill. Those queries will be run over JSON files if that's 
what you've put in the FROM clause or, for other FROM clauses, they will 
be run over external RDBMSes or CSV files or Kafka etc.


It does not matter if you have no relational database, Drill presents a 
relational view of files, e.g. JSON, via SQL.


https://drill.apache.org/docs/using-the-jdbc-driver/#example-of-connecting-to-drill-programmatically

On 2022/11/27 11:32, marc nicole wrote:

*"Drill only runs queries written in SQL."*

I know and i want to use an SQL written query and apply it on a JSON file
all using Java syntax.

*" You can send that SQL from your Java application to Drill using JDBC or
Drill's REST API"*

How to do this using Java? what is the required code ? executeQuery on a
Statement using a Connection won't work since i don't use a database to go
in the direction of JDBC but instead use a simple JSON file

Le sam. 26 nov. 2022 à 10:51, marc nicole  a écrit :


Hi,
Thanks,

*"Drill only runs queries written in SQL."*

I know and i want to use an SQL written query and apply it on a JSON file
all using Java syntax.

*" You can send that SQL from your Java application to Drill using JDBC or
Drill's REST API"*

How to do this using Java? what is the required code ? executeQuery on a
Statement using a Connection won't work since i don't use a database to go
in the direction of JDBC but instead use a simple JSON file

Le ven. 25 nov. 2022 à 09:25, James Turton  a écrit :


my data files could get big. Is Drill Spark integration a solution in that
case?

Drill remains a solution if your data gets big because it scales
horizontally like Spark. You will have to replace the Windows Desktop
folder with some scalable, network enabled storage, however, irrespective
of which query engine you choose. Neither Drill nor Spark provide a storage
layer themselves but compatible options include HDFS and S3.

After setting the workspace to query the file system, how to execute such
query in Java syntax?

Drill only runs queries written in SQL. You can send that SQL from your
Java application to Drill using JDBC or Drill's REST API. If you prefer to
generate the SQL from object oriented Java expressions, take a look at
jOOQ <https://www.jooq.org/>. There might be a little dialect work
required to make jOOQ fully compatible with Drill but (a) we'd be prepared
to help you with that and (b) Drill's SQL dialect is by and large vanilla
ANSI SQL:2003.

Regards
James

On 2022/11/25 09:54, marc nicole wrote:

Hi,

After setting the workspace to query the file system, how to execute such
query in Java syntax?

Le ven. 25 nov. 2022 à 02:25, Charles Givre  
 a écrit :


Hi Marc,
I should have asked, are you running Drill on a single windows machine?
If so, Drill will be able to query anything you throw at it.  If your data
starts to get bigger than a single machine can handle, you'll need to set
up a Drill cluster with multiple nodes.  This is no different than Spark. I
would suggest using Drill to convert the data to parquet format.  Often you
can achieve a 10x reduction in file size and extreme improvements in query
speed.

As for configuring Drill, take a look 
here:https://drill.apache.org/docs/workspaces/.   This explains how to set up
a workspace. What you'll want to do is set the workspace to the path to
your desktop.   Then you can query the files as noted below.
Best,
-- C






On Nov 24, 2022, at 6:05 PM, marc nicole  
 wrote:

also how to execute such queries as  SELECT *
FROM dfs.desktop.`file.json` in Java ?

Le jeu. 24 nov. 2022 à 23:31, Charles Givre  
 a écrit :


Hi Marc,
Welcome to Drill!  Firstly, take a look at the docs for querying a file
system:

https://drill.apache.org/docs/querying-a-file-system-introduction/

When you start up drill out of the box, there is a connector called dfs
which points to the local filesystem.  You can configure a workspace to
your desktop folder, then all you have to do is write a query like:

SELECT *
FROM dfs.desktop.`file.json`

If you're looking to do this programmatically from Java and your data
isn't too big, the easiest way is probably to use Drill's REST API 
(https://drill.apache.org/docs/rest-api-introduction/).  You can make a
simple HtTP call to Drill and get the data that way.

Hope this helps!
-- C




On Nov 24, 2022, at 5:02 PM, marc nicole  
 wrote:

Hi,

I want to query a JSON file placed in Desktop folder (Windows).
How to do that in Java ?

PS: i saw this type of code :

Connection con = null;

 con = new Driver().connect(DRILL_JDBC_LOCAL_URI,

getDefaultProperties());

 Statement stmt = con.createStatement();
 ResultSet rs

Re: How to query a Json file without using JDBC connector?

2022-11-25 Thread James Turton

my data files could get big. Is Drill Spark integration a solution in that
case?
Drill remains a solution if your data gets big because it scales 
horizontally like Spark. You will have to replace the Windows Desktop 
folder with some scalable, network enabled storage, however, 
irrespective of which query engine you choose. Neither Drill nor Spark 
provide a storage layer themselves but compatible options include HDFS 
and S3.



After setting the workspace to query the file system, how to execute such
query in Java syntax?
Drill only runs queries written in SQL. You can send that SQL from your 
Java application to Drill using JDBC or Drill's REST API. If you prefer 
to generate the SQL from object oriented Java expressions, take a look 
at jOOQ . There might be a little dialect work 
required to make jOOQ fully compatible with Drill but (a) we'd be 
prepared to help you with that and (b) Drill's SQL dialect is by and 
large vanilla ANSI SQL:2003.


Regards
James

On 2022/11/25 09:54, marc nicole wrote:

Hi,

After setting the workspace to query the file system, how to execute such
query in Java syntax?

Le ven. 25 nov. 2022 à 02:25, Charles Givre  a écrit :


Hi Marc,
I should have asked, are you running Drill on a single windows machine?
If so, Drill will be able to query anything you throw at it.  If your data
starts to get bigger than a single machine can handle, you'll need to set
up a Drill cluster with multiple nodes.  This is no different than Spark. I
would suggest using Drill to convert the data to parquet format.  Often you
can achieve a 10x reduction in file size and extreme improvements in query
speed.

As for configuring Drill, take a look here:
https://drill.apache.org/docs/workspaces/.   This explains how to set up
a workspace. What you'll want to do is set the workspace to the path to
your desktop.   Then you can query the files as noted below.
Best,
-- C






On Nov 24, 2022, at 6:05 PM, marc nicole  wrote:

also how to execute such queries as  SELECT *
FROM dfs.desktop.`file.json` in Java ?

Le jeu. 24 nov. 2022 à 23:31, Charles Givre  a écrit :


Hi Marc,
Welcome to Drill!  Firstly, take a look at the docs for querying a file
system:

https://drill.apache.org/docs/querying-a-file-system-introduction/

When you start up drill out of the box, there is a connector called dfs
which points to the local filesystem.  You can configure a workspace to
your desktop folder, then all you have to do is write a query like:

SELECT *
FROM dfs.desktop.`file.json`

If you're looking to do this programmatically from Java and your data
isn't too big, the easiest way is probably to use Drill's REST API (
https://drill.apache.org/docs/rest-api-introduction/).  You can make a
simple HtTP call to Drill and get the data that way.

Hope this helps!
-- C




On Nov 24, 2022, at 5:02 PM, marc nicole  wrote:

Hi,

I want to query a JSON file placed in Desktop folder (Windows).
How to do that in Java ?

PS: i saw this type of code :

Connection con = null;

 con = new Driver().connect(DRILL_JDBC_LOCAL_URI,

getDefaultProperties());

 Statement stmt = con.createStatement();
 ResultSet rs = stmt.executeQuery(DRILL_SAMPLE_QUERY);...


But that requires using JDBC and to place JSON in jar file within CP of
Drill which i don't want;

Thanks.






Re: drill start failed with older zk

2022-11-04 Thread James Turton

Can you try ZooKeeper >= 3.5.7?

On 2022/11/04 01:52, june wrote:

Hi, I tried to start drill-1.19 with zk-3.4.5, I replace jar with zk-3.4.5 as 
official doc said, but still failed. The error is above:

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/zookeeper/admin/ZooKeeperAdmin
 at 
org.apache.curator.framework.CuratorFrameworkFactory.(CuratorFrameworkFactory.java:65)
 at 
org.apache.drill.exec.coord.zk.ZKClusterCoordinator.(ZKClusterCoordinator.java:109)
 at 
org.apache.drill.exec.coord.zk.ZKClusterCoordinator.(ZKClusterCoordinator.java:86)
 at org.apache.drill.exec.server.Drillbit.(Drillbit.java:184)
 at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:574)
 at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:554)
 at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:550)
Caused by: java.lang.ClassNotFoundException: 
org.apache.zookeeper.admin.ZooKeeperAdmin
 at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 ... 7 more

Anyone could help me out?






Re: Error while implementing order By drill

2022-10-08 Thread James Turton
To confirm, if you remove the ORDER BY clause from this query it runs 
successfully?


On 2022/10/08 14:02, jagadeesh maddi wrote:

Hi James,

I am attaching the json profile

Please can you look into this

Waiting for your suggestion.

Thanks and regards
Jagadeesh maddi

On Sat, 8 Oct 2022 at 11:03 AM, James Turton <mailto:dz...@apache.org>> wrote:


Please send the JSON query profile.

On 2022/10/06 18:46, jagadeesh maddi wrote:
 > Hi James,
 >
 > We are in production. We need your help
 >
 > Please provide your suggestion to solve this problem
 >
 > Thanks and regards
 > Jagadeesh maddi
 >
 > On Thu, 6 Oct 2022 at 1:15 AM, jagadeesh maddi
mailto:jagadeesh.m...@gmail.com>
 > <mailto:jagadeesh.m...@gmail.com
<mailto:jagadeesh.m...@gmail.com>>> wrote:
 >
 >     HI james,
 >
 >     Error: error i am geting
 >
 >     [30038]Query execution error. Details:[ SYSTEM ERROR:
 >     NullPointerException Fragment: 3:0 Please, refer to logs for more
 >     information. [Error Id: ba482a08-ffab-4069-a094-9fb82cbe0955 on
 >     blp14571311:31010]
 >     (com.fasterxml.jackson.databind.exc.ValueInstantiationException)
 >     Cannot construct instance of
 >     `org.apache.drill.exec.store.druid.DruidSubScan`, problem:
 >     `java.lang.NullPointerException` at [Source: (String)"{ "pop" :
 >     "hash-partition-sender", "@id" : 0, "receiver-major-fragment"
: 2,
 >     "child" : { "pop" : "druid-datasource-scan", "@id" : 1,
"columns" :
 >     [ "`__time`", "`COUNTER_GROUP_ID`", "C_1`", "`C_10`", "`C_11`",
 >     "`C_12`" ], "maxRecordsToRead" : -1, "initialAlloca...
 >     thanks and regards
 >     Jagadeesh Maddi
 >
 >     On Wed, Oct 5, 2022 at 11:34 PM jagadeesh maddi
 >     mailto:jagadeesh.m...@gmail.com>
<mailto:jagadeesh.m...@gmail.com <mailto:jagadeesh.m...@gmail.com>>>
wrote:
 >
 >         thanks james,
 >
 >         But in using as `__time` not 'time' it was type error..i am
 >         sorry for that
 >
 >         i have checked no keywords used in the Query
 >
 >         thanks and regards
 >         jagadeesh Maddi
 >
 >         On Wed, Oct 5, 2022 at 7:55 PM James Turton
mailto:dz...@apache.org>
 >         <mailto:dz...@apache.org <mailto:dz...@apache.org>>> wrote:
 >
 >             Hi Jagadeesh
 >
 >             "time" is a keyword in Drill SQL. I'm not sure that
this is
 >             connected to
 >             the problem you're hitting but please try your query with
 >             `time` (and
 >             any other names that coincide with keywords) enclosed in
 >             backticks.
 >
 >             Regards
 >             James
 >
 >             On 2022/10/05 15:01, jagadeesh maddi wrote:
 >              > Hi Team,
 >              >
 >              > we are using Apache Drill to connect to Druid and pull
 >             the data
 >              > we are facing a issue when we did "order by"
 >              >
 >              > Query: select time,XXX,YYY from  X order by
time desc
 >              >
 >              > Exception:
 >              > 2022-10-05 13:46:44,594
 >             [1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0]
 >              > INFO  o.a.d.e.w.fragment.FragmentExecutor -
 >              > 1cc2834a-c555-f719-ae42-4a6e85b69d80:3:0: State change
 >             requested
 >              > AWAITING_ALLOCATION --> FAILED
 >              > 2022-10-05 13:46:44,594
 >             [1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0]
 >              > INFO  o.a.d.e.w.fragment.FragmentExecutor -
 >              > 1cc2834a-c555-f719-ae42-4a6e85b69d80:3:0: State change
 >             requested FAILED
 >              > --> FINISHED
 >              > 2022-10-05 13:46:44,594
 >             [1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0]
 >              > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM
ERROR:
 >              > NullPointerException
 >              >
 >              > Fragment: 3:0
 >              >
 >              > Please, refer to logs fo

Re: Error while implementing order By drill

2022-10-07 Thread James Turton

Please send the JSON query profile.

On 2022/10/06 18:46, jagadeesh maddi wrote:

Hi James,

We are in production. We need your help

Please provide your suggestion to solve this problem

Thanks and regards
Jagadeesh maddi

On Thu, 6 Oct 2022 at 1:15 AM, jagadeesh maddi <mailto:jagadeesh.m...@gmail.com>> wrote:


HI james,

Error: error i am geting

[30038]Query execution error. Details:[ SYSTEM ERROR:
NullPointerException Fragment: 3:0 Please, refer to logs for more
information. [Error Id: ba482a08-ffab-4069-a094-9fb82cbe0955 on
blp14571311:31010]
(com.fasterxml.jackson.databind.exc.ValueInstantiationException)
Cannot construct instance of
`org.apache.drill.exec.store.druid.DruidSubScan`, problem:
`java.lang.NullPointerException` at [Source: (String)"{ "pop" :
"hash-partition-sender", "@id" : 0, "receiver-major-fragment" : 2,
"child" : { "pop" : "druid-datasource-scan", "@id" : 1, "columns" :
[ "`__time`", "`COUNTER_GROUP_ID`", "C_1`", "`C_10`", "`C_11`",
"`C_12`" ], "maxRecordsToRead" : -1, "initialAlloca...
thanks and regards
Jagadeesh Maddi

On Wed, Oct 5, 2022 at 11:34 PM jagadeesh maddi
mailto:jagadeesh.m...@gmail.com>> wrote:

thanks james,

But in using as `__time` not 'time' it was type error..i am
sorry for that

i have checked no keywords used in the Query

thanks and regards
jagadeesh Maddi

On Wed, Oct 5, 2022 at 7:55 PM James Turton mailto:dz...@apache.org>> wrote:

Hi Jagadeesh

"time" is a keyword in Drill SQL. I'm not sure that this is
connected to
the problem you're hitting but please try your query with
`time` (and
any other names that coincide with keywords) enclosed in
backticks.

Regards
James

On 2022/10/05 15:01, jagadeesh maddi wrote:
 > Hi Team,
 >
 > we are using Apache Drill to connect to Druid and pull
the data
 > we are facing a issue when we did "order by"
 >
 > Query: select time,XXX,YYY from  X order by time desc
 >
 > Exception:
 > 2022-10-05 13:46:44,594
[1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0]
 > INFO  o.a.d.e.w.fragment.FragmentExecutor -
 > 1cc2834a-c555-f719-ae42-4a6e85b69d80:3:0: State change
requested
 > AWAITING_ALLOCATION --> FAILED
 > 2022-10-05 13:46:44,594
[1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0]
 > INFO  o.a.d.e.w.fragment.FragmentExecutor -
 > 1cc2834a-c555-f719-ae42-4a6e85b69d80:3:0: State change
requested FAILED
 > --> FINISHED
 > 2022-10-05 13:46:44,594
[1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0]
 > ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR:
 > NullPointerException
 >
 > Fragment: 3:0
 >
 > Please, refer to logs for more information.
 >
 > [Error Id: 9a7171d4-8789-412d-84c4-87b46040fe1b on
xx:31010]
 > org.apache.drill.common.exceptions.UserException: SYSTEM
ERROR:
 > NullPointerException
 >
 > Fragment: 3:0
 >
 > Please, refer to logs for more information.
 >
 > [Error Id: 9a7171d4-8789-412d-84c4-87b46040fe1b on
xx:31010]
 >          at
 >

org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
 >          at
 > org.apache.drill.exec.work

<http://org.apache.drill.exec.work>.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:392)
 >          at
 > org.apache.drill.exec.work

<http://org.apache.drill.exec.work>.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:244)
 >          at
 > org.apache.drill.exec.work

<http://org.apache.drill.exec.work>.fragment.FragmentExecutor.run(FragmentExecutor.java:359)
 >          at
 >

org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
 >          at
 >

java.util.concurrent.ThreadPoolExecutor.runWorker(T

[ANNOUNCE] New Drill Committer Maksym Rymar

2022-10-07 Thread James Turton
The Project Management Committee (PMC) for Apache Drill is pleased to 
announce that we have invited Maksym Rymar to join us as a committer of 
the Drill project and he has accepted. Please join me in congratulating 
Maksym and welcoming him to Drill committers!


James Turton
Drill PMC


Re: Error while implementing order By drill

2022-10-05 Thread James Turton

Hi Jagadeesh

"time" is a keyword in Drill SQL. I'm not sure that this is connected to 
the problem you're hitting but please try your query with `time` (and 
any other names that coincide with keywords) enclosed in backticks.


Regards
James

On 2022/10/05 15:01, jagadeesh maddi wrote:

Hi Team,

we are using Apache Drill to connect to Druid and pull the data
we are facing a issue when we did "order by"

Query: select time,XXX,YYY from  X order by time desc

Exception:
2022-10-05 13:46:44,594 [1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
1cc2834a-c555-f719-ae42-4a6e85b69d80:3:0: State change requested 
AWAITING_ALLOCATION --> FAILED
2022-10-05 13:46:44,594 [1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0] 
INFO  o.a.d.e.w.fragment.FragmentExecutor - 
1cc2834a-c555-f719-ae42-4a6e85b69d80:3:0: State change requested FAILED 
--> FINISHED
2022-10-05 13:46:44,594 [1cc2834a-c555-f719-ae42-4a6e85b69d80:frag:3:0] 
ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
NullPointerException


Fragment: 3:0

Please, refer to logs for more information.

[Error Id: 9a7171d4-8789-412d-84c4-87b46040fe1b on xx:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException


Fragment: 3:0

Please, refer to logs for more information.

[Error Id: 9a7171d4-8789-412d-84c4-87b46040fe1b on xx:31010]
         at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
         at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:392)
         at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:244)
         at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:359)
         at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
         at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

         at java.lang.Thread.run(Thread.java:748)
Caused by: 
com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot 
construct instance of `org.apache.drill.exec.store.druid.DruidSubScan`, 
problem: `java.lang.NullPointerException`

  at [Source: (String)"{
   "pop" : "hash-partition-sender",
   "@id" : 0,
   "receiver-major-fragment" : 2,
   "child" : {
     "pop" : "druid-datasource-scan",
     "@id" : 1,
     "columns" : [ "`__time`", "`xxx_ID`", "`xxx_C_1`", "`xxx_C_10`", 
"`xxx_C_11`", "`xxx_C_12`", "`xxx_C_13`", "`xxx_C_14`", "`xxx_C_15`", 
"`xxx_C_16`", "`xxx_C_17`", "`xxx_C_18`", "`xxx_C_19`", "`xxx_C_2`", 
"`xxx_C_20`", "`xxx_C_21`", "`xxx_C_22`", "`xxx_C_23`", "`xxx_C_24`", 
"`xxx_C_25`", "`xxx_C_26`", "`xxx_C_27`", "`xxx_C_28`", 
"`xxx_C"[truncated 3449 chars]; line: 16, column: 3] (through reference 
chain: org.apache.drill.exec.physical.config.HashPartitionSender["child"])
         at 
com.fasterxml.jackson.databind.exc.ValueInstantiationException.from(ValueInstantiationException.java:47)
         at 
com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:2047)
         at 
com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:587)
         at 
com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:610)
         at 
com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:293)
         at 
com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:288)
         at 
com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:202)
         at 
com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:518)
         at 
com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
         at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:351)
         at 
com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId(BeanDeserializerBase.java:1371)
         at 
com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:217)
         at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:186)
         at 
com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:144)
         at 
com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:110)
         at 
com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263)

Re: Two Apache-drill docker images did not pass security scans

2022-09-29 Thread James Turton

Hi Dan

We get automatic scans done by GitHub's Dependabot and we periodically 
run a manual scan using an OWASP tool. It would be nice to see the 
results of the Sonatype scanner but these mailing lists don't support 
images. Can you put them in a pastebin (I don't believe there's any 
security benefit in avoid a public upload here) or send them directly to 
me at this address?


Thanks
James

On 2022/09/28 18:50, Danny Mayer wrote:

Hi Support,

I'm developing a solution using Apache Drill on a MongoDB cluster 
server, and it works well.


But, when I tried to approve the package at my company, it did not 
pass IT security scans.


I performed a security scan using Sonatype Nexus IQ scanner, done on a 
Linux box, on two docker images:


- apache-drill:master

- apache-drill:1.20.2

Both docker images did not pass the security scan.

I've tried to attach both reports, but they pass the limit of allowed 
size by your email server.


Here are the steps to reproduce the reports:

1. Pull the docker images
# docker pull apache/drill:master
# docker pull apache/drill:1.20.2

2. Save docker images to a local file
# docker save -o apache-drill-master.tar 
# docker save -o apache-drill-1.20.2.tar 

2. Install Sonatype Nexus IQ scanner

3. Run Sonatype Nexus IQ scanner

4. Load each docker image file and start the scan
At the end of the scan a report is sent to you by email.

I've attached two screenshots of the first report page of each report.
image.png
image.png

Can you check these vulnerabilities, especially the high and medium 
security levels, and write about them?


Regards,

Dan Mayer


Re: Error While Querying Druid from drill

2022-09-15 Thread James Turton

Good idea, thanks.

On 2022/09/15 14:17, Charles Givre wrote:

Hey James,
Would it make sense to use OkHttp since it is in use elsewhere?

Sent from my iPhone


On Sep 15, 2022, at 08:16, James Turton  wrote:

Sorry about that, that's a bug. Please watch the following Jira issue I just 
created. This looks like a simple fix so we'll try to have this fixed in 1.20.3 
and out within a few weeks.

https://issues.apache.org/jira/browse/DRILL-8307

Regards
James


On 2022/09/15 13:22, jagadeesh maddi wrote:
Hi Team,

we have created a cluster of apache drill , and by using Druid storage
plugin  we are firing multiple queries on the cluster using same
connection  we are getting exception like this

  [1cdd2b75-1310---5a638567ed07:foreman] INFO
o.a.d.e.s.d.s.DruidSchemaFactory
- User Error Occurred: Failure while loading druid datasources for database
'druid-egsmd300'. (Invalid use of BasicClientConnManager: connection still
allocated.
Make sure to release the connection before allocating another one.)
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Failure
while loading druid datasources for database ''.

please help

Thanks and Regards
Jagadeesh Maddi





Re: Error While Querying Druid from drill

2022-09-15 Thread James Turton
Sorry about that, that's a bug. Please watch the following Jira issue I 
just created. This looks like a simple fix so we'll try to have this 
fixed in 1.20.3 and out within a few weeks.


https://issues.apache.org/jira/browse/DRILL-8307

Regards
James

On 2022/09/15 13:22, jagadeesh maddi wrote:

Hi Team,

we have created a cluster of apache drill , and by using Druid storage
plugin  we are firing multiple queries on the cluster using same
connection  we are getting exception like this

  [1cdd2b75-1310---5a638567ed07:foreman] INFO
o.a.d.e.s.d.s.DruidSchemaFactory
- User Error Occurred: Failure while loading druid datasources for database
'druid-egsmd300'. (Invalid use of BasicClientConnManager: connection still
allocated.
Make sure to release the connection before allocating another one.)
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Failure
while loading druid datasources for database ''.

please help

Thanks and Regards
Jagadeesh Maddi





Re: Error while querying parquet files

2022-08-22 Thread James Turton
Hmm. What happens if you restrict down to smaller subsets of those 
Parquet files or put in a LIMIT?


On 2022/08/22 15:59, Prabhakar Bhosale wrote:

Hi JAmes,

My apologies for the delayed reply. The drill version is 1.20.1 and below
is JSON profile





{
 "id": {
 "part1": 2089923342900841500,
 "part2": -6273738056146546000
 },
 "type": 1,
 "start": 1660885450907,
 "end": 1660885532000,
 "query": "select * from parquetstore.table1  where dir0='2018' and
dir1>='1' and dir1<='3' and gid='01U41'",
 "plan": "00-00Screen : rowType = RecordType(DYNAMIC_STAR **):
rowcount = 1000.0, cumulative cost = {3.01451237E8 rows, 1.3135629269775E9
cpu, 4.01925516E8 io, 1.6384E7 network, 0.0 memory}, id = 935\n00-01
Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount =
1000.0, cumulative cost = {3.01451137E8 rows, 1.3135628269775E9 cpu,
4.01925516E8 io, 1.6384E7 network, 0.0 memory}, id = 934\n00-02
Project(T2¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**): rowcount
= 1000.0, cumulative cost = {3.01450137E8 rows, 1.3135618269775E9 cpu,
4.01925516E8 io, 1.6384E7 network, 0.0 memory}, id = 931\n00-03
SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY
dir0, ANY dir1, ANY gid): rowcount = 1000.0, cumulative cost =
{3.01449137E8 rows, 1.3135608269775E9 cpu, 4.01925516E8 io, 1.6384E7
network, 0.0 memory}, id = 930\n00-04Limit(fetch=[1000]) :
rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY dir0, ANY dir1, ANY gid):
rowcount = 1000.0, cumulative cost = {3.01448137E8 rows, 1.3135598269775E9
cpu, 4.01925516E8 io, 1.6384E7 network, 0.0 memory}, id = 929\n00-05
   UnionExchange : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY dir0,
ANY dir1, ANY gid): rowcount = 1000.0, cumulative cost = {3.01447137E8
rows, 1.3135558269775E9 cpu, 4.01925516E8 io, 1.6384E7 network, 0.0
memory}, id = 928\n01-01SelectionVectorRemover : rowType =
RecordType(DYNAMIC_STAR T2¦¦**, ANY dir0, ANY dir1, ANY gid): rowcount =
1000.0, cumulative cost = {3.01446137E8 rows, 1.3135478269775E9 cpu,
4.01925516E8 io, 0.0 network, 0.0 memory}, id = 927\n01-02
Limit(fetch=[1000]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY dir0,
ANY dir1, ANY gid): rowcount = 1000.0, cumulative cost = {3.01445137E8
rows, 1.3135468269775E9 cpu, 4.01925516E8 io, 0.0 network, 0.0 memory}, id
= 926\n01-03Filter(condition=[AND(=($1, '2018'), >=($2,
'1'), <=($2, '3'), =($3, '01U41'))]) : rowType =
RecordType(DYNAMIC_STAR T2¦¦**, ANY dir0, ANY dir1, ANY gid): rowcount =
565207.756875, cumulative cost = {3.01444137E8 rows, 1.3135428269775E9 cpu,
4.01925516E8 io, 0.0 network, 0.0 memory}, id = 925\n01-04
 Project(T2¦¦**=[$0], dir0=[$1], dir1=[$2], gid=[$3]) : rowType =
RecordType(DYNAMIC_STAR T2¦¦**, ANY dir0, ANY dir1, ANY gid): rowcount =
1.00481379E8, cumulative cost = {2.00962758E8 rows, 8.03851032E8 cpu,
4.01925516E8 io, 0.0 network, 0.0 memory}, id = 924\n01-05
   Scan(table=[[]], groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/4/table1
~155~3558~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/4/1/table1
~155~3587~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/3/3/table1
~155~3599~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/2/table1
~155~3465~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/3/1/table1
~155~3603~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/1/1/table1
~155~3600~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/2/table1
~155~3613~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/4/1/table1
~155~3599~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/4/1/table1
~155~3558~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/3/1/table1
~155~3600~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/2/1/table1
~155~3613~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/1/table1
~155~3462~2018~JANUARY~1/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/2/1/table1
~155~3587~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/4/table1
~155~3537~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetstore/table1 /2018/2/4/1/table1
~155~3603~2018~MAY~7/0_0_0.parquet], ReadEntryWithPath
[path=/remote_t1/archived_files1/parquetsto

Re: Drill Security Plain Authentication- Error

2022-08-22 Thread James Turton
You need to provide credentials when starting embedded Drill. Have a 
look at the output of drill-embedded --help and try something like 
drill-embedded -n alice -p top_secret


On 2022/08/22 09:46, Prabhakar Bhosale wrote:

Hi Team,
I am having Drill 1.20.1 in embedded mode and trying to set up plain
authentication using the htpasswd file. After setting up everything as
given in documentation getting error as

Error: Failure in connecting to Drill:
org.apache.drill.exec.rpc.NonTransientRpcException:
javax.security.sasl.SaslException: Server requires authentication using
[PLAIN]. Insufficient credentials?. [Details: Encryption: disabled ,
MaxWrappedSize: 65536 , WrapSizeLimit: 0]. (state=,code=0)

I tied creating htpasswd file both plain password and MD5 but still
same error

Below is my drill-override.conf

drill.exec: {
   cluster-id: "drillbits1",
   zk.connect: "localhost:2181",

   http.memory.heap.failure.threshold = 2,


   impersonation: {
enabled: true,
max_chained_user_hops: 3
  },
  security: {
  auth.mechanisms : ["PLAIN"],
   },
  security.user.auth: {
  enabled: true,
  packages += "org.apache.drill.exec.rpc.user.security",
  impl: "htpasswd",
  htpasswd.path:
"/app/archival/drill/apache-drill-1.20.1/conf/htpasswd"
   }

}

Please help

Thanks
Prabhakar





Re: Error while querying parquet files

2022-08-17 Thread James Turton

Please attach the query profile JSON disclose the version of Drill.

On 2022/08/17 08:51, Prabhakar Bhosale wrote:

Hi Team,
Any pointers on the below error? thx

Regards
Prabhakar

On Tue, Aug 16, 2022 at 9:05 AM Prabhakar Bhosale 
wrote:


Hi Team,
I am querying parquet files as per below query and getting the error.
Please advise

select * from parquetstore.table1 where dir0='2018' and dir1>='1' and
dir1<='3' and gid='01U41'


Getting error
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IndexOutOfBoundsException: readerIndex: 0, writerIndex: 1946157056
(expected: 0 <= readerIndex <= writerIndex <= capacity(0)) Fragment: 0:0
Please, refer to logs for more information. [Error Id:
b335690a-a814-41dc-ba5f-a3c4d8e3ccc2





Re: [RESULT] [VOTE] Release Apache Drill 1.20.2 RC1

2022-08-03 Thread James Turton

Drill 1.20.2 has been released.

https://drill.apache.org/download/
https://github.com/apache/drill/releases
https://hub.docker.com/r/apache/drill/tags?page=1&name=latest


On 2022/08/03 07:40, James Turton wrote:
The vote passes. Thanks to everyone who has tested the release 
candidate and given their comments and votes. Final tally:


3x +1 (binding): Charles, James, Vova

1x +1 (non-binding): Jingchuan

No 0s or -1s.

I'll start process for pushing the release artifacts and send an 
announcement once propagated.


James




[VOTE] Release Apache Drill 1.20.2 - RC1

2022-08-01 Thread James Turton
I'd like to propose the second release candidate (RC1) of Apache Drill, 
version 1.20.2. The release candidate covers a total of 23 resolved 
Jiras since 1.20.1 [1]. Thanks to everyone who contributed to this 
release and to Jingchuan Hu for his help in preparing the release.


The tarball artifacts are hosted at [2] and the maven artifacts are 
hosted at [3]. This release candidate is based on commit 
3b924b778990c41bf2c15a917097c038a10faf5d located at [4].


Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

✅ Launch Hadoop 3 build under Java 8 using drill-embedded on Linux, 
check sys.version, run a CTAS, check the web UI.
✅ Launch Hadoop 2 build under Java 8 using drill-embedded on Linux, 
check sys.version, run a CTAS.
✅ Launch Hadoop 3 build under Java 8 using drill-embedded on Windows 10, 
run a CTAS.

✅ Check which Hadoop and Netty jars are present in the Hadoop 3 build.
✅ Check which Hadoop and Netty jars are present in the Hadoop 2 build.

I vote +1 (binding).

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742 


[2] https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc1/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1101/
[4] https://github.com/jnturton/drill/commits/drill-1.20.2


Re: [VOTE] Release Apache Drill 1.20.2 - RC0

2022-07-31 Thread James Turton

Everything needed for RC1 is now merged, I'll build and upload it soon.

On 2022/07/22 18:04, James Turton wrote:
I turned up some dependency issues in the Hadoop 2 build so I'm now a 
-1 on RC0 until DRILL-8268 
<https://issues.apache.org/jira/browse/DRILL-8268> (the dependency 
management parts of it, at least).


On 2022/07/21 17:26, Charles Givre wrote:

Downloaded built release.  Ran various queries.
+1 from me. (Binding)


On Jul 21, 2022, at 10:47 AM, James Turton  wrote:

This is just a resend that attempts to fix the mangled formatting in 
the first attempt.


I'd like to propose the first release candidate (RC0) of Apache 
Drill, version 1.20.2. The release candidate covers a total of 20 
resolved Jiras since 1.20.1 [1]. Thanks to everyone who contributed 
to this release and to Jingchuan Hu for his help in preparing the 
release.


The tarball artifacts are hosted at [2] and the maven artifacts are 
hosted at [3]. This release candidate is based on commit 
1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at [4].


Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

[1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742 


[2]https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/
[3]https://repository.apache.org/content/repositories/orgapachedrill-1099/ 


[4]https://github.com/jnturton/drill/commits/drill-1.20.2|


On 2022/07/21 16:39, James Turton wrote:
|Hi all, I'd like to propose the first release candidate (RC0) of 
Apache Drill, version 1.20.2. The release candidate covers a total 
of 20 resolved Jiras since 1.20.1 [1]. Thanks to everyone who 
contributed to this release and to Jingchuan Hu for his help in 
preparing the release. The tarball artifacts are hosted at [2] and 
the maven artifacts are hosted at [3]. This release candidate is 
based on commit 1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at 
[4]. Please download and try out the release. [ ] +1 [ ] +0 [ ] -1 
Here's my vote: +1 
[1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742 
[2]https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/ 
[3]https://repository.apache.org/content/repositories/orgapachedrill-1099/ 
[4]https://github.com/jnturton/drill/commits/drill-1.20.2|






Re: [VOTE] Release Apache Drill 1.20.2 - RC0

2022-07-22 Thread James Turton
I turned up some dependency issues in the Hadoop 2 build so I'm now a -1 
on RC0 until DRILL-8268 
<https://issues.apache.org/jira/browse/DRILL-8268> (the dependency 
management parts of it, at least).


On 2022/07/21 17:26, Charles Givre wrote:

Downloaded built release.  Ran various queries.
+1 from me. (Binding)


On Jul 21, 2022, at 10:47 AM, James Turton  wrote:

This is just a resend that attempts to fix the mangled formatting in the first 
attempt.

I'd like to propose the first release candidate (RC0) of Apache Drill, version 
1.20.2. The release candidate covers a total of 20 resolved Jiras since 1.20.1 
[1]. Thanks to everyone who contributed to this release and to Jingchuan Hu for 
his help in preparing the release.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted at 
[3]. This release candidate is based on commit 
1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at [4].

Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

[1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742
[2]https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/
[3]https://repository.apache.org/content/repositories/orgapachedrill-1099/
[4]https://github.com/jnturton/drill/commits/drill-1.20.2|


On 2022/07/21 16:39, James Turton wrote:

|Hi all, I'd like to propose the first release candidate (RC0) of Apache Drill, 
version 1.20.2. The release candidate covers a total of 20 resolved Jiras since 
1.20.1 [1]. Thanks to everyone who contributed to this release and to Jingchuan Hu 
for his help in preparing the release. The tarball artifacts are hosted at [2] and 
the maven artifacts are hosted at [3]. This release candidate is based on commit 
1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at [4]. Please download and try 
out the release. [ ] +1 [ ] +0 [ ] -1 Here's my vote: +1 
[1]https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742
  [2]https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/  
[3]https://repository.apache.org/content/repositories/orgapachedrill-1099/  
[4]https://github.com/jnturton/drill/commits/drill-1.20.2|


Re: Drill Heap error

2022-07-22 Thread James Turton

Hey Prabhakar, can you run exactly the same test as below after setting

drill.exec.http.memory.heap.failure.threshold = 2

in your drill-override.conf? I have a theory that we're carrying some 
memory usage limiting code that we no longer need since we gained 
streaming HTTP results, and that might be interacting badly with the 
Java garbage collector. Alternatively, and I sincerely hope it's not the 
case, we are genuinely leaking heap memory somewhere and need to find 
it. If we get a favourable result from your testing I think we should 
consider incorporating an extra fix in 1.20.2 that removes the home 
grown memory usage limiting.


Footnote: the value of 2 above (200% of the maximum heap size) is not 
meant to be a sensible fraction but was chosen only to completely 
circumvent the mentioned memory usage limiting logic in Drill.



On 2022/07/22 08:50, Prabhakar Bhosale wrote:

Hi Team,
I am Running the drill in embedded mode. The drill version is 1.20.1.

After running multiple queries from WEB UI which opens 8000 parquet
compressed files it gives me the below error. I am running queries one
after another. My understanding is that once the query is completed
successfully or failed then it should release the memory. So the system
should not give this error if I am executing the queries one at a time.

I have not changed any memory settings for drill.  Any suggestions?


warning UserException : RESOURCE ERROR: There is not enough heap memory to
run this query using the web interface.
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is
not enough heap memory to run this query using the web interface. Please
try a query with fewer columns or with a filter or limit condition to limit
the data returned. You can also try an ODBC/JDBC client. [Error Id:
c63dda94-071a-4c79-953c-fab956418cd8 ]





Re: [VOTE] Release Apache Drill 1.20.2 - RC0

2022-07-21 Thread James Turton
This is just a resend that attempts to fix the mangled formatting in the 
first attempt.


I'd like to propose the first release candidate (RC0) of Apache Drill, 
version 1.20.2. The release candidate covers a total of 20 resolved 
Jiras since 1.20.1 [1]. Thanks to everyone who contributed to this 
release and to Jingchuan Hu for his help in preparing the release.


The tarball artifacts are hosted at [2] and the maven artifacts are 
hosted at [3]. This release candidate is based on commit 
1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at [4].


Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742

[2] https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1099/
[4] https://github.com/jnturton/drill/commits/drill-1.20.2|


On 2022/07/21 16:39, James Turton wrote:
|Hi all, I'd like to propose the first release candidate (RC0) of 
Apache Drill, version 1.20.2. The release candidate covers a total of 
20 resolved Jiras since 1.20.1 [1]. Thanks to everyone who contributed 
to this release and to Jingchuan Hu for his help in preparing the 
release. The tarball artifacts are hosted at [2] and the maven 
artifacts are hosted at [3]. This release candidate is based on commit 
1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at [4]. Please 
download and try out the release. [ ] +1 [ ] +0 [ ] -1 Here's my vote: 
+1 [1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742 
[2] https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/ [3] 
https://repository.apache.org/content/repositories/orgapachedrill-1099/ 
[4] https://github.com/jnturton/drill/commits/drill-1.20.2|




[VOTE] Release Apache Drill 1.20.2 - RC0

2022-07-21 Thread James Turton
|Hi all, I'd like to propose the first release candidate (RC0) of Apache 
Drill, version 1.20.2. The release candidate covers a total of 20 
resolved Jiras since 1.20.1 [1]. Thanks to everyone who contributed to 
this release and to Jingchuan Hu for his help in preparing the release. 
The tarball artifacts are hosted at [2] and the maven artifacts are 
hosted at [3]. This release candidate is based on commit 
1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at [4]. Please download 
and try out the release. [ ] +1 [ ] +0 [ ] -1 Here's my vote: +1 [1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12351742 
[2] https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/ [3] 
https://repository.apache.org/content/repositories/orgapachedrill-1099/ 
[4] https://github.com/jnturton/drill/commits/drill-1.20.2|




Re: [DISCUSS] Drill 1.20.2 bugfix release

2022-07-20 Thread James Turton
We've been able to include the mentioned fixes plus a number of 
extras[1]. Are we ready to freeze the 1.20 branch for 1.20.2 here?


[1] https://github.com/apache/drill/commits/1.20

On 2022/07/08 14:33, Charles Givre wrote:

Hey James,
Thanks for doing this.  There are a few CI and CVE related PRs that we might 
want to think about including such as the one below.  Also I seem to remember 
that Vova made a fix to the Calcite fork that fixed a bug relating to 
Elasticsearch.  I know he's working on some other things, but do you think it 
might be worth including that in 1.20.2?

Best,
-- C

https://github.com/apache/drill/pull/2581 
<https://github.com/apache/drill/pull/2581>




On Jul 8, 2022, at 3:38 AM, James Turton  wrote:

Hi Drillers

It's been about seven weeks since the last bug fix release and it is time to do 
the next one. I volunteer to be the release manager with the kind assistance of 
Jingchuan Hu who has already been busy backporting fixes for us [1] . If there 
are any issues on which work is in progress, that you feel we *must* include in 
the release, please post in reply to this thread. Otherwise please indicate 
that you are in favour of freezing the stable branch at its current height [2].

[1] https://github.com/apache/drill/pull/2584
[2] https://github.com/apache/drill/commits/1.20

Thank you
James Turton






Re: Running drill in background in embedded mode

2022-07-18 Thread James Turton

Well if we're playing with hacks... :-)

nohup drill-embedded -f <(sleep infinity) > /dev/null

That needs a shell smart enough to do process substitution and avoids 
what I guess was busy wait loop in sqlline's input reader that you ran into.


Back to being boring and responsible: are you sure you want to run Drill 
this way? It would be a lot more natural to launch a standalone Drillbit 
with drillbit.sh, having started a ZooKeeper somewhere beforehand.


On 2022/07/19 05:30, Prabhakar Bhosale wrote:

Hi Luoc,
When I run the drill in embedded mode as foreground process, the %CPU does
not go beyond 1%.  for java process. Please let me know if you need any
additional information. thx

Regards
Prabhakar

On Tue, Jul 19, 2022 at 6:57 AM luoc  wrote:


Hi,

What is the cost of the CPU if you are running in a front process?


On Jul 18, 2022, at 14:51, Prabhakar Bhosale 

wrote:

Hi Team,
I am trying to run drill in embedded mode as background process with

below

command
nohup sh drill-embedded >/dev/null 2>&1 &

My observation is that it takes too much CPU. After starting drill by

above

command the output of top command against java process shows %CPU

anything

between 150 to 175%.

So any recommended way to run drill in embedded mode in background?

thanks

REgards
Prabhakar






Re: [DISCUSS] Drill 1.20.2 bugfix release

2022-07-11 Thread James Turton
Good points, thanks. I think we're nearly done with the CI and CVE PRs 
now and I've asked Vova if he can cherry pick CALCITE-4992 for this 
release too.


On 2022/07/08 14:33, Charles Givre wrote:

Hey James,
Thanks for doing this.  There are a few CI and CVE related PRs that we might 
want to think about including such as the one below.  Also I seem to remember 
that Vova made a fix to the Calcite fork that fixed a bug relating to 
Elasticsearch.  I know he's working on some other things, but do you think it 
might be worth including that in 1.20.2?

Best,
-- C

https://github.com/apache/drill/pull/2581 
<https://github.com/apache/drill/pull/2581>




On Jul 8, 2022, at 3:38 AM, James Turton  wrote:

Hi Drillers

It's been about seven weeks since the last bug fix release and it is time to do 
the next one. I volunteer to be the release manager with the kind assistance of 
Jingchuan Hu who has already been busy backporting fixes for us [1] . If there 
are any issues on which work is in progress, that you feel we *must* include in 
the release, please post in reply to this thread. Otherwise please indicate 
that you are in favour of freezing the stable branch at its current height [2].

[1] https://github.com/apache/drill/pull/2584
[2] https://github.com/apache/drill/commits/1.20

Thank you
James Turton






[DISCUSS] Drill 1.20.2 bugfix release

2022-07-08 Thread James Turton

Hi Drillers

It's been about seven weeks since the last bug fix release and it is 
time to do the next one. I volunteer to be the release manager with the 
kind assistance of Jingchuan Hu who has already been busy backporting 
fixes for us [1] . If there are any issues on which work is in progress, 
that you feel we *must* include in the release, please post in reply to 
this thread. Otherwise please indicate that you are in favour of 
freezing the stable branch at its current height [2].


[1] https://github.com/apache/drill/pull/2584
[2] https://github.com/apache/drill/commits/1.20

Thank you
James Turton


Apache Drill Community Meetup

2022-06-03 Thread James Turton

Hi folks

Just a little informational message to say that neither Charles nor I 
aren't able to get to this one but, of course, that need not stop 
everyone else. I hope to join you at the July meetup, I got a bit lonely 
at the May one 😅.


Regards
James


Re: How to use MongoDB collections starting with numbers?

2022-05-26 Thread James Turton

Also note that identifiers must be quoted with `backticks` in Drill SQL.

On 2022/05/26 11:24, luoc wrote:

Hi Chris,

It is strongly recommended not to create table names beginning with numbers in 
mongo. Although mongo supports this, it can become very complex in ANSI SQL.


On May 26, 2022, at 17:16, spamo...@freenet.de wrote:

Hello,

I've got a few Mongo collections and would like to use:
01_test

apache drill (mongo.crm_test)> SHOW DATABASES;
++
|SCHEMA_NAME |
++
| mongo.01_test   |
| mongo.test |

apache drill (mongo.crm_test)> usemongo.test;
+--++
|  ok  |  summary  |
+--++
| true| Default schema changed to [mongo.test]|
+--++

But:

apache drill (mongo.test)> usemongo.01_test;
Error: PARSE ERROR: Encountered ".01" at line 1, column 10.

SQL Query: use mongo.01_test
^

[Error Id: 056ff092-822b-4113-bf06-27e960cfe69a ] (state=,code=0)


I've tried it with " and ' but nothing seems to be working for
a collection, starting with a number.
How to do this?

Thanks and kind regards,
Chris




Drill 1.20.1 released

2022-05-16 Thread James Turton

Dear Drill community

I'm pleased to announce the arrival of our first bugfix release. If 
you're after the Hadoop 2 build please be patient for a few more hours 
-- releasing Drill over a 2 Mbit/s uplink is not a fast process. I'll be 
back on fast link later in the day, however. A copy of the release notes 
follows.


Release Notes - Apache Drill - Version 1.20.1

** Sub-task
    * [DRILL-8145] - Fix flaky 
TestDrillbitResilience#memoryLeaksWhenCancelled test case




** Bug
    * [DRILL-8013] - Drill attempts to push "$SUM0" to JDBC storage 
plugin for AVG

    * [DRILL-8146] - SAS reader fails to read the majority of sas files
    * [DRILL-8168] - Duplicated attempt to apply inbound impersonation 
in the REST API

    * [DRILL-8172] - Use the specified memory usage for Travis CI
    * [DRILL-8176] - upgrade jackson due to CVE-2020-36518
    * [DRILL-8187] - Dialect factory returns ANSI SQL dialect for BigQuery
    * [DRILL-8192] - Cassandra queries fail when enabled Mongo plugin
    * [DRILL-8194] - Function of REPLACE throws 
IndexOutOfBoundsException, if text's length is more than previously applied

    * [DRILL-8200] - Update Hadoop libs to ≥ 3.2.3 for CVE-2022-26612
    * [DRILL-8219] - Handle null catalog names returned by DB2 in 
storage-jdbc




** Improvement
    * [DRILL-8150] - upgrade to log4j 2.17.2
    * [DRILL-8151] - Add support for more ElasticSearch and Cassandra 
data types

    * [DRILL-8154] - upgrade to poi 5.2.1
    * [DRILL-8156] - Declare and chown a /data VOLUME in the Drill 
Dockerfile

    * [DRILL-8175] - Update Drill release script after 1.20



** Task
    * [DRILL-8164] - Upgrade metadata-extractor because of CVE-2022-24613
    * [DRILL-8165] - Upgrade liquibase because of CVE-2022-0839
    * [DRILL-8178] - Bump S3 SDK to Lastest Version









































Re: Apache drill: how to build a custom ODBC/JDBC driver that performs rest api calls

2022-04-25 Thread James Turton
At the current time there is unfortunately no open source ODBC driver 
for Drill and updates to the MapR closed source driver have stopped. I 
believe it does still work, but obviously an open source driver would be 
preferable by a long shot.


On 2022/04/22 14:56, Damien Deom wrote:

Hi James,

Thank you for your answer. I will take a deeper look at this…

At Loamics, we’re dealing with large datasets coming from sevreral databaes.

The challenge is whether we can perform joins with sufficient 
performance in real time. If not, we will have to go through a data 
preparation phase.


Looking closer, I noticed that drill as an http plugin, but does not 
implement join filter pushdown :


https://drill.apache.org/docs/http-storage-plugin/ 
<https://drill.apache.org/docs/http-storage-plugin/>


Do you plan to implement this feature ? will I have to create my own 
connector in order to achieve best performance ?


Also, I can see that the .msi ODBC driver has not been updated since 
October 2018 :


http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.5.1.1002/ 
<http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc_v1.5.1.1002/>


See you soon, and keep the good work !



*Damien DEOM*

Head of Lab

Mobile : +33 6 28 07 06 41

@ : damien.d...@loamics.com <mailto:damien.d...@loamics.com>

*www.loamics.com* <http://www.loamics.com/>**

*De :*James Turton 
*Envoyé :* mercredi 13 avril 2022 15:24
*À :* user@drill.apache.org; Damien Deom 
*Objet :* Re: Apache drill: how to build a custom ODBC/JDBC driver that 
performs rest api calls


Hi Damien

This is not quite the same thing but Drill does include a JDBC driver 
for its clients and it can query HTTP APIs (with a couple of caveats) 
through its HTTP storage plugin. The net effect is that without much 
legwork you can query data from an HTTP API by sending SQL statements 
over a JDBC connection to Drill. One of our community, Charles, built 
pretty much all of that functionality and is quite active on here so he 
may be able to guide you further.


James

On 2022/04/13 12:05, Damien Deom wrote:

Hi,

I’d like to know if Apache Drill allows to easily  build an
ODBC/JDBC driver that performs RPC calls to api’s we’re developing
in our company.

Solutions like like Progress allows to do that :

https://www.progress.com/tutorials/odbc/2-hour-tutorial-build-your-own-custom-odbc-driver-for-rest-api

<https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.progress.com%2ftutorials%2fodbc%2f2-hour-tutorial-build-your-own-custom-odbc-driver-for-rest-api&c=E,1,Urgj-NuXu90dZgfTsgtEYFo8kQDL4jPOSzf51rvMzGrLrlyJtj7PBbriIvmAzedneFbKbDn5tsvKLzkEvjP9A3G3emncEtvtmZp-nbge7BgiuyX92RZqMvs,&typo=1>

Apache Calcites does that too, but does not implement ODBC

https://calcite.apache.org/avatica/docs/index.html
<https://calcite.apache.org/avatica/docs/index.html>

Thanks in advance,



*Damien DEOM*

Head of Lab

Mobile : +33 6 28 07 06 41

@ : damien.d...@loamics.com <mailto:damien.d...@loamics.com>

*www.loamics.com*

<https://linkprotect.cudasvc.com/url?a=http%3a%2f%2fwww.loamics.com%2f&c=E,1,csUnRmHisqLnkrioD06sE5UFvufpfbiKpODYMkS7KJ05LB3IVoyopISz1k7B7uO-Xj7U2YCKUs2MTReN-88o9OemD50LEljF61vSr-WZ&typo=1>



Re: Apache drill: how to build a custom ODBC/JDBC driver that performs rest api calls

2022-04-13 Thread James Turton

Hi Damien

This is not quite the same thing but Drill does include a JDBC driver 
for its clients and it can query HTTP APIs (with a couple of caveats) 
through its HTTP storage plugin. The net effect is that without much 
legwork you can query data from an HTTP API by sending SQL statements 
over a JDBC connection to Drill. One of our community, Charles, built 
pretty much all of that functionality and is quite active on here so he 
may be able to guide you further.


James

On 2022/04/13 12:05, Damien Deom wrote:


Hi,

I’d like to know if Apache Drill allows to easily  build an ODBC/JDBC 
driver that performs RPC calls to api’s we’re developing in our company.


Solutions like like Progress allows to do that : 
https://www.progress.com/tutorials/odbc/2-hour-tutorial-build-your-own-custom-odbc-driver-for-rest-api


Apache Calcites does that too, but does not implement ODBC

https://calcite.apache.org/avatica/docs/index.html

Thanks in advance,



*Damien DEOM*

Head of Lab

Mobile : +33 6 28 07 06 41

@ : damien.d...@loamics.com 

*www.loamics.com *



Re: query time comparison to several SQL engines

2022-04-07 Thread James Turton
What might be the biggest factor affecting running time here is that 
Drill's query execution is not fault tolerant while Spark's is.  The 
philosophy is different, Drill's says "when you're doing interactive 
analytics and a node dies, killing your query as it goes, just run the 
query again."


On 2022/04/07 16:11, Wes Peng wrote:


Hi Jacek,

Spark and Drill have no direct relations. But they have the similar 
architecture.


If you read the book "Learning Apache Drill" (I guess it's free 
online), chap 3 will give you Drill's SQL engine architecture:



It's quite similar to Spark's.

And the distributed implementation architecture is almost the same as 
Spark:



Though they are separated products, but have the similar 
implementation IMO.


No, I didn't use a statement optimized for Drill. It's just a common 
SQL statement.


The reason for drill is faster, I think it's b/c drill's direct mmap 
technology. It's more memory consumed than spark, so more faster.


Thanks.


Jacek Laskowski wrote:
Is this true that Drill is Spark or vice versa under the hood? If so, 
how is it possible that Drill is faster? What does Drill do to make 
the query faster? Could this be that you used a type of query Drill 
is optimized for? Just guessing and am really curious (not implying 
that one is better or worse than the other(s)).




Re: 1.21.0-SNAPSHOT: Schema change not currently supported for schemas with complex types

2022-04-06 Thread James Turton
It's good to attach a profile and log too. The better the instructions and 
observations that a ticket contains, the faster a developer can proceed to a 
fix.

On 6 April 2022 19:55:52 GMT+02:00, Daniel Clark  wrote:
>Hi James,
>
>Is a mongodb dump, along with the query sufficient, or should I also attach
>the profile and error log also?
>
>On Thu, Mar 31, 2022 at 11:02 AM James Turton 
>wrote:
>
>> Hi again Daniel
>>
>> Sorry everyone's so busy at the moment.  The best way to turn this into
>> something a developer will work on is going to be to make it a small
>> reproducible example in Jira ticket.  That should include some trivial
>> Mongo datasets that have the right data types to reveal the problem and a
>> query like the one below.
>>
>> Regards
>> James
>>
>> On 2022/03/09 21:21, Daniel Clark wrote:
>>
>> I'm attempting to run this mongo query that ran successfully in Drill 1.19
>> with the 1.21.0-SNAPSHOT build.
>>
>> SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
>>   `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
>>   `Elements`.`ElementTypeName` AS `ElementTypeName`,
>>   `Elements`.`PlanID` AS `PlanID`
>> FROM `mongo.grounds`.`Elements` `Elements`
>>   INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON
>> (`Elements`.`_id` = `Elements_Efforts`.`_id`)
>> WHERE (`Elements`.`PlanID` = '1623263140')
>> GROUP BY `Elements_Efforts`.`EffortTypeName`,
>>   `Elements`.`ElementSubTypeName`,
>>   `Elements`.`ElementTypeName`,
>>   `Elements`.`PlanID`
>>
>> I'm getting this error message: UserRemoteException : SYSTEM ERROR:
>> RuntimeException: Schema change not currently supported for schemas with
>> complex types. I've attached both the log and the profile. Any tips, or
>> suggestions will be greatly appreciated.
>>
>>
>>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: A simple comparison for three SQL engines

2022-04-06 Thread James Turton

Nice, thanks for sharing!

On 2022/04/06 10:51, Wes Peng wrote:

Hello the list,

I have wrote a blog about the comparison to three sql engines (mysql, 
spark, drill). The blog url:


https://blog.cloudcache.net/139-2/

If you have found the wrong expression, please let me know.

Thanks.




Re: filter function in drill

2022-04-06 Thread James Turton
Noting that Drill does not currently support the EXCEPT operator, there 
are some different options.  I'd probably use


select * from `words.csv` where word not in (select stopword from 
`stopwords.csv`);


On 2022/04/06 09:41, Wes Peng wrote:

Hi James,

I have two table, one for words, another for stopwords.
for instance,

apache drill (dfs.pyh)> select * from `words.csv` limit 10;
+---+
|   WORD    |
+---+
| on    |
| jan   |
| 2022  |
| at    |
| wolfgang  |
| engelmann |
| via   |
| gdb   |
| now   |
| waits |
+---+

apache drill (dfs.pyh)> select * from `stopwords.csv` limit 10;
+-+
|  STOPWORD   |
+-+
| able    |
| about   |
| above   |
| abroad  |
| according   |
| accordingly |
| across  |
| actually    |
| adj |
| after   |
+-+

How to select words which are in table "words" but not in table 
"stopwords"?


In Spark I was using the filter function for this job. for instance,

rdd=sc.textFile("words.txt")
df=spark.createDataFrame(rdd.map(lambda x:(x,1)),["word","count"])
rdd2=sc.textFile("stopwords.txt")
stoplist=rdd2.collect()
df2=df.filter(~col("word").isin(stoplist))

But I am not sure how drill implements this.
Please help. Thanks.

regards.





Re: can't show table

2022-04-06 Thread James Turton
Are those all views reported by SHOW TABLES run on the local filesystem? 
 Views do get reported...


On 2022/04/06 08:46, Wes Peng wrote:

Thank you James.

I asked this b/c if the location is local file system the command show 
tables seems work.


apache drill> use dfs.pyh;
+--+-+
|  ok  |   summary   |
+--+-+
| true | Default schema changed to [dfs.pyh] |
+--+-+
1 row selected (0.664 seconds)

apache drill (dfs.pyh)> show tables;
+--++
| TABLE_SCHEMA | TABLE_NAME |
+--++
| dfs.pyh  | pplview    |
| dfs.pyh  | tt1    |
| dfs.pyh  | tt2    |
| dfs.pyh  | spamIP |
| dfs.pyh  | ttt    |
| dfs.pyh  | foodmart   |
+--++


And yes 'show files' does work on HDFS path.



On 2022/4/6 12:07, James Turton wrote:
In filesystem storage a dataset will only be reported by SHOW FILES 
unless you enable the Drill metastore and collect metadata from it 
using ANALYZE TABLE ... REFRESH METADATA.


On 2022/04/06 03:42, Wes Peng wrote:

Hello the list,

Today I created a table in HDFS space:

apache drill (hdfs.tmp)> create table tt1 as select * from 
`people.csvh` limit 1000;



I can run query to this table:

apache drill (hdfs.tmp)> select count(*) from tt1;
++
| EXPR$0 |
++
| 1000   |
++
1 row selected (0.217 seconds)


And the table located in HDFS by listing the directory:

$ hdfs dfs -ls /tmp/test/tt1
Found 1 items
-rw-rw-r--   3 pyh supergroup 100618 2022-04-06 09:37 
/tmp/test/tt1/0_0_0.parquet



But the `show table` command does not work:

apache drill (hdfs.tmp)> show tables;
+--++
| TABLE_SCHEMA | TABLE_NAME |
+--++
No rows selected (0.177 seconds)


Do you know why?
Thanks.




Re: Drill security for standalone implementation

2022-04-05 Thread James Turton

Hi Prabhakar

Yes it can.  You can by-and-large think of embedded Drill as a 
single-node Drill cluster.


On 2022/04/06 08:04, Prabhakar Bhosale wrote:

Hi Team,
All security related documentation available is considering drill cluster.
My question is , Can this security be implemented even in a standalone
drill environment? thx

REgards
Prabhakar





Re: can't show table

2022-04-05 Thread James Turton
In filesystem storage a dataset will only be reported by SHOW FILES 
unless you enable the Drill metastore and collect metadata from it using 
ANALYZE TABLE ... REFRESH METADATA.


On 2022/04/06 03:42, Wes Peng wrote:

Hello the list,

Today I created a table in HDFS space:

apache drill (hdfs.tmp)> create table tt1 as select * from 
`people.csvh` limit 1000;



I can run query to this table:

apache drill (hdfs.tmp)> select count(*) from tt1;
++
| EXPR$0 |
++
| 1000   |
++
1 row selected (0.217 seconds)


And the table located in HDFS by listing the directory:

$ hdfs dfs -ls /tmp/test/tt1
Found 1 items
-rw-rw-r--   3 pyh supergroup 100618 2022-04-06 09:37 
/tmp/test/tt1/0_0_0.parquet



But the `show table` command does not work:

apache drill (hdfs.tmp)> show tables;
+--++
| TABLE_SCHEMA | TABLE_NAME |
+--++
No rows selected (0.177 seconds)


Do you know why?
Thanks.




Re: which engine run the SQL query when connecting to outside data source

2022-04-05 Thread James Turton
Generally, both.  Drill will push down as much of the query as it can to 
Hive then apply the operations that it couldn't push down itself.  You 
can see a lot of what's going on by looking at the physical plan 
recorded in the query profile, e.g. in the web UI.


On 2022/04/05 13:41, Wes Peng wrote:

Hello the community,

Given the case I run a SQL query to hive table.
Will this statement run by drill, or by Hive, or both?

for example,

apache drill (hive.default)> show tables;
+--++
| TABLE_SCHEMA | TABLE_NAME |
+--++
| hive.default | hivemysql  |
+--++
1 row selected (5.613 seconds)

apache drill (hive.default)> select * from hivemysql;
++---++
| id |   name    |    born    |
++---++
| 1  | john doe  | 1999-02-02 |
| 2  | lisa holt | 1977-01-01 |
++---++
2 rows selected (3.979 seconds)

Thank you.




Re: question about ANY data type

2022-04-05 Thread James Turton
There are implicit type casting rules that allow Drill to cast 
automatically between certain types.  It is not special in this respect, 
you will find you can write the same sort of WHERE clause in many SQL 
engines.


On 2022/04/05 09:26, Wes Peng wrote:

thanks James.
For the queries below, I got the birth_date as varchar type, but why 
this column can be compared as date type?



apache drill (dfs.pyh)> select typeOf(birth_date) from foodmart limit 1 ;
+-+
| EXPR$0  |
+-+
| VARCHAR |
+-+
1 row selected (0.234 seconds)

apache drill (dfs.pyh)> select * from foodmart where birth_date > 
'1970-01-01';
+-++-++-+-+ 

| employee_id | full_name  | position_title  | 
birth_date | salary  |   education_level   |
+-++-++-+-+ 

| 9   | Brenda Blumberg    | Store Manager   | 
1979-06-23 | 17000.0 | Graduate Degree |
| 12  | Jewel Creek    | Store Manager   | 
1971-10-18 | 8500.0  | Graduate Degree |
| 13  | Peggy Medina   | Store Manager   | 
1975-10-12 | 15000.0 | Bachelors Degree    |



Regards



On 2022/4/5 2:37 下午, James Turton wrote:

select typeOf(employee_id) from foodmart limit 1




Re: question about ANY data type

2022-04-04 Thread James Turton
I think it's a Calcite data type that is used as a placeholder by Drill 
in late binding scenarios, i.e. when Drill doesn't yet know the actual 
type.  If you run


select typeOf(employee_id) from foodmart limit 1

you should see the resolved data type.  This data type is the one that 
will be matched against the AVG function.



On 2022/04/05 07:09, Wes Peng wrote:

what does ANY data type mean?

apache drill (dfs.pyh)> desc foodmart;
+-+---+-+
|   COLUMN_NAME   | DATA_TYPE | IS_NULLABLE |
+-+---+-+
| employee_id | ANY   | YES |
| full_name   | ANY   | YES |
| position_title  | ANY   | YES |
| birth_date  | ANY   | YES |
| salary  | ANY   | YES |
| education_level | ANY   | YES |
+-+---+-+
6 rows selected (0.534 seconds)


I saw this type can be used in aggregate functions. doesn't it need a 
cast() translation?


apache drill (dfs.pyh)> select avg(salary) as avgsal from foodmart;
++
|   avgsal   |
++
| 4019.6017316017314 |
++
1 row selected (0.452 seconds)


Thanks




Re: Question on hive storage plugin

2022-04-03 Thread James Turton

Not at all, thanks for sharing your solution.

On 2022/04/04 03:37, Wes Peng wrote:
I have resolved this issue. Just b/c I was using hive2 along with 
drill-1.20, which causes this problem. When I switched to hive3, the 
issue was resolved.


sorry for bothering.


On 2022/4/3 8:56, Wes Peng wrote:
  ERROR o.a.h.h.metastore.RetryingHMSHandler - HMSHandler Fatal 
error: MetaException(message:Version information not found in 
metastore.)




Re: 1.21.0-SNAPSHOT: Schema change not currently supported for schemas with complex types

2022-03-31 Thread James Turton

Hi again Daniel

Sorry everyone's so busy at the moment.  The best way to turn this into 
something a developer will work on is going to be to make it a small 
reproducible example in Jira ticket.  That should include some trivial 
Mongo datasets that have the right data types to reveal the problem and 
a query like the one below.


Regards
James

On 2022/03/09 21:21, Daniel Clark wrote:
I'm attempting to run this mongo query that ran successfully in Drill 
1.19 with the 1.21.0-SNAPSHOT build.


SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
  `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
  `Elements`.`ElementTypeName` AS `ElementTypeName`,
  `Elements`.`PlanID` AS `PlanID`
FROM `mongo.grounds`.`Elements` `Elements`
  INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON 
(`Elements`.`_id` = `Elements_Efforts`.`_id`)

WHERE (`Elements`.`PlanID` = '1623263140')
GROUP BY `Elements_Efforts`.`EffortTypeName`,
  `Elements`.`ElementSubTypeName`,
  `Elements`.`ElementTypeName`,
  `Elements`.`PlanID`

I'm getting this error message: UserRemoteException : SYSTEM ERROR: 
RuntimeException: Schema change not currently supported for schemas 
with complex types. I've attached both the log and the profile. Any 
tips, or suggestions will be greatly appreciated.


Meetup tomorrow!

2022-03-31 Thread James Turton

Hey all

Just a reminder that the first Friday of April lands tomorrow, bang on 1 
April!  I hope you can join us for another meetup.  So far in the agenda 
we have


- Introduction to the Drill test framework by Anton

let me know if you'd like to add anything else.

Regards
James


Re: Application for release manager of the next Drill release

2022-03-23 Thread James Turton

Hi Jingchuan Hu!

It's a long way off and I will leave this planning to the "elders" like 
Cong Luo and Charles but I do want to say that I will certainly offer 
you my support if you do manage the next release, since I have some 
experience now.


Regards
James

On 2022/03/20 17:15, Jingchuan Hu wrote:

Hi team,

I am here to apply for the role as the release manager of the next drill
release.

As a newcomer to the Drill community. I tried to help the community to get
known by more users through the Drill web-site Chinese version setup.
Committed several PRs for Drill bug fix. Also, helped to summarize the
keynotes of Drill online meetup for our community members to easily "Async"
with Drill activities. And I want to keep those contributions with better
quality in the future.

Over the six months after I joined the community, I am deeply impressed by
our community member's dedication and intelligence. No matter whether I
could be the release manager, I hope that I can grow with Drill.

If there are some suggestions for me, I would really appreciate it.

Sincerely,
Jingchuan





Re: 1.21.0-SNAPSHOT: Schema change not currently supported for schemas with complex types

2022-03-20 Thread James Turton

It's also surprising and worrying that this query took 24 minutes to plan.

On 2022/03/09 21:21, Daniel Clark wrote:
I'm attempting to run this mongo query that ran successfully in Drill 
1.19 with the 1.21.0-SNAPSHOT build.


SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
  `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
  `Elements`.`ElementTypeName` AS `ElementTypeName`,
  `Elements`.`PlanID` AS `PlanID`
FROM `mongo.grounds`.`Elements` `Elements`
  INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON 
(`Elements`.`_id` = `Elements_Efforts`.`_id`)

WHERE (`Elements`.`PlanID` = '1623263140')
GROUP BY `Elements_Efforts`.`EffortTypeName`,
  `Elements`.`ElementSubTypeName`,
  `Elements`.`ElementTypeName`,
  `Elements`.`PlanID`

I'm getting this error message: UserRemoteException : SYSTEM ERROR: 
RuntimeException: Schema change not currently supported for schemas 
with complex types. I've attached both the log and the profile. Any 
tips, or suggestions will be greatly appreciated.


Re: 1.21.0-SNAPSHOT: Schema change not currently supported for schemas with complex types

2022-03-20 Thread James Turton

Hi Daniel

Please refresh my memory, have we tried to run this query with all push 
downs disabled?


Regards
James

On 2022/03/09 21:21, Daniel Clark wrote:
I'm attempting to run this mongo query that ran successfully in Drill 
1.19 with the 1.21.0-SNAPSHOT build.


SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
  `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
  `Elements`.`ElementTypeName` AS `ElementTypeName`,
  `Elements`.`PlanID` AS `PlanID`
FROM `mongo.grounds`.`Elements` `Elements`
  INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON 
(`Elements`.`_id` = `Elements_Efforts`.`_id`)

WHERE (`Elements`.`PlanID` = '1623263140')
GROUP BY `Elements_Efforts`.`EffortTypeName`,
  `Elements`.`ElementSubTypeName`,
  `Elements`.`ElementTypeName`,
  `Elements`.`PlanID`

I'm getting this error message: UserRemoteException : SYSTEM ERROR: 
RuntimeException: Schema change not currently supported for schemas 
with complex types. I've attached both the log and the profile. Any 
tips, or suggestions will be greatly appreciated.


Re: insufficient memory

2022-03-12 Thread James Turton
That's not much memory for Drill but you can try setting 
DRILLBIT_MAX_PROC_MEM to 2GiB.


https://drill.apache.org/docs/configuring-drill-memory/

If you have a swap partition in the VM you could allocate more memory 
than physical RAM, but be aware that system performance can become 
significantly impacted by the IO rates of the device where you host the 
swap space.


On 2022/03/11 09:02, Bitfox wrote:

Hello

My VM has only 4gb memory, 2gb free for use.
When I run drill-embedded i got the error:

OpenJDK 64-Bit Server VM warning: INFO:
os::commit_memory(0x0007, 4294967296, 0) failed; error='Not
enough space' (errno=12)

#

# There is insufficient memory for the Java Runtime Environment to continue.

# Native memory allocation (mmap) failed to map 4294967296 bytes for
committing reserved memory.



How can I run it successfully?


Thanks





Apache Drill March Community Meetup recording

2022-03-04 Thread James Turton




 Forwarded Message 
Subject:Cloud Recording - Apache Drill Community Meetup is now available
Date:   Fri, 04 Mar 2022 17:24:33 + (UTC)
From:   Zoom 
To: ja...@datadistillr.com



Zoom Logo <https://zoom.us/?zcid=1640>

Hi James Turton,
Your cloud recording is now available.
Topic: Apache Drill Community Meetup
Date: Mar 4, 2022 07:55 AM Pacific Time (US and Canada)
	View Detail 
<https://datadistillr.zoom.us/recording/detail?meeting_id=ucFkPnDLRFWzZb4xazF3Ng%3D%3D> 
Share 
<https://datadistillr.zoom.us/recording/detail?meeting_id=ucFkPnDLRFWzZb4xazF3Ng%3D%3D&show_share=true> 
	


You can copy the recording information below and share with others

https://datadistillr.zoom.us/rec/share/GLwjAISSWyIF45ZxZUKb_JZ3cBLi6vEEGzLy8SIrzwmzkzjAJuwlUwr_h6jaLsnJ.ecfcFMA1NwTcmEd7 
Passcode: V0xT0Oc=


Twitter <https://twitter.com/zoom_us> 		LinkedIn 
<https://www.linkedin.com/company/zoom-video-communications/> 		Blog 
<http://blog.zoom.us/>


+1.888.799.9666 
© 2022 Zoom - All Rights Reserved

Visit zoom.us <http://zoom.us>
55 Almaden Blvd
San Jose, CA 95113 
<https://www.google.com/maps/place/55+Almaden+Blvd,+San+Jose,+CA+95113/@37.3328541,-121.897097,17z/data=!3m1!4b1!4m5!3m4!1s0x808fcca40adf3cb7:0x5a2d33d3593e0a33!8m2!3d37.3328541!4d-121.8949083> 



Re: March Community Meetup

2022-03-03 Thread James Turton
Tengfei has added offered to share the architecture of their realtime 
data warehouse built on Drill and I've learned there is also a chance 
that Vitalii and Anton will be able to connect.  See you this time tomorrow!



On 2022/03/03 10:07, James Turton wrote:
The next meetup comes around tomorrow.  I have one topic for 
discussion: "Who wants a 1.21 when there is a 2.0 that is certainly 
reachable, even if it is minimal one that has a major theme of 
cleaning out obsolete stuff?"


It's also a deeply disturbing time in world history right now, with 
direct impact on this community, and it's hard to act like things are 
in any way normal.  So I'll take guidance on whether we should skip 
the March meetup.


Thanks
James





March Community Meetup

2022-03-03 Thread James Turton
The next meetup comes around tomorrow.  I have one topic for discussion: 
"Who wants a 1.21 when there is a 2.0 that is certainly reachable, even 
if it is minimal one that has a major theme of cleaning out obsolete stuff?"


It's also a deeply disturbing time in world history right now, with 
direct impact on this community, and it's hard to act like things are in 
any way normal.  So I'll take guidance on whether we should skip the 
March meetup.


Thanks
James



[ANNOUNCE] Apache Drill 1.20.0 Released

2022-02-25 Thread James Turton

 Note from the release manager.
As we announce Drill 1.20 to the world I'd like to point to the fact 
that this release, and all of Drill, is full of valuable contributions 
from talented Ukrainian developers whom many of us know personally.  On 
behalf of the Drill community may I express once more our deepest 
concern for their well-being during the unfolding crisis in their homeland.



On behalf of the Apache Drill community, I am happy to announce the 
release of Apache Drill 1.20.0.


Drill is an Apache open-source SQL query engine for Big Data exploration.
Drill is designed from the ground up to support high-performance analysis
on the semi-structured and rapidly evolving data coming from modern Big
Data applications, while still providing the familiarity and ecosystem of
ANSI SQL, the industry-standard query language. Drill provides
plug-and-play integration with existing Apache Hive and Apache HBase
deployments.

For information about Apache Drill, and to get involved, visit the 
project website [1].


Total of 109 JIRA's are resolved in this release of Drill with the following
new features and improvements [2]:

[DRILL-1282] - Add read and write support for Parquet v2
[DRILL-7985] - Support Mongo aggregate, union, project, limit, sort 
pushdowns

[DRILL-8027] - Format plugin for Apache Iceberg
[DRILL-8073] - Add support for persistent table and storage aliases
[DRILL-8107] - Hadoop2 backport Maven profile
[DRILL-7969] - Add support for reading and writing Parquet files using 
Brotli, LZ4 and Zstandard codecs

[DRILL-7971] - Support Elasticsearch authentication
[DRILL-7988] - Add credentials provider support for API connections in 
HTTP plugin

[DRILL-7995] - Add ability to query OCI OS
[DRILL-8005] - Add Writer to JDBC Storage Plugin
[DRILL-8011] - Add Dropbox File System to Drill
[DRILL-8022] - Add Provided Schema Support for Excel Reader
[DRILL-8028] - Add PDF Format Plugin
[DRILL-8047] - Add a custom authn provider for HashiCorp Vault
[DRILL-8054] - Add SAS Format Plugin
[DRILL-8056] - Add OAuth2 Support for HTTP Rest Plugin
...

For the full list please see release notes [3].

The binary and source artifacts are available here [4].

Thanks to everyone in the community who contributed to this release!

1. https://drill.apache.org/
2. https://drill.apache.org/blog/2022/02/25/drill-1.20-released/
3. https://drill.apache.org/docs/apache-drill-1-20-0-release-notes/
4. https://drill.apache.org/download/


Re: [RESULT] [VOTE] Release Apache Drill 1.20.0 - RC5

2022-02-25 Thread James Turton
Tags and branches have been pushed, so I hereby unfreeze drill/master.  
The official release announcement will arrive once I see the download 
mirrors are up to date.


Looking back I think we should consider making the Hadoop 2 build 
something that is supported but must be built by end users, just because 
I didn't find any really natural way to include it in our release 
process.  It was certainly doable, but it's a little clunky.  Or maybe 
there are Maven plugin secrets that I don't know... something we can 
discuss before the next one.


On 2022/02/25 09:41, James Turton wrote:
The vote passes. Thanks to everyone who has tested the six(!) release 
candidates over last twenty(!) days and given their comments and 
votes. Final tally:


3x +1 (binding): Cong Luo, Charles, James

2x +1 (non-binding): Jinfeng Ni, Christian

No 0s or -1s.

I'll start process for pushing the release artifacts and send an 
announcement once propagated.


Kind regards
James




[RESULT] [VOTE] Release Apache Drill 1.20.0 - RC5

2022-02-24 Thread James Turton
The vote passes. Thanks to everyone who has tested the six(!) release 
candidates over last twenty(!) days and given their comments and votes. 
Final tally:


3x +1 (binding): Cong Luo, Charles, James

2x +1 (non-binding): Jinfeng Ni, Christian

No 0s or -1s.

I'll start process for pushing the release artifacts and send an 
announcement once propagated.


Kind regards
James


[VOTE] Release Apache Drill 1.20.0 - RC5

2022-02-22 Thread James Turton

Hi all

I'd like to propose the sixth release candidate (RC5) of Apache Drill, 
version 1.20.0 which differs from the previous RC in the following.


DRILL-8144: Cannot launch Drill 1.20 RC 4 on Windows (#2470)
DRILL-8143: Error querying json with $date field (#2469)
DRILL-8142: SAS Reader Returns NPE #2468

The release candidate covers a total of 122 resolved JIRAs [1]. Thanks 
to everyone who contributed to this release.


The tarball artifacts are hosted at [2][3] and the maven artifacts are 
hosted at [4].


This release candidate is based on commits 
d19878973ef6723250d231258f470340863ddc23 and 
20ff3778fd1a046272426178aeca671ed822d970 located at [5][6].


Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301&projectId=12313820

[2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc5/
[3] 
https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop2-rc5/ 
(Hadoop 2 build)

[4] https://repository.apache.org/content/repositories/orgapachedrill-1095/
[5] https://github.com/jnturton/drill/commits/drill-1.20.0
[6] https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop2 
(Hadoop 2 build)




Re: [VOTE] Release Apache Drill 1.20.0 - RC4

2022-02-22 Thread James Turton
RC5 has just been uploaded.  I'll add the Hadoop 2 build and send the 
usual email in the morning.  There are three fixes relating to Drill on 
Windows, ISO-8601 timestamps in JSON and SAS files with null values.


https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc5/

On 2022/02/17 20:53, James Turton wrote:

Hi all

I'd like to propose the fifth release candidate (RC4) of Apache Drill, 
version 1.20.0 which differs from the previous RC in the following.


DRILL-8139: Parquet CodecFactory thread safety bug  (#2463)
DRILL-8134: Cannot query Parquet INT96 columns as timestamps (#2460)
DRILL-8122: Change kafka metadata obtaining due to KAFKA-5697 (#2456)
DRILL-8137: Prevent reading union inputs after cancellation request 
(#2462)


The release candidate covers a total of 117 resolved JIRAs [1]. Thanks 
to everyone who contributed to this release.


The tarball artifacts are hosted at [2][3] and the maven artifacts are 
hosted at [4][5].


This release candidate is based on commits 
753bff39d8dd08eaa1273eadc20175d34a87e044 and 
9955d082bcdba401666799f49a6cd3c3f996af97 located at [6][7].


Please download and try out the release.

[ ] +1
[ ] +0
[ ] -1

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301&projectId=12313820

[2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc4/
[3] 
https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop2-rc4/ 
(Hadoop 2 build)
[4] 
https://repository.apache.org/content/repositories/orgapachedrill-1094/
[5] 
https://repository.apache.org/content/repositories/orgapachedrill-1095/ 
(Hadoop 2 build)

[6] https://github.com/jnturton/drill/commits/drill-1.20.0
[7] https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop2 
(Hadoop 2 build)




  1   2   >