Re: Json Schema change error

2021-11-22 Thread Vitalii Diravka
Hi Prabhakar Bhosaale,

Could you provide the test case with a data source to reproduce? I am going
to try it with a new JSON loader.

Kind regards
Vitalii


On Sun, Sep 26, 2021 at 4:12 PM Prabhakar Bhosaale 
wrote:

> Thanks Luoc. Will check on it.
>
>
> Regards
> Prabhakar
>
> On Sun, Sep 26, 2021 at 6:36 PM luoc  wrote:
>
> >
> > Hello Prabhakar,
> >   There may not be such detailed docs. But you can review the docs of
> > Drill internals and architecture.
> >   https://github.com/paul-rogers/drill/wiki#internals-topics
> >
> >
> > > On Sep 25, 2021, at 18:51, luoc  wrote:
> > >
> > > Hello Prabhakar,
> >
>


Re: [DISCUSSION] Roles and Privileges, Security, Secrets

2021-01-21 Thread Vitalii Diravka
The goal is to improve identity, authentication and authorization in Drill.
The specific cases are to manage Drill users and their permissions better,
to give them the ability to add/edit some Storage Plugin configs, and to set
System options for themselves only.
Possibly a separate UI page can be added for admins for that purpose.
Some existing gaps should also be closed, for instance storing user/password
pairs in plain text in plugin configs.

I agree that the complexity is magnified around Kerberos. Implementing a
separate security mechanism could be considered.
I know that Vault can work with Kerberos, so it looks like it should be
easy to integrate it into Drill. And any Drill identity, authentication and
authorization functionality can be built with this tool too.
I'll investigate SPIFFE together with (or instead of) Vault:
https://spiffe.io/docs/latest/spire/understand/comparisons/


Kind regards
Vitalii


On Thu, Jan 21, 2021 at 2:39 AM Ted Dunning  wrote:

> I think that pushing too much of this kind of authentication and
> authorization logic into Drill has a large complexity risk. Anything to do
> with kerberos magnifies that complexity.
>
> I also think that it is a mistake to depend on user identity if
> authorization tokens are likely to need to be embedded in scripts and such.
> Identity that is inherited can work that way, but identity that has to be
> given to a script should use an alternative intended for workload
> authorization such as SPIFFE.
>
> Is there a reason that most or all of this couldn't be handled by storing
> the configuration in files? That would allow file permissions to naturally
> allow or disallow these operations.
>
> Also, what are the specific goals here?
>
>
>
> On Wed, Jan 20, 2021 at 3:34 PM Vitalii Diravka 
> wrote:
>
> > Hi Dev and User,
> >
> > Drill has a very important feature - Roles and Privileges [1], but it has
> > really weak functionality. There are only two roles (admin and user) and
> > an admin can't really give a user permission to set query options for all
> > their sessions or to allow configuring storage plugins differently, etc.
> >
> > I think it is necessary to make this functionality broader: introduce a
> > middle layer of user-system options, the ability for users to change some
> > Storage Plugin configs, possibly permission for UDF creation, etc. The
> > main thing is that this functionality requires good support for management
> > of users and their secrets (credentials).
> >
> > There is a very good tool - HashiCorp Vault [2], which can provide Drill
> > with a mechanism to store secrets in a safe manner and to deliver the
> > secrets via a token mechanism to the proper users, and it can be
> > integrated with Kerberos and SPNEGO.
> >
> > What do you think? Can we integrate Drill with Vault or not, and what are
> > the additional pros and cons of this decision? If it is a good decision, I
> > can start preparing a design for this functionality.
> >
> >
> > [1] https://drill.apache.org/docs/roles-and-privileges/
> > [2] https://www.vaultproject.io/
> >
> > Kind regards
> > Vitalii
> >
>


[DISCUSSION] Roles and Privileges, Security, Secrets

2021-01-20 Thread Vitalii Diravka
Hi Dev and User,

Drill has a very important feature - Roles and Privileges [1], but it has
really weak functionality. There are only two roles (admin and user), and
an admin can't really give a user permission to set query options for all
their sessions or to allow configuring storage plugins differently, etc.
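For context, a minimal sketch of what the current two-role model allows
(the option names are from the Roles and Privileges docs [1]; the user and
group names here are hypothetical):

-- Admins are defined by membership in these system options:
ALTER SYSTEM SET `security.admin.users` = 'alice,bob';
ALTER SYSTEM SET `security.admin.user_groups` = 'drill-admins';
-- A non-admin user can only change options for their own session:
ALTER SESSION SET `planner.enable_hashjoin` = false;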

I think it is necessary to make this functionality broader: introduce a
middle layer of user-system options, the ability for users to change some
Storage Plugin configs, possibly permission for UDF creation, etc. The main
thing is that this functionality requires good support for management of
users and their secrets (credentials).

There is a very good tool - HashiCorp Vault [2], which can provide Drill with
a mechanism to store secrets in a safe manner and to deliver the secrets via
a token mechanism to the proper users, and it can be integrated with Kerberos
and SPNEGO.

What do you think? Can we integrate Drill with Vault or not, and what are the
additional pros and cons of this decision? If it is a good decision, I can
start preparing a design for this functionality.


[1] https://drill.apache.org/docs/roles-and-privileges/
[2] https://www.vaultproject.io/

Kind regards
Vitalii


Re: Slow query execution in Drill Embedded

2021-01-20 Thread Vitalii Diravka
Hello Jonathan,

Did you try to examine both the logical and physical plans? Does the query
involve any aggregation, join or sort operators?
Could you provide the query here, please? Did you try to exclude some
fields from the query (to identify which fields cause the delay)?
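For reference, a minimal sketch of how to capture both plans (the table
path here is a placeholder for your Parquet data):

-- Physical plan:
EXPLAIN PLAN FOR SELECT * FROM s3.`/path/to/table`;
-- Logical plan only:
EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR SELECT * FROM s3.`/path/to/table`;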

Kind regards
Vitalii


On Sat, Jan 2, 2021 at 8:55 AM Jonathan Shraga
 wrote:

> Hello,
> Release: 1.18
> Evaluating Drill for a large business reporting project. Using
> drill-embedded to query Parquet data hosted in S3. Query execution on small
> objects/tables (<200K) takes about 9 sec. Looking at query execution
> statistics, about 7-8 seconds are spent on query planning alone. Enabled the
> metastore (Iceberg), ran ANALYZE and REFRESH METADATA for the table. Browsing
> the metadata store directory shows the table's metadata files (parquet
> format), and querying INFORMATION_SCHEMA works as expected.
> However, query execution remains slow and running EXPLAIN still takes 7-8
> seconds.
> Any tips will be greatly appreciated.
> - Jonathan


Re: [ANNOUNCE] New Committer: Denys Ordynskiy

2020-01-03 Thread Vitalii Diravka
Congrats Denys! Glad to know you became a committer! Happy New Year :)

Kind regards
Vitalii


On Fri, Jan 3, 2020 at 10:11 AM Ellen Friedman 
wrote:

> Great way to start 2020! Congratulations Denys and thank you for all your
> contributions to Drill.
>
> Best wishes,
> Ellen
>
> On Mon, Dec 30, 2019 at 4:25 AM Arina Ielchiieva  wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited Denys
> > Ordynskiy to become a committer, and we are pleased to announce that he
> has
> > accepted.
> >
> > Denys has been contributing to Drill for more than a year. He made many
> > contributions as a QA: he found, tested and verified important bugs and
> > features. Recently he has actively participated in Hadoop 3 migration
> > verification and actively tested current and previous releases. He also
> > contributed to drill-test-framework to automate Drill tests.
> >
> > Welcome Denys, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>


Re: [ANNOUNCE] New Committer: Anton Gozhyi

2019-07-29 Thread Vitalii Diravka
My congratulations Anton! You deserved it!

Kind regards
Vitalii


On Mon, Jul 29, 2019 at 1:13 PM Arina Ielchiieva  wrote:

> Congratulations Anton! Thanks for your contributions.
>
> Kind regards,
> Arina
>
> On Mon, Jul 29, 2019 at 12:55 PM Павел Семенов  >
> wrote:
>
> > Congratulations Anton ! Well done.
> >
> > On Mon, Jul 29, 2019 at 12:54, Bohdan Kazydub :
> >
> > > Congratulations Anton!
> > >
> > > On Mon, Jul 29, 2019 at 12:44 PM Igor Guzenko <
> > ihor.huzenko@gmail.com>
> > > wrote:
> > >
> > > > Congratulations Anton!
> > > >
> > > > On Mon, Jul 29, 2019 at 12:09 PM denysord88 
> > > wrote:
> > > >
> > > > > Congratulations Anton! Well deserved!
> > > > >
> > > > > On 07/29/2019 12:02 PM, Volodymyr Vysotskyi wrote:
> > > > > > The Project Management Committee (PMC) for Apache Drill has invited
> > > > > > Anton Gozhyi to become a committer, and we are pleased to announce
> > > > > > that he has accepted.
> > > > > >
> > > > > > Anton Gozhyi has been contributing to Drill for more than a year and
> > > > > > a half. He made significant contributions as a QA, including
> > > > > > reporting non-trivial issues and working on automation of Drill
> > > > > > tests. All the issues reported by Anton have a clear description of
> > > > > > the problem, steps to reproduce, and expected behavior. Besides his
> > > > > > contributions as a QA, Anton made high-quality fixes to Drill.
> > > > > >
> > > > > > Welcome Anton, and thank you for your contributions!
> > > > > >
> > > > > > - Volodymyr
> > > > > > (on behalf of Drill PMC)
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Kind regards,
> > Pavel Semenov
> >
>


Re: [ANNOUNCE] New Committer: Bohdan Kazydub

2019-07-15 Thread Vitalii Diravka
Congrats Bohdan! Well deserved!

Kind regards
Vitalii


On Mon, Jul 15, 2019 at 6:48 PM Paul Rogers 
wrote:

> Congrats Bohdan!
> - Paul
>
>
>
> On Monday, July 15, 2019, 01:08:04 AM PDT, Arina Ielchiieva <
> ar...@apache.org> wrote:
>
>  The Project Management Committee (PMC) for Apache Drill has invited Bohdan
> Kazydub to become a committer, and we are pleased to announce that he has
> accepted.
>
> Bohdan has been contributing to Drill for more than a year. His
> contributions include
> logging and various function-handling improvements, planning optimizations,
> and S3 improvements / fixes. His recent work includes Calcite 1.19 / 1.20
> [DRILL-7200] and the implementation of the canonical Map [DRILL-7096].
>
> Welcome Bohdan, and thank you for your contributions!
>
> - Arina
> (on behalf of the Apache Drill PMC)
>


Re: [VOTE] Apache Drill Release 1.16.0 - RC0

2019-04-19 Thread Vitalii Diravka
Arina/Anton, thanks for catching DRILL-7186.

Sorabh, I agree with you that DRILL-7190 should be resolved and included in
the 1.16.0 Drill release.
I will provide a PR shortly.

Kind regards
Vitalii


On Fri, Apr 19, 2019 at 8:05 PM SorabhApache  wrote:

> Hi Anton/Arina,
> Thanks for finding the issue and fixing it. I have some follow-up questions
> for the original change, which I have posted in DRILL-7186;
> it would be great to clarify those as well.
>
> Thanks,
> Sorabh
>
> On Fri, Apr 19, 2019 at 9:54 AM Arina Yelchiyeva <
> arina.yelchiy...@gmail.com>
> wrote:
>
> > Fix is ready: https://github.com/apache/drill/pull/1757
> >
> > > On Apr 19, 2019, at 2:36 PM, Anton Gozhiy  wrote:
> > >
> > > Reported a regression:
> > > https://issues.apache.org/jira/browse/DRILL-7186
> > >
> > > On Fri, Apr 19, 2019 at 2:10 AM Bob Rudis  wrote:
> > >
> > >> Thx Sorabh!
> > >>
> > >> On Thu, Apr 18, 2019 at 16:23 SorabhApache  wrote:
> > >>
> > >>> Hi Bob,
> > >>> With the protobuf change, both JDBC and ODBC will need to be updated, but
> > >>> for the 1.16 release this change was reverted, since it will take some time
> > >>> to prepare drivers with the latest protobuf versions. In the original JIRA
> > >>> there is a comment stating that the commit is reverted on the 1.16 branch;
> > >>> the commit id will be added once the release branch is finalized.
> > >>>
> > >>> JIRA: https://issues.apache.org/jira/browse/DRILL-6642
> > >>> Commit that reverts the change [1]:
> > >>> 6eedd93dadf6d7d4f745f99d30aee329976c2191
> > >>>
> > >>> [1]: https://github.com/sohami/drill/commits/drill-1.16.0
> > >>>
> > >>> Thanks,
> > >>> Sorabh
> > >>>
> > >>> On Thu, Apr 18, 2019 at 1:12 PM Bob Rudis  wrote:
> > >>>
> >  Q abt the RC (and eventual full release): Does
> >  https://issues.apache.org/jira/browse/DRILL-5509 mean that the ODBC
> >  drivers will need to be updated to avoid warning messages (and
> >  potential result set errors) as was the case with a previous
> >  release or is this purely (as I read the ticket) a "make maven less
> >  unhappy" and no underlying formats are changing? If it does mean the
> >  ODBC drivers need changing, any chance there's a way to ensure the
> >  provider of those syncs it closer to release than the last time?
> > 
> >  On Thu, Apr 18, 2019 at 3:06 PM SorabhApache 
> > >> wrote:
> > >
> > > Hi Drillers,
> > > I'd like to propose the first release candidate (RC0) for the
> Apache
> >  Drill,
> > > version 1.16.0.
> > >
> > > The RC0 includes a total of 211 resolved JIRAs [1].
> > > Thanks to everyone for their hard work to contribute to this
> release.
> > >
> > > The tarball artifacts are hosted at [2] and the maven artifacts are
> >  hosted
> > > at [3].
> > >
> > > This release candidate is based on commit
> > > 61cd2b779d85ff1e06947327fca2e076994796b5 located at [4].
> > >
> > > Please download and try out the release.
> > >
> > > The vote ends at 07:00 PM UTC (12:00 PM PDT, 10:00 PM EET, 12:30 AM
> > > IST(next day)), Apr 23rd, 2019
> > >
> > > [ ] +1
> > > [ ] +0
> > > [ ] -1
> > >
> > > Here is my vote: +1
> > >
> > >  [1]
> > >
> > 
> > >>>
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12344284
> > >  [2] http://home.apache.org/~sorabh/drill/releases/1.16.0/rc0/
> > >  [3]
> > >
> > >>>
> > https://repository.apache.org/content/repositories/orgapachedrill-1066/
> > >  [4] https://github.com/sohami/drill/commits/drill-1.16.0
> > >
> > > Thanks,
> > > Sorabh
> > 
> > >>>
> > >>
> > >
> > >
> > > --
> > > Sincerely, Anton Gozhiy
> > > anton5...@gmail.com
> >
> >
>


Re: Drill and OData

2019-04-02 Thread Vitalii Diravka
Hi Kevin!

It is feasible. Any Drill storage or format plugin can serve as an example
for you.
Somewhere I saw initial work on a Drill REST plugin. Earlier Charles Givre
posted an email about the REST plugin; I have added the info from that mail
below.
Charles, could you share your findings in this area with Kevin?

[DISCUSS]: Storage Plugin Development

All,
> I want to thank everyone for the discussion on Drill Developers Day about
> storage plugin development.  I was thinking about this, and there is a
> storage plugin out there which would enable Drill to query REST interfaces
> that return JSON data.  I’d like to propose that we as a community complete
> work on this plugin and get it incorporated into Drill.
> My logic is that REST APIs are easy to understand—you don’t have to have
> an in-depth understanding of their internals to get data—and that this
> could be a “template" for future storage plugin development.
> Clearly, I would like for us to develop simpler ways of developing storage
> plugins, however, I think completing this might be a good first step so
> that more people can understand the mechanisms of how storage plugins
>  work.
> Thoughts?


Kind regards
Vitalii


On Tue, Apr 2, 2019 at 8:07 PM Kevin D. Falkenstein <
kevin.falkenst...@t-online.de> wrote:

> Dear developers,
>
> Is there a plan to add OData support for Drill? If not, would you consider
> this to be technically feasible? Would it be hard to develop such a plug-in
> for someone with medium Java skills?
>
> Looking forward to your answer.
>
> Kind regards
> Kevin
>


Re: Cyrillic names for views

2019-03-05 Thread Vitalii Diravka
Hi Nick,

Looks like a Cyrillic view name works fine for me:

0: jdbc:drill:zk=local> create view МояВьюха as select employee_id,
full_name from cp.`employee.json` limit 2;
+---+---+
|  ok   |  summary  |
+---+---+
| true  | View 'МояВьюха' created successfully in 'dfs.tmp' schema  |
+---+---+
1 row selected (0.125 seconds)
0: jdbc:drill:zk=local> describe МояВьюха;
+--++--+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--++--+
| employee_id  | ANY| YES  |
| full_name| ANY| YES  |
+--++--+
2 rows selected (0.241 seconds)
0: jdbc:drill:zk=local> select * from МояВьюха;
+--+--+
| employee_id  |full_name |
+--+--+
| 1| Sheri Nowmer |
| 2| Derrick Whelply  |
+--+--+
2 rows selected (0.173 seconds)
0: jdbc:drill:zk=local>

My locale is:
vitalii@vitalii-pc:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
vitalii@vitalii-pc:~$

Kind regards
Vitalii


On Tue, Mar 5, 2019 at 12:25 PM Nick Gressky 
wrote:

> Good afternoon!
>
> I tried to create a view with a Cyrillic name, but all the symbols were
> transformed into '?'. Is there any way to add Russian locale support?
>
> I'm using Apache Drill 1.15
>
> Thanks in advance
>


Re: Import drill sources in eclipse

2019-02-21 Thread Vitalii Diravka
Hi Angelo,

Welcome to the Drill community.

Most Drill devs use IntelliJ IDEA. When I used Eclipse for the Drill
project, I didn't have this issue.
But it looks like there are several ways to solve it:
* You can compile the project externally and then import it into Eclipse.
* You can fix it in the Drill root pom file in org.eclipse.m2e:lifecycle-mapping [1]
and make your first contribution :)
* You can edit your Lifecycle Mappings Eclipse configs [2].
All of the above approaches are described in [2].

Thanks

[1] https://github.com/apache/drill/blob/master/pom.xml#L810
[2]
https://stackoverflow.com/questions/30642630/artifact-has-not-been-packaged-yet

Kind regards
Vitalii


On Thu, Feb 21, 2019 at 5:40 PM Angelo Mantellini 
wrote:

> Hi,
> I want to try to partecipate to the development of drill.
> My problem is that when I try to import the maven project (I select all
> pom.xml in the drill root dir), I have a list of 440 errors.
> For example
> Description ResourcePathLocationType
> Class cannot be resolved to a type
>  ReadersInitializer.java
> /drill-storage-hive-core/src/main/java/org/apache/drill/exec/store/hive/readers/initilializers
> line 80 Java Problem
>
> Description ResourcePathLocationType
> Artifact has not been packaged yet. When used on reactor artifact, unpack
> should be executed after packaging: see MDEP-98.
> (org.apache.maven.plugins:maven-dependency-plugin:3.1.1:unpack:unpack-vector-types:initialize)
>
> org.apache.maven.plugin.MojoExecutionException: Artifact has not been
> packaged yet. When used on reactor artifact, unpack should be executed
> after packaging: see MDEP-98.
> at
> org.apache.maven.plugins.dependency.AbstractDependencyMojo.unpack(AbstractDependencyMojo.java:250)
> at
> org.apache.maven.plugins.dependency.fromConfiguration.UnpackMojo.unpackArtifact(UnpackMojo.java:128)
> at
> org.apache.maven.plugins.dependency.fromConfiguration.UnpackMojo.doExecute(UnpackMojo.java:107)
> at
> org.apache.maven.plugins.dependency.AbstractDependencyMojo.execute(AbstractDependencyMojo.java:143)
> at
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
> at
> org.eclipse.m2e.core.internal.embedder.MavenImpl.execute(MavenImpl.java:331)
> at
> org.eclipse.m2e.core.internal.embedder.MavenImpl.lambda$7(MavenImpl.java:1342)
> at
> org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:177)
> at
> org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:112)
> at
> org.eclipse.m2e.core.internal.embedder.MavenImpl.execute(MavenImpl.java:1341)
> at
> org.eclipse.m2e.core.project.configurator.MojoExecutionBuildParticipant.build(MojoExecutionBuildParticipant.java:52)
> at
> com.ianbrandt.tools.m2e.mdp.core.MdpBuildParticipant.executeMojo(MdpBuildParticipant.java:133)
> at
> com.ianbrandt.tools.m2e.mdp.core.MdpBuildParticipant.build(MdpBuildParticipant.java:67)
> at
> org.eclipse.m2e.core.internal.builder.MavenBuilderImpl.build(MavenBuilderImpl.java:137)
> at
> org.eclipse.m2e.core.internal.builder.MavenBuilder$1.method(MavenBuilder.java:173)
> at
> org.eclipse.m2e.core.internal.builder.MavenBuilder$1.method(MavenBuilder.java:1)
> at
> org.eclipse.m2e.core.internal.builder.MavenBuilder$BuildMethod$1$1.call(MavenBuilder.java:116)
> at
> org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:177)
> at
> org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:112)
> at
> org.eclipse.m2e.core.internal.builder.MavenBuilder$BuildMethod$1.call(MavenBuilder.java:106)
> at
> org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:177)
> at
> org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:151)
> at
> org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:99)
> at
> org.eclipse.m2e.core.internal.builder.MavenBuilder$BuildMethod.execute(MavenBuilder.java:87)
> at
> org.eclipse.m2e.core.internal.builder.MavenBuilder.build(MavenBuilder.java:201)
> at
> org.eclipse.core.internal.events.BuildManager$2.run(BuildManager.java:798)
> at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
> at
> org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:219)
> at
> org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:262)
> at
> org.eclipse.core.internal.events.BuildManager$1.run(BuildManager.java:315)
> at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
> at
> 

Re: [DISCUSS] Format plugins in contrib module

2019-02-05 Thread Vitalii Diravka
Absolutely agree with Arina.

I think the core format plugins for Parquet, JSON and CSV, TSV, PSV files
(which are used for creating Drill tables) can be left in the current config
file, and the rest should be factored out into separate config files along
with creating separate modules in Drill's contrib module.

Therefore the process of creating new plugins will be more transparent.

Kind regards
Vitalii


On Tue, Feb 5, 2019 at 3:12 PM Charles Givre  wrote:

> I’d concur with Arina’s suggestion.  I do think this would be useful and
> make it easier to make plugins “pluggable”.
> In the meantime, should we recommend that developers of format-plugins
> include their plugins in the bootstrap-storage-plugins.json?  I was
> thinking also that we might want to have some guidelines for unit tests for
> format plugins.  I’m doing some work on the HTTPD format plugin and found
> some issues which cause it to throw NPEs.
> — C
>
>
> > On Feb 5, 2019, at 06:40, Arina Yelchiyeva 
> wrote:
> >
> > Hi all,
> >
> > Previously we were adding new formats / plugins into the exec module.
> > Eventually we came to the point that the exec package size is growing, and
> > plugin and format contributions are better separated out into a different
> > module.
> > Now we have the contrib module where we add such contributions. Plugins are
> > pluggable; they are added automatically by means of having a
> > drill-module.conf file which points to the scanning packages.
> > Format plugins use the same approach; the only problem is that
> > they are not added into bootstrap-storage-plugins.json. So when adding a new
> > format plugin, in order for it to automatically appear in the Drill Web UI,
> > the developer has to update the bootstrap file, which is in the exec module.
> > My suggestion is that we implement some functionality that would merge the
> > format config with the bootstrap one. For example, each plugin would have a
> > bootstrap-format.json file with the information about the plugin to which the
> > format should be added (structure the same as in bootstrap-storage-plugins.json):
> > Example:
> >
> > {
> >   "storage": {
> >     dfs: {
> >       formats: {
> >         "msgpack": {
> >           type: "msgpack",
> >           extensions: [ "mp" ]
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > Then during Drill startup such bootstrap-format.json files will be
> > merged with bootstrap-storage-plugins.json.
> >
> >
> > Current open PR for adding new format plugins:
> > Format plugin for LTSV files - https://github.com/apache/drill/pull/1627
> > SYSLOG (RFC-5424) Format Plugin -
> https://github.com/apache/drill/pull/1530
> > Msgpack format reader - https://github.com/apache/drill/pull/1500
> >
> > Any suggestions?
> >
> > Kind regards,
> > Arina
>
>


Re: January Apache Drill board report

2019-01-31 Thread Vitalii Diravka
+1

Kind regards
Vitalii


On Thu, Jan 31, 2019 at 6:18 PM Aman Sinha  wrote:

> Thanks for putting this together, Arina.
> The Drill Developer Day and Meetup were separate events, so you can split
> them up.
>   - A half-day Drill Developer Day was held on Nov 14. A variety of
> technical design issues were discussed.
>   - A Drill user meetup was held on the same evening. Two presentations were
> given - one on a use case for Drill and one about indexing support in Drill.
>
> Rest of the report LGTM.
>
> -Aman
>
>
> On Thu, Jan 31, 2019 at 7:58 AM Arina Ielchiieva  wrote:
>
> > Hi all,
> >
> > please take a look at the draft board report for the last quarter and let
> > me know if you have any comments.
> >
> > Thanks,
> > Arina
> >
> > =
> >
> > ## Description:
> >  - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
> > Storage.
> >
> > ## Issues:
> >  - There are no issues requiring board attention at this time.
> >
> > ## Activity:
> >  - Since the last board report, Drill has released version 1.15.0,
> >including the following enhancements:
> >- Add capability to do index based planning and execution
> >- CROSS join support
> >- INFORMATION_SCHEMA FILES and FUNCTIONS were added
> >- Support for TIMESTAMPADD and TIMESTAMPDIFF functions
> >- Ability to secure znodes with custom ACLs
> >- Upgrade to SQLLine 1.6
> >- Parquet filter pushdown for VARCHAR and DECIMAL data types
> >- Support JPPD (Join Predicate Push Down)
> >- Lateral join functionality was enabled by default
> >- Multiple Web UI improvements to simplify the use of options and
> submit
> > queries
> >- Query performance with the semi-join functionality was improved
> >- Support for aliases in the GROUP BY clause
> >- Options to return null for empty strings and to prevent Drill from
> >  returning a result set for DDL statements
> >- Storage plugin names became case-insensitive
> >
> > - Drill developer meet up was held on November 14, 2018.
> >
> > ## Health report:
> >  - The project is healthy. Development activity
> >as reflected in the pull requests and JIRAs is good.
> >  - Activity on the dev and user mailing lists are stable.
> >  - Three committers were added in the last period.
> >
> > ## PMC changes:
> >
> >  - Currently 23 PMC members.
> >  - No new PMC members added in the last 3 months
> >  - Last PMC addition was Charles Givre on Mon Sep 03 2018
> >
> > ## Committer base changes:
> >
> >  - Currently 51 committers.
> >  - New committers:
> > - Hanumath Rao Maduri was added as a committer on Thu Nov 01 2018
> > - Karthikeyan Manivannan was added as a committer on Fri Dec 07 2018
> > - Salim Achouche was added as a committer on Mon Dec 17 2018
> >
> > ## Releases:
> >
> >  - 1.15.0 was released on Mon Dec 31 2018
> >
> > ## Mailing list activity:
> >
> >  - d...@drill.apache.org:
> > - 415 subscribers (down 12 in the last 3 months):
> > - 2066 emails sent to list (2653 in previous quarter)
> >
> >  - iss...@drill.apache.org:
> > - 18 subscribers (up 0 in the last 3 months):
> > - 2480 emails sent to list (3228 in previous quarter)
> >
> >  - user@drill.apache.org:
> > - 592 subscribers (down 5 in the last 3 months):
> > - 249 emails sent to list (310 in previous quarter)
> >
> >
> > ## JIRA activity:
> >
> >  - 196 JIRA tickets created in the last 3 months
> >  - 171 JIRA tickets closed/resolved in the last 3 months
> >
>


Re: N4}[48H$DHWIOXNQQB%~$GD

2019-01-17 Thread Vitalii Diravka
Hi!

Do you use UNION type (is "exec.enable_union_type" enabled)?
This feature is experimental:
https://drill.apache.org/docs/json-data-model/#experimental-feature-heterogeneous-types
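If it is enabled, here is a minimal sketch of how to inspect the runtime
types and disable the feature for a session (the table and column names are
placeholders for your data):

-- Show which runtime type each row of the suspect column carries:
SELECT typeof(`name`) FROM mongo.`db`.`collection` LIMIT 10;
-- Fall back to non-union vectors for this session:
ALTER SESSION SET `exec.enable_union_type` = false;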

Kind regards
Vitalii


On Thu, Jan 17, 2019 at 9:52 PM 陈。  wrote:

>
>
>
> -- Original Message --
> From: "陈。";
> Sent: Thursday, January 17, 2019, 10:52 AM
> To: "user-subscribe";
> Subject: N4}[48H$DHWIOXNQQB%~$GD
>
> I don't know if it's due to an empty result.
> The query SQL:
> select `6a993905-f23e-4f1a-967a-0d7504e64740`.`委托任务名称` as
> `委托任务名称`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`委托任务编号` as
> `委托任务编号`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`委托分类` as
> `委托分类`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`预估金额(万元)` as
> `预估金额(万元)`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`预估工作量` as
> `预估工作量`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`甲方` as
> `甲方`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`甲方联系人` as
> `甲方联系人`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`联系电话` as
> `联系电话`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`主责部门` as
> `主责部门`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`委托日期` as
> `委托日期`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`约定完工日期` as
> `约定完工日期`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`状态` as
> `状态`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`附件` as
> `附件`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`备注` as
> `备注`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`登记人` as
> `登记人`,`6a993905-f23e-4f1a-967a-0d7504e64740`.`登记时间` as
> `登记时间`,`30cf7541-fbed-4a42-acea-86e46822b31a`.`合同编号` as
> `合同编号`,`30cf7541-fbed-4a42-acea-86e46822b31a`.`合同名称` as
> `合同名称`,`30cf7541-fbed-4a42-acea-86e46822b31a`.`项目类型(小类)` as
> `项目类型(小类)`,`30cf7541-fbed-4a42-acea-86e46822b31a`.`项目类型(大类)` as
> `项目类型(大类)`,`30cf7541-fbed-4a42-acea-86e46822b31a`.`合同额(万元)` as
> `合同额(万元)`,`5eb9d0e4-0f63-44cf-b1d2-29228a423db3`.`name` as
> `name`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`ID` as
> `ID`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`已采` as
> `已采`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`已发` as
> `已发`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`电商平台` as
> `电商平台`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`平台` as
> `平台`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`标题` as
> `标题`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`评论` as
> `评论`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`时间` as
> `时间`,`fbd15f17-9fb3-4283-a6b5-89c47014a627`.`PAGEURL` as
> `PAGEURL`,`b63343f4-a65c-47f2-8e70-ae48c85d181d`.`省名称` as
> `省名称`,`b63343f4-a65c-47f2-8e70-ae48c85d181d`.`省简称` as
> `省简称`,`b63343f4-a65c-47f2-8e70-ae48c85d181d`.`所属国家` as
> `所属国家`,`b63343f4-a65c-47f2-8e70-ae48c85d181d`.`省编号` as
> `省编号`,`b63343f4-a65c-47f2-8e70-ae48c85d181d`.`描述` as
> `描述`,`b63343f4-a65c-47f2-8e70-ae48c85d181d`.`外键` as `外键` from (select
> `附件`,`甲方联系人`,`约定完工日期`,`状态`,`登记时间`,`登记人`,`委托分类`,`甲方`,`备注`,`委托日期`,`预估工作量`,`联系电话`,`预估金额(万元)`,`主责部门`,`委托任务编号`,`委托任务名称`
> from mongo.`lzdataplatformbusiness`.`dP_OperationControlCooperation` where
> leadingDPWorkTableId='351501113310052353')
> `6a993905-f23e-4f1a-967a-0d7504e64740` left join(select
> `项目类型(大类)`,`合同额(万元)`,`项目类型(小类)`,`合同编号`,`合同名称` from
> mongo.`lzdataplatformbusiness`.`dP_Mysql` where
> leadingDPWorkTableId='351501113343606784')
> `30cf7541-fbed-4a42-acea-86e46822b31a`  on cast(
> `6a993905-f23e-4f1a-967a-0d7504e64740`.`委托任务名称` as varchar)=cast(
> `30cf7541-fbed-4a42-acea-86e46822b31a`.`合同编号` as varchar) left join(select
> `name` from mongo.`lzdataplatformbusiness`.`dP_Sqlserver` where
> leadingDPWorkTableId='351501113364578304')
> `5eb9d0e4-0f63-44cf-b1d2-29228a423db3`  on cast(
> `30cf7541-fbed-4a42-acea-86e46822b31a`.`合同编号` as varchar)=cast(
> `5eb9d0e4-0f63-44cf-b1d2-29228a423db3`.`name` as varchar) left join(select
> `时间`,`已发`,`平台`,`PAGEURL`,`评论`,`ID`,`标题`,`电商平台`,`已采` from
> mongo.`lzdataplatformbusiness`.`dP_Oracle` where
> leadingDPWorkTableId='351501113389744128')
> `fbd15f17-9fb3-4283-a6b5-89c47014a627`  on cast(
> `5eb9d0e4-0f63-44cf-b1d2-29228a423db3`.`name` as varchar)=cast(
> `fbd15f17-9fb3-4283-a6b5-89c47014a627`.`ID` as varchar) left join(select
> `外键`,`所属国家`,`省编号`,`省名称`,`描述`,`省简称` from
> mongo.`lzdataplatformbusiness`.`dP_Excel` where
> leadingDPWorkTableId='351501113284886529')
> `b63343f4-a65c-47f2-8e70-ae48c85d181d`  on cast(
> `fbd15f17-9fb3-4283-a6b5-89c47014a627`.`ID` as varchar)=cast(
> `b63343f4-a65c-47f2-8e70-ae48c85d181d`.`省名称` as varchar)
>
> Exception
> java.util.concurrent.ExecutionException: java.lang.IllegalStateException:
> Failure while reading vector.  Expected vector class of
> org.apache.drill.exec.vector.NullableVarCharVector but was holding vector
> class org.apache.drill.exec.vector.complex.UnionVector, field= [`name`
> (UNION:OPTIONAL), subtypes=([VARCHAR, INT]), children=([`internal`
> (MAP:REQUIRED), children=([`types` (UINT1:REQUIRED)])])]
> at
> org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$PartitionerTask.run(PartitionerDecorator.java:347)
> ~[drill-java-exec-1.14.0.jar:1.14.0]
> at
> com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
> ~[guava-18.0.jar:na]
> at
> 

Re: Problem when using files with differents schemas in the same SELECT

2019-01-04 Thread Vitalii Diravka
Hi Benj,

This is a known issue with a column data type without values: by default it
is INT:OPTIONAL (for your last query), and when meaningful data comes in,
that INT:OPTIONAL is converted to the new data type.
It was discussed very frequently in different topics. Paul raised this
topic in some mail threads, for instance "Possible way to specify column
types in query" [1].
One of the solutions is to specify the schema before reading the data; it
can be done in the query, with a schema file, or with the Drill Metastore. But
all the mentioned approaches are under development [2], [3].
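Until then, a common workaround is to pin the types yourself with explicit
casts, so every file yields the same column types (a sketch against your
tables; the target types are assumptions):

SELECT CAST(myc0 AS INT)     AS myc0,
       CAST(myc2 AS VARCHAR) AS myc2,
       CAST(myc3 AS VARCHAR) AS myc3
FROM tmp2.`mytable*`;

This keeps the result types stable across runs, although it does not by
itself fix the random values you observed for the missing columns.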

You can create a Jira ticket for this issue, since it is a good use case
for showing the problem. Also, possibly some improvements can be done here,
for instance showing missing string values as null rather than as an empty
string.

each of these SELECT can sometimes return "Error: SYSTEM ERROR:
> NullPointerException".

For the NPE, did you check the stacktrace? Please specify it in Jira as
well.

[1] http://mail-archives.apache.org/mod_mbox/drill-dev/201809.mbox/browser
[2] https://issues.apache.org/jira/browse/DRILL-6552
[3] https://issues.apache.org/jira/browse/DRILL-6835

Kind regards
Vitalii


On Wed, Jan 2, 2019 at 7:59 PM benj.dev 
wrote:

> Hi,
>
> I have read that in SELECT from multiple sources (SELECT * FROM
> tmp.`myfile*`), the files are treated in random order.
> But I don't understand why the processing of (parquet) files that do not
> have the same columns is not homogeneous.
>
> Example (on Drill 1.14) :
>
> CREATE TABLE tmp2.`mytable1` AS SELECT 1 AS myc1, 'col3_1' AS myc3;
> CREATE TABLE tmp2.`mytable2` AS SELECT 2 AS myc1, 'col2_2' AS
> myc2, 'col3_2' AS myc3, 'col4_2' AS myc4;
> CREATE TABLE tmp2.`mytable3` AS SELECT 0 AS myc0, 3 AS myc1, 'col2_3' AS
> myc2;
>
> SELECT * FROM tmp2.`mytable*`;
> | mytable3  | 0   | 3 | col2_3  |
> | mytable2  | 1635023213  | 2 | col2_2  |
> | mytable1  | 1635023213  | 1 | |
>
> SELECT myc0 FROM tmp2.`mytable*`;
> | 0   |
> | 1818386772  |
> | 1818386772  |
>
> SELECT myc2 FROM tmp2.`mytable*`;
> | col2_3  |
> | col2_2  |
> | |
>
> SELECT myc0, myc1, myc2, myc3, myc4 FROM tmp2.`mytable*`;
> | 0 | 3 | col2_3  | null| null|
> | 0 | 2 | col2_2  | col3_2  | col4_2  |
> | 0 | 1 | | col3_1  | |
>
> Please note that:
> - Each of these SELECTs can sometimes return "Error: SYSTEM ERROR:
> NullPointerException".
> - The undefined columns may have different values in different calls.
> - Another point is that a column undefined in some files can appear with a
> null value or an empty string (illustrated by the last example).
>   Maybe this is a consequence of the (random) order of the SELECT.
>
> I can understand that the processing of different files in the same
> request can be difficult, but
> - Why put a (random) value in unknown columns and not just a NULL? Putting
> NULL every time would allow this case to be handled.
> - An error should appear all the time OR never, not randomly.
>
> Does anyone have an explanation or any trick, or is it a well-known
> behavior/bug with already planned developments?
>
> Thanks for any explanations or digression,
> Regards,
>


Re: Query on tables

2018-12-27 Thread Vitalii Diravka
Hi Kiran,

You are asking for something similar to INSERT statements, but Drill doesn't
support this functionality yet [1].
CREATE OR REPLACE TABLE could also be helpful for you (with additional
work), but most databases don't support it [2].
You can write scripts that create tables with different names in the same
directory (and then possibly manage the newly created files); a sketch
follows below. Also consider temporary tables and views for your case.
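For example, a minimal sketch of such an incremental script (the workspace,
paths and the date filter are hypothetical):

-- Each run materializes only the new slice under a new sub-table name:
CREATE TABLE dfs.tmp.`sales/2018-12-26` AS
SELECT * FROM s3.`/raw/sales` WHERE load_date = '2018-12-26';
-- Readers can then query all slices through the parent directory:
SELECT * FROM dfs.tmp.`sales`;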

Did you restart the drillbits on your cluster? Max direct memory should be
easy to change on the cluster.
Here is some info about it [3], and here are the latest improvements in this
area [4].
There are also a lot of system/session options which can help you
accelerate your CTAS [5]. It mainly depends on the type and structure of
your data and also on the operators which you use in your SQL query.
You can start by investigating the individual SELECT queries and their
profiles.

[1] https://issues.apache.org/jira/browse/DRILL-3534
[2] https://issues.apache.org/jira/browse/DRILL-3979
[3] https://drill.apache.org/docs/configuring-drill-memory/
[4] https://issues.apache.org/jira/browse/DRILL-5741
[5] https://drill.apache.org/docs/configuration-options-introduction/

Kind regards
Vitalii


On Wed, Dec 26, 2018 at 5:41 PM Kiran Kumar NS 
wrote:

> Hi Vitalii,
>
> Thanks for getting back.
> Yes, I did try it. But I got an error that the table is already present. It
> is because I don't want to create another table with a new name for that
> changed data.
>
> Also, if it works in any which way, every time the source changes, is
> there any automatic way to get the Drill tables updated as well?
>
> Also, I have another set of queries.
>
> If I set max proc mem or max direct mem, it is not reflected in the
> cluster; I see it in the UI. But if I change the Java heap mem, it is
> reflected. I wanted to understand, when we set these settings, how Drill
> distributes them to the cluster. I did observe that whatever settings I
> change on the Options page of the UI are reflected in the whole cluster.
> But if I do it in drill-env.sh, they are not; I have to go and change it on
> all the nodes. Also, it is mentioned that distrib-env.sh should not be used
> by users. Essentially, I want to understand how the configurations are
> propagated to all nodes in the cluster and how I can reap the benefit of
> the additional compute of the cluster so that my queries execute fast.
> Currently one CTAS command and partition-by commands are taking hours to
> process GBs of data.
>
> I can provide statistics of elapsed time of query execution if required,
> for your analysis.
>
> Kind Regards
> Kiran
>
> Sent from my iPhone
>
> > On Dec 25, 2018, at 2:16 PM, Vitalii Diravka  wrote:
> >
> > Hello!
> >
> > Can you filter the newly arrived data and run CTAS only on that data?
> > It will avoid extra work.
> >
> > Kind regards
> > Vitalii
> >
> >
> >> On Mon, Dec 24, 2018 at 6:53 PM Idea Everywhere <
> mailtonski...@gmail.com> wrote:
> >> Hi Team,
> >>
> >> My current situation:
> >> I have apache drill installed in AWS EC2 (M4.4x large) instances
> cluster of 3 nodes. My source data is coming from S3 bucket.
> >> I want to engage drill to read that data from S3, create tables within
> itself (using CTAS) while the table data is stored in AWS EFS mapped to ec2
> instances created as mentioned above and allow the user to read the data
> from those tables.
> >> Tables and Partitioned tables are created as of now.
> >>
> >> Questions:
> >> 1. It is observed that, when the tables are created, it reads the data
> from source and the table is created along with that data (ie., if the
> original source is 10GB, the tables stored in the file system are
> comparable to that size). However, I have a question, if the source is
> growing, how does it get into the CTAS tables or CTAS partition-by tables, so
> that queries will return the latest output?
> >>
> >> Kind Regards
> >> Kiran
> >>
>


Re: Query on tables

2018-12-25 Thread Vitalii Diravka
Hello!

Can you filter the newly arrived data and run CTAS only on that data? It
will avoid extra work.

Kind regards
Vitalii


On Mon, Dec 24, 2018 at 6:53 PM Idea Everywhere 
wrote:

> Hi Team,
>
> My current situation:
> I have Apache Drill installed on an AWS EC2 (M4.4xlarge) cluster of 3
> nodes. My source data is coming from an S3 bucket.
> I want to engage Drill to read that data from S3 and create tables within
> itself (using CTAS), while the table data is stored in AWS EFS mapped to the
> EC2 instances created as mentioned above, and to allow the user to read the
> data from those tables.
> Tables and partitioned tables are created as of now.
>
> Questions:
> 1. It is observed that, when the tables are created, Drill reads the data
> from the source and the table is created along with that data (i.e., if the
> original source is 10GB, the tables stored in the file system are
> comparable to that size). However, I have a question: if the source is
> growing, how does it get into the CTAS tables or CTAS partition-by tables,
> so that queries will return the latest output?
>
> Kind Regards
> Kiran
>
>


Re: [VOTE] Apache Drill release 1.15.0 - RC0

2018-12-20 Thread Vitalii Diravka
Hi Aman,

The root cause of the issue is not obvious and is related to an HBase lib
issue [1].
I have described the root cause of the issue and the steps for reproducing
it in [2], and I have also opened a PR for it [3].

Thanks for finding this.

[1] https://issues.apache.org/jira/browse/HBASE-21005
[2] https://issues.apache.org/jira/browse/DRILL-6916
[3] https://github.com/apache/drill/pull/1579

Kind regards
Vitalii


On Wed, Dec 19, 2018 at 2:18 PM Vitalii Diravka  wrote:

> @Aman Sinha  I am investigating which Maven plugin
> causes the creation of this dir.
>
> I guess this sinks RC0. I'll prepare a new release candidate once the
> issues DRILL-6912 [1] and DRILL-6913 [2] are fixed.
>
>   [1] https://issues.apache.org/jira/browse/DRILL-6912
>   [2] https://issues.apache.org/jira/browse/DRILL-6913
>
> Kind regards
> Vitalii
>
>
> On Wed, Dec 19, 2018 at 1:58 PM Arina Ielchiieva  wrote:
>
>> I think above issues are blockers, Vitalii please cancel the vote
>>
>> Kind regards,
>> Arina
>>
>> On Wed, Dec 19, 2018 at 6:09 AM Aman Sinha  wrote:
>>
>> > @vita...@apache.org   any idea why there's an
>> > extraneous directory in the source ?
>> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
>> > apache-drill-1.15.0-src/${project.basedir}/
>> >
>> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
>> > apache-drill-1.15.0-src/${project.basedir}/src/
>> >
>> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
>> > apache-drill-1.15.0-src/${project.basedir}/src/site/
>> >
>> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
>> > apache-drill-1.15.0-src/${project.basedir}/src/site/resources/
>> >
>> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
>> > apache-drill-1.15.0-src/${project.basedir}/src/site/resources/repo/
>> >
>> > On Tue, Dec 18, 2018 at 10:58 AM Vitalii Diravka 
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > I'd like to propose the first release candidate (RC0) of Apache Drill,
>> > > version 1.15.0.
>> > >
>> > > The release candidate covers a total of 185 resolved JIRAs [1].
>> > > Thanks to everyone who contributed to this release.
>> > >
>> > > The tarball artifacts are hosted at [2] and the maven artifacts are
>> > hosted
>> > > at [3].
>> > >
>> > > This release candidate is based on commit
>> > > ff797695de0e27a732c22e2410cbef58abbfcef3 located at [4].
>> > >
>> > > Please download and try out the release.
>> > >
>> > > The vote ends at 7:00 PM UTC (11:00 AM PDT, 9:00 PM EET, 0:30 AM IST),
>> > Dec
>> > > 21th, 2018
>> > >
>> > > [ ] +1
>> > > [ ] +0
>> > > [ ] -1
>> > >
>> > > I have found two issues which were not observed earlier [5], [6].
>> > > Should we consider them blockers?
>> > >
>> > >   [1]
>> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343317&projectId=12313820
>> > >
>> > >   [2] http://home.apache.org/~vitalii/drill/releases/1.15.0/rc0/
>> > >
>> > >   [3]
>> > > https://repository.apache.org/content/repositories/orgapachedrill-1058/
>> > >
>> > >   [4] https://github.com/vdiravka/drill/commits/drill-1.15.0
>> > >
>> > >   [5] https://issues.apache.org/jira/browse/DRILL-6912
>> > >
>> > >   [6] https://issues.apache.org/jira/browse/DRILL-6913
>> > >
>> > > Kind regards
>> > > Vitalii
>> > >
>> >
>>
>


Re: [VOTE] Apache Drill release 1.15.0 - RC0

2018-12-19 Thread Vitalii Diravka
@Aman Sinha  I am investigating which Maven plugin causes the creation of
this dir.

I guess this sinks RC0. I'll prepare a new release candidate once the issues
DRILL-6912 [1] and DRILL-6913 [2] are fixed.

  [1] https://issues.apache.org/jira/browse/DRILL-6912
  [2] https://issues.apache.org/jira/browse/DRILL-6913

Kind regards
Vitalii


On Wed, Dec 19, 2018 at 1:58 PM Arina Ielchiieva  wrote:

> I think above issues are blockers, Vitalii please cancel the vote
>
> Kind regards,
> Arina
>
> On Wed, Dec 19, 2018 at 6:09 AM Aman Sinha  wrote:
>
> > @vita...@apache.org   any idea why there's an
> > extraneous directory in the source ?
> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
> > apache-drill-1.15.0-src/${project.basedir}/
> >
> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
> > apache-drill-1.15.0-src/${project.basedir}/src/
> >
> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
> > apache-drill-1.15.0-src/${project.basedir}/src/site/
> >
> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
> > apache-drill-1.15.0-src/${project.basedir}/src/site/resources/
> >
> > drwxrwxr-x vitalii/vitalii   0 2018-12-18 03:48
> > apache-drill-1.15.0-src/${project.basedir}/src/site/resources/repo/
> >
> > On Tue, Dec 18, 2018 at 10:58 AM Vitalii Diravka 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to propose the first release candidate (RC0) of Apache Drill,
> > > version 1.15.0.
> > >
> > > The release candidate covers a total of 185 resolved JIRAs [1].
> > > Thanks to everyone who contributed to this release.
> > >
> > > The tarball artifacts are hosted at [2] and the maven artifacts are
> > hosted
> > > at [3].
> > >
> > > This release candidate is based on commit
> > > ff797695de0e27a732c22e2410cbef58abbfcef3 located at [4].
> > >
> > > Please download and try out the release.
> > >
> > > The vote ends at 7:00 PM UTC (11:00 AM PDT, 9:00 PM EET, 0:30 AM IST),
> > Dec
> > > 21th, 2018
> > >
> > > [ ] +1
> > > [ ] +0
> > > [ ] -1
> > >
> > > I have found two issues which were not observed earlier [5], [6].
> > > Should we consider them blockers?
> > >
> > >   [1]
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343317&projectId=12313820
> > >
> > >   [2] http://home.apache.org/~vitalii/drill/releases/1.15.0/rc0/
> > >
> > >   [3]
> > > https://repository.apache.org/content/repositories/orgapachedrill-1058/
> > >
> > >   [4] https://github.com/vdiravka/drill/commits/drill-1.15.0
> > >
> > >   [5] https://issues.apache.org/jira/browse/DRILL-6912
> > >
> > >   [6] https://issues.apache.org/jira/browse/DRILL-6913
> > >
> > > Kind regards
> > > Vitalii
> > >
> >
>


Re: [ANNOUNCE] New Committer: Salim Achouche

2018-12-17 Thread Vitalii Diravka
Congratulations Salim!
Well deserved!

Kind regards
Vitalii


On Mon, Dec 17, 2018 at 12:40 PM Arina Ielchiieva  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Salim
> Achouche to become a committer, and we are pleased to announce that he has
> accepted.
>
> Salim Achouche [1] started contributing to the Drill project in 2017. He
> has made many improvements to the Parquet reader, including performance for
> flat data types and columnar Parquet batch-sizing functionality, and he
> fixed various bugs and memory leaks. He also optimized implicit column
> handling in the scanner and improved SQL pattern 'contains' performance.
>
> Welcome Salim, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: Running drill in distributed mode

2018-12-10 Thread Vitalii Diravka
Hi!

You can run a drillbit on one machine and configure a FileSystem Storage
Plugin with its "connection" property pointing to the remote fs [1].
Do you need to run a drillbit on every machine and coordinate them via
ZooKeeper? I am not sure that is efficient for a local file system.

[1]
https://drill.apache.org/docs/file-system-storage-plugin/#connecting-drill-to-a-file-system

Kind regards
Vitalii


On Mon, Dec 10, 2018 at 9:09 AM Mehran.D [BR-PD] 
wrote:

> I wanted to know if it is possible to run Drill in distributed mode on the
> local file systems of machines.
>
> Is it possible to run it as Splunk does, with a search head and 1 or more
> indexers that can distribute the search query, while the search head
> aggregates the responses to complete the query?
>
> I ran Drill on the local file system with one specific ZooKeeper, and the
> second machine shows as unavailable in the Drill monitoring web interface.
>
>
>
> I wanted to know if:
>
> · Is it possible to run Apache Drill on a distributed local file
> system?
>
> · Is it possible to run Drill in a way that queries another Apache
> Drill as a storage plugin?
>
>
>
> Best Regards,
>
> Mehran Dashti
> Product Leader
> 09125902452
>
>
>


Re: Drill “VALIDATION ERROR: A table or view with given name already exists in schema” for empty directory

2018-12-06 Thread Vitalii Diravka
Reed,

I see you have asked the same question on Stack Overflow [1] and found the
root cause of the problem.
I have added a comment there.

[1]
https://stackoverflow.com/questions/53604950/drill-validation-error-a-table-or-view-with-given-name-already-exists-in-schem/53654748#53654748

Kind regards
Vitalii


On Thu, Dec 6, 2018 at 5:17 PM Vitalii Diravka  wrote:

> @Khurram Thank you for pointing out the Jira ticket. It differs, and it is
> no longer an issue. I have resolved the ticket.
>
> @Reed I looked into your case, and it looks like expected behavior. An
> empty directory can be a regular (but "schemaless") Drill table, so you
> can't create a table with the same name under the same workspace.
> If you specify the empty directory as a workspace for Drill, then you can
> create new tables inside it.
> I have also checked that the behavior was the same for drill-1.11.0-mapr
> and drill-1.10.0-mapr versions.
>
> See more description about querying empty directories here:
>
> https://drill.apache.org/docs/data-sources-and-file-formats-introduction/#schemaless-tables
>
> Kind regards
> Vitalii
>
>
> On Thu, Dec 6, 2018 at 3:23 AM Khurram Faraaz  wrote:
>
>> Vitalii, this could be related to
>> https://issues.apache.org/jira/browse/DRILL-2775
>>
>> Regards,
>> Khurram
>>
>> On Wed, Dec 5, 2018 at 4:51 PM Vitalii Diravka 
>> wrote:
>>
>> > Hi Reed,
>> >
>> > It looks like a bug. Could you please create a Jira ticket with the above
>> > description?
>> >
>> >
>> https://issues.apache.org/jira/projects/DRILL/issues
>> >
>> > Kind regards
>> > Vitalii
>> >
>> >
>> > On Wed, Dec 5, 2018 at 6:57 PM Reed Villanueva 
>> > wrote:
>> >
>> > > After upgrading drill on our cluster to drill-1.12.0-mapr, testing our
>> > > daily ETL scripts (which all use drill for converting parquet files to
>> > > tsv), a validation error ("*table or view with given name already
>> > exists*")
>> > > is always thrown when trying to run a `CREATE TABLE` statement on some
>> > > empty directories in a writable workspace.
>> > >
>> > >
>> > > [Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on
>> > > mapr002.ucera.local:31010]
>> > > at
>> > >
>> > >
>> >
>> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>> > > at
>> > >
>> > >
>> >
>> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
>> > > :
>> > > :
>> > > :
>> > > Caused by: org.apache.drill.common.exceptions.UserRemoteException:
>> > > VALIDATION ERROR: A table or view with given name
>> > > [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already
>> > exists
>> > > in schema [dfs.etl_internal]
>> > >
>> > >
>> > > After some brief debugging, I see that the directory in question under
>> > the
>> > > workspace (ie.
>> > /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv)
>> > > *is in fact empty*, yet still throwing these errors.
>> > >
>> > > Looking for the error ID in the drillbit.log file in the associated
>> node
>> > in
>> > > the error message above, we see
>> > >
>> > > 2018-12-04 10:13:25,285
>> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
>> > > INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
>> > > 23f92019-db56-862f-e7b9-cd51b3e174ae: create table
>> > >
>> > >
>> >
>> dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`
>> > > as
>> > > select 
>> > > from
>> > >
>> > >
>> >
>> dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
>> > > 2018-12-04 10:13:25,406
>> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
>> > > INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> > took
>> > > 0 ms, numFiles: 1
>> > > 2018-12-04 10:13:25,408
>> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
>> > > IN

Re: Drill “VALIDATION ERROR: A table or view with given name already exists in schema” for empty directory

2018-12-06 Thread Vitalii Diravka
@Khurram Thank you for pointing out the Jira ticket. It differs, and it is
no longer an issue. I have resolved the ticket.

@Reed I looked into your case, and it looks like expected behavior. An
empty directory can be a regular (but "schemaless") Drill table, so you
can't create a table with the same name under the same workspace.
If you specify the empty directory as a workspace for Drill, then you can
create new tables inside it.
I have also checked that the behavior was the same for drill-1.11.0-mapr
and drill-1.10.0-mapr versions.

See more description about querying empty directories here:
https://drill.apache.org/docs/data-sources-and-file-formats-introduction/#schemaless-tables
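If the daily run must recreate the same table name, a minimal sketch of an
idempotent variant (assuming it is acceptable to drop the target first, and
that DROP TABLE IF EXISTS is available in your 1.12 build; the full column
list from your original CTAS is elided here):

DROP TABLE IF EXISTS
dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`;
CREATE TABLE
dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv` AS
SELECT * FROM
dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`;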

Kind regards
Vitalii


On Thu, Dec 6, 2018 at 3:23 AM Khurram Faraaz  wrote:

> Vitalii, this could be related to
> https://issues.apache.org/jira/browse/DRILL-2775
>
> Regards,
> Khurram
>
> On Wed, Dec 5, 2018 at 4:51 PM Vitalii Diravka  wrote:
>
> > Hi Reed,
> >
> > It looks like a bug. Could you please create a Jira ticket with the above
> > description?
> >
> >
> https://issues.apache.org/jira/projects/DRILL/issues
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Wed, Dec 5, 2018 at 6:57 PM Reed Villanueva 
> > wrote:
> >
> > > After upgrading drill on our cluster to drill-1.12.0-mapr, testing our
> > > daily ETL scripts (which all use drill for converting parquet files to
> > > tsv), a validation error ("*table or view with given name already
> > exists*")
> > > is always thrown when trying to run a `CREATE TABLE` statement on some
> > > empty directories in a writable workspace.
> > >
> > >
> > > [Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on
> > > mapr002.ucera.local:31010]
> > > at
> > >
> > >
> >
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
> > > at
> > >
> > >
> >
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
> > > :
> > > :
> > > :
> > > Caused by: org.apache.drill.common.exceptions.UserRemoteException:
> > > VALIDATION ERROR: A table or view with given name
> > > [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already
> > exists
> > > in schema [dfs.etl_internal]
> > >
> > >
> > > After some brief debugging, I see that the directory in question under
> > the
> > > workspace (ie.
> > /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv)
> > > *is in fact empty*, yet still throwing these errors.
> > >
> > > Looking for the error ID in the drillbit.log file in the associated
> node
> > in
> > > the error message above, we see
> > >
> > > 2018-12-04 10:13:25,285
> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> > > INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> > > 23f92019-db56-862f-e7b9-cd51b3e174ae: create table
> > >
> > >
> >
> dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`
> > > as
> > > select 
> > > from
> > >
> > >
> >
> dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
> > > 2018-12-04 10:13:25,406
> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> > > INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> > took
> > > 0 ms, numFiles: 1
> > > 2018-12-04 10:13:25,408
> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> > > INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> > took
> > > 0 ms, numFiles: 1
> > > 2018-12-04 10:13:25,893
> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> > > INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> > took
> > > 0 ms, numFiles: 1
> > > 2018-12-04 10:13:25,894
> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> > > INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> > took
> > > 0 ms, numFiles: 1
> > > 2018-12-04 10:13:25,898
> > [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> > > INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> > took
> > > 0 ms, numFiles: 1
> >

Re: Drill “VALIDATION ERROR: A table or view with given name already exists in schema” for empty directory

2018-12-05 Thread Vitalii Diravka
Hi Reed,

It looks like a bug. Could you please create a Jira ticket with the above
description?
https://issues.apache.org/jira/projects/DRILL/issues

Kind regards
Vitalii


On Wed, Dec 5, 2018 at 6:57 PM Reed Villanueva 
wrote:

> After upgrading drill on our cluster to drill-1.12.0-mapr, testing our
> daily ETL scripts (which all use drill for converting parquet files to
> tsv), a validation error ("*table or view with given name already exists*")
> is always thrown when trying to run a `CREATE TABLE` statement on some
> empty directories in a writable workspace.
>
>
> [Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on
> mapr002.ucera.local:31010]
> at
>
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
> at
>
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
> :
> :
> :
> Caused by: org.apache.drill.common.exceptions.UserRemoteException:
> VALIDATION ERROR: A table or view with given name
> [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
> in schema [dfs.etl_internal]
>
>
> After some brief debugging, I see that the directory in question under the
> workspace (ie. /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv)
> *is in fact empty*, yet still throwing these errors.
>
> Looking for the error ID in the drillbit.log file in the associated node in
> the error message above, we see
>
> 2018-12-04 10:13:25,285 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> 23f92019-db56-862f-e7b9-cd51b3e174ae: create table
>
> dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`
> as
> select 
> from
>
> dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
> 2018-12-04 10:13:25,406 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
> 0 ms, numFiles: 1
> 2018-12-04 10:13:25,408 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
> 0 ms, numFiles: 1
> 2018-12-04 10:13:25,893 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
> 0 ms, numFiles: 1
> 2018-12-04 10:13:25,894 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
> 0 ms, numFiles: 1
> 2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
> 0 ms, numFiles: 1
> 2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
> 0 ms, numFiles: 1
> 2018-12-04 10:13:25,905 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.a.d.e.p.s.h.CreateTableHandler - User Error Occurred: A table or
> view with given name
> [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
> in schema [dfs.etl_internal]
> org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: A
> table or view with given name
> [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists
> in schema [dfs.etl_internal]
>
>
> [Error Id: 45177abc-7e9f-4678-959f-f9e0e38bc564 ]
> at
>
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
> ~[drill-common-1.12.0-mapr.jar:1.12.0-mapr]
> at
>
> org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.checkTableCreationPossibility(CreateTableHandler.java:326)
> [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
> at
>
> org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:90)
> [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
> at
>
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
> [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
> at
>
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79)
> [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
> at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567)
> [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264)
> [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_151]
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_151]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> 2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman]
> INFO  o.apache.drill.exec.work.WorkManager - Waiting for 0 queries to
> complete before shutting down
> 2018-12-04 10:13:25,924 

Hangout Discussion Topics

2018-11-26 Thread Vitalii Diravka
Hi All,

Does anyone have any topics to discuss during the hangout tomorrow?

Kind regards
Vitalii


Re: November Apache Drill board report

2018-11-07 Thread Vitalii Diravka
+1
Does it make sense to add the "Support Transitive Closure during Filter Push
Down and Partition Pruning" feature (DRILL-6173 [1])?
This is one of the places where we are ahead of Spark, for example [2],
[3].

[1] https://issues.apache.org/jira/browse/DRILL-6173
[2] https://issues.apache.org/jira/browse/SPARK-13940
[3] https://issues.apache.org/jira/browse/SPARK-13209

Kind regards
Vitalii


On Wed, Nov 7, 2018 at 4:01 PM Volodymyr Vysotskyi 
wrote:

> +1, sorry for the delay.
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Wed, Nov 7, 2018 at 3:56 PM Arina Ielchiieva  wrote:
>
> > Hi Padma,
> >
> > I can include mention about batch sizing but I am not sure what I should
> > mention, quick search over release notes shows a couple of changes
> related
> > to batch sizing:
> > https://drill.apache.org/docs/apache-drill-1-14-0-release-notes/
> > Could you please propose what I should include?
> >
> > @PMCs and committers
> > Only one PMC member has given +1 for the report. Could more folks please
> > review the report?
> >
> > Kind regards,
> > Arina
> >
> > On Fri, Nov 2, 2018 at 8:33 PM Padma Penumarthy <
> > penumarthy.pa...@gmail.com>
> > wrote:
> >
> > > Hi Arina,
> > >
> > > Can you add batch sizing (for bunch of operators and parquet reader)
> > also ?
> > >
> > > Thanks
> > > Padma
> > >
> > >
> > > On Fri, Nov 2, 2018 at 2:55 AM Arina Ielchiieva 
> > wrote:
> > >
> > > > Sure, let's mention.
> > > > Updated the report.
> > > >
> > > > =
> > > >
> > > >  ## Description:
> > > >  - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and
> Cloud
> > > > Storage.
> > > >
> > > > ## Issues:
> > > >  - There are no issues requiring board attention at this time.
> > > >
> > > > ## Activity:
> > > >  - Since the last board report, Drill has released version 1.14.0,
> > > >including the following enhancements:
> > > > - Drill in a Docker container
> > > > - Image metadata format plugin
> > > > - Upgrade to Calcite 1.16.0
> > > > - Kafka plugin push down support
> > > > - Phonetic and String functions
> > > > - Enhanced decimal data support
> > > > - Spill to disk for the Hash Join support
> > > > - CGROUPs resource management support
> > > > - Lateral / Unnest support (disabled by default)
> > > >  - There were active discussions about schema provision in Drill.
> > > >Based on these discussions two projects are currently evolving:
> > > >Drill metastore and schema provision in the file and in a query.
> > > >  - Apache Drill book has been written by two PMC members (Charles and
> > > > Paul).
> > > >  - Drill developer meet up will be held on November 14, 2018.
> > > >
> > > >The following areas are going to be discussed:
> > > > - Storage plugins
> > > > - Schema discovery & Evolution
> > > > - Metadata Management
> > > > - Resource management
> > > > - Integration with Apache Arrow
> > > >
> > > > ## Health report:
> > > >  - The project is healthy. Development activity
> > > >as reflected in the pull requests and JIRAs is good.
> > > >  - Activity on the dev and user mailing lists are stable.
> > > >  - Three committers and three new PMC members were added in the last
> > > period.
> > > >
> > > > ## PMC changes:
> > > >
> > > >  - Currently 23 PMC members.
> > > >  - New PMC members:
> > > > - Boaz Ben-Zvi was added to the PMC on Fri Aug 17 2018
> > > > - Charles Givre was added to the PMC on Mon Sep 03 2018
> > > > - Vova Vysotskyi was added to the PMC on Fri Aug 24 2018
> > > >
> > > > ## Committer base changes:
> > > >
> > > >  - Currently 48 committers.
> > > >  - New committers:
> > > > - Chunhui Shi was added as a committer on Thu Sep 27 2018
> > > > - Gautam Parai was added as a committer on Mon Oct 22 2018
> > > > - Weijie Tong was added as a committer on Fri Aug 31 2018
> > > >
> > > > ## Releases:
> > > >
> > > >  - 1.14.0 was released on Sat Aug 04 2018
> > > >
> > > > ## Mailing list activity:
> > > >
> > > >  - d...@drill.apache.org:
> > > > - 427 subscribers (down -6 in the last 3 months):
> > > > - 2827 emails sent to list (2126 in previous quarter)
> > > >
> > > >  - iss...@drill.apache.org:
> > > > - 18 subscribers (down -1 in the last 3 months):
> > > > - 3487 emails sent to list (4769 in previous quarter)
> > > >
> > > >  - user@drill.apache.org:
> > > > - 597 subscribers (down -6 in the last 3 months):
> > > > - 332 emails sent to list (346 in previous quarter)
> > > >
> > > >
> > > > ## JIRA activity:
> > > >
> > > >  - 164 JIRA tickets created in the last 3 months
> > > >  - 128 JIRA tickets closed/resolved in the last 3 months
> > > >
> > > >
> > > >
> > > > On Fri, Nov 2, 2018 at 12:25 AM Sorabh Hamirwasia <
> > shamirwa...@mapr.com>
> > > > wrote:
> > > >
> > > > > Hi Arina,
> > > > > Lateral/Unnest feature was part of 1.14 though it was disabled by
> > > > default.
> > > > > Should we mention it as part of 1.14 enhancements in the report?
> > > > >
> > > > > Thanks,
> > > > > Sorabh
> > > > >
> > > > 

Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Vitalii Diravka
Congratulations!

Kind regards
Vitalii


On Thu, Nov 1, 2018 at 5:43 PM salim achouche  wrote:

> Congrats Hanu!
>
> On Thu, Nov 1, 2018 at 6:05 AM Arina Ielchiieva  wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited
> > Hanumath
> > Rao Maduri to become a committer, and we are pleased to announce that he
> > has accepted.
> >
> > Hanumath became a contributor in 2017, making changes mostly in the Drill
> > planning side, including lateral / unnest support. He is also one of the
> > contributors of index based planning and execution support.
> >
> > Welcome Hanumath, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>
>
> --
> Regards,
> Salim
>


Re: Drill JDBC Plugin limit queries

2018-10-20 Thread Vitalii Diravka
Rahul,

*double rows* is an estimated row count, which can be used for choosing the
right Join operator, and maybe elsewhere.
But to have a proper *PushLimitIntoScan*, it is necessary to change the
*String sql* field.
Possibly it is necessary to keep the *JdbcImplementor* or
*JdbcImplementor.Result* from *JdbcPrel* in the *JdbcGroupScan* class
and change the sqlNode in the *applyLimit()* method (a hedged sketch follows
below).

Not sure why *DrillPushLimitToScanRule* is not matched. Is it added to the
planner program?
To find the reason, you can compare the flow with the Parquet scan, for
instance.
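To make this concrete, a hedged sketch of the two relevant overrides. This is
not the actual Drill implementation: the exact applyLimit signature differs
between Drill versions, all other required GroupScan members are omitted, and
wrapping the generated SQL string is a simplification of rewriting the
SqlNode kept from JdbcPrel:

import org.apache.drill.exec.physical.base.AbstractGroupScan;
import org.apache.drill.exec.physical.base.GroupScan;

public class JdbcGroupScanSketch extends AbstractGroupScan {
  private final String sql;  // SQL that will be pushed to the external DB
  private final double rows; // estimated row count

  public JdbcGroupScanSketch(String sql, double rows) {
    super("");               // user name; irrelevant for this sketch
    this.sql = sql;
    this.rows = rows;
  }

  @Override
  public boolean supportsLimitPushdown() {
    return true;
  }

  @Override
  public GroupScan applyLimit(int maxRecords) {
    if (maxRecords >= rows) {
      return null;           // limit does not reduce the scan; keep the plan as is
    }
    // Simplification: wrap the generated SQL so the external database
    // applies the limit, and update the row estimate accordingly.
    String limitedSql = "SELECT * FROM (" + sql + ") t LIMIT " + maxRecords;
    return new JdbcGroupScanSketch(limitedSql, maxRecords);
  }

  // Remaining abstract GroupScan members (getSpecificScan, getScanStats,
  // getNewWithChildren, clone, getDigest, ...) omitted for brevity.
}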


On Fri, Oct 19, 2018 at 7:37 PM Rahul Raj  wrote:

> Vitalii,
>
> I made both the changes, it did not work and a full scan was issued as
> shown in the plan below.
>
> 00-00Screen : rowType = RecordType(INTEGER actor_id, VARCHAR(45)
> first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount
> = 5.0, cumulative cost = {120.5 rows, 165.5 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 227
> 00-01  Project(actor_id=[$0], first_name=[$1], last_name=[$2],
> last_update=[$3]) : rowType = RecordType(INTEGER actor_id, VARCHAR(45)
> first_name, VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount
> = 5.0, cumulative cost = {120.0 rows, 165.0 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 226
> 00-02SelectionVectorRemover : rowType = RecordType(INTEGER
> actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3)
> last_update): rowcount = 5.0, cumulative cost = {115.0 rows, 145.0
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 225
> 00-03  Limit(fetch=[5]) : rowType = RecordType(INTEGER
> actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3)
> last_update): rowcount = 5.0, cumulative cost = {110.0 rows, 140.0
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 224
> 00-04Limit(fetch=[5]) : rowType = RecordType(INTEGER
> actor_id, VARCHAR(45) first_name, VARCHAR(45) last_name, TIMESTAMP(3)
> last_update): rowcount = 5.0, cumulative cost = {105.0 rows, 120.0
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 223
> 00-05  Jdbc(sql=[SELECT * FROM "public"."actor" ]) :
> rowType = RecordType(INTEGER actor_id, VARCHAR(45) first_name,
> VARCHAR(45) last_name, TIMESTAMP(3) last_update): rowcount = 100.0,
> cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 164
>
> Regards,
> Rahul
>
> On Fri, Oct 19, 2018 at 8:47 PM Rahul Raj  wrote:
>
> > I will make the changes and update you.
> >
> > Regards,
> > Rahul
> >
> > On Fri, Oct 19, 2018 at 1:05 AM Vitalii Diravka 
> > wrote:
> >
> >> Rahul,
> >>
> >> Possibly *JdbcGroupScan* can be improved, for instance by overriding
> >> *supportsLimitPushdown()* and *applyLimit()* methods,
> >> *double rows *field can be updated by the limit value.
> >>
> >> I've performed the following query: select * from mysql.`testdb`.`table`
> >> limit 2;
> >> but the following one is passed to MySQL: SELECT * FROM `testdb`.`table`
> >>
> >>
> https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L187
> >> So it is definitely should be improved.
> >>
> >> *Note:* Changed mailing list to devs.
> >>
> >> On Sun, Oct 14, 2018 at 6:30 AM Rahul Raj  wrote:
> >>
> >> > Vitalii,
> >> >
> >> > Created documentation ticket DRILL-6794
> >> >
> >> > How do we proceed on extending the scan operators to support JDBC
> >> plugins?
> >> >
> >> > Regards,
> >> > Rahul
> >> >
> >> > On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka 
> >> > wrote:
> >> >
> >> > > To update the documentation, since that issues were solved by using
> >> these
> >> > > properties in connection URL:
> >> > > defaultRowFetchSize=1  [1]
> >> > > defaultAutoCommit=false[2]
> >> > > The full URL was there "url": "jdbc:postgresql://
> >> > >
> >> > >
> >> >
> >>
> myhost.mydomain.com/mydb?useCursorFetch=true=false=TRACE=/tmp/jdbc.log=1
> >> > > "
> >> > >
> >> > > If some issues are still present, it is also reasonable to create
> >> tickets
> >> > > to track them.
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >> >
> >>
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAo

Re: Drill JDBC Plugin limit queries

2018-10-18 Thread Vitalii Diravka
Rahul,

Possibly *JdbcGroupScan* can be improved, for instance by overriding
the *supportsLimitPushdown()* and *applyLimit()* methods;
the *double rows* field can be updated with the limit value.

I've performed the following query: select * from mysql.`testdb`.`table`
limit 2;
but the following one is passed to MySQL: SELECT * FROM `testdb`.`table`
https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L187
So it definitely should be improved.

*Note:* Changed mailing list to devs.

On Sun, Oct 14, 2018 at 6:30 AM Rahul Raj  wrote:

> Vitalii,
>
> Created documentation ticket DRILL-6794
>
> How do we proceed on extending the scan operators to support JDBC plugins?
>
> Regards,
> Rahul
>
> On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka 
> wrote:
>
> > To update the documentation, since that issues were solved by using these
> > properties in connection URL:
> > defaultRowFetchSize=1  [1]
> > defaultAutoCommit=false[2]
> > The full URL was there "url": "jdbc:postgresql://
> >
> >
> myhost.mydomain.com/mydb?useCursorFetch=true=false=TRACE=/tmp/jdbc.log=1
> > "
> >
> > If some issues are still present, it is also reasonable to create tickets
> > to track them.
> >
> > [1]
> >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
> > [2]
> >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
> >
> > On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj  wrote:
> >
> > > Should I create tickets to track these issues or should I create a
> ticket
> > > to update the documentation?
> > >
> > > Rahul
> > >
> > > On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka 
> > > wrote:
> > >
> > > > 1. You are right, it means it is reasonable to extend this rule for
> > > > applying on other Scan operators (or possibly to create the separate
> > > one).
> > > > 2. There was a question about OOM issues in Drill + PostgreSQL,
> please
> > > take
> > > > a look [1].
> > > > Since you are trying to setup this configs, It will be good, if
> you
> > > > create a Jira ticket to add this info to Drill docs [2]
> > > >
> > > > [1]
> > > >
> > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > > > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> > > >
> > > > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj 
> > wrote:
> > > >
> > > > > Regarding the heap out of error, it could be that the jdbc driver
> is
> > > > > prefetching the entire record set to memory. I just had a look at
> > > > > JdbcRecordReader, looks like by setting
> connection#autoCommit(false)
> > > and
> > > > a
> > > > > sufficient fetch size we could force the driver to stream data as
> > > > required.
> > > > > This is how postgres driver works.
> > > > >
> > > >
> > >
> >
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > > > >
> > > > > We will have to see the behaviour of other drivers too.
> > > > >
> > > > > Let me know your thoughts here.
> > > > >
> > > > > Regards,
> > > > > Rahul
> > > > >
> > > > >
> > > > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj 
> > > wrote:
> > > > >
> > > > > > Hi Vitalii,
> > > > > >
> > > > > > There are two concrete implementations of the class -
> > > > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies to
> > > > Parquet.
> > > > > > And pushdown only apply limit but not offset"
> > > > > >
> > > > > > Also I enabled debug mode and found LIMIT is not getting pushed
> to
> > > the
> > > > > > query.
> > > > > > LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost =
> {83.0
> > > > rows,
> > > > > > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, 

Re: Drill JDBC Plugin limit queries

2018-10-13 Thread Vitalii Diravka
To update the documentation: those issues were solved by using these
properties in the connection URL:
defaultRowFetchSize=1 [1]
defaultAutoCommit=false [2]
The full URL there was: "url": "jdbc:postgresql://
myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=1
"

If some issues are still present, it is also reasonable to create tickets
to track them.

[1]
https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
[2]
https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
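For reference, a sketch of a complete RDBMS storage plugin config using those
URL properties; the host, database, credentials and fetch size value below
are hypothetical:

{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://myhost.mydomain.com/mydb?defaultRowFetchSize=10000&defaultAutoCommit=false",
  "username": "drill_user",
  "password": "***",
  "enabled": true
}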

On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj  wrote:

> Should I create tickets to track these issues or should I create a ticket
> to update the documentation?
>
> Rahul
>
> On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka 
> wrote:
>
> > 1. You are right, it means it is reasonable to extend this rule for
> > applying on other Scan operators (or possibly to create the separate
> one).
> > 2. There was a question about OOM issues in Drill + PostgreSQL, please
> take
> > a look [1].
> > Since you are trying to setup this configs, It will be good, if you
> > create a Jira ticket to add this info to Drill docs [2]
> >
> > [1]
> > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> >
> > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj  wrote:
> >
> > > Regarding the heap out of error, it could be that the jdbc driver is
> > > prefetching the entire record set to memory. I just had a look at
> > > JdbcRecordReader, looks like by setting connection#autoCommit(false)
> and
> > a
> > > sufficient fetch size we could force the driver to stream data as
> > required.
> > > This is how postgres driver works.
> > >
> >
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > >
> > > We will have to see the behaviour of other drivers too.
> > >
> > > Let me know your thoughts here.
> > >
> > > Regards,
> > > Rahul
> > >
> > >
> > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj 
> wrote:
> > >
> > > > Hi Vitalii,
> > > >
> > > > There are two concrete implementations of the class -
> > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies to
> > Parquet.
> > > > And pushdown only apply limit but not offset"
> > > >
> > > > Also I enabled debug mode and found LIMIT is not getting pushed to
> the
> > > > query.
> > > > LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0
> > rows,
> > > > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> > > >   UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0
> rows,
> > > > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > > > LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost =
> {61.0
> > > > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> > > >   JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration"
> > WHERE
> > > > "id" > 36050 ]): rowcount = 50.0, cumulative cost = {50.0 rows,
> > 50.0
> > > cpu
> > > >
> > > > Regarding the second point, its the java heap getting filled with
> jdbc
> > > > results. How do we address this?
> > > >
> > > > Regards,
> > > > Rahul
> > > >
> > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka 
> > > > wrote:
> > > >
> > > >> Hi Rahul,
> > > >>
> > > >> Drill has *DrillPushLimitToScanRule* [1] rule, which should do this
> > > >> optimization, whether the GroupScan supports Limit Push Down.
> > > >> Also you can verify in debug mode whether this rule is fired.
> > > >> Possibly for some external DB (like MapR-DB) Drill should have the
> > > >> separate
> > > >> class for this optimization [2].
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule

Re: Drill JDBC Plugin limit queries

2018-10-13 Thread Vitalii Diravka
1. You are right; it means it is reasonable to extend this rule to apply
to other Scan operators (or possibly to create a separate one).
2. There was a question about OOM issues in Drill + PostgreSQL, please take
a look [1].
Since you are trying to set up these configs, it would be good if you
created a Jira ticket to add this info to the Drill docs [2].

[1] https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
[2] https://drill.apache.org/docs/rdbms-storage-plugin/
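For reference, a minimal self-contained JDBC sketch of the cursor-based fetch
approach Rahul describes below (autocommit off plus an explicit fetch size,
so the PostgreSQL driver streams rows instead of materializing the whole
result set); the connection details, credentials and table name are
hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PgCursorFetchExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://myhost.mydomain.com/mydb", "user", "secret")) {
      conn.setAutoCommit(false);   // required for cursor-based fetch in the pg driver
      try (Statement st = conn.createStatement()) {
        st.setFetchSize(10000);    // rows per round trip instead of the whole result set
        try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
          while (rs.next()) {
            // process one row at a time; heap usage stays bounded
          }
        }
      }
      conn.commit();
    }
  }
}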

On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj  wrote:

> Regarding the heap out-of-memory error, it could be that the JDBC driver is
> prefetching the entire record set to memory. I just had a look at
> JdbcRecordReader, looks like by setting connection#autoCommit(false) and a
> sufficient fetch size we could force the driver to stream data as required.
> This is how postgres driver works.
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
>
> We will have to see the behaviour of other drivers too.
>
> Let me know your thoughts here.
>
> Regards,
> Rahul
>
>
> On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj  wrote:
>
> > Hi Vitalii,
> >
> > There are two concrete implementations of the class -
> > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > LIMIT_ON_SCAN has a comment mentioning "For now only applies to Parquet.
> > And pushdown only apply limit but not offset"
> >
> > Also I enabled debug mode and found LIMIT is not getting pushed to the
> > query.
> > LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0 rows,
> > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> >   UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0 rows,
> > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {61.0
> > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> >   JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration" WHERE
> > "id" > 36050 ]): rowcount = 50.0, cumulative cost = {50.0 rows, 50.0
> cpu
> >
> > Regarding the second point, its the java heap getting filled with jdbc
> > results. How do we address this?
> >
> > Regards,
> > Rahul
> >
> > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka 
> > wrote:
> >
> >> Hi Rahul,
> >>
> >> Drill has *DrillPushLimitToScanRule* [1] rule, which should do this
> >> optimization, whether the GroupScan supports Limit Push Down.
> >> Also you can verify in debug mode whether this rule is fired.
> >> Possibly for some external DB (like MapR-DB) Drill should have the
> >> separate
> >> class for this optimization [2].
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> >> [2]
> >>
> >>
> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> >>
> >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj  wrote:
> >>
> >> > Hi,
> >> >
> >> > Drill does not push the LIMIT queries to external databases and I
> >> assume it
> >> > could be more related to Calcite. This leads to out of memory
> situations
> >> > while querying large table to view few records.  Is there something
> that
> >> > could be improved here? One solutions would be to push filters down to
> >> the
> >> > DB and/or combined with some JDBC batch size limit to flush a part as
> >> > parquet.
> >> >
> >> > Regards,
> >> > Rahul
> >> >
> >> >
> >>
> >
>
>


Re: Drill JDBC Plugin limit queries

2018-10-12 Thread Vitalii Diravka
Hi Rahul,

Drill has the *DrillPushLimitToScanRule* [1] rule, which should do this
optimization, provided the GroupScan supports Limit Push Down.
You can also verify in debug mode whether this rule is fired.
Possibly for some external DBs (like MapR-DB) Drill should have a separate
class for this optimization [2].

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
[2]
https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37

On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj  wrote:

> Hi,
>
> Drill does not push LIMIT queries to external databases, and I assume it
> could be more related to Calcite. This leads to out-of-memory situations
> while querying a large table to view a few records. Is there something that
> could be improved here? One solution would be to push filters down to the
> DB and/or combine that with some JDBC batch size limit to flush a part as
> parquet.
>
> Regards,
> Rahul
>
>


Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-28 Thread Vitalii Diravka
Congrats Chunhui and welcome to the Drill committership!

On Fri, Sep 28, 2018 at 11:23 PM Timothy Farkas  wrote:

> Congrats!
>
> On Fri, Sep 28, 2018 at 1:17 PM Sorabh Hamirwasia 
> wrote:
>
> > Congratulations Chunhui!!
> >
> > Thanks,
> > Sorabh
> >
> > On Fri, Sep 28, 2018 at 12:56 PM Paul Rogers 
> > wrote:
> >
> > > Congrats Chunhui!
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Friday, September 28, 2018, 2:17:42 AM PDT, Arina Ielchiieva <
> > > ar...@apache.org> wrote:
> > >
> > >  The Project Management Committee (PMC) for Apache Drill has invited
> > > Chunhui
> > > Shi to become a committer, and we are pleased to announce that he has
> > > accepted.
> > >
> > > Chunhui Shi has become a contributor since 2016, making changes in
> > various
> > > Drill areas. He has shown profound knowledge in Drill planning side
> > during
> > > his work to support lateral join. He is also one of the contributors of
> > the
> > > upcoming feature to support index based planning and execution.
> > >
> > > Welcome Chunhui, and thank you for your contributions!
> > >
> > > - Arina
> > > (on behalf of Drill PMC)
> > >
> >
>


Re: How does apache drill support DateTimeOffset type?

2018-09-20 Thread Vitalii Diravka
Hi,

There is no screenshot in this mail. Please upload it to Google Drive and
share it here.
Also, what do you mean by DateTimeOffset type? There is no DateTimeOffset
logical Date/Time data type in Parquet [1].
Please use "parquet-tools" to inspect the schema of your parquet file [2].

[1]
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
[2] https://github.com/apache/parquet-mr/tree/master/parquet-tools#build
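Once parquet-tools is built as described in [2], printing the schema is a
single command (the file path below is a placeholder):

parquet-tools schema /path/to/your_file.parquet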

On Thu, Sep 20, 2018 at 8:55 AM chen xinyi 
wrote:

> Hi,
>
>
> Recently, I've encountered an issue querying a table backed by a parquet
> file: there is a column with DateTimeOffset type,
>
> but Apache Drill couldn't read the DateTimeOffset column. See the
> screenshot below; I tried 'cast', 'convert_to' and other functions, and
> none of them work.
>
>
> Do you have any plan to support DateTimeOffset with parquet?
>
>
>
>
> Thanks!
>
> Simon
>


Re: Long running query succeeds but UI times out?

2018-09-15 Thread Vitalii Diravka
Hi James,

This is a question for the user mailing list.
There is no attachment; please upload it to Google Drive, for instance, and
give us the link.

Did you try to use Drill SqlLine?


Kind regards
Vitalii


On Sat, Sep 15, 2018 at 7:45 PM James Barney 
wrote:

> Hey,
> I've had pretty great success using drill on top of S3 but I'm hitting one
> big issue: a "long running" query (more than 4.5 minutes) will succeed
> after submitting but the UI times out with  'network error (tcp error):
> ""'. See attachment.
>
> Basics:
> Running Drill 1.14 on Amazon Linux. Only modification I made is this
> parameter at runtime to drill-env.sh for reading encrypted files from S3:
> export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> -Dcom.amazonaws.services.s3.enableV4"
>
> To simplify things I'm just on one drill node with this query:
> select distinct(column_name) from s3.`/path/to/files/year/month/day/hour/`
>
> All the files are well-formed parquet files and querying any single file
> returns fine in a few seconds. When I scale the cluster up to 50+ nodes,
> the query obviously returns much faster and no time out occurs. However,
> more complicated/higher data volume queries (ie, querying a whole days
> worth of data instead of one hour) suffer the same timeout.
>
> Are there settings I can tweak to prevent this timeout from occurring? Can
> I save the results of the query somewhere since it's succeeding in the
> background?
>
> Drill demolishes our current solution with its performance and we really
> want to use it but this bug is making it tricky to sell.
>
> Thanks,
> James
>


Re: Drill Hangout tomorrow 08/21

2018-09-12 Thread Vitalii Diravka
Oleksandr,

It seems you couldn't connect to the hangout meeting. But you can share your
ideas in a reply to our last comment regarding the Drill Metastore [1].
Could you please take a look?

[1]
https://issues.apache.org/jira/browse/DRILL-6552?focusedCommentId=16612437=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16612437

Kind regards
Vitalii


On Wed, Aug 22, 2018 at 8:28 AM Hanumath Rao Maduri 
wrote:

> Hangout attendees on 08/21:
> Pritesh, Salim, Hanumath, Boaz, Robert, Jyothsna, Karthik, Gautam, Vitalii,
> Vova, Parth, Olek
>
> Vitalii and Vova gave a presentation on the Drill Metadata management project.
>
> Some of the questions which were discussed during the discussion.
> 1) Gautam suggested using native operators for collecting stats instead of
> aggregation operators.
> 2) The metadata API should be made abstract such that metastore can use a
> dfs or hive metastore etc.
> 3) Schema change exception can be minimized by hive metastore but not
> totally overcome.
> 4) Discussion on how to refresh the metadata.
> 5) Caching the metadata and discussion on what problems the eariler caching
> solutions had in Drill.
>
>
> Further metadata discussion will be continued in the next hangout.
>
> -Hanu
>
> On Tue, Aug 21, 2018 at 9:53 AM Vitalii Diravka  >
> wrote:
>
> > Hi Alex,
> >
> > The issues pointed by you really exist. And using of HMS is still open
> > question.
> >
> > The main goal is to make Drill Metastore API, which can be used for
> > different Drill data sources. Then to adapt current Parquet metadata
> cache
> > files mechanism to this API.
> > It will be the first implementation. The second one could be HMS.
> > Although it has limitations, it has also benefits: it is easy to leverage
> > it in Drill, a lot of projects already use HMS (Spark, Presto ...),
> > so for some users it can be a good choice for storing metadata.
> >
> > Other implementations for Drill Metastore could be discussed (MetaCat,
> > WhereHow, new own implementation based on HBase/MapR-DB).
> >
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin 
> > wrote:
> >
> > > Hi Volodymyr,
> > >
> > > Just recalling on recent discussions in DEV list, it would be
> interesting
> > > to see if following topics are addressed in the Drill metadata
> management
> > > initiative:
> > >
> > > 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> > > Just to substantiate this point of view from practical experience, and
> if
> > > we reflect on ambition to integrate and operate Drill in
> mission-critical
> > > environment, following aspects could be listed:
> > >   - Need of DBA support if cluster is subject to service level
> > > objectives/agreements, which is somehow remote from Hadoop world. Need
> of
> > > strong DBA skills if resulting DB workload is challenging in terms of
> > > performance tuning.
> > >   - Common RDBMS setups offer active-standby HA model. In secure
> > > environments, e.g. environments which are subject to PCI-DSS
> compliancy,
> > > that implies frequent OS patching and reboot (in reality every 30 days
> > > max), thus causing an additional coordination effort and service outage
> > for
> > > duration of the failovers.
> > >   - Active-active HA clusters like Galera / Percona are free of above
> > > disadvantage, but require specific skill set which is not widespread in
> > DBA
> > > community. Also they are sensitive to even disk IO performance across
> the
> > > cluster which may require additional hardware adjustment and IO
> > isolation.
> > >   - Need of backup / restore mechanism, which is probably lesser of
> > > concerns
> > >
> > > 2. Bottleneck in foreman when performing initial metadata collection
> (and
> > > eventually pruning) on large amount of Parquet files
> > >   - From discussion in the mailing list it was not fully clear whether
> > > metastore will address it
> > >   - Or shall this discussion be continued outside of metastore
> initiative
> > > from your point of view?
> > >
> > > I hope it would be OK with you and Vitalii to share some thoughts on
> > this.
> > >
> > > Thanks & Best Regards,
> > > Alex
> > >
> > > On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <
> > volody...@apache.org
> > > >
> > > wrote:
> > >
> > > > Hi all,

Re: [ANNOUNCE] New PMC member: Charles Givre

2018-09-03 Thread Vitalii Diravka
Congrats Charles!
And thank you for your enthusiasm and work on Drill

On Mon, Sep 3, 2018 at 4:22 PM Arina Ielchiieva  wrote:

> I am pleased to announce that Drill PMC invited Charles Givre to the PMC
> and he has accepted the invitation.
>
> Congratulations Charles and welcome to PMC squad :)
>
> - Arina
> (on behalf of Drill PMC)
>


Re: Drill High availability

2018-08-22 Thread Vitalii Diravka
Hi Satish!

I think it really depends on your system, cluster configuration and the use
cases for which you are using Apache Drill.
Please specify them so we can make more accurate suggestions for you.


Kind regards
Vitalii


On Wed, Aug 22, 2018 at 5:00 AM pujari Satish 
wrote:

> Hi Team,
>
> Good evening. I am Satish and I need your help regarding Apache Drill high
> availability using HAProxy. Is it possible to make Drill highly
> available? I have checked many sites but I am still stuck. Please guide
> me on this and let me know how to approach it.
>
>
> -Thanks,
> Satish
>


Re: Drill Hangout tomorrow 08/21

2018-08-21 Thread Vitalii Diravka
Hi Alex,

The issues you pointed out really exist, and the use of HMS is still an open
question.

The main goal is to make a Drill Metastore API which can be used for
different Drill data sources, and then to adapt the current Parquet metadata
cache files mechanism to this API.
That will be the first implementation. The second one could be HMS.
Although HMS has limitations, it also has benefits: it is easy to leverage
in Drill, and a lot of projects already use HMS (Spark, Presto, ...),
so for some users it can be a good choice for storing metadata.

Other implementations for the Drill Metastore could be discussed (MetaCat,
WhereHows, or a new implementation based on HBase/MapR-DB).


Kind regards
Vitalii


On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin 
wrote:

> Hi Volodymyr,
>
> Just recalling recent discussions on the DEV list, it would be interesting
> to see if the following topics are addressed in the Drill metadata management
> initiative:
>
> 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> Just to substantiate this point of view from practical experience, and if
> we reflect on ambition to integrate and operate Drill in mission-critical
> environment, following aspects could be listed:
>   - Need of DBA support if cluster is subject to service level
> objectives/agreements, which is somehow remote from Hadoop world. Need of
> strong DBA skills if resulting DB workload is challenging in terms of
> performance tuning.
>   - Common RDBMS setups offer active-standby HA model. In secure
> environments, e.g. environments which are subject to PCI-DSS compliancy,
> that implies frequent OS patching and reboot (in reality every 30 days
> max), thus causing an additional coordination effort and service outage for
> duration of the failovers.
>   - Active-active HA clusters like Galera / Percona are free of above
> disadvantage, but require specific skill set which is not widespread in DBA
> community. Also they are sensitive to even disk IO performance across the
> cluster which may require additional hardware adjustment and IO isolation.
>   - Need of a backup / restore mechanism, which is probably the least of
> the concerns
>
> 2. Bottleneck in foreman when performing initial metadata collection (and
> eventually pruning) on large amount of Parquet files
>   - From discussion in the mailing list it was not fully clear whether
> metastore will address it
>   - Or shall this discussion be continued outside of metastore initiative
> from your point of view?
>
> I hope it would be OK with you and Vitalii to share some thoughts on this.
>
> Thanks & Best Regards,
> Alex
>
> On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi  >
> wrote:
>
> > Hi all,
> >
> > I and Vitalii Diravka want to give the presentation with our ideas
> > connected with Drill Metadata management project (DRILL-6552
> > <https://issues.apache.org/jira/browse/DRILL-6552>).
> >
> > We will be happy to discuss it and choose the right way for further
> > development.
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri  >
> > wrote:
> >
> > > The Apache Drill Hangout will be held tomorrow at 10:00am PST; please
> let
> > > us know should you have a topic for tomorrow's hangout. We will also
> ask
> > > for topics at the beginning of the hangout.
> > >
> > > Hangout Link -
> > >
> https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> > >
> > > Regards,
> > > Hanu
> > >
> >
>


Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi

2018-08-17 Thread Vitalii Diravka
Congrats Boaz!

Kind regards
Vitalii


On Fri, Aug 17, 2018 at 12:51 PM Arina Ielchiieva  wrote:

> I am pleased to announce that Drill PMC invited Boaz Ben-Zvi to the PMC and
> he has accepted the invitation.
>
> Congratulations Boaz and thanks for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: distributed drill on local file system

2018-08-16 Thread Vitalii Diravka
Hi Mehran,

This is a question for the user mailing list.

Looks like there are no issues with it: you can run Drill in distributed
mode on Windows, Linux or macOS based machines.
It is necessary to specify *zk.connect* with the ZooKeeper hostname and port
number in the *drill-override.conf* file (see the sketch after the links
below) and to run *bin/drillbit.sh start* [1].
But a Hadoop cluster is recommended for this purpose [2], therefore I am not
sure which issues can arise with such a setup.

[1]
https://drill.apache.org/docs/starting-drill-in-distributed-mode/#drillbit.sh-command-syntax
[2] https://drill.apache.org/docs/distributed-mode-prerequisites/
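For illustration, a minimal sketch of the relevant drill-override.conf
entries, assuming a hypothetical three-node ZooKeeper ensemble (the
cluster-id must match across all drillbits):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
}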

Kind regards
Vitalii


On Thu, Aug 16, 2018 at 7:11 PM Mehran Dashti [ BR - PD ] <
m_das...@behinrahkar.com> wrote:

> Hi,
>
> I wanted to know if it is possible, or possible with minimal effort, to have
> distributed drillbits that work on their own local file systems?
>
> We do not want to have HDFS as the file system.
>
>
>
> Thank you in advance.
>
>
>
>
>
> *Best Regards,*
>
>
>
>
> *Mehran Dashti*
>
> *Product Leader*
>
> *09125902452*
>
>
>


Re: Requesting guidance. Having trouble generating parquet files from jdbc connection to PostgreSQL. "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2018-08-15 Thread Vitalii Diravka
Glad to see that it helped and you've solved the issue.

I have seen you asked about PostgreSQL JSONB in another topic,
but it looks like it is not supported for now and should be implemented in
the context of DRILL-5087 <https://issues.apache.org/jira/browse/DRILL-5087>



Kind regards
Vitalii


On Wed, Aug 15, 2018 at 3:36 PM Reid Thompson 
wrote:

> Vitalii,
>
> yes. Per https://jdbc.postgresql.org/documentation/head/connect.html
> (page lists numerous settings available)
>
> defaultRowFetchSize = int
>
> Determines the number of rows fetched in a ResultSet by one fetch, i.e. one
> trip to the database. Limiting the number of rows fetched with each trip to
> the database avoids unnecessary memory
> consumption and, as a consequence, OutOfMemoryException.
>
> The default is zero, meaning that the ResultSet will fetch all rows at
> once. A negative number is not allowed.
>
>
> on another topic,
> is there any way to have drill properly recognize postgresql's json and
> jsonb types?  I have tables with both, and am getting this error
>
>  org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION
>  ERROR: A column you queried has a data type that is not currently
>  supported by the JDBC storage plugin. The column's name was actionjson
>  and its JDBC data type was OTHER.
>
>
> thanks,
> reid
>
> On Wed, 2018-08-15 at 14:44 +0300, Vitalii Diravka wrote:
> > [EXTERNAL SOURCE]
> >
> > Hi Reid,
> >
> > Am I right, defaultRowFetchSize=1 property in URL solves that OOM
> issue?
> > If so possibly it can be useful to have this information in Drill docs
> [1].
> >
> > [1] https://drill.apache.org/docs/rdbms-storage-plugin/
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Tue, Aug 14, 2018 at 4:17 PM Reid Thompson <
> reid.thomp...@omnicell.com> wrote:
> > > using the below parameters in the URL and looking in the defined
> logfile
> > > indicates that the fetch size is being set to 1, as expected.
> > >
> > > just to note that it appears that the param defaultRowFetchSize sets
> the
> > > fetch size and signifies that a cursor should be used.  It is different
> > > from the originally noted defaultFetchSize param, and it appears that
> > > postgresql doesn't require the useCursorFetch=true or the
> defaultAutoCommit=false.
> > >
> > > ...snip..
> > >   "url": "jdbc:postgresql://
> myhost.mydomain.com/mydb?useCursorFetch=true=false=TRACE=/tmp/jdbc.log=1
> ",
> > > ...snip..
> > >
> > >
> > >
> > > On Tue, 2018-08-14 at 07:26 -0400, Reid Thompson wrote:
> > > > attempting with the below still fails.
> > > > looking at pg_stat_activity it doesn't appear that a cursor is being
> > > > created.  It's still attempting to pull all the data at once.
> > > >
> > > > thanks,
> > > > reid
> > > > On Mon, 2018-08-13 at 14:18 -0400, Reid Thompson wrote:
> > > > > Vitalii,
> > > > >
> > > > > Ok, thanks, I had found that report, but didn't note the option
> related
> > > > > to defaultAutoCommit.
> > > > > > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > > > >
> > > > >
> > > > > so, something along the lines of
> > > > >
> > > > > ..snip..
> > > > >   "url": "jdbc:postgresql://
> myhost.mydomain.com/ateb?useCursorFetch=true=1=false
> ",
> > > > > ..snip..
> > > > >
> > > > >
> > > > > thanks,
> > > > > reid
> > > > >
> > > > > On Mon, 2018-08-13 at 20:33 +0300, Vitalii Diravka wrote:
> > > > > > [EXTERNAL SOURCE]
> > > > > >
> > > > > > Hi Reid,
> > > > > >
> > > > > > Look like your issue is similar to DRILL-4177 [1].
> > > > > > It was related to MySQL connection. Looks like the similar issue
> is with PostgreSQL.
> > > > > > Looking at the Postgres documentation, the code needs to
> explicitly set the connection autocommit mode
> > > > > > to false e.g. conn.setAutoCommit(false) [2]. For data size of 10
> million plus, this is a must.
> > > > > >
> > > > > > You could disable "Auto Commit" option as session option [3]
> > > > > > or to do it within plugin config URL with the following
> property: defaultAutoCommit

Re: Requesting guidance. Having trouble generating parquet files from jdbc connection to PostgreSQL. "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2018-08-15 Thread Vitalii Diravka
Hi Reid,

Am I right that the *defaultRowFetchSize=1* property in the URL solves that
OOM issue?
If so, it could be useful to have this information in the Drill docs [1].

[1] https://drill.apache.org/docs/rdbms-storage-plugin/

Kind regards
Vitalii


On Tue, Aug 14, 2018 at 4:17 PM Reid Thompson 
wrote:

> using the below parameters in the URL and looking in the defined logfile
> indicates that the fetch size is being set to 1, as expected.
>
> just to note that it appears that the param defaultRowFetchSize sets the
> fetch size and signifies that a cursor should be used.  It is different
> from the originally noted defaultFetchSize param, and it appears that
> postgresql doesn't require the useCursorFetch=true or the
> defaultAutoCommit=false.
>
> ...snip..
>   "url": "jdbc:postgresql://
> myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=1
> ",
> ...snip..
>
>
>
> On Tue, 2018-08-14 at 07:26 -0400, Reid Thompson wrote:
> > attempting with the below still fails.
> > looking at pg_stat_activity it doesn't appear that a cursor is being
> > created.  It's still attempting to pull all the data at once.
> >
> > thanks,
> > reid
> > On Mon, 2018-08-13 at 14:18 -0400, Reid Thompson wrote:
> > > Vitalii,
> > >
> > > Ok, thanks, I had found that report, but didn't note the option related
> > > to defaultAutoCommit.
> > > > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > >
> > >
> > > so, something along the lines of
> > >
> > > ..snip..
> > >   "url": "jdbc:postgresql://
> myhost.mydomain.com/ateb?useCursorFetch=true=1=false
> ",
> > > ..snip..
> > >
> > >
> > > thanks,
> > > reid
> > >
> > > On Mon, 2018-08-13 at 20:33 +0300, Vitalii Diravka wrote:
> > > > [EXTERNAL SOURCE]
> > > >
> > > > Hi Reid,
> > > >
> > > > Look like your issue is similar to DRILL-4177 [1].
> > > > It was related to MySQL connection. Looks like the similar issue is
> with PostgreSQL.
> > > > Looking at the Postgres documentation, the code needs to explicitly
> set the connection autocommit mode
> > > > to false e.g. conn.setAutoCommit(false) [2]. For data size of 10
> million plus, this is a must.
> > > >
> > > > You could disable "Auto Commit" option as session option [3]
> > > > or to do it within plugin config URL with the following property:
> defaultAutoCommit=false [4]
> > > >
> > > > [1] https://issues.apache.org/jira/browse/DRILL-4177
> > > > [2]
> https://jdbc.postgresql.org/documentation/93/query.html#fetchsize-example
> > > > [3]
> https://www.postgresql.org/docs/9.3/static/ecpg-sql-set-autocommit.html
> > > > [4] https://jdbc.postgresql.org/documentation/head/ds-cpds.html
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > > >
> > > > On Mon, Aug 13, 2018 at 3:03 PM Reid Thompson <
> reid.thomp...@omnicell.com> wrote:
> > > > > My standalone host is configured with 16GB RAM, 8 cpus.  Using
> > > > > drill-embedded (single host standalone), I am attempting to pull
> data
> > > > > from PostgreSQL tables to parquet files via CTAS. Smaller datasets
> work
> > > > > fine, but larger data sets fail (for example ~11GB) with
> > > > > "java.lang.OutOfMemoryError: GC overhead limit exceeded"  Can
> someone
> > > > > advise on how to get past this?
> > > > >
> > > > > Is there a way to have drill stream this data from PostgreSQL to
> parquet
> > > > > files on disk, or does the data set have to be completely loaded
> into
> > > > > memory before it can be written to disk?  The documentation
> indicates
> > > > > that drill will spill to disk to avoid memory issues, so I had
> hoped
> > > > > that it would be straightforward to extract from the DB to disk.
> > > > >
> > > > > Should I not be attempting this via CTAS?  What are the other
> options?
> > > > >
> > > > >
> > > > > thanks,
> > > > > reid
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> >
> >
>
>
>
>


Re: Google Sheets plugin

2018-07-19 Thread Vitalii Diravka
Charles,

+1, I didn't know that there were other implementations for it too.
It would be good if you took a look at the plugin provided by bizreach,
compared it with yours and prepared a new plugin officially for Drill.

Kind regards
Vitalii


On Thu, Jul 19, 2018 at 11:21 PM Charles Givre  wrote:

> As luck would have it, I also worked on this not knowing that someone else
> was working on it:  https://github.com/cgivre/drill-excel-plugin <
> https://github.com/cgivre/drill-excel-plugin>
> I’d be happy to contribute this for Drill 1.15.  When I was working on
> this, I was thinking that it would be really useful to have a plugin for
> Google Sheets; however, the lack of documentation around storage plugins
> makes that really difficult.  I’d love to take that project on as well, but
> I really need to better understand storage plugins.
> —C
>
>
>
>
> > On Jul 19, 2018, at 16:17, Kunal Khatua  wrote:
> >
> > Vitalii
> >
> > I think Pedro is referring to this project:
> > https://github.com/bizreach/drill-excel-plugin
> >
> > It would be worth considering adding this if it is mature enough.
> >
> > Pedro
> > Like Vitalii suggested, please create a JIRA. This helps gain visibility
> within the community and someone (including yourself) can consider writing
> a Storage Plugin for it,
> >
> > Kunal
> > On 7/19/2018 12:56:24 PM, Vitalii Diravka 
> wrote:
> > Hi Pedro,
> >
> > Currently there is no any Excel plugin in Drill, but you can export the
> > data from Excel file into CSV file and query it by Drill.
> >
> > There is a ticket for Excel files plugin [1].
> > You can create a similar one for Google sheets.
> >
> > [1] https://issues.apache.org/jira/browse/DRILL-3738
> >
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Thu, Jul 19, 2018 at 3:37 PM Pedro H S Teixeira
> > wrote:
> >
> >> Hello,
> >>
> >> Is there any plugin already developed to connect to a google sheets,
> like
> >> there a stotage plugin for excel files?
> >>
> >> Best Regards,
> >> Pedro
> >>
>
>


Re: Google Sheets plugin

2018-07-19 Thread Vitalii Diravka
Hi Pedro,

Currently there is no Excel plugin in Drill, but you can export the
data from an Excel file into a CSV file and query it with Drill.

There is a ticket for Excel files plugin [1].
You can create a similar one for Google sheets.

[1] https://issues.apache.org/jira/browse/DRILL-3738


Kind regards
Vitalii


On Thu, Jul 19, 2018 at 3:37 PM Pedro H S Teixeira 
wrote:

> Hello,
>
> Is there any plugin already developed to connect to Google Sheets, like
> the storage plugin for Excel files?
>
> Best Regards,
> Pedro
>


Re: unit tests

2018-07-01 Thread Vitalii Diravka
Hi Padma,

Looks like you have a wrong hostname or IP in your /etc/hosts.
Please find out more here [1]; Sorabh has already answered a similar
question :)

[1]
https://lists.apache.org/thread.html/%3che1pr07mb33068e59f257a4d78f29304d84...@he1pr07mb3306.eurprd07.prod.outlook.com%3E

Kind regards
Vitalii


On Mon, Jul 2, 2018 at 1:06 AM Padma Penumarthy 
wrote:

> I am getting the following error while trying to run unit tests on my mac.
> Anyone has any idea what might be wrong ?
>
> 15:02:01.479 [Client-1] ERROR o.a.d.e.rpc.ConnectionMultiListener - Failed
> to establish connection
> java.util.concurrent.ExecutionException:
> io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection
> refused: homeportal/192.168.1.254:31010
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:54)
> ~[netty-common-4.0.48.Final.jar:4.0.48.Final]
> at
>
> org.apache.drill.exec.rpc.ConnectionMultiListener$ConnectionHandler.operationComplete(ConnectionMultiListener.java:90)
> [classes/:na]
> at
>
> org.apache.drill.exec.rpc.ConnectionMultiListener$ConnectionHandler.operationComplete(ConnectionMultiListener.java:77)
> [classes/:na]
> at
>
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> [netty-common-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
> [netty-common-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
> [netty-common-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> [netty-common-4.0.48.Final.jar:4.0.48.Final]
> at
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
> [netty-common-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:278)
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:294)
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> [netty-transport-4.0.48.Final.jar:4.0.48.Final]
> at
>
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
> [netty-common-4.0.48.Final.jar:4.0.48.Final]
>
>
> Thanks
> Padma
>


Re: Drill error

2018-06-28 Thread Vitalii Diravka
Thank you, Nitin

Kind regards
Vitalii


On Thu, Jun 28, 2018 at 6:04 PM Nitin Pawar  wrote:

> created https://issues.apache.org/jira/browse/DRILL-6551
>
> On Thu, Jun 28, 2018 at 7:43 PM, Vitalii Diravka <
> vitalii.dira...@gmail.com>
> wrote:
>
> > Hi Nitin,
> >
> > This is definitely a regression. Could you please file a Jira ticket [1]
> > with the details of your case, the data, and the workaround?
> > It will help developers solve this issue.
> >
> > [1] https://issues.apache.org/jira/projects/DRILL
> > Thanks.
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Thu, Jun 28, 2018 at 10:50 AM Nitin Pawar 
> > wrote:
> >
> > > I was able to fix this issue by doing concat(string1, ' ', string2)
> > instead
> > > of concat(string1, string2)
> > > Not sure how adding a separator helps but it solved the problem
> > >
> > > On Thu, Jun 28, 2018 at 11:55 AM, Nitin Pawar  >
> > > wrote:
> > >
> > > > Could this cause an issue if one of the field in concat function has
> > > large
> > > > text ?
> > > >
> > > > On Thu, Jun 28, 2018 at 11:10 AM, Nitin Pawar <
> nitinpawar...@gmail.com
> > >
> > > > wrote:
> > > >
> > > >> Hi Khurram,
> > > >>
> > > >> This is a parquet table.
> > > >> all the columns in the table are string columns (even date column is
> > > >> stored as string)
> > > >>
> > > >> I am currently using drill 1.13.0. Same query used to work fine on
> > drill
> > > >> 1.8.0
> > > >>
> > > >> Thanks,
> > > >> Nitin
> > > >>
> > > >> On Thu, Jun 28, 2018 at 12:17 AM, Khurram Faraaz 
> > > >> wrote:
> > > >>
> > > >>> Hi Nitin,
> > > >>>
> > > >>> Can you please share the description of the table (i.e. column
> > types) ?
> > > >>> Is this a parquet table or JSON ?
> > > >>> Also please share the version of Drill and the drillbit.log
> > > >>>
> > > >>> Thanks,
> > > >>> Khurram
> > > >>>
> > > >>> On Wed, Jun 27, 2018 at 9:45 AM, Nitin Pawar <
> > nitinpawar...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>> > here are the details
> > > >>> >
> > > >>> > query:
> > > >>> >
> > > >>> > select Account Account,Name Name,CONCAT(DateString , string2)
> > > >>> > Merged_String from dfs.tmp.`/nitin/` t1
> > > >>> >
> > > >>> > There is no custom UDF in this query;
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > On Wed, Jun 27, 2018 at 2:22 PM, Nitin Pawar <
> > > nitinpawar...@gmail.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Hi Vitalii,
> > > >>> > >
> > > >>> > > Thanks for the description.
> > > >>> > > I will try to get the log for this but as this is happening in
> > > >>> production
> > > >>> > > and drill log gets overwritten in our case very fast.
> > > >>> > >
> > > >>> > > Also I will get the query for the same while getting the log
> > > >>> > >
> > > >>> > > I will get the details and update the thread
> > > >>> > >
> > > >>> > > Thanks,
> > > >>> > > Nitin
> > > >>> > >
> > > >>> > > On Tue, Jun 26, 2018 at 7:32 PM, Vitalii Diravka <
> > > >>> > > vitalii.dira...@gmail.com> wrote:
> > > >>> > >
> > > >>> > >> Hi Nitin,
> > > >>> > >>
> > > >>> > >> It happens in the process of reallocation of the size of
> buffers
> > > in
> > > >>> the
> > > >>> > >> memory.
> > > >>> > >> It isn't a User Exception, so it looks like a bug, if you get
> it
> > > in
> > > >>> some
> > > >>> > >> existed plugin.
> > > >>> > >

Re: Drill error

2018-06-28 Thread Vitalii Diravka
Hi Nitin,

This is definitely a regression. Could you please file a Jira ticket [1]
with the details of your case, the data, and the workaround?
It will help developers solve this issue.

[1] https://issues.apache.org/jira/projects/DRILL
Thanks.

Kind regards
Vitalii


On Thu, Jun 28, 2018 at 10:50 AM Nitin Pawar 
wrote:

> I was able to fix this issue by doing concat(string1, ' ', string2) instead
> of concat(string1, string2)
> Not sure how adding a separator helps but it solved the problem
>
> On Thu, Jun 28, 2018 at 11:55 AM, Nitin Pawar 
> wrote:
>
> > Could this cause an issue if one of the field in concat function has
> large
> > text ?
> >
> > On Thu, Jun 28, 2018 at 11:10 AM, Nitin Pawar 
> > wrote:
> >
> >> Hi Khurram,
> >>
> >> This is a parquet table.
> >> all the columns in the table are string columns (even date column is
> >> stored as string)
> >>
> >> I am currently using drill 1.13.0. Same query used to work fine on drill
> >> 1.8.0
> >>
> >> Thanks,
> >> Nitin
> >>
> >> On Thu, Jun 28, 2018 at 12:17 AM, Khurram Faraaz 
> >> wrote:
> >>
> >>> Hi Nitin,
> >>>
> >>> Can you please share the description of the table (i.e. column types) ?
> >>> Is this a parquet table or JSON ?
> >>> Also please share the version of Drill and the drillbit.log
> >>>
> >>> Thanks,
> >>> Khurram
> >>>
> >>> On Wed, Jun 27, 2018 at 9:45 AM, Nitin Pawar 
> >>> wrote:
> >>>
> >>> > here are the details
> >>> >
> >>> > query:
> >>> >
> >>> > select Account Account,Name Name,CONCAT(DateString , string2)
> >>> > Merged_String from dfs.tmp.`/nitin/` t1
> >>> >
> >>> > There is no custom UDF in this query;
> >>> >
> >>> >
> >>> >
> >>> > On Wed, Jun 27, 2018 at 2:22 PM, Nitin Pawar <
> nitinpawar...@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > Hi Vitalii,
> >>> > >
> >>> > > Thanks for the description.
> >>> > > I will try to get the log for this but as this is happening in
> >>> production
> >>> > > and drill log gets overwritten in our case very fast.
> >>> > >
> >>> > > Also I will get the query for the same while getting the log
> >>> > >
> >>> > > I will get the details and update the thread
> >>> > >
> >>> > > Thanks,
> >>> > > Nitin
> >>> > >
> >>> > > On Tue, Jun 26, 2018 at 7:32 PM, Vitalii Diravka <
> >>> > > vitalii.dira...@gmail.com> wrote:
> >>> > >
> >>> > >> Hi Nitin,
> >>> > >>
> >>> > >> It happens in the process of reallocation of the size of buffers
> in
> >>> the
> >>> > >> memory.
> >>> > >> It isn't a User Exception, so it looks like a bug, if you get it
> in
> >>> some
> >>> > >> existed plugin.
> >>> > >> But to say you exactly, please describe your case. What kind of
> >>> query
> >>> > did
> >>> > >> you perform, any UDF's, which data source?
> >>> > >> Also logs can help.
> >>> > >>
> >>> > >> Thanks.
> >>> > >>
> >>> > >> Kind regards
> >>> > >> Vitalii
> >>> > >>
> >>> > >>
> >>> > >> On Tue, Jun 26, 2018 at 1:27 PM Nitin Pawar <
> >>> nitinpawar...@gmail.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > Hi,
> >>> > >> >
> >>> > >> > Can someone help me understand below error? and how do I not let
> >>> this
> >>> > >> > happen ??
> >>> > >> >
> >>> > >> > SYSTEM ERROR: IllegalStateException: Tried to remove unmanaged
> >>> buffer.
> >>> > >> >
> >>> > >> > Fragment 0:0
> >>> > >> >
> >>> > >> > [Error Id: bcd510f6-75ee-49a7-b723-7b35d8575623 on
> >>> > >> > ip-10-0-103-63.ec2.internal:31010]
> >>> > >> > Caused By: SYSTEM ERROR: IllegalStateException: Tried to remove
> >>> > >> unmanaged
> >>> > >> > buffer.
> >>> > >> >
> >>> > >> > Fragment 0:0
> >>> > >> >
> >>> > >> > [Error Id: bcd510f6-75ee-49a7-b723-7b35d8575623 on
> >>> > >> > ip-10-0-103-63.ec2.internal:31010]
> >>> > >> >
> >>> > >> > --
> >>> > >> > Nitin Pawar
> >>> > >> >
> >>> > >>
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Nitin Pawar
> >>> > >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Nitin Pawar
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> Nitin Pawar
> >>
> >
> >
> >
> > --
> > Nitin Pawar
> >
>
>
>
> --
> Nitin Pawar
>


Re: Drill Hangout tomorrow 06/26

2018-06-26 Thread Vitalii Diravka
Lately the Drill Travis build has been failing more often because the
Travis job time limit expires.
The right fix would be to speed up Drill's execution :)

Nevertheless, I believe we should consider excluding some more tests from
the Travis build.
We can add all TPCH tests (TestTpchLimit0, TestTpchExplain,
TestTpchPlanning) to the SlowTest category, as sketched below.
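
A minimal sketch of the tagging (assuming the class extends the existing
BaseTestQuery helper; the org.apache.drill.categories.SlowTest category
class already lives in the test sources):

import org.junit.experimental.categories.Category;
import org.apache.drill.categories.SlowTest;

// Skipped when Surefire excludes the group, e.g.
// mvn test -DexcludedGroups=org.apache.drill.categories.SlowTest
@Category(SlowTest.class)
public class TestTpchLimit0 extends BaseTestQuery {
  // existing test methods stay unchanged
}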

Is there another solution for this issue? Which other tests execute very
slowly?

Kind regards
Vitalii


On Tue, Jun 26, 2018 at 3:34 AM Aman Sinha  wrote:

> We'll have the Drill hangout tomorrow Jun26th, 2018 at 10:00 PDT.
>
> If you have any topics to discuss, send a reply to this post or just join
> the hangout.
>
> ( Drill hangout link
>  )
>


Re: Drill error

2018-06-26 Thread Vitalii Diravka
Hi Nitin,

It happens during reallocation of buffer sizes in memory.
It isn't a User Exception, so it looks like a bug if you get it in some
existing plugin.
But to tell you exactly, please describe your case. What kind of query did
you perform, any UDFs, which data source?
Logs can also help.

Thanks.

Kind regards
Vitalii


On Tue, Jun 26, 2018 at 1:27 PM Nitin Pawar  wrote:

> Hi,
>
> Can someone help me understand below error? and how do I not let this
> happen ??
>
> SYSTEM ERROR: IllegalStateException: Tried to remove unmanaged buffer.
>
> Fragment 0:0
>
> [Error Id: bcd510f6-75ee-49a7-b723-7b35d8575623 on
> ip-10-0-103-63.ec2.internal:31010]
> Caused By: SYSTEM ERROR: IllegalStateException: Tried to remove unmanaged
> buffer.
>
> Fragment 0:0
>
> [Error Id: bcd510f6-75ee-49a7-b723-7b35d8575623 on
> ip-10-0-103-63.ec2.internal:31010]
>
> --
> Nitin Pawar
>


Re: Drill 1.12 query hive transactional orc table

2018-06-26 Thread Vitalii Diravka
Hi,

Thanks for your question.
Drill supports queries on Hive ACID tables starting from version 1.13.0 [1].
Please upgrade to the latest Drill version; then you will be able to
query Hive transactional tables.

[1] https://drill.apache.org/docs/hive-storage-plugin/

Kind regards
Vitalii


On Tue, Jun 26, 2018 at 7:04 AM qi...@tsingning.com 
wrote:

> Hi:
>  I am sorry about that my English is poor.
>  I have a problem and need your help.
>  Drill 1.12 uses Hive 1.2.1.
>  My  Drill 1.12.
>  My Hive version is 1.2.1
>  Things working fine :  use drill to query normal hive table .
>
> Now a  Hive table :
>  create table db_test.t_test_log(
>   create_time string,
>   log_id string,
>   log_type string)
> clustered by (log_id) into 2 buckets
>   ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\001'  LINES TERMINATED BY
> '\n'
> stored as orc
> tblproperties ('transactional'='true');
> data stream : flume -->hive,it's Quasi real-time insertion.
> Query this table: things work fine with hive sql, but when I use drill to
> query this table it does not work. The exception info:
>
>
> ==
> 2018-06-25 16:28:25,650 [24cf5855-cf24-48e7-92c7-be27fbae9370:foreman]
> INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> 24cf5855-cf24-48e7-92c7-be27fbae9370: select count(*) cnt  from
> hive.db_test.t_test_log
> 2018-06-25 16:28:25,969 [24cf5855-cf24-48e7-92c7-be27fbae9370:frag:0:0]
> INFO  o.a.d.e.w.fragment.FragmentExecutor -
> 24cf5855-cf24-48e7-92c7-be27fbae9370:0:0: State change requested
> AWAITING_ALLOCATION --> RUNNING
> 2018-06-25 16:28:25,969 [24cf5855-cf24-48e7-92c7-be27fbae9370:frag:0:0]
> INFO  o.a.d.e.w.f.FragmentStatusReporter -
> 24cf5855-cf24-48e7-92c7-be27fbae9370:0:0: State to report: RUNNING
> 2018-06-25 16:28:27,251 [24cf5855-cf24-48e7-92c7-be27fbae9370:frag:0:0]
> ERROR o.a.d.exec.physical.impl.ScanBatch - SYSTEM ERROR: IOException:
> Cannot obtain block length for
> LocatedBlock{BP-2057246263-10.30.208.135-1515072017012:blk_1074371083_630359;
> getBlockSize()=904; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[
> 10.30.208.135:50010,DS-8fc25c0e-3c81-49d5-b6d9-d229129b5525,DISK],
> DatanodeInfoWithStorage[10.31.0.7:50010,DS-e91fa806-0e81-48ca-864f-e9019001822c,DISK],
> DatanodeInfoWithStorage[10.31.76.49:50010
> ,DS-edfb09a8-dc1f-4e8e-b99f-c72a89cd2b1e,DISK]]}
>
> Setup failed for HiveOrcReader
>
> [Error Id: d7a136a7-c880-4356-947f-90e68238a4f0 ]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> IOException: Cannot obtain block length for
> LocatedBlock{BP-2057246263-10.30.208.135-1515072017012:blk_1074371083_630359;
> getBlockSize()=904; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[
> 10.30.208.135:50010,DS-8fc25c0e-3c81-49d5-b6d9-d229129b5525,DISK],
> DatanodeInfoWithStorage[10.31.0.7:50010,DS-e91fa806-0e81-48ca-864f-e9019001822c,DISK],
> DatanodeInfoWithStorage[10.31.76.49:50010
> ,DS-edfb09a8-dc1f-4e8e-b99f-c72a89cd2b1e,DISK]]}
>
> Setup failed for HiveOrcReader
>
> [Error Id: d7a136a7-c880-4356-947f-90e68238a4f0 ]
> at
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
> ~[drill-common-1.12.0.jar:1.12.0]
> at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:213)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.test.generated.StreamingAggregatorGen1.doWork(StreamingAggTemplate.java:187)
> [na:na]
> at
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:181)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
> [drill-java-exec-1.12.0.jar:1.12.0]
> at
> 

Re: Unable to run Drill queries on Drill1.13

2018-06-14 Thread Vitalii Diravka
Hi Peter,

Do you use User Authentication [1]? Which kind?
Could you please share your drill-override.conf file and the connection
string you are using to connect to the drillbit?

The anonymous user does not have access to perform your query.
You should specify a username that has that access in the connection
string [2].

If you don't need User Impersonation [3] and Authentication, you can disable
them in the drill-override.conf file and restart the drillbits, as sketched
below.

[1] https://drill.apache.org/docs/configuring-user-authentication/
[2]
https://drill.apache.org/docs/configuring-plain-authentication/#connecting-with-sqlline
[3] https://drill.apache.org/docs/configuring-user-impersonation/
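
A minimal drill-override.conf sketch that turns both off (the cluster-id and
zk.connect values are placeholders; the option names come from the docs
linked above):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  impersonation.enabled: false,
  security.user.auth.enabled: false
}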

Kind regards
Vitalii


On Thu, Jun 14, 2018 at 11:36 AM Peter Edike <
peter.ed...@interswitchgroup.com> wrote:

> Hello Everyone,
>
>
>
> Please I am trying to run a simple query on my drill installation. I keep
> getting the following error
>
>
>
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IOException: Error getting user info for current user, anonymous Setup
> failed for null Fragment 1:31 [Error Id:
> ee07252c-597e-4602-a06e-583bedd4eaff on BGDTEST5.INTERSWITCH.COM:31010]
>
>
>
> I always see a username text box but I don’t quite like the fact that I
> must enter a username to run queries. How can I disable this behavior
>
>
>
>
> --
>
> *Peter Edike*
> Senior Software Engineer
> Research and Development
> Group Shared Technology
>
> *Office  NO: *
> *Mobile NO: *
> *Email:* peter.ed...@interswitchgroup.com
> Interswitch Limited
> 1648C Oko-Awo Street, Victoria Island Lagos
> Customer Contact Centre 0700-9065000
> *http://www.interswitchgroup.com*
>
> 
>
> This e-mail and all  attachments transmitted with it remain the property
> of Interswitch Limited , the information contained herein  are private
> confidential and intended solely for the use of the addressee. If you have
> received this e-mail in error, kindly notify the sender. If you are not the
> addressee, you should not disseminate, distribute or copy this e-mail.
> Kindly notify Interswitch immediately by email if you have received this
> email in error and delete this email and any attachment from your system
> Emails cannot be guaranteed to be secure or error free as the message and
> any attachments could be intercepted, corrupted, lost, delayed, incomplete
> or amended. the contents of this email or its attachments have been scanned
> for all viruses and all reasonable measures have been taken to ensure that
> no viruses are present. Interswitch Limited and its subsidiaries do not
> accept liability for damage caused by this email or any attachments.This
> message has been marked as *CONFIDENTIAL *on *Thursday, June 14, 2018* @ 
> *9:35:50
> AM*
>
>
>


Re: running embedded mode under windows

2018-05-31 Thread Vitalii Diravka
Can you also share the error from the log file?

Kind regards
Vitalii


On Thu, May 31, 2018 at 3:47 AM Divya Gehlot 
wrote:

> Can share the steps you followed ?
>
> On Thu, 31 May 2018 at 11:54 AM, Bo Qiang  wrote:
>
> > Hi,
> >
> > Sorry for this newbie question. I have been following the documentation
> to
> > start drill in embedded mode. But so far, I have no luck. I also tried on
> > two different machines, the error happened.
> >
> >
> >
> > I also cannot access the web console. Thanks in advance!
> >
> > Bo
> >
>


Re: How to deal with Parquet files containing no rows without Drill errors?

2018-05-24 Thread Vitalii Diravka
Hi Dave,

The issue is not in joining; Drill can join an empty schemaless table (for
example, an empty JSON file or an empty directory).
DRILL-4517 describes exactly this issue. You can add your test case with
data to that Jira ticket.

Regarding workarounds, I am not aware of any.

Kind regards
Vitalii


On Thu, May 24, 2018 at 5:19 AM Dave Challis 
wrote:

> We've got some processes that dump some reporting data as a bunch of
> parquet files, then runs queries involving joins with those tables (i.e. we
> have a main table which is always non-empty, then a number of link tables
> which join against which can be empty).
>
> The Parquet files contain schema metadata, but some contain no row data.
>
> Trying to join against them in Drill using e.g.
>
> SELECT *
> FROM dfs.`a.parquet` AS A
> JOIN dfs.`b.parquet` AS B ON (A.id=B.id)
> JOIN dfs.`c.parquet` AS C ON (A.id=C.id);
>
> Fails with: "SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 0 has
> no read entries assigned" if either b.parquet or c.parquet contain no rows.
>
> It looks like it might have been reported as an issue here
> https://issues.apache.org/jira/browse/DRILL-4517 , but as it hasn't been
> fixed since 2016, I'm wondering if there are any suggested workarounds for
> the above, rather than waiting for a fix.
>
> In MySQL/Postgres etc., joining against empty tables is fine, so this
> behaviour was a bit unexpected, and is a major blocker for a project I'm
> using Drill for.
>
> Thanks,
> Dave
>


Re: question about views

2018-04-30 Thread Vitalii Diravka
Ted,

The rules are enabled and DRILL-3855 [1] is resolved.
Please try your queries with the latest Drill master version.

[1] https://issues.apache.org/jira/browse/DRILL-3855
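
For reference, a minimal sketch of the hybrid view Ted describes below (the
plugin names dfs.archive and db.live, the column names, and the cutoff
timestamp are all illustrative assumptions):

CREATE VIEW dfs.tmp.prefs_history AS
SELECT user_id, ts, settings
FROM dfs.archive.`prefs`      -- partitioned Parquet files
WHERE ts < TIMESTAMP '2018-04-01 00:00:00'
UNION ALL
SELECT user_id, ts, settings
FROM db.live.prefs            -- RDBMS with secondary indexes
WHERE ts >= TIMESTAMP '2018-04-01 00:00:00';

With pushdown past the union working again, a filter such as
WHERE ts >= TIMESTAMP '2018-05-01 00:00:00' on the view should prune the
Parquet side entirely.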

Kind regards
Vitalii


On Mon, Apr 30, 2018 at 4:31 PM Nicolas Paris  wrote:

> Hi
>
> This looks an interesting design.
>
> Am I correct such view
> would hit the RDBMS for every query ?
> However such view would hit the parquet file only when
> the timestamp predicate would match a partition ?
>
> Any news on a recent test to confirm the design ?
>
> Thanks
>
> 2018-03-20 6:49 GMT+01:00 Ted Dunning :
>
> > Aman,
> >
> > That is exactly the clarification that I needed. I had a hazy memory of a
> > problem in this area, but not enough to actually figure out the current
> > state.
> >
> > In case anybody cares, being able to do this is really handy. The basic
> > idea is to keep long history in files and recent history in a DB. That
> > allows you to create files with data that is advantageously sorted in
> order
> > to get excellent compression. You can get nearly atomic switch-over to
> > newly created files with lazy deletion of database entries by using a
> > reference to a cutoff date in a database row. The file side would only
> look
> > for data before the cutoff and the DB would only look for data after the
> > cut. By positioning new files (created by CTAS on an about to be obsolete
> > part of the DB) before changing the cutoff date, we get apparent
> atomicity.
> >
> > After the switch, and after a reasonable delay beyond that (to let all
> > pending queries finish), the DB can be trimmed.
> >
> > Without a working pushdown through unions, this is all kind of pointless.
> > If that is working now, it would be fabulous.
> >
> > An example of how big a win this can be, consider a use case where we
> want
> > to keep all old states of customer preferences and context (say for a
> > mobile phone). Almost all of the hundreds of settings for an individual
> > would be unchanged even if a few do change. That means that if you could
> > arrange a day (or more) of data by user id, the columnar compression of
> > parquet would crush the data size. This only works, however, if you can
> > collect a fair number of rows for each user. Thus the idea of a hybrid
> > setup.
> >
> >
> >
> > On Mon, Mar 19, 2018 at 11:57 PM, Aman Sinha 
> wrote:
> >
> > > Due to an infinite loop occurring in Calcite planning, we had to
> disable
> > > the filter pushdown past the union (SetOps).  See
> > > https://issues.apache.org/jira/browse/DRILL-3855.
> > > Now that we have rebased on Calcite 1.15.0, we should re-enable this
> and
> > > test and if the pushdown works then the partition pruning on both sides
> > of
> > > the union should automatically work after that.
> > >
> > > Will follow-up on this..
> > >
> > > -Aman
> > >
> > > On Mon, Mar 19, 2018 at 3:02 PM, Kunal Khatua 
> > > wrote:
> > >
> > > > I think Ted's question is 2 fold, with the former being more
> important.
> > > > 1. Can we push filters past a union.
> > > > 2. Will Drill push filters down to the source.
> > > >
> > > > For the latter, it depends on the source.
> > > > For the former, it depends primarily on whether Calcite supports
> this.
> > I
> > > > haven't tried it, so I can't say.
> > > >
> > > > On 3/19/2018 2:22:54 PM, rahul challapalli <
> challapallira...@gmail.com
> > >
> > > > wrote:
> > > > First I would suggest to ignore the view and try out a query which
> has
> > > the
> > > > required filters as part of the subqueries on both sides of the union
> > > (for
> > > > both the database and partitioned parquet data). The plan for such a
> > > query
> > > > should have the answers to your question. If both the subqueries
> > > > independently prune out un-necessary data, using partitions or
> > indexes, I
> > > > don't think adding a union between them would alter that behavior.
> > > >
> > > > -Rahul
> > > >
> > > > On Mon, Mar 19, 2018 at 1:44 PM, Ted Dunning wrote:
> > > >
> > > > > IF I create a view that is a union of partitioned parquet files
> and a
> > > > > database that has secondary indexes, will Drill be able to properly
> > > push
> > > > > down query limits into both parts of the union?
> > > > >
> > > > > In particular, if I have lots of archival data and parquet
> > partitioned
> > > by
> > > > > time but my query only asks for recent data that is in the
> database,
> > > will
> > > > > the query avoid the parquet files entirely (as you would wish)?
> > > > >
> > > > > Conversely, if the data I am asking for is entirely in the archive,
> > > will
> > > > > the query make use of the partitioning on my parquet files
> correctly?
> > > > >
> > > >
> > >
> >
>


Re: Source for drill-calcite

2018-03-28 Thread Vitalii Diravka
Hi Rahul,

The update of Calcite to version 1.16.0 is under review.
You can test it if you wish, but it requires some changes in the Drill code.
Please find the PR here:
https://github.com/mapr/incubator-calcite/pull/18

Kind regards
Vitalii

On Wed, Mar 28, 2018 at 4:28 PM, Kunal Khatua  wrote:

> Yes, that is correct.
> On 3/28/2018 3:45:13 AM, Rahul Raj 
> wrote:
> Is Drill fork of Calcite maintained at
> https://github.com/mapr/incubator-calcite/?
>
> I assume that the required calcite branch for Drill 13.0 is
> DrillCalcite1.15.0. I would like to test a newer patch from calcite on
> Drill 13.0.
>
> Regards,
> Rahul
>
> --
>  This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.
>


Re: [Drill 1.13.0] : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'

2018-03-24 Thread Vitalii Diravka
Hi Anup,

The API of Hive 2.3 has changed, but is not fully documented yet.
So the difference should be found and resolved in Drill.

Please go ahead and create a Jira [1] with a description of your environment,
settings, and the CTAS and query that don't work.

Thanks

[1] https://issues.apache.org/jira/projects/DRILL/

Kind regards
Vitalii

On Sat, Mar 24, 2018 at 12:50 PM, Anup Tiwari <anup.tiw...@games24x7.com>
wrote:

> I have not upgraded the hive version but installed hive 2.3.2 on a server
> and tried to read data, and it's working. Can we have any workaround to run
> drill 1.13 with hive 2.1, or is upgrading the only option?
>
>
>
>
>
> On Sat, Mar 24, 2018 3:52 PM, Anup Tiwari anup.tiw...@games24x7.com
> wrote:
> Sorry for the delayed response, as I didn't get time to test this.
> @Vitalii, I have tried setting hive.metastore.client.capability.check=false
> in
> both ways which are :-
>  1.  "hive.metastore.uris":
> "thrift://prod-hadoop-107.bom-prod.aws.games24x7.com:9083?
> hive.metastore.client.capability.check=false",
> in drill hive plugin and restarted metastore and tried to access hive
> tables
> via drill.
>
>  2. Added capability property in hive-site.xml and restarted metastore and
> tried
> to access hive tables via drill.
>
> Both ways didn't work. So does that mean Drill 1.13.0 version is
> compatible with
> Hive 2.3 and above?
>
>
>
>
>
> On Tue, Mar 20, 2018 6:28 PM, Vitalii Diravka vitalii.dira...@gmail.com
> wrote:
> @Anup, it should. If it isn't backward compatible, it is a Hive issue.
>
> The Hive Thrift Metastore API was changed, but still isn't documented on
> cwiki.apache.org [1].
> The *hive.metastore.client.capability.check* [2] property is true by
> default. Possibly changing this could help you.
> You can change it in the Drill Hive plugin or in hive-site.xml.
> It looks like the issue will be the same when using hive-server2 2.3 with
> hive-metastore 2.1. If so, it is a Hive issue.
>
> So you can try to change the property before updating to Hive 2.3.
>
> [1] https://issues.apache.org/jira/browse/HIVE-15062
> [2]
> https://issues.apache.org/jira/browse/HIVE-15062?focusedCommentId=15659298&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15659298
>
> Kind regards
> Vitalii
>
> On Tue, Mar 20, 2018 at 1:54 PM, Anup Tiwari <anup.tiw...@games24x7.com>
> wrote:
> > Please find below information :-
> > Apache Hadoop 2.7.3, Apache Hive 2.1.1
> > @Vitalii, For testing I can set up an upgraded hive, but upgrading hive
> > will take time on our production server. Don't you think it should be
> > backward compatible?

Re: [Drill 1.13.0] : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'

2018-03-20 Thread Vitalii Diravka
@Anup, it should. If it isn't backward compatible, it is a Hive issue.

The Hive Thrift Metastore API was changed, but still isn't documented on
cwiki.apache.org [1].
The *hive.metastore.client.capability.check* [2] property is true by default.
Possibly changing this could help you.
You can change it in the Drill Hive plugin or in hive-site.xml, as sketched
below.
It looks like the issue will be the same when using hive-server2 2.3 with
hive-metastore 2.1. If so, it is a Hive issue.

So you can try to change the property before updating to Hive 2.3.

[1] https://issues.apache.org/jira/browse/HIVE-15062
[2]
https://issues.apache.org/jira/browse/HIVE-15062?focusedCommentId=15659298&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15659298
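
A minimal sketch of both options (the metastore host is a placeholder):

In the Hive storage plugin JSON:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://metastore-host:9083",
    "hive.metastore.client.capability.check": "false"
  }
}

Or in hive-site.xml:

<property>
  <name>hive.metastore.client.capability.check</name>
  <value>false</value>
</property>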

Kind regards
Vitalii

On Tue, Mar 20, 2018 at 1:54 PM, Anup Tiwari <anup.tiw...@games24x7.com>
wrote:

> Please find below information :-
> Apache Hadoop 2.7.3, Apache Hive 2.1.1
> @Vitalii, For testing I can set up an upgraded hive, but upgrading hive
> will take time on our production server. Don't you think it should be
> backward compatible?
>
> On Tue, Mar 20, 2018 4:33 PM, Vitalii Diravka vitalii.dira...@gmail.com
> wrote:
> Anup,
>
> The "get_table_req" method is present in the ThriftHiveMetastore header of
> Apache Hive 2.3.
> I believe a Hive upgrade will help you. Probably it is a Hive backward
> compatibility issue.
> Please let us know whether the upgrade helps.
>
> Kind regards
> Vitalii
>
> On Tue, Mar 20, 2018 at 12:56 PM, Vitalii Diravka <
> vitalii.dira...@gmail.com> wrote:
> > Hi Anup,
> >
> > Maybe we missed something after updating to the Hive 2.3 client version.
> > Could you provide the following info:
> > * What are your hive-server and metastore versions? If they are not 2.3,
> > could you update to it?
> > * What is your Hadoop distribution?
> >
> > Kind regards
> > Vitalii
>
> Regards,
> Anup Tiwari


Re: [Drill 1.13.0] : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'

2018-03-20 Thread Vitalii Diravka
Anup,

The "get_table_req" method is present in the ThriftHiveMetastore header of
Apache Hive 2.3.
I believe a Hive upgrade will help you. Probably it is a Hive backward
compatibility issue.
Please let us know whether the upgrade helps.

Kind regards
Vitalii

On Tue, Mar 20, 2018 at 12:56 PM, Vitalii Diravka <vitalii.dira...@gmail.com
> wrote:

> Hi Anup,
>
> Maybe we missed something after updating onto hive-2.3 client versions.
> Could you provide the following info:
> * What is your hive-server and metastore versions? If it is not 2.3
> version could you update onto this?
> * What is your hadoop distribution?
>
> Kind regards
> Vitalii
>
> On Tue, Mar 20, 2018 at 12:31 PM, Abhishek Girish <agir...@apache.org>
> wrote:
>
>> Okay, that confirms that the Hive storage plugin is not configured
>> correctly - you are unable to access any Hive table. What's your Hive
>> server version?
>>
>> On Tue, Mar 20, 2018 at 3:39 PM, Anup Tiwari <anup.tiw...@games24x7.com>
>> wrote:
>>
>> > Hi,
>> > Please find my reply :-
>> > Can you do a 'use hive;` followed by 'show tables;' and see if table
>> > 'cad' is listed? : Did and got empty set(No rows selected).
>> >
>> > If you try via hive shell, do you see it? : Yes
>> >
>> > can you check if this is impacting accessing all hive tables (may be
>> > create a new one and try) or if this is specific to a certain table /
>> > database in Hive? : Tried 2 tables but getting same error. I have not
>> tried
>> > creating anew one, will try that and let you know.
>> >
>> >
>> >
>> >
>> > On Tue, Mar 20, 2018 3:19 PM, Abhishek Girish agir...@apache.org
>> wrote:
>> > Down in the stack trace it's complaining that the table name 'cad' was
>> not
>> >
>> > found; Can you do a 'use hive;` followed by 'show tables;' and see if
>> table
>> >
>> > 'cad' is listed?
>> >
>> >
>> >
>> >
>> > If you try via hive shell, do you see it?
>> >
>> >
>> >
>> >
>> > Also, can you check if this is impacting accessing all hive tables (may
>> be
>> >
>> > create a new one and try) or if this is specific to a certain table /
>> >
>> > database in Hive?
>> >
>> >
>> >
>> >
>> > -Abhishek
>> >
>> >
>>
>
>


Re: [Drill 1.13.0] : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'

2018-03-20 Thread Vitalii Diravka
Hi Anup,

Maybe we missed something after updating to the Hive 2.3 client version.
Could you provide the following info:
* What are your hive-server and metastore versions? If they are not 2.3,
could you update to it?
* What is your Hadoop distribution?

Kind regards
Vitalii

On Tue, Mar 20, 2018 at 12:31 PM, Abhishek Girish 
wrote:

> Okay, that confirms that the Hive storage plugin is not configured
> correctly - you are unable to access any Hive table. What's your Hive
> server version?
>
> On Tue, Mar 20, 2018 at 3:39 PM, Anup Tiwari 
> wrote:
>
> > Hi,
> > Please find my reply :-
> > Can you do a 'use hive;` followed by 'show tables;' and see if table
> > 'cad' is listed? : Did and got empty set(No rows selected).
> >
> > If you try via hive shell, do you see it? : Yes
> >
> > can you check if this is impacting accessing all hive tables (may be
> > create a new one and try) or if this is specific to a certain table /
> > database in Hive? : Tried 2 tables but getting same error. I have not
> tried
> > creating anew one, will try that and let you know.
> >
> >
> >
> >
> > On Tue, Mar 20, 2018 3:19 PM, Abhishek Girish agir...@apache.org  wrote:
> > Down in the stack trace it's complaining that the table name 'cad' was
> not
> >
> > found; Can you do a 'use hive;` followed by 'show tables;' and see if
> table
> >
> > 'cad' is listed?
> >
> >
> >
> >
> > If you try via hive shell, do you see it?
> >
> >
> >
> >
> > Also, can you check if this is impacting accessing all hive tables (may
> be
> >
> > create a new one and try) or if this is specific to a certain table /
> >
> > database in Hive?
> >
> >
> >
> >
> > -Abhishek
> >
> >
>


Re: Code too large

2018-02-21 Thread Vitalii Diravka
Hi all!

Looks like the above issue is caused by too many expressions in the
generated code. I believe it can be resolved by reducing the value of the
"exec.java.compiler.exp_in_method_size" option at the session or system
level [1]. If that does not work, it means this logic is not implemented
for the operators used in your query, and it would be good to file a Jira
to implement it.

You can also try the Janino compiler: "exec.java_compiler" = "JANINO" [1].

[1]
https://drill.apache.org/docs/configuration-options-introduction/#system-options
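
For example (a sketch; 30 is the value the reply quoted below reports as
working):

ALTER SESSION SET `exec.java.compiler.exp_in_method_size` = 30;
-- or switch the runtime compiler:
ALTER SESSION SET `exec.java_compiler` = 'JANINO';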


Kind regards
Vitalii

On Wed, Feb 21, 2018 at 1:21 AM, Khurram Faraaz  wrote:

> Hello Anup,
>
>
> I could not repro the issue with my data, can you please share the data
> that you used so I can try it again with your data.
>
>
> Thanks,
>
> Khurram
>
> 
> From: Anup Tiwari 
> Sent: Monday, February 19, 2018 3:09:22 AM
> To: user@drill.apache.org
> Subject: Re: Code too large
>
> Hi Khurram/Arjun,
> Anyone got time to look into it?
>
>
>
>
>
> On Fri, Feb 16, 2018 4:53 PM, Anup Tiwari anup.tiw...@games24x7.com
> wrote:
> Hi Arjun,
> After posting this reply, I found the same answer on the net and set that
> parameter to 30, and then the query worked, but it took a bit more time
> than expected.
> Also, don't you think these types of things should be adjusted
> automatically?
> @Khurram, please find below the query and logs (since the log in
> drillbit.log is huge for this query, I have divided the logs into 3 parts
> in the order I got them for the query - error + some drill code (which was
> too large) + error). FYI: hive.cs_all is a hive (2.1.1) parquet table.
> Query :-
> create table dfs.tmp.cs_all_test AS select log_date,ssid ,select
> log_date,ssid ,
> count((case when ((id like 'cta-action-%' and event = 'click' and sit =
> 'pnow'
> and ptype = '1' and stype = '1') OR (id like '1:100%' and event =
> 'pnowclick'
> and STRPOS(url,'mrc/player/sit.html') > 0) OR (id like
> '/fmg/110%/pn/pnow.html'
> or (id like '110%/fmgopt/pnow'))) then ssid end)) as pnow_prac_c ,
> count((case
> when ((id like 'btsit%' and event = 'click' and sit like '%TSit%' and
> ptype1 =
> '1' and stype1 = '1') OR (event = 'ts.click' and id like '1:100%') OR (id
> like
> '/mgems/over/110%/ts.html')) then ssid end)) as ts_prac_c , count((case
> when
> ((id = '/res/vinit/' and mptype = '1' and (mgtype = 'cfp' OR mftype =
> '100')) OR
> (id like '/dl%/fmg/110%/pn/ftpracga/vinit.html' or id like
> '/dl%/fmg/110%/pn/vinit.html')) then ssid end)) as vinit_prac_c ,
> count((case
> when (id = '/res/tiu/' and mptype = '1' and (mgtype = 'cfp' OR mftype =
> '100'))
> then ssid end)) as tiu_prac_c , count((case when (id =
> '/onstClick/btnStHr/' and
> event='click' and mptype = '1' and (mgtype = 'cfp' OR mftype = '100'))
> then ssid
> end)) as StHr_prac_c , count((case when ((id = '/res/dcd/' and mptype =
> '1' and
> (mgtype = 'cfp' OR mftype = '100')) OR (id like
> '/dl%/fmg/110%/pn/ftpracga/dcd.html' or id like
> '/dl%/fmg/110%/pn/dcd.html'))
> then ssid end)) as dcd_prac_c , count((case when ((id = '/confirmdr/btnY/'
> and
> event in ('click','Click') and mptype = '1' and (mgtype = 'cfp' OR mftype =
> '100')) OR (id like '/dl%/fmg/110%/pn/dr.html')) then ssid end)) as
> dr_prac_c ,
> count((case when ((id = '/res/finish/' and mptype = '1' and (mgtype =
> 'cfp' OR
> mftype = '100')) OR (id like '/dl%/fmg/110%/pn/ftpracga/finish.html' or
> id like
> '/dl%/fmg/110%/pn/finish.html')) then ssid end)) as finish_prac_c ,
> count((case
> when ((id like 'cta-action-%' and event = 'click' and sit = 'pnow' and
> ptype =
> '2' and stype = '1') OR (id like '2:100%' and event = 'pnowclick' and
> STRPOS(url,'mrc/player/sit.html') > 0) OR (id like
> '/fmg/210%/pn/pnow.html' or
> (id like '210%/fmgopt/pnow'))) then ssid end)) as pnow_cash_c ,
> count((case when
> (id like '2:100%' and event = 'pnowclick' and STRPOS(url,'mrc/player/sit.
> html')
> = 0) then ssid end)) as pnow_cash_c_pac , count((case when ((id like
> 'btsit%'
> and event = 'click' and sit like '%TSit%' and ptype1 = '2' and stype1 =
> '1') OR
> (event = 'ts.click' and id like '2:100%') OR (id like
> '/mgems/over/210%/ts.html')) then ssid end)) as ts_cash_c , count((case
> when
> ((id = '/res/vinit/' and mptype = '2' and (mgtype = 'cfp' OR mftype =
> '100')) OR
> (id like '/dl%/fmg/210%/pn/ftpracga/vinit.html' or id like
> '/dl%/fmg/210%/pn/vinit.html')) then ssid end)) as vinit_cash_c ,
> count((case
> when (id = '/res/tiu/' and mptype = '2' and (mgtype = 'cfp' OR mftype =
> '100'))
> then ssid end)) as tiu_cash_c , count((case when (id =
> '/onstClick/btnStHr/' and
> event='click' and mptype = '2' and (mgtype = 'cfp' OR mftype = '100'))
> then ssid
> end)) as StHr_cash_c , count((case when ((id = '/res/dcd/' and mptype =
> '2' and
> (mgtype = 'cfp' OR mftype = '100')) OR (id like
> '/dl%/fmg/210%/pn/ftpracga/dcd.html' or id like
> '/dl%/fmg/210%/pn/dcd.html'))
> then ssid end)) as 

Re: Issue with time zone

2017-12-14 Thread Vitalii Diravka
Hi Kostyantyn,

* Without ZooKeeper you can just run the drillbit in embedded mode:
*bin/drill-embedded*
* Also, the "*timeofday()*" Drill function can help you identify the
timezone used by your drillbit:

"start your sql engine"
0: jdbc:drill:zk=local> SELECT TIMEOFDAY() FROM (VALUES(1));
+--+
|EXPR$0|
+--+
| 2017-12-14 12:42:44.508 Europe/Kiev  |
+--+
1 row selected (0.332 seconds)

Kind regards
Vitalii

On Thu, Dec 14, 2017 at 1:36 PM, Vova Vysotskyi  wrote:

> Hi Kostyantyn,
>
> I just checked this issue:
> 1) With timezone America/New_York query fails as it was described:
> 0: jdbc:drill:zk=local> select to_timestamp('2015-03-08
> 02:58:51','-MM-dd HH:mm:ss') from sys.version;
> Error: SYSTEM ERROR: IllegalInstantException: Cannot parse "2015-03-08
> 02:58:51": Illegal instant due to time zone offset transition
> (America/New_York)
>
> 2) When I set the timezone in drill-env.sh using
> export DRILL_JAVA_OPTS="-Duser.timezone=UTC"
> timezone is set correctly, and the query returned the correct result:
> 0: jdbc:drill:zk=local> select to_timestamp('2015-03-08
> 02:58:51','-MM-dd HH:mm:ss') from sys.version;
> ++
> | EXPR$0 |
> ++
> | 2015-03-08 02:58:51.0  |
> ++
> 1 row selected (1.697 seconds)
>
> Perhaps, you have used the wrong character instead of the double quote at
> the end of the export string.
> Please confirm if this helped to avoid this issue.
>
> When I set the timezone in drill-env.sh, it was seen with ps.
>
> Option user.timezone was deleted from drill options, so you could not see
> it when running select * from sys.options where name like '%timezone%'
>
> Also, drillbit could not be started on a local computer without zookeeper.
>
>
>
> 2017-12-13 18:53 GMT+02:00 Kostyantyn Krakovych :
>
> > Hi Team,
> >
> > I faced with the issue described in http://www.openkb.info/2015/
> > 05/understanding-drills-timestamp-and.html  > 05/understanding-drills-timestamp-and.html>
> >
> > Drill 1.11
> >
> > I run sqlline -u jdbc:drill:zk=local on local computer.
> > Meantime I do not see user.timestamp option neither in sys.options nor in
> > sys.boot.
> > And the issue is not resolved when I set the parameter in drill-env.sh as
> > export DRILL_JAVA_OPTS="-Duser.timezone=UTC”
> > I do not see the option with ps -ef.
> >
> > N.B. I do not start drillbit. Though for the tool I confirm I see
> > -Duser.timestamp=UTC with ps -ef | grep “user.timestamp” IF I start it,
> so
> > it fails with other reason on local computer (Failure to connect to the
> > zookeeper cluster service within the allotted time of 1
> milliseconds.).
> >
> > Could you please advice on the issue.
> >
> >
> > Best regards,
> > Kostyantyn
>
>
>
>
> --
> Kind regards,
> Volodymyr Vysotskyi
>


Re: Error reading int96 fields

2017-12-07 Thread Vitalii Diravka
Thank you Rahul.

Kind regards
Vitalii

On Thu, Dec 7, 2017 at 7:35 AM, Rahul Raj <rahul@option3consulting.com>
wrote:

> I have created https://issues.apache.org/jira/browse/DRILL-6016.
>
> You can find the sample dataset at
> https://github.com/rajrahul/files/blob/master/result.tar.gz
>
> Regards,
> Rahul
>
> On Tue, Dec 5, 2017 at 5:57 PM, Vitalii Diravka <vitalii.dira...@gmail.com
> >
> wrote:
>
> > Hi Rahul,
> >
> > It looks like a bug.
> > Could you please open the jira ticket and provide the query and dataset
> to
> > reproduce the issue?
> >
> > Thanks
> >
> > Kind regards
> > Vitalii
> >
> > On Tue, Dec 5, 2017 at 2:01 PM, Rahul Raj <rahul.raj@option3consulting.
> com
> > >
> > wrote:
> >
> > > I am getting the error - SYSTEM ERROR : ClassCastException:
> > > org.apache.drill.exec.vector.TimeStampVector cannot be cast to
> > > org.apache.drill.exec.vector.VariableWidthVector while trying to read
> a
> > > spark INT96 datetime field on Drill 1.11 in spite of setting the
> property
> > > store.parquet.reader.int96_as_timestamp to  true.
> > >
> > > I believe this was fixed in drill 1.10(
> > > https://issues.apache.org/jira/browse/DRILL-4373). What could be
> wrong.
> > >
> > > Regards,
> > > Rahul
> > >
> > > --
> > >  This email and any files transmitted with it are confidential and
> > > intended solely for the use of the individual or entity to whom it is
> > > addressed. If you are not the named addressee then you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately and delete this e-mail from your system.
> > >
> >
>
> --
>  This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.
>


Re: Error reading int96 fields

2017-12-05 Thread Vitalii Diravka
Hi Rahul,

It looks like a bug.
Could you please open a Jira ticket and provide the query and dataset to
reproduce the issue?

Thanks

Kind regards
Vitalii

On Tue, Dec 5, 2017 at 2:01 PM, Rahul Raj 
wrote:

> I am getting the error - SYSTEM ERROR : ClassCastException:
> org.apache.drill.exec.vector.TimeStampVector cannot be cast to
> org.apache.drill.exec.vector.VariableWidthVector while trying to read a
> spark INT96 datetime field on Drill 1.11 in spite of setting the property
> store.parquet.reader.int96_as_timestamp to  true.
>
> I believe this was fixed in drill 1.10(
> https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong.
>
> Regards,
> Rahul
>
> --
>  This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.
>


Re: Reading Drill generated Timestamp from spark

2017-04-03 Thread Vitalii Diravka
Hi Rahul,

According to the Parquet specification, the primitive data type for the
TIMESTAMP logical type is INT64 (INT96 is deprecated). That's why Drill
has no mechanism to generate such values.
But it seems that Spark is going to support INT64 TIMESTAMP values -
SPARK-10364

Kind regards
Vitalii

On Sat, Apr 1, 2017 at 2:56 PM, Rahul Raj 
wrote:

> I'm unable to read drill generated timestamp column inside a spark program.
> Drill 1.10 has support for reading int96 as timestamp. Is it possible to
> generate the same from drill?
>
> Is there any mechanism to read drills int64 from spark?
>
> Rahul
>
> --
>  This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.
>


Re: Looking for workaround to Schema detection problems

2016-07-08 Thread Vitalii Diravka
Hi Alexander,

Please try turning on the union type:

ALTER SESSION SET `exec.enable_union_type` = true;

Kind regards
Vitalii

2016-07-08 10:50 GMT+00:00 Holy Alexander :

> My JSON data looks - simplified - like this
>
> {"ID":1,"a":"some text"}
> {"ID":2,"a":"some text","b":"some other text"}
> {"ID":3,"a":"some text"}
>
> Column b is only physically serialized when it is not null.
> It is the equivalent of a NULLable VARCHAR() column in SQL.
>
> I run queries like these:
>
> SELECT b
> FROM dfs.`D:\MyData\test.json`
> WHERE b IS NOT NULL
>
> And normally all is fine.
> However, among my thousands of data files, I have two files where the
> first occurrence of b happens a few thousand records down the file.
> These two data files would look like this:
>
> {"ID":1,"a":"some text"}
> {"ID":2,"a":"some text"}
> ... 5000 more records without column b ...
> {"ID":5002,"a":"some text","b":"some other text"}
> {"ID":5003,"a":"some text"}
>
> In this case, my simple SQL query above fails:
>
> [30027]Query execution error. Details:[
> DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar type
> when you are using a ValueWriter of type NullableIntWriterImpl.
> File  /D:/MyData/test.json
> Record 5002 Fragment ...
>
> It seems that the Schema inference mechanism of Drill only samples a
> certain amount of bytes (or records) to determine the schema.
> If the first occurrence of a schema detail happens to far down things go
> boom.
>
> I am now looking for a sane way to work around this.
> Preferred by extending the query and not by altering my massive amounts of
> data.
>
> BTW, I tried altering the data by chaning the first line:
> {"ID":1,"a":"some text","b":null}
> does not help.
>
> Of course, changing the first line to
> {"ID":1,"a":"some text","b":""}
> solves the problem, but this is not a practical solution.
>
> Any help appreciated.
> Alexander
>


Implement "DROP TABLE IIF EXISTS" statement

2016-06-29 Thread Vitalii Diravka
Hi all!

I'm going to implement "DROP TABLE IIF EXISTS" and "DROP VIEW IIF EXISTS"
statements in Drill (DRILL-4673).
The reason for using "IIF" is the inability to add the "IF" keyword to the
non-reserved words list (due to the SQL:2011 standard which the Calcite
parser uses). Adding "IF" to the reserved words list breaks the Hive "IF"
UDF.
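
A sketch of the proposed syntax (the table and view names are placeholders):

DROP TABLE IIF EXISTS dfs.tmp.my_table;
DROP VIEW IIF EXISTS dfs.tmp.my_view;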

I'm interested whether there are any concerns with using "IIF"?

Kind regards
Vitalii