Re: [ANNOUNCE] New Drill Committer Maksym Rymar

2022-10-08 Thread Abhishek Girish
Congratulations, Maksym!

On Fri, Oct 7, 2022 at 10:22 AM James Turton  wrote:

> The Project Management Committee (PMC) for Apache Drill is pleased to
> announce that we have invited Maksym Rymar to join us as a committer of
> the Drill project and he has accepted. Please join me in congratulating
> Maksym and welcoming him to Drill committers!
>
> James Turton
> Drill PMC
>


Re: [ANNOUNCE] Apache Drill 1.19.0 Released

2021-06-15 Thread Abhishek Girish
Congratulations all!

Thanks to everyone who contributed. Laurent, thanks for successfully
managing the release.

On Mon, Jun 14, 2021 at 5:58 PM Ted Dunning  wrote:

> Congratulations to Laurent as a first time release manager!
>
> Well done.
>
>
>
> On Mon, Jun 14, 2021 at 5:56 PM Laurent Goujon  wrote:
>
> > On behalf of the Apache Drill community, I am happy to announce the
> release
> > of Apache Drill 1.19.0.
> >
> > Drill is an Apache open-source SQL query engine for Big Data exploration.
> > Drill is designed from the ground up to support high-performance analysis
> > on the semi-structured and rapidly evolving data coming from modern Big
> > Data applications, while still providing the familiarity and ecosystem of
> > ANSI SQL, the industry-standard query language. Drill provides
> > plug-and-play integration with existing Apache Hive and Apache HBase
> > deployments.
> >
> > For information about Apache Drill, and to get involved, visit the
> project
> > website [1].
> >
> > A total of 115 JIRAs were resolved in this release of Drill, with the
> > following new features and improvements [2]:
> >
> >  - Cassandra Storage Plugin (DRILL-92)
> >  - Elasticsearch Storage Plugin (DRILL-3637)
> >  - XML Storage Plugin (DRILL-7823)
> >  - Splunk Storage Plugin (DRILL-7751)
> >  - Avro with schema registry support for Kafka (DRILL-5940)
> >  - Secure mechanism for specifying storage plugin credentials
> (DRILL-7855)
> >  - Linux ARM64 based system support (DRILL-7921)
> >  - Rowset based JSON reader (DRILL-6953)
> >  - Use streaming for REST JSON queries (DRILL-7733)
> >  - Several plugins have been converted to the Enhanced Vector Framework
> > (EVF)
> >- Convert SequenceFiles to EVF (DRILL-7525)
> >- Convert SysLog to EVF (DRILL-7532)
> >- Convert Pcapng to EVF (DRILL-7533)
> >- Convert HTTPD format plugin to EVF (DRILL-7534)
> >- Convert Image Format to EVF (DRILL-7533)
> >
> > For the full list please see release notes [3].
> >
> > The binary and source artifacts are available here [4].
> >
> > Thanks to everyone in the community who contributed to this release!
> >
> > 1. https://drill.apache.org/
> > 2. https://drill.apache.org/blog/2021/06/10/drill-1.19-released/
> > 3. https://drill.apache.org/docs/apache-drill-1-19-0-release-notes/
> > 4. https://drill.apache.org/download/
> >
>


Re: Videos on the front page unavailable

2021-02-04 Thread Abhishek Girish
I have good news on this front. We've located the videos and are working on
bringing them back to the site. Thanks for your patience.

On Thu, Jan 28, 2021 at 2:29 AM luoc  wrote:

> Hello,
> Due to the M&A of MapR, these videos may have been made private, and we are
> contacting MapR for good news.
> We have a "Learning Apache Drill" book whose authors are all our PMC
> members; we hope you like it.
> Also welcome to join our Slack Channel: https://bit.ly/3t4rozO
>
> > On Jan 28, 2021, at 3:52 PM, Damjan Dvorsek  wrote:
> >
> > Hi,
> > I'm trying to learn more about Drill and I tried to check the videos on
> > Drill's front page, but I get this message:
> > Video unavailable
> > This video is private.
> >
> > Cheers.
> > Damjan
>
>


Re: Release Notes for 1.18.0

2020-09-22 Thread Abhishek Girish
Hey Vineeth!

The Apache Drill 1.18.0 release is done except for the release notes. I plan
to get those done shortly. Sorry to those who've been waiting.

Regards,
Abhishek

On Fri, Sep 18, 2020 at 10:28 PM Vineeth Narayanan 
wrote:

> Hello Drill Team,
>
> Has version 1.18.0 been "officially" released yet? I see that RC0 passed
> voting and has been released on GitHub, but the Release Notes page hasn't
> been updated to include the latest version yet. Would it be okay to
> consider the GitHub Release as an official announcement?
>
> Thanks,
> Vineeth
>


[RESULT] [VOTE] Release Apache Drill 1.18.0 - RC0

2020-09-03 Thread Abhishek Girish
Hey all,

The vote passes! The very first release candidate for 1.18.0, RC0, passed
the voting criteria and is ready to be released. Thanks again, to everyone
who voted.

Total Votes: 5
4x +1 (binding): Vova, Charles, Paul, Boaz
1x +1 (non-binding): Abhishek
No 0s or -1s.

I'll start with the next steps of the release process shortly, including
publishing of the artifacts. I'll send out an announcement once it's all
completed.

Regards,
Abhishek


Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-09-02 Thread Abhishek Girish
Thanks to everyone who voted. The vote is now closed. I'll share the results
in another email.

On Sun, Aug 30, 2020 at 10:14 AM Abhishek Girish  wrote:

> Hi all,
>
> I'd like to propose the first release candidate (RC0) of Apache Drill,
> version 1.18.0.
>
> The release candidate covers a total of 164 resolved JIRAs [1]. Thanks to
> everyone who contributed to this release.
>
> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
> at [3].
>
> This release candidate is based on commit
> 91678ca6a48509b11530f1ce6c3d75fc9f4eadc0 located at [4].
>
> The vote ends at 17:00 UTC (10:00 AM PDT, 7:00 PM EET, 10:30 PM IST),
> Sep 2, 2020.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> Here's my vote: +1 (non-binding). Please note that while all votes matter,
> only PMC votes are binding.
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345459&styleName=Html&projectId=12313820
> [2] http://home.apache.org/~agirish/drill/releases/1.18.0/rc0/
> [3]
> https://repository.apache.org/content/repositories/orgapachedrill-1080/
> [4] https://github.com/agirish/drill/commits/drill-1.18.0
>
> Regards,
> Abhishek
>


Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-09-01 Thread Abhishek Girish
Thanks Charles! Hope you are recovering well. Please take care.

On Tue, Sep 1, 2020 at 5:28 PM Charles Givre  wrote:

> Hey Abhishek,
> I’ll take a look tomorrow.
> Best,
> — C
>
> > On Sep 1, 2020, at 8:27 PM, Abhishek Girish  wrote:
> >
> > Thanks Vova!
> >
> > Hey folks, we need more votes to validate the release. Please give RC0 a
> > try.
> >
> > Special request to PMCs - please vote as we only have 1 binding vote at
> > this point. I am fine extending the voting window by a day or two if
> anyone
> > is or plans to work on it soon.
> >
> > On Tue, Sep 1, 2020 at 12:09 PM Volodymyr Vysotskyi <
> volody...@apache.org>
> > wrote:
> >
> >> Verified checksums and signatures for binary and source tarballs and for
> >> jars published to the maven repo.
> >> Run all unit tests on Ubuntu with JDK 8 using tar with sources.
> >> Run Drill in embedded mode on Ubuntu, submitted several queries,
> verified
> >> that profiles displayed correctly.
> >> Checked JDBC driver using SQuirreL SQL client and custom java client,
> >> ensured that it works correctly with the custom authenticator.
> >>
> >> +1 (binding)
> >>
> >> Kind regards,
> >> Volodymyr Vysotskyi
> >>
> >>
> >> On Mon, Aug 31, 2020 at 1:37 PM Volodymyr Vysotskyi <
> volody...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I have looked into the DRILL-7785, and the problem is not in Drill, so
> it
> >>> is not a blocker for the release.
> >>> For more details please refer to my comment
> >>> <
> >>
> https://issues.apache.org/jira/browse/DRILL-7785?focusedCommentId=17187629&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17187629
> >>>
> >>> on this ticket.
> >>>
> >>> Kind regards,
> >>> Volodymyr Vysotskyi
> >>>
> >>>
> >>> On Mon, Aug 31, 2020 at 4:26 AM Abhishek Girish 
> >>> wrote:
> >>>
> >>>> Yup we can certainly include it if RC0 fails. So far I’m inclined to
> not
> >>>> consider it a blocker. I’ve requested Vova and Anton to take a look.
> >>>>
> >>>> So folks, please continue to test the candidate.
> >>>>
> >>>> On Sun, Aug 30, 2020 at 6:16 PM Charles Givre 
> wrote:
> >>>>
> >>>>> Ok.  Are you looking to include DRILL-7785?  I don't think it's a
> >>>> blocker,
> >>>>> but if we find anything with RC0... let's make sure we get it in.
> >>>>>
> >>>>> -- C
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Aug 30, 2020, at 9:14 PM, Abhishek Girish 
> >>>> wrote:
> >>>>>
> >>>>>>
> >>>>>
> >>>>>> Hey Charles,
> >>>>>
> >>>>>>
> >>>>>
> >>>>>> I would have liked to. We did get one of the PRs merged after the
> >>>> master
> >>>>>
> >>>>>> branch was closed as I hadn't made enough progress with the release
> >>>> yet.
> >>>>>
> >>>>>> But that’s not the case now.
> >>>>>
> >>>>>>
> >>>>>
> >>>>>> Unless DRILL-7781 is a release blocker, we should probably skip it.
> >> So
> >>>>> far,
> >>>>>
> >>>>>> a lot of effort has gone into getting RC0 ready. So I'm hoping to
> >> get
> >>>>> this
> >>>>>
> >>>>>> closed asap.
> >>>>>
> >>>>>>
> >>>>>
> >>>>>> Regards,
> >>>>>
> >>>>>> Abhishek
> >>>>>
> >>>>>>
> >>>>>
> >>>>>> On Sun, Aug 30, 2020 at 6:07 PM Charles Givre 
> >>>> wrote:
> >>>>>
> >>>>>>
> >>>>>
> >>>>>>> HI Abhishek,
> >>>>>
> >>>>>>>
> >>>>>
> >>>>>>> Can we merge DRILL-7781?  We really shouldn't ship something with a
>

Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-09-01 Thread Abhishek Girish
Thanks Vova!

Hey folks, we need more votes to validate the release. Please give RC0 a
try.

A special request to PMC members - please vote, as we only have 1 binding
vote at this point. I am fine with extending the voting window by a day or
two if anyone is working on it or plans to soon.

On Tue, Sep 1, 2020 at 12:09 PM Volodymyr Vysotskyi 
wrote:

> Verified checksums and signatures for binary and source tarballs and for
> jars published to the maven repo.
> Ran all unit tests on Ubuntu with JDK 8 using the source tarball.
> Ran Drill in embedded mode on Ubuntu, submitted several queries, and verified
> that profiles displayed correctly.
> Checked the JDBC driver using the SQuirreL SQL client and a custom Java
> client, and ensured that it works correctly with the custom authenticator.
>
> +1 (binding)
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Mon, Aug 31, 2020 at 1:37 PM Volodymyr Vysotskyi 
> wrote:
>
> > Hi all,
> >
> > I have looked into the DRILL-7785, and the problem is not in Drill, so it
> > is not a blocker for the release.
> > For more details please refer to my comment
> > <
> https://issues.apache.org/jira/browse/DRILL-7785?focusedCommentId=17187629&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17187629
> >
> > on this ticket.
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Mon, Aug 31, 2020 at 4:26 AM Abhishek Girish 
> > wrote:
> >
> >> Yup we can certainly include it if RC0 fails. So far I’m inclined to not
> >> consider it a blocker. I’ve requested Vova and Anton to take a look.
> >>
> >> So folks, please continue to test the candidate.
> >>
> >> On Sun, Aug 30, 2020 at 6:16 PM Charles Givre  wrote:
> >>
> >> > Ok.  Are you looking to include DRILL-7785?  I don't think it's a
> >> blocker,
> >> > but if we find anything with RC0... let's make sure we get it in.
> >> >
> >> > -- C
> >> >
> >> >
> >> >
> >> > > On Aug 30, 2020, at 9:14 PM, Abhishek Girish 
> >> wrote:
> >> >
> >> > >
> >> >
> >> > > Hey Charles,
> >> >
> >> > >
> >> >
> >> > > I would have liked to. We did get one of the PRs merged after the
> >> master
> >> >
> >> > > branch was closed as I hadn't made enough progress with the release
> >> yet.
> >> >
> >> > > But that’s not the case now.
> >> >
> >> > >
> >> >
> >> > > Unless DRILL-7781 is a release blocker, we should probably skip it.
> So
> >> > far,
> >> >
> >> > > a lot of effort has gone into getting RC0 ready. So I'm hoping to
> get
> >> > this
> >> >
> >> > > closed asap.
> >> >
> >> > >
> >> >
> >> > > Regards,
> >> >
> >> > > Abhishek
> >> >
> >> > >
> >> >
> >> > > On Sun, Aug 30, 2020 at 6:07 PM Charles Givre 
> >> wrote:
> >> >
> >> > >
> >> >
> >> > >> HI Abhishek,
> >> >
> >> > >>
> >> >
> >> > >> Can we merge DRILL-7781?  We really shouldn't ship something with a
> >> > simple
> >> >
> >> > >> bug like this.
> >> >
> >> > >>
> >> >
> >> > >> -- C
> >> >
> >> > >>
> >> >
> >> > >>
> >> >
> >> > >>
> >> >
> >> > >>
> >> >
> >> > >>
> >> >
> >> > >>> On Aug 30, 2020, at 8:40 PM, Abhishek Girish 
> >> > wrote:
> >> >
> >> > >>
> >> >
> >> > >>>
> >> >
> >> > >>
> >> >
> >> > >>> Advanced tests from [5] are also complete. All 7500+ tests passed,
> >> > except
> >> >
> >> > >>
> >> >
> >> > >>> for a few relating to known resource issues (drillbit
> connectivity /
> >> > OOM
> >> >
> >> > >>
> >> >
> >> > >>> /...). Plus a few with the same symptoms as DRILL-7785.
> >> >
> >> > >>
> >> >
> >> > >>>
> >> >

Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-08-30 Thread Abhishek Girish
Yup we can certainly include it if RC0 fails. So far I’m inclined to not
consider it a blocker. I’ve requested Vova and Anton to take a look.

So folks, please continue to test the candidate.

On Sun, Aug 30, 2020 at 6:16 PM Charles Givre  wrote:

> Ok.  Are you looking to include DRILL-7785?  I don't think it's a blocker,
> but if we find anything with RC0... let's make sure we get it in.
>
> -- C
>
>
>
> > On Aug 30, 2020, at 9:14 PM, Abhishek Girish  wrote:
>
> >
>
> > Hey Charles,
>
> >
>
> > I would have liked to. We did get one of the PRs merged after the master
>
> > branch was closed as I hadn't made enough progress with the release yet.
>
> > But that’s not the case now.
>
> >
>
> > Unless DRILL-7781 is a release blocker, we should probably skip it. So
> far,
>
> > a lot of effort has gone into getting RC0 ready. So I'm hoping to get
> this
>
> > closed asap.
>
> >
>
> > Regards,
>
> > Abhishek
>
> >
>
> > On Sun, Aug 30, 2020 at 6:07 PM Charles Givre  wrote:
>
> >
>
> >> HI Abhishek,
>
> >>
>
> >> Can we merge DRILL-7781?  We really shouldn't ship something with a
> simple
>
> >> bug like this.
>
> >>
>
> >> -- C
>
> >>
>
> >>
>
> >>
>
> >>
>
> >>
>
> >>> On Aug 30, 2020, at 8:40 PM, Abhishek Girish 
> wrote:
>
> >>
>
> >>>
>
> >>
>
> >>> Advanced tests from [5] are also complete. All 7500+ tests passed,
> except
>
> >>
>
> >>> for a few relating to known resource issues (drillbit connectivity /
> OOM
>
> >>
>
> >>> /...). Plus a few with the same symptoms as DRILL-7785.
>
> >>
>
> >>>
>
> >>
>
> >>> On Sun, Aug 30, 2020 at 2:17 PM Abhishek Girish 
>
> >> wrote:
>
> >>
>
> >>>
>
> >>
>
> >>>> Wanted to share an update on some of the testing I've done from my
> side:
>
> >>
>
> >>>>
>
> >>
>
> >>>> All Functional tests from [5] (plus private Customer tests) are
>
> >> complete.
>
> >>
>
> >>>> 10,000+ tests have passed. However, I did see an issue with Hive ORC
>
> >> tables
>
> >>
>
> >>>> (DRILL-7785). Need to investigate if it's a blocker for the release.
>
> >>
>
> >>>>
>
> >>
>
> >>>> Of course, all unit tests (part of the AD repo) - for both default and
>
> >>
>
> >>>> 'mapr' profiles are also successful.
>
> >>
>
> >>>>
>
> >>
>
> >>>>
>
> >>
>
> >>>>
>
> >>
>
> >>>> [5] https://github.com/mapr/drill-test-framework
>
> >>
>
> >>>>
>
> >>
>
> >>>> On Sun, Aug 30, 2020 at 10:14 AM Abhishek Girish 
>
> >>
>
> >>>> wrote:
>
> >>
>
> >>>>
>
> >>
>
> >>>>> Hi all,
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> I'd like to propose the first release candidate (RC0) of Apache
> Drill,
>
> >>
>
> >>>>> version 1.18.0.
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> The release candidate covers a total of 164 resolved JIRAs [1].
> Thanks
>
> >> to
>
> >>
>
> >>>>> everyone who contributed to this release.
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> The tarball artifacts are hosted at [2] and the maven artifacts are
>
> >> hosted
>
> >>
>
> >>>>> at [3].
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> This release candidate is based on commit
>
> >>
>
> >>>>> 91678ca6a48509b11530f1ce6c3d75fc9f4eadc0 located at [4].
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> The vote ends at 17:00 UTC (10:00 AM PDT, 7:00 PM EET, 10:30 PM
>
> >> IST),
>
> >>
>
> >>>>> Sep 2, 2020.
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> [ ] +1
>
> >>
>
> >>>>> [ ] +0
>
> >>
>
> >>>>> [ ] -1
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> Here's my vote: +1 (non-binding). Please note that while all votes
>
> >>
>
> >>>>> matter, only PMC votes are binding.
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> [1]
>
> >>
>
> >>>>>
>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345459&styleName=Html&projectId=12313820
>
> >>
>
> >>>>> [2] http://home.apache.org/~agirish/drill/releases/1.18.0/rc0/
>
> >>
>
> >>>>> [3]
>
> >>
>
> >>>>>
>
> >> https://repository.apache.org/content/repositories/orgapachedrill-1080/
>
> >>
>
> >>>>> [4] https://github.com/agirish/drill/commits/drill-1.18.0
>
> >>
>
> >>>>>
>
> >>
>
> >>>>> Regards,
>
> >>
>
> >>>>> Abhishek
>
> >>
>
> >>>>>
>
> >>
>
> >>>>
>
> >>
>
> >>
>
> >>
>
> >>
>
>
>
>


Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-08-30 Thread Abhishek Girish
Hey Charles,

I would have liked to. We did get one of the PRs merged after the master
branch was closed as I hadn't made enough progress with the release yet.
But that’s not the case now.

Unless DRILL-7781 is a release blocker, we should probably skip it. So far,
a lot of effort has gone into getting RC0 ready. So I'm hoping to get this
closed asap.

Regards,
Abhishek

On Sun, Aug 30, 2020 at 6:07 PM Charles Givre  wrote:

> Hi Abhishek,
>
> Can we merge DRILL-7781?  We really shouldn't ship something with a simple
> bug like this.
>
> -- C
>
>
>
>
>
> > On Aug 30, 2020, at 8:40 PM, Abhishek Girish  wrote:
>
> >
>
> > Advanced tests from [5] are also complete. All 7500+ tests passed, except
>
> > for a few relating to known resource issues (drillbit connectivity / OOM
>
> > /...). Plus a few with the same symptoms as DRILL-7785.
>
> >
>
> > On Sun, Aug 30, 2020 at 2:17 PM Abhishek Girish 
> wrote:
>
> >
>
> >> Wanted to share an update on some of the testing I've done from my side:
>
> >>
>
> >> All Functional tests from [5] (plus private Customer tests) are
> complete.
>
> >> 10,000+ tests have passed. However, I did see an issue with Hive ORC
> tables
>
> >> (DRILL-7785). Need to investigate if it's a blocker for the release.
>
> >>
>
> >> Of course, all unit tests (part of the AD repo) - for both default and
>
> >> 'mapr' profiles are also successful.
>
> >>
>
> >>
>
> >>
>
> >> [5] https://github.com/mapr/drill-test-framework
>
> >>
>
> >> On Sun, Aug 30, 2020 at 10:14 AM Abhishek Girish 
>
> >> wrote:
>
> >>
>
> >>> Hi all,
>
> >>>
>
> >>> I'd like to propose the first release candidate (RC0) of Apache Drill,
>
> >>> version 1.18.0.
>
> >>>
>
> >>> The release candidate covers a total of 164 resolved JIRAs [1]. Thanks
> to
>
> >>> everyone who contributed to this release.
>
> >>>
>
> >>> The tarball artifacts are hosted at [2] and the maven artifacts are
> hosted
>
> >>> at [3].
>
> >>>
>
> >>> This release candidate is based on commit
>
> >>> 91678ca6a48509b11530f1ce6c3d75fc9f4eadc0 located at [4].
>
> >>>
>
> >>> The vote ends at 17:00 UTC (10:00 AM PDT, 7:00 PM EET, 10:30 PM
> IST),
>
> >>> Sep 2, 2020.
>
> >>>
>
> >>> [ ] +1
>
> >>> [ ] +0
>
> >>> [ ] -1
>
> >>>
>
> >>> Here's my vote: +1 (non-binding). Please note that while all votes
>
> >>> matter, only PMC votes are binding.
>
> >>>
>
> >>> [1]
>
> >>>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345459&styleName=Html&projectId=12313820
>
> >>> [2] http://home.apache.org/~agirish/drill/releases/1.18.0/rc0/
>
> >>> [3]
>
> >>>
> https://repository.apache.org/content/repositories/orgapachedrill-1080/
>
> >>> [4] https://github.com/agirish/drill/commits/drill-1.18.0
>
> >>>
>
> >>> Regards,
>
> >>> Abhishek
>
> >>>
>
> >>
>
>
>
>


Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-08-30 Thread Abhishek Girish
Advanced tests from [5] are also complete. All 7500+ tests passed, except
for a few relating to known resource issues (drillbit connectivity / OOM
/...). Plus a few with the same symptoms as DRILL-7785.

On Sun, Aug 30, 2020 at 2:17 PM Abhishek Girish  wrote:

> Wanted to share an update on some of the testing I've done from my side:
>
> All Functional tests from [5] (plus private Customer tests) are complete.
> 10,000+ tests have passed. However, I did see an issue with Hive ORC tables
> (DRILL-7785). Need to investigate if it's a blocker for the release.
>
> Of course, all unit tests (part of the AD repo) - for both default and
> 'mapr' profiles are also successful.
>
>
>
> [5] https://github.com/mapr/drill-test-framework
>
> On Sun, Aug 30, 2020 at 10:14 AM Abhishek Girish 
> wrote:
>
>> Hi all,
>>
>> I'd like to propose the first release candidate (RC0) of Apache Drill,
>> version 1.18.0.
>>
>> The release candidate covers a total of 164 resolved JIRAs [1]. Thanks to
>> everyone who contributed to this release.
>>
>> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
>> at [3].
>>
>> This release candidate is based on commit
>> 91678ca6a48509b11530f1ce6c3d75fc9f4eadc0 located at [4].
>>
>> The vote ends at 17:00 UTC (10:00 AM PDT, 7:00 PM EET, 10:30 PM IST),
>> Sep 2, 2020.
>>
>> [ ] +1
>> [ ] +0
>> [ ] -1
>>
>> Here's my vote: +1 (non-binding). Please note that while all votes
>> matter, only PMC votes are binding.
>>
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345459&styleName=Html&projectId=12313820
>> [2] http://home.apache.org/~agirish/drill/releases/1.18.0/rc0/
>> [3]
>> https://repository.apache.org/content/repositories/orgapachedrill-1080/
>> [4] https://github.com/agirish/drill/commits/drill-1.18.0
>>
>> Regards,
>> Abhishek
>>
>


Re: [VOTE] Release Apache Drill 1.18.0 - RC0

2020-08-30 Thread Abhishek Girish
Wanted to share an update on some of the testing I've done from my side:

All Functional tests from [5] (plus private Customer tests) are complete.
10,000+ tests have passed. However, I did see an issue with Hive ORC tables
(DRILL-7785). Need to investigate if it's a blocker for the release.

Of course, all unit tests (part of the Apache Drill repo) - for both the
default and 'mapr' profiles - are also successful.



[5] https://github.com/mapr/drill-test-framework

On Sun, Aug 30, 2020 at 10:14 AM Abhishek Girish  wrote:

> Hi all,
>
> I'd like to propose the first release candidate (RC0) of Apache Drill,
> version 1.18.0.
>
> The release candidate covers a total of 164 resolved JIRAs [1]. Thanks to
> everyone who contributed to this release.
>
> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
> at [3].
>
> This release candidate is based on commit
> 91678ca6a48509b11530f1ce6c3d75fc9f4eadc0 located at [4].
>
> The vote ends at 17:00 UTC (10:00 AM PDT, 7:00 PM EET, 10:30 PM IST),
> Sep 2, 2020.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> Here's my vote: +1 (non-binding). Please note that while all votes matter,
> only PMC votes are binding.
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345459&styleName=Html&projectId=12313820
> [2] http://home.apache.org/~agirish/drill/releases/1.18.0/rc0/
> [3]
> https://repository.apache.org/content/repositories/orgapachedrill-1080/
> [4] https://github.com/agirish/drill/commits/drill-1.18.0
>
> Regards,
> Abhishek
>


[VOTE] Release Apache Drill 1.18.0 - RC0

2020-08-30 Thread Abhishek Girish
Hi all,

I'd like to propose the first release candidate (RC0) of Apache Drill,
version 1.18.0.

The release candidate covers a total of 164 resolved JIRAs [1]. Thanks to
everyone who contributed to this release.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted
at [3].

This release candidate is based on commit
91678ca6a48509b11530f1ce6c3d75fc9f4eadc0 located at [4].

The vote ends at 17:00 UTC (10:00 AM PDT, 7:00 PM EET, 10:30 PM IST),
Sep 2, 2020.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1 (non-binding). Please note that while all votes matter,
only PMC votes are binding.

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345459&styleName=Html&projectId=12313820
[2] http://home.apache.org/~agirish/drill/releases/1.18.0/rc0/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1080/
[4] https://github.com/agirish/drill/commits/drill-1.18.0

Regards,
Abhishek


Re: scaling drill in an openshift (K8s) cluster

2020-03-24 Thread Abhishek Girish
I've added support for auto-scaling and I've tested that it works well.
Please see:
https://github.com/Agirish/drill-helm-charts#autoscaling-drill-clusters
And I have a script to test this:
https://github.com/Agirish/drill-helm-charts/blob/master/scripts/runCPULoadTest.sh

In the case of cloud deployments like GKE, where I had a LoadBalancer (and
an external IP), I could connect with external clients as well, using the
JDBC Drillbit direct connection.
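
For anyone who wants to do the same, here is a minimal Java/JDBC sketch of
that direct Drillbit connection. The IP below is a made-up placeholder for
the LoadBalancer's external IP, 31010 is the default Drillbit user port, and
the drill-jdbc-all jar is assumed to be on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillDirectConnect {
      public static void main(String[] args) throws Exception {
        // drillbit=<host>:<port> connects straight to a Drillbit, bypassing
        // ZooKeeper; replace 35.200.10.20 with the LoadBalancer's external IP.
        String url = "jdbc:drill:drillbit=35.200.10.20:31010";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version FROM sys.version")) {
          while (rs.next()) {
            System.out.println("Connected to Drill " + rs.getString("version"));
          }
        }
      }
    }

Inside the cluster, clients would typically use the ZooKeeper form instead,
e.g. jdbc:drill:zk=<zk-host>:2181/drill/<cluster-id>, and let ZooKeeper pick
a Drillbit.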

On Tue, Mar 24, 2020 at 9:17 AM Arina Ielchiieva 
wrote:

> Please see https://issues.apache.org/jira/browse/DRILL-7563 <
> https://issues.apache.org/jira/browse/DRILL-7563>, maybe it will be
> helpful.
>
> Kind regards,
> Arina
>
> > On Mar 24, 2020, at 6:04 PM, Dobes Vandermeer  wrote:
> >
> > I was able to get drill up and running inside a k8s cluster but I didn't
> connect to it from outside the cluster, so the DNS names were always
> resolvable by the client(s).
> >
> > I had to run it as a statefulset to ensure the DNS names are stable,
> otherwise the drillbits couldn't talk to each other, either.
> >
> > On 3/24/2020 6:37:44 AM, Jaimes, Rafael - 0993 - MITLL <
> rafael.jai...@ll.mit.edu> wrote:
> > I’m seeing a problem with scaling the number of pod instances in the
> replication controller because they aren’t reporting their hostnames
> properly. This was a common problem that got fixed in scalable
> architectures like ZooKeeper and Kafka (see reference at bottom I think
> this was related).
> >
> > In Drill’s case, ZooKeeper is able to see all of the drillbits, however,
> the hostnames are only locally addressable within the cluster, so as soon
> as you perform a query it fails since the client can’t find the drillbit
> that it got assigned, its hostname isn’t externally addressable.
> >
> > Kafka fixes this by allowing an override for advertised names. Has
> anyone gotten Drill to scale in a K8s cluster?
> >
> > https://issues.apache.org/jira/browse/KAFKA-1070
>
>


Re: Problem running Drill in a Docker container in OpenShift

2020-02-06 Thread Abhishek Girish
Hey Paul,

Sorry for my delayed response. And thanks for your encouragement.

So here's a brief history of my work in this area.

(1) I started with simple YAML-based deployments for Drill, using standard
Kubernetes APIs and controllers. This supported bringing up Drill in
distributed mode, used MapR ZK for cluster coordination, and had a MapR
client to connect to MapR-FS. It had standard features such as resizing, so
pretty much all basic Drill-on-Kubernetes use cases were supported.

(2) I then added Helm chart support to make (1) easier to use.

(3) I began a parallel effort to build an Operator for Drill (written in Go).
This creates a Custom Resource called DrillCluster - so in YAML files you
would see the Kind as "DrillCluster" instead of Pod, StatefulSet, or similar.
The operator model in K8s is now seeing more adoption as it is more powerful,
flexible, and simpler for users. For instance, I could add more code checks
and validations, logging for debugging, and more when compared to approaches
(1) and (2). This model also has more potential for adding features and
fixes. This is what we shipped at MapR, and this is what I'm working on for
open-source Drill and planning to share soon. What's pending for an initial
preview release is replacing the MapR client with, say, an HDFS client, and
MapR ZK with Apache ZK. I can also share (1) and (2) soon after that.

I'll definitely count on your vast experience and knowledge for help in
this regard.

Regards,
Abhishek

On Tue, Feb 4, 2020 at 2:00 AM Paul Rogers 
wrote:

> Hi Abhishek,
>
> Thanks for the update! Seems to make sense to wait for you to open source
> your work than to spend time on duplicating your effort. And, people who
> want a solution short term can perhaps work with MapR/HPE as you suggest.
> Sounds like you have access to the various systems and have worked though
> the myriad details involved in creating a good integration.
>
> Does your work include Helm integration?
>
> The key challenge for any K8s integration is that Drill needs access to
> data, which requires some kind of distributed storage. This has long been a
> K8s weakness. But, it is, of course, a MapR strength.
>
> Please let us know if you need help with the open source efforts.
>
> Thanks,
> - Paul
>
>
>
> On Monday, February 3, 2020, 3:13:28 AM PST, Abhishek Girish <
> agir...@apache.org> wrote:
>
>  Hey Ron,
>
> As a part of MapR (now HPE), I've created a native operator for Apache
> Drill and this works on multiple variants of Kubernetes including
> OpenShift. With this, we introduce a new Kind called "DrillCluster" via a
> Custom Resource Definition (CRD) and a Custom Controller (logic to manage
> this DrillCluster kind - written in Golang) for the same. Using this, users
> can easily deploy Drill clusters by submitting Custom Resource YAML files
> (CRs) for the DrillCluster kind. It supports creation of multiple Drill
> clusters (multiple Drillbits launched in distributed mode), multiple
> versions (such as 1.15.0 and 1.16.0), auto-scaling the number of Drillbits
> (based on CPU utilization) and more. I can share more details of this if
> anyone's interested.
>
> While Vanilla K8S, and GKE worked out of the box, I had to make some
> changes to support OpenShift (related to Service Accounts, Security Context
> Constraints, etc). Perhaps you ran into similar issues (I'm yet to read
> this thread fully).
>
> We recently had a v1.0.0 GA release [1], [2] & [3]. One thing to note is
> that the current release has dependencies and integrations with MapR's
> distribution of Apache Drill and is close sourced at the moment (there is
> plan to open source that in the near future).
>
> I have an open source variant of this in the works - to support vanilla
> Apache Drill. In the current state, it has all similar features , it
> removes the MapR specific integration (reliance on MapR-FS instead of HDFS,
> MapR ZooKeeper and such). I shortly plan to add Apache HDFS and ZooKeeper
> integration instead. Let me know if you're interested - and I can share the
> GitHub branch.
>
> Regards,
> Abhishek
>
> [1]
>
> https://mapr.com/blog/mapr-releases-kubernetes-ecosystem-operators-for-apache-spark-and-apache-drill/
> [2]
>
> https://mapr.com/docs/home/PersistentStorage/running_drillbits_in_compute_space.html
> [3] https://github.com/mapr/mapr-operators
>
> On Wed, Jan 29, 2020 at 11:11 AM Ron Cecchini 
> wrote:
>
> >
> > Hi, all.  Drill and OpenShift newbie here.
> >
> > Has anyone successfully deployed a Drill Docker container to an OpenShift
> > environment?
> >
> > While there is information about Drill Docker, there seems to be zero
> > information about OpenShift in particular.
> 

Re: Problem running Drill in a Docker container in OpenShift

2020-02-06 Thread Abhishek Girish
Arina,

There is a v1.0 of the Drill Operator for Kubernetes out, but it is
currently tied to the MapR platform. Of the people here, Anton & Denys have
tried it out.

The open-source version of it that I'm working on is based on that, minus
the MapR integration and plus Apache integration. My plan is to get a preview
out as soon as possible. I don't have an ETA, but I think end of February is
a timeframe I'd like to target to get something out.

Regards,
Abhishek

On Tue, Feb 4, 2020 at 6:06 PM Arina Yelchiyeva 
wrote:

> Abhishek, it looks like Paul is planning to put his efforts on hold since he
> expects your open-source version to be out soon.
> Do you have an ETA?
> This work is important for the community and it would be nice if it is
> completed in this release.
>
> Here is the Jira to track the efforts:
> https://issues.apache.org/jira/browse/DRILL-7563 <
> https://issues.apache.org/jira/browse/DRILL-7563>
>
> Kind regards,
> Arina
>
> > On Feb 3, 2020, at 10:30 PM, Paul Rogers 
> wrote:
> >
> > Hi Abhishek,
> >
> > Thanks for the update! Seems to make sense to wait for you to open
> source your work than to spend time on duplicating your effort. And, people
> who want a solution short term can perhaps work with MapR/HPE as you
> suggest.  Sounds like you have access to the various systems and have
> worked though the myriad details involved in creating a good integration.
> >
> > Does your work include Helm integration?
> >
> > The key challenge for any K8s integration is that Drill needs access to
> data, which requires some kind of distributed storage. This has long been a
> K8s weakness. But, it is, of course, a MapR strength.
> >
> > Please let us know if you need help with the open source efforts.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >On Monday, February 3, 2020, 3:13:28 AM PST, Abhishek Girish <
> agir...@apache.org> wrote:
> >
> > Hey Ron,
> >
> > As a part of MapR (now HPE), I've created a native operator for Apache
> > Drill and this works on multiple variants of Kubernetes including
> > OpenShift. With this, we introduce a new Kind called "DrillCluster" via a
> > Custom Resource Definition (CRD) and a Custom Controller (logic to manage
> > this DrillCluster kind - written in Golang) for the same. Using this,
> users
> > can easily deploy Drill clusters by submitting Custom Resource YAML files
> > (CRs) for the DrillCluster kind. It supports creation of multiple Drill
> > clusters (multiple Drillbits launched in distributed mode), multiple
> > versions (such as 1.15.0 and 1.16.0), auto-scaling the number of
> Drillbits
> > (based on CPU utilization) and more. I can share more details of this if
> > anyone's interested.
> >
> > While Vanilla K8S, and GKE worked out of the box, I had to make some
> > changes to support OpenShift (related to Service Accounts, Security
> Context
> > Constraints, etc). Perhaps you ran into similar issues (I'm yet to read
> > this thread fully).
> >
> > We recently had a v1.0.0 GA release [1], [2] & [3]. One thing to note is
> > that the current release has dependencies and integrations with MapR's
> > distribution of Apache Drill and is close sourced at the moment (there is
> > plan to open source that in the near future).
> >
> > I have an open source variant of this in the works - to support vanilla
> > Apache Drill. In the current state, it has all similar features , it
> > removes the MapR specific integration (reliance on MapR-FS instead of
> HDFS,
> > MapR ZooKeeper and such). I shortly plan to add Apache HDFS and ZooKeeper
> > integration instead. Let me know if you're interested - and I can share
> the
> > GitHub branch.
> >
> > Regards,
> > Abhishek
> >
> > [1]
> >
> https://mapr.com/blog/mapr-releases-kubernetes-ecosystem-operators-for-apache-spark-and-apache-drill/
> > [2]
> >
> https://mapr.com/docs/home/PersistentStorage/running_drillbits_in_compute_space.html
> > [3] https://github.com/mapr/mapr-operators
> >
> > On Wed, Jan 29, 2020 at 11:11 AM Ron Cecchini 
> > wrote:
> >
> >>
> >> Hi, all.  Drill and OpenShift newbie here.
> >>
> >> Has anyone successfully deployed a Drill Docker container to an
> OpenShift
> >> environment?
> >>
> >> While there is information about Drill Docker, there seems to be zero
> >> information about OpenShift in particular.
> >>
> >> Per the instructions at drill.apache.org/docs/running-drill-on-docker,
> I
> >> pulled the Drill Docker image from Docker Hub, and then pushed it to our
> >> OpenShift environment.

Re: Problem running Drill in a Docker container in OpenShift

2020-02-03 Thread Abhishek Girish
Hey Ron,

As a part of MapR (now HPE), I've created a native operator for Apache
Drill and this works on multiple variants of Kubernetes including
OpenShift. With this, we introduce a new Kind called "DrillCluster" via a
Custom Resource Definition (CRD) and a custom controller, written in Golang,
that manages this DrillCluster kind. Using this, users
can easily deploy Drill clusters by submitting Custom Resource YAML files
(CRs) for the DrillCluster kind. It supports creation of multiple Drill
clusters (multiple Drillbits launched in distributed mode), multiple
versions (such as 1.15.0 and 1.16.0), auto-scaling the number of Drillbits
(based on CPU utilization) and more. I can share more details of this if
anyone's interested.

While Vanilla K8S, and GKE worked out of the box, I had to make some
changes to support OpenShift (related to Service Accounts, Security Context
Constraints, etc). Perhaps you ran into similar issues (I'm yet to read
this thread fully).

We recently had a v1.0.0 GA release [1], [2] & [3]. One thing to note is
that the current release has dependencies and integrations with MapR's
distribution of Apache Drill and is close sourced at the moment (there is
plan to open source that in the near future).

I have an open source variant of this in the works - to support vanilla
Apache Drill. In its current state, it has all the same features, but
removes the MapR-specific integration (reliance on MapR-FS instead of HDFS,
MapR ZooKeeper, and such). I shortly plan to add Apache HDFS and ZooKeeper
integration instead. Let me know if you're interested - and I can share the
GitHub branch.

Regards,
Abhishek

[1]
https://mapr.com/blog/mapr-releases-kubernetes-ecosystem-operators-for-apache-spark-and-apache-drill/
[2]
https://mapr.com/docs/home/PersistentStorage/running_drillbits_in_compute_space.html
[3] https://github.com/mapr/mapr-operators

On Wed, Jan 29, 2020 at 11:11 AM Ron Cecchini 
wrote:

>
> Hi, all.   Drill and OpenShift newbie here.
>
> Has anyone successfully deployed a Drill Docker container to an OpenShift
> environment?
>
> While there is information about Drill Docker, there seems to be zero
> information about OpenShift in particular.
>
> Per the instructions at drill.apache.org/docs/running-drill-on-docker, I
> pulled the Drill Docker image from Docker Hub, and then pushed it to our
> OpenShift environment.  But when I tried to deploy it, I immediately ran
> into an error about /opt/drill/conf/drill-override.conf not being readable.
>
> I understand why the problem is happening (because of who OpenShift runs
> the container as), so I downloaded the source from GitHub and modified the
> Dockerfile to include:
>
> RUN chgrp -R 0 /opt/drill && chmod -R g=u /opt/drill
>
> so that all of /opt/drill would be available to everyone.  But then
> 'docker build' kept failing, giving the error:
>
> Non-resolvable parent POM for
> org.apache.drill:drill-root:1.18.0-SNAPSHOT:
> Could not transfer artifact org.apache:apache:pom:21
>
> I tried researching that error but couldn't figure out what was going on.
> So I finally decided to start trying to mount persistent volumes, creating
> one PV for /opt/drill/conf (and then copying the default
> drill-override.conf there) and one PV for /opt/drill/log.
>
> Now the container gets much further, but eventually fails on something
> Hadoop related.  I'm not trying to do anything with Hadoop, so I don't know
> what that's about, but it says I don't have HADOOP_HOME set.
>
> Hopefully I can figure out the remaining steps I need (an environment
> variable?  more configs?), but I was wondering if anyone else had already
> successfully figured out how to deploy to OpenShift, or might know why the
> 'docker build' fails with that error?
>
> For what it's worth, I copied over only that drill-override.conf and
> nothing else.  And I did not set any Drill environment variables in
> OpenShift.  I'm basically trying to run the "vanilla" Drill Docker as-is.
>
> Thanks for any help!
>
> Ron
>


Re: Official Apache Drill Docker Images

2020-01-09 Thread Abhishek Girish
This is great. Thanks Vova!

On Thu, Jan 9, 2020 at 9:40 AM Charles Givre  wrote:

> Great work!
>
> > On Jan 9, 2020, at 12:39 PM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
> >
> > Nice ;)
> >
> >> On Jan 9, 2020, at 7:33 PM, Volodymyr Vysotskyi 
> wrote:
> >>
> >> Hi all,
> >>
> >> Some time ago we introduced Docker images for Drill and published them
> >> under a custom repository.
> >> But now we have an official Docker repository for Apache Drill at
> >> https://hub.docker.com/r/apache/drill.
> >>
> >> All images from our previous repository were pushed there, and a DockerHub
> >> automated build was set up for the master branch, which publishes images
> >> with the master tag whenever the master branch is updated.
> >>
> >> Feel free to test it, and now even on the actual master branch!
> >>
> >> For the instructions on how to run Drill on Docker, please refer to
> >> https://drill.apache.org/docs/running-drill-on-docker/.
> >>
> >> Kind regards,
> >> Volodymyr Vysotskyi
> >
>
>


Re: Question about foreman restart

2020-01-07 Thread Abhishek Girish
Thanks Nitin.

As mentioned on Slack, Drill would not resubmit the queries. If any
drillbit being used in query execution goes down, the query in question is
cancelled.

On Tue, Jan 7, 2020 at 10:51 AM Nitin Pawar  wrote:

> I have created DRILL-7517 <
> https://issues.apache.org/jira/browse/DRILL-7517>
> this for drill shutting down issue.
>
> DRILL setup
> MAX Memory given : 56GB
> HEAP-12GB
> Direct memory: 40GB
>
> Thanks,
> Nitin
>
> On Tue, Jan 7, 2020 at 10:15 PM Nitin Pawar 
> wrote:
>
> > Hello Team
> > We have recently upgraded to Drill 1.16 from Drill 1.13, and we have
> > started to notice lots of OOM issues. It's the same setup with changed
> > binaries. Until we figure out what the issue is, we want to keep
> > restarting drillbits with cron jobs.
> >
> > My question is: *if a drillbit is restarted, would the queries with this
> > node as foreman be resubmitted automatically?*
> >
> > Also, we have 64 GB RAM machines. Can someone recommend memory settings
> > for this environment?
> >
> > --
> > Nitin Pawar
> >
>
>
> --
> Nitin Pawar
>


Re: [ANNOUNCE] Apache Drill 1.17.0 Released

2019-12-26 Thread Abhishek Girish
Congratulations, everyone!

On Thu, Dec 26, 2019 at 10:20 AM Volodymyr Vysotskyi 
wrote:

> On behalf of the Apache Drill community, I am happy to announce the release
> of Apache Drill 1.17.0.
>
> Drill is an Apache open-source SQL query engine for Big Data exploration.
> Drill is designed from the ground up to support high-performance analysis
> on the semi-structured and rapidly evolving data coming from modern Big
> Data applications, while still providing the familiarity and ecosystem of
> ANSI SQL, the industry-standard query language. Drill provides
> plug-and-play integration with existing Apache Hive and Apache HBase
> deployments.
>
> For information about Apache Drill, and to get involved, visit the project
> website [1].
>
> A total of 200 JIRAs were resolved in this release of Drill, with the
> following new features and improvements [2]:
>
> - Hive complex types support (DRILL-7251,
> DRILL-7252, DRILL-7253, DRILL-7254)
> - ESRI Shapefile (shp) (DRILL-4303) and Excel (DRILL-7177) format
> plugins support
> - Drill Metastore support (DRILL-7272, DRILL-7273, DRILL-7357)
> - Upgrade to HADOOP-3.2 (DRILL-6540)
> - Schema Provision using File / Table Function (DRILL-6835)
> - Parquet runtime row group pruning (DRILL-7062)
> - User-Agent UDFs (DRILL-7343)
> - Canonical Map support (DRILL-7096)
> - Kafka storage plugin improvements
> (DRILL-6739, DRILL-6723, DRILL-7164, DRILL-7290, DRILL-7388)
>
> For the full list please see release notes [3].
>
> The binary and source artifacts are available here [4].
>
> Thanks to everyone in the community who contributed to this release!
>
> 1. https://drill.apache.org/
> 2. https://drill.apache.org/blog/2019/12/26/drill-1.17-released/
> 3. https://drill.apache.org/docs/apache-drill-1-17-0-release-notes/
> 4. https://drill.apache.org/download/
>
> Kind regards,
> Volodymyr Vysotskyi
>


Re: Slack Channel invitation Link

2019-12-02 Thread Abhishek Girish
Hey Rameshwar,

Can you please try with the below link:
https://join.slack.com/t/apache-drill/shared_invite/enQtNTQ4MjM1MDA3MzQ2LTJlYmUxMTRkMmUwYmQ2NTllYmFmMjU4MDk0NjYwZjBmYjg0MDZmOTE2ZDg0ZjBlYmI3Yjc4Y2I2NTQyNGVlZTc

I just tried and it looks active. Maybe the README needs to be updated.
I'll take a look.

Regards,
Abhishek

On Mon, Dec 2, 2019 at 3:05 AM Rameshwar Mane  wrote:

> Hi ,
>
> I tried to join the Slack channel using the link available in the README.md
> of the GitHub repo. It says that the invite link is no longer active. If
> possible, can you share the Slack workspace invite link?
> Thanks and Regards
> Rameshwar Mane
>


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2019-08-22 Thread Abhishek Girish
Congratulations, Charles!! Looking forward to what's next.

Thanks a lot, Arina, for your leadership over the last year. I think we may
have added more committers and PMC members during your tenure than ever
before. The community is growing well, and I'm so glad to be a part of it.

On Thu, Aug 22, 2019 at 8:30 PM Kunal Khatua  wrote:

> Congratulations, Charles!
>


Re: Documentation feedback

2019-08-13 Thread Abhishek Girish
Hey Anastasiia,

Thanks a lot for the feedback and for including the solution!

Regards,
Abhishek

On Tue, Aug 13, 2019 at 3:23 AM Arina Yelchiyeva 
wrote:

> Feel free to submit PRs to update site documentation (
> https://github.com/apache/drill/tree/gh-pages <
> https://github.com/apache/drill/tree/gh-pages>) and Dev documentation (
> https://github.com/apache/drill/blob/master/docs/dev/Docker.md <
> https://github.com/apache/drill/blob/master/docs/dev/Docker.md>).
>
> Kind regards,
> Arina
>
> > On Aug 13, 2019, at 1:06 PM, Anastasiia Sergienko <
> anastasiia.sergie...@exasol.com> wrote:
> >
> > Dear Apache Drill Team,
> >
> > I'm currently working with the Docker version of Drill and I'd like to
> > provide some brief feedback.
> >
> > I installed a docker version according to the official documentation (
> > https://drill.apache.org/docs/running-drill-on-docker/) and after that
> > I tried to connect to Drill via JDBC following this instruction:
> > https://drill.apache.org/docs/using-the-jdbc-driver/. It didn't work,
> > because of the missing port forwarding settings. It took quite a lot of
> > time to find the problem and the right port. It turned out that I
> > needed port 31010 for the JDBC connection. After I added it to the
> > docker run command, it worked fine: docker run -i --name drill-1.16.0
> > -p 8047:8047 -p 31010:31010 -t drill/apache-drill:1.16.0 /bin/bash.
> >
> > I think it would be nice if you could add information about JDBC port
> > forwarding to the documentation.
> >
> > Best Regards,
> > Anastasiia Sergienko
> > Java Developer in Exasol
>
>
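
For reference, a minimal Java/JDBC sketch of the connection described above,
assuming the container was started with the -p 31010:31010 mapping from that
docker run command. The class name and sample query are illustrative;
cp.`employee.json` is the sample data bundled with Drill:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillDockerJdbcCheck {
      public static void main(String[] args) throws Exception {
        // 31010 is the Drillbit user port published to the host by -p 31010:31010.
        String url = "jdbc:drill:drillbit=localhost:31010";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT full_name FROM cp.`employee.json` LIMIT 3")) {
          while (rs.next()) {
            System.out.println(rs.getString("full_name"));
          }
        }
      }
    }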


Re: [ANNOUNCE] New PMC member: Sorabh Hamirwasia

2019-04-05 Thread Abhishek Girish
Congratulations, Sorabh!!

On Fri, Apr 5, 2019 at 9:07 AM Timothy Farkas  wrote:

> Congrats!
>
> Tim
>
> On Fri, Apr 5, 2019 at 9:06 AM Arina Ielchiieva  wrote:
>
> > I am pleased to announce that Drill PMC invited Sorabh Hamirwasia to
> > the PMC and
> > he has accepted the invitation.
> >
> > Congratulations Sorabh and welcome!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>


Re: Bonjour,

2019-03-21 Thread Abhishek Girish
Hello Justin,

The main goal of this project is to query existing datasets - which may be
in any of the supported formats [1]. There is currently no support for
INSERT INTO. However, Drill does support CREATE TABLE AS [2]. Please give
the docs a read and let us know if that helps.

Regards,
Abhishek

[1] http://drill.apache.org/docs/connect-a-data-source-introduction/
[2] http://drill.apache.org/docs/create-table-as-ctas/
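
To make that concrete, here is a small illustrative sketch of loading data
with CTAS over JDBC. The source path and table name are made up for the
example, and dfs.tmp is used because it is a writable workspace in the
default configuration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CtasExample {
      public static void main(String[] args) throws Exception {
        // Adjust the connection URL for your Drillbit.
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
             Statement stmt = conn.createStatement()) {
          // There is no INSERT INTO, but CTAS writes the result of a query into
          // a new table in a writable workspace. The source path is hypothetical.
          stmt.execute(
              "CREATE TABLE dfs.tmp.`clients` AS "
                  + "SELECT * FROM dfs.`/data/incoming/clients.json`");
          // The new table can then be queried like any other table.
          try (ResultSet rs = stmt.executeQuery(
              "SELECT COUNT(*) AS cnt FROM dfs.tmp.`clients`")) {
            rs.next();
            System.out.println("Rows written: " + rs.getLong("cnt"));
          }
        }
      }
    }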


On Thu, Mar 21, 2019 at 2:09 PM Justin RAFANOMEZANTSOA <
justinamad...@gmail.com> wrote:

> Hello,
>    I would like to ask how to insert data into Drill so that I can query
> it. I am working on this Drill project but I am a bit stuck at the moment.
> Your help would be appreciated.
> Thank you in advance,
> Best regards,
> Justin
>


Re: Query Compilation error with 80+ CASE statements

2019-02-27 Thread Abhishek Girish
Rahul,

Can you please share the plans for both queries (the one with fewer CASE
statements, which succeeds, and the one which fails)? Also please share the
verbose error.
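
For reference, the plan and the verbose error can be collected with EXPLAIN
PLAN FOR and the exec.errors.verbose session option (see the troubleshooting
docs). A small Java/JDBC sketch, with a placeholder query standing in for the
failing one:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CollectPlanAndVerboseError {
      public static void main(String[] args) throws Exception {
        // Adjust the connection URL for your Drillbit.
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
             Statement stmt = conn.createStatement()) {
          // Verbose errors include the full stack trace in the error message.
          stmt.execute("ALTER SESSION SET `exec.errors.verbose` = true");
          // EXPLAIN PLAN FOR <query> returns the plan without running the query;
          // substitute the real CASE-heavy query for this placeholder.
          try (ResultSet rs = stmt.executeQuery(
              "EXPLAIN PLAN FOR SELECT * FROM cp.`employee.json` LIMIT 1")) {
            while (rs.next()) {
              System.out.println(rs.getString("text"));
            }
          }
        }
      }
    }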

On Tue, Feb 26, 2019 at 11:33 PM Rahul Raj  wrote:

> Some more updates to the mail above:
>
> The query above uses a UDF, 'checkNull'. The UDF code is placed inside
> the compiled query code, causing it to fail when there are more CASE
> statements. The snippet below is from the UDF.
>
> {
> if (input.end - input.start == 0) {
> throw new RuntimeException("IllegalArgumentException : null values
> in non nullable fields");
> } else
> {
> out = input;
> }
> }
>
> Any thoughts on this? Are there any naming conventions while developing a
> UDF?
>
> Regards,
> Rahul
>
>
>
> On Wed, Feb 27, 2019 at 12:14 PM Rahul Raj  wrote:
>
> > Hi,
> >
> > I am getting compilation error on Drill 1.15 when query contains a large
> > number of case statements. I have included the query below. Query works
> > fine when few case statements are removed.
> >
> > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > CompileException: File
> >
> 'org.apache.drill.exec.compile.DrillJavaFileObject[ProjectorGen8635.java]',
> > Line 872, Column 9: ProjectorGen8635.java:872: error: cannot assign a
> value
> > to final variable out out = input; ^
> > (compiler.err.cant.assign.val.to.final.var) Fragment 0:0 Please, refer to
> > logs for more information. [Error Id:
> a0d3f054-7c60-4915-9629-55e5dacd8606
> > on jiffydemo:31010]
> >
> > Query is :
> >
> > SELECT
> >  CAST(`A1` AS INT) `A1`
> >, CAST(`A2` AS INT) `A2`
> >,  `A3`
> >,  `A4`
> >,  `A5`
> >, (CASE WHEN (`A6` = '') THEN null ELSE `A6` END) `A6`
> >,  `A7`
> >,  `A8`
> >,  `A9`
> >,  `A10`
> >, CAST(A11 AS INT) `A11`
> >, (CASE WHEN (`A12` = '') THEN null ELSE `A12` END) `A12`
> >, CAST(`checkNull`(`A13`) AS INT) `A13`
> >, CAST(`checkNull`(`A14`) AS INT) `A14`
> >, (CASE WHEN (`A15` = '') THEN null ELSE `A15` END) `A15`
> >, CAST(`checkNull`(`A16`) AS INT) `A16`
> >, CAST(`checkNull`(`A17`) AS INT) `A17`
> >, CAST(`checkNull`(`A18`) AS INT) `A18`
> >, (CASE WHEN (`A19` = '') THEN null ELSE `A19` END) `A19`
> >,  `A20`
> >,  `A21`
> >,  `A22`
> >, (CASE WHEN (`_1` = '') THEN null ELSE `_1` END) `_1`
> >, (CASE WHEN (`_2` = '') THEN null ELSE `_2` END) `_2`
> >, (CASE WHEN (`_3` = '') THEN null ELSE `_3` END) `_3`
> >, (CASE WHEN (`_4` = '') THEN null ELSE `_4` END) `_4`
> >, (CASE WHEN (`_5` = '') THEN null ELSE `_5` END) `_5`
> >, (CASE WHEN (`_6` = '') THEN null ELSE `_6` END) `_6`
> >, (CASE WHEN (`_7` = '') THEN null ELSE `_7` END) `_7`
> >, (CASE WHEN (`_8` = '') THEN null ELSE `_8` END) `_8`
> >, (CASE WHEN (`_9` = '') THEN null ELSE `_9` END) `_9`
> >, (CASE WHEN (`_10` = '') THEN null ELSE `_10` END) `_10`
> >, (CASE WHEN (`_11` = '') THEN null ELSE `_11` END) `_11`
> >, (CASE WHEN (`_12` = '') THEN null ELSE `_12` END) `_12`
> >, (CASE WHEN (`_13` = '') THEN null ELSE `_13` END) `_13`
> >, (CASE WHEN (`_14` = '') THEN null ELSE `_14` END) `_14`
> >, (CASE WHEN (`_15` = '') THEN null ELSE `_15` END) `_15`
> >, (CASE WHEN (`_16` = '') THEN null ELSE `_16` END) `_16`
> >, (CASE WHEN (`_17` = '') THEN null ELSE `_17` END) `_17`
> >, (CASE WHEN (`_18` = '') THEN null ELSE `_18` END) `_18`
> >, (CASE WHEN (`_19` = '') THEN null ELSE `_19` END) `_19`
> >, (CASE WHEN (`_20` = '') THEN null ELSE `_20` END) `_20`
> >, (CASE WHEN (`_21` = '') THEN null ELSE `_21` END) `_21`
> >, (CASE WHEN (`_22` = '') THEN null ELSE `_22` END) `_22`
> >, (CASE WHEN (`_23` = '') THEN null ELSE `_23` END) `_23`
> >, (CASE WHEN (`_24` = '') THEN null ELSE `_24` END) `_24`
> >, (CASE WHEN (`_25` = '') THEN null ELSE `_25` END) `_25`
> >, (CASE WHEN (`_26` = '') THEN null ELSE `_26` END) `_26`
> >, (CASE WHEN (`_27` = '') THEN null ELSE `_27` END) `_27`
> >, (CASE WHEN (`_28` = '') THEN null ELSE `_28` END) `_28`
> >, (CASE WHEN (`_29` = '') THEN null ELSE `_29` END) `_29`
> >, (CASE WHEN (`_30` = '') THEN null ELSE `_30` END) `_30`
> >, (CASE WHEN (`_31` = '') THEN null ELSE `_31` END) `_31`
> >, (CASE WHEN (`_32` = '') THEN null ELSE `_32` END) `_32`
> >, (CASE WHEN (`_33` = '') THEN null ELSE `_33` END) `_33`
> >, (CASE WHEN (`_34` = '') THEN null ELSE `_34` END) `_34`
> >, (CASE WHEN (`_35` = '') THEN null ELSE `_35` END) `_35`
> >, (CASE WHEN (`_36` = '') THEN null ELSE `_36` END) `_36`
> >, (CASE WHEN (`_37` = '') THEN null ELSE `_37` END) `_37`
> >, (CASE WHEN (`_38` = '') THEN null ELSE `_38` END) `_38`
> >, (CASE WHEN (`_39` = '') THEN null ELSE `_39` END) `_39`
> >, (CASE WHEN (`_40` = '') THEN null ELSE `_40` END) `_40`
> >, (CASE WHEN (`_41` = '') THEN null ELSE `_41` END) `_41`
> >, (CASE WHEN (`_42` = '') THEN null ELSE `_42` END) `_42`
> >, (CASE 

Re: Announcing Powered-By-Drill page

2019-02-27 Thread Abhishek Girish
This is great! Thanks for making it happen.

On Wed, Feb 27, 2019 at 4:12 PM Kunal Khatua  wrote:

> Hi everyone
>
> It gives me great pleasure in announcing the launch of the "Powered By
> Drill" page on the official Apache Drill website :
> https://drill.apache.org/poweredBy
>
> As a start, the page currently has a handful of Drill users that shared a
> short blurb about their usage in production, and one of them (RedBus) also
> has a link to a published blog detailing their use case.
>
> We encourage more users to share their use cases in a similar format, and
> we hope to grow this list to represent the true Drill community.
>
> If you would like to be included on this page, please reach out by sending
> a message directly to @ApacheDrill on Twitter or to me at kunal[at]
> apache.org
>
> Please do not hesitate to ask any questions you have as well.
>
> Thank you
> Kunal
> kunal[at]apache.org


Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Abhishek Girish
I meant for you to run
show files in hdfs.tmp

But it looks like the plugin might not be initialized correctly (check
whether the hostname provided in the connection string can be resolved).

Or you may not have used the right user when launching sqlline (the user may
not have permissions on the HDFS root dir or somewhere in the file path).
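
A quick way to check both of those is to run a couple of diagnostic
statements against the hdfs plugin before the full query. A minimal Java/JDBC
sketch (the connection URL is a placeholder for wherever your Drillbit runs;
the file path is the one from the original report):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HdfsPluginCheck {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
             Statement stmt = conn.createStatement()) {
          // If the hdfs plugin initialized correctly, this lists /tmp on HDFS.
          try (ResultSet rs = stmt.executeQuery("SHOW FILES IN hdfs.tmp")) {
            while (rs.next()) {
              System.out.println(rs.getString("name"));
            }
          }
          // Then try a trivial read of the JSON file before the full query.
          try (ResultSet rs = stmt.executeQuery(
              "SELECT * FROM hdfs.`/user/hive/spark_data/dt=2019-01-25/"
                  + "part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json` LIMIT 5")) {
            int rows = 0;
            while (rs.next()) {
              rows++;
            }
            System.out.println("Read " + rows + " rows");
          }
        }
      }
    }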

On Tue, Feb 12, 2019 at 10:57 PM Krishnanand Khambadkone
 wrote:

>  The command show files in dfs.tmp does return the right output.
> However when I try to run a simple hdfs query
> select
> s.application_id  from 
> hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json`
> it returns,
>
> Error: VALIDATION ERROR: Schema [[hdfs]] is not valid with respect to
> either root schema or current default schema.
>
>
> On Tuesday, February 12, 2019, 5:10:57 PM PST, Abhishek Girish <
> agir...@apache.org> wrote:
>
>  Can you please share the full error message (please see [1])
>
> Also, can you please see if this works: show files in dfs.tmp; This is to
> check if the DFS plugin is successfully initialized and Drill can see the
> files on HDFS. And if that works, check if simpler queries on the data
> works: select * from hdfs.``
>
> [1] https://drill.apache.org/docs/troubleshooting/#enable-verbose-errors
>
> On Tue, Feb 12, 2019 at 4:38 PM Krishnanand Khambadkone
>  wrote:
>
> >  Here is the hdfs storage definition and query I am using.  Same query
> > runs fine if run off local filesystem with dfs storage prefix.  All I am
> > doing is swapping dfs for hdfs.
> >
> > {
> >
> >  "type": "file",
> >
> >  "connection": "hdfs://host18-namenode:8020/",
> >
> >  "config": null,
> >
> >  "workspaces": {
> >
> >"tmp": {
> >
> >  "location": "/tmp",
> >
> >  "writable": true,
> >
> >  "defaultInputFormat": null,
> >
> >  "allowAccessOutsideWorkspace": false
> >
> >},
> >
> >"root": {
> >
> >  "location": "/",
> >
> >  "writable": false,
> >
> >  "defaultInputFormat": null,
> >
> >  "allowAccessOutsideWorkspace": false
> >
> >}
> >
> >  },
> >
> >  "formats": null,
> >
> >  "enabled": true
> >
> > }
> >
> >
> >
> >
> > select s.application_id,
> > get_spark_attrs(s.spark_event,'spark.executor.memory') as
> spark_attributes
> >  from
> >
> hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json`
> > s where (REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11),
> > '[^0-9A-Za-z]"', ''),'(".*)','') = 'SparkListenerEnvironmentUpdate' or
> > REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11), '[^0-9A-Za-z]"',
> > ''),'(".*)','') = 'SparkListenerApplicationStart' or
> > REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11), '[^0-9A-Za-z]"',
> > ''),'(".*)','') = 'SparkListenerApplicationEnd') group by application_id,
> > spark_attributes  order by application_id;
> >
> >
> >
> >On Tuesday, February 12, 2019, 3:04:40 PM PST, Abhishek Girish <
> > agir...@apache.org> wrote:
> >
> >  Hey Krishnanand,
> >
> > As mentioned by other folks in earlier threads, can you make sure to
> > include ALL RELEVANT details in your emails? That includes the query,
> > storage plugin configuration, data format, sample data / description of
> the
> > data, the full log for the query failure? It's necessary if one needs to
> be
> > able to understand the issue or offer help.
> >
> > Regards,
> > Abhishek
> >
> > On Tue, Feb 12, 2019 at 2:37 PM Krishnanand Khambadkone
> >  wrote:
> >
> > > I have defined a hdfs storage type with all the required properties.
> > > However, when I try to use that in the query it returns
> > > Error: VALIDATION ERROR: null
> > >
> >
>


Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Abhishek Girish
Can you please share the full error message (please see [1])

Also, can you please see if this works: show files in dfs.tmp; This is to
check if the DFS plugin is successfully initialized and Drill can see the
files on HDFS. And if that works, check if simpler queries on the data
works: select * from hdfs.``

[1] https://drill.apache.org/docs/troubleshooting/#enable-verbose-errors
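
In case it's handy, verbose errors can also be switched on for the session
before re-running the query:

ALTER SESSION SET `exec.errors.verbose` = true;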

On Tue, Feb 12, 2019 at 4:38 PM Krishnanand Khambadkone
 wrote:

>  Here is the hdfs storage definition and query I am using.  Same query
> runs fine if run off local filesystem with dfs storage prefix.  All I am
> doing is swapping dfs for hdfs.
>
> {
>
>   "type": "file",
>
>   "connection": "hdfs://host18-namenode:8020/",
>
>   "config": null,
>
>   "workspaces": {
>
> "tmp": {
>
>   "location": "/tmp",
>
>   "writable": true,
>
>   "defaultInputFormat": null,
>
>   "allowAccessOutsideWorkspace": false
>
> },
>
> "root": {
>
>   "location": "/",
>
>   "writable": false,
>
>   "defaultInputFormat": null,
>
>   "allowAccessOutsideWorkspace": false
>
> }
>
>   },
>
>   "formats": null,
>
>   "enabled": true
>
> }
>
>
>
>
> select s.application_id,
> get_spark_attrs(s.spark_event,'spark.executor.memory') as spark_attributes
>   from
> hdfs.`/user/hive/spark_data/dt=2019-01-25/part-4-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json`
> s where (REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11),
> '[^0-9A-Za-z]"', ''),'(".*)','') = 'SparkListenerEnvironmentUpdate' or
> REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11), '[^0-9A-Za-z]"',
> ''),'(".*)','') = 'SparkListenerApplicationStart' or
> REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event,11), '[^0-9A-Za-z]"',
> ''),'(".*)','') = 'SparkListenerApplicationEnd') group by application_id,
> spark_attributes  order by application_id;
>
>
>
> On Tuesday, February 12, 2019, 3:04:40 PM PST, Abhishek Girish <
> agir...@apache.org> wrote:
>
>  Hey Krishnanand,
>
> As mentioned by other folks in earlier threads, can you make sure to
> include ALL RELEVANT details in your emails? That includes the query,
> storage plugin configuration, data format, sample data / description of the
> data, the full log for the query failure? It's necessary if one needs to be
> able to understand the issue or offer help.
>
> Regards,
> Abhishek
>
> On Tue, Feb 12, 2019 at 2:37 PM Krishnanand Khambadkone
>  wrote:
>
> > I have defined a hdfs storage type with all the required properties.
> > However, when I try to use that in the query it returns
> > Error: VALIDATION ERROR: null
> >
>


Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Abhishek Girish
Hey Krishnanand,

As mentioned by other folks in earlier threads, can you make sure to
include ALL RELEVANT details in your emails? That includes the query,
storage plugin configuration, data format, sample data / description of the
data, the full log for the query failure? It's necessary if one needs to be
able to understand the issue or offer help.

Regards,
Abhishek

On Tue, Feb 12, 2019 at 2:37 PM Krishnanand Khambadkone
 wrote:

> I have defined a hdfs storage type with all the required properties.
> However, when I try to use that in the query it returns
> Error: VALIDATION ERROR: null
>


Re: Slack workspace for Drill discussions

2019-02-11 Thread Abhishek Girish
I found a way to create an invite link
<https://join.slack.com/t/apache-drill/shared_invite/enQtNTQ4MjM1MDA3MzQ2LThkNDBjMmNiMDY4ZGE5YzRhY2VlNjZhZTQzYTI2NDhmZTcxODcwZGU5OGY2OTk0NDUxZjBlYWQ0YTRlYzJjZDQ> which
is open to all and never expires. Once we decide on the workspace, we can
probably add it to the Drill website in the Community page.

On Mon, Feb 11, 2019 at 3:43 PM Charles Givre  wrote:

> 
>
> Sent from my iPhone
>
> > On Feb 11, 2019, at 18:30, Abhishek Girish  wrote:
> >
> > Hey folks,
> >
> > There have been some questions on a Slack workspace for Drill - it's
> > popular for multiple open source projects and I think we should encourage
> > active participation as well. The previous Slack workspace for Apache
> Drill
> > has been idle for ~3 years and we've been unable to invite new users due
> to
> > lack of admin privileges on the workspace.
> >
> > I've created a new Slack workspace :
> https://apache-drill.slack.com/signup.
> > I've whitelisted a few popular domains - we can add more and also find a
> > way to make it easier for people to sign up.
> >
> > And we can provide admin access to PMC/Committers / interested
> > contributors, so that it's easy to administer the workspace.
> >
> > Or we could find a way to get access to the original workspace and do the
> > same as as above. Let me know what you think.
> >
> > Regards,
> > Abhishek
>


Slack workspace for Drill discussions

2019-02-11 Thread Abhishek Girish
Hey folks,

There have been some questions on a Slack workspace for Drill - it's
popular for multiple open source projects and I think we should encourage
active participation as well. The previous Slack workspace for Apache Drill
has been idle for ~3 years and we've been unable to invite new users due to
lack of admin privileges on the workspace.

I've created a new Slack workspace : https://apache-drill.slack.com/signup.
I've whitelisted a few popular domains - we can add more and also find a
way to make it easier for people to sign up.

And we can provide admin access to PMC/Committers / interested
contributors, so that it's easy to administer the workspace.

Or we could find a way to get access to the original workspace and do the
same as as above. Let me know what you think.

Regards,
Abhishek


Re: January Apache Drill board report

2019-01-31 Thread Abhishek Girish
+1. Looks good!

On Thu, Jan 31, 2019 at 9:15 AM Vitalii Diravka  wrote:

> +1
>
> Kind regards
> Vitalii
>
>
> On Thu, Jan 31, 2019 at 6:18 PM Aman Sinha  wrote:
>
> > Thanks for putting this together, Arina.
> > The Drill Developer Day and Meetup were separate events, so you can split
> > them up.
> >   - A half day Drill Developer Day was held on Nov 14.  A variety of
> > technical design issues were discussed.
> >   - A Drill user meetup was held on the same evening. Two presentations
> > were given - one on a use case for Drill and one about indexing support
> > in Drill.
> >
> > Rest of the report LGTM.
> >
> > -Aman
> >
> >
> > On Thu, Jan 31, 2019 at 7:58 AM Arina Ielchiieva 
> wrote:
> >
> > > Hi all,
> > >
> > > please take a look at the draft board report for the last quarter and
> let
> > > me know if you have any comments.
> > >
> > > Thanks,
> > > Arina
> > >
> > > =
> > >
> > > ## Description:
> > >  - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
> > > Storage.
> > >
> > > ## Issues:
> > >  - There are no issues requiring board attention at this time.
> > >
> > > ## Activity:
> > >  - Since the last board report, Drill has released version 1.15.0,
> > >including the following enhancements:
> > >- Add capability to do index based planning and execution
> > >- CROSS join support
> > >- INFORMATION_SCHEMA FILES and FUNCTIONS were added
> > >- Support for TIMESTAMPADD and TIMESTAMPDIFF functions
> > >- Ability to secure znodes with custom ACLs
> > >- Upgrade to SQLLine 1.6
> > >- Parquet filter pushdown for VARCHAR and DECIMAL data types
> > >- Support JPPD (Join Predicate Push Down)
> > >- Lateral join functionality was enabled by default
> > >- Multiple Web UI improvements to simplify the use of options and
> > submit
> > > queries
> > >- Query performance with the semi-join functionality was improved
> > >- Support for aliases in the GROUP BY clause
> > >- Options to return null for empty string and prevents Drill from
> > > returning
> > >  a result set for DDL statements
> > >- Storage plugin names became case-insensitive
> > >
> > > - Drill developer meet up was held on November 14, 2018.
> > >
> > > ## Health report:
> > >  - The project is healthy. Development activity
> > >as reflected in the pull requests and JIRAs is good.
> > >  - Activity on the dev and user mailing lists are stable.
> > >  - Three committers were added in the last period.
> > >
> > > ## PMC changes:
> > >
> > >  - Currently 23 PMC members.
> > >  - No new PMC members added in the last 3 months
> > >  - Last PMC addition was Charles Givre on Mon Sep 03 2018
> > >
> > > ## Committer base changes:
> > >
> > >  - Currently 51 committers.
> > >  - New committers:
> > > - Hanumath Rao Maduri was added as a committer on Thu Nov 01 2018
> > > - Karthikeyan Manivannan was added as a committer on Fri Dec 07
> 2018
> > > - Salim Achouche was added as a committer on Mon Dec 17 2018
> > >
> > > ## Releases:
> > >
> > >  - 1.15.0 was released on Mon Dec 31 2018
> > >
> > > ## Mailing list activity:
> > >
> > >  - d...@drill.apache.org:
> > > - 415 subscribers (down -12 in the last 3 months):
> > > - 2066 emails sent to list (2653 in previous quarter)
> > >
> > >  - iss...@drill.apache.org:
> > > - 18 subscribers (up 0 in the last 3 months):
> > > - 2480 emails sent to list (3228 in previous quarter)
> > >
> > >  - user@drill.apache.org:
> > > - 592 subscribers (down -5 in the last 3 months):
> > > - 249 emails sent to list (310 in previous quarter)
> > >
> > >
> > > ## JIRA activity:
> > >
> > >  - 196 JIRA tickets created in the last 3 months
> > >  - 171 JIRA tickets closed/resolved in the last 3 months
> > >
> >
>


Re: Drill on YARN Questions

2019-01-11 Thread Abhishek Girish
Hello Teddy,

I don't recollect a restart option for the drill-on-yarn.sh script. I've
always used a combination of stop and start, like Paul mentions. Could you
please try that and let us know if it works? We could certainly add a
minor enhancement to support restart - until then I'll request Bridget to
update the documentation.
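
For now, running stop followed by start (with the same site directory used
when starting DoY) should have the same effect as a restart:

$DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE stop
$DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE start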

Regards,
Abhishek

On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy 
wrote:

> Hello Paul ,
>
> Thank you for your response with some interesting information (files in
> /tmp).
>
> On my side all the other command-line options work normally (start|stop|status...)
> but not restart (the option is not recognized). I searched the source
> code and found that the restart command is not implemented. So I
> wonder why the documentation does not match the source code.
>
> Thanks .Teddy
>
>
> On Sat, Jan 12, 2019, 02:39 Paul Rogers 
> > Let's try to troubleshoot. Does the combination of stop and start work?
> If
> > so, then there could be a bug with the restart command itself.
> >
> > If neither start nor stop work, it could be that you are missing the
> > application ID file created when you first started DoY. Some background.
> >
> > When we submit an app to YARN, YARN gives us an app ID. We need this in
> > order to track down the app master for DoY so we can send it commands
> later.
> >
> > When the command line tool starts DoY, it writes the YARN app ID to a
> > file. Can't remember the details, but it is probably in the $DRILL_SITE
> > directory. The contents are, as I recall, a long hexadecimal string.
> >
> > When you invoke the command line, the tool reads this file to figure to
> > track down the DoY app master. The tool then sends commands to the app
> > master: in this case, a request to shut down. Then, for reset, the tool
> > will communicate with YARN to start a new instance.
> >
> > The tool is suppose to give detailed error messages. Did you get any?
> That
> > might tell us which of these steps failed.
> >
> > Can you connect to the DoY Web UI at the URL provided when you started
> > DoY? If you can, this means that the DoY App Master is up and running.
> >
> > Are you running the client from the same node on which you started it?
> > That file I mentioned is local to the "DoY client" machine; it is not in
> > DFS.
> >
> > Then, there is one more very obscure bug you can check. On some
> > distributions, the YARN task files are written to the /tmp directory.
> Some
> > Linux systems remove these files from time to time. Once the files are
> > gone, YARN can no longer control its containers: it won't be able to stop
> > the app master or the Drillbit containers. There are two fixes. First, go
> > kill all the processes by hand. Then, move the YARN state files out of
> > /tmp, or exclude YARN's files from the periodic cleanup.
> >
> > Try some of the above and let us know what you find.
> >
> > Also, perhaps Abhishek can offer some suggestions as he tested the heck
> > out of the feature and may have additional suggestions.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy <
> > nbted2...@gmail.com> wrote:
> >
> >  hello,
> >
> >  2 weeks ago, I began to discover DoY. Today by reading drill documents (
> > https://drill.apache.org/docs/appendix-a-release-note-issues/ ) I saw
> that
> > we can restart drill cluster by :
> >
> >  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
> >
> > But doesn't work when I tested it.
> >
> > No idea about it?
> >
> > Thanks.
> >
> >
> >
> >
> > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers 
> > wrote:
> >
> > > Hi Charles,
> > >
> > > Your engineers have identified a common need, but one which is very
> > > difficult to satisfy.
> > >
> > > TL;DR: DoY gets as close to the requirements as possible within the
> > > constraints of YARN and Drill. But, future projects could do more.
> > >
> > > Your engineers want resource segregation among tenants: multi-tenancy.
> > > This is very difficult to achieve at the application level. Consider
> > Drill.
> > > It would need some way to identify users to know which tenant they
> belong
> > > to. Then, Drill would need a way to enqueue users whose queries would
> > > exceed the memory or CPU limit for that tenant. Plus, Drill would have
> to
> > > be able to limit memory and CPU for each query. Much work has been done
> > to
> > > limit memory, but CPU is very difficult. Mature products such as
> Teradata
> > > can do this, but Teradata has 40 years of effort behind it.
> > >
> > > Since it is hard to build multi-tenancy in at the app level (not
> > > impossible, just very, very hard), the thought is to apply it at the
> > > cluster level. This is done in YARN via limiting the resources
> available
> > to
> > > processes (typically map/reduce) and to limit the number of running
> > > processes. Works for M/R because each map task uses disk to shuffle
> > results
> > > to a reduce task, so map and reduce tasks can run 

Apache Drill 1.15.0 Docker image now available

2019-01-07 Thread Abhishek Girish
Hey folks,

Updated docker image for AD 1.15.0 is now available.

To pull image, use the following command:


docker pull drill/apache-drill:latest

or

docker pull drill/apache-drill:1.15.0

For instructions on how to bring up the Drill container, please refer to
[1].

[1] http://drill.apache.org/docs/running-drill-on-docker/
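
For a quick start in embedded mode, a run command along the lines of the one
documented in [1] can be used (the container name here is arbitrary):

docker run -i --name drill-1.15.0 -p 8047:8047 -t drill/apache-drill:1.15.0 /bin/bash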

Regards,
Abhishek


Re: [ANNOUNCE] Apache Drill 1.15.0 released

2018-12-31 Thread Abhishek Girish
Congratulations everyone, on yet another great release!

And Happy New Year 

On Mon, Dec 31, 2018 at 5:47 AM Vitalii Diravka  wrote:

> On behalf of the Apache Drill community, I am happy to announce the release
> of Apache Drill 1.15.0.
>
> Drill is an Apache open-source SQL query engine for Big Data exploration.
> Drill is designed from the ground up to support high-performance analysis
> on the semi-structured and rapidly evolving data coming from modern Big
> Data applications, while still providing the familiarity and ecosystem of
> ANSI SQL, the industry-standard query language. Drill provides
> plug-and-play integration with existing Apache Hive and Apache HBase
> deployments.
>
> For information about Apache Drill, and to get involved, visit the project
> website [1].
>
> With over 200 commits this release of Drill provides the following new
> features and improvements [2]:
> - SQLLine upgrade to 1.6 (DRILL-3853)
> - Index support (DRILL-6381)
> - Ability to create custom ACLs to secure znodes (DRILL-5671)
> - INFORMATION_SCHEMA FILES table (DRILL-6680)
> - System functions table (DRILL-3988)
>
> For the full list please see release notes [3].
>
> The binary and source artifacts are available here [4].
>
> Thanks to everyone in the community who contributed to this release!
> Congratulations on another Drill release!
> Have a Happy New Year!!!
>
> 1. https://drill.apache.org/
> 2. https://drill.apache.org/blog/2018/12/31/drill-1.15-released/
> 3. https://drill.apache.org/docs/apache-drill-1-15-0-release-notes/
> 4. https://drill.apache.org/download/
>
> Kind regards
> Vitalii
>


Re: [ANNOUNCE] New Committer: Salim Achouche

2018-12-18 Thread Abhishek Girish
Congratulations, Salim!

On Mon, Dec 17, 2018 at 2:40 AM Arina Ielchiieva  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Salim
> Achouche to become a committer, and we are pleased to announce that he has
> accepted.
>
> Salim Achouche [1] started contributing to the Drill project in 2017. He
> has made many improvements for the parquet reader, including performance
> for flat data types, columnar parquet batch sizing functionality, fixed
> various bugs and memory leaks. He also optimized implicit columns handling
> with scanner and improved sql pattern contains performance.
>
> Welcome Salim, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: [ANNOUNCE] New Committer: Karthikeyan Manivannan

2018-12-07 Thread Abhishek Girish
Congratulations Karthik!

On Fri, Dec 7, 2018 at 11:11 AM Arina Ielchiieva  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited
> Karthikeyan
> Manivannan to become a committer, and we are pleased to announce that he
> has accepted.
>
> Karthik started contributing to the Drill project in 2016. He has
> implemented changes in various Drill areas, including batch sizing,
> security, code-gen, C++ part. One of his latest improvements is  ACL
> support for Drill ZK nodes.
>
> Welcome Karthik, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Abhishek Girish
Congratulations, Hanu!

On Thu, Nov 1, 2018 at 10:56 AM Khurram Faraaz  wrote:

> Congratulations Hanu!
>
> On Thu, Nov 1, 2018 at 10:14 AM Gautam Parai  wrote:
>
> > Congratulations Hanumath! Well deserved :)
> >
> > Gautam
> >
> > On Thu, Nov 1, 2018 at 9:44 AM AnilKumar B 
> wrote:
> >
> > > Congratulations Hanumath.
> > >
> > > Thanks & Regards,
> > > B Anil Kumar.
> > >
> > >
> > > On Thu, Nov 1, 2018 at 9:39 AM Vitalii Diravka 
> > wrote:
> > >
> > > > Congratulations!
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 5:43 PM salim achouche 
> > > > wrote:
> > > >
> > > > > Congrats Hanu!
> > > > >
> > > > > On Thu, Nov 1, 2018 at 6:05 AM Arina Ielchiieva 
> > > > wrote:
> > > > >
> > > > > > The Project Management Committee (PMC) for Apache Drill has
> invited
> > > > > > Hanumath
> > > > > > Rao Maduri to become a committer, and we are pleased to announce
> > that
> > > > he
> > > > > > has accepted.
> > > > > >
> > > > > > Hanumath became a contributor in 2017, making changes mostly in
> the
> > > > Drill
> > > > > > planning side, including lateral / unnest support. He is also one
> > of
> > > > the
> > > > > > contributors of index based planning and execution support.
> > > > > >
> > > > > > Welcome Hanumath, and thank you for your contributions!
> > > > > >
> > > > > > - Arina
> > > > > > (on behalf of Drill PMC)
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Salim
> > > > >
> > > >
> > >
> >
>


Re: unable to connect to mongodb

2018-10-10 Thread Abhishek Girish
Hey Bhavik,

The error indicates an authentication failure - can you double-check your
login info? Also can you try specifying the IP address of the Mongo
instance instead of localhost - in case you have multiple Drillbits? I
haven't used Drill with Mongo - so hopefully someone who has can chime in
with their suggestions.
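
For example, a plugin config along these lines (the IP address and password
below are only placeholders for your values) removes any ambiguity about what
"localhost" resolves to on a given Drillbit:

{
  "type": "mongo",
  "connection": "mongodb://drillbit:<password>@<mongo-host-ip>:27017/",
  "enabled": true
}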

Regards,
Abhishek

On Wed, Sep 26, 2018 at 10:33 AM Bhavik Shah  wrote:

> Hello
> When I try to connect to MongoDB database I get the following error:
>
>
> o.a.d.e.s.m.s.MongoSchemaFactory - Failure while loading databases in
> Mongo. Timed out after 3 ms while waiting for a server that matches
> ReadPreferenceServerSelector{readPreference=primary}. Client view of
> cluster state is {type=UNKNOWN, servers=[{address=localhost:27017,
> type=UNKNOWN, state=CONNECTING,
> exception={com.mongodb.MongoSecurityException: Exception authenticating
> MongoCredential{mechanism=null, userName='drillbit', source='admin',
> password=, mechanismProperties={}}}, caused by
> {com.mongodb.MongoCommandException: Command failed with error 18:
> 'Authentication failed.' on server localhost:27017. The full response is {
> "ok" : 0.0, "errmsg" : "Authentication failed.", "code" : 18, "codeName" :
> "AuthenticationFailed" }}}]
>
>
> Following is my mongo connector configuration:
>
>
> {
>   "type": "mongo",
>   "connection": "mongodb://drillbit:txyzwswn60@localhost:27017/",
>   "enabled": true
> }
>


Re: KVGEN - cann't select the drill on hbase with json value

2018-10-10 Thread Abhishek Girish
Hey liuwenkai,

Can you please share a sample row from the table? Also the output of a Drill
query with (1) select field, and (2) select convert_from(field)?
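
Something along these lines, adapted from your query, would help narrow down
where the conversion breaks:

SELECT t.f4.province_freq FROM hbase.united_user_profile_test1 t LIMIT 1;
SELECT CONVERT_FROM(t.f4.province_freq, 'UTF8') FROM hbase.united_user_profile_test1 t LIMIT 1;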

On Fri, Sep 28, 2018 at 10:15 AM kvnew <272301...@qq.com> wrote:

> Hi,
>
>
> About Drill on HBase: when I query the result of KVGEN, it can't match
> the value stored in HBase.
>
>
> select KVGEN(CONVERT_FROM(united_user_profile_test1.f4.province_freq,
> 'UTF8')) from hbase.united_user_profile_test1 limit 10
>
>
> It throws this error:
> "
>
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> DrillRuntimeException: kvgen function only supports Simple maps as input
> Fragment 0:0 [Error Id: 91b98616-de1c-49bf-bbbc-3d39e8b57a5a on
> tencent-recom-hdp04:31010]"
>
>
>
>
> Hbase:
>  00026b888c2eef20bad9de column=f4:province_freq, timestamp=1537861268399,
> value={"\xE5\xB9
>  \xBF\xE4\xB8\x9C":1}
>
>
>
>
>
> Thanks,
> liuwenkai
>
>
> --


Re: Kafka Plugin in Drill 1.14

2018-10-10 Thread Abhishek Girish
Hey Divya,

Can you please share if Arina's suggestion worked? It will be helpful for
others who encounter a similar problem.

On Mon, Oct 8, 2018 at 12:37 AM Arina Yelchiyeva 
wrote:

> Did you run older Drill versions on Windows before? Bootstrap plugins are
> loaded when there are no stored plugin configs in your environment.
> Check if you have a drill/sys.storage_plugins folder on your system.
> The location depends on your settings; for Windows it could be in a tmp folder on
> one of your disks. Stop Drill, delete the drill folder with its contents and
> start Drill.
>
> Kind regards,
> Arina
>
> > On Oct 8, 2018, at 10:09 AM, Divya Gehlot 
> wrote:
> >
> > I was under that assumption too, but when I installed Drill 1.14 in embedded
> > mode on Windows I don't see the Kafka plugin.
> > You can view the screenshot here  
> > I also have Drill 1.10 in a cluster, which doesn't show the Kafka
> > plugin either.
> > Am I missing something? Isn't the Kafka plugin supposed to be one of the
> > default plugins that shows up after Drill installation?
> >
> > Thanks,
> > Divya
> >
> >
> >
> > On Thu, 4 Oct 2018 at 15:07, Khurram Faraaz  wrote:
> >
> >> Hi,
> >>
> >> You can find details here -
> >> https://drill.apache.org/docs/kafka-storage-plugin/
> >>
> >> When you install Drill, a preconfigured Kafka storage plugin is
> available
> >> on the Storage page in the Drill Web Console. Once you enable and
> configure
> >> the storage plugin, you can query Kafka from Drill.
> >>
> >> Thanks,
> >> Khurram
> >>
> >> On Wed, Oct 3, 2018 at 7:25 PM Divya Gehlot 
> >> wrote:
> >>
> >>> Hi,
> >>> I installed Drill in embedded mode and I don't see Kafka plugin under
> >>> Storage plugin in Web UI  . Do I need to create or its available by
> >>> default?
> >>>
> >>> Thanks,
> >>> Divya
> >>>
> >>
>
>


Re: Syntax Issue Between Drill Explorer and Spotfire

2018-10-10 Thread Abhishek Girish
Hey Kevin,

It looks like the storageplugin.workspace.table_path references are specified with
single quotes (' ') instead of back-ticks (` `). Is that an email formatting
issue, or the actual syntax of your query? If the latter, can you try
correcting it? Also a note in general: you do not need those back-ticks
unless the name contains a reserved word or a period (.).
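
In other words, the reference should look something like this (back-ticks, if
you quote the identifiers at all, rather than single quotes):

SELECT * FROM dfs.tmp.`my_sql_pel_query` LIMIT 100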

Regards,
Abhishek

On Mon, Oct 8, 2018 at 10:41 AM Kevin Porsolt 
wrote:

> Hello, I am looking to configure Apache Drill as an application between
> our data sources and Spotfire as a way to preserve robust files, while also
> limiting the stress put on our downstream BI applications. I got though 90%
> of the configuration setup, but identified a syntax issue between the view
> created in Drill Explorer and the view read in from ODBC and attempted to
> be loaded into Spotfire. I tried copying and pasting the created view
> directly from Drill Explorer into the Spotfire data connection setup, so
> the view and the table are correctly connected, I just keep on getting a
> "table does not exist" error due to the syntax issue. I added some
> additional details below. We are currently using Hadoop and Spark in our
> technical architecture, so being able to fix this issue and implement
> Apache Drill would provide a lot of opportunity. Thank you
>
> Apache Drill:
> SELECT * FROM 'dfs'.'tmp'.'my_sql_pel_query' LIMIT 100
>
> Spotfire:
> SELECT
> 'dfs.tmp'.'my_sql_pel_query'.*
> FROM
> 'dfs.tmp'.'my_sql_pel_query'
>
> **notice the missing ' ' in 'dfs.tmp'
> This is the field that it seems to be reading from the ODBC
>
> Spotfire.Dxp56f0-4d14   EXIT  SQLGetData  with return code
> 0 (SQL_SUCCESS)
>
> HSTMT   0x1BAD4260
>
> UWORD2
>
> SWORD   -8
> 
>
> PTR 0x22B20210 [
> 14] "dfs.tmp"
>
> SQLLEN  4094
>
> SQLLEN *0x2AB5DC90 (14)
>
>
>
>
>
>
>
>
> Kevin D Porsolt
> Technical Product Manager
> t: 646-839-9385 | m: 732-213-9621
>
> CROSSIX | Data Driven. Data Proven. | crossix.com
>
>
>
> ---
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> please telephone or email the sender and delete this message and any
> attachment from your system. If you are not the intended recipient you must
> not copy this message or attachment or disclose the contents to any other
> person.
>
> For further information about Crossix please see our website at
> http://www.crossix.com.
>


Re: Failed to fetch parquet metadata after 15000ms

2018-10-10 Thread Abhishek Girish
Hey Karthik,

This is a known bug and there are a few JIRAs tracking it; one of those is
DRILL-5788. It's likely caused by a hard-coded default for the timeout which
is sometimes not sufficient. Can you please update the JIRA with your
findings, which can help resolve the issue?

Regards,
Abhishek

On Mon, Oct 8, 2018 at 10:40 AM karthik.R  wrote:

> Hi,
> I am frequently getting below exception when running a query in drill 1.14.
> Could you please help what option to set to increase this timeout?
>
> Waited for 15000ms , but tasks for 'Fetch parquet metadata' are not
> complete. Total runnable size 4 , parallelism 4
>
>
> My parquet files present in s3 location with 5 parquet file partitions.
>
> Please help
>


Re: Failed to create schema tree when running Drill View Query

2018-10-10 Thread Abhishek Girish
Hey Divya,

That's usually seen when the underlying FS cannot recognize the session
user when creating the schema tree. For example, if impersonation is enabled
and no user is passed, Drill tries to use an "anonymous" user. I've seen,
in the case of MapR-FS, that it cannot use such a user to proceed with
accessing the filesystem.

Can you check that (1) a username is passed when launching your client such as
Sqlline, and (2) the user (with the same UID) exists on all nodes of the cluster?
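
For (1), a minimal sketch of what I mean (the ZooKeeper host and credentials
below are placeholders for your values):

sqlline -u "jdbc:drill:zk=<zk-host>:2181" -n <username> -p <password>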


On Wed, Oct 10, 2018 at 8:07 PM Divya Gehlot 
wrote:

> Hi ,
> At times I get below error when running the View query which fetches the
> whole data set
>
> org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR:
> Failed to create schema tree.
>
>
> The Strange behaviour is if I wait for sometime for say 2-3 minutes and run
> the query again it works perfectly .
>
>
> Appreciate  if anybody can help me to get me the root cause for it !
>
>
> Thanks,
>
> Divya
>


Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-28 Thread Abhishek Girish
Congrats Chunhui!
On Fri, Sep 28, 2018 at 7:39 AM Vova Vysotskyi  wrote:

> Congratulations! Well deserved!
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Fri, Sep 28, 2018 at 12:17 PM Arina Ielchiieva 
> wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited
> Chunhui
> > Shi to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Chunhui Shi has become a contributor since 2016, making changes in
> various
> > Drill areas. He has shown profound knowledge in Drill planning side
> during
> > his work to support lateral join. He is also one of the contributors of
> the
> > upcoming feature to support index based planning and execution.
> >
> > Welcome Chunhui, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>


Re: Apache Drill meetup session ideas

2018-09-23 Thread Abhishek Girish
Hey Divya,

I'm curious to know where the meetup session is and any related details you
could share on the agenda, target audience and more. We are planning to have
one too, so it would be good to hear more about the one you are helping to
organize.

Regards,
Abhishek

On Sun, Sep 23, 2018 at 7:53 PM Divya Gehlot 
wrote:

> Hi ,
> I will be delivering an Apache Drill meetup session including a hands-on
> workshop.
> I would like to get ideas and advice from the community members, based on
> their enterprise experience.
> For example, which core components of Drill should I highlight?
> The strong points of Drill which I can think of:
> 1. Schema-less.
> 2. Doesn't require data modelling; if a new column gets added to the data
> sets, Drill picks it up automatically.
> 3. Complex JSON parsing
> 4. Drill has an embedded installation mode (works on Windows machines too) which
> helps users work on their desktop with small data sets
> 5. Multiple file format support
>
> If anybody has a good workshop / GitHub project which highlights the core strong
> areas of Drill, please share!
> The community's help in attracting and growing the number of Apache Drill
> users is much appreciated!
>
> Thanks,
> Divya
>


Re: [ANNOUNCE] New Committer: Weijie Tong

2018-08-31 Thread Abhishek Girish
Congrats and thanks, Weijie!
On Fri, Aug 31, 2018 at 8:51 AM Arina Ielchiieva  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Weijie
> Tong to become a committer, and we are pleased to announce that he has
> accepted.
>
> Weijie Tong has become a very active contributor to Drill in recent months.
> He contributed the Join predicate push down feature which will be available
> in Apache Drill 1.15. The feature is non trivial and has covered changes
> to all aspects of Drill: RPC layer, Planning, and Execution.
>
> Welcome Weijie, and thank you for your contributions!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: distributed drill on local file system

2018-08-16 Thread Abhishek Girish
I'd also like to add that with the DFS storage plugin configured with local
file system, one will have to make sure all nodes with Drillbits have the
same files (under the same directory structure). If a given query on a
dataset spawns a distributed plan (multiple fragments), then it may fail if
one of the fragments on a remote Drillbit cannot find the file being
referenced by the foreman. Also, this is not something that's been well
tried out and documented, so there may be some surprises.
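
For reference, a minimal sketch of such a plugin config (the workspace path is
just an example) - the important part is the file:/// connection, with an
identical copy of the data laid out under that path on every Drillbit node:

{
  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "data": {
      "location": "/data",
      "writable": false,
      "defaultInputFormat": null
    }
  }
}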

-Abhishek

On Thu, Aug 16, 2018 at 9:44 AM Vitalii Diravka 
wrote:

> Hi Mehran,
>
> This is a question for user mailing list.
>
> Looks like there are no issues with it, you can run Drill in distributed
> mode on Windows, Linux or MacOS based machines.
> It is necessary to specify *zk.connect* for the Zookeeper hostname and port number
> in *drill-override.conf* file and to run *>bin/drillbit.sh start *[1].
> But a Hadoop cluster is recommended for this purpose [2], therefore not
> sure which issues can arise with this system.
>
> [1]
>
> https://drill.apache.org/docs/starting-drill-in-distributed-mode/#drillbit.sh-command-syntax
> [2] https://drill.apache.org/docs/distributed-mode-prerequisites/
>
> Kind regards
> Vitalii
>
>
> On Thu, Aug 16, 2018 at 7:11 PM Mehran Dashti [ BR - PD ] <
> m_das...@behinrahkar.com> wrote:
>
> > Hi,
> >
> > I wanted to know if it is possible or possible by minimal effort to have
> > distributed drills that work on local file system of their own?
> >
> > We  do not want to have HDFS as file system?
> >
> >
> >
> > Thank you in advance.
> >
> >
> >
> >
> >
> > *Best Regards,*
> >
> >
> >
> > *  [image: LOGO1]*
> >
> > *Mehran Dashti*
> >
> > *Product Leader*
> >
> > *09125902452*
> >
> >
> >
>


Re: Requesting ETA on drill/apache-drill-centos:1.14.0 docker image

2018-08-13 Thread Abhishek Girish
Hello Vedant,

Thanks for trying out Drill on Docker! For the official Docker image for
Apache Drill 1.14.0, please try out instructions from [1] (note that it
refers to a different image: drill/apache-drill:1.14.0). However, it only
supports embedded mode as of this release.

Since you were using the drill/apache-drill-centos image, I'd like to point
out that this refers to work in an independent GitHub repo [2] - a variant
of the Dockerfile made it into the Apache Drill project. If you'd like to
continue to use that until we incorporate all features into the official
image, please do so - but let me know directly when you face issues (as
it's outside of the Drill project). I've updated the drill/apache-drill-centos
image with the latest tag: 1.14.0 (supports both embedded and distributed
modes).
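
So for the Kubernetes deployment you have today, pulling the refreshed tag
should be all that's needed:

docker pull drill/apache-drill-centos:1.14.0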

Regards,
Abhishek

[1] http://drill.apache.org/docs/running-drill-on-docker/
[2] https://github.com/Agirish/drill-containers

On Mon, Aug 13, 2018 at 7:40 AM Vedant Naik  wrote:

> Hi,
>
> We are using Apache Drill (love it so far!!)
> We use the drill/apache-drill-centos docker image to deploy on kubernetes.
>
> We tried out the latest drill version 1.14.0 locally, and are keen to try
> it out on the k8s cluster. However, the
> drill/apache-drill-centos:1.14.0-SNAPSHOT image ends in "Crash Loop
> Back-off" when deployed.
>
> Curious to know the ETA on the 1.14.0 stable version of the image. Would be
> happy to be help in any way if we can. Please let us know.
>
> Thank you,
> Vedant Naik
> --
> *Kind Regards,*
> *Vedant Naik.*
>


Re: Drill Configuration Requirements To Query Data in Tera Bytes

2018-07-30 Thread Abhishek Girish
Hey Tilak,

We don't have any official sizing guidelines for planning a Drill
cluster. A lot of it depends on the type of queries being executed (simple
look-ups vs complex joins), data format (columnar data such as Parquet
shows best performance), and system load (running a single query on nodes
dedicated for Drill).

It also depends on the type of machines you have - for example with beefy
nodes with lots of RAM and CPU, you'll need fewer number of nodes running
Drill.

I would recommend getting started with a 4-10 node cluster with a good
amount of memory you can spare. And based on the results, try to figure out
your own sizing guideline (either to add more nodes or increase memory
[1]).

If you share more details, it could be possible to suggest more.

[1] http://drill.apache.org/docs/configuring-drill-memory/
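
For reference, the settings described in [1] end up in drill-env.sh on each
node; the numbers below are purely illustrative, not a recommendation:

export DRILL_HEAP="8G"
export DRILL_MAX_DIRECT_MEMORY="48G"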


On Mon, Jul 30, 2018 at 1:57 AM Surneni Tilak 
wrote:

> Hi Team,
>
> May I know the ideal configuration requirements to query data of size 10
> TB with a query time under 5 minutes. Please advise me on the number
> of Drillbits that I have to use and the RAM (direct memory & heap memory)
> that each Drillbit should have to complete the queries within the
> desired time.
>
> Best regards,
> _
> Tilak
>
>
>


Re: Apache Drill on Kubernetes

2018-07-26 Thread Abhishek Girish
@Arjun, sure. I've not faced the need for any changes for drillbit
discovery. Internally at MapR, we are using Helm charts for deployment -
for both on-prem and on GKE. For on-prem, we use Weave Net for networking
(plus a few other configurations shared on the K8S docs). We've also made
sure to automate ZK discovery - which helps avoid manual config. The
GitHub repo I shared only has some basic support for Apache Drill on K8S
(it should still work out of the box - with limited manual config). I'll
work on updating the repo soon.

And yes, as we make progress with the official support into Drill, your
help will be appreciated. Will be tracked by DRILL-6598
<https://issues.apache.org/jira/browse/DRILL-6598>

On Thu, Jul 26, 2018 at 6:26 PM Arjun Rao  wrote:

> Thanks for the responses. These are all very promising.
>
> @Saurabh - We are heavy users of Kubernetes and are considering adding
> Apache Drill to our ecosystem.
> @Abhishek - We actually got it to run in our cluster but had to modify the
> drillbit discovery code a little bit on a local fork of ours to enable a
> multinode drill installation. We have our own docker container/helm charts
> for this multinode deployment. Before we went further down that road, we
> wanted to see what the community was doing. I will take a look at the repo
> you linked and let you know if I have any questions. If there is anything I
> can contribute to help with the K8s work, please let me know what I can do.
>
> Best,
> Arjun
>
> On Thu, Jul 26, 2018 at 1:51 PM John Omernik  wrote:
>
> > Dril, in it's current form, has had all the features required to be run
> on
> > cluster managers with multi-tenacy.  There are many caveats but mostly
> > realted to how things are configured, not limitations of Drill itself.
> The
> > work being done for Drill in containers and Drill on K8s not only
> provides
> > a roadmap of how to do it, but reviewing the yaml and docker files is a
> > great way to see how devs are approaching a proper multitenant capable
> > Drill cluster. This is exciting work!
> >
> > On Thu, Jul 26, 2018, 11:40 AM Abhishek Girish 
> wrote:
> >
> > > Hey everyone,
> > >
> > > Like John and Saurabh mentioned, yes this is possible. We've been using
> > > Drill on Kubernetes for a while now. I have a draft of my work
> > > (Dockerfiles + YAML definitions) available in [1]. Drill should come up
> > > successfully in distributed mode (multiple Drillbits) under K8S. Please
> > > give it a try when you can and reach out to me if you need any
> > > clarifications.
> > >
> > > For 1.14.0, we added official support for Docker. For the next release,
> > i'm
> > > working on incorporating the K8S support into the Apache Drill repo.
> > >
> > > [1] https://github.com/Agirish/drill-containers
> > >
> > > On Thu, Jul 26, 2018 at 8:59 AM Saurabh Mahapatra <
> > > saurabhmahapatr...@gmail.com> wrote:
> > >
> > > > Hey Arjun,
> > > >
> > > > Is the need for kubernetes a top down requirement in your
> architecture?
> > > >
> > > > John is right when it comes to running Drill inside a container. But
> > > there
> > > > was some talk of addressing the other problem which is whether K8 can
> > be
> > > a
> > > > resource manager for multiple Drill clusters...an alternative to
> > > splitting
> > > > up interactive and batch loads. I know this has been done within a
> YARN
> > > > setting.
> > > >
> > > > Paul and Abhishek are more knowledgeable about this. Thoughts?
> > > >
> > > > Maybe a topic for this year’s conference?
> > > >
> > > > Thanks,
> > > > Saurabh
> > > >
> > > > Sent from my iPhone
> > > >
> > > >
> > > >
> > > > > On Jul 26, 2018, at 8:48 AM, John Omernik 
> wrote:
> > > > >
> > > > > Absolutely it can! I don't have a docker file for it handy but I
> know
> > > it
> > > > > works well!
> > > > >
> > > > >> On Thu, Jul 26, 2018, 10:45 AM Arjun Rao 
> > > > wrote:
> > > > >>
> > > > >> Hi,
> > > > >> I am new to this forum and am excited to use Drill. This might
> have
> > > been
> > > > >> discussed in the past but I wanted to know if Apache Drill can be
> > run
> > > on
> > > > >> Kubernetes and if not, if it's on the roadmap for Drill?
> > > > >>
> > > > >> Appreciate the help!
> > > > >>
> > > > >> Best,
> > > > >> Arjun
> > > > >>
> > > >
> > >
> >
>


Re: Best Practice to check Drillbit status(Cluster mode)

2018-07-16 Thread Abhishek Girish
I think logs may be the only way to figure it out at present. You
could keep a watch on your logs to be informed of such events. For
notifications, I would say file an enhancement JIRA - if it gathers enough
attention, perhaps someone will volunteer to work on it or comment on it.
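
A rough sketch of the kind of watch I mean (the log location and the patterns
to match will vary with your setup and with what you consider noteworthy):

tail -F $DRILL_HOME/log/drillbit.log | grep -iE "error|shutdown"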

On Mon, Jul 16, 2018 at 2:08 AM Divya Gehlot 
wrote:

> Hi ,
> Thanks Abhishek !
> I would like to have a notification of that orphan drillbit process when it
> gets disconnected from the other running drillbits for some reason - definitely
> not because of an unclean shutdown, as those drillbits have been running for
> months.
> I know I can check the logs and kill the orphaned process, which is what I did in my
> case, but I would like to have a notification for a down drillbit.
>
>
> Thanks,
> Divya
>
> On Fri, 13 Jul 2018 at 04:15, Abhishek Girish  wrote:
>
> > Hey Divya,
> >
> > It would depend on the situation, afaik. The sys.drillbits table
> contains a
> > list of all running drillbits. If one of the Drillbits has issues and
> > cannot stay connected to the cluster, I would assume it would be
> > unregistered and may not show up in the output of sys.drillbits. If it's
> an
> > intermittent issue and the Drillbit process maintains its heartbeat
> > connection, it may show up in the output.
> >
> > If you take a look at the logs, you might be able to figure out what is
> > causing the issue. There may be orphan Drillbit processes which may have
> > been
> > left behind due to a previous unclean shutdown. Can you clean up all
> > Drillbit processes (using 'ps -ef | grep -i drillbit' and then a kill -9)
> > on nodes where you suspect issues and restart Drillbits?
> >
> > -Abhishek
> >
> > On Tue, Jul 10, 2018 at 7:16 PM Divya Gehlot 
> > wrote:
> >
> > > Hi ,
> > > select * from sys.drillbits;
> > > What does above query shows if drillbits process hangs ?
> > >
> > >
> > > Thanks
> > >
> > > On Tue, 10 Jul 2018 at 15:36, Khurram Faraaz  wrote:
> > >
> > > > You can run the below query, and look for the *state *column in the
> > > result
> > > > of the query. Online drillbits will be marked as ONLINE.
> > > >
> > > > select * from sys.drillbits;
> > > >
> > > > - Khurram
> > > >
> > > > On Tue, Jul 10, 2018 at 12:24 AM, Divya Gehlot <
> > divya.htco...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > I would like to know the best practice to check the Drillbits
> status
> > in
> > > > > cluster mode.
> > > > > I have encountered the scenario when check Drillbits process
> running
> > > fine
> > > > > and When check in Drll WebUI , some of the Drillbits are down.
> > > > > When do RCA(root cause analysis) , got to know due to some reason
> > > > drillbits
> > > > > process hanged .
> > > > > For now the alert system which I have implemented now is checking
> the
> > > > >
> > > > >
> > > > > > drill/bin/drillbit.sh status
> > > > >
> > > > >
> > > > > Is there any other best way to catch the hung Drillbit process?
> > > > > Appreciate the advise from Drill community users.
> > > > >
> > > > > Thanks,
> > > > > Divya
> > > > >
> > > >
> > >
> >
>


Re: help drill down in production

2018-07-12 Thread Abhishek Girish
Hey Jose,

Can you share more details on your setup? For Drill usage in production,
stand-alone / distributed modes are recommended. Embedded mode is only a
good way to get started with Drill: a Drillbit is started when you launch
Drill in embedded mode and stops when you exit Sqlline.
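
A rough outline of the distributed setup (the ZooKeeper host below is a
placeholder; drill-override.conf on each node must point at your ZooKeeper
quorum):

# on every node that should run a Drillbit
$DRILL_HOME/bin/drillbit.sh start

# then connect from a client
$DRILL_HOME/bin/sqlline -u "jdbc:drill:zk=<zk-host>:2181"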

-Abhishek

On Thu, Jul 12, 2018 at 12:20 PM jose luis 
wrote:

> I installed drill in production, but when I connect via ssh and I raise
> the bin / drill-embedded process and when I exit ssh, the drill process
> shuts down, how do I solve this? TxU
>
> mail: jo-c-l...@universitarios.com
>
> Obtener Outlook para Android
>
>


Re: CTAS AccessControlException

2018-07-02 Thread Abhishek Girish
Hey Divya,

Here is one way to check if all nodes have the same UID/GID:

clush -a 'cat /etc/passwd | grep -i user1'
Node1: user1:x:5000:5000::/home/user1:/bin/bash
Node2: user1:x:5000:5000::/home/user1:/bin/bash
Node3: user1:x:6000:6000::/home/user1:/bin/bash


You can update the UID and GID using usermod and groupmod commands. Make
sure to restart your DFS and Drill services after that

For example, on Node3,

usermod -u 5000 user1
groupmod -g 5000 user1



Regards,
Abhishek

On Mon, Jul 2, 2018 at 2:58 AM Divya Gehlot  wrote:

> Hi Abhishek,
> Thanks for the prompt response !
> Yes I have Big Data Cluster and Apache Drill is part of it and security is
> plain authentication and connected through AD .
> And Recently I have added 3 more nodes to the cluster.
> How do I ensure that all the nodes have same UID + GID , which you
> mentioned in the email?
>
> Thanks,
> Divya
>
>
>
> On Mon, 2 Jul 2018 at 11:37, Abhishek Girish  wrote:
>
> > Hey Divya,
> >
> > I have a suspicion: there is a chance you have a distributed Drill
> > environment and not all of the nodes have the same user (with same UID +
> > GID). And your dataset isn't large like you mentioned, so not all
> Drillbits
> > are always involved in the query execution. So you might intermittently
> see
> > such failures if one of the Drillbits working on this query doesn't have
> the
> > right user and hence the required access to the path on DFS. Can you
> please
> > check and let us know?
> >
> > Regards,
> > Abhishek
> >
> > On Sun, Jul 1, 2018 at 7:16 PM Divya Gehlot 
> > wrote:
> >
> > > Hi,
> > > When I checked the error in Profile section of the ran query :
> > >
> > > Apache Drill
> > >
> > > "error": "SYSTEM ERROR: Drill Remote Exception\n\n",
> > > "verboseError": "SYSTEM ERROR: Drill Remote Exception\n\n\n\n",
> > >
> > > When I turned on Verbose true I could see the below error when I run
> the
> > > query :
> > > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > > AccessControlException: User (user id 829131620) does not have
> > > access to /path/to/directory/peoplecount/2018_06_29/17/0_0_0.parquet
> > > Fragment 0:0 [Error Id: 148a32c7-3af4-4929-982f-3c06ef505eed on
> > > :31010] (org.apache.hadoop.security.AccessControlException)
> > > User (user id 829131620) does not have access to
> > > /path/to/directory/peoplecount/2018_06_29/17/0_0_0.parquet
> > > com.mapr.fs.MapRClientImpl.create():233
> > > com.mapr.fs.MapRFileSystem.create():806
> > > com.mapr.fs.MapRFileSystem.create():899
> > > org.apache.hadoop.fs.FileSystem.createNewFile():1192
> > > org.apache.drill.exec.store.StorageStrategy.createFileAndApply():122
> > > org.apache.drill.exec.store.parquet.ParquetRecordWriter.endRecord():374
> > > org.apache.drill.exec.store.EventBasedRecordWriter.write():68
> > > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
> > > org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > > org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > >
> > >
> >
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
> > > org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> > >
> >
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> > > org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> > > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> > > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> > > java.security.AccessController.doPrivileged():-2
> > > javax.security.auth.Subject.doAs():422
> > > org.apache.hadoop.security.UserGroupInformation.doAs():1633
> > > org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> > > org.apache.drill.common.SelfCleaningRunnable.run():38
> > > java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> > > java.lang.Thread.run():748
> > >
> > > Thanks,
> > > Divya
> > >
> > > On Fri, 29 Jun 2018 at 18:41, Divya Gehlot 
> > > wrote:
> > >
> > > > Hi,
> > > > At times I am getting error whlile CTAS and it doesn't happen all the
> > > time
> > > > like next run for 18 hours it will create the table .
> > > > Here are the details :
> > > > ls -ltr /path/to/directory/parquetfiles/2018_06_29
> > > > total 9
> > > > drwxrwxr-x 2   1 Jun 28 12:05 00
> > > > drwxrwxr-x 2   1 Jun 28 13:05 01
> > > >
> > > > Error Logs :
> > > > SYSTEM ERROR: AccessControlException: User (user id
> 829131620)
> > > > does not have access to
> > > > /path/to/directory/parquetfiles/2018_06_29/17/0_0_0.parquet
> > > >
> > > > Appreciate the help !
> > > >
> > > > Thanks,
> > > > Divya
> > > >
> > >
> >
>


Re: CTAS AccessControlException

2018-07-01 Thread Abhishek Girish
Hey Divya,

I have a suspicion: there is a chance you have a distributed Drill
environment and not all of the nodes have the same user (with same UID +
GID). And your dataset isn't large like you mentioned, so not all Drillbits
are always involved in the query execution. So you might intermittently see
such failures if one of the Drillbits working on this query doesn't have the
right user and hence the required access to the path on DFS. Can you please
check and let us know?

Regards,
Abhishek

On Sun, Jul 1, 2018 at 7:16 PM Divya Gehlot  wrote:

> Hi,
> When I checked the error in Profile section of the ran query :
>
> Apache Drill
>
> "error": "SYSTEM ERROR: Drill Remote Exception\n\n",
> "verboseError": "SYSTEM ERROR: Drill Remote Exception\n\n\n\n",
>
> When I turned on Verbose true I could see the below error when I run the
> query :
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> AccessControlException: User (user id 829131620) does not have
> access to /path/to/directory/peoplecount/2018_06_29/17/0_0_0.parquet
> Fragment 0:0 [Error Id: 148a32c7-3af4-4929-982f-3c06ef505eed on
> :31010] (org.apache.hadoop.security.AccessControlException)
> User (user id 829131620) does not have access to
> /path/to/directory/peoplecount/2018_06_29/17/0_0_0.parquet
> com.mapr.fs.MapRClientImpl.create():233
> com.mapr.fs.MapRFileSystem.create():806
> com.mapr.fs.MapRFileSystem.create():899
> org.apache.hadoop.fs.FileSystem.createNewFile():1192
> org.apache.drill.exec.store.StorageStrategy.createFileAndApply():122
> org.apache.drill.exec.store.parquet.ParquetRecordWriter.endRecord():374
> org.apache.drill.exec.store.EventBasedRecordWriter.write():68
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():134
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1633
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>
> Thanks,
> Divya
>
> On Fri, 29 Jun 2018 at 18:41, Divya Gehlot 
> wrote:
>
> > Hi,
> > At times I am getting error whlile CTAS and it doesn't happen all the
> time
> > like next run for 18 hours it will create the table .
> > Here are the details :
> > ls -ltr /path/to/directory/parquetfiles/2018_06_29
> > total 9
> > drwxrwxr-x 2   1 Jun 28 12:05 00
> > drwxrwxr-x 2   1 Jun 28 13:05 01
> >
> > Error Logs :
> > SYSTEM ERROR: AccessControlException: User (user id 829131620)
> > does not have access to
> > /path/to/directory/parquetfiles/2018_06_29/17/0_0_0.parquet
> >
> > Appreciate the help !
> >
> > Thanks,
> > Divya
> >
>


Re: Drill with Docker?

2018-06-24 Thread Abhishek Girish
Hey Paul,

I have Docker images for Drill published here:
https://hub.docker.com/u/drill/. Instructions here:
https://issues.apache.org/jira/browse/DRILL-6346?focusedCommentId=16448703=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16448703

Building with a CentOS base was the most straightforward. Ubuntu works well too.
I've been trying to get it working under Alpine OS, but there have been
some issues which I haven't resolved yet.

The relevant Docker Files and YAML definitions are available here:
https://github.com/Agirish/drill-containers. Drill should come up
successfully, either in Embedded or Distributed modes - under a vanilla
Docker env or with K8S. I have some pending enhancements yet to be
checked-in. Let me know if you have any questions.

-Abhishek



On Sun, Jun 24, 2018 at 3:55 PM Paul Rogers 
wrote:

> Hi All,
>
> Has anyone published a Dockerfile to show how to run Drill under Docker?
> I believe that the Drill QA group at MapR has run Drill this way. Are those
> Dockerfiles published anywhere?
>
> Any advice for which OS base image to use? Any tricks of the trade?
>
> As a bonus, has anyone then used these images (or modified versions) under
> Kubernetes?
>
> Thanks,
> - Paul
>
>


Re: [DISCUSS] case insensitive storage plugin and workspaces names

2018-06-13 Thread Abhishek Girish
The issue is that for those customers who do have such storage plugin
names, it's too late to rename after an offline upgrade - as there is no
easy way to access the storage plugin configurations if Drillbits are down
(due to Drillbit start-up failing). It might be okay if admins perform a
rolling upgrade (newer Drillbits would fail, but older Drillbits can be
used to update storage plugin config), but that's not fully supported.
Ideally, we'll need to find a way to not fail startup, instead disable the
plugins which have issues, but if that's a complex and separate task, for
now we should perhaps clearly document that this would be a breaking change
after upgrade, so users should fix the plugins before they proceed.

On Wed, Jun 13, 2018 at 3:42 AM Arina Yelchiyeva 
wrote:

> From the Drill code, workspaces are already case insensitive (though the
> documentation states the opposite). Since there have been no complaints from the
> users so far, I believe there are not many (if any) who use the same names
> in different cases.
> Regarding those users that already have duplicate storage plugin names,
> after the change Drill start-up will fail with an appropriate error message
> and they will have to rename those storage plugins.
>
> Kind regards,
> Arina
>
>
> On Tue, Jun 12, 2018 at 8:45 PM Abhishek Girish 
> wrote:
>
> > Paul, I think this proposal was specific to storage plugin and workspace
> > *names*. And not for the whole of Drill.
> >
> > I agree it makes sense to have these names case insensitive, to improve
> > user experience. The only impact to current users I can think of is if
> > someone created two storage plugins dfs and DFS. Or configured workspaces
> > tmp and TMP. In this case, they'd need to rename those. One thing I'm not
> > clear on is how we'll handle upgrades in these cases.
> >
> > On Tue, Jun 12, 2018 at 10:31 AM Paul Rogers 
> > wrote:
> >
> > > Hi All,
> > >
> > > As it turns out, this topic has been discussed, in depth, previously.
> > > Can't recall if it was on this list, or in a JIRA.
> > >
> > > We face a number of constraints:
> > >
> > > * As was noted, for some data sources, the data source itself has case
> > > insensitive names. (Windows file systems, RDBMSs, etc.)
> > > * In other cases, the data source itself has case sensitive names.
> (HDFS
> > > file system, Linux file systems, JSON, etc.)
> > > * SQL is defined to be case insensitive.
> > > * We now have several years of user queries, in production, based on
> the
> > > current semantics.
> > >
> > > Given all this, it is very likely that simply shifting to
> case-sensitive
> > > will break existing applications.
> > >
> > > Perhaps a more subtle solution is to make the case-sensitivity a
> property
> > > of the symbol that is carried through the query pipeline as another
> piece
> > > of metadata.
> > >
> > > Thus, a workspace that corresponds to a DB schema would be labeled as
> > case
> > > insensitive. A workspace that corresponds to an HDFS directory would be
> > > case sensitive. Names defined within Drill (as part of an AS clause),
> > would
> > > follow SQL rules and be case insensitive.
> > >
> > > I believe that, if we sit down and work out exactly what users would
> > > expect, and what is required to handle both case sensitive and case
> > > insensitive names, we'll end up with a solution not far from the above
> --
> > > out of simple necessity.
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Tuesday, June 12, 2018, 8:36:01 AM PDT, Arina Yelchiyeva <
> > > arina.yelchiy...@gmail.com> wrote:
> > >
> > >  To make it clear we have three notions here: storage plugin name,
> > > workspace
> > > (schema) and table name (dfs.root.`/tmp/t`).
> > > My suggestion is the following:
> > > Storage plugin names to be case insensitive (DFS vs dfs,
> > INFORMATION_SCHEMA
> > > vs information_schema).
> > > Workspace  (schemas) names to be case insensitive (ROOT vs root, TMP vs
> > > tmp). Even if user has two directories /TMP and /tmp, he can create two
> > > workspaces but not both with tmp name. For example, tmp vs tmp_u.
> > > Table names case sensitivity are treated per plugin. For example,
> system
> > > plugins (information_schema, sys) table names (views, tables) should be
> > > case insensitive. Actually, currently for sys plugin table names are
> case
> > > insensit

Re: [DISCUSS] case insensitive storage plugin and workspaces names

2018-06-12 Thread Abhishek Girish
Paul, I think this proposal was specific to storage plugin and workspace
*names*. And not for the whole of Drill.

I agree it makes sense to have these names case insensitive, to improve
user experience. The only impact to current users I can think of is if
someone created two storage plugins dfs and DFS. Or configured workspaces
tmp and TMP. In this case, they'd need to rename those. One thing I'm not
clear on is how we'll handle upgrades in these cases.

On Tue, Jun 12, 2018 at 10:31 AM Paul Rogers 
wrote:

> Hi All,
>
> As it turns out, this topic has been discussed, in depth, previously.
> Can't recall if it was on this list, or in a JIRA.
>
> We face a number of constraints:
>
> * As was noted, for some data sources, the data source itself has case
> insensitive names. (Windows file systems, RDBMSs, etc.)
> * In other cases, the data source itself has case sensitive names. (HDFS
> file system, Linux file systems, JSON, etc.)
> * SQL is defined to be case insensitive.
> * We now have several years of user queries, in production, based on the
> current semantics.
>
> Given all this, it is very likely that simply shifting to case-sensitive
> will break existing applications.
>
> Perhaps a more subtle solution is to make the case-sensitivity a property
> of the symbol that is carried through the query pipeline as another piece
> of metadata.
>
> Thus, a workspace that corresponds to a DB schema would be labeled as case
> insensitive. A workspace that corresponds to an HDFS directory would be
> case sensitive. Names defined within Drill (as part of an AS clause), would
> follow SQL rules and be case insensitive.
>
> I believe that, if we sit down and work out exactly what users would
> expect, and what is required to handle both case sensitive and case
> insensitive names, we'll end up with a solution not far from the above --
> out of simple necessity.
>
> Thanks,
> - Paul
>
>
>
> On Tuesday, June 12, 2018, 8:36:01 AM PDT, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
>  To make it clear we have three notions here: storage plugin name,
> workspace
> (schema) and table name (dfs.root.`/tmp/t`).
> My suggestion is the following:
> Storage plugin names to be case insensitive (DFS vs dfs, INFORMATION_SCHEMA
> vs information_schema).
> Workspace  (schemas) names to be case insensitive (ROOT vs root, TMP vs
> tmp). Even if user has two directories /TMP and /tmp, he can create two
> workspaces but not both with tmp name. For example, tmp vs tmp_u.
> Table names case sensitivity are treated per plugin. For example, system
> plugins (information_schema, sys) table names (views, tables) should be
> case insensitive. Actually, currently for sys plugin table names are case
> insensitive, information_schema table names are case sensitive. That needs
> to be synchronized. For file system plugins table names must be case
> sensitive, since under table name we imply directory / file name and their
> case sensitivity depends on file system.
>
> Kind regards,
> Arina
>
> On Tue, Jun 12, 2018 at 6:13 PM Aman Sinha  wrote:
>
> > Drill is dependent on the underlying file system's case sensitivity.  On
> > HDFS one can create  'hadoop fs -mkdir /tmp/TPCH'  and /tmp/tpch which
> are
> > separate directories.
> > These could be set as workspace in Drill's storage plugin configuration
> and
> > we would want the ability to query both.  If we change the current
> > behavior, we would want
> > some way, either using back-quotes `  or other way to support that.
> >
> > RDBMSs seem to have vendor-specific behavior...
> > In MySQL [1] the database name and schema name are case-sensitive on
> Linux
> > and case-insensitive on Windows.  Whereas in Postgres it converts the
> > database name and schema name to lower-case by default but one can put
> > double-quotes to make it case-sensitive [2].
> >
> > [1]
> > https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html
> > [2]
> >
> http://www.postgresqlforbeginners.com/2010/11/gotcha-case-sensitivity.html
> >
> >
> >
> > On Tue, Jun 12, 2018 at 5:01 AM, Arina Yelchiyeva <
> > arina.yelchiy...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Currently Drill we treat storage plugin names and workspaces as
> > > case-sensitive [1].
> > > Names for storage plugins and workspaces are defined by the user. So we
> > > allow to create plugin -> DFS and dfs, workspace -> tmp and TMP.
> > > I have a suggestion to move to case insensitive approach and won't
> allow
> > > creating two plugins / workspaces with the same name in different case
> at
> > > least for the following reasons:
> > > 1. usually rdbms schema and table names are case insensitive and many
> > users
> > > are used to this approach;
> > > 2. in Drill we have INFORMATION_SCHEMA schema which is in upper case,
> sys
> > > in lower case.
> > > personally I find it's extremely inconvenient.
> > >
> > > Also we should consider making table names case insensitive for system
> > > schemas (info, sys).
> > >
> 

Re: Which perform better JSON or convert JSON to parquet format ?

2018-06-10 Thread Abhishek Girish
I would suggest converting the JSON files to parquet for better
performance. JSON supports a more free-form data model, so that's a
trade-off you need to consider, in my opinion.
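
For reference, a minimal sketch of such a conversion using CTAS (the
workspace and path below are made-up examples):

-- parquet is also the default output format for CTAS
ALTER SESSION SET `store.format` = 'parquet';

-- materialize the JSON files as a parquet-backed table
CREATE TABLE dfs.tmp.`events_parquet` AS
SELECT * FROM dfs.`/path/to/json_files`;
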
On Sun, Jun 10, 2018 at 8:08 PM Divya Gehlot 
wrote:

> Hi,
> I am looking for the advise regarding the performance for below :
> 1. keep the JSON as is
> 2. Convert the JSON file to parquet files
>
> My JSON files data is not in fixed format and  file size varies from 10 KB
> to 1 MB.
>
> Appreciate the community users advise on above !
>
>
> Thanks,
> Divya
>


Re: Drill and orc file support

2018-03-20 Thread Abhishek Girish
Drill can read ORC format files via the Hive plugin. If you have a Hive
table with underlying data stored as ORC, try and configure the Hive
storage plugin in Drill [1]. And then you can attempt to query the table in
Hive from Drill [2].
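
As a rough sketch, a Hive storage plugin configuration typically looks like
the following (the metastore URI and fs.default.name values below are
placeholders for your environment):

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://<metastore-host>:9083",
    "fs.default.name": "hdfs://<namenode-host>:8020"
  }
}

Once the plugin is enabled, an ORC-backed Hive table can be queried like any
other, e.g. SELECT * FROM hive.`orc_table` LIMIT 10 (the table name here is
just an example).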


[1] http://drill.apache.org/docs/hive-storage-plugin/
[2] http://drill.apache.org/docs/querying-hive/

On Tue, Mar 20, 2018 at 3:07 PM, Андрей Смирнов 
wrote:

> Hello!
>
> Please, help me
>
> Does drill can read from orc format files ( orc.apache.org ) ?
>


Re: [Drill 1.13.0] : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'

2018-03-20 Thread Abhishek Girish
Okay, that confirms that the Hive storage plugin is not configured
correctly - you are unable to access any Hive table. What's your Hive
server version?

On Tue, Mar 20, 2018 at 3:39 PM, Anup Tiwari <anup.tiw...@games24x7.com>
wrote:

> Hi,
> Please find my reply :-
> Can you do a 'use hive;` followed by 'show tables;' and see if table
> 'cad' is listed? : Did and got empty set(No rows selected).
>
> If you try via hive shell, do you see it? : Yes
>
> can you check if this is impacting accessing all hive tables (may be
> create a new one and try) or if this is specific to a certain table /
> database in Hive? : Tried 2 tables but getting same error. I have not tried
> creating anew one, will try that and let you know.
>
>
>
>
> On Tue, Mar 20, 2018 3:19 PM, Abhishek Girish agir...@apache.org  wrote:
> Down in the stack trace it's complaining that the table name 'cad' was not
>
> found; Can you do a 'use hive;` followed by 'show tables;' and see if table
>
> 'cad' is listed?
>
>
>
>
> If you try via hive shell, do you see it?
>
>
>
>
> Also, can you check if this is impacting accessing all hive tables (may be
>
> create a new one and try) or if this is specific to a certain table /
>
> database in Hive?
>
>
>
>
> -Abhishek
>
>


Re: How to get data from mongo database into saiku using apache drill

2018-03-20 Thread Abhishek Girish
I see that you have previously asked this question and folks have responded
- but you haven't responded back. I'm not sure if you are getting emails
sent to the list. If you are and missed those, perhaps check Spam? Also,
I'd request you to not start duplicate threads for the same issue.

On Tue, Mar 20, 2018 at 3:27 PM, Abhishek Girish <agir...@apache.org> wrote:

> Is the issue specific to Mongo datasource (can you access regular files
> through the tool)? Do you see any errors in the drillbit.log when you
> attempt to access the mongo table?
>
> On Mon, Mar 19, 2018 at 5:27 PM, Sonu Kumawat <sonu.kuma...@infozech.com>
> wrote:
>
>> Hi,
>>
>>I am trying to get data from mongo database using apache drill in
>> Saiku tool. Right now I am able to get tables from mongo using apache
>> drill
>> but columns (fields)  are not coming in tables ( empty tables are coming
>> ).
>> Please help me out of this problem ASAP.
>>
>>
>>
>>
>>
>> Thank you
>>
>
>


Re: How to get data from mongo database into saiku using apache drill

2018-03-20 Thread Abhishek Girish
Is the issue specific to Mongo datasource (can you access regular files
through the tool)? Do you see any errors in the drillbit.log when you
attempt to access the mongo table?

On Mon, Mar 19, 2018 at 5:27 PM, Sonu Kumawat 
wrote:

> Hi,
>
>I am trying to get data from mongo database using apache drill in
> Saiku tool. Right now I am able to get tables from mongo using apache drill
> but columns (fields)  are not coming in tables ( empty tables are coming ).
> Please help me out of this problem ASAP.
>
>
>
>
>
> Thank you
>


Re: [Drill 1.13.0] : org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'

2018-03-20 Thread Abhishek Girish
Down in the stack trace it's complaining that the table name 'cad' was not
found; Can you do a 'use hive;` followed by 'show tables;' and see if table
'cad' is listed?
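
That is, from sqlline (or the Web UI query page):

USE hive;
-- 'cad' should appear here if the plugin can reach the metastore
SHOW TABLES;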

If you try via hive shell, do you see it?

Also, can you check if this is impacting accessing all hive tables (may be
create a new one and try) or if this is specific to a certain table /
database in Hive?

-Abhishek

On Tue, Mar 20, 2018 at 2:37 PM, Anup Tiwari 
wrote:

> Note :  Using Show databases, i can see hive schemas.
>
>
>
>
>
> On Tue, Mar 20, 2018 2:36 PM, Anup Tiwari anup.tiw...@games24x7.com
> wrote:
> Hi,
> I am not able to read my hive tables in drill 1.13.0 and with same plugin
> conf
> it was working in Drill 1.12.0 and 1.10.0. Please look into it asap and
> let me
> know if i have missed anything.
> Hive Plugin :-
> {  "type": "hive",  "enabled": true,  "configProps":
> {"hive.metastore.uris":
> "thrift://prod-hadoop-1xx.com:9083","hive.metastore.sasl.enabled":
> "false",
> "fs.default.name": "hdfs://prod-hadoop-1xx.com:9000"  }}
> Query :-
> select id from hive.cad where log_date = '2018-03-18' limit 3
> Error :-
> 2018-03-20 14:25:27,351 [254f337f-9ac3-b66f-ed17-1de459da3283:foreman]
> INFO
> o.a.drill.exec.work.foreman.Foreman - Query text for query id
> 254f337f-9ac3-b66f-ed17-1de459da3283: select id from hive.cad where
> log_date =
> '2018-03-18' limit 32018-03-20 14:25:27,354
> [254f337f-9ac3-b66f-ed17-1de459da3283:foreman] WARN
> o.a.d.e.s.h.DrillHiveMetaStoreClient - Failure while attempting to get
> hive
> table. Retries once.org.apache.thrift.TApplicationException: Invalid
> method
> name: 'get_table_req' at
> org.apache.thrift.TApplicationException.read(TApplicationExc
> eption.java:111)
> ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0] at
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
> ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0] at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$
> Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
> ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0] at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$
> Client.get_table_req(ThriftHiveMetastore.java:1550)
> ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0] at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTabl
> e(HiveMetaStoreClient.java:1344)
> ~[drill-hive-exec-shaded-1.13.0.jar:1.13.0] at
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient.ge
> tHiveReadEntryHelper(DrillHiveMetaStoreClient.java:285)
> ~[drill-storage-hive-core-1.13.0.jar:1.13.0] at
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$Ta
> bleLoader.load(DrillHiveMetaStoreClient.java:535)
> [drill-storage-hive-core-1.13.0.jar:1.13.0] at
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$Ta
> bleLoader.load(DrillHiveMetaStoreClient.java:531)
> [drill-storage-hive-core-1.13.0.jar:1.13.0] at
> com.google.common.cache.LocalCache$LoadingValueReference.loa
> dFuture(LocalCache.java:3527)
> [guava-18.0.jar:na] at
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
> [guava-18.0.jar:na] at
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(
> LocalCache.java:2282)
> [guava-18.0.jar:na] at
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
> [guava-18.0.jar:na] at
> com.google.common.cache.LocalCache.get(LocalCache.java:3937)
> [guava-18.0.jar:na]
>  at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
> [guava-18.0.jar:na] at
> com.google.common.cache.LocalCache$LocalLoadingCache.get(
> LocalCache.java:4824)
> [guava-18.0.jar:na] at
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$Hi
> veClientWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:495)
> [drill-storage-hive-core-1.13.0.jar:1.13.0] at
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$Hi
> veSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
> [drill-storage-hive-core-1.13.0.jar:1.13.0] at
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$Hi
> veSchema.getDrillTable(HiveSchemaFactory.java:213)
> [drill-storage-hive-core-1.13.0.jar:1.13.0] at
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.
> getTable(HiveDatabaseSchema.java:62)
> [drill-storage-hive-core-1.13.0.jar:1.13.0] at
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$Hi
> veSchema.getTable(HiveSchemaFactory.java:201)
> [drill-storage-hive-core-1.13.0.jar:1.13.0] at
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable
> (SimpleCalciteSchema.java:82)
> [calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at
> org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:257)
> [calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at
> org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEnt
> ryFrom(SqlValidatorUtil.java:1003)
> [calcite-core-1.15.0-drill-r0.jar:1.15.0-drill-r0] at
> org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEnt
> ry(SqlValidatorUtil.java:960)
> 

Re: [ANNOUNCE] Apache Drill release 1.13.0

2018-03-18 Thread Abhishek Girish
Congratulations everyone, on yet another great release of Apache Drill!
On Mon, Mar 19, 2018 at 6:57 AM Parth Chandra  wrote:

> On behalf of the Apache Drill community, I am happy to announce the
> release of
> Apache Drill 1.13.0.
>
> For information about Apache Drill, and to get involved, visit the
> project website
> [1].
>
> This release of Drill provides the following new features and improvements:
>
> - YARN support for Drill [DRILL-1170
> ]
>
> - Support HTTP Kerberos auth using SPNEGO [DRILL-5425
> ]
>
> - Support SQL syntax highlighting of queries [DRILL-5868
> ]
>
> - Drill should support user/distribution specific configuration checks
> during startup [DRILL-6068
> ]
>
> - Upgrade DRILL to Calcite 1.15.0 [DRILL-5966
> ]
>
> - Batch Sizing improvements to reduce memory footprint of operators
>
> - [DRILL-6071 <
> https://issues.apache.org/jira/browse/DRILL-6071>]
> - Limit batch size for flatten operator
>
> - [DRILL-6126 <
> https://issues.apache.org/jira/browse/DRILL-6126>]
> - Allocate memory for value vectors upfront in flatten operator
>
> - [DRILL-6123 <
> https://issues.apache.org/jira/browse/DRILL-6123>]
> - Limit batch size for Merge Join based on memory.
>
> - [DRILL-6177 <
> https://issues.apache.org/jira/browse/DRILL-6177>]
> - Merge Join - Allocate memory for outgoing value vectors based on sizes of
> incoming batches.
>
>
> For the full list please see release notes [2].
>
> The binary and source artifacts are available here [3].
>
> Thanks to everyone in the community who contributed to this release!
>
> 1. https://drill.apache.org/
> 2. https://drill.apache.org/docs/apache-drill-1-13-0-release-notes/
> 3. https://drill.apache.org/download/
>


Re: MapR Drill 1.12 Mismatch between Native and Library Versions

2018-02-09 Thread Abhishek Girish
e.drill.exec.server.Drillbit
> -
> > Construction completed (3461 ms).
> >
> >
> > On Fri, Feb 9, 2018 at 8:10 AM, John Omernik <j...@omernik.com> wrote:
> >
> > > So already, you have given me some things to work with. Knowing that
> > there
> > > may links to jars/3rdparty was very helpful.  In opening that folder on
> > > drill-1.12.0 I found there were no links/files related to mapr.  In my
> > > drill-1.10.0 version (both of them from MapR) there were three files,
> > > maprfs, maprdb, and mapr-hbase.
> > >
> > > So I added only those three files from /opt/mapr/lib to
> $DRILL_CLASSPATH
> > > in my drill_env.sh.   This allowed drill to start without that same
> > error.
> > >
> > > Now, I am being a little different. Instead of "installing" drill via
> > > RPMs, I download the RPMs (and I did this for both 1.10 and 1.12 from
> > MapR)
> > > The difference I think is in 1.10 there was a "drill" package and now
> in
> > > 1.12, there is a drill-internal package.  Perhaps the drill in 1.10
> moved
> > > some things around better. For what ever reason that changed (I think
> it
> > > changed in 1.11, but I was having cluster issues during that).
> > >
> > > That said, I have a drill bit running, but it's not starting it's web
> > > service. I am going to go reverse engineer what the RPM actually does.
> In
> > > my case, for 1.10 and every version previous, I just took the mapr RPM,
> > > unpacked it, and grabbed the drill directory and it worked great. This
> is
> > > obviously no longer the case, and I will have to dig deeper.
> > >
> > >
> > >
> > > On Thu, Feb 8, 2018 at 4:01 PM, Abhishek Girish <agir...@apache.org>
> > > wrote:
> > >
> > >> Can you also share the contents of (1) MapR build version on the
> cluster
> > >> nodes (cat /opt/mapr/MapRBuildVersion) (2) Drill RPM version installed
> > >> (rpm
> > >> -qa |grep -i mapr-drill)
> > >>
> > >> And also verify if the maprfs and maprdb jars inside
> > >> $DRILL_HOME/jars/3rdparty are links to the corresponding jars in
> > >> /opt/mapr/lib?
> > >>
> > >> On Thu, Feb 8, 2018 at 1:50 PM, Kunal Khatua <kkha...@mapr.com>
> wrote:
> > >>
> > >> > It might be to do with the way you've installed Drill.
> > >> >
> > >> > If you've built and deployed Drill, odds are that the client will be
> > >> > different. With the RPM installation, however, the installer has
> > >> symlinks
> > >> > to make the mapr-client libraries required by Drill be pointing to
> the
> > >> > libraries available in /opt/mapr/lib/
> > >> >
> > >> > I don't know the exact details of what all gets symlinked, but this
> > step
> > >> > should have ensured that you don't see mismatch between the
> versions.
> > >> >
> > >> > That said... Support would be better equipped to help you with this.
> > >> >
> > >> > -Original Message-
> > >> > From: John Omernik [mailto:j...@omernik.com]
> > >> > Sent: Thursday, February 08, 2018 1:38 PM
> > >> > To: user <user@drill.apache.org>
> > >> > Subject: MapR Drill 1.12 Mismatch between Native and Library
> Versions
> > >> >
> > >> > I am running MapR's 1.12 drill on a node that only has posix client
> > >> > installed (and thus has a MapR client from that).
> > >> >
> > >> > I've recently had to work with MapR Support to get a fix to posix,
> and
> > >> > that fixed one issue, but now when I try to start a drill bit, I get
> > >> this
> > >> > error.
> > >> > The fix was a small patch that only updated the posix library... I
> > >> guess I
> > >> > am confused why I am seeing one "non-patched" version (in the java
> > >> library)
> > >> > and one "patched" version in the native library and why I can't
> start
> > >> > drill. I was debating where to post this, MapR community or here,
> but
> > >> it's
> > >> > only Drill I having an issue... any thoughts?
> > >> >
> > >> > John
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > 2018-02-08 15:20:54,3305 ERROR JniCommon
> > >> > fs/client/fileclient/cc/jni_MapRClient.cc:687 Thread: 71 Mismatch
> > found
> > >> > for java and native libraries java build version
> > >> 6.0.0.20171109191718.GA,
> > >> > native build version 6.0.0.20171229015939.GA java patch vserion
> $Id:
> > >> > mapr-version: 6.0.0.20171109191718.GA e892229b271c98c75ccb, native
> > >> patch
> > >> > version $Id: mapr-version:
> > >> > 6.0.0.20171229015939.GA bd8dae73f45572194c89
> > >> > 2018-02-08 15:20:54,3305 ERROR JniCommon
> > >> > fs/client/fileclient/cc/jni_MapRClient.cc:704 Thread: 71 Client
> > >> > initialization failed.
> > >> > Exception in thread "main"
> > >> > org.apache.drill.exec.exception.DrillbitStartupException: Failure
> > while
> > >> > initializing values in Drillbit.
> > >> >
> > >>
> > >
> > >
> >
>


Re: MapR Drill 1.12 Mismatch between Native and Library Versions

2018-02-08 Thread Abhishek Girish
Can you also share the contents of (1) MapR build version on the cluster
nodes (cat /opt/mapr/MapRBuildVersion) (2) Drill RPM version installed (rpm
-qa |grep -i mapr-drill)

And also verify if the maprfs and maprdb jars inside
$DRILL_HOME/jars/3rdparty are links to the corresponding jars in
/opt/mapr/lib?

On Thu, Feb 8, 2018 at 1:50 PM, Kunal Khatua  wrote:

> It might be to do with the way you've installed Drill.
>
> If you've built and deployed Drill, odds are that the client will be
> different. With the RPM installation, however, the installer has symlinks
> to make the mapr-client libraries required by Drill be pointing to the
> libraries available in /opt/mapr/lib/
>
> I don't know the exact details of what all gets symlinked, but this step
> should have ensured that you don't see mismatch between the versions.
>
> That said... Support would be better equipped to help you with this.
>
> -Original Message-
> From: John Omernik [mailto:j...@omernik.com]
> Sent: Thursday, February 08, 2018 1:38 PM
> To: user 
> Subject: MapR Drill 1.12 Mismatch between Native and Library Versions
>
> I am running MapR's 1.12 drill on a node that only has posix client
> installed (and thus has a MapR client from that).
>
> I've recently had to work with MapR Support to get a fix to posix, and
> that fixed one issue, but now when I try to start a drill bit, I get this
> error.
> The fix was a small patch that only updated the posix library... I guess I
> am confused why I am seeing one "non-patched" version (in the java library)
> and one "patched" version in the native library and why I can't start
> drill. I was debating where to post this, MapR community or here, but it's
> only Drill I having an issue... any thoughts?
>
> John
>
>
>
>
>
>
>
>
>
>
>
>
> 2018-02-08 15:20:54,3305 ERROR JniCommon
> fs/client/fileclient/cc/jni_MapRClient.cc:687 Thread: 71 Mismatch found
> for java and native libraries java build version 6.0.0.20171109191718.GA,
> native build version 6.0.0.20171229015939.GA java patch vserion $Id:
> mapr-version: 6.0.0.20171109191718.GA e892229b271c98c75ccb, native patch
> version $Id: mapr-version:
> 6.0.0.20171229015939.GA bd8dae73f45572194c89
> 2018-02-08 15:20:54,3305 ERROR JniCommon
> fs/client/fileclient/cc/jni_MapRClient.cc:704 Thread: 71 Client
> initialization failed.
> Exception in thread "main"
> org.apache.drill.exec.exception.DrillbitStartupException: Failure while
> initializing values in Drillbit.
>


Re: No FileSystem for scheme: maprfs

2018-02-01 Thread Abhishek Girish
Hey,

Images haven't come through (usually attachments aren't supported in
mailing lists). Can you please find another way of sharing them?

Also, can you share how you deployed Drill - did you build from source or
download packages from MapR? The message usually means your DFS storage
plugin is unable to talk to MapR-FS. Please share the DFS storage plugin
contents and the drill-env.sh [and distrib-env.sh] files, along with a
verbose version of the error message you observe.
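
For reference, a dfs storage plugin that talks to MapR-FS usually has its
connection pointed at maprfs:/// - a minimal sketch (the workspace entry
below is illustrative):

{
  "type": "file",
  "enabled": true,
  "connection": "maprfs:///",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  }
}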

Regards,
Abhishek

On Thu, Feb 1, 2018 at 10:57 AM, Willian Ribeiro <
willian.ribe...@blueshift.com.br> wrote:

> Good morning,
>
> I've tried a lot of suggestions for this error, but i couldn't find any
> solution for my problem.
>
>  - I  have a mapr cluster with hive tables in it.
>
>  - I need to use the Drill to query those tables outside the cluster.
>
>  - I have a CentOS VM configured with distributed Drill but as a single
> node (just for now).
>
>  - I can see the tables and query them when they are empty like this one:
>
> [image: Inline image 1]
>
>
> but when i try to query a table that isn't empty, i get this error:
>
> [image: Inline image 4]
>
>
> My question is: I can query hive tables (located in a mapr cluster) from
> Drill that is installed in a VM (CentOS) outside this cluster?
>
>
> This is how i setup Drill:
>
> STORAGE PLUGIN:
>
> [image: Inline image 5]
>
> drill-override.conf (Configured as a cluster single node)
>
> [image: Inline image 7]
>
> DRILL JARS:
>
> [image: Inline image 8]
>
>
> 3rparty JARS:
>
> 3rdparty/antlr-2.7.7.jar
>
> 3rdparty/antlr-runtime-3.4.jar
>
> 3rdparty/apacheds-i18n-2.0.0-M15.jar
>
> 3rdparty/apacheds-kerberos-codec-2.0.0-M15.jar
>
> 3rdparty/api-asn1-api-1.0.0-M20.jar
>
> 3rdparty/api-util-1.0.0-M20.jar
>
> 3rdparty/asm-debug-all-5.0.3.jar
>
> 3rdparty/async-1.4.1.jar
>
> 3rdparty/avro-1.7.7.jar
>
> 3rdparty/avro-ipc-1.7.7.jar
>
> 3rdparty/avro-ipc-1.7.7-tests.jar
>
> 3rdparty/avro-mapred-1.7.7.jar
>
> 3rdparty/aws-java-sdk-1.7.4.jar
>
> 3rdparty/bcpkix-jdk15on-1.52.jar
>
> 3rdparty/bcprov-jdk15on-1.52.jar
>
> 3rdparty/bonecp-0.8.0.RELEASE.jar
>
> 3rdparty/calcite-avatica-1.4.0-drill-r23.jar
>
> 3rdparty/calcite-core-1.4.0-drill-r23.jar
>
> 3rdparty/calcite-linq4j-1.4.0-drill-r23.jar
>
> 3rdparty/commons-beanutils-1.8.3.jar
>
> 3rdparty/commons-beanutils-core-1.8.0.jar
>
> 3rdparty/commons-cli-1.2.jar
>
> 3rdparty/commons-codec-1.10.jar
>
> 3rdparty/commons-collections-3.2.1.jar
>
> 3rdparty/commons-compiler-2.7.6.jar
>
> 3rdparty/commons-compress-1.4.1.jar
>
> 3rdparty/commons-configuration-1.6.jar
>
> 3rdparty/commons-dbcp-1.4.jar
>
> 3rdparty/commons-digester-1.8.1.jar
>
> 3rdparty/commons-httpclient-3.1.jar
>
> 3rdparty/commons-io-2.4.jar
>
> 3rdparty/commons-lang-2.6.jar
>
> 3rdparty/commons-lang3-3.1.jar
>
> 3rdparty/commons-math-2.2.jar
>
> 3rdparty/commons-math3-3.1.1.jar
>
> 3rdparty/commons-net-3.6.jar
>
> 3rdparty/commons-pool-1.5.4.jar
>
> 3rdparty/commons-pool2-2.1.jar
>
> 3rdparty/commons-validator-1.4.1.jar
>
> 3rdparty/config-1.0.0.jar
>
> 3rdparty/converter-jackson-2.1.0.jar
>
> 3rdparty/curator-client-2.7.1.jar
>
> 3rdparty/curator-framework-2.7.1.jar
>
> 3rdparty/curator-recipes-2.7.1.jar
>
> 3rdparty/curator-x-discovery-2.7.1.jar
>
> 3rdparty/datanucleus-api-jdo-3.2.6.jar
>
> 3rdparty/datanucleus-core-3.2.10.jar
>
> 3rdparty/datanucleus-rdbms-3.2.9.jar
>
> 3rdparty/de.huxhorn.lilith.data.converter-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.data.eventsource-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.data.logging-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.data.logging.protobuf-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.logback.appender.multiplex-classic-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.logback.appender.multiplex-core-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.logback.classic-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.logback.converter-classic-0.9.44.jar
>
> 3rdparty/de.huxhorn.lilith.sender-0.9.44.jar
>
> 3rdparty/de.huxhorn.sulky.codec-0.9.17.jar
>
> 3rdparty/de.huxhorn.sulky.formatting-0.9.17.jar
>
> 3rdparty/de.huxhorn.sulky.io-0.9.17.jar
>
> 3rdparty/derby-driver.jar
>
> 3rdparty/disruptor-3.3.0.jar
>
> 3rdparty/dom4j-1.6.1.jar
>
> 3rdparty/eigenbase-properties-1.1.5.jar
>
> 3rdparty/esri-geometry-api-2.0.0.jar
>
> 3rdparty/findbugs-annotations-1.3.9-1.jar
>
> 3rdparty/foodmart-data-json-0.4.jar
>
> 3rdparty/freemarker-2.3.26-incubating.jar
>
> 3rdparty/gson-2.2.4.jar
>
> 3rdparty/guava-18.0.jar
>
> 3rdparty/hadoop-annotations-2.7.1.jar
>
> 3rdparty/hadoop-auth-2.7.1.jar
>
> 3rdparty/hadoop-aws-2.7.1.jar
>
> 3rdparty/hadoop-client-2.7.1.jar
>
> 3rdparty/hadoop-common-2.7.1.jar
>
> 3rdparty/hadoop-hdfs-2.7.1.jar
>
> 3rdparty/hadoop-mapreduce-client-app-2.7.1.jar
>
> 3rdparty/hadoop-mapreduce-client-common-2.7.1.jar
>
> 3rdparty/hadoop-mapreduce-client-core-2.7.1.jar
>
> 3rdparty/hadoop-mapreduce-client-jobclient-2.7.1.jar
>
> 3rdparty/hadoop-mapreduce-client-shuffle-2.7.1.jar
>
> 3rdparty/hadoop-yarn-client-2.7.1.jar
>
> 3rdparty/hadoop-yarn-common-2.7.1.jar
>
> 

Re: [ANNOUNCE] Apache Drill 1.11.0 Released

2017-07-31 Thread Abhishek Girish
Congratulations everyone!

On Mon, Jul 31, 2017 at 5:16 AM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> On behalf of the Apache Drill community, I am happy to announce the release
> of Apache Drill 1.11.0.
>
> For information about Apache Drill, and to get involved, visit the project
> website [1].
>
> This release of Drill provides the following new features and improvements
> [2]:
>
> Cryptography-related functions. (DRILL-5634)
> Spill to disk for the hash aggregate operator. (DRILL-5457)
> Format plugin support for PCAP files. (DRILL-5432)
> Ability to change the HDFS block Size for Parquet files. (DRILL-5379)
> Ability to store query profiles in memory. (DRILL-5481)
> Configurable CTAS directory and file permissions option. (DRILL-5391)
> Support for network encryption. (DRILL-4335)
> Relative paths stored in the metadata file. (DRILL-3867)
> Support for ANSI_QUOTES. (DRILL-3510)
>
> The binary and source artifacts are available here [3].
>
> Thanks to everyone in the community who contributed to this release!
>
> 1. https://drill.apache.org/
> 2. https://drill.apache.org/docs/apache-drill-1-11-0-release-notes/
> 3. https://drill.apache.org/download/
>
> Kind regards
> Arina
>


Re: append data to already existing table saved in parquet format

2017-07-25 Thread Abhishek Girish
Drill doesn't have support for an insert into command. You could try using
the CTAS command to write to a specific partition directory, maybe? Also
look at CTAS auto partitioning [1].

[1] https://drill.apache.org/docs/partition-by-clause/
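
A rough sketch of auto partitioning with CTAS (the table, column, and path
names below are made up for illustration):

-- the PARTITION BY column must also appear in the SELECT list,
-- and the output format needs to be parquet
CREATE TABLE dfs.tmp.`events_by_day`
PARTITION BY (`event_date`) AS
SELECT `event_date`, `col1`, `col2`
FROM dfs.`/data/events`;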

On Tue, Jul 25, 2017 at 10:52 PM, Divya Gehlot 
wrote:

> Hi,
> I am naive to Apache drill.
> As I have data coming in every hour , when I searched I couldnt find the
> insert into partition command in Apache drill.
> How can we insert data to particular partition without rewriting the whole
>  data set ?
>
>
> Appreciate the help.
> Thanks,
> Divya
>


Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Filed DRILL-5684 <https://issues.apache.org/jira/browse/DRILL-5684> to
track the doc issue.

On Mon, Jul 24, 2017 at 8:33 AM, Abhishek Girish <agir...@apache.org> wrote:

> Glad to know that it worked!
>
> As you are using Drill on Windows, the new line delimiter in text files
> can be different from that on Linux / Mac. We could see \r\n as the
> lineDelimiter (carriage return & new line) and hence when we set the same
> in the format plugin, the issue gets resolved. This particular one doesn't
> seem to be documented - however ideally you should be able to see this in
> [1], [2].
>
> Coming to your question on date formats, you can refer to [3]. It has
> clear examples on how to convert to a different date format. Do let us know
> if that helped.
>
> [1] https://drill.apache.org/docs/plugin-configuration-
> basics/#list-of-attributes-and-definitions
> [2] https://drill.apache.org/docs/text-files-csv-tsv-psv/#
> configuring-drill-to-read-text-files
> [3] https://drill.apache.org/docs/data-type-conversion/#to_date
>
> On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Thank you so much it worked
>>
>> Can you please provide me the pointer to the documentation where updation
>> for different format type are mentioned .
>>
>> As I am facing another issue with date type as the data which I receive
>> in csv format has the format of 15/1/2016 when I try to cast or convert
>> to_date it throws me error
>>
>>
>>
>>
>> Thanks ,
>> Divya
>>
>> On 24 July 2017 at 14:17, Abhishek Girish <agir...@apache.org> wrote:
>>
>>> Can you update your csv format plugin as shown below and retry your
>>> query?
>>>
>>> "csv": {
>>>   "type": "text",
>>>   "extensions": [
>>> "csv"
>>>   ],
>>>   "lineDelimiter": "\r\n",
>>>   "extractHeader": true,
>>>   "delimiter": ","
>>> }
>>>
>>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <divya.htco...@gmail.com>
>>> wrote:
>>>
>>> > 0: jdbc:drill:zk=local> select * FROM
>>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>>> > PublicHoliday/PublicHoliday.csv`
>>> > limit 10 ;
>>> > +-+
>>> > | columns |
>>> > +-+
>>> > | ["Day","Date","Area\r"] |
>>> > | ["Friday","15/1/2016","Karnataka\r"]|
>>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>>> > | ["Monday","7/3/2016","Karnataka\r"] |
>>> > | ["Friday","25/3/2016","Karnataka\r"]|
>>> > | ["Friday","1/4/2016","Karnataka\r"] |
>>> > | ["Friday","8/4/2016","Karnataka\r"] |
>>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>>> > +-+
>>> > 10 rows selected (0.122 seconds)
>>> > 0: jdbc:drill:zk=local> select * from
>>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>>> > limit 10 ;
>>> > +---++---+
>>> > |Day|Date| Area  |
>>> > +---++---+
>>> > | Friday| 15/1/2016  | null  |
>>> > | Tuesday   | 26/1/2016  | null  |
>>> > | Monday| 7/3/2016   | null  |
>>> > | Friday| 25/3/2016  | null  |
>>> > | Friday| 1/4/2016   | null  |
>>> > | Friday| 8/4/2016   | null  |
>>> > | Thursday  | 14/4/2016  | null  |
>>> > | Tuesday   | 19/4/2016  | null  |
>>> > | Sunday| 1/5/2016   | null  |
>>> > | Monday| 9/5/2016   | null  |
>>> > +---++---+
>>> > 10 rows selected (0.1 seconds)
>>> > 0: jdbc:drill:zk=local>

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Glad to know that it worked!

As you are using Drill on Windows, the newline delimiter in text files can
differ from that on Linux / Mac. We could see \r\n (carriage return & line
feed) being used as the lineDelimiter, and setting the same in the format
plugin resolves the issue. This particular detail doesn't seem to be
documented yet - ideally it would be covered in [1], [2].

Coming to your question on date formats, you can refer to [3]. It has clear
examples on how to convert to a different date format. Do let us know if
that helped.

[1]
https://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions
[2]
https://drill.apache.org/docs/text-files-csv-tsv-psv/#configuring-drill-to-read-text-files
[3] https://drill.apache.org/docs/data-type-conversion/#to_date
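
For the 15/1/2016 style values mentioned above, a minimal sketch using a
day/month/year pattern (the column and table names match the earlier CTAS
example in this thread):

SELECT TO_DATE(`Date`, 'd/M/yyyy') AS holiday_date
FROM dfs.tmp.`publicholiday.parquet`
LIMIT 5;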

On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot <divya.htco...@gmail.com>
wrote:

> Thank you so much it worked
>
> Can you please provide me the pointer to the documentation where updation
> for different format type are mentioned .
>
> As I am facing another issue with date type as the data which I receive in
> csv format has the format of 15/1/2016 when I try to cast or convert
> to_date it throws me error
>
>
>
>
> Thanks ,
> Divya
>
> On 24 July 2017 at 14:17, Abhishek Girish <agir...@apache.org> wrote:
>
>> Can you update your csv format plugin as shown below and retry your query?
>>
>> "csv": {
>>   "type": "text",
>>   "extensions": [
>> "csv"
>>   ],
>>   "lineDelimiter": "\r\n",
>>   "extractHeader": true,
>>   "delimiter": ","
>> }
>>
>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <divya.htco...@gmail.com>
>> wrote:
>>
>> > 0: jdbc:drill:zk=local> select * FROM
>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>> > PublicHoliday/PublicHoliday.csv`
>> > limit 10 ;
>> > +-+
>> > | columns |
>> > +-+
>> > | ["Day","Date","Area\r"] |
>> > | ["Friday","15/1/2016","Karnataka\r"]|
>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>> > | ["Monday","7/3/2016","Karnataka\r"] |
>> > | ["Friday","25/3/2016","Karnataka\r"]|
>> > | ["Friday","1/4/2016","Karnataka\r"] |
>> > | ["Friday","8/4/2016","Karnataka\r"] |
>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>> > +-+
>> > 10 rows selected (0.122 seconds)
>> > 0: jdbc:drill:zk=local> select * from
>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>> > limit 10 ;
>> > +---++---+
>> > |Day|Date| Area  |
>> > +---++---+
>> > | Friday| 15/1/2016  | null  |
>> > | Tuesday   | 26/1/2016  | null  |
>> > | Monday| 7/3/2016   | null  |
>> > | Friday| 25/3/2016  | null  |
>> > | Friday| 1/4/2016   | null  |
>> > | Friday| 8/4/2016   | null  |
>> > | Thursday  | 14/4/2016  | null  |
>> > | Tuesday   | 19/4/2016  | null  |
>> > | Sunday| 1/5/2016   | null  |
>> > | Monday| 9/5/2016   | null  |
>> > +---++---+
>> > 10 rows selected (0.1 seconds)
>> > 0: jdbc:drill:zk=local>
>> >
>> >
>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>> mode
>> > .
>> >
>> > On 24 July 2017 at 13:30, Divya Gehlot <divya.htco...@gmail.com> wrote:
>> >
>> > >
>> > > Pasting the result set in text format
>> > >
>> > > *Reading parquet file format :*
>> > >
>> > >> Day   Date Area
>> > >> Friday 15/1/2016 null
>> > >> Tuesday 26/1/2016 null
>> > &

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Can you update your csv format plugin as shown below and retry your query?

"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "lineDelimiter": "\r\n",
  "extractHeader": true,
  "delimiter": ","
}

On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <divya.htco...@gmail.com>
wrote:

> 0: jdbc:drill:zk=local> select * FROM
>  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
> PublicHoliday/PublicHoliday.csv`
> limit 10 ;
> +-+
> | columns |
> +-+
> | ["Day","Date","Area\r"] |
> | ["Friday","15/1/2016","Karnataka\r"]|
> | ["Tuesday","26/1/2016","Karnataka\r"]   |
> | ["Monday","7/3/2016","Karnataka\r"] |
> | ["Friday","25/3/2016","Karnataka\r"]|
> | ["Friday","1/4/2016","Karnataka\r"] |
> | ["Friday","8/4/2016","Karnataka\r"] |
> | ["Thursday","14/4/2016","Karnataka\r"]  |
> | ["Tuesday","19/4/2016","Karnataka\r"]   |
> | ["Sunday","1/5/2016","Karnataka\r"] |
> +-+
> 10 rows selected (0.122 seconds)
> 0: jdbc:drill:zk=local> select * from
> `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
> limit 10 ;
> +---++---+
> |Day|Date| Area  |
> +---++---+
> | Friday| 15/1/2016  | null  |
> | Tuesday   | 26/1/2016  | null  |
> | Monday| 7/3/2016   | null  |
> | Friday| 25/3/2016  | null  |
> | Friday| 1/4/2016   | null  |
> | Friday| 8/4/2016   | null  |
> | Thursday  | 14/4/2016  | null  |
> | Tuesday   | 19/4/2016  | null  |
> | Sunday| 1/5/2016   | null  |
> | Monday| 9/5/2016   | null  |
> +---++---+
> 10 rows selected (0.1 seconds)
> 0: jdbc:drill:zk=local>
>
>
> *Drill set up* : Aapche drill is set up on Windows machine in embedded mode
> .
>
> On 24 July 2017 at 13:30, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> >
> > Pasting the result set in text format
> >
> > *Reading parquet file format :*
> >
> >> Day   Date Area
> >> Friday 15/1/2016 null
> >> Tuesday 26/1/2016 null
> >> Monday 7/3/2016 null
> >> Friday 25/3/2016 null
> >> Friday 1/4/2016 null
> >> Friday 8/4/2016 null
> >
> >
> >
> > *Reading csv file format *
> >
> >> columns
> >> ["Day","Date","Area\r"]
> >> ["Friday","1/4/2016","Karnataka\r"]
> >> ["Friday","15/1/2016","Karnataka\r"]
> >> ["Friday","25/3/2016","Karnataka\r"]
> >> ["Friday","8/4/2016","Karnataka\r"]
> >> ["Monday","7/3/2016","Karnataka\r"]
> >
> >
> >
> >
> >
> > *CTAS query csv to parquet :*
> >
> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> >> SELECT
> >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> >> VARCHAR(100)) END AS `Day`,
> >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> AS
> >> VARCHAR(100)) END AS `Date`,
> >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> AS
> >> VARCHAR(100)) END AS `Area`
> >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> ',',
> >> extractHeader => true))
> >
> >
> >
> > Thanks,
> > Divya
> >
> > On 24 July 2017 at 13:20, Abhishek Girish <agir...@apache.org> wrote:
> >
> >> Unfortunately, the attachments / pictures haven't come through. Mailing
> >> lists sometimes do not support these. Can you paste as text or share
> links
> >> to it instead?
> >>
> >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <divya.htco...@gmail.com>
> >> wrote:
> >>
> >> > yes it shows the proper values when I query the csv file.
> >> > C

Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Abhishek Girish
Unfortunately, the attachments / pictures haven't come through. Mailing
lists sometimes do not support these. Can you paste as text or share links
to it instead?

On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <divya.htco...@gmail.com>
wrote:

> yes it shows the proper values when I query the csv file.
> CTAS query csv to parquet :
> Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> SELECT
> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> VARCHAR(100)) END AS `Day`,
> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
> VARCHAR(100)) END AS `Date`,
> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
> VARCHAR(100)) END AS `Area`
> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
> extractHeader => true))
>
> CSV File
>
> Parquet File
>
>
>
> Appreciate the help !
>
> Thanks,
> Divya ​
>
> On 24 July 2017 at 11:52, Abhishek Girish <agir...@apache.org> wrote:
>
>> Can you share a sample row from the CSV and the CTAS query? Also test if a
>> select columns[n] query on the CSV file works as expected [1] ?
>>
>> It could be an issue with delimiters.
>>
>> [1]
>> https://drill.apache.org/docs/querying-plain-text-files/#col
>> umns[n]-syntax
>> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot <divya.htco...@gmail.com>
>> wrote:
>>
>> > Hi ,
>> > I am facing as weird issue when I CTAS and save the csv file as parquet
>> it
>> > displays the last column values as null .
>> > This is not the case with one file .
>> > If I take any csv file with even with any data type and do a
>> > select column1,column2,column3 from table.parquet
>> > it shows the column3 values as null.
>> >
>> > Appreciate the help.
>> >
>> > Thanks,
>> > Divya
>> >
>>
>
>


Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Abhishek Girish
Can you share a sample row from the CSV and the CTAS query? Also test if a
select columns[n] query on the CSV file works as expected [1] ?

It could be an issue with delimiters.

[1]
https://drill.apache.org/docs/querying-plain-text-files/#columns[n]-syntax
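
For instance, a quick check along these lines (the path is a placeholder;
the columns[n] array form applies when extractHeader is not enabled):

SELECT columns[0], columns[1], columns[2]
FROM dfs.`/path/to/PublicHoliday.csv`
LIMIT 5;
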
On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
wrote:

> Hi ,
> I am facing as weird issue when I CTAS and save the csv file as parquet it
> displays the last column values as null .
> This is not the case with one file .
> If I take any csv file with even with any data type and do a
> select column1,column2,column3 from table.parquet
> it shows the column3 values as null.
>
> Appreciate the help.
>
> Thanks,
> Divya
>


Re: Add oracle as a storage fails: Please retry: error (unable to create/ update storage)

2017-07-21 Thread Abhishek Girish
Thanks for sharing. We should probably add a note on this to the jdbc
storage plugin doc.

On Fri, Jul 21, 2017 at 1:00 PM, Dan Holmes 
wrote:

> Turns out after you put the jar in the directory you have to restart drill.
>
> Works now.
>
> Dan Holmes | Revenue Analytics, Inc.
> Direct: 770.859.1255
> www.revenueanalytics.com
>
> -Original Message-
> From: Dan Holmes [mailto:dhol...@revenueanalytics.com]
> Sent: Friday, July 21, 2017 3:51 PM
> To: user@drill.apache.org
> Subject: Add oracle as a storage fails: Please retry: error (unable to
> create/ update storage)
>
> I have followed the instructions here:  https://drill.apache.org/docs/
> rdbms-storage-plugin/#Example-Oracle-Configuration
>
> I get the error in the subject.  I don't see anything in the logs about it.
>
> {
>   type: "jdbc",
>   enabled: true,
>   driver: "oracle.jdbc.OracleDriver",
>   url:"jdbc:oracle:thin:user/pwd@yoda:1523/ORCL"
> }
>
> I have the thin driver installed.
> dan@ubuntu:~/apache-drill-1.10.0/jars/3rdparty$ ll o* -rwxrwxr-x 1 dan
> dan 3698857 Jul 21 15:25 ojdbc7.jar*
>
> I can telnet to the server.
> dan@ubuntu:~/apache-drill-1.10.0/jars/3rdparty$ telnet yoda 1523 Trying
> 10.10.10.10...
> Connected to yoda.RADOM.local.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
>
> I don't know how to troubleshoot this further.
>
> Thank you any help.
>
> Dan Holmes | Architect | Revenue Analytics, Inc.
> 300 Galleria Parkway, Suite 1900 | Atlanta, Georgia 30339
> Direct: 770.859.1255 Cell: 404.617.3444
> www.revenueanalytics.com | LinkedIn | Twitter
>
>


Re: Running drill on Windows

2017-07-19 Thread Abhishek Girish
I think we should document two things - (1) JAVA_HOME requirements (2) Lack
of support for Drill in distributed mode on Windows, unless someone
volunteers to certify Drill on the platform.

On Wed, Jul 19, 2017 at 10:19 AM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> Having JAVA_HOME without spaces is general requirement for installation on
> Windows.
> For example, the same is for Hadoop -
> https://wiki.apache.org/hadoop/Hadoop2OnWindows
> Regarding, memory configuration issues, this should be checked.
>
> Kind regards
> Arina
>
> On Wed, Jul 19, 2017 at 7:44 PM, Abhishek Girish <agir...@apache.org>
> wrote:
>
> > Hey Arina,
> >
> > This is pretty helpful. However, this can only constitute a workaround
> > and not native Windows support, correct? - as cygwin or similar utilities
> > are a prerequisite.
> > take effect as expected - for instance, I vaguely remember that while
> > running drill in embedded mode on Windows, the memory configurations
> > weren't picked up, unless the scripts were meddled with. Would you expect
> > this to be resolved with your steps and that all of Drill should function
> > as expected (provided run under cygwin env)? We could then document this.
> >
> > -Abhishek
> >
> >
> > On Wed, Jul 19, 2017 at 2:35 AM Arina Yelchiyeva <
> > arina.yelchiy...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > it might be that the problem is that your JAVA_HOME contains spaces.
> > > Discussed in DRILL-5280.
> > > Posted detailed answer on stackoverflow as well.
> > >
> > > Kind regards
> > > Arina
> > >
> > > On Tue, Jul 18, 2017 at 1:11 PM, Med Dhaker Abdeljawed <
> > > med.dhaker.abdelja...@gmail.com> wrote:
> > >
> > > > 1down votefavorite
> > > > <https://stackoverflow.com/questions/45141935/how-to-
> > > > start-drillbit-distributed-and-in-single-node-with-
> windows/45156594#>
> > > >
> > > > I want to start drillbit sevrer in Distributed mode in Windows but
> > didn't
> > > > work,
> > > >
> > > > I started ZooKeeper and works fine with " zkServer.cmd ", and started
> > > drill
> > > > with cygwin command like this : " sh drillbit.sh start " but the
> server
> > > > don't start and give this error in drillbit.out log file :
> > > >
> > > > C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
> > > >
> > >
> >
>


Re: Running drill on Windows

2017-07-19 Thread Abhishek Girish
Hey Arina,

This is pretty helpful. However, this can only constitute a workaround
and not native Windows support, correct? - as cygwin or similar utilities
are a prerequisite.
take effect as expected - for instance, I vaguely remember that while
running drill in embedded mode on Windows, the memory configurations
weren't picked up, unless the scripts were meddled with. Would you expect
this to be resolved with your steps and that all of Drill should function
as expected (provided run under cygwin env)? We could then document this.

-Abhishek


On Wed, Jul 19, 2017 at 2:35 AM Arina Yelchiyeva 
wrote:

> Hi,
>
> it might be that the problem is that your JAVA_HOME contains spaces.
> Discussed in DRILL-5280.
> Posted detailed answer on stackoverflow as well.
>
> Kind regards
> Arina
>
> On Tue, Jul 18, 2017 at 1:11 PM, Med Dhaker Abdeljawed <
> med.dhaker.abdelja...@gmail.com> wrote:
>
> > 1down votefavorite
> >  > start-drillbit-distributed-and-in-single-node-with-windows/45156594#>
> >
> > I want to start drillbit sevrer in Distributed mode in Windows but didn't
> > work,
> >
> > I started ZooKeeper and works fine with " zkServer.cmd ", and started
> drill
> > with cygwin command like this : " sh drillbit.sh start " but the server
> > don't start and give this error in drillbit.out log file :
> >
> > C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
> >
>


Re: 1.11 Release date

2017-06-19 Thread Abhishek Girish
Thanks for sharing details! It's always helpful to know the context in any
request. It's also exciting to learn about new Drill use-cases.

Regarding the release timeline, I hope someone will start a thread in the
Dev mailing list soon. Or for someone in the PMC to comment on it, here.

I think you should file a JIRA for the S3 regions issue. It's a useful
reference for others encountering similar issues. We should also document
it, until it's resolved.

-Abhishek

On Mon, Jun 19, 2017 at 7:55 AM, Jack Ingoldsby <jack.ingold...@gmail.com>
wrote:

> Hi,
> Here is the JIRA for the JDBC issue, BTW.
> https://issues.apache.org/jira/browse/DRILL-5373
>
> Regards,
> Jack
>
> On Mon, Jun 19, 2017 at 10:46 AM, Jack Ingoldsby <jack.ingold...@gmail.com
> >
> wrote:
>
> > Hi,
> > Thanks, I appreciate the offer to help to help very much.
> > So, the situation is that I'm working as a presales engineer at a
> business
> > intelligence tech startup (not really a startup any more, they have c300
> > employees).
> > We provide an easily installed tool to enable business users to mash up
> > data from various sources and perform analysis.
> > I wanted to see if we could use Drill to get data from S3 as an
> additional
> > data source.
> >
> > Thanks to help from the user group was able to do so, used Drill ODBC to
> > load 37 million records successfully, so have proof of concept.
> >
> > Currently the standard architecture  is Analytic Server, Web Server all
> > installed on a single Windows server. We are looking to provide a Linux
> > offering in the coming months,
> >
> > I'd like to get the product team to buy into using Apache Drill as a
> > standard data source.  So a couple of issues
> >
> >1. Need to get JDBC up and running, (ideally on a Windows embedded
> >instance for testing purposes prior to the Linux release)
> >2. I think there is an issue connecting to different regions of AWS
> >S3. I can connect successfully to N Virgina, but not to Ohio, I think
> >because of signature versions (http://drill-user.incubator.
> >apache.narkive.com/Ue0zF3kp/s3-storage-plugin-not-working-
> >for-signature-v4-regions
> ><http://drill-user.incubator.apache.narkive.com/Ue0zF3kp/
> s3-storage-plugin-not-working-for-signature-v4-regions>
> >)
> >
> > So, I think rather than using a forked Drill version for short term
> > purposes, I'd like to present the product team with with a demo using an
> > official Drill release where these issues are addressed, even if that is
> > weeks or months away, hence my interest in likely release schedule
> >
> > Thanks,
> > Jack
> >
> >
> > On Sun, Jun 18, 2017 at 12:49 PM, Abhishek Girish <agir...@apache.org>
> > wrote:
> >
> >> That's a good question - however, I don't think it has been discussed
> >> yet.
> >>
> >> If not a 1.11.0 release right away, we should consider having a 1.10.1
> >> minor release.
> >>
> >> Coming to your question on issue with Windows embedded JDBC, if you see
> >> that's now been resolved (is there a JIRA?), you could build from source
> >> (Apache master) and try it out. Let us know if you need help with that.
> >>
> >> On Sun, Jun 18, 2017 at 9:33 AM, Jack Ingoldsby <
> jack.ingold...@gmail.com
> >> >
> >> wrote:
> >>
> >> > Hi,
> >> > Is there a likely release date for 1.11?
> >> > I hit a known issue with Windows embedded JDBC, looks like it will
> have
> >> > been addressed in 1.11.
> >> > Thanks,
> >> > Jack
> >> >
> >>
> >
> >
>


Re: 1.11 Release date

2017-06-18 Thread Abhishek Girish
That's a good question - however, I don't think it has been discussed yet.

If not a 1.11.0 release right away, we should consider having a 1.10.1
minor release.

Coming to your question on issue with Windows embedded JDBC, if you see
that's now been resolved (is there a JIRA?), you could build from source
(Apache master) and try it out. Let us know if you need help with that.

On Sun, Jun 18, 2017 at 9:33 AM, Jack Ingoldsby 
wrote:

> Hi,
> Is there a likely release date for 1.11?
> I hit a known issue with Windows embedded JDBC, looks like it will have
> been addressed in 1.11.
> Thanks,
> Jack
>


Re: Connecting to S3 bucket which does not seem to require a key

2017-06-12 Thread Abhishek Girish
That's good to know. I just didn't want the Drill community to be the place
your keys were leaked :)

I attempted with your keys and could reproduce the issue. One guess is that
it could be due to location constraints [1].

You can attempt to set the "fs.s3a.endpoint" property in S3 config and give
it a try. For example:

{
  "type": "file",
  "enabled": true,
  "connection": "s3a://sisense.citibike",
  "config": {
"fs.s3a.access.key": "AKIAJELPGZYEPGRP6VBA",
"fs.s3a.secret.key": "h3CyqC/VzpRirOMi3nCImYJL2oNV1xwOcEBiYi02",
"fs.s3a.endpoint": "s3-us-west-2.amazonaws.com"  // Pointing to the
region of the bucket
  }
...
...
}


[1] http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
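
By the way, if you're ever unsure which region a bucket actually lives in, one
way to check (assuming you have the AWS CLI configured) is:

  aws s3api get-bucket-location --bucket sisense.citibike

That returns the bucket's LocationConstraint (e.g. "us-west-2"), which is what
the fs.s3a.endpoint value above needs to match.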

On Mon, Jun 12, 2017 at 9:13 AM, Jack Ingoldsby <jack.ingold...@gmail.com>
wrote:

> Well, these are for a specific user I created for this bucket. The user
> only has read access to this bucket, which only contains this public
> citibike data and has no permissions access.
> So, I'm fine if anyone can connect (at least until I figure out the
> problem)
>
> On Mon, Jun 12, 2017 at 11:59 AM, Abhishek Girish <agir...@apache.org>
> wrote:
>
> > I hope you haven't shared your actual access / secret keys with the
> > community. If you have, please work on securing your account [1]!
> >
> >
> > [1] https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/
> >
> >
> >
> > On Mon, Jun 12, 2017 at 8:34 AM, Jack Ingoldsby <
> jack.ingold...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > Thanks. I'm actually more playing around with a proof of concept that I
> > can
> > > query S3 using our tool via Drill.
> > > So, what I did was to download the citibike data and create my own
> s3
> > > bucket with an access id / secret key, but I'm having some problem
> > connecting
> > > I get the following error message when running a query
> > >
> > > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > > AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS
> Request
> > > ID: 439EE2E823001E80, AWS Error Code: null, AWS Error Message: Bad
> > Request
> > > [Error Id: 9da0c6bd-b173-48e0-aeac-47179812e696 on
> > > LAP-NY-CHENO.corp.sisense.com:31010]
> > >
> > > It appears to be a connection issue but i can connect to the bucket
> > > sisense.citibike using AWS command line utility, using the same
> > accesskey,
> > > secretkey
> > > Does anything leap out ?
> > >
> > > The configuration is set to
> > >
> > > {
> > >   "type": "file",
> > >   "enabled": true,
> > >   "connection": "s3a://sisense.citibike",
> > >   "config": {
> > > "fs.s3a.access.key": "ID",
> > > "fs.s3a.secret.key": "SECRET"
> > >   },
> > >
> > >
> > > Core-site.xml is set to
> > >
> > > <configuration>
> > >
> > >   <property>
> > >     <name>fs.s3a.access.key</name>
> > >     <value>AKIAJELPGZYEPGRP6VBA</value>
> > >   </property>
> > >
> > >   <property>
> > >     <name>fs.s3a.secret.key</name>
> > >     <value>h3CyqC/VzpRirOMi3nCImYJL2oNV1xwOcEBiYi02</value>
> > >   </property>
> > >
> > > </configuration>
> > >
> > > Thanks,
> > > Jack
> > >
> > > On Mon, Jun 12, 2017 at 10:43 AM, Andries Engelbrecht <
> > > aengelbre...@mapr.com
> > > > wrote:
> > >
> > > > You may be better off downloading the NYC bike data set locally and
> > > convert
> > > > to parquet.
> > > > Converting from csv.zip to parquet will result in large improvements
> in
> > > > performance if you do various queries on the data set.
> > > >
> > > > --Andries
> > > >
> > > > On 6/11/17, 10:48 PM, "Abhishek Girish" <agir...@apache.org> wrote:
> > > >
> > > > Drill connects to S3 buckets (AWS) via the S3a library. And
> the
> > > > storage
> > > > plugin configuration requires the access & secret keys [1].
> > > >
> > > > I'm not sure if Drill can access S3 without the credentials. It
> > might
> > > > be
> > > > possible via custom authenticators [2]. Hopefully others who have
> > > tried
> > > > this will comment.
> > 

Re: Connecting to S3 bucket which does not seem to require a key

2017-06-12 Thread Abhishek Girish
I hope you haven't shared your actual access / secret keys with the
community. If you have, please work on securing your account [1]!


[1] https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/



On Mon, Jun 12, 2017 at 8:34 AM, Jack Ingoldsby <jack.ingold...@gmail.com>
wrote:

> Hi,
> Thanks. I'm actually more playing around with a proof of concept that I can
> query S3 using our tool via Drill.
> So, what I did was to download the citibike data and create my own s3
> bucket with an access id / secret key, but I'm having some problem connecting
> I get the following error message when running a query
>
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request
> ID: 439EE2E823001E80, AWS Error Code: null, AWS Error Message: Bad Request
> [Error Id: 9da0c6bd-b173-48e0-aeac-47179812e696 on
> LAP-NY-CHENO.corp.sisense.com:31010]
>
> It appears to be a connection issue but i can connect to the bucket
> sisense.citibike using AWS command line utility, using the same accesskey,
> secretkey
> Does anything leap out ?
>
> The configuration is set to
>
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://sisense.citibike",
>   "config": {
> "fs.s3a.access.key": "ID",
> "fs.s3a.secret.key": "SECRET"
>   },
>
>
> Core-site.xml is set to
>
> <configuration>
>
>   <property>
>     <name>fs.s3a.access.key</name>
>     <value>AKIAJELPGZYEPGRP6VBA</value>
>   </property>
>
>   <property>
>     <name>fs.s3a.secret.key</name>
>     <value>h3CyqC/VzpRirOMi3nCImYJL2oNV1xwOcEBiYi02</value>
>   </property>
>
> </configuration>
>
> Thanks,
> Jack
>
> On Mon, Jun 12, 2017 at 10:43 AM, Andries Engelbrecht <
> aengelbre...@mapr.com
> > wrote:
>
> > You may be better off downloading the NYC bike data set locally and
> convert
> > to parquet.
> > Converting from csv.zip to parquet will result in large improvements in
> > performance if you do various queries on the data set.
> >
> > --Andries
> >
> > On 6/11/17, 10:48 PM, "Abhishek Girish" <agir...@apache.org> wrote:
> >
> > Drill connects to S3 buckets (AWS) via the S3a library. And the
> > storage
> > plugin configuration requires the access & secret keys [1].
> >
> > I'm not sure if Drill can access S3 without the credentials. It might
> > be
> > possible via custom authenticators [2]. Hopefully others who have
> tried
> > this will comment.
> >
> >
> > [1] https://drill.apache.org/docs/s3-storage-plugin/
> > [2] http://docs.aws.amazon.com/AmazonS3/latest/API/sig-
> > v4-authenticating-requests.html
> >
> > On Wed, Jun 7, 2017 at 3:02 PM, Jack Ingoldsby <
> > jack.ingold...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > I'm trying to access the NYC Citibike S3 bucket, which seems to
> > be publicly
> > > available
> > >
> > > https://s3.amazonaws.com/tripdata/index.html
> > > If I leave the Access Key & Secret Key empty, I get the following
> > message
> > >
> > > 0: jdbc:drill:zk=local> !tables
> > > Error: Failure getting metadata: Unable to load AWS credentials
> from
> > any
> > > provider in the chain (state=,code=0)
> > >
> > > If I try entering random numbers as keys, I get the following
> message
> > >
> > > Error: Failure getting metadata: Status Code: 403, AWS Service:
> > Amazon S3,
> > > AWS Request ID: 1C888A3A21D79F87, AWS Error Code:
> > InvalidAccessKeyId, AWS
> > > Error Message: The AWS Access Key Id you provided does not exist in
> > our
> > > records. (state=,code=0)
> > >
> > > Is it possible to connect to a data source that does not seem to
> > require a
> > > key?
> > >
> > > Thanks,
> > > Jack
> > >
> >
> >
> >
>


Re: Connecting to S3 bucket which does not seem to require a key

2017-06-11 Thread Abhishek Girish
Drill connects to S3 buckets (AWS) via the S3a library. And the storage
plugin configuration requires the access & secret keys [1].

I'm not sure if Drill can access S3 without the credentials. It might be
possible via custom authenticators [2]. Hopefully others who have tried
this will comment.


[1] https://drill.apache.org/docs/s3-storage-plugin/
[2] http://docs.aws.amazon.com/AmazonS3/latest/API/sig-
v4-authenticating-requests.html

On Wed, Jun 7, 2017 at 3:02 PM, Jack Ingoldsby 
wrote:

> Hi,
> I'm trying to access the NYC Citibike S3 bucket, which seems to be publicly
> available
>
> https://s3.amazonaws.com/tripdata/index.html
> If I leave the Access Key & Secret Key empty, I get the following message
>
> 0: jdbc:drill:zk=local> !tables
> Error: Failure getting metadata: Unable to load AWS credentials from any
> provider in the chain (state=,code=0)
>
> If I try entering random numbers as keys, I get the following message
>
> Error: Failure getting metadata: Status Code: 403, AWS Service: Amazon S3,
> AWS Request ID: 1C888A3A21D79F87, AWS Error Code: InvalidAccessKeyId, AWS
> Error Message: The AWS Access Key Id you provided does not exist in our
> records. (state=,code=0)
>
> Is it possible to connect to a data source that does not seem to require a
> key?
>
> Thanks,
> Jack
>


Re: Drill with Cassandra

2017-06-11 Thread Abhishek Girish
The code isn't complete afaik. It might require more work than just
implementing the functionality you are interested in. You can begin by
taking a look at what's already there (refer to Patches / PRs on the JIRA)
and then asking specific questions if you get stuck at any point. I'm
certain someone in the community will offer to help.

Regarding how Drill and Presto compare, you can refer to [1]. Also, there
are a few threads in the User archives which discuss this as well. If you
have any specific questions, feel free to send an email to the list.

[1] https://www.quora.com/How-does-Apache-Drill-compare-to-Facebooks-Presto

On Tue, Jun 6, 2017 at 9:49 AM, Sandeep Dixit <sdi...@ohioedge.com> wrote:

> Does this just require updates to the existing code, or implementing
> additional/missing functionality as well? I am particularly interested in
> EXISTS/NOT EXISTS etc implementation. Also how does this project
> differentiate from Presto which seem to have Cassandra adapter?
>
> --
>
> Thanks,
> Sandeep
>
>
>
> On Mon, Jun 5, 2017 at 12:55 PM, Abhishek Girish <agir...@apache.org>
> wrote:
>
> > Currently Drill does not support Cassandra as a datasource. There was
> some
> > previous work on a cassandra plugin [1], but I do not think that's been
> > completed.
> >
> > You are welcome to contribute towards a Cassandra plugin.
> >
> > [1] https://issues.apache.org/jira/browse/DRILL-92
> >
> > On Mon, Jun 5, 2017 at 9:31 AM, Sandeep Dixit <sdi...@ohioedge.com>
> wrote:
> >
> > > I am not finding any Cassandra specific documentation - how to
> configure,
> > > etc. - in the documentation site. Does Drill fully support Cassandra?
> If
> > > yes - which version? I am migrating my group chat app to Cassandra and
> > was
> > > not sure how to proceed 1) replace all joins with multiple simple table
> > > queries and then compose object graphs in middle-tier or 2) use Drill
> and
> > > reuse existing queries. My complex queries include EXISTS, JOIN, AND/OR
> > etc
> > > clauses. Has anyone used Drill in similar scenario and can provide me
> > some
> > > pointers? Also this is not for analytical purpose - this would be for
> > > user-centric app.
> > >
> > > --
> > >
> > > Thanks,
> > > Sandeep
> > >
> >
>


Re: md5 function

2017-06-11 Thread Abhishek Girish
There is no in-built function for MD5 afaik, but you can create a UDF for
it.

Please refer to a related discussion on the user list [1] and Drill
documentation on UDFs [2], [3]. Once you are done, it would be helpful if
you can please share your UDF with the community.
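
To give a rough idea of the shape of such a UDF, here is an untested sketch of
a simple function (the class name is a placeholder; it assumes commons-codec is
on the classpath, and the packaging / drill-module.conf steps described in [2]
still apply):

import io.netty.buffer.DrillBuf;
import javax.inject.Inject;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.VarCharHolder;

@FunctionTemplate(name = "md5", scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class Md5Function implements DrillSimpleFunc {

  @Param VarCharHolder input;   // incoming VARCHAR value
  @Output VarCharHolder out;    // hex-encoded MD5 digest
  @Inject DrillBuf buffer;      // scratch buffer for the output bytes

  public void setup() { }

  public void eval() {
    // eval() bodies must use fully qualified class names
    String in = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
        .toStringFromUTF8(input.start, input.end, input.buffer);
    byte[] md5 = org.apache.commons.codec.digest.DigestUtils.md5Hex(in)
        .getBytes(java.nio.charset.StandardCharsets.UTF_8);
    buffer = buffer.reallocIfNeeded(md5.length);
    buffer.setBytes(0, md5);
    out.buffer = buffer;
    out.start = 0;
    out.end = md5.length;
  }
}

Once the jar (plus its sources jar) is on the Drill classpath and the Drillbits
are restarted, select md5(some_column) should resolve to it.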


[1]
http://mail-archives.apache.org/mod_mbox/drill-user/201611.mbox/%3CCAHp%2BJo4c4x2N6Ltfdju8YWE3mY6JJz3QZ%2BYn47Qk0T9im4XiAA%40mail.gmail.com%3E
(Also refer to other replies in this thread)

[2] https://drill.apache.org/docs/adding-custom-functions-to-drill/

[3] https://drill.apache.org/docs/develop-custom-functions-introduction/


-Abhishek

On Sun, Jun 11, 2017 at 7:11 PM, Mick Bisignani 
wrote:

> is an Md5 (varchar) function available in Apache drill sql . I am looking
> to standardize on this particular hash function across multiple sql
> environments including presto, postgres and mysql
>
> cheers
>
> Sent from my iPad


Re: UNORDERED_RECEIVER taking 70% of query time

2017-06-01 Thread Abhishek Girish
Attachment hasn't come through. Can you upload the query profile to some
cloud storage and share a link to it?

Also, please share details on how large your dataset is, number of
Drillbits, memory and other configurations.


On Thu, Jun 1, 2017 at 10:18 PM,  wrote:

> Hi,
>
>
>
> I am running a simple query which performs JOIN operation between two
> parquet files and it takes around 3-4 secs and I noticed that 70% of the
> time is used by UNORDERED_RECEIVER.
>
>
>
> Sample query is –
>
>
>
> select sum(sales),week from dfs.`C:\parquet-location\
> F8894180-AFFB-4803-B8CF-CCF883AA5AAF-Search_Snapshot_Data.parquet` where
> model_component_id in(
>
> select model_component_id from dfs.`C:\parquet-location\poc48k.parquet`)
> group by week
>
>
>
>
>
> Can we somehow reduce unordered receiver time?
>
>
>
> Please find the below screenshot of Visualized plan
>
> --
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
> 
> __
>
> www.accenture.com
>


Re: Parquet on S3 - timeouts

2017-06-01 Thread Abhishek Girish
Cool, thanks for confirming.



_
From: Raz Baluchi <raz.balu...@gmail.com>
Sent: Thursday, June 1, 2017 2:14 PM
Subject: Re: Parquet on S3 - timeouts
To:  <user@drill.apache.org>


setting

  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
  </property>

does fix the problem. No more timeouts and very quick response. No need to
'prime' the query...

On Thu, Jun 1, 2017 at 4:08 PM, Abhishek Girish <agir...@apache.org> wrote:

> Can you take a look at [1] and let us know if that helps resolve your
> issue?
>
> [1]
> https://drill.apache.org/docs/s3-storage-plugin/#quering-
> parquet-format-files-on-s3
>
> On Thu, Jun 1, 2017 at 12:55 PM, Raz Baluchi <raz.balu...@gmail.com>
> wrote:
>
> > Now that I have Drill working with parquet files on dfs, the next step
> was
> > to move the parquet files to S3.
> >
> > I get pretty good performance - I can query for events  by date range
> > within 10 seconds. ( out of a total of ~ 800M events across 25 years)
> >  However, there seems to be some threshold beyond which queries start
> > timing out.
> >
> > SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for
> > connection from pool
> >
> > My first question is, is there a default timeout value to queries against
> > S3? Anything that takes longer than ~ 150 seconds seems to hit the
> timeout
> > error.
> >
> > The second question has to do with the possible conditions that trigger
> the
> > prolonged query time. It seems that if I increase the filters beyond a
> > certain number - it doesn't take much - the query times out.
> >
> > For example the query:
> >
> > select * from events where YEAR in (2012, 2013) works fine - however,
> > select * from events where YEAR in (2012, 2013, 2014) fails with a
> timeout.
> >
> > To make it worse, I can't use the first query either  until I restart
> > drill...
> >
>






Re: Parquet on S3 - timeouts

2017-06-01 Thread Abhishek Girish
Can you take a look at [1] and let us know if that helps resolve your issue?

[1]
https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
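
The specific setting that page talks about is the S3A connection pool size in
core-site.xml, something like (100 is just a starting point - tune as needed):

  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
  </property>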

On Thu, Jun 1, 2017 at 12:55 PM, Raz Baluchi  wrote:

> Now that I have Drill working with parquet files on dfs, the next step was
> to move the parquet files to S3.
>
> I get pretty good performance - I can query for events  by date range
> within 10 seconds. ( out of a total of ~ 800M events across 25 years)
>  However, there seems to be some threshold beyond which queries start
> timing out.
>
> SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for
> connection from pool
>
> My first question is, is there a default timeout value to queries against
> S3? Anything that takes longer than ~ 150 seconds seems to hit the timeout
> error.
>
> The second question has to do with the possible conditions that trigger the
> prolonged query time. It seems that if I increase the filters beyond a
> certain number - it doesn't take much - the query times out.
>
> For example the query:
>
> select * from events where YEAR in (2012, 2013) works fine - however,
> select * from events where YEAR in (2012, 2013, 2014) fails with a timeout.
>
> To make it worse, I can't use the first query either  until I restart
> drill...
>


Re: Apache Drill 1.0 Web Console - Profiles issue

2017-05-25 Thread Abhishek Girish
Hey Federico,

Drill persists query profiles on the local disk of the foreman node.
Assuming you checked the Web UI only on one node, can you see if query
profiles are listed on the Web UI of other nodes? If yes, then it's
expected. You can configure your profiles dir to be on DFS [1], to get a
unified view.

If you are not able to see query profiles on the Web UI of any nodes, then
it's a different issue. On your Drill server nodes, can you cd into the log
directory and see the profiles directory and profiles inside it? If you
haven't configured anything, the log directory can be inside your Drill
installation dir or at /var/log/drill.

[1] https://drill.apache.org/docs/persistent-configuration-storage/
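
For reference, that boils down to a drill-override.conf entry on each node,
roughly like the following (the ZooKeeper string and DFS path are placeholders
for your environment):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "zk1:2181,zk2:2181,zk3:2181",
  # keep query profiles in a shared location instead of each foreman's local disk
  sys.store.provider.zk.blobroot: "hdfs://namenode:8020/apps/drill/pstore"
}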

-Abhishek

On Thu, May 25, 2017 at 5:05 AM, Federico Santini <
federico.sant...@avanade.com> wrote:

> Hi all,
>
> Just a short question.
>
> We are currently scouting Apache Drill 1.10 capabilities and just finished
> to setup a 4 drillbit cluster MS Azure Linux VMs (+ 3 separate ZooKeeper
> nodes).
>
> Everything is working fine but the portal on 8047 has just stopped
> recording queries.
>
> I can see a query on the Profiles section while is executing, but as soon
> as it completes is no more present in the page list.
>
> I have done some googling on this with no luck.
>
> Please, can you point me to some relevant link or a list of things to
> check?
>
>
>
> Many thanks,
>
> --Federico
>
>
>
> Federico Santini Falorsi
>
> Group Manager / Analytics Solution Architect
>
> Avanade Inc.
>
> Via Panciatichi 17, Firenze, Italy
>
> www.avanade.com
>
>
>
>
>
> NOTICE: This communication may contain confidential and/or privileged
> information. Do not print, copy, forward, or otherwise use the information
> for any unintended purpose. Also, if you are not the intended recipient,
> immediately notify the sender that you have received this email in error,
> and delete the copy you received. Thank you.
>
>
>


Re: Can we Import the HDFS Query results to Any RDBMS using Apache Drill

2017-05-25 Thread Abhishek Girish
Hello Jagadeesh,

Drill supports writing to disk via the CTAS command [1]. So you can read
from RDBMS and write to HDFS. However, currently we don't support the other
way round.

Just curious, can you share what you are trying to achieve here?

[1] https://drill.apache.org/docs/create-table-as-ctas/
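
To illustrate the supported direction, a minimal example would look something
like this (assuming a writable dfs.tmp workspace and an enabled RDBMS storage
plugin - here called mysql, with a hypothetical sales.customers table):

  USE dfs.tmp;
  ALTER SESSION SET `store.format` = 'parquet';
  CREATE TABLE customers_copy AS SELECT * FROM mysql.sales.customers;

The result is a Parquet table on the file system; pushing query results back
into the RDBMS would have to happen outside Drill for now.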

-Abhishek

On Thu, May 25, 2017 at 7:49 AM, Jagadeesh G  wrote:

> Hello,
>
> Hope you are doing Great!
>
> My name is Jagadeesh,
>
> I am new to Apache Drill, and I have a question:
> I would like to know whether we can load HDFS/Hive query results into any
> RDBMS?
>
> If yes, Please share me the reference link.
>
> Your reply is highly appreciated.
>
> Thanks,
> Jagadeesh
>


Re: S3 configuration for ceph or atmos

2017-05-24 Thread Abhishek Girish
Hey thanks for sharing!

Regarding your degraded query performance, it's a known issue [1]. Please
add a comment to it, so that someone can verify this scenario when working
on it.

[1] DRILL-5089 <https://issues.apache.org/jira/browse/DRILL-5089>

On Wed, May 24, 2017 at 5:58 PM, Raz Baluchi <raz.balu...@gmail.com> wrote:

> I was able to connect to the endpoint by setting the property
> 'fs.s3a.endpoint' to the appropriate url  'https://storage.xxx.com:8181'
>
>
> I am now able to query the data in the bucket. However, as soon as I enable
> the S3 plugin - the response from Drill becomes extremely slow. This is
> true even if I am not querying the S3 bucket. As an example, just issuing a
> 'use' command takes forever:
>
> with the S3 plugin disabled:
>
> 0: jdbc:drill:zk=local> use cp;
>
> +-------+---------------------------------+
> |  ok   |             summary             |
> +-------+---------------------------------+
> | true  | Default schema changed to [cp]  |
> +-------+---------------------------------+
>
> 1 row selected (0.543 seconds)
>
>
> with the S3 plugin enabled:
>
>
> 0: jdbc:drill:zk=local> use cp;
>
> +-------+---------------------------------+
> |  ok   |             summary             |
> +-------+---------------------------------+
> | true  | Default schema changed to [cp]  |
> +-------+---------------------------------+
>
> 1 row selected (221.293 seconds)
>
>
> The S3 bucket configured in the plugin has approximately 20,000 objects. My
> assumption is that there is some sort of metadata scan that occurs anytime
> a command is executed? Any suggestions on how to improve performance?
>
>
> Thanks
>
>
>
>
> On Wed, May 24, 2017 at 3:14 PM, Abhishek Girish <agir...@apache.org>
> wrote:
>
> > I'm not sure if anyone has ever tried that. Connecting to S3 buckets
> (AWS)
> > works via the S3a library. You could file an enhancement request on JIRA
> > [1].
> >
> > If someone has any experience with it, they can share details on the
> JIRA,
> > or work on it. You are welcome to contribute yourself.
> >
> > [1] https://issues.apache.org/jira/browse/DRILL
> >
> > On Wed, May 24, 2017 at 12:01 PM, Raz Baluchi <raz.balu...@gmail.com>
> > wrote:
> >
> > > Where would I specify to use SSL since the endpoint is https?
> > >
> > > On Wed, May 24, 2017 at 1:13 PM, Gautam Parai <gpa...@mapr.com> wrote:
> > >
> > > > Hi Raz,
> > > >
> > > >
> > > > Please see here for an example https://drill.apache.org/docs/
> > > > s3-storage-plugin/
> > > >
> > > > Gautam
> > > >
> > > >
> > > > 
> > > > From: yousef.l...@gmail.com <yousef.l...@gmail.com> on behalf of Raz
> > > > Baluchi <raz.balu...@gmail.com>
> > > > Sent: Wednesday, May 24, 2017 7:03:12 AM
> > > > To: user@drill.apache.org
> > > > Subject: S3 configuration for ceph or atmos
> > > >
> > > > Is there a guide for configuring the S3 storage plugin for non AWS S3
> > > > storage?
> > > >
> > > > As an example, we have Ceph storage that is accessible via the S3
> API
> > at
> > > > an endpoint like "https://storage.xxx.com:8181" and bucket "xyz"
> > > >
> > > > How would I go about configuring the S3 storage plugin?
> > > >
> > > > Thanks
> > > >
> > >
> >
>


Re: Writing to s3 using Drill

2017-05-24 Thread Abhishek Girish
Sorry, I was wrong - please ignore my previous message. Looks like we do
support writing to S3, but there were small differences necessary to make
this work:

First, I had to prefix the CTAS table name with the S3 plugin name. And
second, I had to either update the s3 storage plugin configuration to
include the default workspace and set writable to true, or create a
workspace with a path and set the writable option to true.

Example:

create table s3.abc.a_ctas as select * from s3.a

   "abc": {
  "location": "/a",
  "writable": true,
  "defaultInputFormat": null
}

OR

create table s3.a_ctas as select * from s3.a

"default": {
  "location": "/",
  "writable": true,
  "defaultInputFormat": null
}



On Wed, May 24, 2017 at 12:22 PM, Abhishek Girish <agir...@apache.org>
wrote:

> I don't think we support writing to Object stores such as S3. We do
> support reading from S3 buckets via the S3a library. However, we have
> limited support with the plugin. You could file an enhancement request on
> JIRA [1].
>
> If someone has any experience with it, they can share details on the JIRA, or
> work on it. You are welcome to contribute yourself.
>
> [1] https://issues.apache.org/jira/browse/DRILL
>
> On Mon, May 22, 2017 at 3:27 AM, Shuporno Choudhury <
> shuporno.choudh...@manthan.com> wrote:
>
>> Hi,
>>
>> Is it possible to write to a folder in an s3 bucket using the *s3.tmp*
>> workspace?
>> Whenever I try, it gives me the following error:
>>
>> *Error: VALIDATION ERROR: Schema [s3.tmp] is not valid with respect to
>> either root schema or current default schema.*
>> *Current default schema:  s3.root*
>>
>> Also, s3.tmp doesn't appear while using the command "*show schemas*"
>> though
>> the tmp workspace exists in the web console
>>
>> I am using Drill Version 1.10; embedded mode on my local system.
>>
>> However, I have no problem reading from an s3 bucket, the problem is only
>> writing to a s3 bucket.
>> --
>> Regards,
>> Shuporno Choudhury
>>
>
>


Re: Does s3 plugin support AWS S3 signature version 4 ?

2017-05-24 Thread Abhishek Girish
There haven't been any updates to Drill's S3 support. Also, we are able to
query recently created buckets - so I'm guessing the specific version of
signature shouldn't matter, as we use Amazon's library (S3a) [1].

You could file an enhancement request on JIRA [2] to support custom clients
and authentication. If someone has any experience with it, they can share
details on the JIRA, or work on it. You are welcome to contribute yourself.

[1]
http://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html
[2] https://issues.apache.org/jira/browse/DRILL
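
One thing that may be worth trying in the meantime is pointing the S3A client
explicitly at the v4-only region's endpoint in core-site.xml (untested here;
depending on the hadoop-aws/aws-sdk versions bundled with Drill this may or may
not be enough on its own):

  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.ap-south-1.amazonaws.com</value>
  </property>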

On Fri, May 19, 2017 at 9:51 PM, Anup Tiwari 
wrote:

> Any updates on this?
> Since we have migrated to Aws Mumbai, we are not able to connect s3 and
> Drill.
>
> On 04-Apr-2017 11:02 PM, "Shankar Mane" 
> wrote:
>
> > Quick question here:
> >
> > Does s3 plugin support S3 signature version 4  ?
> >
> > FYI: s3 plugin works in case when region has support for both v2 and v4
> > signature. Whereas it seems problematic, for regions (eg. ap-south-1)
> which
> > only has v4 signature version support.
> >
> > regards,
> > shankar
> >
>


Re: Writing to s3 using Drill

2017-05-24 Thread Abhishek Girish
I don't think we support writing to Object stores such as S3. We do support
reading from S3 buckets via the S3a library. However, we have limited
support with the plugin. You could file an enhancement request on JIRA [1].

If someone has any experience with it, they can share details on the JIRA, or
work on it. You are welcome to contribute yourself.

[1] https://issues.apache.org/jira/browse/DRILL

On Mon, May 22, 2017 at 3:27 AM, Shuporno Choudhury <
shuporno.choudh...@manthan.com> wrote:

> Hi,
>
> Is it possible to write to a folder in an s3 bucket using the *s3.tmp*
> workspace?
> Whenever I try, it gives me the following error:
>
> *Error: VALIDATION ERROR: Schema [s3.tmp] is not valid with respect to
> either root schema or current default schema.*
> *Current default schema:  s3.root*
>
> Also, s3.tmp doesn't appear while using the command "*show schemas*" though
> the tmp workspace exists in the web console
>
> I am using Drill Version 1.10; embedded mode on my local system.
>
> However, I have no problem reading from an s3 bucket, the problem is only
> writing to a s3 bucket.
> --
> Regards,
> Shuporno Choudhury
>


Re: S3 configuration for ceph or atmos

2017-05-24 Thread Abhishek Girish
I'm not sure if anyone has ever tried that. Connecting to S3 buckets (AWS)
works via the S3a library. You could file an enhancement request on JIRA [1].

If someone has any experience with it, they can share details on the JIRA,
or work on it. You are welcome to contribute yourself.

[1] https://issues.apache.org/jira/browse/DRILL

On Wed, May 24, 2017 at 12:01 PM, Raz Baluchi  wrote:

> Where would I specify to use SSL since the endpoint is https?
>
> On Wed, May 24, 2017 at 1:13 PM, Gautam Parai  wrote:
>
> > Hi Raz,
> >
> >
> > Please see here for an example https://drill.apache.org/docs/
> > s3-storage-plugin/
> >
> > Gautam
> >
> >
> > 
> > From: yousef.l...@gmail.com  on behalf of Raz
> > Baluchi 
> > Sent: Wednesday, May 24, 2017 7:03:12 AM
> > To: user@drill.apache.org
> > Subject: S3 configuration for ceph or atmos
> >
> > Is there a guide for configuring the S3 storage plugin for non AWS S3
> > storage?
> >
> > As an example, we have Ceph storage that is accessible via the S3 API at
> > an endpoint like "https://storage.xxx.com:8181" and bucket "xyz"
> >
> > How would I go about configuring the S3 storage plugin?
> >
> > Thanks
> >
>


Re: querying from multiple directories in S3

2017-05-09 Thread Abhishek Girish
Can you share more details of how the data is structured within the S3
bucket, using some examples? Also some representative queries of what you
are currently doing and what you hope was possible to do instead? I'm not
clear on what your question is.

The Drill special attributes - filename, dir0, dir1, ... - do work for data
within the S3 storage plugin.
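
For example, assuming a hypothetical layout of bucket/events/YYYY/MM/DD/HH/...,
a "day" in UTC-5 can be expressed directly against the dir columns instead of a
UNION ALL (the dirN values are strings matching the directory names):

  SELECT t.*
  FROM s3.root.`events` t
  WHERE (dir0 = '2017' AND dir1 = '05' AND dir2 = '09' AND dir3 >= '05')
     OR (dir0 = '2017' AND dir1 = '05' AND dir2 = '10' AND dir3 < '05');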

On Tue, May 9, 2017 at 7:06 AM, Wesley Chow  wrote:

> What is the recommended way to issue a query against a large number of
> tables in S3? At the moment I'm aliasing the table as a giant UNION ALL,
> but is there a better way to do this?
>
> Our data is stored as a time hierarchy, like /MM/DD/HH/MM in UTC, but
> unfortunately I can't simply run the query recursively on an entire day of
> data. I usually need a day of data in a non-UTC time zone. Is there some
> elegant way to grab that data using the dir0, dir1 magic columns?
>
> Thanks,
> Wes
>


Re: Drill Cluster without HDFS/MapR-FS?

2017-05-09 Thread Abhishek Girish
Do you wish to use Drill in distributed mode with each node having its own
local file system or do you plan to use it with a different data source
which is also a distributed file system (but not HDFS / MapR-FS)?

If the former, yes you should be able to form a Drill cluster by bringing
up Drillbits in standalone mode on multiple disjoint nodes. You will still
need ZooKeeper for cluster coordination. But understand that since each
node can only talk to files on its local file system, the Drill cluster
will not have a unified view of, or access to, the files for distributed
processing. Your queries may fail, as a Drillbit might fail to access data.
To experiment, you can make sure the directories and files you need to
query are identical on each node. However, this is untested and I'm not
sure if it will indeed work.
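
For reference, the per-node drill-override.conf for such a setup is just the
usual cluster coordination settings (cluster id and ZooKeeper hosts below are
placeholders):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "zk1:2181,zk2:2181,zk3:2181"
}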

If it's the latter, can you share what data source you have in mind?

On Mon, May 8, 2017 at 11:41 AM, Matt  wrote:

> I have seen some posts in the past about Drill nodes mounted "close to the
> data", and am wondering if its possible to use Drill as a cluster without
> HDFS?
>
> Using ZK would not be an issue in itself, and there are apparently options
> like https://github.com/mhausenblas/dromedar
>
> Any experiences with this?
>


Re: Connecting Apache Drill to kudu

2017-04-23 Thread Abhishek Girish
Cool, thanks! This should be helpful.



On Sun, Apr 23, 2017 at 7:06 PM Jean-Christophe Clavier <jcclav...@free.fr>
wrote:

> Sorry for the late reply, i was away...
>
> To give some details.
>
> I used a docker image that embedded an old version of Drill. The kudu
> driver was not included.
>
> So be careful to check the version number when using a docker image :-)
>
> I finally got it with mkieboom/apache-drill-docker.
>
> The kudu button appears in the Storage tab. I updated it with
>
> {
>   "type": "kudu",
>   "masterAddresses": "172.17.0.2:7051",
>   "enabled": true
> }
>
> as you said and it worked properly.
>
> Thanks.
>
> JC
>
>
> Le 15/04/2017 à 08:43, Abhishek Girish a écrit :
> > Could you please share your kudu storage plugin config, and any other
> > additional configurations you made, so it could help others?
> >
> > On Fri, Apr 14, 2017 at 5:35 AM, Jean-Christophe Clavier <
> jcclav...@free.fr>
> > wrote:
> >
> >> Thank you,
> >>
> >> I finally got it work.
> >>
> >> My problem came from i was using a docker image for drill and it was too
> >> old.
> >>
> >> Taking a new one made it work (at least, i can connect :-) )
> >>
> >> Thanks
> >>
> >>
> >> Le 14/04/2017 à 00:04, Abhishek Girish a écrit :
> >>> There has been no activity on Kudu, for a while now [1]. And I couldn't
> >>> find any documentation either. So I'm guessing it's still experimental.
> >>> Venki or someone from Dremio should be able to help.
> >>>
> >>> [1] https://github.com/apache/drill/tree/master/contrib/storage-kudu
> >>>
> >>> On Mon, Apr 10, 2017 at 11:52 PM, <big.b...@free.fr> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Is there a way to connect Apache Drill to Kudu ?
> >>>>
> >>>> I have seen drill 1.5 added an experimental support for kudu and a
> >>>> drill-storage-kudu on github but I can't figure out how to make it
> >> work...
> >>>> Is this now less experimental ?
> >>>>
> >>>> Thanks
> >>>>
> >>
>
>
>


Re: changing the drill AWS/S3 credentials providers

2017-04-18 Thread Abhishek Girish
Hey Michael,

Have you copied the core-site.xml file into Drill's conf directory?
You could also set the credentials directly in the s3 storage plugin [1].

[1] https://drill.apache.org/docs/s3-storage-plugin/
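
The plugin-config route looks roughly like this (Storage > s3 > Update in the
Web UI; bucket and keys are placeholders - note this only covers plain
access/secret keys, not the profile / session-token chain you're after):

{
  "type": "file",
  "enabled": true,
  "connection": "s3a://your-bucket",
  "config": {
    "fs.s3a.access.key": "ACCESS_KEY",
    "fs.s3a.secret.key": "SECRET_KEY"
  },
  "workspaces": {
    "root": { "location": "/", "writable": false, "defaultInputFormat": null }
  },
  "formats": { "parquet": { "type": "parquet" } }
}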

-Abhishek

On Tue, Apr 18, 2017 at 4:17 PM Knapp, Michael 
wrote:

> Drill Developers,
>
> I have been struggling to change the aws credentials when running drill.
> I am using session tokens in a local profile.  I also run the app using IAM
> roles, but right now I just want it to work locally.  This is in my
> core-site.xml:
>
>
> <property>
>   <name>fs.s3a.aws.credentials.provider</name>
>   <value>com.amazonaws.auth.profile.ProfileCredentialsProvider,org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider,com.amazonaws.auth.EnvironmentVariableCredentialsProvider</value>
> </property>
> <property>
>   <name>fs.s3a.security.credential.provider.path</name>
>   <value>com.amazonaws.auth.profile.ProfileCredentialsProvider,org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider,com.amazonaws.auth.EnvironmentVariableCredentialsProvider</value>
> </property>
> <property>
>   <name>hadoop.security.credential.provider.path</name>
>   <value>com.amazonaws.auth.profile.ProfileCredentialsProvider,org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider,com.amazonaws.auth.EnvironmentVariableCredentialsProvider</value>
> </property>
>
>
> but unfortunately when I run the application (with several loggers set to
> trace) I still see this:
> 2017-04-18 23:07:31,725 [270963cb-cb2a-aa04-3ad1-1a92384a31f5:foreman]
> TRACE o.a.d.exec.util.ImpersonationUtil - Creating DrillFileSystem for
> proxy user: drill (auth:SIMPLE)
> 2017-04-18 23:07:31,986 [270963cb-cb2a-aa04-3ad1-1a92384a31f5:foreman]
> DEBUG c.a.auth.AWSCredentialsProviderChain - Unable to load credentials
> from BasicAWSCredentialsProvider: Access key or secret key is null
> 2017-04-18 23:07:34,003 [270963cb-cb2a-aa04-3ad1-1a92384a31f5:foreman]
> DEBUG c.a.auth.AWSCredentialsProviderChain - Unable to load credentials
> from InstanceProfileCredentialsProvider: Unable to load credentials from
> Amazon EC2 metadata service
> 2017-04-18 23:07:34,030 [270963cb-cb2a-aa04-3ad1-1a92384a31f5:foreman]
> DEBUG o.a.drill.exec.work.foreman.Foreman -
> 270963cb-cb2a-aa04-3ad1-1a92384a31f5: State change requested STARTING -->
> FAILED
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception
> during fragment initialization: Unable to load AWS credentials from any
> provider in the chain
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:298)
> [drill-java-exec-1.10.0.jar:1.10.0]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_101]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: com.amazonaws.AmazonClientException: Unable to load AWS
> credentials from any provider in the chain
>
>
> The application is only checking BasicAWSCredentialsProvider and
> InstanceProfileCredentialsProvider.  Neither of those will work for me.  I
> have been searching the source code for a while now and none of the
> properties I try to set seem to actually work.
>
> Would somebody please tell me how to configure the credentials provider
> chain in drill?
>
> Michael Knapp
> 
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>

