Re: [DISCUSSION] Replacing Preconditions.checkNotNull() with Objects.requireNonNull()

2018-08-22 Thread Arina Yelchiyeva
Obviously, I would intend to use Java API methods in the first place since
they do not introduce extra dependencies. I don't mind the replacements to
Preconditions for consistency (though it's up to the PR author in the scope
of which Jira that should be done).

Kind regards,
Arina

On Wed, Aug 22, 2018 at 4:56 PM Vlad Rozov  wrote:

> Please elaborate on your preference regarding Objects.requireNonNull()? As
> there are currently only a few instances of requireNonNull() calls in
> Drill, why not fix it now and avoid inconsistency in the future?
>
> Thank you,
>
> Vlad
>
> > On Aug 22, 2018, at 03:02, Arina Yelchiyeva 
> wrote:
> >
> > I don't feel like banning Objects.requireNonNull() right now, even though
> > Guava suggests not using this method. I suggest we leave it as is in the
> > scope of the current PR#1397 (revert replacements where they were done)
> > and discuss further how we should treat checks at runtime.
> >
> > Kind regards,
> > Arina
> >
> > On Wed, Aug 22, 2018 at 5:32 AM Vlad Rozov <vro...@apache.org> wrote:
> >
> >> My comments inline.
> >>
> >> Thank you,
> >>
> >> Vlad
> >>
> >>
> >>> On Aug 21, 2018, at 17:05, Paul Rogers 
> >> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> My two cents...
> >>>
> >>> The gist of the discussion is that 1) using Objects.requireNonNull()
> >> reduces the Guava import footprint, vs. 2) we are not removing the Guava
> >> dependency, so switching to Objects.requireNonNull() is unnecessary
> >> technically and is instead a personal preference.
> >>
> >> The gist of the discussion in the PR and on the mailing list is whether
> >> or not to use functionality (methods) provided by a library (in this
> >> case Guava) that is also available in the JDK, or whether it needs to be
> >> pro-actively replaced by the equivalent JDK functionality. My take is
> >> that this is justified only when the entire dependency on the library
> >> can be eliminated, or when the library author has deprecated the
> >> functionality in use. Neither is the case for the Guava library and the
> >> Preconditions class it provides.
> >>
> >> Guava explicitly recommends avoiding the Objects.requireNonNull()
> >> method, so I suggested prohibiting its usage as a personal preference.
> >>
> >>>
> >>> We make heavy use of the unique Guava "loading cache". We also use
> >>> other Guava preconditions not available in Objects. So deprecation of
> >>> Guava is unlikely anytime soon. (Though doing so would be a GOOD THING;
> >>> that library causes no end of grief when importing other libraries, due
> >>> to Guava's habit of removing features.)
> >>
> >> There is a separate PR that takes care of the "grief when importing
> >> other libraries" that also depend on Guava, caused by "Guava's habit of
> >> removing features". Additionally, Guava is mostly source compatible
> >> across versions since version 21.0 (see
> >> https://github.com/google/guava/blob/master/README.md), so I highly
> >> doubt that the dependency on Guava will ever go away.
> >>
> >>>
> >>> Given that Guava is not going away, I tend to agree with the suggestion
> >> there is no need to do the null check replacement now. It can always be
> >> done later if/when needed.
> >>>
> >>> If we were to want to make the change, I'd suggest we debate
> >> preconditions vs. assert. Drill is now stable; I can't recall a time
> >> when I ever saw a precondition failure in a log file. But users pay the
> >> runtime cost to execute them zillions of times. At this level of
> >> maturity, I'd suggest we use asserts, which are ignored by the runtime
> >> in "non-debug" runs, but which will still catch failures when we run
> >> tests.
> >>
> >> Actually, asserts are *not* ignored by the runtime in "non-debug" runs;
> >> they may be optimized away by the hotspot compiler. Additionally, I will
> >> be really surprised if replacing preconditions with assert saves more
> >> time across all customer runs than it will take to discuss the change,
> >> make it, and merge it.
> >>
> >>>
> > >

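The two null-check methods debated in the thread above are near-equivalents in behavior: both return their argument when non-null and throw NullPointerException with the supplied message otherwise. A minimal side-by-side sketch (the class, method, and variable names below are illustrative, not from the Drill code base):

```java
import java.util.Objects;

public class NullCheckDemo {

    // JDK variant: returns its argument, or throws NullPointerException
    // with the supplied message when the argument is null.
    static String jdkStyle(String name) {
        return Objects.requireNonNull(name, "name must not be null");
    }

    // The Guava variant has the same shape and also throws NPE:
    //   Preconditions.checkNotNull(name, "name must not be null");
    // (import com.google.common.base.Preconditions) -- shown only as a
    // comment here so the sketch compiles without the Guava dependency.

    public static void main(String[] args) {
        System.out.println(jdkStyle("drill"));  // prints "drill"
        try {
            jdkStyle(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // prints "name must not be null"
        }
    }
}
```

The practical difference is dependency footprint rather than behavior, which is why the thread treats the choice as a matter of consistency, not correctness.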
Re: [DISCUSSION] Replacing Preconditions.checkNotNull() with Objects.requireNonNull()

2018-08-22 Thread Arina Yelchiyeva
I don't feel like banning Objects.requireNonNull() right now, even though
Guava suggests not using this method. I suggest we leave it as is in the
scope of the current PR#1397 (revert replacements where they were done) and
discuss further how we should treat checks at runtime.

Kind regards,
Arina

On Wed, Aug 22, 2018 at 5:32 AM Vlad Rozov  wrote:

> My comments inline.
>
> Thank you,
>
> Vlad
>
>
> > On Aug 21, 2018, at 17:05, Paul Rogers 
> wrote:
> >
> > Hi All,
> >
> > My two cents...
> >
> > The gist of the discussion is that 1) using Objects.requireNonNull()
> reduces the Guava import footprint, vs. 2) we are not removing the Guava
> dependency, so switching to Objects.requireNonNull() is unnecessary
> technically and is instead a personal preference.
>
> The gist of the discussion in the PR and on the mailing list is whether or
> not to use functionality (methods) provided by a library (in this case
> Guava) that is also available in the JDK, or whether it needs to be
> pro-actively replaced by the equivalent JDK functionality. My take is that
> this is justified only when the entire dependency on the library can be
> eliminated, or when the library author has deprecated the functionality in
> use. Neither is the case for the Guava library and the Preconditions class
> it provides.
>
> Guava explicitly recommends avoiding the Objects.requireNonNull() method,
> so I suggested prohibiting its usage as a personal preference.
>
> >
> > We make heavy use of the unique Guava "loading cache". We also use other
> Guava preconditions not available in Objects. So deprecation of Guava is
> unlikely anytime soon. (Though doing so would be a GOOD THING; that
> library causes no end of grief when importing other libraries due to
> Guava's habit of removing features.)
>
> There is a separate PR that takes care of the "grief when importing other
> libraries" that also depend on Guava, caused by "Guava's habit of removing
> features". Additionally, Guava is mostly source compatible across versions
> since version 21.0 (see
> https://github.com/google/guava/blob/master/README.md), so I highly
> doubt that the dependency on Guava will ever go away.
>
> >
> > Given that Guava is not going away, I tend to agree with the suggestion
> there is no need to do the null check replacement now. It can always be
> done later if/when needed.
> >
> > If we were to want to make the change, I'd suggest we debate
> preconditions vs. assert. Drill is now stable; I can't recall a time when
> I ever saw a precondition failure in a log file. But users pay the runtime
> cost to execute them zillions of times. At this level of maturity, I'd
> suggest we use asserts, which are ignored by the runtime in "non-debug"
> runs, but which will still catch failures when we run tests.
>
> Actually, asserts are *not* ignored by the runtime in "non-debug" runs;
> they may be optimized away by the hotspot compiler. Additionally, I will be
> really surprised if replacing preconditions with assert saves more time
> across all customer runs than it will take to discuss the change, make it,
> and merge it.
>
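As background for the assert exchange above: Java assert statements are compiled into the class file but evaluated only when the JVM is started with -ea (-enableassertions); in a default run they are skipped. A minimal sketch of that behavior (class name and values are illustrative):

```java
public class AssertDemo {

    static int divide(int a, int b) {
        // Evaluated only under "java -ea"; in a default (non -ea) run the
        // check is skipped, so b == 0 would instead surface later as an
        // ArithmeticException from the division.
        assert b != 0 : "divisor must be non-zero";
        return a / b;
    }

    public static void main(String[] args) {
        // Standard idiom for detecting whether assertions are enabled:
        // the assignment executes only if asserts are being evaluated.
        boolean assertionsOn = false;
        assert assertionsOn = true;
        System.out.println("assertions enabled: " + assertionsOn);
        System.out.println(divide(10, 2)); // prints 5
    }
}
```

This is the crux of the debate: the check costs (almost) nothing in production runs, but it also provides no protection there.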
> >
> > Yes, we could argue that the JVM will optimize away the call. But, we do
> have code like this, which can't be optimized away:
> >
> >
> > Preconditions.checkArgument(numSlots <= regionsToScan.size(),
> >     String.format("Incoming endpoints %d is greater than number of
> >     scan regions %d", numSlots, regionsToScan.size()));
>
> This is a bad example of using Preconditions. It needs to be changed to
>
> Preconditions.checkArgument(numSlots <= regionsToScan.size(),
>     "Incoming endpoints %s is greater than number of scan regions %s",
>     numSlots, regionsToScan.size());
>
> that will be inlined by the hotspot compiler.
>
> >
> >
> > So, my suggestion: leave preconditions for now. At some point, open the
> assertions vs. preconditions debate.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >On Tuesday, August 21, 2018, 1:46:10 PM PDT, Vlad Rozov <
> vro...@apache.org> wrote:
> >
> > I am -1 on the first proposal, -0 on the second, and +1 for usage of
> Objects.requireNonNull() to be banned in favor of
> Preconditions.checkNotNull(). Please see [1], [2] (quoted) and [3]:
> >
> >> Projects which use com.google.common should generally avoid the use of
> Objects.requireNonNull(Object)
> (https://docs.oracle.com/javase/9/docs/api/java/util/Objects.html?is-external=true#requireNonNull-T-).
> Instead, use whichever of checkNotNull(Object)
> (https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/base/Preconditions.html#checkNotNull-T-)
> or Verify.verifyNotNull(Object)
> (https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/base/Verify.html#verifyNotNull-T-)
> is appropriate to the situation. (The same goes for the message-accepting
> overloads.)
> >
> > Thank you,
> >
> > Vlad
> >
> > [1]
> 

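The checkArgument correction in the thread above turns on eager vs. lazy message construction: wrapping the message in String.format() builds the string on every call, even when the check passes, whereas Guava's checkArgument(boolean, String template, Object... args) overload expands its %s template only on the failure path. A self-contained sketch of that difference (Guava internally uses its own lenient formatter rather than String.format, which stands in for it here; the method names and values are hypothetical):

```java
public class LazyMessageDemo {

    // Eager style: the caller pre-builds the message, so the formatting
    // cost is paid on every invocation, pass or fail.
    static void eagerCheck(boolean condition, String preBuiltMessage) {
        if (!condition) {
            throw new IllegalArgumentException(preBuiltMessage);
        }
    }

    // Lazy style (the shape of Guava's varargs overload): the template is
    // expanded only when the check actually fails.
    static void lazyCheck(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void main(String[] args) {
        int numSlots = 5;
        int regionCount = 3; // hypothetical values that fail the check
        try {
            lazyCheck(numSlots <= regionCount,
                    "Incoming endpoints %s is greater than number of scan regions %s",
                    numSlots, regionCount);
        } catch (IllegalArgumentException e) {
            // prints "Incoming endpoints 5 is greater than number of scan regions 3"
            System.out.println(e.getMessage());
        }
    }
}
```

When the condition holds, the lazy form does no formatting work at all, which is also what makes the call cheap enough for the JIT to inline, the point made in the thread.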
Re: [DISCUSS] 1.14.0 release

2018-07-13 Thread Arina Yelchiyeva
Two more regressions:
https://issues.apache.org/jira/browse/DRILL-6603
https://issues.apache.org/jira/browse/DRILL-6605

Kind regards,
Arina

On Fri, Jul 13, 2018 at 11:25 PM Sorabh Hamirwasia 
wrote:

> Hi Boaz,
> Couple of updates.
>
> *Merged In:*
> DRILL-6542: (May be Ready2Commit soon) IndexOutOfBounds exception for
> multilevel lateral ((Sorabh / Parth))
>
> *In Review:*
>
>
> *DRILL-6475: Query with UNNEST causes a Null Pointer ((Hanumath))*
> Thanks,
> Sorabh
>
> On Fri, Jul 13, 2018 at 1:17 PM, Parth Chandra  wrote:
>
> > Our (unwritten) rule has been that a commit cannot even go in unless unit
> > _and_ regression tests pass.
> > Releases are stricter, all tests, longevity tests, UI, are required to
> > pass. In addition, any performance regression needs to be discussed.
> >
> > So far we have not made any exceptions, but that is not to say we cannot.
> >
> > On Fri, Jul 13, 2018 at 1:03 PM, Vlad Rozov  wrote:
> >
> > > My 2 cents:
> > >
> > > From the Apache point of view it is OK to do a release even if unit
> > > tests do not pass at all or there is a large number of regressions
> > > introduced. An Apache release is a source release, and as long as it
> > > compiles and does not have license issues, it is up to the community
> > > (PMC) to decide on any other criteria for a release.
> > >
> > > The issue in DRILL-6453 is not limited to a large number of hash
> > > joins. It should be possible to reproduce it even with a single hash
> > > join, as long as the left and right sides are getting batches from
> > > one(many)-to-many exchanges (broadcast or hash partitioner senders).
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > >
> > > On 7/13/18 08:41, Aman Sinha wrote:
> > >
> > >> I would say we have to take a measured approach to this and decide
> > >> on a case-by-case basis which issue is a show stopper.
> > >> While of course we have to make every effort to avoid regressions, we
> > >> cannot claim that a particular release will not cause any regression.
> > >> I believe there are 1+ passing tests, so that should provide a
> > >> level of confidence. The TPC-DS 72 is a 10 table join, which in the
> > >> hadoop world of denormalized schemas is relatively uncommon. The main
> > >> question is: does the issue reproduce with fewer joins having the same
> > >> type of distribution plan?
> > >>
> > >>
> > >> Aman
> > >>
> > >> On Fri, Jul 13, 2018 at 7:36 AM Arina Yelchiyeva <
> > >> arina.yelchiy...@gmail.com>
> > >> wrote:
> > >>
> > >>> We cannot release with existing regressions, especially taking into
> > >>> account that these are not minor issues.
> > >>> As far as I understand, reverting is not an option since the hash
> > >>> join spill feature was extended across several commits + subsequent
> > >>> fixes.
> > >>> I guess we need to consider postponing the release until the issues
> > >>> are resolved.
> > >>>
> > >>> Kind regards,
> > >>> Arina
> > >>>
> > >>> On Fri, Jul 13, 2018 at 5:14 PM Boaz Ben-Zvi 
> wrote:
> > >>>
> > >>>> (Guessing ...) It is possible that the root cause for DRILL-6606 is
> > >>>> similar to that in DRILL-6453 -- that is, the new "early sniffing"
> > >>>> in the Hash-Join, which repeatedly invokes next() on the two
> > >>>> "children" of the join *during schema discovery* until non-empty
> > >>>> data is returned (or NONE, STOP, etc). Last night Salim, Vlad and I
> > >>>> briefly discussed alternatives, like postponing the "sniffing" to a
> > >>>> later time (beginning of the build for the right child, and
> > >>>> beginning of the probe for the left child).
> > >>>>
> > >>>> However this would require some work time. So what should we do
> > >>>> about 1.14?
> > >>>>
> > >>>>Thanks,
> > >>>>
> > >>>>Boaz
> > >>>>
> > >>>> On Fri, Jul 13, 2018 at 3:46 AM, Arina Yelchiyeva <
> > >>>> arina.yelchiy...@gmail.com> wrote:
> > >>>>
> > >>>> During implementing late limit 0 optimization, Bohdan has found one
> > more
> > >>>>> regression after Hash Join spill to disk.
> > >>>>> https://issues.apache.org/jira/browse/DRILL-6606
> > >>>> Boaz please take a look.
> > >>>>>
> > >>>>> Kind regards,
> > >>>>> Arina
> > >>>>>
> > >>>>>
> > >>>>
> > >
> >
>


Re: [DISCUSS] 1.14.0 release

2018-07-10 Thread Arina Yelchiyeva
Hi Boaz,

as far as I understand, you either move this feature to 1.15 or wait to
build the RC until it's finished. Adding changes later means you'll have to
build one more release candidate and conduct one more vote.
I think making people vote twice instead of once is not a good thing :)


Kind regards,
Arina

On Tue, Jul 10, 2018 at 8:07 AM Boaz Ben-Zvi  wrote:

>   Hi Charles,
>
>  The main reason for rushing a Release Candidate is so that we can
> give it enough testing.
>
> Given that DRILL-6104 is a separate feature, with almost no impact on
> the current code, then it seems low risk to add it a few days later.
>
>Anyone has an objection ?
>
>   Boaz
>
> On 7/9/18 9:54 PM, Charles Givre wrote:
> > Hi Boaz,
> > I'm traveling at the moment, but I can have DRILL-6104 back in Paul's
> hands by the end of the week.
> > —C
> >
> >> On Jul 10, 2018, at 00:53, Boaz Ben-Zvi  wrote:
> >>
> >>We are making progress towards 1.14.
> >>
> >> Let's aim for a Release Candidate branch off on  Thursday (July 12)  !!!
> >>
> >> Below are the unfinished cases; can most be completed and checked in by
> 7/12 ?
> >>
> >> (( Relevant people:
> >>
> >>  Abhishek, Arina, Boaz, Charles, Hanumath, Jean-Blas, Karthik,
> Kunal,
> >>
> >>  Parth, Paul, Salim, Sorabh, Tim, Vitalii, Vlad, Volodymyr ))
> >>
> >> ==
> >>
> >> Open/blocker - DRILL-6453 + DRILL-6517:
> >>Two issues - Parquet Scanner (?) not setting container's record num
> (to zero), and a hang following this failure.
> >>Currently testing a fix / workaround ((Boaz))
> >>
> >> In Progress - DRILL-6104: Generic Logfile Format Plugin  ((Charles +
> Paul -- can you be done by 7/12 ?))
> >>
> >> PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg
> ((Parth + Boaz reviewing))
> >>
> >> Open - DRILL-6542: Index out of bounds ((Sorabh))
> >>
> >> Open - DRILL-6475: Unnest Null fieldId pointer ((Hanumath))
> >>
> >>  The following PRs are still waiting for reviews  
> >>
> >> DRILL-6583: UI usability issue ((Kunal / Sorabh))
> >>
> >> DRILL-6579: Add sanity checks to the Parquet Reader ((Salim / Vlad +
> Boaz))
> >>
> >> DRILL-6578: handle query cancellation in Parquet Reader ((Salim / Vlad
> + Boaz))
> >>
> >> DRILL-6560: Allow options for controlling the batch size per operator
> ((Salim / Karthik))
> >>
> >> DRILL-6559: Travis timing out ((Vitalii / Tim))
> >>
> >> DRILL-6496: VectorUtil.showVectorAccessibleContent does not log vector
> content ((Tim / Volodymyr))
> >>
> >> DRILL-6410: Memory Leak in Parquet Reader during cancellation ((Vlad /
> Parth))
> >>
> >> DRILL-6346: Create an Official Drill Docker Container ((Abhishek / Tim))
> >>
> >> DRILL-6179: Added pcapng-format support ((Vlad / Paul))
> >>
> >> DRILL-5796: Filter pruning for multi rowgroup parquet file ((Jean-Blas
> / Arina))
> >>
> >> DRILL-5365: FileNotFoundException when reading a parquet file ((Tim /
> Vitalii))
> >>
> >> ==
> >>
> >>Thanks,
> >>
> >>   Boaz
> >>
> >> On 7/6/18 2:51 PM, Pritesh Maker wrote:
> >>> Here is the release 1.14 dashboard (
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12332463
> ) and agile board (
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185
> )
> >>>
> >>> I believe Volodymyr is targeting DRILL-6422 (Guava update) for 1.15
> release so it shouldn't be blocking the release. So overall, we have 2 open
> bugs, 2 in progress bugs (+2 doc issues), and 12 in review (+1 ready to
> commit).
> >>>
> >>> If the reviewable commits won't be ready soon, can the developers
> please remove the 1.14 fix version for these issues.
> >>>
> >>> Pritesh
> >>>
> >>>
> >>>
> >>>
> >>> On 7/6/18, 11:54 AM, "Boaz Ben-Zvi" <b...@mapr.com> wrote:
> >>>
> >>>Current status: There's a blocker, and some work in progress
> that will
> >>>  stretch into next week.
> >>>   Current detail:
> >>>   ==
> >>>   Open/blocker - DRILL-6453 + DRILL-6517: Two issues - Parquet
> Scanner not setting record num (to zero), and a hang following this failure.
> >>>   In Progress - DRILL-6104: Generic Logfile Format Plugin
> >>>   PR - DRILL-6422: Update Guava to 23.0 and shade it
> >>>   PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in
> Streaming Agg (I'm reviewing)
> >>>Ready2Commit: DRILL-6519: Add String Distance and
> Phonetic Functions (Arina gave it a +1 ; is it "Ready-To-Commit" or waiting
> for more reviews ?)
> >>>Committed: DRILL-6570: Mentioned 

Re: [DISCUSS] 1.14.0 release

2018-07-13 Thread Arina Yelchiyeva
During implementing late limit 0 optimization, Bohdan has found one more
regression after Hash Join spill to disk.
https://issues.apache.org/jira/browse/DRILL-6606
Boaz please take a look.

Kind regards,
Arina

On Fri, Jul 13, 2018 at 4:34 AM Boaz Ben-Zvi  wrote:

>We are getting close to a Release Candidate, though some issues are
> still pending, and we need to make decisions soon.
>
> Soliciting opinions -- which of the following issues should be
> considered a RELEASE BLOCKER for 1.14:
>
> = OPEN ==
>
> OPEN - DRILL-6453 : TPCDS query 72 is Hanging (on a cluster)   (( Boaz,
> Salim ))
>
>  We still do not have a lead on the cause, nor a workaround to make
> this query run.
>
> OPEN - DRILL-6475: Query with UNNEST causes a Null Pointer .  (( Hanumath
> ))
>
> OPEN - DRILL-5495: convert_from causes ArrayIndexOutOfBounds exception.
> (( Vitalii ))
>
>  In Review ===
>
> DRILL-6589: Push Transitive Closure generated predicates past aggregates
> / projects ((Gautam / Vitalii))
>
> DRILL-6588: System table columns incorrectly marked as non-nullable
> ((Kunal / Aman))
>
> DRILL-6542: (May be Ready2Commit soon) IndexOutOfBounds exception for
> multilevel lateral ((Sorabh / Parth))
>
> DRILL-6517: (May be Ready2Commit soon) IllegalState exception in
> Hash-Join ((Boaz / Padma, Tim))
>
> DRILL-6496: VectorUtil.showVectorAccessibleContent does not log vector
> content ((Tim / Volodymyr))
>
> DRILL-6410: Memory Leak in Parquet Reader during cancellation ((Vlad /
> Parth))
>
> DRILL-6179: Added pcapng-format support ((Vlad / Paul))
>
> DRILL-5796: Filter pruning for multi rowgroup parquet file ((Jean-Blas /
> Arina))
>
> DRILL-5365: FileNotFoundException when reading a parquet file ((Tim /
> Vitalii))
>
> ==
>
>  Thanks,
>
>   -- Boaz
> p.s.
> There's a batch commit in process now with some of the PRs listed in
> the prior email.
>
> On 7/9/18 9:53 PM, Boaz Ben-Zvi wrote:
> >   We are making progress towards 1.14.
> >
> > Let's aim for a Release Candidate branch off on  Thursday (July 12)  !!!
> >
> > Below are the unfinished cases; can most be completed and checked in
> > by 7/12 ?
> >
> > (( Relevant people:
> >
> > Abhishek, Arina, Boaz, Charles, Hanumath, Jean-Blas, Karthik, Kunal,
> >
> > Parth, Paul, Salim, Sorabh, Tim, Vitalii, Vlad, Volodymyr ))
> >
> > ==
> >
> > Open/blocker - DRILL-6453 + DRILL-6517:
> >Two issues - Parquet Scanner (?) not setting container's record num
> > (to zero), and a hang following this failure.
> >Currently testing a fix / workaround ((Boaz))
> >
> > In Progress - DRILL-6104: Generic Logfile Format Plugin  ((Charles +
> > Paul -- can you be done by 7/12 ?))
> >
> > PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming
> > Agg ((Parth + Boaz reviewing))
> >
> > Open - DRILL-6542: Index out of bounds ((Sorabh))
> >
> > Open - DRILL-6475: Unnest Null fieldId pointer ((Hanumath))
> >
> >  The following PRs are still waiting for reviews  
> >
> > DRILL-6583: UI usability issue ((Kunal / Sorabh))
> >
> > DRILL-6579: Add sanity checks to the Parquet Reader ((Salim / Vlad +
> > Boaz))
> >
> > DRILL-6578: handle query cancellation in Parquet Reader ((Salim / Vlad
> > + Boaz))
> >
> > DRILL-6560: Allow options for controlling the batch size per operator
> > ((Salim / Karthik))
> >
> > DRILL-6559: Travis timing out ((Vitalii / Tim))
> >
> > DRILL-6496: VectorUtil.showVectorAccessibleContent does not log vector
> > content ((Tim / Volodymyr))
> >
> > DRILL-6410: Memory Leak in Parquet Reader during cancellation ((Vlad /
> > Parth))
> >
> > DRILL-6346: Create an Official Drill Docker Container ((Abhishek / Tim))
> >
> > DRILL-6179: Added pcapng-format support ((Vlad / Paul))
> >
> > DRILL-5796: Filter pruning for multi rowgroup parquet file ((Jean-Blas
> > / Arina))
> >
> > DRILL-5365: FileNotFoundException when reading a parquet file ((Tim /
> > Vitalii))
> >
> > ==
> >
> >Thanks,
> >
> >   Boaz
> >
> > On 7/6/18 2:51 PM, Pritesh Maker wrote:
> >> Here is the release 1.14 dashboard
> >> (
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12332463
> >> ) and agile board
> >> (
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185
> )
> >>
> >> I believe Volodymyr is targeting DRILL-6422 (Guava update) for 1.15
> >> release so it shouldn't be blocking the release. So overall, we have
> >> 2 open bugs, 2 in progress bugs (+2 doc issues), and 12 in review (+1
> >> ready to commit).
> >>
> >> If the reviewable commits won't be 

Re: [DISCUSS] 1.14.0 release

2018-07-13 Thread Arina Yelchiyeva
We cannot release with existing regressions, especially taking into account
that these are not minor issues.
As far as I understand, reverting is not an option since the hash join spill
feature was extended across several commits + subsequent fixes.
I guess we need to consider postponing the release until the issues are
resolved.

Kind regards,
Arina

On Fri, Jul 13, 2018 at 5:14 PM Boaz Ben-Zvi  wrote:

> (Guessing ...) It is possible that the root cause for DRILL-6606 is
> similar to that in  DRILL-6453 -- that is the new "early sniffing" in the
> Hash-Join, which repeatedly invokes next() on the two "children" of the
> join *during schema discovery* until non-empty data is returned (or NONE,
> STOP, etc).  Last night Salim, Vlad and I briefly discussed alternatives,
> like postponing the "sniffing" to a later time (beginning of the build for
> the right child, and beginning of the probe for the left child).
>
> However this would require some work time. So what should we do about 1.14
> ?
>
>   Thanks,
>
>   Boaz
>
> On Fri, Jul 13, 2018 at 3:46 AM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
>> During implementing late limit 0 optimization, Bohdan has found one more
>> regression after Hash Join spill to disk.
>> https://issues.apache.org/jira/browse/DRILL-6606
>> Boaz please take a look.
>>
>> Kind regards,
>> Arina
>>
>
>


Re: DRILL-6104 Question

2018-07-12 Thread Arina Yelchiyeva
Hi Charles,

I looked at your PR; it still has many things to address. Merging the code
without addressing those issues might not be the best idea.

Kind regards,
Arina

On Thu, Jul 12, 2018 at 5:37 AM Charles Givre  wrote:

> Hi Paul,
> Regarding the regex/log reader for Drill, since there are a lot of ways
> that people could actually use this feature, I wanted to ask if we might
> get this into 1.14 as an experimental or alpha feature.   I’ll keep working
> on this for 1.15 as well as a syslog (RFC-5424) format plugin which I
> intend to submit as well. Thoughts?
> — C


Re: [DISCUSS] 1.13.0 release

2018-03-12 Thread Arina Yelchiyeva
Parth,

looks like build on Travis is still running [1].

[1]
https://travis-ci.org/apache/drill/builds/352294695


Kind regards
Arina

On Mon, Mar 12, 2018 at 2:26 PM, Parth Chandra  wrote:

> Travis build is broken after the merge. I'm almost sure this was already
> addressed. @Volodymyr I thought this was part of the PR for DRILL-1491?
>
> On Sat, Mar 10, 2018 at 11:08 AM, Parth Chandra  wrote:
>
> > Update on DRILL-1491 -
> >   The illegal state transition (cancellation_requested -> enqueued) is no
> > longer happening for me after I updated my JDK to the latest version.
> I'll
> > try on and off over the weekend and if it cannot be reproduced, I'll let
> it
> > go.
> >
> > Release plan -
> >   I'll merge the PRs for DRILL-1491 and DRILL-4547 in a bit.
> >   I'm planning on rolling out release candidate RC0 on Monday morning
> > (PDT). I'll create a branch for the release.
> >   Apache master is still open for commits. (Try not to break anything).
> >
> >
>


Re: [DISCUSS] 1.13.0 release

2018-02-27 Thread Arina Yelchiyeva
I want to include DRILL-6174. It has already passed code review.

Regarding JDK8 support. Volodymyr Tkach is working on this issue. Currently
all unit tests have passed. Now he is working on enforcing Java 8 (changes
in travis.yml, drill-config.sh, pom.xml etc).

Regarding Drill on YARN, Salim has done the code review. The failing Travis
check is easy to fix. Tim has already proposed the solution (Paul just needs
to add the dependency).
I think it's safe to include these changes in this release. They go in a
separate module and have no impact on any existing functionality. Even if
there are some flaws, users will be able to give it a try and provide
feedback in case of issues.

On Tue, Feb 27, 2018 at 8:53 AM, Parth Chandra  wrote:

> There are two issues marked as blockers for 1.13.0:
>
> DRILL-6185 (Bug, Blocker, OPEN, created 26/Feb/18, fix version 1.13.0):
> "Error is displaying while accessing query profiles via the Web-UI" --
> reported by Anton Gozhiy, assigned to Kunal Khatua.
>
> DRILL-1491 (Task, Blocker, OPEN, created 03/Oct/14, fix version 1.13.0):
> "Support for JDK 8" -- reported by Aditya Kishore, assigned to Volodymyr
> Tkach.
>
> DRILL-6185 is definitely a blocker. We cannot do a release with a UI
> regression.
>
> From a quick perusal of DRILL-1491, JDK-8 support has issues that no one is
> looking at. Is someone addressing these and JIRA has not been updated?
>
> Also, DRILL-1170 (Drill-on-YARN, PR #1011) is failing Travis checks. Call
> me paranoid, but a 21K line PR with one review comment doesn't sound like
> the review is completed. Any committer looking at it?
>
>
>
>
> On Tue, Feb 27, 2018 at 9:28 AM, Ted Dunning 
> wrote:
>
> > There is a pcap improvement about to be ready. And a bug (with a fix)
> that
> > just turned up. I will push for a PR
> >
> > On Feb 23, 2018 00:14, "Parth Chandra"  wrote:
> >
> > > Bit of a tepid response from dev; but Aman's approval is all the
> > > encouragement I need to roll out a release :)
> > >
> > > Thoughts on pending PRs?
> > >
> > >
> > >
> > >
> > > On Thu, Feb 22, 2018 at 9:54 PM, Aman Sinha 
> > wrote:
> > >
> > > > Agreed...it would be good to get the ball rolling on the 1.13.0
> > release.
> > > > Among other things, this release
> > > > has the long pending Calcite rebase changes and the sooner we get it
> it
> > > out
> > > > for users, the better.
> > > >
> > > > Thanks for volunteering !
> > > >
> > > > -Aman
> > > >
> > > > On Wed, Feb 21, 2018 at 9:03 PM, Parth Chandra 
> > > wrote:
> > > >
> > > > > Hello Drillers,
> > > > >
> > > > >   I feel we might benefit from a early release for 1.13.0. We took
> > > longer
> > > > > to do the previous release so it would be nice to bring the release
> > > train
> > > > > back on track.
> > > > >
> > > > >   I'll volunteer (!) to manage the release :)
> > > > >
> > > > >   What do you guys think?
> > > > >
> > > > >   If we are in agreement on starting the release cycle and there
> are
> > > any
> > > > > issues on which work is in progress, that you feel we *must*
> include
> > in
> > > > the
> > > > > release, please post in reply to this thread. Let's at least get a
> > head
> > > > > start on closing pending PRs since these are usually what delays
> > > > releases.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Parth
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Committer: Kunal Khatua

2018-02-27 Thread Arina Yelchiyeva
Congrats, Kunal!

On Tue, Feb 27, 2018 at 6:42 PM, Aman Sinha  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Kunal
> Khatua  to become a committer, and we are pleased to announce that he
> has accepted.
>
> Over the last couple of years, Kunal has made substantial contributions to
> the process of creating and interpreting of query profiles, among other
> code contributions. He has led the efforts for Drill performance evaluation
> and benchmarking.  He is a prolific writer on the user mailing list,
> providing detailed responses.
>
> Welcome Kunal, and thank you for your contributions.  Keep up the good
> work !
>
> - Aman
> (on behalf of the Apache Drill PMC)
>


Re: Avro storage format behaviour

2018-03-01 Thread Arina Yelchiyeva
As Paul has mentioned in PR [1], when we move to the new scan framework it will
handle implicit columns for all file readers.
I guess until then let's treat avro like other file formats (for example,
parquet) so users can benefit from implicit columns for this format as well.

[1] https://github.com/apache/drill/pull/1138

On Wed, Feb 28, 2018 at 7:47 PM, Vova Vysotskyi  wrote:

> Hi all,
>
> I am working on DRILL-4120: dir0 does not work when the directory structure
> contains Avro files.
>
> In DRILL-3810, validation of a query against the avro schema was added
> before the query starts executing.
> Therefore with these changes Drill throws an exception when the
> query contains a non-existent column and the table has avro format.
> Other storage formats such as json or parquet allow usage of non-existing
> fields.
>
> So here is my question: should we continue to treat avro as a format with
> a fixed schema, or should we start treating avro as a dynamic format to be
> consistent with other storage formats?
>
> --
> Kind regards,
> Volodymyr Vysotskyi
>


Re: [VOTE] Apache Drill release 1.13.0 - RC0

2018-03-15 Thread Arina Yelchiyeva
- Built from the source [4] on Linux, ran unit tests.
- Downloaded the binary tarball [2], untarred and ran Drill in embedded
mode on Windows.
- Ran sample queries, checked system tables, profiles on Web UI, also logs
and index page.
- Created persistent and temporary tables, loaded custom UDFs.

+1 (binding)

Kind regards
Arina

On Thu, Mar 15, 2018 at 1:39 AM, Aman Sinha  wrote:

> - Downloaded the source tarball from [2] on my Linux VM, built and ran the
> unit tests successfully
> - Downloaded the binary tarball onto my Macbook, untarred and ran Drill in
> embedded mode
> - Ran several queries  against a TPC-DS SF1 data set, including CTAS
> statements with PARTITION BY and ran a few partition pruning queries
> - Tested query cancellation by cancelling a query that was taking a long time
> due to an expanding join
> - Examined the run-time query profiles of these queries with and without
> parallelism.
> - Checked the maven artifacts on [3].
>
>  - Found one reference to JDK 7 : README.md says 'JDK 7' in the
> Prerequisites.  Ideally, this should be changed to JDK 8
>
> Overall, LGTM  +1 (binding)
>
>
> On Tue, Mar 13, 2018 at 3:58 AM, Parth Chandra  wrote:
>
> > Hi all,
> >
> > I'd like to propose the first release candidate (RC0) of Apache Drill,
> > version 1.13.0.
> >
> > The release candidate covers a total of 113 resolved JIRAs [1]. Thanks
> > to everyone
> > who contributed to this release.
> >
> > The tarball artifacts are hosted at [2] and the maven artifacts are
> hosted
> > at
> > [3].
> >
> > This release candidate is based on commit
> > cac2882d5a9e22fbc251e4caf622fe30242ad557 located at [4].
> >
> > Please download and try out the release.
> >
> > The vote ends at 1:00 PM UTC (5:00 AM PDT, 2:00 PM EET, 5:30 PM IST), Mar
> > 16th, 2018
> >
> > [ ] +1
> > [ ] +0
> > [ ] -1
> >
> > Here's my vote: +1
> >
> >
> > [1 ]
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > projectId=12313820&version=12342096
> >
> > [2] http://home.apache.org/~parthc/drill/releases/1.13.0/rc0/
> >
> > [3] https://repository.apache.org/content/repositories/
> orgapachedrill-1046
> >
> > [4] https://github.com/parthchandra/drill/tree/drill-1.13.0
> >
>


Re: License Header FYI

2018-04-18 Thread Arina Yelchiyeva
Yes, until https://github.com/apache/drill/pull/1215 is merged in.

On Wed, Apr 18, 2018 at 8:51 AM, Abhishek Girish  wrote:

> Hey Tim,
>
> I tried building master and encountered an error:
>
> mvn clean install -U -Pmapr -Drat.skip=false -Dlicense.skip=false
>
> ...
>
> [ERROR] Failed to execute goal com.mycila:license-maven-plugin:3.0:check
> > (default) on project drill-root: Some files do not have the expected
> > license header -> [Help 1]
>
>
> Is that expected?
>
> On Tue, Apr 17, 2018 at 1:41 PM, Timothy Farkas  wrote:
>
> > Hi All,
> >
> > Recently the license formatting checks have become stricter and all the
> > license headers have been reformatted. The main benefit from this is that
> > it is no longer allowed to have license headers in java doc comments.
> This
> > will help keep our javadocs clean when we publish them. By default
> license
> > checks are disabled, but they are enabled for Travis. To manually enable
> > license checks locally add -Drat.skip=false and -Dlicense.skip=false args
> > to your maven command. Also to automatically add license headers to your
> > new files do mvn license:format
> >
> > For the next couple days please manually check the license headers for
> > your PRs. Also if you regenerate classes in drill/protocol please
> manually
> > run mvn license:format to add the license headers. This is necessary
> > because my last change broke Travis and auto formatting of licenses for
> > generated classes, but this will be fixed after
> https://github.com/apache/
> > drill/pull/1215 is merged.
> >
> > Thanks,
> > Tim
> >
>


Re: gitbox?

2018-04-18 Thread Arina Yelchiyeva
Thanks, Parth, that would be really helpful.

On Wed, Apr 18, 2018 at 4:38 AM, Parth Chandra  wrote:

> Hi Drill devs
>
>   If no one has any objections I will open the Apache infra request to move
> to gitbox. Once this is set up, committers will be able to merge/commit
> without ever leaving github.
>
> Thanks
>
> Parth
>
> On Sun, Nov 12, 2017 at 8:00 AM, Kunal Khatua  wrote:
>
> > My bad... I was trying to go deeper into the specifics of GitBox via
> > Google and mostly client related results came up.
> >
> > Thanks!
> >
> > -Original Message-
> > From: Uwe L. Korn [mailto:uw...@xhochy.com]
> > Sent: Sunday, November 12, 2017 3:20 AM
> > To: dev@drill.apache.org
> > Subject: Re: gitbox?
> >
> > Note that this discussion is about the new Apache server-side Git
> services
> > https://gitbox.apache.org/ and not about any specific client.
> >
> > We are very happy with it in the Arrow and I can recommend switching to
> > any other Apache project as soon as possible.
> >
> > Uwe
> >
> > > Am 12.11.2017 um 09:08 schrieb Kunal Khatua :
> > >
> > > Has anyone tried GitKraken? It's a cross platform client that's proven
> > to be pretty reliable for me for close to a year.
> > >
> > > My concern is that GitBox is exclusive to running on Mac.
> > >
> > > -Original Message-
> > > From: Parth Chandra [mailto:par...@apache.org]
> > > Sent: Tuesday, October 31, 2017 2:52 PM
> > > To: dev 
> > > Subject: Re: gitbox?
> > >
> > > Gitbox allows committers to streamline the review and merge process. It
> > provides a single button in github to merge pull requests to the Apache
> > Drill mirror on github. This is then synchronized seamlessly with the
> > Apache master.
> > >
> > > The process would still require a committer to 1) review code, 2) run
> > the functional tests if doing a batch commit.
> > >
> > > Many other Apache projects have already moved to using gitbox.
> > >
> > >
> > >
> > >> On Tue, Oct 31, 2017 at 11:25 AM, Kunal Khatua 
> > wrote:
> > >>
> > >> For those of us that missed the hangout, can we get the minutes of
> > >> the meeting? Would help in deciding on the vote rather than be an
> > absentee.
> > >>
> > >> -Original Message-
> > >> From: Parth Chandra [mailto:par...@apache.org]
> > >> Sent: Tuesday, October 31, 2017 10:54 AM
> > >> To: dev 
> > >> Subject: Re: gitbox?
> > >>
> > >> Bumping this thread up.
> > >>
> > >> Vlad brought this up in the hangout today and it sounds like we would
> > >> like to move to Gitbox. Thanks Vlad for the patient explanations!
> > >>
> > >> Committers, let's use this thread to vote on the the suggestion.
> > >>
> > >> I'm +1 on moving to gitbox.
> > >>
> > >> Also, I can work with Vlad and Paul on updating the merge process
> > document.
> > >>
> > >>
> > >>
> > >>> On Wed, Aug 30, 2017 at 1:34 PM, Vlad Rozov 
> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> As I am new to Drill, I don't know if migration from "Git WiP" (
> > >>> https://git-wip-us.apache.org) to "Github Dual Master" (
> > >>> https://gitbox.apache.org) was already discussed by the community,
> > >>> but from my Apache Apex experience I would recommend to consider
> > >>> migrating Drill ASF repos to the gitbox. Such move will give
> > >>> committers write access to the Drill repository on Github with all
> > >>> the perks that Github
> > >> provides.
> > >>>
> > >>> Thank you,
> > >>>
> > >>> Vlad
> > >>>
> > >>
> >
>


Re: License Header FYI

2018-04-19 Thread Arina Yelchiyeva
I also agree that these checks should be enabled by default.
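
To summarize the commands from this thread in one place (a sketch only; the
exact flags depend on the project's pom configuration at the time):

```sh
# Build with the RAT and license-header checks enabled
# (both are skipped by default, but enabled on Travis):
mvn clean install -Drat.skip=false -Dlicense.skip=false

# Automatically (re)add license headers to new or regenerated files:
mvn license:format -Dlicense.skip=false
```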

On Thu, Apr 19, 2018 at 2:07 AM, Timothy Farkas  wrote:

> Looks like I gave the wrong command. Just tested this incantation and it
> worked on master.
>
> mvn license:format -Dlicense.skip=false
>
> When I removed the license header on Drillbit.java it was added back again.
>
>
>
> 
> From: Parth Chandra 
> Sent: Wednesday, April 18, 2018 3:54:36 PM
> To: dev
> Subject: Re: License Header FYI
>
> mvn license:format  (not man license:format ) seems to not do anything.
>
> On Wed, Apr 18, 2018 at 3:53 PM, Parth Chandra  wrote:
>
> > man license:format does not seem to be doing anything.
> > Also, IMO it would be a good idea to enable rat checks by default. At the
> > very least we need to make sure that the license headers are there before
> > we check anything in.
> >
> > On Tue, Apr 17, 2018 at 1:41 PM, Timothy Farkas 
> wrote:
> >
> >> Hi All,
> >>
> >> Recently the license formatting checks have become stricter and all the
> >> license headers have been reformatted. The main benefit from this is
> that
> >> it is no longer allowed to have license headers in java doc comments.
> This
> >> will help keep our javadocs clean when we publish them. By default
> license
> >> checks are disabled, but they are enabled for Travis. To manually enable
> >> license checks locally add -Drat.skip=false and -Dlicense.skip=false
> args
> >> to your maven command. Also to automatically add license headers to your
> >> new files do mvn license:format
> >>
> >> For the next couple days please manually check the license headers for
> >> your PRs. Also if you regenerate classes in drill/protocol please
> manually
> >> run mvn license:format to add the license headers. This is necessary
> >> because my last change broke Travis and auto formatting of licenses for
> >> generated classes, but this will be fixed after
> >> https://github.com/apache/drill/pull/1215 is merged.
> >>
> >> Thanks,
> >> Tim
> >>
> >
> >
>


Re: [DISCUSS] 1.13.0 release

2018-03-04 Thread Arina Yelchiyeva
Merged Drill on Yarn. Paul thanks for making the changes.

Regarding JDK8 there is an open PR (
https://github.com/apache/drill/pull/1143) which needs to be merged.
As Volodymyr mentioned there were some issues with unit tests and Travis.
Padma and I have addressed unit tests issues but we still need to check if
problems with Travis are solved.

@Volodymyr please rebase on the latest master and check.

Kind regards
Arina

On Sun, Mar 4, 2018 at 3:46 PM, Parth Chandra  wrote:

> Sounds good.
>
> On Sun, Mar 4, 2018 at 1:18 PM, Paul Rogers 
> wrote:
>
> > Hi Parth,
> > Issues with DRILL-1170 are resolved. Needs one final review by Arina,
> then
> > we should be good to go.
> > Thanks to everyone for getting the other two "batch size" PRs committed
> > recently.
> > Thanks,
> > - Paul
> >
> >
> >
> > On Saturday, March 3, 2018, 10:57:10 PM PST, Parth Chandra <
> > par...@apache.org> wrote:
> >
> >  Thank you Arina !
> >
> > Build and unit tests with JDK 8 passed for me with the latest master. I'm
> > guessing there is still something more to be addressed, since DRILL-1491
> is
> > still open?
> >
> > Updated list:
> >
> > DRILL-1491:  Support for JDK 8 -- *In progress.*
> > DRILL-1170: YARN support for Drill -- Needs Committer +1 and Travis fix.
> > DRILL-6027: Implement spill to disk for the Hash Join  --- No PR and is a
> > major feature that should be reviewed (properly!).
> > DRILL-6173: Support transitive closure during filter push down and
> > partition pruning.  -- No PR and depends on 3 Apache Calcite issues that
> > are open.
> > DRILL-6023: Graceful shutdown improvements --  Marked as 1.14.0 (?)
> >
> >
> >
> > On Sun, Mar 4, 2018 at 12:35 AM, Arina Ielchiieva 
> > wrote:
> >
> > > I will take care of  FunctionInitializerTest#
> > > testConcurrentFunctionBodyLoad
> > > in DRILL-6208.
> > >
> > > Also I have merged all Jiras with ready-to-commit label.
> > > Drill on Yarn was NOT merged due to banned dependencies issue.
> > >
> > > Kind regards
> > > Arina
> > >
> > >
> > > On Sat, Mar 3, 2018 at 12:11 PM, Parth Chandra 
> > wrote:
> > >
> > > > Thanks for the updates guys. Please keep updating the status in this
> > > > thread.
> > > >
> > > > On Fri, Mar 2, 2018 at 11:18 PM, Padma Penumarthy <
> > ppenumar...@mapr.com>
> > > > wrote:
> > > >
> > > > > I can look at failure in testFlattenUpperLimit. I added this test
> > > > recently.
> > > > >
> > > > > Thanks
> > > > > Padma
> > > > >
> > > > >
> > > > > On Mar 2, 2018, at 8:19 AM, Volodymyr Tkach  wrote:
> > > > >
> > > > > testFlattenUpperLimit
> > > > >
> > > > >
> > > >
> > >
> >
> >
>


Re: [DISCUSS] 1.13.0 release

2018-02-26 Thread Arina Yelchiyeva
I remember that at the beginning of the year we discussed that Drill on
Yarn and the transition to JDK 8 should be in the 1.13 release.
Before doing the release we need to make sure all of these are done and merged.
The Drill on Yarn PR is reviewed but requires some fixes. Not sure about JDK 8.


On Mon, Feb 26, 2018 at 5:50 PM, Parth Chandra  wrote:

> Since there don't appear to be many PRs that folks want merged in, I'm
> thinking of rolling out the release candidate on March 1st. That should
> give folks who want to get stuff in at the last minute enough time. Note
> that I'm on Indian time so I'll be half a day ahead of most other folks.
> Charles, that gives you your  deadline :)
>
> Parth
>
>
>
> On Fri, Feb 23, 2018 at 4:20 PM, Charles Givre  wrote:
>
> > I agree and I’ll try to get the log file PR done for this release.
> >
> > Sent from my iPhone
> >
> > > On Feb 23, 2018, at 00:14, Parth Chandra  wrote:
> > >
> > > Bit of a tepid response from dev; but Aman's approval is all the
> > > encouragement I need to roll out a release :)
> > >
> > > Thoughts on pending PRs?
> > >
> > >
> > >
> > >
> > >> On Thu, Feb 22, 2018 at 9:54 PM, Aman Sinha 
> > wrote:
> > >>
> > >> Agreed...it would be good to get the ball rolling on the 1.13.0
> release.
> > >> Among other things, this release
> > >> has the long pending Calcite rebase changes and the sooner we get it
> > out
> > >> for users, the better.
> > >>
> > >> Thanks for volunteering !
> > >>
> > >> -Aman
> > >>
> > >>> On Wed, Feb 21, 2018 at 9:03 PM, Parth Chandra 
> > wrote:
> > >>>
> > >>> Hello Drillers,
> > >>>
> > >>>  I feel we might benefit from an early release for 1.13.0. We took
> > longer
> > >>> to do the previous release so it would be nice to bring the release
> > train
> > >>> back on track.
> > >>>
> > >>>  I'll volunteer (!) to manage the release :)
> > >>>
> > >>>  What do you guys think?
> > >>>
> > >>>  If we are in agreement on starting the release cycle and there are
> any
> > >>> issues on which work is in progress, that you feel we *must* include
> in
> > >> the
> > >>> release, please post in reply to this thread. Let's at least get a
> head
> > >>> start on closing pending PRs since these are usually what delays
> > >> releases.
> > >>>
> > >>> Thanks
> > >>>
> > >>> Parth
> > >>>
> > >>
> >
>


Re: Deprecation of BaseTestQuery FYI

2018-06-28 Thread Arina Yelchiyeva
Hi Tim,

it looks like deprecating BaseTestQuery was a little bit premature.
For example, in this PR - https://github.com/apache/drill/pull/1331 -
Charles is trying to re-work BaseTestQuery usage to ClusterTest.
First, it did not contain a getSigletonDouble method, which Charles has
implemented. Now he is having trouble implementing a getSigletonBoolean
method, which might be due to reader limitations.
Also I am not quite clear on how we can verify column names and multiple
columns in the result.
For example:

testBuilder()
  .sqlQuery("select (mi || lname) as CONCATOperator, mi, lname,
concat(mi, lname) as CONCAT from concatNull")
  .ordered()
  .baselineColumns("CONCATOperator", "mi", "lname", "CONCAT")
  .baselineValues("A.Nowmer", "A.", "Nowmer", "A.Nowmer")
  .baselineValues("I.Whelply", "I.", "Whelply", "I.Whelply")
  .baselineValues(null, null, "Derry", "Derry")
  .baselineValues("J.Spence", "J.", "Spence", "J.Spence")
  .build().run();

Can you please suggest how this example can be re-written?

Kind regards,
Arina

On Mon, Jun 25, 2018 at 11:10 PM Timothy Farkas  wrote:

> Hi All,
>
> BaseTestQuery was deprecated a while ago. Keeping it short and sweet :), if
> you want to use BaseTestQuery directly, don't. Use ClusterTest instead. If
> you are using PlanTestBase for planner tests, continue to do so. Eventually
> PlanTestBase will be changed to extend ClusterTest instead. There is a JIRA
> to track that issue https://issues.apache.org/jira/browse/DRILL-6536.
>
> Thanks,
> Tim
>


Re: Discussion about the metadata design

2018-06-28 Thread Arina Yelchiyeva
Hi,

Vitalii and Vova are also looking at this part, you might want to sync up
with them. Or even better, we can create a Jira for this and hold all
discussions there.
Vitalii, what do you think?

Kind regards,
Arina

On Thu, Jun 28, 2018 at 6:46 PM weijie tong  wrote:

> HI all:
>
> As @aman ever noticed me about the roadmap of DRILL-2.0 ,which includes
> the description of  the metadata design (
>
> https://lists.apache.org/thread.html/74cf48dd78d323535dc942c969e72008884e51f8715f4a20f6f8fb66@%3Cdev.drill.apache.org%3E
> )
> , I am interested in taking the role to implement the metadata part.
> Here I fire this discussion thread to know your idea about this problem.
>
> I have investigated some open source projects about metadata, such
> as Hive Metastore (
> https://cwiki.apache.org/confluence/display/Hive/Design#Design-Metastore),
> Netflix Metacat, Apache Atlas, and LinkedIn WhereHows (
> https://github.com/linkedin/WhereHows). Except for Hive Metastore, these
> projects have a high-level abstraction over the actual physical metadata,
> which makes it easy to extend them with new metadata properties. Hive
> Metastore's design is tied to the physical metadata, with a Thrift interface
> for different languages, but it depends on a relational database, which is
> not good for scale and performance. In my opinion, I would prefer Hive
> Metastore as our design template, or just reuse it, as we don't need a rich
> metadata management system. Maybe we should change the backend database to
> a high query performance kv store like HBase.
>
>    Besides the metadata interface design and the backend storage choice, we
> should also provide random query ability, so users can calculate
> statistics like NDV to store in the metadata. Btw, maybe we can go further
> and take in VerdictDB (https://github.com/mozafari/verdictdb) to
> provide richer approximate query processing.
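
NDV (number of distinct values) is the per-column statistic mentioned above; as
a toy illustration only (not Drill code - a real metastore would estimate NDV
with sketches such as HyperLogLog rather than exact sets), computing it can
look like this:

```python
from collections import defaultdict

def column_ndv(rows):
    """Exact number of distinct values (NDV) per column.

    rows: iterable of dicts mapping column name -> value. Real engines
    approximate this with sketches to avoid holding every distinct value
    in memory.
    """
    distinct = defaultdict(set)
    for row in rows:
        for col, val in row.items():
            distinct[col].add(val)
    return {col: len(vals) for col, vals in distinct.items()}

sample = [
    {"region": "us", "amount": 10},
    {"region": "eu", "amount": 10},
    {"region": "us", "amount": 30},
]
print(column_ndv(sample))  # {'region': 2, 'amount': 2}
```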
>


Re: Scan mechanism PR

2018-10-12 Thread Arina Yelchiyeva
Cool, I'll do the review.

Kind regards,
Arina

On Fri, Oct 12, 2018 at 9:31 AM Paul Rogers 
wrote:

> Hi Arina,
>
> Just did another PR towards completing the scan operator revision to work
> with the "result set loader." This one is mostly plumbing to implement
> projection with the scan operator. It generalizes lots of code that already
> exists into a single, unified mechanism.
>
> Basically, this one takes care of mapping from the data source's schema to
> the projection list provided by the query (empty, wildcard or list of
> columns). It provides mechanisms for the all-null columns (our famous
> nullable INT), for the implicit columns and so on. This particular solution
> ensures that the data source only worries about populating its own vectors;
> it does not worry about Drill-specific columns (nulls or implicit), nor
> does it worry about projection if it is just reading a set of records with
> a fixed schema.
>
> This PR includes the foundation for file-level schema support. The idea is
> that the scan operator will ask each reader if it has an up-front schema.
> Something like Parquet or JDBC can get the schema from the data (or data
> source) itself. Something like JSON could get the schema from a schema
> file, or information passed along with the reader's physical plan (like
> what JC did for MsgPack.)
>
> The mechanism still allows schemas to be "discovered" on the fly, and has
> quite a bit of code to handle the many bizarre cases that can occur (and
> that we've been discussing.) This is called "schema smoothing" trying to
> handle the case that column x appears in, say, file 1, but not in file 2,
> and shows up again in file 3.
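
The smoothing idea can be illustrated with a toy sketch (hypothetical, not the
actual Drill mechanism): keep a running output schema across files and
null-fill columns that a given file lacks:

```python
def smooth_schema(file_schemas):
    """Merge per-file column lists into one output schema, first-seen order."""
    merged = []
    for cols in file_schemas:
        for col in cols:
            if col not in merged:
                merged.append(col)
    return merged

def project_row(row, schema):
    """Project a row onto the smoothed schema; missing columns become None."""
    return {col: row.get(col) for col in schema}

# Column "x" appears in file 1, is missing in file 2, reappears in file 3.
schema = smooth_schema([["a", "x"], ["a"], ["a", "x"]])
print(schema)                         # ['a', 'x']
print(project_row({"a": 1}, schema))  # {'a': 1, 'x': None}
```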
>
> The next PR will assemble this stuff into a scan framework, after which I
> can add the three readers: mock, delimited and JSON.
>
> My goal is that, with the scan framework, and the CSV and JSON examples,
> that the team can retrofit other readers as the need arises.
>
> The entire mechanism, and the design goals behind it, are documented in
> [1].
>
> Thanks,
> - Paul
>
> [1] https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades
>
>
>
>
>
>
> On Thursday, October 11, 2018, 2:51:22 AM PDT, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
>  Paul,
>
> sounds good. I like the idea of mock scanner being done first, since
> besides csv and json, other readers would have to be updated as well.
> Could you please share Jira number(-s) if any so I can follow them?
>
> Kind regards,
> Arina
>
>
>


Re: [DISCUSSION] CI for Drill

2018-10-12 Thread Arina Yelchiyeva
Vitalii,

in this case I think it's ok to merge the CircleCI configs. Could you please
share how to set up CircleCI for custom builds?
Also could you follow up on the INFRA ticket in case there is any response?

Kind regards,
Arina

On Thu, Oct 11, 2018 at 2:20 PM Vitalii Diravka  wrote:

> Hi all!
>
> I have opened a PR adding CircleCI configs for the Drill build [1],
> and a ticket [2] for INFRA to set up CircleCI for Apache Drill.
> But then I've noticed that INFRA can't allow write access for a 3rd party
> (Apache Arrow + CircleCI [3]).
> So here are two ways:
> * merge it, so CircleCI builds will work for Drill forks only;
> * try to help INFRA enable CircleCI for the Apache Drill main repo via
> configuring CircleCI webhooks [4].
>
> I think we can proceed with both of them, since even just merging
> .circleci into Drill will be useful for the forks
> of committers and contributors (like in Apache Cassandra [5]).
> Thoughts?
>
> [1] https://github.com/apache/drill/pull/1493
> [2] https://issues.apache.org/jira/browse/INFRA-17133
> [3]
>
> https://issues.apache.org/jira/browse/INFRA-15964?focusedCommentId=16351422&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351422
> [4]
>
> https://issues.apache.org/jira/browse/INFRA-12197?focusedCommentId=15652850&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15652850
> [5] https://github.com/apache/cassandra/tree/trunk/.circleci
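
A minimal CircleCI config along these lines might look like the sketch below
(hypothetical values: the executor image, resource class, and build command
would need to match what the PR [1] actually uses):

```yaml
# .circleci/config.yml - illustrative sketch only
version: 2
jobs:
  build:
    docker:
      - image: circleci/openjdk:8-jdk   # assumed JDK 8 build image
    resource_class: large               # medium's 4 GB RAM limit was the issue
    steps:
      - checkout
      - run:
          name: Build and run unit tests
          command: mvn clean install
workflows:
  version: 2
  build_and_test:
    jobs:
      - build
```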
>
> On Wed, Sep 12, 2018 at 3:41 PM Vitalii Diravka  >
> wrote:
>
> > The current issue with CircleCI is the default RAM limit of the medium
> > (default) instance for Docker images - 4 GB [1].
> > It can be expanded by using a VM instead of a Docker image, or possibly
> > the CircleCI team can provide us a bigger instance for it [2]
> >
> > I have created the Jira ticket for it [3]. Further discussion can be
> > continued there.
> >
> > [1]
> https://circleci.com/docs/2.0/configuration-reference/#resource_class
> > [2] https://circleci.com/pricing/
> > [3] https://issues.apache.org/jira/browse/DRILL-6741
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Wed, Sep 12, 2018 at 12:27 PM Arina Yelchiyeva <
> > arina.yelchiy...@gmail.com> wrote:
> >
> >> +1, especially if other Apache project uses it, there should not be any
> >> issues with Apache.
> >>
> >> Kind regards,
> >> Arina
> >>
> >> On Wed, Sep 12, 2018 at 12:36 AM Timothy Farkas 
> wrote:
> >>
> >> > +1 For trying out Circle CI. I've used it in the past, and I think the
> >> UI
> >> > is much better than Travis.
> >> >
> >> > Tim
> >> >
> >> > On Tue, Sep 11, 2018 at 8:21 AM Vitalii Diravka <
> >> vitalii.dira...@gmail.com
> >> > >
> >> > wrote:
> >> >
> >> > > Recently we discussed Travis build failures and more tests were
> >> > > excluded to make Travis happy [1]. But it looks like the issue has
> >> > > returned and the Travis build fails intermittently.
> >> > >
> >> > > I tried to find another solution instead of excluding Drill unit tests
> >> > > and found another good CI - CircleCI [2]. Looks like this CI will allow
> >> > > running all unit tests successfully.
> >> > > And it offers good conditions for open-source projects [3] (even OS
> X
> >> > > environment is available).
> >> > > The example of Apache project, which uses this CI is Apache
> Cassandra
> >> [4]
> >> > >
> >> > > My quick set-up of CircleCI for Drill still fails, but it just needs
> >> > > to be configured properly [5].
> >> > >
> >> > > I think we can try CircleCI in parallel with Travis and if it works
> >> well,
> >> > > we will move completely to CircleCI.
> >> > > Does it make sense? Maybe somebody faced with it and knows some
> >> > limitations
> >> > > or complexities?
> >> > >
> >> > > [1]
> >> > >
> >> >
> >>
> https://issues.apache.org/jira/browse/DRILL-6559
> >> > > [2]
> >> > >
> >> > >
> >> >
> >>
> https://urldefense.proofpoint.com/v2/url?u=http

Re: November Apache Drill board report

2018-11-01 Thread Arina Yelchiyeva
Thanks, Aman!  Updated the report.
I went too far with 2019, luckily the meet up will be much earlier :)

=

 ## Description:
 - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
Storage.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - Since the last board report, Drill has released version 1.14.0,
   including the following enhancements:
- Drill in a Docker container
- Image metadata format plugin
- Upgrade to Calcite 1.16.0
- Kafka plugin push down support
- Phonetic and String functions
- Enhanced decimal data support
- Spill to disk for the Hash Join support
- CGROUPs resource management support
 - There were active discussions about schema provision in Drill.
   Based on these discussions two projects are currently evolving:
   Drill metastore and schema provision in the file and in a query.
 - An Apache Drill book has been written by two PMC members (Charles and Paul).
 - Drill developer meet up will be held on November 14, 2018.
   The following areas are going to be discussed:
- Storage plugins
- Schema discovery & Evolution
- Metadata Management
- Resource management
- Integration with Apache Arrow

## Health report:
 - The project is healthy. Development activity
   as reflected in the pull requests and JIRAs is good.
 - Activity on the dev and user mailing lists is stable.
 - Three committers and three new PMC members were added in the last period.

## PMC changes:

 - Currently 23 PMC members.
 - New PMC members:
- Boaz Ben-Zvi was added to the PMC on Fri Aug 17 2018
- Charles Givre was added to the PMC on Mon Sep 03 2018
- Vova Vysotskyi was added to the PMC on Fri Aug 24 2018

## Committer base changes:

 - Currently 48 committers.
 - New committers:
- Chunhui Shi was added as a committer on Thu Sep 27 2018
- Gautam Parai was added as a committer on Mon Oct 22 2018
- Weijie Tong was added as a committer on Fri Aug 31 2018

## Releases:

 - 1.14.0 was released on Sat Aug 04 2018

## Mailing list activity:

 - dev@drill.apache.org:
- 427 subscribers (down -6 in the last 3 months):
- 2827 emails sent to list (2126 in previous quarter)

 - iss...@drill.apache.org:
- 18 subscribers (down -1 in the last 3 months):
- 3487 emails sent to list (4769 in previous quarter)

 - u...@drill.apache.org:
- 597 subscribers (down -6 in the last 3 months):
- 332 emails sent to list (346 in previous quarter)


## JIRA activity:

 - 164 JIRA tickets created in the last 3 months
 - 128 JIRA tickets closed/resolved in the last 3 months

On Thu, Nov 1, 2018 at 6:20 PM Aman Sinha  wrote:

>Docket container  ==> 'Docker'
>November 14, 2019  ==>  2018  :)   (this is wrong in email that was sent
> out)
>
> Rest LGTM.
>
> On Thu, Nov 1, 2018 at 6:42 AM Arina Ielchiieva  wrote:
>
> > Hi all,
> >
> > please take a look at the draft board report for the last quarter and let
> > me know if you have any comments.
> >
> > Thanks,
> > Arina
> >
> > =
> >
> >  ## Description:
> >  - Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud
> > Storage.
> >
> > ## Issues:
> >  - There are no issues requiring board attention at this time.
> >
> > ## Activity:
> >  - Since the last board report, Drill has released version 1.14.0,
> >including the following enhancements:
> > - Drill in a Docket container
> > - Image metadata format plugin
> > - Upgrade to Calcite 1.16.0
> > - Kafka plugin push down support
> > - Phonetic and String functions
> > - Enhanced decimal data support
> > - Spill to disk for the Hash Join support
> > - CGROUPs resource management support
> >  - There were active discussions about schema provision in Drill.
> >Based on these discussions two projects are currently evolving:
> >Drill metastore and schema provision in the file and in a query.
> >  - Apache Drill book has been written by two PMC members (Charles and
> > Paul).
> >  - Drill developer meet up will be held on November 14, 2019.
> >The following areas are going to be discussed:
> > - Storage plugins
> > - Schema discovery & Evolution
> > - Metadata Management
> > - Resource management
> > - Integration with Apache Arrow
> >
> > ## Health report:
> >  - The project is healthy. Development activity
> >as reflected in the pull requests and JIRAs is good.
> >  - Activity on the dev and user mailing lists are stable.
> >  - Three committers and three new PMC members were added in the last
> period.
> >
> > ## PMC changes:
> >
> >  - Currently 23 PMC members.
> >  - New PMC members:
> > - Boaz Ben-Zvi was added to the PMC on Fri Aug 17 2018
> > - Charles Givre was added to the PMC on Mon Sep 03 2018
> > - Vova Vysotskyi was added to the PMC on Fri Aug 24 2018
> >
> > ## Committer base changes:
> >
> >  - Currently 48 committers.
> >  - New committers:
> > - Chunhui Shi was added as a committer on Thu Sep 27 2018
> > - Gautam Parai was added as a committer on Mon Oct 22 2018
> > - 

Re: [HANGOUT] 29th Oct 2018 (9PM PST)

2018-10-30 Thread Arina Yelchiyeva
Is there a recording of the meeting?

Kind regards,
Arina 

> On Oct 30, 2018, at 7:19 AM, weijie tong  wrote:
> 
> Hi :
> Thanks for the invitation. Here is slide: JPPD
> 
> 
>> On Tue, Oct 30, 2018 at 12:12 PM Pritesh Maker  wrote:
>> 
>> Hi,
>> 
>> Apologies for the late notice - we are currently having a Hangout with
>> Weijie Tong (in Beijing) from AliPay who is talking about the JPPD feature
>> that he contributed recently.
>> 
>> This hangout replaces the Tuesday 10AM PST hangout for 30th Oct. We will
>> continue with regular time after 2 weeks.
>> 
>> Please use the new link: http://meet.google.com/yki-iqdf-tai
>> 
>> Weijie will share the slides after the hangout.
>> 
>> Thanks,
>> Pritesh
>> 


Schema Provision using File / Table Function

2018-11-08 Thread Arina Yelchiyeva
Hi all,

besides the initiative to allow providing metadata in the Drill metastore,
there can also be an option to provide a table schema using a file and / or
a table function.

I have created Jira DRILL-6835 [1] with the link to the design document.
Those who are interested are welcome to read it and provide feedback.

[1] https://issues.apache.org/jira/browse/DRILL-6835

Kind regards,
Arina


Re: November Apache Drill board report

2018-11-07 Thread Arina Yelchiyeva
> > >  - There are no issues requiring board attention at this time.
> > > >
> > > > ## Activity:
> > > >  - Since the last board report, Drill has released version 1.14.0,
> > > >including the following enhancements:
> > > > - Drill in a Docker container
> > > > - Image metadata format plugin
> > > > - Upgrade to Calcite 1.16.0
> > > > - Kafka plugin push down support
> > > > - Phonetic and String functions
> > > > - Enhanced decimal data support
> > > > - Spill to disk for the Hash Join support
> > > > - CGROUPs resource management support
> > > > - Lateral / Unnest support (disabled by default)
> > > >  - There were active discussions about schema provision in Drill.
> > > >Based on these discussions two projects are currently evolving:
> > > >Drill metastore and schema provision in the file and in a query.
> > > >  - Apache Drill book has been written by two PMC members (Charles and
> > > > Paul).
> > > >  - Drill developer meet up will be held on November 14, 2018.
> > > >
> > > >The following areas are going to be discussed:
> > > > - Storage plugins
> > > > - Schema discovery & Evolution
> > > > - Metadata Management
> > > > - Resource management
> > > > - Integration with Apache Arrow
> > > >
> > > > ## Health report:
> > > >  - The project is healthy. Development activity
> > > >as reflected in the pull requests and JIRAs is good.
> > > >  - Activity on the dev and user mailing lists is stable.
> > > >  - Three committers and three new PMC members were added in the last
> > > period.
> > > >
> > > > ## PMC changes:
> > > >
> > > >  - Currently 23 PMC members.
> > > >  - New PMC members:
> > > > - Boaz Ben-Zvi was added to the PMC on Fri Aug 17 2018
> > > > - Charles Givre was added to the PMC on Mon Sep 03 2018
> > > > - Vova Vysotskyi was added to the PMC on Fri Aug 24 2018
> > > >
> > > > ## Committer base changes:
> > > >
> > > >  - Currently 48 committers.
> > > >  - New committers:
> > > > - Chunhui Shi was added as a committer on Thu Sep 27 2018
> > > > - Gautam Parai was added as a committer on Mon Oct 22 2018
> > > > - Weijie Tong was added as a committer on Fri Aug 31 2018
> > > >
> > > > ## Releases:
> > > >
> > > >  - 1.14.0 was released on Sat Aug 04 2018
> > > >
> > > > ## Mailing list activity:
> > > >
> > > >  - dev@drill.apache.org:
> > > > - 427 subscribers (down -6 in the last 3 months):
> > > > - 2827 emails sent to list (2126 in previous quarter)
> > > >
> > > >  - iss...@drill.apache.org:
> > > > - 18 subscribers (down -1 in the last 3 months):
> > > > - 3487 emails sent to list (4769 in previous quarter)
> > > >
> > > >  - u...@drill.apache.org:
> > > > - 597 subscribers (down -6 in the last 3 months):
> > > > - 332 emails sent to list (346 in previous quarter)
> > > >
> > > >
> > > > ## JIRA activity:
> > > >
> > > >  - 164 JIRA tickets created in the last 3 months
> > > >  - 128 JIRA tickets closed/resolved in the last 3 months
> > > >
> > > >
> > > >
> > > > On Fri, Nov 2, 2018 at 12:25 AM Sorabh Hamirwasia <
> > shamirwa...@mapr.com>
> > > > wrote:
> > > >
> > > > > Hi Arina,
> > > > > Lateral/Unnest feature was part of 1.14 though it was disabled by
> > > > default.
> > > > > Should we mention it as part of 1.14 enhancements in the report?
> > > > >
> > > > > Thanks,
> > > > > Sorabh
> > > > >
> > > > > On Thu, Nov 1, 2018 at 9:29 AM Arina Yelchiyeva <
> > > > > arina.yelchiy...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks, Aman!  Updated the report.
> > > > > > I went too far with 2019, luckily the meet up will be much
> earlier
> > :)
> > > > > >
> > > > > > =
> > > > > >
> > > > > >  ## Description:
> > > > > >  - Drill is 

Re: show files in

2018-11-02 Thread Arina Yelchiyeva
Please try on the latest master. This issue has been fixed. There is no need
to use the option either.

Kind regards,
Arina

> On Nov 2, 2018, at 8:29 PM, Jean-Claude Cote  wrote:
> 
> Hi,
> 
> I'm running the show files in dfs.root.`subdir1/subdir2` query.
> 
> And got the error "To SHOW FILES in specific directory, enable option
> storage.list_files_recursively"
> 
> I've turned that on with alter session set
> storage.list_files_recursively=true;
> 
> However when I now run the query it seems like drill is iterating the
> entire file system from the root. It's not just listing the files from
> /subdir1/subdir2. My drive has a lot of files in it. I was expecting drill
> to list only the given folder.
> 
> Am I correct in thinking it is listing all the files from the root? Any
> reason why it is implemented that way?
> 
> Thank you
> jc


Re: msgpack format reader with schema learning feature

2018-10-10 Thread Arina Yelchiyeva
Somehow this correlates with two projects which are currently being actively
investigated / prototyped:
1. Drill metastore (DRILL-6552)
2. Providing schema from the query (casts, hints).
The second one will allow providing schema using hints, as well as from
the file.
Regarding how to use Drill to propose the schema, there would be an ANALYZE
TABLE command which will output the schema.

Anyway, both of them in the end will need to feed schema to the readers, so
at this point Paul's new format reader framework that accepts schema would
become really important.
@Paul, do you have any ETA when you will be able to submit the PRs? Maybe
also do some presentation? Can you please share Jira number(-s) as well?

Kind regards,
Arina

On Wed, Oct 10, 2018 at 7:31 AM Paul Rogers 
wrote:

> Hi JC,
>
> Very cool indeed. You are the man!
>
> Ted's been advocating for this approach for as long as I can remember (2+
> years). You're well on your way to solving the JSON problems that I
> documented a while back in DRILL-4710 and summarized as "Drill can't predict
> the future." Basically, without a schema, Drill is forced to make a column
> schema decision (use nullable INT!) before Drill has the full information.
>
> And, because Drill is distributed (have you tested this case yet?) a scan
> over a large file, or multiple files, will be spread across readers, each
> attempting to make independent decisions. These decisions will conflict
> (reader 1 saw your VARCHAR column, reader 2 didn't and guessed nullable
> INT). We've always hoped that Drill operators can sort out the mess and do
> the right thing -- something that is probably impossible in the general
> case.
>
> So, having a schema available is the simplest possible solution: all
> readers agree on the schema and the rest of the Drill operators are happy
> because they get consistent inputs. Hats off to you for creating this
> proof of concept to move this conversation forward.
>
> Your MaterializedField approach is clever. It is somewhat awkward for this
> use case, but can work. It is awkward because a properly-constructed
> MaterializedField for a MAP, say, includes the Map schema. But, for a LIST,
> the Materialized field does not include the child types. Ignoring these
> long-standing corner cases, it will get you 90% of where you want to go.
>
> Just FYI, I had been working on a project to help with this stuff. I've
> checked in a newer schema mechanism that handles all Drill's nuances:
> unions,  (non-repeated) LISTs, REPEATED LISTs, etc. Search for
> "SchemaBuilder" and "TupleSchema". (There are two SchemaBuilder classes,
> you want the one that builds a TupleSchema.) Then if you search for
> references, you'll find lots of unit tests that use this to define a
> schema. There is no serialization format, but you could easily add that.
>
> The project includes a "result set loader" that provides a convenient way to
> convert from Java types to Drill types. It is designed to allow, say,
> converting a String from a CSV file into a VARCHAR, INT, DOUBLE or
> whatever. Each reader would need its own string-to-whatever parser, but
> these are meant to be easy to add.
>
> The TupleSchema provides the input to create a ResultSetLoader for that
> schema, kind of how you're using a map, but far more simply. There are lots
> of unit tests that illustrate how this works.
>
> All this would be wrapped in a new format reader framework that accepts a
> schema. That part has stalled and is not yet committed. If we can push this
> idea forward, perhaps I can get off the dime and do a few more PRs to get
> the rest of the framework into Drill for your use.
>
> The next problem is how to use Drill to build the schema? How does the
> user specify to gather the schema? Would be great to have a DESCRIBE
> statement that outputs the schema you discovered. Then maybe CREATE SCHEMA
>  AS DESCRIBE ...  The CREATE VIEW statement could act as a
> prototype.
>
> Drill is distributed and two users could do the CREATE SCHEMA at the same
> time, an in concert with someone doing a query on that same directory.
> Drill has no good synchronization solution. Since this seems to not be a
> problem for views, perhaps things will work for schemas. (Both are a single
> file.) We have had problem with metadata because refreshing that updates
> multiple files and lack of synchronization has been a lingering problem.
>
> Finally, I'd suggest ensuring your format can be created by hand or by
> other tools. Heck, if you want to make a big impact, work with the Arrow
> project to define a cross-project schema format that can be used to
> describe either Arrow or Drill schemas.
>
> Thanks,
> - Paul
>
>
>
> On Tuesday, October 9, 2018, 8:31:45 PM PDT, Jean-Claude Cote <
> jcc...@gmail.com> wrote:
>
>  I'm writing a msgpack reader and in doing so I noticed that the JSON
> reader
> will put an INT column placeholder when no records match a select statement
> like "select str from.." when the str field 
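
[Editor's aside: the per-reader "string-to-whatever parser" Paul describes for the result set loader can be sketched as a try-narrowest-type-first helper. The class, method, and enum names below are hypothetical illustrations, not Drill's actual API.]

```java
public class StringTypeSketch {
  // Simplified stand-in for a subset of Drill's minor types.
  enum MinorType { INT, DOUBLE, VARCHAR }

  /** Guess the narrowest type that can hold the given string value. */
  public static MinorType guessType(String value) {
    try {
      Integer.parseInt(value);
      return MinorType.INT;
    } catch (NumberFormatException ignored) { }
    try {
      Double.parseDouble(value);
      return MinorType.DOUBLE;
    } catch (NumberFormatException ignored) { }
    return MinorType.VARCHAR;  // everything is at least a VARCHAR
  }

  public static void main(String[] args) {
    System.out.println(guessType("42"));    // INT
    System.out.println(guessType("3.14"));  // DOUBLE
    System.out.println(guessType("hello")); // VARCHAR
  }
}
```

In a real reader the declared schema, not guessing, would pick the target type; a helper like this only covers the fallback case where no schema is available.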

Re: msgpack format reader with schema learning feature

2018-10-11 Thread Arina Yelchiyeva
Paul,

sounds good. I like the idea of mock scanner being done first, since
besides csv and json, other readers would have to be updated as well.
Could you please share Jira number(-s) if any so I can follow them?

Kind regards,
Arina

On Thu, Oct 11, 2018 at 8:52 AM Paul Rogers 
wrote:

> Hi JC,
>
> Drill's complex types can be a bit confusing. Note that, in your example,
> for the REPEATED BIGINT, we know that this is an array (REPEATED) and we
> know the type of each element (BIGINT).
>
> But, that REPEATED LIST, it is a list of ... what? The element type is
> missing.
>
> This is not the only hole. The UNION type has a list of child types which
> tell you the types in the UNION. But, if the UNION's child type is a MAP,
> that type does not include the full MAP schema.  A LIST is a list of a
> single type, or a LIST of UNIONs. It has the same schema ambiguity.
>
> The TupleSchema mechanism fills in this gap. But, for your use,
> MaterializedField should be fine because you probably don't want to use the
> "obscure" types. Frankly, LIST, REPEATED LIST and UNION are still pretty
> broken. I would recommend sticking with the scalar types (REQUIRED,
> OPTIONAL, REPEATED) and MAP (REQUIRED and REPEATED, there is no OPTIONAL).
> I ran into bug after bug when trying to use LIST or UNION. You can populate
> them, but some of the "fancier" operators (sort, hash join, aggregation)
> can't handle them yet.
>
>
> Can you explain a bit more the problem you ran into with the SchemaBuilder
> (the one that uses TupleSchema)? It is supposed to handle all types. I'd
> like to fix any issues you may have found.
>
>
> Just to give a bit more background on the tuple schema and related
> classes... The builder creates a schema that can be used with the RowSet
> class to create a record batch that matches the schema. The RowSet provides
> column writers to populate your record batch, and column readers to read
> it. The column accessors convert between Java types and vector types and
> can provide the custom type conversion I mentioned.
>
> For simple cases (working with a few types), the simple mechanism shown in
> the log reader works well. (It is what we explain in the Drill book.) But,
> as you add types, especially structured types, things get pretty complex.
> The RowSet family handles all that cruft for you.
>
> The part I still need to add is the "result set loader" which goes one
> step further: it can limit memory taken by a record batch. Most readers
> today use a fixed number, say 4K records. 4K of INTs is pretty small. 4K of
> 1 MB images is pretty big. The Result Set Loader works against a memory
> limit (20 MB, say) and automatically limits records per batch to that
> memory limit.
>
>
> Thanks for doing the PR. Will be great to see what you've created.
>
> Thanks,
> - Paul
>
>
>
> On Wednesday, October 10, 2018, 7:59:06 PM PDT, Jean-Claude Cote <
> jcc...@gmail.com> wrote:
>
>  Hey Paul,
>
> You mentioned that
>
> "But, for a LIST, the Materialized field does not include the child types"
>
> However MaterializedField does have type information for child types. You can
> see it in this example. I think it has all relevant information. Anyways
> all test cases I've tried so far are working.
>
>
> child {
>   major_type {
> minor_type: LIST
> mode: REPEATED
>   }
>   name_part {
> name: "arrayOfArray"
>   }
>   child {
> major_type {
>   minor_type: BIGINT
>   mode: REPEATED
> }
> name_part {
>   name: "$data$"
> }
> child {
>   major_type {
> minor_type: BIGINT
> mode: REQUIRED
> precision: 0
> scale: 0
>   }
>   name_part {
> name: "$data$"
>   }
> }
>   }
> }
>
> You also mention that I could leverage the "SchemaBuilder" and
> "TupleSchema". I did try to use the TupleSchema but back out of using it. I
> think my issue was navigating down into lists. Anyways in the end using a
> Map MaterializedField to represent my row seem to do the job.
>
> You also mention
> "The next problem is how to use Drill to build the schema? How does the
> user specify to gather the schema? Would be great to have a DESCRIBE
> statement that outputs the schema you discovered. Then maybe CREATE SCHEMA
>  AS DESCRIBE ...  The CREATE VIEW statement could act as a
> prototype."
>
> Totally agree, I initially thought that it would be great to trigger the
> schema learning like that but did not really know how it could be done. So
> in the end I just used a format plugin property to toggle
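
[Editor's aside: the memory-capped batch sizing Paul describes above — a classic fixed record cap versus a byte budget — can be sketched as follows. The class and method names are illustrative, not Drill's result-set-loader API.]

```java
public class BatchSizerSketch {
  // Classic fixed cap used by many readers today.
  static final int MAX_RECORDS = 4096;

  /** Records per batch so that recordCount * bytesPerRecord <= memoryLimit. */
  public static int recordsPerBatch(long memoryLimitBytes, int bytesPerRecord) {
    long byMemory = memoryLimitBytes / bytesPerRecord;
    return (int) Math.max(1, Math.min(MAX_RECORDS, byMemory));
  }

  public static void main(String[] args) {
    // With a 20 MB budget, 4-byte INTs still hit the 4K record cap...
    System.out.println(recordsPerBatch(20L * 1024 * 1024, 4));           // 4096
    // ...but 1 MB images are limited to 20 records per batch.
    System.out.println(recordsPerBatch(20L * 1024 * 1024, 1024 * 1024)); // 20
  }
}
```

This is exactly the asymmetry in the example above: a fixed 4K-record batch is tiny for INTs and enormous for images, so sizing by bytes keeps memory predictable.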

Re: Possible way to specify column types in query

2018-10-01 Thread Arina Yelchiyeva
Currently Calcite supports the following syntax, apparently used in Phoenix.
*select empno + x from EMP_MODIFIABLEVIEW extend (x int not null)*

Another option to consider is hint syntax (many DBs use this one); basically
it's a multiline comment followed by a plus:
*select /*+.*/ col_name from t*
This would allow us to pass not only schema but join / index hints etc.

Example:
*select /*+ SCHEMA(a int not null, b int) */ a from t*

One minus: we would need to implement this first in Calcite, if the Calcite
community is in favor of such changes.
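
To make the hint idea concrete, here is a minimal sketch in plain Java (not Calcite's parser API; the class and method names are made up for illustration) that extracts column definitions from such a SCHEMA hint:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemaHintSketch {
  private static final Pattern HINT =
      Pattern.compile("/\\*\\+\\s*SCHEMA\\(([^)]*)\\)\\s*\\*/");

  /** Returns column name -> declared type, e.g. {a=int not null, b=int}. */
  public static Map<String, String> parseSchemaHint(String sql) {
    Map<String, String> columns = new LinkedHashMap<>();
    Matcher m = HINT.matcher(sql);
    if (m.find()) {
      for (String def : m.group(1).split(",")) {
        String[] parts = def.trim().split("\\s+", 2);
        columns.put(parts[0], parts.length > 1 ? parts[1] : "");
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    String sql = "select /*+ SCHEMA(a int not null, b int) */ a from t";
    System.out.println(parseSchemaHint(sql));  // prints {a=int not null, b=int}
  }
}
```

Calcite would of course do this with a proper parser extension; the point is only that the hint comment carries enough information to reconstruct a per-column schema before planning.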

Kind regards,
Arina

On Mon, Sep 10, 2018 at 7:42 AM Paul Rogers 
wrote:

> Hi Weijie,
>
> Thanks for the paper pointer. F1 uses the same syntax as Scope (the system
> cited in my earlier note): data type after the name.
>
> Another description is [1]. Neither paper describes how F1 handles arrays.
> However, this second paper points out that Protobuf is F1's native format,
> and so F1 has support for nested types. Drill does also, but in Drill, a
> reference to "customer.phone.cell" causes the nested "cell" column to be
> projected as a top-level column. And, neither paper says whether F1 is used
> with O/JDBC, and if so, how it handles the mapping from nested types to
> the flat tuple structure required by xDBC.
>
> Have you come across these details?
>
> Thanks,
> - Paul
>
>
>
> On Thursday, September 6, 2018, 8:43:57 PM PDT, weijie tong <
> tongweijie...@gmail.com> wrote:
>
>  Google's latest paper about F1 [1] claims to support any data source by
> using an extension API called TVF, see section 6.3. You also need to declare
> the column datatype before the query.
>
>
> [1] http://www.vldb.org/pvldb/vol11/p1835-samwel.pdf
>
> On Fri, Sep 7, 2018 at 9:47 AM Paul Rogers 
> wrote:
>
> > Hi All,
> >
> > We've discussed quite a few times whether Drill should or should not
> > support or require schemas, and if so, how the user might express the
> > schema.
> >
> > I came across a paper [1] that suggests a simple, elegant SQL extension:
> >
> > EXTRACT <column>[:<type>] {, <column>[:<type>]}
> > FROM <source>
> >
> > Paraphrasing into Drill's SQL:
> >
> > SELECT <column>[:<type>] [AS <alias>] {, <column>[:<type>] [AS <alias>]}
> > FROM <source>
> >
> > Have a collection of JSON files in which string column `foo` appears in
> > only half the files? Don't want to get schema conflicts with VARCHAR and
> > nullable INT? Just do:
> >
> > SELECT name:VARCHAR, age:INT, foo:VARCHAR
> > FROM `my-dir` ...
> >
> > Not only can the syntax be used to specify the "natural" type for a
> > column, it might also specify a preferred type. For example. "age:INT"
> says
> > that "age" is an INT, even though JSON would normally parse it as a
> BIGINT.
> > Similarly, using this syntax is a easy way to tell Drill how to convert
> CSV
> > columns from strings to DATE, INT, FLOAT, etc. without the need for CAST
> > functions. (CAST functions read the data in one format, then convert it
> to
> > another in a Project operator. Using a column type might let the reader
> do
> > the conversion -- something that is easy to implement if using the
> "result
> > set loader" mechanism.)
> >
> > Plus, the syntax fits nicely into the existing view file structure. If
> the
> > types appear in views, then client tools can continue to use standard SQL
> > without the type information.
> >
> > When this idea came up in the past, someone mentioned the issue of
> > nullable vs. non-nullable. (Let's also include arrays, since Drill
supports
> > that.) Maybe add a suffix to the name:
> >
> > SELECT req:VARCHAR NOT NULL, opt:INT NULL, arr:FLOAT[] FROM ...
> >
> > Not pretty, but works with the existing SQL syntax rules.
> >
> > Obviously, Drill has much on its plate, so I'm not suggesting that Drill
> > should do this soon. Just passing it along as yet another option to
> > consider.
> >
> > Thanks,
> > - Paul
> >
> > [1] http://www.cs.columbia.edu/~jrzhou/pub/Scope-VLDBJ.pdf
>


Re: Contrib Plugin Question

2018-09-03 Thread Arina Yelchiyeva
Hi Charles.

Recently a new udfs module was added under contrib; you can take a look at
that PR for an example.

Regarding unit tests and data availability:
1. create TEST resources folder where you'll copy your data.
2. use dirTestWatcher to copy data to the root / tmp / custom directory.
3. query the data:

@Test
public void t() {
  dirTestWatcher.copyResourceToRoot(Paths.get("complex_1.parquet"));
  queryBuilder().sql("select * from dfs.`root`.`complex_1.parquet`").printCsv();
}


Also I would suggest you create module formats under contrib and place your
format plugin module under formats: contrib / formats / format-plugin.

On Mon, Sep 3, 2018 at 4:38 PM Charles Givre  wrote:

> Hello all,
> I’m working on a format-plugin for syslog (RFC-5424) data and I’m having
> some strange issues.  I’d like to submit this contribution in the contrib/
> folder, however I cannot seem to get Drill to recognize the module.  I’ve
> built the module separately, and the code works, however, when I try to
> build Drill, it does not recognize the module and I cannot query my data
> (and the unit tests fail).  I’ve added the module to the contrib pom.xml
> file and added the module to the assemble/bin.xml and still no luck.
> Here’s the REALLY weird part.  I can run the unit tests in IntelliJ and it
> works but it does not if I run the tests from the command line.
>
> The code can be found here:
> https://github.com/cgivre/drill/tree/format-syslog <
> https://github.com/cgivre/drill/tree/format-syslog>
>
> Does anyone have any suggestions?
>
> —C
>
>
>


Re: Contrib Plugin Question

2018-09-03 Thread Arina Yelchiyeva
Personally, I did not have such problems with IntelliJ, did not try Eclipse, so 
can’t tell. Maybe somebody else can chime in.

Kind regards,
Arina 

> On Sep 3, 2018, at 10:23 PM, Paul Rogers  wrote:
> 
> I've been helping Charles with this. He's got a branch that works sometimes,
> but not others.
> 
> * If I run his unit test from Eclipse, it works.
> * If I run his unit test from the command line with Maven, it works.
> * If he runs his unit test using the mechanism he is using, Drill can't find 
> his class.
> 
> The question is, has anyone who uses IntelliJ run into a similar issue? Do 
> you recall what you had to fix?
> 
> I'll continue to work with Charles to figure out what, exactly, he is doing 
> to run the test to see if we can zero in on the problem. But would be great 
> if someone said, "Yeah, I had that issue and I had to ..."
> 
> Thanks,
> - Paul
> 
> 
> 
>On Monday, September 3, 2018, 7:40:18 AM PDT, Arina Yelchiyeva 
>  wrote:  
> 
> Hi Charles.
> 
> Recently a new udfs module was added under contrib; you can take a look at
> that PR for an example.
> 
> Regarding unit tests and data availability:
> 1. create TEST resources folder where you'll copy your data.
> 2. use dirTestWatcher to copy data to the root / tmp / custom directory.
> 3. query the data:
> 
> @Test
> public void t() {
>   dirTestWatcher.copyResourceToRoot(Paths.get("complex_1.parquet"));
>   queryBuilder().sql("select * from 
> dfs.`root`.`complex_1.parquet`").printCsv();
> }
> 
> 
> Also I would suggest you create module formats under contrib and place your
> format plugin module under formats: contrib / formats / format-plugin.
> 
>> On Mon, Sep 3, 2018 at 4:38 PM Charles Givre  wrote:
>> 
>> Hello all,
>> I’m working on a format-plugin for syslog (RFC-5424) data and I’m having
>> some strange issues.  I’d like to submit this contribution in the contrib/
>> folder, however I cannot seem to get Drill to recognize the module.  I’ve
>> built the module separately, and the code works, however, when I try to
>> build Drill, it does not recognize the module and I cannot query my data
>> (and the unit tests fail).  I’ve added the module to the contrib pom.xml
>> file and added the module to the assemble/bin.xml and still no luck.
>> Here’s the REALLY weird part.  I can run the unit tests in IntelliJ and it
>> works but it does not if I run the tests from the command line.
>> 
>> The code can be found here:
>> https://github.com/cgivre/drill/tree/format-syslog <
>> https://github.com/cgivre/drill/tree/format-syslog>
>> 
>> Does anyone have any suggestions?
>> 
>> —C
>> 
>> 


Re: storage plugin test case

2018-09-26 Thread Arina Yelchiyeva
This can also help:

1. create TEST resources folder where you'll copy your data.
2. use dirTestWatcher to copy data to the root / tmp / custom directory.
3. query the data:

@Test
public void t() {
  dirTestWatcher.copyResourceToRoot(Paths.get("complex_1.parquet"));
  queryBuilder().sql("select * from dfs.`root`.`complex_1.parquet`").printCsv();
}


On Wed, Sep 26, 2018 at 12:37 PM Vitalii Diravka  wrote:

> Hi Jean-Claude
>
> BaseTestQuery is deprecated. Please use ClusterTest instead.
> See TestCsv.java for example.
>
> You can find more info about Drill Cluster-Fixture-Framework here:
> https://github.com/paul-rogers/drill/wiki/Cluster-Fixture-Framework
>
> On Wed, Sep 26, 2018 at 12:00 AM Jean-Claude Cote 
> wrote:
>
> > I have written a msgpack storage plugin for drill.
> > https://github.com/jcmcote/drill/tree/master/contrib/storage-msgpack
> >
> > I'm now trying to write test cases like
> >
> > testBuilder()
> > .sqlQuery("select * from cp.`msgpack/testBasic.mp`")
> > .ordered()
> > .baselineColumns("a").baselineValues("1").baselineValues("1")
> > .baselineColumns("b").baselineValues("2").baselineValues("2")
> > .build().run();
> >
> > However when I run the test case it says it cannot find the
> > msgpack/testBasic.mp file. However it is in my src/test/resources folder.
> >
> > Should this work? Am I going at it the right way?
> > Thanks
> > jc
> >
>


Re: storage plugin test case

2018-09-26 Thread Arina Yelchiyeva
Taking into account that your code is in the contrib module, modifying
boostrap-storage-plugins.json does not make any sense.
If you need to add your own format in unit tests, as Vitalii pointed out,
TestCsv is a good example for this.

Kind regards,
Arina

On Wed, Sep 26, 2018 at 7:07 PM Jean-Claude Cote  wrote:

> I found the cause of the problem I had. It was not due to the fact that the
> classloader did not find the resource. It is that my new FormatPlugin was
> not registered into the
> drill\exec\java-exec\src\main\resources\bootstrap-storage-plugins.json
>
> cp: {
>   type: "file",
>   connection: "classpath:///",
>   formats: {
> "msgpack" : {
>   type: "msgpack",
>   extensions: [ "mp" ]
> },
>
> So the resource was rejected. I've added this entry to the
> boostrap-storage-plugins.json and now it works.
>
> Thanks for all your help.
> Jean-Claude
>
>
>
> On Wed, Sep 26, 2018 at 7:18 AM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
> > This can also help:
> >
> > 1. create TEST resources folder where you'll copy your data.
> > 2. use dirTestWatcher to copy data to the root / tmp / custom directory.
> > 3. query the data:
> >
> > @Test
> > public void t() {
> >   dirTestWatcher.copyResourceToRoot(Paths.get("complex_1.parquet"));
> >   queryBuilder().sql("select * from dfs.`root`.`complex_1.parquet`
> > ").printCsv();
> > }
> >
> >
> > On Wed, Sep 26, 2018 at 12:37 PM Vitalii Diravka 
> > wrote:
> >
> > > Hi Jean-Claude
> > >
> > > BaseTestQuery is deprecated. Please use ClusterTest instead.
> > > See TestCsv.java for example.
> > >
> > > You can find more info about Drill Cluster-Fixture-Framework here:
> > > https://github.com/paul-rogers/drill/wiki/Cluster-Fixture-Framework
> > >
> > > On Wed, Sep 26, 2018 at 12:00 AM Jean-Claude Cote 
> > > wrote:
> > >
> > > > I have written a msgpack storage plugin for drill.
> > > > https://github.com/jcmcote/drill/tree/master/contrib/storage-msgpack
> > > >
> > > > I'm now trying to write test cases like
> > > >
> > > > testBuilder()
> > > > .sqlQuery("select * from cp.`msgpack/testBasic.mp`")
> > > > .ordered()
> > > > .baselineColumns("a").baselineValues("1").baselineValues("1")
> > > > .baselineColumns("b").baselineValues("2").baselineValues("2")
> > > > .build().run();
> > > >
> > > > However when I run the test case it says it cannot find the
> > > > msgpack/testBasic.mp file. However it is in my src/test/resources
> > folder.
> > > >
> > > > Should this work? Am I going at it the right way?
> > > > Thanks
> > > > jc
> > > >
> > >
> >
>


Re: storage plugin test case

2018-09-27 Thread Arina Yelchiyeva
Jean-Claude,

this is a good question. Previously new formats were added into the exec
module and thus modifying bootstrap-storage-plugins.json made sense.
Since now we are adding new format plugins into the contrib directory, this
raises the question on how to publish new format in dfs storage plugin.
In StoragePluginRegistryImpl, if we find several
bootstrap-storage-plugins.json files, we load data from all of them, but if
there is a duplicate between storage plugins (e.g. dfs), the duplicate is
ignored.
Basically, we need to implement a new mechanism that would merge two storage
plugins' formats.
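
A merge mechanism along those lines could be sketched like this (illustrative only; real bootstrap entries are JSON format configs, modeled here as simple strings):

```java
import java.util.Map;
import java.util.TreeMap;

public class FormatMergeSketch {
  /**
   * Merge the "formats" sections of two bootstrap files for the same plugin
   * (e.g. dfs): formats only present in the contrib bootstrap are added,
   * while formats already defined by the main bootstrap win on conflict.
   */
  public static Map<String, String> mergeFormats(Map<String, String> main,
                                                 Map<String, String> contrib) {
    Map<String, String> merged = new TreeMap<>(contrib);
    merged.putAll(main);  // main bootstrap takes precedence on duplicates
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> main = new TreeMap<>();
    main.put("csv", "extensions: [csv]");
    Map<String, String> contrib = new TreeMap<>();
    contrib.put("msgpack", "extensions: [mp]");
    System.out.println(mergeFormats(main, contrib).keySet());  // prints [csv, msgpack]
  }
}
```

Whether contrib or main wins on a duplicate format name is a policy choice; the sketch mirrors the current behavior where the already-registered definition is kept.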

P.S.
1. There is a difference between storage plugin and storage format, I
believe you have implemented the second.
2. I suggest we put all formats under contrib/format/.., so they are not
confused with plugins.

Kind regards,
Arina

On Thu, Sep 27, 2018 at 12:54 PM Vitalii Diravka  wrote:

> I'm not sure what you exactly mean. All *bootstrap-storage-plugins.json*
> are registered and stored in Persistent Store
> for the first fresh instantiation of Drill, see
> *StoragePluginRegistryImpl.loadBootstrapPlugins()
> *for details.
>
> I have noticed that you have a typo in the
> *bootstrap-storage-plugins.json *file
> name (boostrap -> *bootstrap*).
>
> Also you can use *storage-plugins-override.conf *for configuring plugins
> configs during start-up [1].
>
> [1]
>
> https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-override.conf-file
>
>
> On Wed, Sep 26, 2018 at 10:17 PM Jean-Claude Cote 
> wrote:
>
> > I see how the cluster.makeDataDir call will setup the configuration for a
> > MsgpackFormatConfig. I'll update my test cases to use it.
> >
> > However there is another question related to this. If I don't modify the
> > boostrap-storage-plugins.json then when I launch the drill-embedded it
> does
> > not know of the MsgpackFormatConfig. I need to use the web console to
> edit
> > the storage plugins.
> >
> > Is there a better way?
> > Thanks
> > Jean-Claude
> >
> >
> > On Wed, Sep 26, 2018 at 1:17 PM, Arina Yelchiyeva <
> > arina.yelchiy...@gmail.com> wrote:
> >
> > > Taking into account that your code is in the contrib module,
> > modifying
> > > boostrap-storage-plugins.json does not make any sense.
> > > If you need to add your own format in unit tests, as Vitalii pointed
> out,
> > > TestCsv is a good example for this.
> > >
> > > Kind regards,
> > > Arina
> > >
> > > On Wed, Sep 26, 2018 at 7:07 PM Jean-Claude Cote 
> > wrote:
> > >
> > > > I found the cause of the problem I had. It was not due to the fact
> that
> > > the
> > > > classloader did not find the resource. It is that my new FormatPlugin
> > was
> > > > not registered into the
> > > >
> drill\exec\java-exec\src\main\resources\bootstrap-storage-plugins.json
> > > >
> > > > cp: {
> > > >   type: "file",
> > > >   connection: "classpath:///",
> > > >   formats: {
> > > > "msgpack" : {
> > > >   type: "msgpack",
> > > >   extensions: [ "mp" ]
> > > > },
> > > >
> > > > So the resource was rejected. I've added this entry to the
> > > > boostrap-storage-plugins.json and now it works.
> > > >
> > > > Thanks for all your help.
> > > > Jean-Claude
> > > >
> > > >
> > > >
> > > > On Wed, Sep 26, 2018 at 7:18 AM, Arina Yelchiyeva <
> > > > arina.yelchiy...@gmail.com> wrote:
> > > >
> > > > > This can also help:
> > > > >
> > > > > 1. create TEST resources folder where you'll copy your data.
> > > > > 2. use dirTestWatcher to copy data to the root / tmp / custom
> > > directory.
> > > > > 3. query the data:
> > > > >
> > > > > @Test
> > > > > public void t() {
> > > > >
>  dirTestWatcher.copyResourceToRoot(Paths.get("complex_1.parquet"));
> > > > >   queryBuilder().sql("select * from dfs.`root`.`complex_1.parquet`
> > > > > ").printCsv();
> > > > > }
> > > > >
> > > > >
> > > > > On Wed, Sep 26, 2018 at 12:37 PM Vitalii Diravka <
> vita...@apache.org
> > >
> > > > > wrote:
> > > > >
> > >

Re: Apache Drill support for S3 Version 4 API | Hadoop new version libraries

2019-01-23 Thread Arina Yelchiyeva
Hi Harsh,

there is an open Jira ticket for this with the details [1]. Contributions
are definitely welcome.

[1] https://issues.apache.org/jira/browse/DRILL-6540

Kind regards,
Arina

On Wed, Jan 23, 2019 at 7:38 PM Harsh Choudhary 
wrote:

> Hi
>
> As I can see Apache Drill is still running on the libraries of Hadoop 2.7.4
> version. One thing which does not work with this version of Hadoop is AWS
> S3 Signature Version 4 API, which is the only APIs available in newer Data
> Centers like Mumbai. This I first noticed in June 2017. I thought
> eventually Hadoop version would be upgraded and it would be fixed but
> Hadoop Version is still not 2.8. Is there any specific reason for this? Can
> you refer me to any open ticket on this and what are the other challenges
> which I am not seeing? Maybe I can contribute to this.
>
> *Thanks!*
> Harsh Choudhary
>


Re: [jira] [Created] (DRILL-6898) Web UI cannot be used without internet connection (jquery loaded from ajax.googleapis.com)

2018-12-12 Thread Arina Yelchiyeva
Most likely this is fixed in
https://issues.apache.org/jira/browse/DRILL-6776.

Kind regards,
Arina

On Wed, Dec 12, 2018 at 6:08 PM Charles Givre  wrote:

> As someone who regularly works from isolated networks, we really should
> make sure that the UI does not have a dependency on an internet
> connection.  It’s exceedingly frustrating when you can’t use a tool because
> of something like this.
> —C
>
> > On Dec 12, 2018, at 10:50, Paul Bormans (JIRA)  wrote:
> >
> > Paul Bormans created DRILL-6898:
> > ---
> >
> > Summary: Web UI cannot be used without internet connection
> (jquery loaded from ajax.googleapis.com)
> > Key: DRILL-6898
> > URL: https://issues.apache.org/jira/browse/DRILL-6898
> > Project: Apache Drill
> >  Issue Type: Improvement
> >  Components: Web Server
> >Affects Versions: 1.14.0
> >Reporter: Paul Bormans
> >
> >
> > When opening the web ui in an environment that does not have an internet
> connection, then the jquery js library is not loaded and the website does
> not function as it should.
> >
> > One solution can be to add a configuration option to use local/packages
> javascript libraries iso loading these from a CDN.
> >
> >
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
>
>


Re: [DISCUSS] 1.15.0 release

2018-12-03 Thread Arina Yelchiyeva
Looks like this is a release blocker:
https://issues.apache.org/jira/browse/DRILL-6877
This is a regression after DRILL-6039.

Kind regards,
Arina

On Thu, Nov 29, 2018 at 6:24 PM Vitalii Diravka  wrote:

> Hello drillers!
>
> I am going to merge the changes for 5 PR with ready-to-commit label
> (DRILL-6867, DRILL-6866, DRILL-6863, DRILL-6792, DRILL-6039)
> and then to start the release process.
> Please do not merge any new commits to master branch.
>
>
> Kind regards
> Vitalii
>
>
> On Tue, Nov 27, 2018 at 5:40 PM Vitalii Diravka 
> wrote:
>
> > Hi all!
> > Thanks for updating tickets.
> >
> > There are couple of tickets, which are almost done and would be nice to
> > include them to 1.15 release:
> > Can be merged:
> > DRILL-6864 ben-zvi Root POM: Update the git-commit-id plugin
> > DRILL-6039 vdonapati drillbit.sh graceful_stop does not wait for
> > fragments to complete before stopping the drillbit
> > Almost ready to commit:
> > DRILL-6867 le.louch WebUI Query editor cursor position
> > DRILL-6792 weijie Find the right probe side fragment to any storage
> plugin
> > DRILL-6806 timothyfarkas Start moving code for handling a partition in
> > HashAgg into a separate class
> > Also I am planning to include DRILL-6562: Upgrade to SqlLine 1.6.0 and my
> > work - DRILL-6562: Plugin Management improvements.
> > I suppose these tickets will be addressed in the next few days, then I
> > will start release process.
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Mon, Nov 26, 2018 at 7:50 PM Vitalii Diravka 
> > wrote:
> >
> >> Hi all!
> >>
> >> I found the issue DRILL-6828 [1], which is introduced in DRILL-6381. I
> >> think it is an important one, since the issue is a degradation and it
> >> blocks working with HashPartitionSender exchange operator.
> >> Boaz, since you are assigned to the jira, could you please take a look?
> >>
> >> Charles, your "REST metadata" work is merged. Regarding "Syslog plugin"
> I
> >> will do review, but as usual process. I mean it should not block the
> >> release.
> >>
> >> Team, these are tickets, which should be updated to make the release in
> >> time. Please change the release version to 1.16, if it is not a blocker
> and
> >> requires additional work.
> >> Issue key Assignee Summary
> >> DRILL-6845 ben-zvi Eliminate duplicates for Semi Hash Join
> >> DRILL-6864 ben-zvi Root POM: Update the git-commit-id plugin
> >> DRILL-6867 le.louch WebUI Query editor cursor position
> >> DRILL-6849 weijie Runtime filter queries with nested broadcast returns
> >> wrong results
> >> DRILL-6838 weijie Query with Runtime Filter fails with
> >> IllegalStateException: Memory was leaked by query
> >> DRILL-6792 weijie Find the right probe side fragment to any storage
> >> plugin
> >> DRILL-6791 Paul.Rogers Merge scan projection framework into master
> >> DRILL-6039 vdonapati drillbit.sh graceful_stop does not wait for
> >> fragments to complete before stopping the drillbit
> >> DRILL-6806 timothyfarkas Start moving code for handling a partition in
> >> HashAgg into a separate class
> >> DRILL-6543 ben-zvi Option for memory mgmt: Reserve allowance for
> >> non-buffered
> >> DRILL-6863 KazydubB Drop table is not working if path within workspace
> >> starts with '/'
> >> DRILL-6253 timothyfarkas HashAgg Unit Testing And Refactoring
> >> DRILL-6032 timothyfarkas Use RecordBatchSizer to estimate size of
> >> columns in HashAgg
> >> DRILL-6623 karthikm Drill encounters exception
> >> IndexOutOfBoundsException: writerIndex: -8373248 (expected:
> readerIndex(0)
> >> <= writerIndex <= capacity(32768))
> >> DRILL-6245 vdonapati Clicking on anything redirects to main login page
> >> [1]
> >>
> https://issues.apache.org/jira/browse/DRILL-6828?focusedCommentId=16699152=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16699152
> >>
> >>
> >> Kind regards
> >> Vitalii
> >>
> >>
> >> On Tue, Nov 20, 2018 at 3:06 PM Charles Givre  wrote:
> >>
> >>> Hi @Vitalii
> >>> Are you sure?  The metadata commit is really pretty minor now that I
> >>> cleaned it up.  If nobody can review the Syslog format plugin until the
> >>> next version, that’s fine, but I don’t think it should be a big deal to
> >>> review either.
> >>> Best,
> >>> — C
> >>>
> >>> > On Nov 20, 2018, at 05:20, Vitalii Diravka 
> wrote:
> >>> >
> >>> > @Charles Your changes require some time to pass review, it is better
> to
> >>> > consider them to the next release.
> >>> > All other mentioned issues are resolved.
> >>> >
> >>> > Team, please verify the tickets which you are responsible for and
> >>> update
> >>> > them accordingly [1].
> >>> > If there are no blockers we can consider to include one more
> >>> batch-commit
> >>> > to the 1.15.0 Drill release.
> >>> > Therefore 27.11 can be the final cut-off date.
> >>> >
> >>> > [1]
> >>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185
> >>> >
> >>> > Kind regards
> >>> > Vitalii
> >>> >
> >>> >
> >>> > On Wed, Nov 7, 2018 at 2:24 AM Vitalii Diravka 
> >>> 

Apache Drill Hangout - 08 Jan, 2019

2019-01-08 Thread Arina Yelchiyeva
Hi all,

Apache Drill hangout will be held today at 10 am PST.
Link - http://meet.google.com/yki-iqdf-tai

Gautam / Karthik are planning to talk about the Statistics project.
If there are any other questions / topics, feel free to send them as a reply to
this email or join the hangout.

Kind regards,
Arina

Re: Plugin Info in INFORMATION_SCHEMA

2019-01-04 Thread Arina Yelchiyeva
Hi Charles.

Please look into the information_schema.schemata table, specifically the type
column.
Also there is the DESCRIBE SCHEMA command [1], but it returns the result in JSON.

[1] https://drill.apache.org/docs/describe/

Kind regards,
Arina

On Fri, Jan 4, 2019 at 8:21 AM Charles Givre  wrote:

> Hello Drillers,
> I am working on integrating Drill with a BI tool, and I’m wondering if
> there is any way to get information about a storage plugin via the
> INFORMATION_SCHEMA or other query?   Specifically, I want to be able to
> determine whether a given storage plugin is a ‘file’ plugin or not.
> Thanks!
> — C


Re: [IDEAS] Drill start up quotes

2018-09-11 Thread Arina Yelchiyeva
Some quotes ideas:

drill never goes out of style
everything is easier with drill

Kunal,
regarding config, sounds reasonable, I'll do that.

Kind regards,
Arina


On Tue, Sep 11, 2018 at 12:17 AM Benedikt Koehler 
wrote:

> You told me to drill sergeant! (Forrest Gump)
>
> Benedikt
> @furukama
>
>
> Kunal Khatua  schrieb am Mo. 10. Sep. 2018 um 21:01:
>
> > +1 on the suggestion.
> >
> > I would also suggest that we change the backend implementation of the
> > quotes to refer to a properties file (within the classpath) rather than
> > have it hard coded within the SqlLine package.  This will ensure that new
> > quotes can be added with every release without the need to touch the
> > SqlLine fork for Drill.
> >
> > ~ Kunal
> > On 9/10/2018 7:06:59 AM, Arina Ielchiieva  wrote:
> > Hi all,
> >
> > we are close to SqlLine 1.5.0 upgrade which now has the mechanism to
> > preserve Drill customizations. This one does not include multiline support
> > but the next release might.
> > You all know that one of the Drill customizations is quotes at startup. I
> > was thinking we might want to fresh up the list a little bit.
> >
> > Here is the current list:
> >
> > start your sql engine
> > this isn't your grandfather's sql
> > a little sql for your nosql
> > json ain't no thang
> > drill baby drill
> > just drill it
> > say hello to my little drill
> > what ever the mind of man can conceive and believe, drill can query
> > the only truly happy people are children, the creative minority and drill
> > users
> > a drill is a terrible thing to waste
> > got drill?
> > a drill in the hand is better than two in the bush
> >
> > If anybody has new serious / funny / philosophical / creative quotes
> > ideas, please share and we can consider adding them to the existing list.
> >
> > Kind regards,
> > Arina
> >
> --
>
> --
> Dr. Benedikt Köhler
> Kreuzweg 4 • 82131 Stockdorf
> Mobil: +49 170 333 0161 • Telefon: +49 89 857 45 84
> Mail: bened...@eigenarbeit.org
>


Re: [DISCUSSION] CI for Drill

2018-09-12 Thread Arina Yelchiyeva
+1; especially since another Apache project already uses it, there should not be
any issues with Apache.

Kind regards,
Arina

On Wed, Sep 12, 2018 at 12:36 AM Timothy Farkas  wrote:

> +1 For trying out Circle CI. I've used it in the past, and I think the UI
> is much better than Travis.
>
> Tim
>
> On Tue, Sep 11, 2018 at 8:21 AM Vitalii Diravka  >
> wrote:
>
> > Recently we discussed Travis build failures and more tests were excluded
> > to make Travis happy [1]. But it looks like the issue has returned and the
> > Travis build fails intermittently.
> >
> > I tried to find another solution instead of excluding Drill unit tests and
> > found another good CI - CircleCI [2]. It looks like this CI will allow us
> > to run all unit tests successfully.
> > And it offers good conditions for open-source projects [3] (even OS X
> > environment is available).
> > The example of Apache project, which uses this CI is Apache Cassandra [4]
> >
> > My quick set-up of CircleCI for Drill still fails, but it should be just
> > configured properly [5].
> >
> > I think we can try CircleCI in parallel with Travis and if it works well,
> > we will move completely to CircleCI.
> > Does it make sense? Maybe somebody has already worked with it and knows
> > some limitations or complexities?
> >
> > [1] https://issues.apache.org/jira/browse/DRILL-6559
> > [2] https://github.com/DefinitelyTyped/DefinitelyTyped/issues/20308#issuecomment-342115544
> > [3] https://circleci.com/pricing/
> > [4] https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml
> > [5] https://circleci.com/gh/vdiravka/drill/tree/circleCI
> >
> > Kind regards
> > Vitalii
> >
>


Re: Drill not compiling after rebase!!

2019-04-02 Thread Arina Yelchiyeva
tps://bugs.openjdk.java.net/browse/JDK-8066974.
>>>>>> Charles, Hanu, could you please share you JDK versions, on my
>>>>>> machine 1.8.0_191 and everything works fine.
>>>>>> 
>>>>>> Also, could you please check whether specifying types explicitly will
>>>>> help:
>>>>>> *expr.accept(new FieldReferenceFinder(), null)* *->*
>>>>>> *expr.<Set<SchemaPath>, Void, RuntimeException>accept(new FieldReferenceFinder(), null)*
>>>>>> 
>>>>>> Kind regards,
>>>>>> Volodymyr Vysotskyi
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 1, 2019 at 10:40 PM Charles Givre 
>>>> wrote:
>>>>>> 
>>>>>>> Hi Hanu,
>>>>>>> I posted code that fixed this to the list.  Once I did that, it
>> worked
>>>>>>> fine.
>>>>>>> —C
>>>>>>> 
>>>>>>>> On Apr 1, 2019, at 15:39, hanu mapr  wrote:
>>>>>>>> 
>>>>>>>> Hello All,
>>>>>>>> 
>>>>>>>> The exact function which is causing this error is the following.
>>>>>>>> 
>>>>>>>> public static RowsMatch evalFilter(LogicalExpression expr,
>>>>>>>> MetadataBase.ParquetTableMetadataBase footer,
>>>>>>>>int rowGroupIndex, OptionManager
>>>>>>>> options, FragmentContext fragmentContext) throws Exception {
>>>>>>>> 
>>>>>>>> and also for the caller functions in TestParquetFilterPushDown all
>>>>> along.
>>>>>>>> 
>>>>>>>> I think evalFilter needs to catch the Exception or throw an
>>>> Exception.
>>>>>>>> I just tried this, didn't put much thought into it. So I think this
>>>>>>>> Exception needs to be handled properly.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> -Hanu
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Apr 1, 2019 at 12:20 PM hanu mapr 
>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hello All,
>>>>>>>>> 
>>>>>>>>> I am also getting the same error which Charles got on compilation
>> of
>>>>> the
>>>>>>>>> latest build.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Here is the message which I got.
>>>>>>>>> 
>>>>>>>>> [ERROR]
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>> /Users/hmaduri/contribs/APACHE/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
>>>>>>>>> error: unreported exception E; must be caught or declared to be
>>>> thrown
>>>>>>>>> where E,T,V are type-variables:
>>>>>>>>> E extends Exception declared in method
>>>>>>>>> accept(ExprVisitor<T,V,E>,V)
>>>>>>>>> T extends Object declared in method
>>>>>>>>> accept(ExprVisitor<T,V,E>,V)
>>>>>>>>> V extends Object declared in method
>>>>>>>>> accept(ExprVisitor<T,V,E>,V)
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -Hanu
>>>>>>>>> 
>>>>>>>>> On Mon, Apr 1, 2019 at 11:09 AM Abhishek Girish <
>> agir...@apache.org
>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hey Charles,
>>>>>>>>>> 
>>>>>>>>>> On the latest apache/drill master, I don't see any errors during
>>>>> build
>>>>>>> /
>>>>>>>>>> running unit tests. But sometimes I've seen this issue with stale
>>>>>>>>>> artifacts.. Can you clear all maven artifacts from your local
>> maven
>>>>>>> repo
>>>>>>>>>> cache and build master again (or with -U option)?
>>>>>>>>>>
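The explicit-type-arguments workaround suggested above can be illustrated in isolation. The classes below are simplified stand-ins, not Drill's real LogicalExpression / ExprVisitor API; the point is that spelling out the type arguments pins the exception type variable E to RuntimeException, so the caller needs no throws clause even when javac's inference fails:

```java
public class TypeWitnessDemo {
  // Simplified stand-in for a visitor interface with a generic exception type.
  interface ExprVisitor<T, V, E extends Exception> {
    T visit(String expr, V value) throws E;
  }

  // Simplified stand-in for an expression with a generic accept() method.
  static class Expr {
    final String text;

    Expr(String text) {
      this.text = text;
    }

    <T, V, E extends Exception> T accept(ExprVisitor<T, V, E> visitor, V value) throws E {
      return visitor.visit(text, value);
    }
  }

  // Binds E to RuntimeException, so callers need no checked-exception handling.
  static class FieldReferenceFinder implements ExprVisitor<Integer, Void, RuntimeException> {
    @Override
    public Integer visit(String expr, Void value) {
      return expr.length();
    }
  }

  public static void main(String[] args) {
    Expr expr = new Expr("a + b");
    // The explicit type witness <Integer, Void, RuntimeException> tells the
    // compiler E is unchecked, so no 'throws' declaration is required here.
    int length = expr.<Integer, Void, RuntimeException>accept(new FieldReferenceFinder(), null);
    System.out.println(length);
  }
}
```

With the type witness removed, some javac versions infer E as the checked Exception bound and report the "unreported exception E" error quoted in this thread.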

Re: [DISCUSS]: Hadoop 3

2019-04-03 Thread Arina Yelchiyeva
Looks like we don’t have much of a choice if we want to support Hadoop 3.

Kind regards,
Arina

> On Apr 2, 2019, at 7:40 PM, Vitalii Diravka  wrote:
> 
> Hi devs!
> 
> I am working on the update of Hadoop libs to the 3.2.0 version [1].
> I found the issue in *hadoop-common* related to several loggers in the
> project [2], [3].
> So to update the version of hadoop libs in Drill it is necessary to remove
> *commons-logging* from banned dependencies [4].
> After doing it I didn't find conflicts between two logger libs in Drill.
> 
> Is this solution acceptable?
> It can be temporary until [3] is fixed.
> 
> 
> 
> [1] https://issues.apache.org/jira/browse/DRILL-6540
> [2]
> https://issues.apache.org/jira/browse/DRILL-6540?focusedCommentId=16606306=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16606306
> [3] https://issues.apache.org/jira/browse/HADOOP-15749
> [4] https://github.com/apache/drill/blob/master/pom.xml#L522
> 
> 
> Kind regards
> Vitalii


Re: Drill not compiling after rebase!!

2019-04-01 Thread Arina Yelchiyeva
Hi Charles,

Build on the latest commit is successful - 
https://travis-ci.org/apache/drill/builds/514145219?utm_source=github_status_medium=notification
 

Git does not always rebase smoothly, even if it writes that rebase was 
successful.

Kind regards,
Arina

> On Apr 1, 2019, at 5:20 PM, Charles Givre  wrote:
> 
> All, 
> I just rebased Drill with the latest commits and it no longer builds.  I’m 
> getting the following errors:
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project drill-java-exec: Compilation failure
> [ERROR] 
> /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
>  error: unreported exception E; must be caught or declared to be thrown
> [ERROR]   where E,T,V are type-variables:
> [ERROR] E extends Exception declared in method 
> accept(ExprVisitor<T,V,E>,V)
> [ERROR] T extends Object declared in method 
> accept(ExprVisitor<T,V,E>,V)
> [ERROR] V extends Object declared in method 
> accept(ExprVisitor<T,V,E>,V)
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :drill-java-exec



Re: [DISCUSS] Whether to create separate 2.0 branch

2019-02-28 Thread Arina Yelchiyeva
+1 for option a.
I think backward compatibility can be handled via temp options; for interfaces,
since we use Java 8, default methods can be used.
More extreme cases should be postponed till the 2.0 branch is created.
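The default-method approach mentioned above can be sketched as follows (the interface and method names are hypothetical, not Drill's actual API):

```java
public class DefaultMethodDemo {
  // Hypothetical plugin interface. Adding getApiVersion() as a default
  // method keeps pre-existing implementations source-compatible.
  interface StoragePlugin {
    String getName();

    // Added in a later release: old implementors inherit this body.
    default int getApiVersion() {
      return 1;
    }
  }

  // Written before getApiVersion() existed; still compiles unchanged.
  static class LegacyPlugin implements StoragePlugin {
    @Override
    public String getName() {
      return "legacy";
    }
  }

  public static void main(String[] args) {
    StoragePlugin plugin = new LegacyPlugin();
    String summary = plugin.getName() + ":" + plugin.getApiVersion();
    System.out.println(summary);
  }
}
```

This is why adding methods to existing interfaces need not break third-party plugins, while behavioral changes would still need temp options as described above.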

Kind regards,
Arina

> On Feb 28, 2019, at 7:25 PM, Pritesh Maker  wrote:
> 
> I think option (a) is better in terms of maintaining branches. With option
> (a) how do we handle a situation where one of these features breaks
> backward compatibility or deprecates some functionality?
> 
> Pritesh
> 
> On Wed, Feb 27, 2019 at 4:23 PM Abhishek Girish  wrote:
> 
>> My opinion would be option (a) as well. It's easier to maintain a single
>> master branch. With a separate v2 branch, it's twice the effort to test
>> common commits going in (either directly or via later via rebase).
>> 
>> On Wed, Feb 27, 2019 at 12:40 PM Aman Sinha  wrote:
>> 
>>> My personal preference would be option (a)  as much as possible until we
>>> get to a situation where it is getting too unwieldy at which point we
>>> re-evaluate.
>>> 
>>> Aman
>>> 
>>> On Wed, Feb 27, 2019 at 12:35 PM Aman Sinha  wrote:
>>> 
 Hi Drill devs,
 There are couple of ongoing projects - Resource Manager and the Drill
 Metastore - that are relatively large in scope.  Intermediate PRs will
>> be
 created for these (for example, there's one open for the metastore [1].
 Another one for the RM [2].  These don't currently break existing
 functionality, so they have been opened against master branch.
 
 The question is, for future PRs,  would it make sense to create a
>>> separate
 Drill 2.0 branch ?  There are pros and cons.  Separate branch would
>> allow
 development on these features to proceed at a faster pace without
 disrupting others.  However, in Drill we typically have only created a
 separate branch close to the release, not up-front.  It simplifies
>>> testing
 and maintenance to have a unified master branch.
 
 Another option is feature specific branch.
 
 What do people think about the 3 options:
 a)  Merge intermediate PRs into Apache master as long as they don't
>>> break
 existing functionality.  In some cases, temporary config options may be
 used to enable new functionality for unit testing.
 b)  Create a Drill-2.0 branch which will be work-in-progress and be
 periodically sync-ed with master branch.  Code reviews will be done
>>> against
 this branch.
 c)   Have a feature specific branch - e.g for RM, for Metastore etc.
>> such
 that collaborators can do peer reviews and merge intermediate commits.
 These branches will also need to be periodically sync-ed with the
>> master
 branch.
 
 Please share your choice of one of these options and any additional
 thoughts.
 
 [1] https://github.com/apache/drill/pull/1646
 [2] https://github.com/apache/drill/pull/1652
 
>>> 
>> 



Re: Drill Protocol Buffers update

2019-03-11 Thread Arina Yelchiyeva
Hi Anton,

Thanks for making the upgrade!

Kind regards,
Arina

> On Mar 7, 2019, at 7:57 PM, Anton Gozhiy  wrote:
> 
> Hi everyone,
> 
> I'd like to inform you that Protocol Buffers library used in Drill was
> updated to version 3.6.1 (was 2.5.0). Readme files were updated accordingly.
> Please use the new version if you need to re-generate protobufs.
> Note that we still use proto2 syntax. Updating to proto3 is addressed by
> the JIRA ticket: https://issues.apache.org/jira/browse/DRILL-7040.
> 
> -- 
> Sincerely, Anton Gozhiy
> anton5...@gmail.com



Re: [DISCUSS] Including Features that Need Regular Updating?

2019-03-22 Thread Arina Yelchiyeva
I am not sure that a library update should be included in the release process. On
the contrary, I think it should be done via the regular PR process. Of course, you
always have to find a volunteer :)

Kind regards,
Arina

> On Mar 22, 2019, at 8:39 PM, Bob Rudis  wrote:
> 
> Two rly  UDFs, too!
> 
> Charles: many of the FOSS (or even non-FOSS) security tools we both likely 
> tend to use in either of those contexts put the onus of updating the core DBs 
> on the end-user (and try to go out of their way in the docs re: need to do 
> that).
> 
> I think as long as the version included with any given Drill release is 
> stated along with verbiage re: need to update (perhaps even provide an update 
> script) would be fine.
> 
> One question I'd have for the MM DB is where would 
> https://support.maxmind.com/geolite-faq/general/what-do-i-need-to-do-to-meet-the-geolite2-attribution-requirement/
>  have to go to meet the letter and spirit of the requirement? Would it need 
> to be in the console? Just on the web site? At $WORK we license it for use 
> in-product so it's not something we have to care abt but in-Drill would seem 
> to be something that'd need to be ironed out.
> 
>> On Mar 22, 2019, at 14:31, Charles Givre  wrote:
>> 
>> Boaz,
>> If that’s the case… I’ll have 2 PRs for Drill v 1.17 ;-)
>> 
>>> On Mar 22, 2019, at 14:25, Boaz Ben-Zvi  wrote:
>>> 
>>> Hi Charles,
>>> 
>>>   If these updates are only small simple tasks, it would not be a big issue 
>>> to add them to the Drill Release Process (see [1]).
>>> 
>>> BTW, most of the release work is automated via a script (see section 4 in 
>>> [1]); so if these updates could be automated as well, it would be a trivial 
>>> matter,
>>> 
>>>  Thanks for your useful contributions,
>>> 
>>>   -- Boaz
>>> 
>>> [1] https://github.com/parthchandra/drill/wiki/Drill-Release-Process
>>> 
 On 3/22/19 11:13 AM, Charles Givre wrote:
 Hello all,
 I have a question regarding new Drill features.  I have two UDFs which 
 I’ve been considering submitting but these UDFs will require regular 
 updating.  The features in question are UDFs to do IP Geolocation and a 
 User Agent Parser.  The IP geolocation is dependent on the MaxMind 
 database and associated libraries.  Basically if it were to be included in 
 Drill, every release we would have to update the MaxMind DB.  (This is 
 done in many tools that rely on it for IP Geolocation)
 
 The other is the user agent parser.   Likewise, the only updating it would 
 need would be to update the pom.xml file to reflect the latest version of 
 the UA parser.  These are both very useful features for security analysis 
 but I wanted to ask the Drill developer community if this is something we 
 wanted to consider.
 — C
>> 
> 


Re: [DISCUSS] Format plugins in contrib module

2019-02-06 Thread Arina Yelchiyeva
Created Jira - https://issues.apache.org/jira/browse/DRILL-7030

Kind regards,
Arina

On Tue, Feb 5, 2019 at 4:58 PM Vitalii Diravka  wrote:

> Absolutely agree with Arina.
>
> I think the core Format Plugins for Parquet, Json and CSV, TSV, PSV files
> (which are used for creating Drill tables) can be left in the current config
> file, and the rest should be factored out into separate config files along
> with creating separate modules in the Drill *contrib* module.
>
> Therefore the process of creating the new plugins will be more transparent.
>
> Kind regards
> Vitalii
>
>
> On Tue, Feb 5, 2019 at 3:12 PM Charles Givre  wrote:
>
> > I’d concur with Arina’s suggestion.  I do think this would be useful and
> > make it easier to make plugins “pluggable”.
> > In the meantime, should we recommend that developers of format-plugins
> > include their plugins in the bootstrap-storage-plugins.json?  I was
> > thinking also that we might want to have some guidelines for unit tests
> for
> > format plugins.  I’m doing some work on the HTTPD format plugin and found
> > some issues which cause it to throw NPEs.
> > — C
> >
> >
> > > On Feb 5, 2019, at 06:40, Arina Yelchiyeva  >
> > wrote:
> > >
> > > Hi all,
> > >
> > > Before we were adding new formats / plugins into the exec module.
> > Eventually we came up to the point that exec package size is growing and
> > adding plugin and format contributions is better to separate out in the
> > different module.
> > > Now we have contrib module where we add such contributions. Plugins are
> > pluggable, there are added automatically by means of having
> > drill-module.conf file which points to the scanning packages.
> > > Format plugins are using the same approach, the only problem is that
> > they are not added into bootstrap-storage-plugins.json. So when adding
> new
> > format plugin, in order for it to automatically appear in Drill Web UI,
> > developer has to update bootstrap file which is in the exec module.
> > > My suggestion we implement some functionality that would merge format
> > config with the bootstrap one. For example, each plugin would have to
> have
> > bootstrap-format.json file with the information to which plugin format
> > should be added (structure the same as in
> bootstrap-storage-plugins.json):
> > > Example:
> > >
> > > {
> > >   "storage": {
> > >     dfs: {
> > >       formats: {
> > >         "msgpack": {
> > >           type: "msgpack",
> > >           extensions: [ "mp" ]
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > Then during Drill start up such bootstrap-format.json files will be
> > merged with bootstrap-storage-plugins.json.
> > >
> > >
> > > Current open PR for adding new format plugins:
> > > Format plugin for LTSV files -
> https://github.com/apache/drill/pull/1627
> > > SYSLOG (RFC-5424) Format Plugin -
> > https://github.com/apache/drill/pull/1530
> > > Msgpack format reader - https://github.com/apache/drill/pull/1500
> > >
> > > Any suggestions?
> > >
> > > Kind regards,
> > > Arina
> >
> >
>


Re: Travis CI improvements

2019-02-08 Thread Arina Yelchiyeva
Great improvements. Thanks, Vova!

Kind regards,
Arina

> On Feb 8, 2019, at 1:35 PM, Vova Vysotskyi  wrote:
> 
> Hi all,
> 
> Recently there have been more PRs which require changes in protobuf files, but
> sometimes contributors forget to regenerate them. That was the reason for
> creating DRILL-7031.
> 
> Before the fix for this Jira, there was a single job which builds Drill,
> checks licenses and runs unit tests.
> In the fix for this Jira, this job was split into two jobs: the first
> one only runs unit tests and the second one builds Drill, checks the
> licenses and regenerates both Java and C++ protobuf files. If changes are
> found after regeneration, the job will fail.
> 
> So the time required to finish the Travis job is reduced to 29 minutes (the
> time of the longest job), but the total time for both jobs exceeds the
> current one (was 32 mins, but now 29+15 mins). The current build may be found here:
> https://travis-ci.org/apache/drill/builds/490483425?utm_source=github_status_medium=notification
> 
> Besides the check for changes in protobuf files, in the case of failure a
> diff for the changed classes will be printed, which may be copied and applied
> as a patch. A build with the failed protobuf check may be found here:
> https://travis-ci.org/vdiravka/drill/jobs/490016316.
> 
> Kind regards,
> Volodymyr Vysotskyi



[DISCUSS] Format plugins in contrib module

2019-02-05 Thread Arina Yelchiyeva
Hi all,

Previously we were adding new formats / plugins to the exec module. Eventually we
reached the point where the exec package size keeps growing, and it is better to
separate plugin and format contributions out into a different module.
Now we have the contrib module where we add such contributions. Plugins are
pluggable; they are added automatically by means of a drill-module.conf file
which points to the packages to scan.
Format plugins use the same approach; the only problem is that they are not
added into bootstrap-storage-plugins.json. So when adding a new format plugin,
in order for it to automatically appear in the Drill Web UI, the developer has
to update the bootstrap file, which is in the exec module.
My suggestion is that we implement some functionality that would merge the
format config with the bootstrap one. For example, each plugin would have a
bootstrap-format.json file stating to which storage plugin the format should be
added (structure the same as in bootstrap-storage-plugins.json):
Example:

{
  "storage": {
    dfs: {
      formats: {
        "msgpack": {
          type: "msgpack",
          extensions: [ "mp" ]
        }
      }
    }
  }
}

Then during Drill start up such bootstrap-format.json files will be merged with 
bootstrap-storage-plugins.json.
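One possible shape for that merge step, sketched with plain maps standing in for the parsed JSON trees (this is an illustration of the proposed behavior, not Drill's actual bootstrap-loading code; all names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class BootstrapMergeDemo {
  // Recursively merge `patch` into `base`: nested objects are merged,
  // scalar values from `patch` win over values already in `base`.
  @SuppressWarnings("unchecked")
  static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> patch) {
    for (Map.Entry<String, Object> e : patch.entrySet()) {
      Object existing = base.get(e.getKey());
      if (existing instanceof Map && e.getValue() instanceof Map) {
        merge((Map<String, Object>) existing, (Map<String, Object>) e.getValue());
      } else {
        base.put(e.getKey(), e.getValue());
      }
    }
    return base;
  }

  // Helper: a one-entry map, standing in for a parsed JSON object node.
  private static Map<String, Object> map(String key, Object value) {
    Map<String, Object> m = new HashMap<>();
    m.put(key, value);
    return m;
  }

  @SuppressWarnings("unchecked")
  public static void main(String[] args) {
    // bootstrap-storage-plugins.json: dfs already knows the csv format.
    Map<String, Object> bootstrap =
        map("storage", map("dfs", map("formats", map("csv", map("type", "text")))));
    // bootstrap-format.json shipped by a format plugin module.
    Map<String, Object> pluginFormat =
        map("storage", map("dfs", map("formats", map("msgpack", map("type", "msgpack")))));

    Map<String, Object> merged = merge(bootstrap, pluginFormat);
    Map<String, Object> formats = (Map<String, Object>)
        ((Map<String, Object>) ((Map<String, Object>) merged.get("storage")).get("dfs")).get("formats");
    System.out.println(formats.keySet());
  }
}
```

After the merge, dfs carries both the pre-existing csv format and the msgpack format contributed by the plugin module.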


Current open PR for adding new format plugins:
Format plugin for LTSV files - https://github.com/apache/drill/pull/1627
SYSLOG (RFC-5424) Format Plugin - https://github.com/apache/drill/pull/1530 
Msgpack format reader - https://github.com/apache/drill/pull/1500

Any suggestions?

Kind regards,
Arina

Re: [DISCUSS] 1.16.0 release

2019-04-15 Thread Arina Yelchiyeva
The problem with leaving the master branch open for commits while the RC is
prepared is that commits that won’t be cherry-picked into 1.16 will still be
committed into the 1.16 SNAPSHOT, since usually master is updated to the 1.17
SNAPSHOT only after the release.
But in reality they belong to the 1.17 version. That’s why I thought we actually
closed master for commits for the period of the release before.

Kind regards,
Arina

> On Apr 11, 2019, at 3:20 AM, Sorabh Hamirwasia  
> wrote:
> 
> Hi All,
> I have created 1.16.0 branch with commit to revert protobuf change. There
> is only 1 blocker right now which Abhishek shared. Meanwhile we will be
> running regression tests and performance tests to see if there are any
> other issues before creating RC0. All the commits for blocking issues will
> be cherry-picked to this 1.16.0 branch so master branch is still open for
> commits.
> 
> Blocker: DRILL-7166: Tests doing count(* ) with wildcards in table name are
> querying metadata cache and returning wrong results
> 
> 1.16.0 Branch - https://github.com/apache/drill/commits/1.16.0
> 
> Thanks,
> Sorabh
> 
> On Wed, Apr 10, 2019 at 5:15 PM Abhishek Girish  wrote:
> 
>> FYI - DRILL-7166  is a
>> blocker for the release and is being actively worked on by Jyothsna.
>> 
>> On Mon, Apr 8, 2019 at 3:59 PM Sorabh Hamirwasia 
>> wrote:
>> 
>>> Hi All,
>>> I will create 1.16.0 branch later today and will merge in all
>>> ready-to-commit JIRA's. Please try to close most of the PR's by EOD today
>>> (around 5:30 P.M. PST). Once the branch is created I will work on
>> reverting
>>> the protobuf changes on 1.16.0 branch.
>>> 
>>> There are few other JIRA's which are in review and almost ready.
>> Tomorrow,
>>> whichever PR is categorized as blocker or must-have for release will only
>>> be cherry-picked.
>>> 
>>> Thanks,
>>> Sorabh
>>> 
>>> On Thu, Apr 4, 2019 at 4:29 PM Sorabh Hamirwasia <
>> sohami.apa...@gmail.com>
>>> wrote:
>>> 
 Hi All,
 A gentle reminder about the release cut-off date of *Apr 8, 2019.*
 
   - Currently there are *12 JIRA's [1] in Reviewable state* and I am
   seeing progress being made on these PR's. Please try to resolve all
>>> the
   review comments for your PR by Monday.
   - Also currently there are *10 JIRA's which are in Open and In
   Progress [2]* state, given there are only few days left until
>> cut-off
   date please either move these JIRA's to 1.17 or try to make it to
   ready-to-commit (code review complete and approved) state by Monday
>>> EOD.
 
 [1]:
 
>>> 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20DRILL%20AND%20issuetype%20in%20(standardIssueTypes()%2C%20subTaskIssueTypes())%20AND%20fixVersion%20%3D%201.16.0%20AND%20status%20%3D%20Reviewable%20AND%20(labels%20!%3D%20ready-to-commit%20OR%20labels%20is%20EMPTY)%20ORDER%20BY%20assignee%20ASC
 [2]:
 
>>> 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20DRILL%20AND%20issuetype%20in%20(standardIssueTypes()%2C%20subTaskIssueTypes())%20AND%20fixVersion%20%3D%201.16.0%20AND%20(status%20%3D%20Open%20OR%20status%20%3D%20%22In%20Progress%22)%20ORDER%20BY%20status%20DESC
 
 Thanks,
 Sorabh
 
 On Mon, Mar 25, 2019 at 9:37 AM Sorabh Hamirwasia <
>>> sohami.apa...@gmail.com>
 wrote:
 
> Hi Aman
> Thanks for the information.
> 
> Given we need 2 weeks of estimated dev effort for parquet metadata
> caching and other open items have also estimated similar remaining
>>> effort,
> I would like to propose first release cut-off date as *Apr 8, 2019.*
>>> Please
> try to include all the major feature work by then. Let's try to close
>>> all
> the open and in progress items [1]
> 
> [1]:
> 
>>> 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20DRILL%20AND%20issuetype%20in%20(standardIssueTypes()%2C%20subTaskIssueTypes())%20AND%20fixVersion%20%3D%201.16.0%20AND%20(status%20%3D%20Open%20OR%20status%20%3D%20%22In%20Progress%22)%20ORDER%20BY%20status%20DESC%2C%20assignee%20ASC
> 
> Thanks,
> Sorabh
> 
> On Fri, Mar 22, 2019 at 1:48 PM Aman Sinha 
>> wrote:
> 
>> Hi Sorabh,
>> for the Parquet Metadata caching improvements, we are estimating 2
>> more
>> weeks for the feature development.
>> This does not include bug fixing if any blocker bugs are discovered
>> during
>> functional testing.
>> Hope that helps with setting the release cut-off date.
>> 
>> Aman
>> 
>> 
>> On Tue, Mar 19, 2019 at 5:23 PM Sorabh Hamirwasia <
>> sohami.apa...@gmail.com>
>> wrote:
>> 
>>> Hi All,
>>> Thanks for your response. We discussed about release today in
>>> hangout.
>> Most
>>> of the JIRA's shared in this thread will be considered for 1.16
>> since
>> they
>>> need at most a week or two to wrap up.
>>> 

Apache Drill Hangout - June 11, 2019

2019-06-11 Thread Arina Yelchiyeva
Hi all,

We will have our bi-weekly hangout June 11th, at 10 AM PST (link: 
https://meet.google.com/yki-iqdf-tai ).

If there are any topics you would like to discuss during the hangout please 
respond to this email.

Kind regards,
Arina

Re: Strange metadata from Text Reader

2019-06-24 Thread Arina Yelchiyeva
Just to confirm, in Drill 1.15 it works correctly?

Kind regards,
Arina

> On Jun 24, 2019, at 10:15 PM, Charles Givre  wrote:
> 
> Hi Arina, 
> It doesn't seem to make a difference unfortunately. :-(
> --C 
> 
>> On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva  
>> wrote:
>> 
>> Hi Charles,
>> 
>> Please try with v3 reader enabled: set `exec.storage.enable_v3_text_reader` 
>> = true.
>> Does it behave the same?
>> 
>> Kind regards,
>> Arina
>> 
>>> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
>>> 
>>> Hello Drill Devs,
>>> I'm noticing some strange behavior with the newest version of Drill.  If 
>>> you query a CSV file, you get the following metadata:
>>> 
>>> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
>>> 
>>> {
>>> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>>> "columns": [
>>>  "domain"
>>> ],
>>> "rows": [
>>>  {
>>>"domain": "thedataist.com"
>>>  }
>>> ],
>>> "metadata": [
>>>  "VARCHAR(0, 0)",
>>>  "VARCHAR(0, 0)"
>>> ],
>>> "queryState": "COMPLETED",
>>> "attemptedAutoLimit": 0
>>> }
>>> 
>>> 
>>> There are two issues here:
>>> 1.  VARCHAR now has precision 
>>> 2.  There are twice as many columns as there should be.
>>> 
>>> Additionally, if you query a regular CSV, without the columns extracted, 
>>> you get the following:
>>> 
>>> "rows": [
>>>  {
>>>"columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>>>  }
>>> ],
>>> "metadata": [
>>>  "VARCHAR(0, 0)",
>>>  "VARCHAR(0, 0)"
>>> ],
>>> 
>>> This is bizarre in that the data type is not being reported correctly, it 
>>> should be LIST or something like that, AND we're getting too many columns 
>>> in the metadata.  I'll submit a JIRA as well, but could someone please take 
>>> a look?
>>> Thanks,
>>> -- C
>>> 
>>> 
>>> 
>> 
> 



Re: Strange metadata from Text Reader

2019-06-24 Thread Arina Yelchiyeva
Hi Charles,

Please try with v3 reader enabled: set `exec.storage.enable_v3_text_reader` = 
true.
Does it behave the same?
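For example, the check could look like this (a hedged sketch using the option name given above; the table name is the one from Charles's report):

```sql
-- Enable the v3 text reader for the current session, then re-run the query
-- to see whether the metadata is still reported incorrectly.
SET `exec.storage.enable_v3_text_reader` = true;
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1;
```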

Kind regards,
Arina

> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
> 
> Hello Drill Devs,
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> 
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> 
> {
>  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>  "columns": [
>"domain"
>  ],
>  "rows": [
>{
>  "domain": "thedataist.com"
>}
>  ],
>  "metadata": [
>"VARCHAR(0, 0)",
>"VARCHAR(0, 0)"
>  ],
>  "queryState": "COMPLETED",
>  "attemptedAutoLimit": 0
> }
> 
> 
> There are two issues here:
> 1.  VARCHAR now has precision 
> 2.  There are twice as many columns as there should be.
> 
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> 
> "rows": [
>{
>  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>}
>  ],
>  "metadata": [
>"VARCHAR(0, 0)",
>"VARCHAR(0, 0)"
>  ],
> 
> This is bizarre in that the data type is not being reported correctly, it 
> should be LIST or something like that, AND we're getting too many columns in 
> the metadata.  I'll submit a JIRA as well, but could someone please take a 
> look?
> Thanks,
> -- C
> 
> 
> 



Re: Strange metadata from Text Reader

2019-06-24 Thread Arina Yelchiyeva
It would be good to identify the commit that actually caused the bug.
Personally, I don’t recall anything that might have broken this functionality.

Kind regards,
Arina

> On Jun 24, 2019, at 10:19 PM, Charles Givre  wrote:
> 
> I don't have that version of Drill anymore but this feature worked correctly 
> until recently.  I'm using the latest build of Drill. 
> 
>> On Jun 24, 2019, at 3:18 PM, Arina Yelchiyeva  
>> wrote:
>> 
>> Just to confirm, in Drill 1.15 it works correctly?
>> 
>> Kind regards,
>> Arina
>> 
>>> On Jun 24, 2019, at 10:15 PM, Charles Givre  wrote:
>>> 
>>> Hi Arina, 
>>> It doesn't seem to make a difference unfortunately. :-(
>>> --C 
>>> 
>>>> On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva  
>>>> wrote:
>>>> 
>>>> Hi Charles,
>>>> 
>>>> Please try with v3 reader enabled: set 
>>>> `exec.storage.enable_v3_text_reader` = true.
>>>> Does it behave the same?
>>>> 
>>>> Kind regards,
>>>> Arina
>>>> 
>>>>> On Jun 24, 2019, at 9:38 PM, Charles Givre  wrote:
>>>>> 
>>>>> Hello Drill Devs,
>>>>> I'm noticing some strange behavior with the newest version of Drill.  If 
>>>>> you query a CSV file, you get the following metadata:
>>>>> 
>>>>> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
>>>>> 
>>>>> {
>>>>> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>>>>> "columns": [
>>>>> "domain"
>>>>> ],
>>>>> "rows": [
>>>>> {
>>>>>  "domain": "thedataist.com"
>>>>> }
>>>>> ],
>>>>> "metadata": [
>>>>> "VARCHAR(0, 0)",
>>>>> "VARCHAR(0, 0)"
>>>>> ],
>>>>> "queryState": "COMPLETED",
>>>>> "attemptedAutoLimit": 0
>>>>> }
>>>>> 
>>>>> 
>>>>> There are two issues here:
>>>>> 1.  VARCHAR now has precision 
>>>>> 2.  There are twice as many columns as there should be.
>>>>> 
>>>>> Additionally, if you query a regular CSV, without the columns extracted, 
>>>>> you get the following:
>>>>> 
>>>>> "rows": [
>>>>> {
>>>>>  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>>>>> }
>>>>> ],
>>>>> "metadata": [
>>>>> "VARCHAR(0, 0)",
>>>>> "VARCHAR(0, 0)"
>>>>> ],
>>>>> 
>>>>> This is bizarre in that the data type is not being reported correctly, it 
>>>>> should be LIST or something like that, AND we're getting too many columns 
>>>>> in the metadata.  I'll submit a JIRA as well, but could someone please 
>>>>> take a look?
>>>>> Thanks,
>>>>> -- C
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 



Re: Multi char csv delimiter

2019-06-24 Thread Arina Yelchiyeva
Hi Matthias,

Attachments are not supported on the mailing list, please include text 
describing your configuration.
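For example, the relevant "formats" section of the storage plugin configuration might look like the sketch below (field names such as `lineDelimiter` are from memory and should be checked against the Drill docs; note that in JSON the delimiter must be written as the escape sequence `\n\r` — pasting literal line breaks into the string may be what triggers the "invalid JSON syntax" error):

```json
{
  "csv": {
    "type": "text",
    "extensions": ["csv"],
    "lineDelimiter": "\n\r",
    "fieldDelimiter": ","
  }
}
```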

Kind regards,
Arina

> On Jun 24, 2019, at 2:21 PM, Rosenthaler Matthias (PS-DI/ETF1.1) 
>  wrote:
> 
> Hi,
>  
> It seems that the multi-char delimiter “\n\r” is not supported for the csv format in 
> drill 1.16.
> The documentation mentions it should work, but it does not work for me. It 
> always says “invalid JSON syntax” if I try to change the storage plugin 
> configuration.
>  
> 
>  
> Mit freundlichen Grüßen / Best regards 
> 
> Matthias Rosenthaler
> 
> Powertrain Solutions, Engine Testing (PS-DI/ETF1.1) 
> Robert Bosch AG | Robert-Bosch-Straße 1 | 4020 Linz | AUSTRIA | www.bosch.at 
>  
> Tel. +43 732 7667-479 | matthias.rosentha...@at.bosch.com 
>  
> 
> Sitz: Robert Bosch Aktiengesellschaft, A-1030 Wien, Göllnergasse 15-17 , 
> Registergericht: FN 55722 w HG-Wien
> Aufsichtsratsvorsitzender: Dr. Uwe Thomas; Geschäftsführung: Dr. Klaus Peter 
> Fouquet
> DVR-Nr.: 0418871- ARA-Lizenz-Nr.: 1831 - UID-Nr.: ATU14719303 - Steuernummer 
> 140/4988



Re: [DISCUSS] New approach for using Drill-Calcite fork

2019-06-25 Thread Arina Yelchiyeva
Hi Volodymyr,

Thanks for starting this discussion.
I wish Drill did not have to maintain its own Calcite fork, but even the latest 
Calcite release forced us to revert one of the commits, since it caused queries 
to hang.
Hosting the Drill Calcite fork in the MapR repository is definitely a big problem 
when it comes to giving access to committers and PMC members, especially for 
deploying artifacts.

+1 for this proposal, and I suggest the next Calcite upgrade follow this strategy.
I would also suggest creating an md document in docs/dev describing the reasons for 
the custom Calcite version, the list of commits not accepted by the Calcite 
community, the process for adding new changes, etc.

Kind regards,
Arina

> On Jun 24, 2019, at 4:22 PM, Volodymyr Vysotskyi  wrote:
> 
> Hi all,
> 
> Currently, Calcite fork with Drill-specific commits is placed in
> https://github.com/mapr/incubator-calcite. Though it is a public
> repository, it is problematic to provide writable access for most of the
> cases.
> 
> Another more frequent problem is deploying new Drill-Calcite versions to
> the maven repository (currently they are deployed to
> http://repository.mapr.com/nexus/content/repositories/drill-optiq/). Only
> several people have writable access for it, and there is no way to provide
> it to more people, in particular committers and PMCs.
> 
> To resolve these problems, I propose to create a personal repository (I can
> create it, here is a test version:
> https://github.com/vvysotskyi/drill-calcite) and add all committers and
> PMCs as collaborators to it.
> 
> To resolve the problem with deploys I propose to use https://jitpack.io, so
> it will automatically deploy a newer version when it will be required.
> The minor thing which should be mentioned - instead of *org.apache.calcite*
> groupId will be used groupId for specific GitHub repo.
> 
> The following rules will be used to clarify the process:
> - Push changes to the repository (including commit with bumping up the
> version)
> - Create a new tag for bumped-up version - tag name should match the
> version, for example, *1.18.0-drill-r2* and push it to the remote repo
> - Bump up version in Drill
> 
> Are there any objections or ideas on this?
> 
> Kind regards,
> Volodymyr Vysotskyi
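The tag-then-push workflow proposed above can be sketched as follows. This is an illustrative sketch only: it runs against a throwaway local repository so the commands are self-contained, and the version string is the example from the proposal.

```shell
# Create a disposable repo to demonstrate the "tag name matches the version" rule.
tmp=$(mktemp -d)
git -C "$tmp" init -q
git -C "$tmp" -c user.name=dev -c user.email=dev@example.org \
    commit -q --allow-empty -m "Bump version to 1.18.0-drill-r2"

# Create a tag whose name matches the bumped-up version.
git -C "$tmp" tag 1.18.0-drill-r2

# In the real workflow the tag would then be pushed to the fork
# (commented out here because the throwaway repo has no remote):
# git -C "$tmp" push origin 1.18.0-drill-r2

git -C "$tmp" tag -l    # lists: 1.18.0-drill-r2
```

With jitpack.io, pushing such a tag is what makes the corresponding artifact version resolvable.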



Re: [VOTE] Apache Drill Release 1.16.0 - RC2

2019-04-30 Thread Arina Yelchiyeva
Downloaded binary tarball and ran Drill in embedded mode.
Verified schema provisioning for text files, dynamic UDFs.
Ran random queries, including long-running, queried system tables, created 
tables with different formats.
Checked Web UI (queries, profiles, storage plugins, logs pages).

+1 (binding)

Kind regards,
Arina

> On Apr 30, 2019, at 8:33 AM, Aman Sinha  wrote:
> 
> Downloaded binary tarball on my Mac  and ran in embedded mode.
> Verified Sorabh's release signature and the tar file's checksum
> Did a quick glance through maven artifacts
> Did some manual tests with TPC-DS  Web_Sales table and ran REFRESH METADATA
> command against the same table
> Checked runtime query profiles of above queries and verified COUNT(*),
> COUNT(column) optimization is getting applied.
> Also did a build from source on my linux VM.
> 
> RC2 looks good !   +1
> 
> On Fri, Apr 26, 2019 at 8:28 AM SorabhApache  wrote:
> 
>> Hi Drillers,
>> I'd like to propose the third release candidate (RC2) for the Apache Drill,
>> version 1.16.0.
>> 
>> Changes since the previous release candidate:
>> DRILL-7201: Strange symbols in error window (Windows)
>> DRILL-7202: Failed query shows warning that fragments has made no progress
>> DRILL-7207: Update the copyright year in NOTICE.txt file
>> DRILL-7212: Add gpg key with apache.org email for sorabh
>> DRILL-7213: drill-format-mapr.jar contains stale git.properties file
>> 
>> The RC2 includes total of 220 resolved JIRAs [1].
>> Thanks to everyone for their hard work to contribute to this release.
>> 
>> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
>> at [3].
>> 
>> This release candidate is based on commit
>> 751e87736c2ddbc184b52cfa56f4e29c68417cfe located at [4].
>> 
>> Please download and try out the release candidate.
>> 
>> The vote ends at 04:00 PM UTC (09:00 AM PDT, 07:00 PM EET, 09:30 PM IST),
>> May 1st, 2019
>> 
>> [ ] +1
>> [ ] +0
>> [ ] -1
>> 
>> Here is my vote: +1
>>  [1]
>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12344284
>>  [2] http://home.apache.org/~sorabh/drill/releases/1.16.0/rc2/
>>  [3]
>> https://repository.apache.org/content/repositories/orgapachedrill-1073/
>>  [4] https://github.com/sohami/drill/commits/drill-1.16.0
>> 
>> Thanks,
>> Sorabh
>> 



Re: [VOTE] Apache Drill Release 1.16.0 - RC0

2019-04-19 Thread Arina Yelchiyeva
Fix is ready: https://github.com/apache/drill/pull/1757

> On Apr 19, 2019, at 2:36 PM, Anton Gozhiy  wrote:
> 
> Reported a regression:
> https://issues.apache.org/jira/browse/DRILL-7186
> 
> On Fri, Apr 19, 2019 at 2:10 AM Bob Rudis  wrote:
> 
>> Thx Sorabh!
>> 
>> On Thu, Apr 18, 2019 at 16:23 SorabhApache  wrote:
>> 
>>> Hi Bob,
>>> With protobuf change both JDBC and ODBC will need to be updated but for
>>> 1.16 release this change was reverted since it will take some time to
>>> prepare for drivers with latest protobuf versions. In the original JIRA
>>> there is a comment stating that the commit is reverted on 1.16 branch and
>>> will add the commit id once the release branch is finalized.
>>> 
>>> JIRA: https://issues.apache.org/jira/browse/DRILL-6642
>>> Commit that revert's the change [1]:
>>> 6eedd93dadf6d7d4f745f99d30aee329976c2191
>>> 
>>> [1]: https://github.com/sohami/drill/commits/drill-1.16.0
>>> 
>>> Thanks,
>>> Sorabh
>>> 
>>> On Thu, Apr 18, 2019 at 1:12 PM Bob Rudis  wrote:
>>> 
 Q abt the RC (and eventual full release): Does
 https://issues.apache.org/jira/browse/DRILL-5509 mean that the ODBC
 drivers will need to be updated to avoid warning messages (and
 potential result set errors) as the was the case with a previous
 release or is this purely (as I read the ticket) a "make maven less
 unhappy" and no underlying formats are changing? If it does mean the
 ODBC drivers need changing, any chance there's a way to ensure the
 provider of those syncs it closer to release than the last time?
 
 On Thu, Apr 18, 2019 at 3:06 PM SorabhApache 
>> wrote:
> 
> Hi Drillers,
> I'd like to propose the first release candidate (RC0) for the Apache
 Drill,
> version 1.16.0.
> 
> The RC0 includes total of 211 resolved JIRAs [1].
> Thanks to everyone for their hard work to contribute to this release.
> 
> The tarball artifacts are hosted at [2] and the maven artifacts are
 hosted
> at [3].
> 
> This release candidate is based on commit
> 61cd2b779d85ff1e06947327fca2e076994796b5 located at [4].
> 
> Please download and try out the release.
> 
> The vote ends at 07:00 PM UTC (12:00 PM PDT, 10:00 PM EET, 12:30 AM
> IST(next day)), Apr 23rd, 2019
> 
> [ ] +1
> [ ] +0
> [ ] -1
> 
> Here is my vote: +1
> 
>  [1]
> 
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12344284
>  [2] http://home.apache.org/~sorabh/drill/releases/1.16.0/rc0/
>  [3]
> 
>>> https://repository.apache.org/content/repositories/orgapachedrill-1066/
>  [4] https://github.com/sohami/drill/commits/drill-1.16.0
> 
> Thanks,
> Sorabh
 
>>> 
>> 
> 
> 
> -- 
> Sincerely, Anton Gozhiy
> anton5...@gmail.com



Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-26 Thread Arina Yelchiyeva
Maybe we should include these scripts directly in the Drill project (of course, if 
Parth does not mind), perhaps in the docs module?
This way we will be able to modify them when needed using the regular PR process, 
and everybody will know where to find them.

Any thoughts?

Kind regards,
Arina

> On Apr 26, 2019, at 1:56 AM, SorabhApache  wrote:
> 
> Update:
> DRILL-7201 / DRILL-7202: Kunal has fixes for both and are ready-to-commit.
> DRILL-7213: drill-format-mapr.jar contains stale git.properties file
> 
>   - The main issue here was that in 1.14 and 1.15 drill-format-mapr.jar
>   was included with release tarballs where it shouldn't be. So Vova helped to
>   make a change in 1.16 to fix that, and now format-mapr will not be included
>   in release tarballs, nor any mapr-specific jars. The PR is approved and
>   ready-to-commit.
> 
> DRILL-7212: Add gpg key with apache.org email for sorabh
> 
>   - PR is opened for it. I have added both my emails to the gpg key and is
>   signed by boaz.
> 
> DRILL-7207: Update the copyright year in NOTICE.txt file
> 
>   - PR is opened for this as well.
> 
> The last issue Vova reported was about files like
> *org.codehaus.plexus.compiler.javac.JavacCompiler1256088670033285178arguments*
> being included in the jar. This was present in 1.14 as well but not in 1.15.
> It looks like different processes were followed for the releases: every release
> done using the script (drill-release.sh)[1] will contain the above files, because
> the *mvn release:prepare* phase is run with the -X flag, which creates debug files
> that are not excluded by the maven-jar plugin configuration. After removing the -X
> option I no longer see these files, and the speed of the prepare phase has
> increased significantly as well. I will submit a separate PR for this change in
> the script post release.
> 
> Once all the changes are merged into master, I will re-prepare the RC
> candidate and share with the community.
> 
> [1]: https://github.com/parthchandra/stuff/tree/master/scripts
> 
> Thanks,
> Sorabh
> 
> On Thu, Apr 25, 2019 at 3:02 PM Kunal Khatua  wrote:
> 
>> UPDATE:
>> 
>> 
>> Both, DRILL-7201 and DRILL-7202 has been fixed and verified by Sorabh
>> and Arina, so we can have it as part of "RC2".
>> 
>> (Thanks for catching the issues, Arina ! )
>> 
>> ~ kunal
>> 
>> On 4/25/2019 4:12:16 AM, Volodymyr Vysotskyi  wrote:
>> Hi Sorabh,
>> 
>> I have noticed that jars in prebuild tar contain some strange files, for
>> example, *drill-jdbc-all-1.16.0.jar* contains the following files:
>> *javac.sh*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler1256088670033285178arguments*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler1458111453480208588arguments*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler2392560589194600493arguments*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler4475905192586529595arguments*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler4524532450095901144arguments*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler4670895443631397937arguments*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler5215058338087807885arguments*
>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler7526103232425779297arguments*
>> 
>> which contain some info about your machine (username, etc.)
>> 
>> Jars from the previous release didn't contain these files. Also, I have
>> built master on my machine and these files are absent for me.
>> 
>> Could you please take a look? This problem is observed for both RCs.
>> 
>> Kind regards,
>> Volodymyr Vysotskyi
>> 
>> 
>> On Thu, Apr 25, 2019 at 8:20 AM SorabhApache wrote:
>> 
>>> Update:
>>> 1) DRILL-7208 is there in 1.15 release as well, so it's not a blocker for
>>> 1.16
>>> 2) DRILL-7213: drill-format-mapr.jar contains stale git.properties file
>>> 
>>> - Still investigating on the above issue.
>>> 
>>> 3) DRILL-7201: Strange symbols in error window (Windows)
>>> 
>>> - Issue is not reproducible on Kunal's machine. He is having discussion
>>> on JIRA to see if it's treated as a blocker or not.
>>> 
>>> To investigate for DRILL-7213 I have to drop the RC1 candidate since
>> again
>>> performing the release required to push it to my remote repo and publish
>> to
>>> maven repo as well. So I don't have RC1 binaries if we consider all the
>>> issues as non-blocking.
>>> 
>>> I will re-share the RC candidate once either fix for
>> DRILL-7213/DRILL-7201
>>> are available or it's considered as non-blockers. Any thoughts?
>>> 
>>> Thanks,
>>> Sorabh
>>> 
>>> On Wed, Apr 24, 2019 at 9:17 PM Boaz Ben-Zvi wrote:
>>> 
 Downloaded both the binary and src tarballs, and verified the SHA
 signatures and the PGP.
 
 Built and ran the full unit tests on both Linux and Mac (took 3:05
>> hours
 on my Mac).
 
 Successfully ran some old favorite queries with Sort/Hash-Join/Hash-Agg
 spilling.
 
 Ran several manual tests of REFRESH METADATA with COLUMNS, and verified
 the metadata files and 

Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-26 Thread Arina Yelchiyeva
Thank you, Sorabh!

> On Apr 26, 2019, at 6:42 PM, SorabhApache  wrote:
> 
> Sure. I will do it.
> 
> Thanks,
> Sorabh
> 
> On Fri, Apr 26, 2019 at 8:35 AM Arina Yelchiyeva 
> wrote:
> 
>> Sure, we can discuss it during Hangout.
>> Sorabh, since you are latest release manager, I am wondering if you would
>> volunteer to add scripts into Drill project? :)
>> At least something to start with.
>> 
>> Kind regards,
>> Arina
>> 
>>> On Apr 26, 2019, at 6:32 PM, SorabhApache  wrote:
>>> 
>>> +1 for including release scripts. I would also recommend adding a
>> README.md
>>> in the same location, which can include link to a wiki for the release
>>> process or all the instructions in it. Also it would be great if we can
>>> formulate some guidelines as to what kind of issues may not be considered
>>> as blocker for a release going forward. May be we can discuss in the next
>>> hangout as well.
>>> 
>>> Thanks,
>>> Sorabh
>>> 
>>>> On Fri, Apr 26, 2019 at 8:03 AM Aman Sinha  wrote:
>>>> 
>>>> +1 on including the release preparation script into the code base.
>> Location
>>>> TBD.  Perhaps under a separate 'release' subdirectory either in contrib
>> or
>>>> docs ?
>>>> 
>>>> On Fri, Apr 26, 2019 at 12:46 AM Arina Yelchiyeva <
>>>> arina.yelchiy...@gmail.com> wrote:
>>>> 
>>>>> Maybe we should include these scripts directly into Drill project (of
>>>>> course if Parth does not mind), maybe in doc module?
>>>>> This way we will be able to modify them if needed using regular PR
>>>> process
>>>>> and everybody will know where to find them.
>>>>> 
>>>>> Any thoughts?
>>>>> 
>>>>> Kind regards,
>>>>> Arina
>>>>> 
>>>>>> On Apr 26, 2019, at 1:56 AM, SorabhApache  wrote:
>>>>>> 
>>>>>> Update:
>>>>>> DRILL-7201 / DRILL-7202: Kunal has fixes for both and are
>>>>> ready-to-commit.
>>>>>> DRILL-7213: drill-format-mapr.jar contains stale git.properties file
>>>>>> 
>>>>>> - The main issue here was that in 1.14 and 1.15 drill-format-mapr.jar
>>>>>> was included with release tarballs where it shouldn't be. So Vova
>>>>> helped to
>>>>>> make a change in 1.16 to fix that and now format-mapr will not be
>>>>> included
>>>>>> in release tarballs, not any mapr specific jars. The PR is approved
>>>> and
>>>>>> ready-to-commit.
>>>>>> 
>>>>>> DRILL-7212: Add gpg key with apache.org email for sorabh
>>>>>> 
>>>>>> - PR is opened for it. I have added both my emails to the gpg key and
>>>>> is
>>>>>> signed by boaz.
>>>>>> 
>>>>>> DRILL-7207: Update the copyright year in NOTICE.txt file
>>>>>> 
>>>>>> - PR is opened for this as well.
>>>>>> 
>>>>>> Last issue which Vova reported about files like
>>>>>> 
>>>>> 
>>>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler1256088670033285178arguments*
>>>>>> being included in jar. This was present in 1.14 as well but not in
>>>> 1.15.
>>>>>> The reason is it looks like different processes are followed for
>>>> release.
>>>>>> Every release done using the script (drill-release.sh)[1] will have
>>>> above
>>>>>> file. The reason is because the *mvn release:prepare* phase is done
>>>> with
>>>>> -X
>>>>>> flag which creates debug files and those are not excluded from
>>>> maven-jar
>>>>>> plugin configuration. After removing the -X option I am not seeing
>>>> above
>>>>>> files anymore and speed of prepare phase is increased significantly as
>>>>>> well. Will submit a separate PR for this change in script post
>> release.
>>>>>> 
>>>>>> Once all the changes are merged into master, I will re-prepare the RC
>>>>>> candidate and share with the community.
>>>>>> 
>>>>>> [1]: https://github.com/parthchandra/stuff/tree/master/scripts

Re: [VOTE] Apache Drill Release 1.16.0 - RC1

2019-04-26 Thread Arina Yelchiyeva
Sure, we can discuss it during Hangout.
Sorabh, since you are latest release manager, I am wondering if you would 
volunteer to add scripts into Drill project? :)
At least something to start with.

Kind regards,
Arina

> On Apr 26, 2019, at 6:32 PM, SorabhApache  wrote:
> 
> +1 for including release scripts. I would also recommend adding a README.md
> in the same location, which can include link to a wiki for the release
> process or all the instructions in it. Also it would be great if we can
> formulate some guidelines as to what kind of issues may not be considered
> as blocker for a release going forward. May be we can discuss in the next
> hangout as well.
> 
> Thanks,
> Sorabh
> 
> On Fri, Apr 26, 2019 at 8:03 AM Aman Sinha  wrote:
> 
>> +1 on including the release preparation script into the code base. Location
>> TBD.  Perhaps under a separate 'release' subdirectory either in contrib or
>> docs ?
>> 
>> On Fri, Apr 26, 2019 at 12:46 AM Arina Yelchiyeva <
>> arina.yelchiy...@gmail.com> wrote:
>> 
>>> Maybe we should include these scripts directly into Drill project (of
>>> course if Parth does not mind), maybe in doc module?
>>> This way we will be able to modify them if needed using regular PR
>> process
>>> and everybody will know where to find them.
>>> 
>>> Any thoughts?
>>> 
>>> Kind regards,
>>> Arina
>>> 
>>>> On Apr 26, 2019, at 1:56 AM, SorabhApache  wrote:
>>>> 
>>>> Update:
>>>> DRILL-7201 / DRILL-7202: Kunal has fixes for both and are
>>> ready-to-commit.
>>>> DRILL-7213: drill-format-mapr.jar contains stale git.properties file
>>>> 
>>>>  - The main issue here was that in 1.14 and 1.15 drill-format-mapr.jar
>>>>  was included with release tarballs where it shouldn't be. So Vova
>>> helped to
>>>>  make a change in 1.16 to fix that and now format-mapr will not be
>>> included
>>>>  in release tarballs, not any mapr specific jars. The PR is approved
>> and
>>>>  ready-to-commit.
>>>> 
>>>> DRILL-7212: Add gpg key with apache.org email for sorabh
>>>> 
>>>>  - PR is opened for it. I have added both my emails to the gpg key and
>>> is
>>>>  signed by boaz.
>>>> 
>>>> DRILL-7207: Update the copyright year in NOTICE.txt file
>>>> 
>>>>  - PR is opened for this as well.
>>>> 
>>>> Last issue which Vova reported about files like
>>>> 
>>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompiler1256088670033285178arguments*
>>>> being included in jar. This was present in 1.14 as well but not in
>> 1.15.
>>>> The reason is it looks like different processes are followed for
>> release.
>>>> Every release done using the script (drill-release.sh)[1] will have
>> above
>>>> file. The reason is because the *mvn release:prepare* phase is done
>> with
>>> -X
>>>> flag which creates debug files and those are not excluded from
>> maven-jar
>>>> plugin configuration. After removing the -X option I am not seeing
>> above
>>>> files anymore and speed of prepare phase is increased significantly as
>>>> well. Will submit a separate PR for this change in script post release.
>>>> 
>>>> Once all the changes are merged into master, I will re-prepare the RC
>>>> candidate and share with the community.
>>>> 
>>>> [1]: https://github.com/parthchandra/stuff/tree/master/scripts
>>>> 
>>>> Thanks,
>>>> Sorabh
>>>> 
>>>> On Thu, Apr 25, 2019 at 3:02 PM Kunal Khatua  wrote:
>>>> 
>>>>> UPDATE:
>>>>> 
>>>>> 
>>>>> Both, DRILL-7201 and DRILL-7202 has been fixed and verified by Sorabh
>>>>> and Arina, so we can have it as part of "RC2".
>>>>> 
>>>>> (Thanks for catching the issues, Arina ! )
>>>>> 
>>>>> ~ kunal
>>>>> 
>>>>> On 4/25/2019 4:12:16 AM, Volodymyr Vysotskyi 
>>> wrote:
>>>>> Hi Sorabh,
>>>>> 
>>>>> I have noticed that jars in prebuild tar contain some strange files,
>> for
>>>>> example, *drill-jdbc-all-1.16.0.jar* contains the following files:
>>>>> *javac.sh*
>>>>> 
>>>>> 
>>> 
>> *org.codehaus.plexus.compiler.javac.JavacCompil

Re: Writing storage plugin for elasticsearch

2019-10-02 Thread Arina Yelchiyeva
Please make sure you are using the RowSet Framework for the implementation:
https://github.com/apache/drill/blob/master/docs/dev/RowSetFramework.md

> On Oct 2, 2019, at 5:30 PM, Charles Givre  wrote:
> 
> Hi Badrul, 
> Arina was working on this, but from what I understand she's had to refocus on 
> other things.  Personally I think it would be great if we could implement an 
> ElasticSearch plugin for Drill, so I would say please go ahead and start a 
> proposal design.
> -- C
> 
> 
>> On Oct 1, 2019, at 5:45 PM, Badrul Chowdhury  
>> wrote:
>> 
>> Hi,
>> 
>> Has there been any progress on the following JIRA? I was thinking of
>> getting started on it by proposing a design for its implementation.
>> https://issues.apache.org/jira/browse/DRILL-3637
>> 
>> Thanks,
>> Badrul
> 



Re: [Discuss] Minor Release

2019-09-30 Thread Arina Yelchiyeva
I think a minor release should be considered only if the issue affects many users. 
In this case we have a report from only one user which, according to Volodymyr’s 
reply, has a workaround and is not reproducible for others.
Another thing is that the SqlLine 1.9 release is not ready; we hope it will be 
ready before the next Drill release so we can include the upgrade in Drill 1.17.

Kind regards,
Arina

> On Sep 27, 2019, at 10:25 PM, Ted Dunning  wrote:
> 
> Yes.
> 
> 
> 
> On Fri, Sep 27, 2019 at 11:09 AM Charles Givre  wrote:
> 
>> Hello all
>> There was a recent email to the user group about a blocking issue with
>> sqlline.  The issue was resolved in the latest version of sqlline however
>> it was preventing a user from executing queries.  In a situation like this
>> where a simple upgrade of a library fixes a major issue, would it make
>> sense to release a minor upgrade only including the updated library?
>> 
>> Sent from my iPhone



Re: Draft ASF Board Report: 2.0

2019-11-11 Thread Arina Yelchiyeva
+1

> On Nov 8, 2019, at 3:32 PM, Charles Givre  wrote:
> 
> All, 
> Below is the draft with some updates.  If anyone has anything else, please 
> get them to me over the weekend. 
> Thanks!
> -- C
> 
> ## Description:
> The mission of Drill is the creation and maintenance of software related to 
> Schema-free SQL Query Engine for Apache Hadoop, NoSQL and Cloud Storage
> 
> ## Issues:
> There are no issues requiring board attention at this time.
> 
> ## Membership Data:
> Apache Drill was founded 2014-11-18 (5 years ago)
> There are currently 55 committers and 24 PMC members in this project.
> The Committer-to-PMC ratio is roughly 7:3.
> 
> Community changes, past quarter:
> - No new PMC members. Last addition was Sorabh Hamirwasia on 2019-04-04.
> - No new committers. Last addition was Anton Gozhiy on 2019-07-22.
> 
> ## Project Activity:
> - Drill 1.16 was released on 2019-05-02.  
> - Drill 1.17 was delayed until end of November.  
> 
> ### Next Release
> The next release of Drill (1.17) resolved many issues and added a lot of new
> functionality including:
> - Enhanced Drill metastore 
> - Hive complex types support (arrays, structs, union)
> - Canonical Map support
> - Schema provisioning via table function
> - Empty parquet files read / write support
> - Run-time row group pruning
> - Numerous enhancements and upgrades to Drill with Hive
> - Format plugin for Excel Files
> - Format plugin for ESRI Shape Files
> - Add Variable Argument UDFs
> - Add UDF to parse user agent strings
> 
> ### Future Functionality in Development
> There are a number of enhancements for which there are active PRs or 
> discussions
> on the various boards.
> - Integration between Apache Drill and Apache Daffodil (Incubating)
> - Storage plugin for Apache Druid
> - Upgrading Drill to use Hadoop v. 3.0
> - Format plugin for HDF5
> 
> 
> ## Community Health:
> Drill seems to be recovering from the acquisition of Drill's major backer, 
> MapR.
> 
> ### Development Activity
> - 96 issues opened in JIRA (1% increase from last quarter)
> - 85 issues closed in JIRA (28% increase from last quarter)
> - 55 commits in past quarter (14% increase from last quarter)
> - 15 contributors from last quarter (25% increase) 
> - 53 PRs opened on GitHub (no change from last quarter)
> - 63 PRs closed on GitHub (no change from last quarter)
> 
> ### Email Lists
> - dev@drill.apache.org 
>  - 46% increase in traffic in past quarter (1574 compared to 1073)
> 
> - iss...@drill.apache.org 
>  - 47% increase in traffic in past quarter (2027 compared to 1377)
> 
> - us...@drill.apache.org 
> - 27% decrease in traffic in past quarter (116 compared to 157)



Re: [VOTE] Release Apache Drill 1.17.0 - RC0

2019-12-10 Thread Arina Yelchiyeva
And one more regression: https://issues.apache.org/jira/browse/DRILL-7476 


Kind regards,
Arina

> On Dec 10, 2019, at 1:59 PM, Igor Guzenko  wrote:
> 
> Hi all,
> 
> I've found regression with repeated maps in parquet [1]. Also I personally
> don't like that difference between previous and current tar.gz size is 124
> Mb.
> 
> My vote: -1.
> 
> [1] https://issues.apache.org/jira/browse/DRILL-7473
> 
> Thanks,
> Igor
> 
> 
> On Mon, Dec 9, 2019 at 2:47 PM Volodymyr Vysotskyi 
> wrote:
> 
>> Hi all,
>> 
>> I'd like to propose the first release candidate (RC0) of Apache Drill,
>> version 1.17.0.
>> 
>> The release candidate covers a total of 190 resolved JIRAs [1]. Thanks to
>> everyone who contributed to this release.
>> 
>> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
>> at [3].
>> 
>> This release candidate is based on commit
>> 4171eeac876249731ccf86d116455dd8d53c44e9 located at [4].
>> 
>> The vote ends at 13:00 PM UTC (5:00 AM PST, 3:00 PM EET, 6:30 PM IST),
>> December 12, 2019.
>> 
>> [ ] +1
>> [ ] +0
>> [ ] -1
>> 
>> Here's my vote: +1
>> 
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12344870
>> [2] http://home.apache.org/~volodymyr/drill/releases/1.17.0/rc0/
>> [3]
>> https://repository.apache.org/content/repositories/orgapachedrill-1075/
>> [4] https://github.com/vvysotskyi/drill/commits/drill-1.17.0
>> 
>> Kind regards,
>> Volodymyr Vysotskyi
>> 



Re: Admin Access to JIRA Repo

2019-12-10 Thread Arina Yelchiyeva
All Drill PMC members are Drill Jira administrators (if you use Apache Login).

> On Dec 10, 2019, at 2:54 PM, Charles Givre  wrote:
> 
> Hello Drill Devs,
> I'm wondering who is the admin for the Drill JIRA repo?  I saw on some other 
> projects that when you submit a PR, there is a template that you must use 
> when you submit the PR.  I think the same for submitting bugs.  Would it be 
> possible to set that up on ours?
> 
> Thanks,
> -- C



Re: Admin Access to JIRA Repo

2019-12-10 Thread Arina Yelchiyeva
Maybe some customizations can be requested through INFRA.

> On Dec 10, 2019, at 3:06 PM, Charles Givre  wrote:
> 
> I meant to say github.  
> 
> Sent from my iPhone
> 
>> On Dec 10, 2019, at 08:04, Arina Yelchiyeva  
>> wrote:
>> 
>> All Drill PMC members are Drill Jira administrators (if you use Apache 
>> Login).
>> 
>>> On Dec 10, 2019, at 2:54 PM, Charles Givre  wrote:
>>> 
>>> Hello Drill Devs,
>>> I'm wondering who is the admin for the Drill JIRA repo?  I saw on some 
>>> other projects that when you submit a PR, there is a template that you must 
>>> use when you submit the PR.  I think the same for submitting bugs.  Would 
>>> it be possible to set that up on ours?
>>> 
>>> Thanks,
>>> -- C
>> 



Re: [ANNOUNCE] New PMC member: Ihor Guzenko

2019-12-13 Thread Arina Yelchiyeva
Congratulations, Ihor! 

> On Dec 13, 2019, at 3:38 PM, Volodymyr Vysotskyi  wrote:
> 
> I am pleased to announce that Drill PMC invited Ihor Guzenko to
> the PMC and he has accepted the invitation.
> 
> Congratulations Ihor and welcome!
> 
> - Vova
> (on behalf of Drill PMC)



Re: Drill fails for postgress if you use a Foreign Tables

2019-10-17 Thread Arina Yelchiyeva
Hi Erik,

Could you please provide full stacktrace from the logs?

Kind regards,
Arina

> On Oct 16, 2019, at 11:00 PM, Erik Anderson  wrote:
> 
> Short version:
> 
> SELECT * FROM `INFORMATION_SCHEMA`.`TABLES`;
> 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: Multiple entries with same key: vessel=JdbcTable 
> {vessel} and vessel=JdbcTable {vessel}
> 
> Long Version:
> 
> 1) Setup a JDBC driver in Drill to Postgres
> 2) Create a public foreign table like below in postgres
> 
> public   | vessel   | foreign table | postgres
> public   | vessel_movement  | foreign table | postgres
> public   | vessel_movement_hist | foreign table | postgres
> 
> 3) On windows install the MapR ODBC driver
> https://drill.apache.org/docs/installing-the-driver-on-windows/
> 
> 4) Setup an ODBC connection with the MapR
> 5) Now in the ODBC connections, use the "Drill Explorer"
> 
> The Drill explorer tries to run the query
> SELECT * FROM `INFORMATION_SCHEMA`.`TABLES`;
> 
> This fails with the error 
> 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: Multiple entries with same key: vessel=JdbcTable 
> {vessel} and vessel=JdbcTable {vessel}
> 
> This is the same problem that was also reported here.
> https://stackoverflow.com/questions/47149236/unable-to-query-postgresql-with-apache-drill-1-11-validation-error
> 
> This looks like BUG in Drill, not a "use foo.schema" workaround as listed 
> above.
> 
> We have tried various ?currentSchema=foo in the postgres driver. Nothing 
> seems to get rid of the problem. Its Drill+ForeignTable specific.
> 
> Has anyone else ran into this?
> 
> Erik Anderson
> Bloomberg
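
The reported IllegalArgumentException is the generic duplicate-key rejection that
occurs when an immutable table map is built with the same table name appearing
twice (e.g. a foreign table surfaced through two schema paths). The sketch below
reproduces that failure class with `java.util.Map.of`; it is illustrative only —
Drill's actual call site is most likely Guava's ImmutableMap builder, which
produces the "Multiple entries with same key" wording seen in the stack trace.

```java
// Illustrative reproduction of the duplicate-key failure class; the table
// name "vessel" is taken from the report above, everything else is invented.
public class DuplicateKeyDemo {
  public static String reproduce() {
    try {
      // Registering the same table name twice, as two schema paths would,
      // triggers the same kind of duplicate-key rejection.
      java.util.Map.of("vessel", "JdbcTable{vessel}",
                       "vessel", "JdbcTable{vessel}");
      return "no error";
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(reproduce()); // prints a "duplicate key" message
  }
}
```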



Re: Next Release?

2019-09-23 Thread Arina Yelchiyeva
The Metastore work was aimed to be included in this release. Since the delivery 
date shifted due to a larger scope of work than expected, we did not push for the 
release until it is done, but I think mid-October is an achievable due date. 
Volodymyr, any thoughts? 

Kind regards,
Arina

> On 23 Sep 2019, at 15:09, Charles Givre  wrote:
> 
> Hello All, 
> I wanted to ask if we can start thinking about our next release?  I seem to 
> recall that there was discussion around a new release around mid-September 
> which clearly didn't happen. So... What if we were to shoot for mid-October?
> -- C


Re: [VOTE] Release Apache Drill 1.17.0 - RC1

2019-12-20 Thread Arina Yelchiyeva
Hi Holger,

Thanks for participating in the release verification.

Only regressions or really significant issues can be release blockers.
As Anton mentioned, since these RC jars were present in previous Drill versions, 
this is not a regression. 
I believe Drill has a dependency on these libraries (maybe transitive) and I am 
not sure we can remove them; maybe the version can be upgraded.
It would be good if you could check and file a Jira if the issue is still relevant.

Kind regards,
Arina

> On Dec 20, 2019, at 6:18 PM, Anton Gozhiy  wrote:
> 
> Hi Holger,
> These RC files were present in Drill 1.16.0, so this is not a regression.
> And about JDBC connection problem, could you please file a JIRA with more
> details?
> 
> On Fri, Dec 20, 2019 at 6:02 PM  wrote:
> 
>> - Tested custom authenticator with JDBC connect to Drill, and was unable
>> to connect (connection hangs, ACL disabled, couldn't spend a deeper look
>> into it).
>> + Custom authenticator login with local sqlline has been successful
>> - Found RC files for 3rd party jars in the binary archive, which shouldn't
>> be there in a release version (from my perspective):
>> 
>> apache-drill-1.17.0/jars/3rdparty/kerb-client-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerby-config-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-common-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-crypto-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-util-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-core-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerby-asn1-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerby-pkix-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerby-util-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-simplekdc-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-server-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-identity-1.0.0-RC2.jar
>> apache-drill-1.17.0/jars/3rdparty/kerb-admin-1.0.0-RC2.jar
>> 
>> From me:
>> -1 (non-binding)
>> 
>> BR
>> Holger
>> 
> 
> 
> -- 
> Sincerely, Anton Gozhiy
> anton5...@gmail.com



Re: [VOTE] Release Apache Drill 1.17.0 - RC1

2019-12-19 Thread Arina Yelchiyeva
Verified checksums and signatures.
Checked persistent and temporary tables creation.
Checked Web UI (profiles, logs, options, query submission)
Checked dynamic UDFs.
Run ad-hoc queries.

+1 (binding)

Kind regards,
Arina

> On Dec 18, 2019, at 7:03 PM, Igor Guzenko  wrote:
> 
> Hi all,
> 
> I've verified checksums, queried csv, json, parquet formats. I found a few
> issues with complex types, but they're not regressions.
> 
> My vote: +1 (binding)
> 
> Kind regards,
> Igor
> 
> On Wed, Dec 18, 2019 at 6:23 PM Denys Ordynskiy 
> wrote:
> 
>> Hi all,
>> 
>> I successfully built and run Drill from sources on Ubuntu, Windows and
>> CentOS.
>> Successfully connected to the Drill using DrillExplorer with ODBC driver.
>> 
>> My vote: +1
>> 
>> Best regards,
>> Denys Ordynskiy
>> 
>> On Tue, Dec 17, 2019 at 7:14 PM Volodymyr Vysotskyi 
>> wrote:
>> 
>>> Hi all,
>>> 
>>> I'd like to propose the second release candidate (RC1) of Apache Drill,
>>> version 1.17.0.
>>> 
>>> Changes since the previous release candidate: fixed the following
>>> show-stoppers: DRILL-7484 <
>>> https://issues.apache.org/jira/browse/DRILL-7484>
>>> , DRILL-7485 ,
>>> DRILL-6332
>>> , DRILL-7472
>>> , DRILL-7474
>>> , DRILL-7476
>>> , DRILL-7481
>>> , DRILL-7482
>>> , DRILL-7473
>>> , DRILL-7479
>>> , DRILL-7483
>>> , DRILL-7486
>>> , and DRILL-7470
>>> .
>>> 
>>> The release candidate covers a total of 203 resolved JIRAs [1]. Thanks to
>>> everyone who contributed to this release.
>>> 
>>> The tarball artifacts are hosted at [2] and the maven artifacts are
>> hosted
>>> at [3].
>>> 
>>> This release candidate is based on commit
>>> d65d2b4cc2aa5ea7a59cd40d0ad57a1e4639ae12 located at [4].
>>> 
>>> Please download and try out the release.
>>> 
>>> The vote ends at 5 PM UTC (9 AM PDT, 7 PM EET, 10:30 PM IST), December
>> 20,
>>> 2019
>>> 
>>> [ ] +1
>>> [ ] +0
>>> [ ] -1
>>> 
>>> Here's my vote: +1
>>> 
>>> [1]
>>> 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12344870
>>> [2] http://home.apache.org/~volodymyr/drill/releases/1.17.0/rc1/
>>> [3]
>>> https://repository.apache.org/content/repositories/orgapachedrill-1076/
>>> [4] https://github.com/vvysotskyi/drill/commits/drill-1.17.0
>>> 
>>> Kind regards,
>>> Volodymyr Vysotskyi
>>> 
>> 



Re: [DISCUSS] ExecConstants class refactoring

2020-03-03 Thread Arina Yelchiyeva
+1 for ExecConstants refactoring.
Igor, could you please create a Jira to capture this discussion and the decision 
on the fix? Thanks.

Kind regards,
Arina

> On Mar 2, 2020, at 9:30 PM, Paul Rogers  wrote:
> 
> Hi Igor,
> 
> You know the old joke about how you eat an elephant? One bite at a time.
> 
> Yes, trying to refactor java-exec in one go would be difficult, as hard as 
> replacing value vectors in one go. Instead, we might do it in baby steps.
> 
> For example, as we get the new "SPI" mechanism to work for storage plugins, 
> we have the option to pull the existing plugins into separate modules. 
> Interestingly, these modules must be below java-exec (which becomes the 
> integration point for the runtime modules. This means that the module to 
> define the SPI must be even lower. This then requires that it be defined in 
> terms of interfaces, some of which are implemented in java-exec.
> 
> The same process can be applied to operators. If we want an operator module, 
> they must implement interfaces and accept interfaces, with the 
> implementations residing elsewhere.
> 
> We've been gradually creating such interfaces, there are more to go. Each 
> time we create one, we break apart a bit more of the tight coupling you 
> mentioned.
> 
> 
> Why would we gradually do such work? Primarily so Drill can be extensible. We 
> want storage plugins to be easily built outside of Drill and be somewhat 
> cross-version compatible. The same argument could be made for specialized 
> operators (specified in SQL as, say, some kind of table function.)
> 
> Further, I've seen over and over that tight coupling makes it ever harder to 
> safely change anything, so changes get more expensive and the project 
> stagnates. Drill might be different, but I'd not bet on it. So, we need 
> modularity to allow the project to continue to flourish.
> 
> 
> I agree that is nothing we'd do any time soon. But, often if we know where 
> we're going, we can slowly get our ducks in a line so we can eventually do it.
> 
> Thanks,
> - Paul
> 
> 
> 
>On Monday, March 2, 2020, 11:11:21 AM PST, Igor Guzenko 
>  wrote:  
> 
> Hello Paul,
> 
> Thanks for quick response, I've been thinking a lot of times about the
> necessity to split java-exec into modules. Although it would be really cool
> to do it, at the moment it is almost impossible to do it correctly. Too
> many things are tightly coupled together and without good preparation,
> splitting will be a huge pain. By preparation I mean the long iterative
> process to collect related components into packages (by feature) and
> clearly define interaction APIs between the packages. Another long-standing
> issue which most probably will never be fixed... Huh, but at least we're
> moving towards the goal :)
> 
> Kind regards,
> Igor
> 
> On Mon, Mar 2, 2020 at 8:01 PM Paul Rogers 
> wrote:
> 
>> Hi Igor,
>> 
>> Great idea; I've been noticing that file has gotten excessively large.
>> 
>> I wonder if we can split the file by topic instead of by the (often odd)
>> naming hierarchy which has evolved over the years. For example, one file
>> for internal server config options (thread counts, RPC stuff.) Another for
>> things related to local config (local file systems, etc.) Another related
>> to core operators (sorts, hash joins, etc.) Picking the right split will
>> require a bit of thought and sone experimentation.
>> 
>> 
>> There are three kinds of constants:
>> 
>> * Config variables (from drill-override.conf)
>> * System/session options
>> * Other random constants
>> 
>> One could argue that we should keep the three kinds together for each
>> topic. (That is, all the sort-related stuff in one place.) Whether that
>> means three well-known files in one place, three nested interfaces within a
>> single class could be debated.
>> 
>> One thing we probably should do is to separate out the string name of a
>> system/session property from the implementation of its validator. It used
>> to be that people would use the validator to access the option value. Most
>> newer code in the last several years uses the typed access methods with a
>> string key. So, we can move the validators into a OptionDefinitions
>> class/interface separate from the key definitions.
>> 
>> Most names are for the benefit of us: the poor humans who have to
>> understand them. The compiler would be happy with inline constant values.
>> Most names tend to be short to be easier to remember. For example, it is
>> easier to understand CLIENT_RPC_THREADS than
>> "drill.exec.rpc.user.client.threads".
>> 
>> 
>> Most costants are in ExecConstants. Some (but not all) constants for the
>> planner live in PlannerSettings. Oddly, some planner settings are in
>> ExecConstants. We might want to consolidate planner-related constants into
>> a single location.
>> 
>> One final thing to keep in mind is that the "java-exec" project has become
>> overly large. At some point, it might make sense to split it into
>> components, such as 
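
Paul's topic-based grouping could be sketched as nested interfaces that hold
plain string keys, with call sites continuing to use typed accessors keyed by
string. All class, interface, and constant names below are hypothetical except
the "drill.exec.rpc.user.client.threads" key quoted above; this is a sketch of
the idea, not Drill's actual layout.

```java
// Hypothetical split of a monolithic constants class by topic.
// Only the RPC key string is taken from the discussion; the rest is invented.
public class OptionKeys {
  public interface Rpc {
    String CLIENT_THREADS = "drill.exec.rpc.user.client.threads";
  }

  public interface Sort {
    String MAX_MEMORY = "drill.exec.sort.max_memory"; // illustrative key
  }

  public static void main(String[] args) {
    // Call sites would keep using typed accessors with a string key,
    // e.g. options.getInt(OptionKeys.Rpc.CLIENT_THREADS), while the
    // validators live in a separate definitions class.
    System.out.println(OptionKeys.Rpc.CLIENT_THREADS);
  }
}
```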

Re: Slack Channel

2020-01-22 Thread Arina Yelchiyeva
Charles, I don’t think the Slack channel is that popular among Drill devs.
I guess the best recommendation is to ask them to send an email to the user mailing list.
Maybe some automatic reply can be configured. 

Kind regards,
Arina

> On Jan 22, 2020, at 9:33 PM, Charles Givre  wrote:
> 
> Hey Drill Devs
> There are two pending questions on the Drill slack channel, one relating to 
> Hive and the other relating to complex data in Drill.  Could you guys take a 
> look?
> Thx,
> -- C



Re: [DISCUSS] Using GitHub Actions for CI

2020-02-03 Thread Arina Yelchiyeva
I think this is a great idea. Volodymyr, thanks for looking into it.
Having the ability to run the full test suite will definitely be beneficial.

+1

Kind regards,
Arina

> On Feb 3, 2020, at 1:51 PM, Volodymyr Vysotskyi  wrote:
> 
> Hi all,
> 
> I want to discuss using GitHub Actions for running Drill unit tests.
> 
> Currently, we use Travis to build the project and run *partial* tests suite
> for every pull request and new commits pushed to the master branch.
> Also, we have a configuration for CircleCI which allows running more unit
> tests for user repositories, including jobs for JDK 8, 11-13.
> CircleCI is not set up for Apache Drill since INFRA can't allow write
> access for 3d party [1].
> 
> GitHub Actions provides more resources for running jobs and has softer
> limitations compared with CI mentioned above (for example, it allows 20
> concurrent jobs and with a time limit of 6 hours [2] compared to 50 minutes
> for Travis).
> So GitHub Actions may be used as a single CI and will be able to run *full*
> tests suite.
> 
> Here is the Jira tor this with the latest status:
> https://issues.apache.org/jira/browse/DRILL-7543
> 
> Are there any thoughts or objections regarding moving to GitHub Actions?
> 
> [1] https://issues.apache.org/jira/browse/INFRA-17133
> [2]
> https://help.github.com/en/actions/automating-your-workflow-with-github-actions/about-github-actions#usage-limits
> 
> Kind regards,
> Volodymyr Vysotskyi
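
For reference, a minimal GitHub Actions workflow along the lines Volodymyr
describes might look like the following. The job matrix, action versions, and
Maven invocation are illustrative assumptions, not Drill's actual CI
configuration (see DRILL-7543 for the real work).

```yaml
# Hypothetical minimal CI workflow; all values here are illustrative.
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        java: [8, 11]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v1
        with:
          java-version: ${{ matrix.java }}
      # Full test suite fits within the 6-hour job limit mentioned above.
      - run: mvn install --batch-mode
```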



Re: Thanks for the commits!

2020-01-23 Thread Arina Yelchiyeva
You are welcome :)

> On Jan 23, 2020, at 7:04 PM, Charles Givre  wrote:
> 
> Hey Arina, 
> Thanks for the bulk commit.  A lot of good stuff was committed, especially 
> the improvements to the JDBC storage plugin.
> -- C
> 



Re: ES Storage Plugin

2020-01-24 Thread Arina Yelchiyeva
Personally I would wait for the new storage plugins framework as was discussed 
during the call.

Kind regards,
Arina

> On Jan 24, 2020, at 6:39 PM, Charles Givre  wrote:
> 
> All, 
> Thanks for the great hangout today.  I just posted a draft PR of an 
> Elasticsearch storage plugin I've been working on.  I'm waiting for the Base 
> framework to be committed, so I can't do much more until that happens, but I 
> wanted to share my work and solicit feedback (not code review).  If anyone 
> would like to collaborate with me on this (@arina?) I would definitely 
> welcome and appreciate that. 
> 
> Have a great weekend,
> --C 
> 



Re: Official Apache Drill Docker Images

2020-01-09 Thread Arina Yelchiyeva
Nice ;)

> On Jan 9, 2020, at 7:33 PM, Volodymyr Vysotskyi  wrote:
> 
> Hi all,
> 
> Some time ago we have introduced Docker Images for Drill and published them
> under custom repository.
> But now we have Official Docker Repository for Apache Drill placed in
> https://hub.docker.com/r/apache/drill.
> 
> All images from our previous repository were pushed there and also
> DockerHub Automated Build was set up for the master branch which publishes
> images with master tag after the master branch is updated.
> 
> Feel free to test it, and now even on the actual master branch!
> 
> For the instructions on how to run Drill on Docker, please refer to
> https://drill.apache.org/docs/running-drill-on-docker/.
> 
> Kind regards,
> Volodymyr Vysotskyi



Re: Kanban Board for Drill 1.18

2020-01-21 Thread Arina Yelchiyeva
No special procedure, please go ahead. Just share the links afterwards.

Kind regards,
Arina

> On Jan 21, 2020, at 2:46 PM, Charles Givre  wrote:
> 
> Hi Drill Devers
> Could we create a Kanban board for Drill 1.18 in JIRA?  I'd do it myself but 
> I didn't know if there was some special procedure for it.
> Thanks,
> -- C



Re: Testing Storage Plugins

2020-01-03 Thread Arina Yelchiyeva
Hi Charles,

I would expect the contributor to provide an efficient way to test storage plugins. 
Depending on the source, some have embedded versions, some don’t, so the 
contributor should provide instructions on how the plugin can be tested.
Ideally, tests should be able to run without a dependency on an external system. 
Kudu is an old plugin and does not comply with Drill coding standards, so the fact 
that it does not have unit tests does not mean that others won’t.

I would suggest we not accept storage plugins that do not have good unit test 
coverage, documentation, and code quality. Since plugins are easily pluggable, 
if a plugin does not meet Drill coding standards, it can be built manually and 
added as a jar to the Drill classpath.
From my perspective, Drill users (or any users) expect that if code is available 
in the official build, it is fully tested and works. Otherwise, users would 
complain that something does not work and say the Drill product is of bad 
quality.

Regarding code standards, this should apply to all PRs. We don’t have many code 
reviewers for the project, and submitting PRs with poor code quality means that 
these people (who voluntarily spend time reviewing and understanding someone 
else’s code) have to spend a lot of time reviewing and correcting simple things. 
Many Apache projects have code quality standards much higher than Drill’s, so I 
think it would be fair for the contributor to spend more time making the code 
better rather than expect the reviewer to point out every trivial issue.

Kind regards,
Arina

> On Jan 2, 2020, at 8:50 PM, Charles Givre  wrote:
> 
> Hello Drill Devs, 
> I wanted to ask a question about testing storage plugins.  Currently there 
> are PRs for storage plugins for Apache Druid 
> (https://github.com/apache/drill/pull/1888 
> ), and a generic REST API plugin 
> (https://github.com/apache/drill/pull/1892 
> ).  
> 
> I wanted to ask about what is necessary with respect to testing to get a 
> storage plugin accepted?  I looked at some of the others, and some, like the 
> HBase plugin, have many unit tests and others, like the Kudu plugin barely 
> have any.  Also, since obviously, these unit tests require an external system 
> (or at least a docker container of one) what should a contributor provide 
> with a storage plugin?  Should they provide a docker container with a data 
> set pre-loaded, or should the unit tests be marked "Ignore"?
> 
> When we do releases, are we running the tests against external systems?
> Thanks,
> -- C
> 
> 
> 
> 
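
One way to satisfy the "no dependency on an external system" expectation for a
REST-style plugin is a throwaway in-process HTTP server that unit tests query
instead of a live service. The sketch below uses the JDK's built-in
`com.sun.net.httpserver`; the `/rows` path and JSON payload are invented for the
example and are not from any actual Drill test.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Self-contained mock endpoint a storage-plugin unit test could hit.
public class MockEndpointDemo {
  public static String fetchFromMock() throws Exception {
    String payload = "[{\"id\":1},{\"id\":2}]"; // canned test data
    // Port 0 asks the OS for any free port, avoiding collisions in CI.
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
    server.createContext("/rows", exchange -> {
      byte[] body = payload.getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      exchange.getResponseBody().write(body);
      exchange.close();
    });
    server.start();
    try {
      URL url = new URL("http://localhost:"
          + server.getAddress().getPort() + "/rows");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      try (InputStream in = conn.getInputStream()) {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
    } finally {
      server.stop(0); // tear down so tests leave no running server behind
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(fetchFromMock());
  }
}
```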



Re: HDF5 Format Plugin

2020-01-03 Thread Arina Yelchiyeva
I thought we agreed in the PR that the API storage plugin won’t be submitted until 
#1914 and #1913 are committed, since that will require a rewrite of the API 
plugin code. I think there is no need to spend time reviewing it until the final 
changes are done.

Kind regards,
Arina

> On Jan 2, 2020, at 7:08 PM, Charles Givre  wrote:
> 
> Hello all, 
> Now that we've released Drill 1.17, I wanted to ask if you could please take 
> a look at the HDF5 format plugin (https://github.com/apache/drill/pull/1778 
> )  I submitted as well as the API 
> storage plugin (https://github.com/apache/drill/pull/1892 
> ). 
> Thanks,
> -- C
> 
> 
> 



Re: [VOTE] Release Apache Drill 1.17.0 - RC2

2019-12-24 Thread Arina Yelchiyeva
Verified checksums and signatures.
Unpacked tar.gz and started in embedded mode on Windows (used Java 8, 11).
Run simple queries.
Did not perform any sophisticated checks, since it is already the third 
candidate.

+1 (binding)

Kind regards,
Arina

> On Dec 23, 2019, at 5:01 PM, Charles Givre  wrote:
> 
> Hello all, 
> I found a minor bug in the Excel format plugin 
> (https://issues.apache.org/jira/browse/DRILL-7495 
> ) where if the first column 
> in the spreadsheet is a date, the reader will throw an exception.  This can 
> be worked around by setting allTextMode to true.  However, in so doing I also 
> found that the README.md file didn't list that option correctly, so I will 
> update that. I queried other excel files without incident and this only 
> seemed to occur when the date was the first column. 
> 
> I created a JIRA for this, however, I don't think it is a blocking issue.  If 
> we encounter other blockers for RC2, I'll fix it for RC3, if not, I'd say we 
> go with it and I'll fix for version 1.18.  Regardless, I will try to fix this 
> in the next few days. 
> 
> Successfully queried shape files, pcap, with and w/o TCP sessionization. 
> 
> So +1 for me. 
> -- C
> 
> 
> 
>> On Dec 22, 2019, at 12:00 PM, Volodymyr Vysotskyi  
>> wrote:
>> 
>> Hi all,
>> 
>> I'd like to propose the third release candidate (RC2) of Apache Drill,
>> version 1.17.0.
>> 
>> Changes since the previous release candidate: fixed show-stopper DRILL-7494
>> .
>> 
>> The release candidate covers a total of 205 resolved JIRAs [1]. Thanks to
>> everyone who contributed to this release.
>> 
>> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
>> at [3].
>> 
>> This release candidate is based on
>> commit 2eb6bbe0501cb6553106e63dc1f2810ff10ae375 located at [4].
>> 
>> Please download and try out the release.
>> 
>> The vote ends at 5 PM UTC (9 AM PDT, 7 PM EET, 10:30 PM IST), December 25,
>> 2019
>> 
>> [ ] +1
>> [ ] +0
>> [ ] -1
>> 
>> Here's my vote: +1
>> 
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12344870
>> [2] http://home.apache.org/~volodymyr/drill/releases/1.17.0/rc2/
>> [3] https://repository.apache.org/content/repositories/orgapachedrill-1077/
>> [4] https://github.com/vvysotskyi/drill/commits/drill-1.17.0
>> 
>> Kind regards,
>> Volodymyr Vysotskyi
> 



Re: [ANNOUNCE] Apache Drill 1.17.0 Released

2019-12-27 Thread Arina Yelchiyeva
Congrats everyone, great job!

Kind regards,
Arina

> On 26 Dec 2019, at 20:32, Igor Guzenko  wrote:
> 
> Many thanks to everyone who has contributed to the release! Great work!
> 
>> On Thu, Dec 26, 2019 at 8:27 PM Abhishek Girish  wrote:
>> 
>> Congratulations, everyone!
>> 
>> On Thu, Dec 26, 2019 at 10:20 AM Volodymyr Vysotskyi >> 
>> wrote:
>> 
>>> On behalf of the Apache Drill community, I am happy to announce the
>> release
>>> of Apache Drill 1.17.0.
>>> 
>>> Drill is an Apache open-source SQL query engine for Big Data exploration.
>>> Drill is designed from the ground up to support high-performance analysis
>>> on the semi-structured and rapidly evolving data coming from modern Big
>>> Data applications, while still providing the familiarity and ecosystem of
>>> ANSI SQL, the industry-standard query language. Drill provides
>>> plug-and-play integration with existing Apache Hive and Apache HBase
>>> deployments.
>>> 
>>> For information about Apache Drill, and to get involved, visit the
>> project
>>> website [1].
>>> 
>>> Total of 200 JIRA's are resolved in this release of Drill with following
>>> new features and improvements [2]:
>>> 
>>> - Hive complex types support (DRILL-7251,
>>> DRILL-7252, DRILL-7253, DRILL-7254)
>>> - ESRI Shapefile (shp) (DRILL-4303) and Excel (DRILL-7177) format
>>> plugins support
>>> - Drill Metastore support (DRILL-7272, DRILL-7273, DRILL-7357)
>>> - Upgrade to HADOOP-3.2 (DRILL-6540)
>>> - Schema Provision using File / Table Function (DRILL-6835)
>>> - Parquet runtime row group pruning (DRILL-7062)
>>> - User-Agent UDFs (DRILL-7343)
>>> - Canonical Map support (DRILL-7096)
>>> - Kafka storage plugin improvements
>>> (DRILL-6739, DRILL-6723, DRILL-7164, DRILL-7290, DRILL-7388)
>>> 
>>> For the full list please see release notes [3].
>>> 
>>> The binary and source artifacts are available here [4].
>>> 
>>> Thanks to everyone in the community who contributed to this release!
>>> 
>>> 1. https://drill.apache.org/
>>> 2. https://drill.apache.org/blog/2019/12/26/drill-1.17-released/
>>> 3. https://drill.apache.org/docs/apache-drill-1-17-0-release-notes/
>>> 4. https://drill.apache.org/download/
>>> 
>>> Kind regards,
>>> Volodymyr Vysotskyi
>>> 
>> 


Re: Testing Storage Plugins

2020-01-08 Thread Arina Yelchiyeva
If I am not mistaken, the protobuf is not a big deal; the Web UI won’t display 
the operator name, but other than that the code should work.

Kind regards,
Arina

> On Jan 8, 2020, at 2:13 AM, Charles Givre  wrote:
> 
> Hi Paul, 
> In principle, I like the idea.  Currently, the main sticking point with 
> format and storage plugins is the protobuf.  But for that, they would be 
> completely “pluggable". 
> 
> 
>> On Jan 7, 2020, at 5:54 PM, Paul Rogers  wrote:
>> 
>> Hi All,
>> 
>> Wanted to chime in on this topic. We've long talked about the idea of 
>> building plugins separately from Drill itself; but have never had the 
>> resources to achieve this goal. Turns out Presto has a nice, simple way to 
>> build plugins separately from Presto itself. [1]
>> 
>> If Drill were to adopt something similar, then we could divide plugins into 
>> two tiers: core plugins supported by the Drill project (with the kind of 
>> testing Arena suggests), and true contributed plugins that are maintained by 
>> others, using whatever form of testing works for them.
>> 
>> Over time, if we find that a plugin becomes "stale" (lack of support or 
>> usage), we can perhaps "demote" it to contributed status as a way of 
>> managing the fact that we can't/won't do full testing on stale plugins.
>> 
>> Presto uses the Java ServiceLoader [2] class to load plugins in a class 
>> loader separate from that for Presto itself. A side benefit of such an 
>> approach is that, with the right class loader, we get isolation of plugin 
>> and Drill dependencies: different plugins can use different Guava versions 
>> without the conflicts that we ran into in the run-up to the recent release.
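[Editorial note: the ServiceLoader approach Paul describes can be sketched in a few lines. This is a hypothetical illustration only — the `DrillPluginSpi` interface and the `plugins/example-plugin.jar` path are invented for the example and are not Drill's or Presto's actual API. The key point it shows: each plugin directory gets its own `URLClassLoader`, so plugin dependencies such as a different Guava version stay isolated from Drill's classpath.]

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ServiceLoader;

class PluginLoaderSketch {
  // Hypothetical plugin SPI: each plugin jar would ship an implementation of
  // this interface plus a META-INF/services/<interface-name> entry naming it.
  interface DrillPluginSpi {
    String name();
  }

  public static void main(String[] args) throws Exception {
    // One class loader per plugin, parented to the application loader, so the
    // plugin's transitive dependencies cannot conflict with Drill's own.
    URL[] pluginJars = { Path.of("plugins/example-plugin.jar").toUri().toURL() };
    ClassLoader pluginLoader =
        new URLClassLoader(pluginJars, PluginLoaderSketch.class.getClassLoader());

    // ServiceLoader discovers implementations declared in each jar's
    // META-INF/services file; with no plugin jars present it finds none.
    ServiceLoader<DrillPluginSpi> plugins =
        ServiceLoader.load(DrillPluginSpi.class, pluginLoader);
    int count = 0;
    for (DrillPluginSpi plugin : plugins) {
      System.out.println("Loaded plugin: " + plugin.name());
      count++;
    }
    System.out.println("Discovered " + count + " plugin(s)");
  }
}
```

Run without any plugin jar on disk, the loader simply discovers zero providers; dropping a jar with a service declaration into `plugins/` would make it appear without recompiling the host.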
>> 
>> At some point we can make a simple trade-off: it might cost less to follow 
>> Presto's lead than to continue to deal with the conflicts and ambiguities in 
>> our current "everything in the Drill core" approach.
>> 
>> Thanks,
>> - Paul
>> 
>> [1] https://prestodb.io/docs/current/develop/spi-overview.html
>> 
>> [2] https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html
>> 
>> 
>>   On Friday, January 3, 2020, 11:04:53 AM PST, Charles Givre 
>>  wrote:  
>> 
>> Arina, 
>> Thanks for the response.  See responses inline:
>> 
>>> On Jan 3, 2020, at 4:40 AM, Arina Yelchiyeva  
>>> wrote:
>>> 
>>> Hi Charles,
>>> 
>>> I would expect the contributor to provide an efficient way to test storage 
>>> plugins. Depending on the source, some have embedded versions, some don’t, 
>>> so the contributor should provide instructions on how the plugin can be tested.
>>> Ideally, tests should be able to run without a dependency on an external system. 
>>> Kudu is an old plugin and does not comply with Drill coding standards, so the 
>>> fact that it does not have unit tests does not mean that others won’t.
>> 
>> I wasn't suggesting that new plugins shouldn't have good testing ;-). Far 
>> from it.  What I am wondering is whether it is necessary to have tests for 
>> every class, or can we go for good coverage by providing a lot of query unit 
>> tests that test a lot of different cases?
>> 
>> Let's say I'm writing a plugin for Druid and Druid doesn't have an embedded 
>> mode. (I have no idea whether Druid does or doesn't, just using it as an 
>> example.)  Would it be acceptable for a developer to provide a Docker 
>> container loaded with Druid and a small dataset to serve as a testbed for 
>> the unit tests?  Just FYI, for the API plugin, I wrote 16 or so unit tests 
>> using a mock webserver, but that one was relatively easy because there are 
>> no shortage of libraries for testing HTTP responses.
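[Editorial note: the mock-webserver technique Charles mentions can be sketched with just the JDK's built-in `com.sun.net.httpserver.HttpServer` — no third-party library required. The class name, the `/api/data` path, and the JSON payload below are illustrative, not taken from Drill's actual test suite.]

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class MockRestEndpointTest {
  public static void main(String[] args) throws Exception {
    // Throwaway HTTP server on an ephemeral port returning a canned JSON
    // payload, standing in for the real REST data source during unit tests.
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
    byte[] body = "{\"rows\":[{\"id\":1,\"name\":\"test\"}]}"
        .getBytes(StandardCharsets.UTF_8);
    server.createContext("/api/data", exchange -> {
      exchange.getResponseHeaders().set("Content-Type", "application/json");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
      }
    });
    server.start();
    try {
      // A plugin under test would be pointed at this URL instead of the
      // production endpoint; here we just fetch and print the canned body.
      URL url = new URL("http://localhost:"
          + server.getAddress().getPort() + "/api/data");
      try (InputStream in = url.openStream()) {
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
      }
    } finally {
      server.stop(0);
    }
  }
}
```

Because the server binds port 0, the OS picks a free port, so tests never collide; the whole fixture starts and stops in milliseconds with no external process.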
>> 
>> 
>>> 
>>> I would suggest we not accept storage plugins that do not have good unit 
>>> test coverage, documentation, and code quality. Since plugins are easily 
>>> pluggable, if a plugin does not meet Drill coding standards, it can be 
>>> built manually and added as a jar to the Drill classpath.
>> 
>> I completely agree. One of my biggest frustrations when writing the Drill 
>> book was the amount of undocumented functionality in Drill.  One thing, 
>> however, is that I don't think plugins are truly "pluggable" yet, because 
>> most if not all require updates to the protobufs. 
>> 
>>> From my perspective, Drill users (or any users) expect that if code is 
>>> available from the official build, it is fully tested and works. Otherwise, use
