Re: [VOTE] Add LGTM to Drill Pull Requests

2021-08-31 Thread Jason Altekruse
+1

On Tue, Aug 31, 2021, 2:44 AM James Turton 
wrote:

> If there are no unpleasant side effects then it sounds like a good idea
> to me +1.
>
> On 2021/08/30 23:23, Charles Givre wrote:
> > Hello Drill Devs,
> > I’d like to call a vote as to whether we add LGTM automated code check
> to our pull requests.  This would not replace the  current review process,
> but rather add a quality check to new code.  I seem to recall us voting on
> this before, but I couldn’t find the email, so I apologize for the possible
> duplicate vote.
> >
> > Thanks!
> > — C
> >
>
>


Re: [ANNOUNCE] New Committer: Paul Rogers

2017-05-22 Thread Jason Altekruse
Congrats Paul!

On Fri, May 19, 2017 at 11:36 AM, Vitalii Diravka  wrote:

> Congratulations Paul! Really well deserved!
>
> Kind regards
> Vitalii
>
> On Fri, May 19, 2017 at 6:31 PM, Parth Chandra  wrote:
>
> > I think it's time to put a link to Paul's wiki in the Apache Drill web
> > site.
> >
> > On Fri, May 19, 2017 at 11:16 AM, Sudheesh Katkam 
> > wrote:
> >
> > > Forgot to mention, not many developers know about this:
> > > https://github.com/paul-rogers/drill/wiki
> > >
> > > So thank you Paul, for that informative wiki, and all your
> contributions.
> > >
> > > > On May 19, 2017, at 10:50 AM, Paul Rogers wrote:
> > >
> > > Thanks everyone!
> > >
> > > - Paul
> > >
> > > > On May 19, 2017, at 10:30 AM, Kunal Khatua wrote:
> > >
> > > Congratulations, Paul !!  Thank you for your contributions!
> > >
> > > 
> > > > From: Khurram Faraaz
> > > Sent: Friday, May 19, 2017 10:07:09 AM
> > > To: dev
> > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers
> > >
> > > Congratulations, Paul!
> > >
> > > 
> > > > From: Bridget Bevens
> > > Sent: Friday, May 19, 2017 10:29:29 PM
> > > To: dev
> > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers
> > >
> > > Congratulations, Paul!
> > >
> > > 
> > > > From: Jinfeng Ni
> > > Sent: Friday, May 19, 2017 9:57:35 AM
> > > To: dev
> > > Subject: Re: [ANNOUNCE] New Committer: Paul Rogers
> > >
> > > Congratulations, Paul!
> > >
> > >
> > > > On Fri, May 19, 2017 at 9:36 AM, Aman Bawa wrote:
> > >
> > > Congratulations, Paul!
> > >
> > > > On 5/19/17, 8:22 AM, "Aman Sinha" wrote:
> > >
> > >   The Project Management Committee (PMC) for Apache Drill has invited
> > > Paul
> > >   Rogers to become a committer, and we are pleased to announce that he
> > > has
> > >   accepted.
> > >
> > >   Paul has a long list of contributions that have touched many aspects
> > > of the
> > >   product.
> > >
> > >   Welcome Paul, and thank you for your contributions.  Keep up the good
> > > work !
> > >
> > >   - Aman
> > >
> > >   (on behalf of the Apache Drill PMC)
> > >
> > >
> > >
> > >
> > >
> > >
> >
>


Re: Having some trouble locating a file referenced in the Advanced Regression tests

2017-03-17 Thread Jason Altekruse
Thanks for the quick response Rahul! That looks to be exactly what I need,
tests are running now.

- Jason

On Thu, Mar 16, 2017 at 6:07 PM, rahul challapalli <
challapallira...@gmail.com> wrote:

> I already did :)
>
> On Thu, Mar 16, 2017 at 5:31 PM, Aman Sinha <asi...@mapr.com> wrote:
>
> > I am guessing Rahul Chalapalli might have created that data file.  Rahul,
> > can you comment ?
> >
> > -Aman
> >
> > On 3/16/17, 11:57 AM, "Jason Altekruse" <altekruseja...@gmail.com>
> wrote:
> >
> > Hey Drillers,
> >
> > I am working to set up a test environment to run the Advanced
> > Regression
> > suites. I have been successful getting most of the tests running, but
> > I am
> > unable to locate the file "widestrings" referenced by the tests in
> the
> > Advanced/data-shapes/wide-columns/5000/10rows/parquet suite. It
> > does
> > not appear to be in the list of files available on S3 specified in
> the
> > framework pom.xml file. This test suite also does not declare any
> > necessary
> > data preparation step in its test description JSON file.
> >
> > I do see that there is a bash file under
> > resources/Datasources/data-shapes/wide-strings.sh, but this is
> > producing a
> > json file, not a parquet file and is not referenced as a data-prep
> > prerequisite for any of the tests.
> >
> > Any help tracking down the file, or a description of the process
> > necessary
> > to re-create the file would be appreciated.
> >
> > Thanks,
> > Jason
> >
> >
> >
>


Re: [ANNOUNCE] - New Apache Drill Committer - Chris Westin

2016-12-01 Thread Jason Altekruse
Congrats Chris!

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Dec 1, 2016 at 2:05 PM, Sudheesh Katkam <skat...@maprtech.com>
wrote:

> Congratulations, Chris!
>
> > On Dec 1, 2016, at 10:30 AM, Aditya <adityakish...@gmail.com> wrote:
> >
> > Congratulations Chris!
> >
> > On Thu, Dec 1, 2016 at 9:56 AM, Parth Chandra <par...@apache.org> wrote:
> >
> >> Congrats Chris. And thank you for all your cool contributions!
> >>
> >>
> >>
> >> On Thu, Dec 1, 2016 at 8:54 AM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> >>
> >>> On behalf of the Apache Drill PMC, I am very pleased to announce that
> >> Chris
> >>> Westin has accepted the invitation to become a committer in the
> project.
> >>>
> >>> Welcome Chris and thanks for your great contributions!
> >>>
> >>>
> >>> --
> >>> Jacques Nadeau
> >>> CTO and Co-Founder, Dremio
> >>>
> >>
>
>


Re: [ANNOUNCE] - New Apache Drill Committer - Neeraja Rentachintala

2016-11-17 Thread Jason Altekruse
Congratulations! Thanks for all your contributions to Drill!

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Nov 17, 2016 at 11:12 AM, Abhishek Girish <agir...@mapr.com> wrote:

> Congrats Neeraja!
>
> On Thu, Nov 17, 2016 at 11:10 AM, Parth Chandra <par...@apache.org> wrote:
>
> > On behalf of the Apache Drill PMC, I am very pleased to announce that
> > Neeraja Rentachintala has accepted the invitation to become a committer
> in
> > the project.
> >
> >
> > Welcome Neeraja !
> >
>


Re: Jason's operator test framework?

2016-10-31 Thread Jason Altekruse
Hey Paul,

I included basic tests for a good portion of the operators with the
framework itself. You can check out this class [1] for the examples. Feel
free to send along any questions.

A known limitation:
- There is no way to currently declare assertions about where the outgoing
batch boundaries are expected to occur, it currently concatenates all of
the outgoing batches together before comparing them to a single result set.
This includes dereferencing selection vectors that are produced by the
filter operator (selection vector 2, which is a bitmask over a single batch
to represent valid records that matched the filter) and the sort operator
(selection vector 4, which is a pointer sort reordering over many batches
that has not yet been rewritten)

[1] -
https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/BasicPhysicalOpUnitTest.java

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Mon, Oct 31, 2016 at 9:50 AM, Paul Rogers <prog...@maprtech.com> wrote:

> Hi Jason & All,
>
> A couple months back Jason presented some very nice work where he was able
> to create a test framework for individual operators.
>
> Jason, is your framework documented anywhere? Or, can you point me to some
> tests that use the framework?
>
> Thanks!
>
> - Paul


Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Jason Altekruse
The only worry I have about declaring a writer version is possible
confusion with the Parquet format version itself. The format is already
defined through version 2.1 or something like that, but we are currently
only writing files based on the 1.x version of the format.

My preferred solution to this problem would be to just make point releases
for problems like this (in this case we could have made a 1.8.1 release;
all of the 1.8.0-SNAPSHOT builds would then be known to be bad, and
everything after would be 1.8.1-SNAPSHOT and could have been known to be
correct).

I'm open to hearing other opinions on this; I just generally feel like
these bugs should be rare, and fixing them should be done with a lot of
care (and in this case I missed a few things). I don't think it would be
crazy to say that we should only merge these kinds of patches if we are
willing to say the fix is ready for a release.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Fri, Oct 28, 2016 at 2:52 PM, Vitalii Diravka <vitalii.dira...@gmail.com>
wrote:

> Jinfeng,
>
> isDateCorrect will be false in the code when isDateCorrect property is
> absent in the parquet metadata.
>
> Anyway I am going to implement the mentioned approach with the
> parquet-writer.version instead of isDateCorrect property.
>


Re: isDateCorrect field in ParquetTableMetadata

2016-10-28 Thread Jason Altekruse
The isDateCorrect flag means that the values are known to be correct, and
there is no need to auto-detect corruption or correct anything.

META_SHOWS_CORRUPTION can be set either when the metadata records a known
old version of Drill, or when older files that might have been written by
Drill have statistics containing corrupt-looking values. Really old files
without any statistics don't have information that allows us to identify
them as Drill-produced, so we have to test the values during actual page
reads; this is where META_UNCLEAR_TEST_VALUES is used.
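
The three cases above can be summarized in a short sketch. This is an
illustration only, not Drill's implementation: the function name classify
and its three inputs are hypothetical, and the final branch (statistics are
present and look sane) is an assumption that this message does not spell out.

```python
from enum import Enum
from typing import Optional


class DateCorruptionStatus(Enum):
    META_SHOWS_CORRUPTION = 1
    META_SHOWS_NO_CORRUPTION = 2
    META_UNCLEAR_TEST_VALUES = 3


def classify(is_date_correct: bool,
             written_by_known_old_drill: bool,
             stats_look_corrupt: Optional[bool]) -> DateCorruptionStatus:
    """Hypothetical decision order for the date corruption status.

    stats_look_corrupt is None when the file carries no statistics at all.
    """
    if is_date_correct:
        # Values are known to be correct: no detection or correction needed.
        return DateCorruptionStatus.META_SHOWS_NO_CORRUPTION
    if written_by_known_old_drill:
        # The metadata names an old Drill version known to write bad dates.
        return DateCorruptionStatus.META_SHOWS_CORRUPTION
    if stats_look_corrupt is None:
        # No statistics: the values must be tested during page reads.
        return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES
    if stats_look_corrupt:
        return DateCorruptionStatus.META_SHOWS_CORRUPTION
    # Assumption: statistics that look sane are treated as not corrupt.
    return DateCorruptionStatus.META_SHOWS_NO_CORRUPTION
```

The ordering reflects the idea that explicit metadata is trusted first and
value inspection is the fallback.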

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Fri, Oct 28, 2016 at 12:53 PM, Jinfeng Ni <j...@apache.org> wrote:

> Hi Vitalii,
>
> DateCorruptionStatus has three possibilities: META_SHOWS_CORRUPTION,
> META_SHOWS_NO_CORRUPTION, META_UNCLEAR_TEST_VALUES.  What value will
> this isDateCorrect flag have for each possibility, especially for
> META_UNCLEAR_TEST_VALUES? Are DateCorruptionStatus and isDateCorrect
> the same thing, or different?
>
> Thanks.
>
> Jinfeng
>
>
>
> On Fri, Oct 28, 2016 at 9:26 AM, Paul Rogers <prog...@maprtech.com> wrote:
> > Thanks Vitalii.
> >
> > The Parquet Writer solution “just works”. As soon as someone upgrades
> the writer, files are labeled as having that new version. No fuzziness
> during a release as in 1.9.
> >
> > It is fine to also include the Drill version. But, format decisions
> should be keyed off of the writer version.
> >
> > By the way, do other tools happen to already do this? It would be rather
> surprising if they didn’t.
> >
> > - Paul
> >
> >> On Oct 28, 2016, at 8:30 AM, Vitalii Diravka <vitalii.dira...@gmail.com>
> wrote:
> >>
> >> I agree that it would be good if the approach of parquet date
> correctness
> >> detection will be upgraded. So I created the jira for it DRILL-4980
> >> <https://issues.apache.org/jira/browse/DRILL-4980>.
> >>
> >> But now we have two ideas:
> >> 1. To add checking of the drill version additionally, so later we can
> >> delete isDateCorrect label from parquet metadata.
> >> 2. To add parquet writer version to the parquet metadata and check this
> >> value instead of isDateCorrect and drillVersion.
> >>
> >> So which way, we should prefer now?
> >>
> >> Kind regards
> >> Vitalii
> >>
> >> 2016-10-27 23:54 GMT+00:00 Paul Rogers <prog...@maprtech.com>:
> >>
> >>> FWIW: back on the magic flag issue…
> >>>
> > >>> I noted Vitalii’s concern about “1.9” and “1.9-SNAPSHOT” being too
> > >>> coarse-grained for our needs.
> >>>
> >>> A typical solution is include the version of the Parquet writer in
> >>> addition to that of Drill. Each time we change something in the writer,
> >>> increment the version number. If we number changes, we can easily
> handle
> >>> two changes in the same Drill release, or differentiate between the
> “early
> >>> 1.9” files with old-style dates and “late 1.9” files with correct
> dates.
> >>>
> >>> Since we have no version now, start it at some arbitrary point (2?).
> >>>
> > >>> Now, if the Parquet file has a Drill Writer version in the header, and
> > >>> that version is 2 or greater, the date is in the “correct” format. For
> > >>> anything written by Drill before writer version 2, the date is wrong.
> > >>> The “check the data to see if it is sane” approach is needed only for
> > >>> files where we can’t tell if an older Drill wrote it.
> >>>
> >>> Do other tools label the data? Does Hive say that it wrote the file? If
> >>> so, we don’t need to do the sanity check if we can tell the data comes
> from
> >>> Hive (or Impala, or anything other than old Drill.)
> >>>
> >>> - Paul
> >>>
> >>>> On Oct 27, 2016, at 4:03 PM, Zelaine Fong <zf...@maprtech.com> wrote:
> >>>>
> >>>> Vitalii -- are you still planning to open a ticket and pull request
> for
> >>> the
> >>>> fix you've noted below?
> >>>>
> >>>> -- Zelaine
> >>>>
> >>>> On Wed, Oct 26, 2016 at 8:28 AM, Vitalii Diravka <
> >>> vitalii.dira...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> @Paul Rogers
> >>>>> It may be the undefined case when the file is generated with
>

Re: isDateCorrect field in ParquetTableMetadata

2016-10-27 Thread Jason Altekruse
Vitalii,

Thank you for looking into this, sorry I missed it in the review.  When you
open up a request to fix this issue could you update the check for
correctness in the metadata to check for the is.date.correct flag, or a
version greater than or equal to 1.9.0 (no snapshot)? This will allow us to
stop writing the flag into the metadata at the release or shortly
thereafter.
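
A minimal sketch of the check described above, for discussion only. The
metadata is modeled as a plain dict, the key names follow the property names
used in this thread, and the string value "true" is an assumption. It reads
"1.9.0 (no snapshot)" as: 1.9.0-SNAPSHOT builds are not trusted, while any
later version, snapshot or not, is.

```python
from typing import Mapping, Tuple

SNAPSHOT_SUFFIX = "-SNAPSHOT"


def parse_version(version: str) -> Tuple[Tuple[int, int, int], bool]:
    """Split '1.9.0-SNAPSHOT' into ((1, 9, 0), True)."""
    is_snapshot = version.endswith(SNAPSHOT_SUFFIX)
    core = version[:-len(SNAPSHOT_SUFFIX)] if is_snapshot else version
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))  # treat '1.9' as '1.9.0'
    return (parts[0], parts[1], parts[2]), is_snapshot


def dates_known_correct(metadata: Mapping[str, str]) -> bool:
    """True if the flag is set, or the writer version is 1.9.0 or later."""
    if metadata.get("is.date.correct") == "true":
        return True
    version = metadata.get("drill.version")
    if version is None:
        return False
    numbers, is_snapshot = parse_version(version)
    if numbers > (1, 9, 0):
        return True
    # Exactly 1.9.0: only the released build is known to be correct.
    return numbers == (1, 9, 0) and not is_snapshot
```

With a check like this in place, files written by 1.9.0-SNAPSHOT builds
still rely on the flag, and released versions no longer need it.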

It might be worth looking at how we can catch issues like this related to
plan serialization. There were pretty thorough tests with the patch, but we
still have code paths that only come up with remote fragment usage that we
could test in other ways to avoid bugs like this.

- Jason

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Oct 27, 2016 at 4:03 PM, Zelaine Fong <zf...@maprtech.com> wrote:

> Vitalii -- are you still planning to open a ticket and pull request for the
> fix you've noted below?
>
> -- Zelaine
>
> On Wed, Oct 26, 2016 at 8:28 AM, Vitalii Diravka <
> vitalii.dira...@gmail.com>
> wrote:
>
> > @Paul Rogers
> > It may be the undefined case when the file is generated with
> drill.version
> > = 1.9-SNAPSHOT.
> > It is easier to determine corrupted dates with this flag, and there is
> > no need to wait for the end of the release to merge these changes.
> >
> > @Jinfeng NI
> > It looks like you are right.
> > With consistent mode (isDateCorrect = true) all tests pass. So I am
> > going to open a jira ticket for it with the following changes:
> > https://github.com/vdiravka/drill/commit/ff8d5c7d601915f760d1b0e96187303410cac5d3
> > Thanks.
> >
> > Kind regards
> > Vitalii
> >
> > 2016-10-25 18:36 GMT+00:00 Jinfeng Ni <j...@apache.org>:
> >
> > > I'm not sure if I fully understand your answers. The bottom line is
> > > quite simple: given a set of parquet files, the ParquetTableMeta
> > > instance constructed in Drill should have identical value for
> > > "isDateCorrect", whether it comes from parquet footer, or parquet
> > > metadata cache, or whether there is partition pruning or not. However,
> > > the code shows that this flag is not in consistent mode across
> > > different cases.
> > >
> > >
> > >
> > > On Tue, Oct 25, 2016 at 11:24 AM, Vitalii Diravka
> > > <vitalii.dira...@gmail.com> wrote:
> > > > Hi Jinfeng,
> > > >
> > > > 1. If the parquet files are generated with Drill after DRILL-4203,
> > > > these files have the "isDateCorrect = true" property.
> > > > Drill serializes this property from metadata now. If we set this
> > > > property in the first constructor, we will hide the value from metadata.
> > > > isDateCorrect will be false only if this value equals false (no case
> > > > for it now) or is absent from the parquet metadata footer.
> > > >
> > > >
> > > > 2. I'm not sure of the reason to change the isDateCorrect metadata
> > > > property when the user disables date correction.
> > > > If you have a use case, it would be great if you could provide it.
> > > >
> > > > 3. Maybe you are right regarding when Parquet metadata is cloned.
> > > > Here I added the property in the same manner as Jason's new property
> > > > "drillVersion". So does it need a separate unit test?
> > > >
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > > > 2016-10-25 16:23 GMT+00:00 Jinfeng Ni <j...@apache.org>:
> > > >
> > > >> Forgot to copy the link to the code.
> > > >>
> > > > >> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L950-L955
> > > >>
> > > >> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni <j...@apache.org> wrote:
> > > > >> > @Jason, @Vitalii,
> > > > >> >
> > > > >> > Any thoughts on this question, since you both worked on the fix
> > > > >> > for DRILL-4203?
> > > >> >
> > > >> > Looking through the code, there is a third case [1], where this
> flag
> > > >> > is set to false when Parquet metadata is cloned (after partition
> > > >> > pruning, etc).  That means, for the 2nd case where the flag is set
> > to
> > > >> > true, if there is pruning happening, the new parquet metadata will
> > see
> > > >

Re: The project says it has dozens of committers

2016-08-11 Thread Jason Altekruse
I believe the page you are referring to is our "Team" page available here
[1]. The page does not say we have dozens of committers; it says
contributors, whom we currently don't credit on this page, but whose
contributions are tracked through JIRA and git. If you look at the GitHub
mirror of our repo you can see that there are 65 contributors according to
GitHub's accounting, but I'm pretty sure that only tracks commits
associated with e-mail addresses that have connected GitHub accounts. For
quite a while we used patch files, so there is a good chance that we
received contributions from a few users without GitHub accounts.

This page should be updated to include Cloudera on the list of contributing
organizations, as several members of the Kudu team including Todd Lipcon
helped contribute an experimental Kudu reader.

[1] - https://drill.apache.org/team/

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Aug 11, 2016 at 4:32 AM, Michael Lazar <mla...@clouderagovt.com>
wrote:

> You say you have “dozens of committers”. A dozen is 12.  You have 21 names
> on the list. That is not even 2 dozen.  Seems to be a math error to me.
>
>
>
> Michael Lazar, Sr Sales Engineer
>
> 301 202 4084
>
>
>
> [image:
> http://files.cloudera.com.s3.amazonaws.com/email-imgs/2016/hadoop10.png]
>
> Cloudera Enterprise  Easy, Fast, Secure
>
>
>


Re: [GitHub] drill issue #518: DRILL-4653.json - Malformed JSON should not stop the entir...

2016-08-08 Thread Jason Altekruse
Hey Parth and Subbu,

Sorry for missing the last message on this thread, I will be able to attend
the hangout tomorrow to discuss my concern. As I had said previously, I am
mostly trying to make sure that there is agreement about the impact of this
change on user behavior and expectations before we merge.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Aug 4, 2016 at 5:11 PM, Parth Chandra <pchan...@maprtech.com> wrote:

> Hi Subbu,
>
>   Yes we can discuss this on the next hangout. If Jason is able to attend
> we can discuss some way to address his concern.
>
> Parth
>
> On Wed, Aug 3, 2016 at 10:24 AM, Subbu Srinivasan <ssriniva...@zscaler.com
> >
> wrote:
>
> > Hi Folks,
> > When can we discuss this feature? Would next hangout be appropriate?
> >
> > Thanks
> > Subbu
> >
> > On Mon, Jul 25, 2016 at 10:20 AM, Subbu Srinivasan <
> > ssriniva...@zscaler.com>
> > wrote:
> >
> > > This mechanism falls in line with other JSON processing similar to
> > serde's
> > > with Hive, UDF's enabled at global level will apply to all users and is
> > > outlined using documentation.
> > >
> > >
> > > What is your stance if we move to the JSONFormatPlugin?
> > >
> > > On Fri, Jul 15, 2016 at 2:08 PM, jaltekruse <g...@git.apache.org>
> wrote:
> > >
> > >> Github user jaltekruse commented on the issue:
> > >>
> > >> https://github.com/apache/drill/pull/518
> > >>
> > >> I don't think we should merge this without a mechanism to return a
> > >> warning to the user to tell them at least that some data was ignored,
> > and
> > >> ideally some indication of how much data was discarded. While I do
> > >> understand this is not the default behavior, I think there is still
> too
> > >> high of a risk that an admin could set this at a global level and
> users
> > >> would be unaware of some of their data being discarded.
> > >>
> > >> I am willing to discuss the benefits of merging this before such a
> > >> system exists, but until this issue has been thoroughly evaluated I am
> > -1
> > >> on the change.
> > >>
> > >> One improvement you could make to the current implementation is
> > >> moving the option to the format plugin instead of the system/session
> > list.
> > > >> This enables users to include setting the option in their query with
> > > >> the "table with options" syntax that was added last fall. We already have
> a
> > >> JIRA open for moving the all_text_mode and read_numbers_as_double
> > options
> > >> to this location, because it doesn't really make sense to change query
> > >> results based on session state. Unfortunately this change does not
> > >> completely remove my initial concern, because not all users can modify
> > or
> > >> see the storage plugins in the case when web UI security is enabled.
> > >> Non-admin users in these cases could be surprised by this behavior.
> > >>
> > >> For examples of how this is done, you can look at the text plugin
> > >> config, you would just need to add these options as properties to the
> > json
> > >> config which is currently mostly empty.
> > >>
> > > >> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONFormatPlugin.java#L93
> > >>
> > >>
> > > >> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135
> > >>
> > > >> Select with options: https://issues.apache.org/jira/browse/DRILL-4047
> > >> Jira for moving the existing options:
> > >> https://issues.apache.org/jira/browse/DRILL-4206
> > >>
> > >>
> > >> ---
> > >> If your project is set up for it, you can reply to this email and have
> > >> your
> > >> reply appear on GitHub as well. If your project does not have this
> > feature
> > >> enabled and wishes so, or if the feature is enabled but not working,
> > >> please
> > >> contact infrastructure at infrastruct...@apache.org or file a JIRA
> > ticket
> > >> with INFRA.
> > >> ---
> > >>
> > >
> > >
> >
>


Re: Suggestions for hangout topics for 08/09

2016-08-08 Thread Jason Altekruse
Yeah, I can join the hangout tomorrow to talk about the PR, thanks for the
heads up.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Mon, Aug 8, 2016 at 12:09 PM, Zelaine Fong <zf...@maprtech.com> wrote:

> Jason -- will you be able to join tomorrow's hangout, since you had raised
> questions about Subbu's pull request?
>
> -- Zelaine
>
>
> On Mon, Aug 8, 2016 at 11:33 AM, Gautam Parai <gpa...@maprtech.com> wrote:
>
>> Tomorrow's hangout is scheduled for 10AM - 11AM PST
>>
>> On Mon, Aug 8, 2016 at 11:30 AM, Subbu Srinivasan <
>> ssriniva...@zscaler.com>
>> wrote:
>>
>> > What time is tomorrow's mtg scheduled for?
>> >
>> >
>> > On Mon, Aug 8, 2016 at 10:48 AM, Gautam Parai <gpa...@maprtech.com>
>> wrote:
>> >
>> > > If you have any suggestions for Drill hangout topics for tomorrow,
>> you
>> > can
>> > > add it to this thread.  We will also ask around at the beginning of
>> the
>> > > hangout for any topics.  We will try to cover whatever possible during
>> > the
>> > > 1 hr.
>> > >
>> > > Topics:
>> > >   1.  DRILL-4653:  Malformed JSON should not stop the entire query
>> from
>> > > progressing.
>> > >Discussion about the PR.
>> > >
>> >
>>
>
>


Re: query pushdown into HBase subscan

2016-05-31 Thread Jason Altekruse
The constant folding feature is turned on by default (and can be disabled
with planner.enable_constant_folding).

It should be able to work with UDFs, as it has access to all of the same
function definitions as our standard resolution/evaluation during full
execution.

In the plan that includes the full scan, in the filter above the scan does
your expression appear as written (i.e. convert_from(...) =
hash_to_long('key_part1')), or has the right hand side been reduced to a
constant value?

The next thing that would probably be good to debug would be pre-computing
the right hand side and seeing if that gets pushed down.
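
To make the idea concrete, here is a toy sketch of constant folding over a
small expression tree. It is not Drill's planner code: the Literal, Column
and Call classes and the fold function are hypothetical, and the
hash_to_long stand-in simply returns the string length so the example can
run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Literal, Column, Call]


def fold(expr: Expr, functions: Dict[str, Callable]) -> Expr:
    """Replace calls whose arguments are all literals with their value."""
    if not isinstance(expr, Call):
        return expr
    args = tuple(fold(arg, functions) for arg in expr.args)
    if expr.func in functions and all(isinstance(a, Literal) for a in args):
        return Literal(functions[expr.func](*(a.value for a in args)))
    return Call(expr.func, args)


# Stand-in for the user's UDF; the real hash function is not shown here.
FUNCTIONS = {"hash_to_long": lambda s: len(s)}

predicate = Call("=", (
    Call("convert_from", (Column("row_key"),)),
    Call("hash_to_long", (Literal("key_part1"),)),
))
folded = fold(predicate, FUNCTIONS)
```

After folding, the right hand side of the comparison is a Literal, which is
the shape a filter pushdown rule can match; the left hand side still depends
on a column and is left untouched.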




Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, May 31, 2016 at 5:04 PM, Aditya <adityakish...@gmail.com> wrote:

> Hi Andrey,
>
> Drill currently does require a constant value on the right hand side of a
> comparison operator to pushdown the filter.
>
> I believe that Jason had worked on constant folding feature which would
> evaluate a constant expression during planning phase and rewrite the plan
> to replace the expression with the corresponding constant value.
>
> Not sure if that works with UDFs as well.
>
> Jason?
>
> On Tue, May 31, 2016 at 3:54 PM, Andrey Gusev <and...@siftscience.com>
> wrote:
>
> > Hello Drill,
> >
> > We're noticing somewhat of an odd behavior with the following query
> > against HBase table.
> >
> > The key of the table is, roughly speaking,
> > *8byteHash(string1)8byteHash(string2)*
> >
> >
> > SELECT CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT') p1_long, ...
> from {table}
> > WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT_BE') =
> hash_to_long('key_part1') limit 10
> >
> > The query does seem to work correctly in terms of the result set but times
> > out on larger tables. The hash_to_long function is a UDF that I wrote that
> > converts a string to a long such that the above equality can be satisfied.
> >
> > It appears that it doesn't push down this into subscan (i.e. prefix HBase
> > scan) - while the operator profile shows HBASE_SUB_SCAN:
> >
> > [image: Inline image 1]
> >
> > The physical plan starts with an unconstrained full table scan:
> >
> > Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec
> [tableName={table}, startRow=null, stopRow=null, filter=null],
> >
> >
> > How can we force the where clause to be reflected into scan bounds?
> >
> > We're running latest Drill 1.6.
> >
> > Andrey
> >
>


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2016-05-25 Thread Jason Altekruse
Congrats Parth!

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, May 25, 2016 at 10:39 AM, rahul challapalli <
challapallira...@gmail.com> wrote:

> Congratulations Parth!
>
> Thank You Jacques for your leadership over the last few years.
>
> On Wed, May 25, 2016 at 10:26 AM, Gautam Parai <gpa...@maprtech.com>
> wrote:
>
> > Congratulations Parth!
> >
> > On Wed, May 25, 2016 at 9:02 AM, Jinfeng Ni <jinfengn...@gmail.com>
> wrote:
> >
> > > Big congratulations, Parth!
> > >
> > > Thank you, Jacques, for your contribution and leadership over the last
> > > few years!
> > >
> > >
> > > On Wed, May 25, 2016 at 8:35 AM, Jacques Nadeau <jacq...@dremio.com>
> > > wrote:
> > > > I'm pleased to announce that the Drill PMC has voted to elect Parth
> > > Chandra
> > > > as the new PMC chair of Apache Drill. Please join me in
> congratulating
> > > > Parth!
> > > >
> > > > thanks,
> > > > Jacques
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > >
> >
>


[jira] [Created] (DRILL-4663) FileSystem properties Config block from filesystem plugin are not being applied for file writers

2016-05-10 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4663:
--

 Summary: FileSystem properties Config block from filesystem plugin 
are not being applied for file writers
 Key: DRILL-4663
 URL: https://issues.apache.org/jira/browse/DRILL-4663
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse


Currently all of the record writers create their own empty filesystem 
configuration upon initialization. They do not currently apply the custom 
configurations that are included in the plugin configuration, which prevents 
users from setting custom properties on the write path. If possible this 
configuration should be shared with the readers. If there is a need to isolate 
this from the configuration used for the readers, we should still add the 
configurations from the storage plugin config.
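
A rough sketch of the intended behavior, using plain dicts rather than any
real Hadoop or Drill class. The function name build_writer_fs_config and
the property example.buffer.size are made up for illustration; the point is
only that properties from the plugin's config block are layered over the
writer's defaults instead of being ignored.

```python
from typing import Dict, Mapping, Optional


def build_writer_fs_config(defaults: Mapping[str, str],
                           plugin_config: Optional[Mapping[str, str]]
                           ) -> Dict[str, str]:
    """Start from the writer defaults, then apply the plugin's properties."""
    merged = dict(defaults)
    if plugin_config:
        merged.update(plugin_config)  # plugin settings win on conflict
    return merged


writer_config = build_writer_fs_config(
    {"example.buffer.size": "4096"},
    {"example.buffer.size": "65536", "fs.s3a.access.key": "KEY"},
)
```

Sharing one merged configuration between readers and writers would keep the
two paths from drifting apart.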



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Created] (DRILL-4638) Netflix support

2016-04-26 Thread Jason Altekruse
select * from netflix.seinfeld where script_text like '%BOSCO!%'

For the uninitiated: https://www.youtube.com/watch?v=lyEFiaUQYGE

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, Apr 26, 2016 at 9:14 AM, Jim Bates <jba...@maprtech.com> wrote:

> Auto streaming of new netflix shows if query times are running longer than
> expected.
>
> On Tue, Apr 26, 2016 at 3:42 AM, Aditya <a...@apache.org> wrote:
>
> > For a minute I thought: "cool, new plugin!!!" :)
> >
> > On Tue, Apr 26, 2016 at 12:26 AM, Julian Hyde <jh...@apache.org> wrote:
> >
> > > This is part of the spam storm that has been sweeping many Apache
> > > projects’ JIRA accounts over the past few days.
> > >
> > > Julian
> > >
> > >
> > > > On Apr 23, 2016, at 4:50 PM, Edmon Begoli <ebeg...@gmail.com> wrote:
> > > >
> > > > How in the world did this sneak into the JIRA?
> > > >
> > > > Are we going to get next an email from alleged Nigerian prince about
> > the
> > > > massive inheritance?
> > > >
> > > > On Sat, Apr 23, 2016 at 1:01 PM, Admas lewis (JIRA) <j...@apache.org
> >
> > > wrote:
> > > >
> > > >> Admas lewis created DRILL-4638:
> > > >> --
> > > >>
> > > >> Summary: Netflix support
> > > >> Key: DRILL-4638
> > > >> URL:
> https://issues.apache.org/jira/browse/DRILL-4638
> > > >> Project: Apache Drill
> > > >>  Issue Type: Bug
> > > >>  Components: Client - C++
> > > >>Affects Versions: 1.6.0
> > > >> Environment: netflix helpline
> > > >>Reporter: Admas lewis
> > > >>Priority: Trivial
> > > >> Fix For: 1.5.0
> > > >>
> > > >>
> > > >> If you're facing any drawback together with your Netflix or your
> > Netflix
> > > >> isn't connecting then please provide U.S.A. a invoke our toll free
> > > number:
> > > >> 1-855-855-3090.
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> This message was sent by Atlassian JIRA
> > > >> (v6.3.4#6332)
> > > >>
> > >
> > >
> >
>


Re: Drill v1.6 and s3n connection

2016-04-20 Thread Jason Altekruse
Thanks Bridget, let me know if you need any other info from me or want me
to review the changes.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 11:28 AM, Bridget Bevens <bbev...@maprtech.com>
wrote:

> Created DRILL-4621 <https://issues.apache.org/jira/browse/DRILL-4621> to
> track doc change request.
>
> Thanks,
> Bridget
>
> On Wed, Apr 20, 2016 at 10:10 AM, Jason Altekruse <ja...@dremio.com>
> wrote:
>
> > It looks like a number of doc pages can be improved by referencing some
> > changes made recently.
> >
> > With the inclusion of the needed jars for s3a with Drill, there is no
> > longer a need to download jets3t [1]. In addition to setting your
> > credentials, this option for allowing more concurrent connections
> > (necessary to allow reads of wider parquet files) can also be set in this
> > block instead of a core-site.xml file [2].
> >
> > This config block can actually be used to set any filesystem properties.
> > Some of these are custom to a particular filesystem like S3, but a number
> > of them are used by a variety of implementations of the HDFS interface.
> Any
> > properties like these [3] should be able to be set in this config block.
> >
> > [1] -
> >
> https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/
> > [2] -
> >
> >
> https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
> > [3] -
> >
> >
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Wed, Apr 20, 2016 at 9:52 AM, Abhishek Girish <
> > abhishek.gir...@gmail.com>
> > wrote:
> >
> > > Thanks Jason! I hadn't noticed the config property for S3. I tried this
> > out
> > > now, and feel it is a lot easier now.
> > >
> > > And yes, we should definitely update the docs. There have been quite a
> > few
> > > threads related to S3 config.
> > >
> > > On Wed, Apr 20, 2016 at 8:19 AM, Jason Altekruse <ja...@dremio.com>
> > wrote:
> > >
> > > > I don't believe there is any way in which a particular bucket has a
> > > > property of being s3, s3n or s3a. As I understand it, this only
> > > > changes the client library that is used to interface with S3. We
> > > > have included the jars necessary for s3a with Drill, which is the
> > > > newest and most performant option available.
> > > >
> > > > I need to open a doc JIRA for this, but there is one way in which the
> > > > s3 experience was improved recently to prevent the need to restart
> > > > Drill to add your S3 credentials. When you create a connection to an
> > > > S3 bucket, you can now specify your credentials in a property named
> > > > "config" in the storage plugin. This allows you to set any filesystem
> > > > properties, which was previously only possible to set with a
> > > > core-site.xml file on the classpath when starting Drill.
> > > >
> > > > Example:
> > > > {
> > > >   "type": "file",
> > > >   "enabled": true,
> > > >   "connection": "s3a://address.of.your.bucket/",
> > > >   "config": {
> > > >     "fs.s3a.access.key": "",
> > > >     "fs.s3a.secret.key": ""
> > > >   },
> > > >   "workspaces": {
> > > >     "root": {
> > > >       "location": "/",
> > > >       "writable": false,
> > > >       "defaultInputFormat": null
> > > >     }
> > > >   },
> > > >   "formats": {
> > > >     "psv": {
> > > >       "type": "text",
> > > >       "extensions": [
> > > >         "tbl"
> > > >       ],
> > > >       "delimiter": "|"
> > > >     }, ...
> > > >
> > > >
> > > > Jason Altekruse
> > > > Software Engineer at Dremio
> > > > Apache Drill Committer
> > > >
> > > > On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta <ni...@inrix.com>
> wrote:
> > > >
> > > > > Hi,
> > > > > Does Drill v1.6 still support s3n connections or just s3a?
> > > > >
> > > > > I have a s3n S3 bucket that I'm trying to connect to and it will
> not
> > > > work.
> > > > > My config is:
> > > > >
> > > > > {
> > > > >   "type": "file",
> > > > >   "enabled": true,
> > > > >   "connection": "s3n://inrixprod-tapp/",
> > > > >   "workspaces": {
> > > > > "root": {
> > > > >   "location": "/",
> > > > >   "writable": false,
> > > > >   "defaultInputFormat": null
> > > > > },
> > > > >
> > > > > Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> > > > > www.inrix.com  | mobile +1 646-248-4105 |
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Resolved] (DRILL-4445) Remove extra code to work around mixture of arrays and Lists used in Logical and Physical query plan nodes

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4445.

Resolution: Fixed

Fixed in d24205d4e795a1aab54b64708dde1e7deeca668b

> Remove extra code to work around mixture of arrays and Lists used in Logical 
> and Physical query plan nodes
> --
>
> Key: DRILL-4445
> URL: https://issues.apache.org/jira/browse/DRILL-4445
> Project: Apache Drill
>  Issue Type: Improvement
>    Reporter: Jason Altekruse
>    Assignee: Jason Altekruse
>
> The physical plan node classes for all of the operators currently use a mix 
> of arrays and Lists to refer to lists of incoming operators, expressions, and 
> other operator properties. This has led to the introduction of several 
> utility methods for translating between the two representations, examples can 
> be seen in common/logical/data/Abstractbuilder.
> This isn't a major problem, but the new operator test framework uses these 
> classes as a primary interface for setting up the tests. It seemed worthwhile 
> to just refactor the classes to be consistent so that the tests would all be 
> similar. There are a few changes to execution code, but they are all just 
> trivial changes to use the list based interfaces (length vs size(), set() 
> instead of arr[i] = foo, etc.) as Jackson just transparently handles both 
> types the same (which is why this hasn't really been a problem).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4437) Implement framework for testing operators in isolation

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4437.

Resolution: Fixed

Fixed in d93a3633815ed1c7efd6660eae62b7351a2c9739

> Implement framework for testing operators in isolation
> --
>
> Key: DRILL-4437
> URL: https://issues.apache.org/jira/browse/DRILL-4437
> Project: Apache Drill
>  Issue Type: Test
>  Components: Tools, Build & Test
>    Reporter: Jason Altekruse
>    Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> Most of the tests written for Drill are end-to-end. We spin up a full 
> instance of the server, submit one or more SQL queries and check the results.
> While integration tests like this are useful for ensuring that all features 
> are guaranteed to not break end-user functionality, overuse of this approach 
> has caused a number of pain points.
> Overall the tests end up running a lot of the exact same code, parsing and 
> planning many similar queries.
> Creating consistent reproductions of issues, especially edge cases found in 
> clustered environments can be extremely difficult. Even the simpler case of 
> testing cases where operators are able to handle a particular series of 
> incoming batches of records has required hacks like generating large enough 
> files so that the scanners happen to break them up into separate batches. 
> These tests are brittle as they make assumptions about how the scanners will 
> work in the future. For example, a perf evaluation might show that we should 
> be producing larger batches in some cases. 
> Existing tests that are trying to test multiple batches by producing a few 
> more records than the current threshold for batch size would not be testing 
> the same code paths.
> We need to make more parts of the system testable without initializing the 
> entire Drill server, as well as making the different internal settings and 
> state of the server configurable for tests.
> This is a first effort to enable testing the physical operators in Drill by 
> mocking the components of the system necessary to enable operators to 
> initialize and execute.
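
The idea of feeding an operator explicit batches can be sketched as follows. This is a minimal, hypothetical illustration in Python, not Drill's test framework or its API: `filter_operator` and `run_operator_test` are made-up names, and a "batch" is simply a list of rows. The point is that the test, not a scanner, decides where the batch boundaries fall.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[dict]  # hypothetical stand-in for a batch of records


def filter_operator(incoming: Iterable[Batch],
                    predicate: Callable[[dict], bool]) -> Iterator[Batch]:
    """Toy operator: applies a predicate to every row of each incoming batch."""
    for batch in incoming:
        yield [row for row in batch if predicate(row)]


def run_operator_test() -> List[Batch]:
    # The test chooses the batch boundaries directly, including an empty
    # batch, instead of relying on file sizes to force a split.
    batches = [[{"a": 1}, {"a": 5}], [], [{"a": 7}]]
    return list(filter_operator(batches, lambda row: row["a"] > 2))


if __name__ == "__main__":
    print(run_operator_test())
```

Because the batches are built by hand, a later change to the scanners' batch size would not alter which code path this kind of test exercises.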



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Drill v1.6 and s3n connection

2016-04-20 Thread Jason Altekruse
It looks like a number of doc pages can be improved by referencing some
changes made recently.

With the inclusion of the needed jars for s3a with Drill, there is no
longer a need to download jets3t [1]. In addition to setting your
credentials, this option for allowing more concurrent connections
(necessary to allow reads of wider parquet files) can also be set in this
block instead of a core-site.xml file [2].

This config block can actually be used to set any filesystem properties.
Some of these are custom to a particular filesystem like S3, but a number
of them are used by a variety of implementations of the HDFS interface. Any
properties like these [3] should be able to be set in this config block.

[1] -
https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/
[2] -
https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
[3] -
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/core-default.xml
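
As a rough sketch of what such a config block could look like, the Python snippet below builds a storage plugin definition and prints it as JSON. It assumes the connection option mentioned above is Hadoop's fs.s3a.connection.maximum property; the helper name, the bucket name and the value 100 are placeholders chosen for illustration only.

```python
import json


def s3a_plugin_config(bucket, access_key, secret_key, max_connections):
    """Build a file-type storage plugin definition with an inline config block."""
    return {
        "type": "file",
        "enabled": True,
        "connection": f"s3a://{bucket}/",
        "config": {
            "fs.s3a.access.key": access_key,
            "fs.s3a.secret.key": secret_key,
            # Assumed to be the relevant option: Hadoop's limit on the
            # number of simultaneous connections to S3.
            "fs.s3a.connection.maximum": str(max_connections),
        },
        "workspaces": {
            "root": {
                "location": "/",
                "writable": False,
                "defaultInputFormat": None,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values; paste the printed JSON into the storage plugin editor.
    print(json.dumps(s3a_plugin_config("address.of.your.bucket", "", "", 100), indent=2))
```

Any other filesystem property from core-default.xml would go into the same "config" map as another key/value pair.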

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 9:52 AM, Abhishek Girish <abhishek.gir...@gmail.com>
wrote:

> Thanks Jason! I hadn't noticed the config property for S3. I tried this out
> now, and feel it is a lot easier now.
>
> And yes, we should definitely update the docs. There have been quite a few
> threads related to S3 config.
>
> On Wed, Apr 20, 2016 at 8:19 AM, Jason Altekruse <ja...@dremio.com> wrote:
>
> > I don't believe there is any way in which a particular bucket has a
> > property of being s3, s3n or s3a. As I understand it, this only changes
> > the client library that is used to interface with S3. We have included
> > the jars necessary for s3a with Drill, which is the newest and most
> > performant option available.
> >
> > I need to open a doc JIRA for this, but there is one way in which the s3
> > experience was improved recently to prevent the need to restart Drill to
> > add your S3 credentials. When you create a connection to an S3 bucket,
> > you can now specify your credentials in a property named "config" in the
> > storage plugin. This allows you to set any filesystem properties, which
> > was previously only possible to set with a core-site.xml file on the
> > classpath when starting Drill.
> >
> > Example:
> > {
> >   "type": "file",
> >   "enabled": true,
> >   "connection": "s3a://address.of.your.bucket/",
> >   "config": {
> >     "fs.s3a.access.key": "",
> >     "fs.s3a.secret.key": ""
> >   },
> >   "workspaces": {
> >     "root": {
> >       "location": "/",
> >       "writable": false,
> >       "defaultInputFormat": null
> >     }
> >   },
> >   "formats": {
> >     "psv": {
> >       "type": "text",
> >       "extensions": [
> >         "tbl"
> >       ],
> >       "delimiter": "|"
> >     }, ...
> >
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta <ni...@inrix.com> wrote:
> >
> > > Hi,
> > > Does Drill v1.6 still support s3n connections or just s3a?
> > >
> > > I have a s3n S3 bucket that I'm trying to connect to and it will not
> > work.
> > > My config is:
> > >
> > > {
> > >   "type": "file",
> > >   "enabled": true,
> > >   "connection": "s3n://inrixprod-tapp/",
> > >   "workspaces": {
> > > "root": {
> > >   "location": "/",
> > >   "writable": false,
> > >   "defaultInputFormat": null
> > > },
> > >
> > > Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> > > www.inrix.com  | mobile +1 646-248-4105 |
> > >
> > >
> > >
> >
>


Re: Operator unit test framework merged

2016-04-20 Thread Jason Altekruse
small correction: thank you Parth* for the review

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 9:23 AM, Jason Altekruse <ja...@dremio.com> wrote:

> Hello all,
>
> I finally got a chance to do some final minor fixes and merge the operator
> unit test framework I posted a while back, thanks again to Path for doing a
> review on it. There are still some enhancements I would like to add to make
> the tests more flexible, but for examples of what can be done with the
> current version please check out the tests that were included with the
> patch [1]. Please don't hesitate to ask questions or suggest improvements.
> I think that writing tests in smaller units like this could go a long way
> in improving our coverage and ensure that we can write tests that
> consistently cover a particular execution path, independent of the query
> planner.
>
> For anyone looking to get more familiar with how Drill executes
> operations, these tests might be a little easier way to start getting
> acquainted with the internals of Drill. The tests mock a number of the more
> complex parts of the system and try to produce a minimal environment where
> a single operation can run.
>
> [1] -
> https://github.com/apache/drill/blob/d93a3633815ed1c7efd6660eae62b7351a2c9739/exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/BasicPhysicalOpUnitTest.java
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>


Operator unit test framework merged

2016-04-20 Thread Jason Altekruse
Hello all,

I finally got a chance to do some final minor fixes and merge the operator
unit test framework I posted a while back, thanks again to Path for doing a
review on it. There are still some enhancements I would like to add to make
the tests more flexible, but for examples of what can be done with the
current version please check out the tests that were included with the
patch [1]. Please don't hesitate to ask questions or suggest improvements.
I think that writing tests in smaller units like this could go a long way
in improving our coverage and ensure that we can write tests that
consistently cover a particular execution path, independent of the query
planner.

For anyone looking to get more familiar with how Drill executes operations,
these tests might be a little easier way to start getting acquainted with
the internals of Drill. The tests mock a number of the more complex parts
of the system and try to produce a minimal environment where a single
operation can run.

[1] -
https://github.com/apache/drill/blob/d93a3633815ed1c7efd6660eae62b7351a2c9739/exec/java-exec/src/test/java/org/apache/drill/exec/physical/unit/BasicPhysicalOpUnitTest.java

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer


Re: Drill v1.6 and s3n connection

2016-04-20 Thread Jason Altekruse
I don't believe there is any way in which a particular bucket has a
property of being s3, s3n or s3a. As I understand it, this only changes the
client library that is used to interface with S3. We have included the jars
necessary for s3a with Drill, which is the newest and most performant
option available.

I need to open a doc JIRA for this, but there is one way in which the s3
experience was improved recently to prevent the need to restart Drill to
add your S3 credentials. When you create a connection to an S3 bucket, you
can now specify your credentials in a property named "config" in the
storage plugin. This allows you to set any filesystem properties, which was
previously only possible to set with a core-site.xml file on the
classpath when starting Drill.

Example:
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://address.of.your.bucket/",
  "config": {
    "fs.s3a.access.key": "",
    "fs.s3a.secret.key": ""
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    }, ...


Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 7:40 AM, Nick Monetta <ni...@inrix.com> wrote:

> Hi,
> Does Drill v1.6 still support s3n connections or just s3a?
>
> I have a s3n S3 bucket that I'm trying to connect to and it will not work.
> My config is:
>
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3n://inrixprod-tapp/",
>   "workspaces": {
>     "root": {
>       "location": "/",
>       "writable": false,
>       "defaultInputFormat": null
>     },
>
> Nick Monetta | INRIX |ni...@inrix.com |Movement Intelligence |
> www.inrix.com  | mobile +1 646-248-4105 |
>
>
>


Re: Hangout starting in 5 minutes

2016-04-05 Thread Jason Altekruse
No need for apologies John, today was pretty lively for discussion. These
are the notes I took.

---
community hangout 4/5/2016
---
Attendees: Jason, Arina, Vitalii, Stefan, Pawan, Parth, Jinfeng, Sudheesh,
Aman

Pawan is new - might not have had a working mic, he didn't give an
introduction
- if you see this feel free to respond to the thread with more info
  about yourself and your interest in Drill

Topics
- Aman
  - Metadata cache file
    - proposal to create a separate file with just directory info
    - should we just put info in a small database like SQLite?
    - 4530
  - Newlines in CSV files
    - 3178
    - Jason - to do this we need to turn off splittability
    - Aman - we can provide an option for users that want it, and they
      can choose if losing splittability is worth it for them
    - Jason - this should be a format/select with options setting, not
      a session one
    - Parth - or dotdrill
      - Julien was working on this? Proposal was given on a JIRA a
        while back
      - has not updated it lately
    - generally the other option is more flexible, an admin can set it
      in the storage plugin or view, or a user can put it in a query
      themselves
      - dotdrill would require write permissions on the filesystem
      - still useful for other cases, collocating metadata with
        data, but not necessarily needed for this case initially
- Stefan
  - Wanted to apologize to Jacques for the thread last week
  - We will continue the discussion on the list about the best way
    forward for the Avro plugin
- Vitalii
  - questions about PRs
  - review the PR for spill directories, he has updated it based on the
    comments
- Arina
  - JIRA for viewing logs in Web UI
    - originally logs only on the current node
    - getting remote logs
      - HTTP rest call
      - Custom RPC tunnel?
      - implement distributed read of files on each of the remote systems
- Parth
  - release schedule? Jacques withdrew his offer for managing 1.7 to
    focus on the new 2.0 branch
- Jinfeng
  - partition pruning enhancement
    - remove redundant filters, we re-evaluate predicates on parent
      directories over and over
  - test framework complaining about ordering of files
    - Jinfeng will make a proposal on the list about how to fix this in
      the test framework level
- Different time?
  - This lands pretty late for the folks in the Ukraine
  - they said it was okay, but anyone who might not be attending due to
    the time the meeting happens please speak up and we can look at
    moving it or scheduling it at a different time every other week or
    something to make sure everyone is included
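
On the "Newlines in CSV files" item in the notes: the reason splittability has to be turned off can be illustrated with a small sketch. This uses Python's standard csv module rather than Drill's text reader, and the sample data is made up. A quoted field may contain a newline, so a reader that starts at an arbitrary line boundary cannot tell whether it is at the beginning of a record or in the middle of one.

```python
import csv
import io

# Three logical records; the second has a quoted field with an embedded newline.
DATA = 'id,comment\n1,"line one\nline two"\n2,plain\n'


def naive_line_split(text):
    """Split on raw line boundaries, as a reader of an arbitrary file split would."""
    return text.splitlines()


def parse_whole(text):
    """Parse the file from the start, so quoting state is tracked correctly."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print(len(naive_line_split(DATA)), "physical lines")
    print(len(parse_whole(DATA)), "logical records")
```

The physical line count and the record count differ, which is why a file with embedded newlines has to be read sequentially from the start.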

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, Apr 5, 2016 at 12:32 PM, John Omernik <j...@omernik.com> wrote:

> Sorry I missed this, anything exciting happen?
>
> On Tue, Apr 5, 2016 at 11:57 AM, Jason Altekruse <ja...@dremio.com> wrote:
>
> > Anyone with an interest in Drill is welcome to attend to hear what is
> > happening in the Drill community. Feel free to ask questions or just
> listen
> > in.
> >
> > https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
>


Hangout starting in 5 minutes

2016-04-05 Thread Jason Altekruse
Anyone with an interest in Drill is welcome to attend to hear what is
happening in the Drill community. Feel free to ask questions or just listen
in.

https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer


[jira] [Created] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)

2016-03-29 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4551:
--

 Summary: Add some missing functions that are generated by Tableau 
(cot, regex_matches, split_part, isdate)
 Key: DRILL-4551
 URL: https://issues.apache.org/jira/browse/DRILL-4551
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


Several of these functions do not appear to be standard SQL functions, but they 
are available in several other popular databases like SQL Server, Oracle and 
Postgres.
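
For reference, a rough Python sketch of what two of these functions compute. The semantics here are an assumption modeled on the Postgres versions (1-based field index, empty string when the field does not exist), not a description of what Drill implements.

```python
import math


def cot(x: float) -> float:
    """Cotangent: the reciprocal of the tangent."""
    return 1.0 / math.tan(x)


def split_part(text: str, delimiter: str, index: int) -> str:
    """Return the index-th field (1-based) of text split on delimiter.

    Assumed behavior, following Postgres: an index past the last field
    yields an empty string.
    """
    if index < 1:
        raise ValueError("field position must be greater than zero")
    parts = text.split(delimiter)
    return parts[index - 1] if index <= len(parts) else ""


if __name__ == "__main__":
    print(cot(math.pi / 4))
    print(split_part("a,b,c", ",", 2))
```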



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Moving to HBase 1.1 [DRILL-4199]

2016-03-21 Thread Jason Altekruse
With the recent issues that have been discussed on other threads related to
correctness issues when using our current client I agree we should upgrade.
+1

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Mon, Mar 21, 2016 at 1:18 PM, Aditya <a...@apache.org> wrote:

> Hi,
>
> HBase has moved to 1.1 branch as their latest stable release[1] and since
> it is wire compatible with 0.98 releases, I'd like to propose that Drill
> updates its supported HBase release to 1.1.
>
> Essentially, it means that we update the HBase clients bundled with Drill
> distribution to latest stable version of 1.1 branch. I do not expect any
> code change.
>
> I have assigned DRILL-4199 to myself and unless someone has a reason to not
> to, I'd like to move to HBase 1.1 in Drill 1.7 release.
>
> aditya...
>
> [1] https://dist.apache.org/repos/dist/release/hbase/stable
> [2] https://issues.apache.org/jira/browse/DRILL-4199
>


Hangout happening now!

2016-03-15 Thread Jason Altekruse
Hello All,

Join us to discuss the latest happenings in Drill!

https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer


Re: Time for the 1.6 Release

2016-03-10 Thread Jason Altekruse
I hadn't actually tested out the patch, what I had said was that I could
add a flag to make avro files behave like parquet and JSON, without schema
validation. The patch made it so the behavior of directories would be
different from that of individual files, removing the schema validation. I
tried applying it just now and it still doesn't appear to make the dirN
columns work, but I don't understand why. I will try to take a look tonight
and post a patch. It will be up to Parth if he wants to put it in the
release once the full fix is merged.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Mar 10, 2016 at 1:09 AM, Stefán Baxter <ste...@activitystream.com>
wrote:

> Hi,
>
> This issue is still unresolved:
> https://issues.apache.org/jira/browse/DRILL-4120
>
> It would mean a great deal to us if it was.
> The solution is, as I understood Jason and Jacques, ready and only needs to
> be merged.
>
> Regards,
>  -Stefán
>
> On Thu, Mar 10, 2016 at 3:50 AM, Parth Chandra <par...@apache.org> wrote:
>
> > Hi everyone,
> >
> >   Just a note to  update everyone that the QA team is testing out the
> build
> > from master.
> >   There are no further commits expected for the 1.6.0 release.
> >   The repo is open for commits but try not to break anything :)
> >
> >
> > Parth
> >
> > On Tue, Mar 8, 2016 at 5:16 PM, Parth Chandra <par...@apache.org> wrote:
> >
> > > Okay we are down to the final one -
> > >
> > > DRILL-4482 - Avro no longer selects data correctly from a
> > > sub-structure.(Jason)
> > >
> > > Note that MapR QA team is going to start testing 1.6 snapshot now
> before
> > I
> > > roll out the release candidate. DRILL-4482 can be merged in later as it
> > is
> > > not likely to affect the  Hopefully there will be no show stoppers (.
> > >
> > > The plan is to roll out the release candidate by Thursday.
> > >
> > > Thanks
> > >
> > > Parth
> > >
> > >
> > > On Tue, Mar 8, 2016 at 9:31 AM, Parth Chandra <par...@apache.org>
> wrote:
> > >
> > >> OK, let's leave it out then.
> > >>
> > >> On Tue, Mar 8, 2016 at 9:25 AM, Jason Altekruse <
> > altekruseja...@gmail.com
> > >> > wrote:
> > >>
> > >>> To be honest I was expecting a longer review cycle so I hadn't run
> the
> > >>> unit
> > >>> tests before posting it for review. There were only very minor
> > functional
> > >>> changes, so I wasn't thinking it would be an issue, and I was
> > >>> anticipating
> > >>> having to update the patch before merging it. I could update the test
> > >>> that
> > >>> is failing but I don't see much sense in trying to get it into the
> > >>> release
> > >>> because it only introduces new tests and some small core refactoring.
> > >>>
> > >>> I'm all for getting it merged so everyone can start using it, I just
> > >>> think
> > >>> it doesn't really matter if it happens on the release branch or back
> on
> > >>> master once we cut a release branch.
> > >>>
> > >>> I would rather try to focus on getting the Avro issues resolved,
> which
> > is
> > >>> what I'm working on right now.
> > >>>
> > >>> - Jason
> > >>>
> > >>> On Tue, Mar 8, 2016 at 8:58 AM, Parth Chandra <par...@apache.org>
> > wrote:
> > >>>
> > >>> > Sounds good Jason. Let's finalize this in the hangout.
> > >>> > Do you have the expected plans for the failing tests? If so can you
> > >>> update
> > >>> > those and put in a pull request and we'll merge and run the tests.
> > >>> > Any reason for the operator test framework to be punted? You have a
> > +1
> > >>> to
> > >>> > merge it.
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Mon, Mar 7, 2016 at 9:33 PM, Khurram Faraaz <
> kfar...@maprtech.com
> > >
> > >>> > wrote:
> > >>> >
> > >>> > > We should update the expected results (i.e. the expected query
> plan
> > >>> in
> > >>> > this
> > >>> > > case) and not mark them as Failing. We do not have a Failing test
> > >>> > directory
> > >

[jira] [Resolved] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4482.

Resolution: Fixed
  Assignee: Jason Altekruse  (was: Stefán Baxter)

Fixed in 64ab0a8ec9d98bf96f4d69274dddc180b8efe263

> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [GitHub] drill pull request: DRILL-4482: Avro subselection broken by 4382

2016-03-09 Thread Jason Altekruse
/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[119,2]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[119,4]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[119,6]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[120]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[120,3]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[120,6]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[120,14]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[120,48]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[125,4]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[125,11]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[125,25]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[125,35]
> error: not a statement
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[125,46]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[131,21]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[131,31]
> error: not a statement
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[131,42]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[135,4]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[135,29]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[139,4]
> error: illegal start of expression
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[139,54]
> error: ';' expected
> [ERROR]
>
> /var/as/drill/exec/java-exec/src/test/java/org/apache/drill/exec/store/avro/AvroTestUtil.java:[702,1]
> error: reached end of file while parsing
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :drill-java-exec
>
>
> On Wed, Mar 9, 2016 at 1:45 AM, StevenMPhillips <g...@git.apache.org>
> wrote:
>
> > Github user StevenMPhillips commented on the pull request:
> >
> > https://github.com/apache/drill/pull/419#issuecomment-194060258
> >
> > +1
> >
> >
> > ---
> > If your project is set up for it, you can reply to this email and have
> your
> > reply appear on GitHub as well. If your project does not have this
> feature
> > enabled and wishes so, or if the feature is enabled but not working,
> please
> > contact infrastructure at infrastruct...@apache.org or file a JIRA
> ticket
> > with INFRA.
> > ---
> >
>



-- 
Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer


[jira] [Created] (DRILL-4492) TestMergeJoinWithSchemaChanges depends on order files in a directory are read to pass, should be refactored

2016-03-08 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4492:
--

 Summary: TestMergeJoinWithSchemaChanges depends on order files in 
a directory are read to pass, should be refactored
 Key: DRILL-4492
 URL: https://issues.apache.org/jira/browse/DRILL-4492
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse
Assignee: amit hadke


I was running unit tests and saw a failure that seemed unrelated to the changes 
I was making. The test runs fine in isolation both from IntelliJ and the maven 
command line (with -Dtest=TestMergeJoinWithSchemaChanges in the java-exec 
module).

Not sure what about the particular test run made it change the order the files 
were read, but we cannot rely on any particular system to read the files in a 
given order. The test should be updated to remove this assumption.

This is the error I received on one run of the full unit tests:
{code}
testMissingAndNewColumns(TestMergeJoinWithSchemaChanges.java:265)
Caused by: org.apache.drill.common.exceptions.UserRemoteException: 
UNSUPPORTED_OPERATION ERROR: Sort doesn't currently supportsorts with 
changing schemas

Fragment 0:0

[Error Id: bf84bffb-f643-493b-9ed5-720eb18d55f2 on 10.1.10.225:31010]

  (org.apache.drill.exec.exception.SchemaChangeException) Sort currently only 
supports a single schema.
org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder.build():146
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():442
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.RecordIterator.nextBatch():97
org.apache.drill.exec.record.RecordIterator.next():183
org.apache.drill.exec.record.RecordIterator.prepare():167
org.apache.drill.exec.physical.impl.join.JoinStatus.prepare():87
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext():162
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():162

{code}
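
A minimal sketch of one way to drop the ordering assumption described above: sort the directory listing explicitly before using it, so the result no longer depends on the order the file system returns. The class and method names are hypothetical and are not Drill APIs; the actual refactoring of TestMergeJoinWithSchemaChanges may well take a different approach, for example making the test insensitive to which file is read first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Drill.
class SortedListing {

  // Returns the names of the regular files in dir, sorted by name, so that
  // callers never depend on the order the file system reports them in.
  static List<String> listSortedNames(Path dir) {
    List<String> names = new ArrayList<>();
    try (Stream<Path> entries = Files.list(dir)) {
      entries.filter(Files::isRegularFile)
             .forEach(p -> names.add(p.getFileName().toString()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Collections.sort(names);
    return names;
  }

  // Creates a scratch directory, writes files in a non-sorted order and
  // returns the sorted listing. The directory is left behind for brevity.
  static List<String> demo() {
    try {
      Path dir = Files.createTempDirectory("schema-change-sketch");
      for (String name : new String[] {"c.json", "a.json", "b.json"}) {
        Files.createFile(dir.resolve(name));
      }
      return listSortedNames(dir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Sorting by name is only one possible policy; the point is that the order is chosen by the code rather than inherited from the environment.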



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Drill Hangout Today?

2016-03-08 Thread Jason Altekruse
Meeting notes - Hangout 3/8/2016

Parth, Aman, John O., Jason, Zelaine


- release

    - 4474 - incorrect direct scans, MapR will run regression tests
    - Avro issues, one is ready, had a +1, there is just an update for a
      related issue
    - the other one Jason was having an issue with repro


- Union type

    - Aman - what is needed to make union type complete
    - Steven could give the best answer for known shortcomings
    - Jason - I think mostly we just need more thorough testing
    - Aman - we need to go through all of the operators to update for
      schema change
    - Jason - Union type and handling schema change are really
      two separate issues, but we should discuss a path forward
      to making both work well in Drill
    - John - handling messy JSON better would be really useful for users
    - doesn't need to fix everything automatically, but giving a user
      info about why something is failing and what the next step toward
      further analysis is would be useful


- User experience - John O.

    - partial JSON records
    - JSON inconsistent errors, sometimes not giving line numbers
    - Feature request, show the record that failed to parse
    - Just give users enough info so that they know what to fix
    - Drill can try to have good defaults about how to handle
      ambiguities, but making a user choose anytime Drill cannot
      be sure is fine
    - metadata cache issues
    - permissions issues with authentication

On Tue, Mar 8, 2016 at 10:18 AM, Parth Chandra <par...@apache.org> wrote:

> Joining in a minute
>
> On Tue, Mar 8, 2016 at 10:17 AM, Jason Altekruse <altekruseja...@gmail.com
> >
> wrote:
>
> > For anyone else interested in joining the hangout here is the link.
> >
> > https://plus.google.com/hangouts/_/dremio.com/drillhangout?authuser=1
> >
> > On Tue, Mar 8, 2016 at 10:15 AM, Jason Altekruse <
> altekruseja...@gmail.com
> > >
> > wrote:
> >
> > > Yes, sorry I forgot to sign on.
> > >
> > > Can you try to join again?
> > >
> > > On Tue, Mar 8, 2016 at 10:10 AM, Zelaine Fong <zf...@maprtech.com>
> > wrote:
> > >
> > >> Are we having one today?  We're trying to connect from the MapR end,
> but
> > >> not getting a response.
> > >>
> > >> -- Zelaine
> > >>
> > >
> > >
> >
>


Re: Drill Hangout Today?

2016-03-08 Thread Jason Altekruse
For anyone else interested in joining the hangout here is the link.

https://plus.google.com/hangouts/_/dremio.com/drillhangout?authuser=1

On Tue, Mar 8, 2016 at 10:15 AM, Jason Altekruse <altekruseja...@gmail.com>
wrote:

> Yes, sorry I forgot to sign on.
>
> Can you try to join again?
>
> On Tue, Mar 8, 2016 at 10:10 AM, Zelaine Fong <zf...@maprtech.com> wrote:
>
>> Are we having one today?  We're trying to connect from the MapR end, but
>> not getting a response.
>>
>> -- Zelaine
>>
>
>


Re: Drill Hangout Today?

2016-03-08 Thread Jason Altekruse
Yes, sorry I forgot to sign on.

Can you try to join again?

On Tue, Mar 8, 2016 at 10:10 AM, Zelaine Fong wrote:

> Are we having one today?  We're trying to connect from the MapR end, but
> not getting a response.
>
> -- Zelaine
>


[jira] [Resolved] (DRILL-4332) tests in TestFrameworkTest fail in Java 8

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4332.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.6.0

Fixed in 447b093cd2b05bfeae001844a7e3573935e84389

> tests in TestFrameworkTest fail in Java 8
> -
>
> Key: DRILL-4332
> URL: https://issues.apache.org/jira/browse/DRILL-4332
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Laurent Goujon
> Fix For: 1.6.0
>
>
> the following unit tests fail in Java 8:
> {noformat}
> TestFrameworkTest.testRepeatedColumnMatching
> TestFrameworkTest.testCSVVerificationOfOrder_checkFailure
> {noformat}
> The tests expect the query to fail with a specific error message. The message 
> generated by DrillTestWrapper.compareMergedVectors assumes a specific order 
> in a map keySet (which we shouldn't). In Java 8 it seems the order changed 
> which causes a slightly different error message
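
As an illustration of the keySet issue described above, here is a minimal sketch, assuming the message is built by iterating over map keys. HashMap makes no guarantee about iteration order, so copying the keys into a sorted structure before formatting keeps the message identical across Java versions. The class and method names are hypothetical; this is not the actual DrillTestWrapper code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not the actual DrillTestWrapper code.
class StableKeyOrder {

  // Formats the keys in sorted order, regardless of which Map implementation
  // was passed in or which order it happens to iterate in.
  static String describeColumns(Map<String, ?> columns) {
    return "columns: " + new TreeMap<String, Object>(columns).keySet();
  }

  public static void main(String[] args) {
    Map<String, Integer> columns = new HashMap<>();
    columns.put("b", 2);
    columns.put("c", 3);
    columns.put("a", 1);
    System.out.println(describeColumns(columns)); // columns: [a, b, c]
  }
}
```

A LinkedHashMap would be an alternative when insertion order, rather than sorted order, is the one the message should reflect.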





[jira] [Resolved] (DRILL-4486) Expression serializer incorrectly serializes escaped characters

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4486.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 80316f3f8bef866720f99e609fe758ec8e0c4612

> Expression serializer incorrectly serializes escaped characters
> ---
>
> Key: DRILL-4486
> URL: https://issues.apache.org/jira/browse/DRILL-4486
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.6.0
>
>
> the drill expression parser requires backslashes to be escaped. But the 
> ExpressionStringBuilder is not properly escaping them. This causes problems, 
> especially in the case of regex expressions run with parallel execution.
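
A minimal sketch of the escaping problem described above, assuming a parser that treats backslash as an escape character. If the serializer writes backslashes unescaped, the text read back on the other side has lost them, which is what breaks a regex. The class below is hypothetical and only handles backslashes; it is not the actual ExpressionStringBuilder code.

```java
// Hypothetical sketch, not the actual ExpressionStringBuilder code.
class BackslashEscaping {

  // Doubles every backslash so that a parser which treats backslash as an
  // escape character reads the original text back.
  static String escape(String raw) {
    return raw.replace("\\", "\\\\");
  }

  // Reverses escape(): a backslash makes the following character literal.
  static String unescape(String escaped) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < escaped.length(); i++) {
      char c = escaped.charAt(i);
      if (c == '\\' && i + 1 < escaped.length()) {
        c = escaped.charAt(++i);
      }
      out.append(c);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String regex = "\\d+\\.csv";   // the text \d+\.csv
    String wire = escape(regex);   // the text \\d+\\.csv

    // Escaped before serialization: the round trip preserves the regex.
    System.out.println(unescape(wire).equals(regex));  // true

    // Serialized without escaping: the backslashes are lost on parsing.
    System.out.println(unescape(regex).equals(regex)); // false
  }
}
```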





[jira] [Resolved] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4375.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 1f29914fc5c7d1e36651ac28167804c4012501fe

> Fix the maven release profile, broken by jdbc jar size enforcer added in 
> DRILL-4291
> ---
>
> Key: DRILL-4375
> URL: https://issues.apache.org/jira/browse/DRILL-4375
> Project: Apache Drill
>  Issue Type: Bug
>    Reporter: Jason Altekruse
>    Assignee: Jason Altekruse
> Fix For: 1.6.0
>
>






Re: UnrecognizedPropertyException: Unrecognized field "config" (class org.apache.drill.exec.store.dfs.FileSystemConfig), not marked as ignorable (4 known properties: "enabled", "formats", "connection"

2016-03-08 Thread Jason Altekruse
This exception should only occur if you start an older version of Drill
using a configuration (stored in zookeeper or your local temp directory)
that was created by starting a version of Drill after 4383 was merged
(0842851c854595f140779e9ed09331dbb63f6623).

This change added a new property to filesystem configuration to allow
passing custom options to the filesystem config. This can be used in place
of core-site.xml to set things like your AWS private keys, as well as any
other properties normally provided to an implementation of the Hadoop
FileSystem API.
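
To illustrate why an older build trips over a configuration written by a newer one, here is a simplified model in plain Java. It is neither Jackson nor Drill code; it only assumes that the older build knows the four properties listed in the error message and rejects anything else, while a build that includes the new property accepts it. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model only; this is neither Jackson nor Drill code.
class StrictConfigSketch {

  // Properties the older build knows about, taken from the error message.
  static final Set<String> KNOWN_BEFORE_4383 =
      new HashSet<>(Arrays.asList("enabled", "formats", "connection", "workspaces"));

  // Returns the first stored field the given build does not know about,
  // or null if every field is recognized.
  static String firstUnknownField(Set<String> known, Iterable<String> storedFields) {
    for (String field : storedFields) {
      if (!known.contains(field)) {
        return field;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Field names of a configuration stored by a build that has the new property.
    List<String> stored =
        Arrays.asList("enabled", "connection", "config", "workspaces", "formats");

    // The older build rejects the stored configuration.
    System.out.println(firstUnknownField(KNOWN_BEFORE_4383, stored)); // config

    // A build that knows the new property accepts it.
    Set<String> knownAfter = new HashSet<>(KNOWN_BEFORE_4383);
    knownAfter.add("config");
    System.out.println(firstUnknownField(knownAfter, stored)); // null
  }
}
```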

Removing the new configuration should allow it to start up, but you
shouldn't be seeing this if you are running the build you mentioned. Can
you verify that this version successfully built and that you are not
running an older version?

- Jason

P.S. I will be trying to get in a change soon that gives a better error in
this case. It should only happen with downgrades, which we generally don't
thoroughly test, but it would still be good to fix. I'm sure there are several
bugs filed about these kinds of issues; this is one of them, and I've
assigned it to myself, hoping to post a fix soon.

https://issues.apache.org/jira/browse/DRILL-2048


On Tue, Mar 8, 2016 at 2:33 AM, Khurram Faraaz wrote:

> Hi All,
>
> I am seeing an exception on Drill 1.6.0 commit ID 447b093c (I am using the
> RPM).
>
> I did not see this exception on an earlier version of Drill 1.6.0, commit ID
> 6d5f4983.
>
> Could this be related to DRILL-4383?
>
> Drill version where we see the Exception is
>
> git.commit.id=447b093cd2b05bfeae001844a7e3573935e84389
> git.commit.message.short=DRILL-4332\: Makes vector comparison order stable
> in test framework
>
> oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> UnrecognizedPropertyException: Unrecognized field "config" (class
> org.apache.drill.exec.store.dfs.FileSystemConfig), not marked as ignorable
> (4 known properties: "enabled", "formats", "connection", "workspaces"])
>  at [Source: [B@2b88d9b2; line: 5, column: 18] (through reference chain:
> org.apache.drill.exec.store.dfs.FileSystemConfig["config"])
>
>
> [Error Id: 7fdc89ac-91ac-46eb-8201-8fe5e1acf278 on centos-02.qa.lab:31010]
> at
>
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
> at
>
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
> at
>
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
> at
>
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
> at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
> at oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
> at
>
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
> at
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
> at
>
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
> at
>
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
> at
>
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
> at
>
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
>
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
>
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
>
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
>
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
>
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> 

Re: Time for the 1.6 Release

2016-03-07 Thread Jason Altekruse
4474 is actually revealing a few invalid tests in the Regression suite that
test for the current incorrect plans. The fix should be included in the
release, but I will post a PR on the regression suite to update the tests
before I push it.

On Mon, Mar 7, 2016 at 4:44 PM, Steven Phillips <ste...@dremio.com> wrote:

> DRILL-4486 is a pretty simple fix. Without it, currently some regex queries
> will fail.
>
> I think we should include it in the release.
>
>
> https://github.com/apache/drill/pull/412
>
> On Mon, Mar 7, 2016 at 2:15 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
> > There is a small test issue with some of the refactoring that accompanied
> > the operator unit tests. These don't change any user-facing behavior, so
> I
> > don't think it really needs to get into the release. I will be working to
> > merge them into master after we cut the release branch.
> >
> > The change to update the avatica JDBC driver version also does not make
> any
> > behavior changes, so I think it also makes sense to keep it off the
> release
> > branch.
> >
> > I will be merging the fix for 4375 the maven release profile, 4474 wrong
> > results with incorrect creation of DirectScan and 4332 fixing a unit test
> > to work in JDK 8, after another test run.
> >
> > On Mon, Mar 7, 2016 at 1:53 PM, Venki Korukanti <
> venki.koruka...@gmail.com
> > >
> > wrote:
> >
> > > WebUI profile issue: this is a regression caused by refactoring of
> Calcite
> > > integration code (DRILL-4465) which sets the text plan only if debug is
> > > enabled. Will submit a patch soon.
> > >
> > > On Mon, Mar 7, 2016 at 1:29 PM, Sudheesh Katkam <skat...@maprtech.com>
> > > wrote:
> > >
> > > > Thanks for clarifying Jacques.
> > > >
> > > > I haven’t looked into the fix for DRILL-4384; I reopened it because
> the
> > > > description mentioned “visualized plan” section is (also) empty.
> > > >
> > > > Thank you,
> > > > Sudheesh
> > > >
> > > > > On Mar 7, 2016, at 1:08 PM, Jacques Nadeau <jacq...@dremio.com>
> > wrote:
> > > > >
> > > > > The new bug (currently filed under DRILL-4384) is a completely
> > > different
> > > > > bug than the original (original one has to do with profile metrics,
> > > this
> > > > > has to do with plan text). I'll try to look at it tonight if no one can
> > get
> > > > to
> > > > > it sooner.
> > > > >
> > > > >
> > > > > --
> > > > > Jacques Nadeau
> > > > > CTO and Co-Founder, Dremio
> > > > >
> > > > > On Mon, Mar 7, 2016 at 12:37 PM, Parth Chandra <
> > pchan...@maprtech.com>
> > > > > wrote:
> > > > >
> > > > >> DRILL-4384 is a blocker for the release though
> > > > >>
> > > > >> On Mon, Mar 7, 2016 at 12:01 PM, Sudheesh Katkam <
> > > skat...@maprtech.com>
> > > > >> wrote:
> > > > >>
> > > > >>> I reopened DRILL-4384 <
> > > > https://issues.apache.org/jira/browse/DRILL-4384>
> > > > >>> (blocker); it is assigned to Jacques.
> > > > >>>
> > > > >>> On the latest master, the visualized and physical plan tabs on
> web
> > UI
> > > > are
> > > > >>> empty.
> > > > >>>
> > > > >>> Thank you,
> > > > >>> Sudheesh
> > > > >>>
> > > > >>>> On Mar 7, 2016, at 11:39 AM, Jason Altekruse <
> > > > altekruseja...@gmail.com
> > > > >>>
> > > > >>> wrote:
> > > > >>>>
> > > > >>>> I don't know if there are any specific time constraints for
> > getting
> > > > out
> > > > >>> the
> > > > >>>> release, but I'm inclined to go with Vicky on DRILL-4477, at
> least
> > > > some
> > > > >>>> investigation into the scope of a fix would be good. I think
> it's
> > > > >>>> a reasonably big problem whether it's a regression or not.
> > > > >>>>
> > > > >>>> On Mon, Mar 7, 2016 at 11:35 AM, Zelaine Fong <
> zf...@maprtech.com
> > >
> > > > >>> wrote:
>

Re: Time for the 1.6 Release

2016-03-07 Thread Jason Altekruse
There is a small test issue with some of the refactoring that accompanied
the operator unit tests. These changes don't alter any user-facing behavior,
so I don't think they really need to get into the release. I will be working
to merge them into master after we cut the release branch.

The change to update the avatica JDBC driver version also does not make any
behavior changes, so I think it also makes sense to keep it off the release
branch.

I will be merging the fixes for 4375 (the maven release profile), 4474 (wrong
results with incorrect creation of DirectScan) and 4332 (fixing a unit test
to work in JDK 8) after another test run.

On Mon, Mar 7, 2016 at 1:53 PM, Venki Korukanti <venki.koruka...@gmail.com>
wrote:

> WebUI profile issue: this is a regression caused by refactoring of Calcite
> integration code (DRILL-4465) which sets the text plan only if debug is
> enabled. Will submit a patch soon.
>
> On Mon, Mar 7, 2016 at 1:29 PM, Sudheesh Katkam <skat...@maprtech.com>
> wrote:
>
> > Thanks for clarifying Jacques.
> >
> > I haven’t looked into the fix for DRILL-4384; I reopened it because the
> > description mentioned “visualized plan” section is (also) empty.
> >
> > Thank you,
> > Sudheesh
> >
> > > On Mar 7, 2016, at 1:08 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > >
> > > The new bug (currently filed under DRILL-4384) is a completely
> different
> > > bug than the original (original one has to do with profile metrics,
> this
> > > has to do with plan text). I'll try to look at it tonight if no one can get
> > to
> > > it sooner.
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Mar 7, 2016 at 12:37 PM, Parth Chandra <pchan...@maprtech.com>
> > > wrote:
> > >
> > >> DRILL-4384 is a blocker for the release though
> > >>
> > >> On Mon, Mar 7, 2016 at 12:01 PM, Sudheesh Katkam <
> skat...@maprtech.com>
> > >> wrote:
> > >>
> > >>> I reopened DRILL-4384 <
> > https://issues.apache.org/jira/browse/DRILL-4384>
> > >>> (blocker); it is assigned to Jacques.
> > >>>
> > >>> On the latest master, the visualized and physical plan tabs on web UI
> > are
> > >>> empty.
> > >>>
> > >>> Thank you,
> > >>> Sudheesh
> > >>>
> > >>>> On Mar 7, 2016, at 11:39 AM, Jason Altekruse <
> > altekruseja...@gmail.com
> > >>>
> > >>> wrote:
> > >>>>
> > >>>> I don't know if there are any specific time constraints for getting
> > out
> > >>> the
> > >>>> release, but I'm inclined to go with Vicky on DRILL-4477, at least
> > some
> > >>>> investigation into the scope of a fix would be good. I think it's
> > >>>> a reasonably big problem whether it's a regression or not.
> > >>>>
> > >>>> On Mon, Mar 7, 2016 at 11:35 AM, Zelaine Fong <zf...@maprtech.com>
> > >>> wrote:
> > >>>>
> > >>>>> Hakim,
> > >>>>>
> > >>>>> Yes, we'll include this in the release.
> > >>>>>
> > >>>>> -- Zelaine
> > >>>>>
> > >>>>> On Mon, Mar 7, 2016 at 9:31 AM, Abdel Hakim Deneche <
> > >>> adene...@maprtech.com
> > >>>>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> If we still have time, I would like to include DRILL-4457 [1],
> it's
> > a
> > >>>>> wrong
> > >>>>>> results issue, I already have a fix and it's passing all tests, I
> am
> > >>> just
> > >>>>>> waiting for a review [2]
> > >>>>>>
> > >>>>>>
> > >>>>>> [1] https://issues.apache.org/jira/browse/DRILL-4457
> > >>>>>> [2] https://github.com/apache/drill/pull/410
> > >>>>>>
> > >>>>>> On Mon, Mar 7, 2016 at 4:50 PM, Parth Chandra <par...@apache.org>
> > >>> wrote:
> > >>>>>>
> > >>>>>>> Hi guys,
> > >>>>>>>
> > >>>>>>> I'm still waiting for the following to be reviewed/merged by
> today.
> > >>>>>>>

Re: Time for the 1.6 Release

2016-03-07 Thread Jason Altekruse
> > DRILL-3688/pr 382 (skip.header.line.count in hive). -
> > Already
> > > > > > merged.
> > > > > > > > PR
> > > > > > > > > > needs to be closed.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 3, 2016 at 9:44 PM, Parth Chandra <
> > > > par...@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Right. My mistake. Thanks, Jacques, for reviewing.
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 3, 2016 at 9:08 PM, Zelaine Fong <
> > > > > zf...@maprtech.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> DRILL-4281/pr 400 (Drill should support inbound
> > > > impersonation)
> > > > > > > > > (Sudheesh
> > > > > > > > > > >> to
> > > > > > > > > > >> review)
> > > > > > > > > > >>
> > > > > > > > > > >> Sudheesh is the fixer of DRILL-4281, so I don't think
> he
> > > can
> > > > > be
> > > > > > > the
> > > > > > > > > > >> reviewer :).
> > > > > > > > > > >>
> > > > > > > > > > >> -- Zelaine
> > > > > > > > > > >>
> > > > > > > > > > >> On Thu, Mar 3, 2016 at 6:30 PM, Parth Chandra <
> > > > > > par...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Here's an updated list with names of reviewers
> added.
> > If
> > > > > > anyone
> > > > > > > > else
> > > > > > > > > > is
> > > > > > > > > > >> > reviewing the open PRs please let me know. Some PRs
> > have
> > > > > > owners
> > > > > > > > > names
> > > > > > > > > > >> that
> > > > > > > > > > >> > I will follow up with.
> > > > > > > > > > >> > Jason, I've included your JIRA in the list.
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> > Committed for 1.6 -
> > > > > > > > > > >> >
> > > > > > > > > > >> > DRILL-4384 - Query profile is missing important
> > > > information
> > > > > on
> > > > > > > > > WebUi -
> > > > > > > > > > >> > Merged
> > > > > > > > > > >> > DRILL-3488/pr 388 (Java 1.8 support) - Merged.
> > > > > > > > > > > >> > DRILL-4410/pr 380 (listvector should initialize
> > > bits...)
> > > > -
> > > > > > > Merged
> > > > > > > > > > >> > DRILL-4383/pr 375 (Allow custom configs for S3,
> > > Kerberos,
> > > > > > etc) -
> > > > > > > > > > Merged
> > > > > > > > > > >> > DRILL-4465/pr 401 (Simplify Calcite parsing &
> planning
> > > > > > > > integration)
> > > > > > > > > -
> > > > > > > > > > >> > Waiting to be merged
> > > > > > > > > > >> >
> > > > > > > > > > >> > DRILL-4281/pr 400 (Drill should support inbound
> > > > > impersonation)
> > > > > > > > > > >> (Sudheesh to
> > > > > > > > > > >> > review)
> > > > > > > > > > >> > DRILL-4372/pr 377(?) (Drill Operators and Functions
> > > should
> > > > > > > > correctly
> > > > > > > > > > >> expose
> > >

Re: Time for the 1.6 Release

2016-03-07 Thread Jason Altekruse
> > > > > > > > > >> On Thu, Mar 3, 2016 at 6:30 PM, Parth Chandra <
> > > > > par...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Here's an updated list with names of reviewers added.
> If
> > > > > anyone
> > > > > > > else
> > > > > > > > > is
> > > > > > > > > >> > reviewing the open PRs please let me know. Some PRs
> have
> > > > > owners
> > > > > > > > names
> > > > > > > > > >> that
> > > > > > > > > >> > I will follow up with.
> > > > > > > > > >> > Jason, I've included your JIRA in the list.
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > Committed for 1.6 -
> > > > > > > > > >> >
> > > > > > > > > >> > DRILL-4384 - Query profile is missing important
> > > information
> > > > on
> > > > > > > > WebUi -
> > > > > > > > > >> > Merged
> > > > > > > > > >> > DRILL-3488/pr 388 (Java 1.8 support) - Merged.
> > > > > > > > > > >> > DRILL-4410/pr 380 (listvector should initialize
> > bits...)
> > > -
> > > > > > Merged
> > > > > > > > > >> > DRILL-4383/pr 375 (Allow custom configs for S3,
> > Kerberos,
> > > > > etc) -
> > > > > > > > > Merged
> > > > > > > > > >> > DRILL-4465/pr 401 (Simplify Calcite parsing & planning
> > > > > > > integration)
> > > > > > > > -
> > > > > > > > > >> > Waiting to be merged
> > > > > > > > > >> >
> > > > > > > > > >> > DRILL-4281/pr 400 (Drill should support inbound
> > > > impersonation)
> > > > > > > > > >> (Sudheesh to
> > > > > > > > > >> > review)
> > > > > > > > > >> > DRILL-4372/pr 377(?) (Drill Operators and Functions
> > should
> > > > > > > correctly
> > > > > > > > > >> expose
> > > > > > > > > >> > their types within Calcite.) - Waiting for Aman to
> > review.
> > > > > > > (Owners:
> > > > > > > > > >> Hsuan,
> > > > > > > > > >> > Jinfeng, Aman, Sudheesh)
> > > > > > > > > >> > DRILL-4313/pr 396  (Improved client randomization.
> > Update
> > > > JIRA
> > > > > > > with
> > > > > > > > > >> > warnings about using the feature ) (Sudheesh to
> review.)
> > > > > > > > > >> > DRILL-4437 (and others)/pr 394 (Operator unit test
> > > > framework).
> > > > > > > > (Parth
> > > > > > > > > to
> > > > > > > > > >> > review)
> > > > > > > > > >> > DRILL-4449/pr 389 (Wrong results when metadata cache
> is
> > > > > used..)
> > > > > > > > (Aman
> > > > > > > > > to
> > > > > > > > > >> > review)
> > > > > > > > > >> > DRILL-4416/pr 385 (quote path separator) (Owner:
> Hanifi)
> > > > > > > > > >> > DRILL-4069/pr 352 Enable RPC thread offload by default
> > > > (Owner:
> > > > > > > > > Sudheesh)
> > > > > > > > > >> >
> > > > > > > > > >> > Need review -
> > > > > > > > > >> > DRILL-4375/pr 402 (Fix the maven release profile)
> > > > > > > > > >> > DRILL-4452/pr 395 (Update Avatica Driver to latest
> > > Calcite)
> > > > > > > > > >> > DRILL-4332/pr 389 (Make vector comparison order stable
> > in
> > > > test
> > > > > > > > > >

Re: Time for the 1.6 Release

2016-03-03 Thread Jason Altekruse
I have updated the PR for the parquet date corruption issue that didn't
make it into 1.5.

https://github.com/apache/drill/pull/341
https://issues.apache.org/jira/browse/DRILL-4203

If this can get reviewed, I think it would be good to get into the release.
Any takers?

On Wed, Mar 2, 2016 at 11:07 PM, Parth Chandra <par...@apache.org> wrote:

> I've summarized the list of JIRAs below.
> The first set of pull requests is under review (or have some reviewer
> assigned).
> The second set contains pull requests that need review. We need committers
> to review these. Please volunteer or these will not be able to make it into
> the release.
> The third set is JIRAs that do not have a patch and/or should not be
> included because they require deeper scrutiny.
> I'm hoping we can finalize the list of PRs that can be reviewed by Friday
> morning and possibly *finalize the list of issues to be included by Friday
> end of day* so please take some time to review the PRs.
> Also note that the QA team has offered to do sanity testing once we decide
> on the final commit to be included, before the release candidate is rolled
> out, which helps with the release candidate moving forward smoothly.
>
> Here's the list -
>
> *Committed for 1.6 -*
> DRILL-4281/pr 400 (Drill should support inbound impersonation)
> DRILL-4372/pr 377(?) (Drill Operators and Functions should correctly expose
> their types within Calcite.) - Waiting for Aman to review.
> DRILL-4313/pr 396  (Improved client randomization. Update JIRA with
> warnings about using the feature ) Sudheesh to review.
> DRILL-3488/pr 388 (Java 1.8 support) Hanifi to review
> DRILL-4437 (and others)/pr 394 (Operator unit test framework). Parth to
> review
> DRILL-4384 - Query profile is missing important information on WebUi -
> Marked as resolved. Patch not applied?
>
> *Need review -*
> DRILL-4465/pr 401 (Simplify Calcite parsing & planning integration)
> DRILL-4375/pr 402 (Fix the maven release profile)
> DRILL-4452/pr 395 (Update Avatica Driver to latest Calcite)
> DRILL-4332/pr 389 (Make vector comparison order stable in test framework)
> DRILL-4449/pr 389 (Wrong results when metadata cache is used..)
> DRILL-4416/pr 385 (quote path separator)
> DRILL-4411/pr 381 (hash join over-memory condition)
> DRILL-4410/pr 380 (listvector should initialize bits...)
> DRILL-4387/pr 379 (GroupScan should not use star column)
> DRILL-4383/pr 375 (Allow custom configs for S3, Kerberos, etc)
> DRILL-4184/pr 372 (support variable length decimal fields in parquet)
> DRILL-4069/pr 352 Enable RPC thread offload by default
> DRILL-4120 - dir0 does not work when the directory structure contains Avro
> files - Partial patch available.
>
> *Not included (yet) - *
> DRILL-3149 - No patch available
> DRILL-4441 - IN operator does not work with Avro reader - No patch
> available
> DRILL-3745/pr 399 - Hive char support - New feature - Needs QA - Not
> included in 1.6
> DRILL-3623 - Limit 0 should avoid execution when querying a known schema.
> (Need to add limitations of current impl). Intrusive change; should be
> included at beginning of release cycle.
>
> *Others -*
> DRILL-2517   - Already resolved.
> DRILL-3688/pr 382 (skip.header.line.count in hive). - Already merged. PR
> needs to be closed.
>
>
>
> On Wed, Mar 2, 2016 at 3:11 PM, Vicky Markman <vmark...@maprtech.com>
> wrote:
>
> > You are welcome, Jacques.
> >
> > Vick*y *:)
> >
> > On Wed, Mar 2, 2016 at 3:06 PM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> >
> > > I just realized that we didn't merge the broken profile patch (thanks
> > > Vicki). We should get it merged as well.
> > >
> > > DRILL-4384
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Wed, Mar 2, 2016 at 10:46 AM, Jason Altekruse <
> > altekruseja...@gmail.com
> > > >
> > > wrote:
> > >
> > > > I should have merged this sooner but we will need this patch that I
> had
> > > > applied to the 1.5 release branch. The change is small and fixes a
> > build
> > > > problem that only appears when running the maven release profile.
> > > >
> > > > https://github.com/apache/drill/pull/402
> > > >
> > > > On Wed, Mar 2, 2016 at 9:28 AM, Jinfeng Ni <jinfengn...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi John,
> > > > >
> > > > > I think patch for DRILL-2517 has been merged to the apache master
> > > > > branch. Have you tried your query on the latest master branch?
> > > > &g

[jira] [Created] (DRILL-4471) Add unit test for the Drill Web UI

2016-03-03 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4471:
--

 Summary: Add unit test for the Drill Web UI
 Key: DRILL-4471
 URL: https://issues.apache.org/jira/browse/DRILL-4471
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse


While the Web UI isn't being very actively developed, a few times changes to 
the Drill build or internal parts of the server have broken parts of the Web UI.

As the web UI is a primary interface for viewing cluster information, 
cancelling queries, configuring storage and other tasks, we really should add 
automated tests for it.





Re: Apache drill on Android devices!

2016-03-03 Thread Jason Altekruse
No one has tried to do this yet. The first question I would investigate is
whether Android supports Java direct memory, which Drill makes extensive use of,
but is not considered a standard feature in all implementations of the JVM.
A quick glance at the docs seems to indicate that it is supported (the
allocateDirect() method is what we are interested in here) [1]. I am not
aware of all of the differences between Android and standard Java so there
may be other major hurdles. It looks like Android now has Java 7 support,
which is the version that we currently develop Drill against.
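
A quick way to check the direct memory question on a given runtime is to try a small allocation. The sketch below uses only java.nio.ByteBuffer from the standard library; the class and method names are hypothetical, and a successful run says nothing about whether Drill's allocator or its other dependencies would work on that runtime.

```java
import java.nio.ByteBuffer;

// Hypothetical probe, not part of Drill.
class DirectMemoryCheck {

  // Tries to allocate a direct buffer of the given size (at least 8 bytes)
  // and to write and read one value back. Returns false if the runtime
  // refuses the allocation.
  static boolean canUseDirectMemory(int bytes) {
    try {
      ByteBuffer buf = ByteBuffer.allocateDirect(bytes);
      buf.putLong(0, 42L);
      return buf.isDirect() && buf.getLong(0) == 42L;
    } catch (OutOfMemoryError | UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("direct memory usable: " + canUseDirectMemory(1024));
  }
}
```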

You can certainly give it a shot, but I doubt it will just work. Drill has
a lot of dependencies, which I believe you would also have to re-compile.
That could end up being quite a task itself. If you decide to try it feel
free to ask questions here and we'll try to help out the best we can.

[1] - http://developer.android.com/reference/java/nio/ByteBuffer.html
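
As a rough illustration of the check described above, a small probe like the
following could be run on the target runtime. It is only a sketch: the class
and method names are made up for this example, and it says nothing about the
other hurdles (dependencies, recompilation) mentioned in this message. The only
library calls used are ByteBuffer.allocateDirect() and ByteBuffer.isDirect().

```java
import java.nio.ByteBuffer;

// Hypothetical probe class; not part of Drill.
final class DirectMemoryCheck {

    /** Tries to allocate an off-heap buffer and verifies it can be written and read back. */
    static boolean supportsDirectMemory(int capacityBytes) {
        try {
            ByteBuffer buffer = ByteBuffer.allocateDirect(capacityBytes);
            buffer.putLong(0, 42L);
            // isDirect() reports whether the runtime really handed back a direct buffer.
            return buffer.isDirect() && buffer.getLong(0) == 42L;
        } catch (OutOfMemoryError | UnsupportedOperationException e) {
            // Treat a failed or unsupported allocation as "no direct memory available".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("direct memory usable: " + supportsDirectMemory(1024 * 1024));
    }
}
```

A passing probe only shows that small direct allocations work; Drill's
allocator asks for far more memory than this.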

- Jason

On Thu, Mar 3, 2016 at 6:16 AM, Sandeep Choudhary <
schoudh...@gofalconsmart.com> wrote:

> Dear Team,
>
> I am looking to run Apache Drill on Android (Linux + Java based)
> devices. These devices have 4-8 cores, 2-4 GB of RAM, and storage speed
> close to SSD speed!
>
> Is there any way to compile or run it?
>
> I think it would also be an advantage for Apache Drill; many NoSQL
> providers have started supporting this, but they are not good enough.
>
> Looking for a positive response.
>
> Best,
> Sandeep Choudhary
>


Re: Time for the 1.6 Release

2016-03-02 Thread Jason Altekruse
I should have merged this sooner but we will need this patch that I had
applied to the 1.5 release branch. The change is small and fixes a build
problem that only appears when running the maven release profile.

https://github.com/apache/drill/pull/402

On Wed, Mar 2, 2016 at 9:28 AM, Jinfeng Ni wrote:

> Hi John,
>
> I think patch for DRILL-2517 has been merged to the apache master
> branch. Have you tried your query on the latest master branch?
>
> In DRILL-2517, I posted some performance number for 117k small parquet
> files. The patch did show improvement.
>
> Before DRILL-3996 is resolved, for now if your query relies on filter
> pushdown logic to push partitioning filter first, then the patch for
> DRILL-2517 will not help.
>
>
>
>
> On Wed, Mar 2, 2016 at 4:23 AM, John Omernik wrote:
> > I'd like to request DRILL-2517 be added as a band-aid for the planning
> > issues when there are lots of directories of Parquet files. This issue is
> > really hurting Drill adoption for me in my org.
> >
> > Thanks,  John
> >
> >
> >
> > On Tuesday, March 1, 2016, Edmon Begoli wrote:
> >
> >> May I please ask to give this issue the attention for 1.6:
> >> https://issues.apache.org/jira/plugins/servlet/mobile#issue/DRILL-3149
> >>
> >> I will try to suggest a patch. Given my time constraints I might not be
> >> able to submit complete, unit-tested code, but at least I will try to
> >> submit a snippet that will help with fixing it up (I think we just need to
> >> do a look-ahead byte lookup to ensure that it is not \r\n).
> >>
> >> On Tuesday, March 1, 2016, Jacques Nadeau wrote:
> >>
> >> > It seems like a stretch to include DRILL-3623 right before the release.
> >> > This is a pretty fundamental change that seems like it should soak for a
> >> > bit of time before we release. If we want to include it, I'd suggest that
> >> > we disable the functionality by default and consider it experimental.
> >> >
> >> > I'll propose a few other patches for inclusion shortly.
> >> >
> >> > --
> >> > Jacques Nadeau
> >> > CTO and Co-Founder, Dremio
> >> >
> >> > On Tue, Mar 1, 2016 at 5:04 PM, Parth Chandra wrote:
> >> >
> >> > > Hello everyone,
> >> > >
> >> > >   It's time to start looking into the 1.6 release.
> >> > >
> >> > >   Can all the folks working on open issues let me know if there are any
> >> > > JIRAs you would like to get into the release?
> >> > >
> >> > >   I know of the following -
> >> > >
> >> > > *DRILL-4281* - Drill should support inbound impersonation. Pull request
> >> > > expected today. C++ client to be tested.
> >> > >
> >> > > *DRILL-4372* - Drill Operators and Functions should correctly expose
> >> > > their types within Calcite. Waiting for review.
> >> > >
> >> > > *DRILL-3623* - Limit 0 should avoid execution when querying a known
> >> > > schema. Pull request expected today. Need to add limitations of current
> >> > > impl to the JIRA. Review needed.
> >> > >
> >> > > *DRILL-4313* - Improved client randomization. Update JIRA with warnings
> >> > > about using the feature. Waiting for review.
> >> > >
> >> > >
> >> > > Thanks
> >> > >
> >> > > Parth
> >> > >
> >> >
> >>
> >
> >
> > --
> > Sent from my iThing
>
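
For the line-ending issue Edmon raises in the quoted thread above, the
look-ahead idea could be sketched roughly as follows. This is only an
illustration in plain Java, assuming the goal is to treat \r\n as a single
line terminator; LineSplitter and splitLines are made-up names and this is not
Drill's text reader code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Drill.
final class LineSplitter {

    /** Splits raw bytes into lines, treating \n, \r, and \r\n each as one terminator. */
    static List<String> splitLines(byte[] data) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                lines.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                // Look ahead one byte: if \r is followed by \n, consume both as one line ending.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                start = i + 1;
            }
        }
        // Emit a trailing line that has no terminator.
        if (start < data.length) {
            lines.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] input = "a\r\nb\nc\rd".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitLines(input)); // [a, b, c, d]
    }
}
```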


On improving project maintenance

2016-03-01 Thread Jason Altekruse
Hello devs,

I think everyone has noticed that there are some parts of project
maintenance that have been lagging in the past few months.

The good news is that the list has been really active, and I think that we
really have been trying to get back to everyone. Despite a lot of responses
from a lot of committers and contributors, there have been at least a fair
number of threads that received no response, or didn't lead to a resolution
of the issue for the user.

On GitHub there are currently 80ish open pull requests. While some are
abandoned or replaced by other work, there are a number of good
contributions that are waiting for review.

I don't have some magical prescription about how to solve this, but one
small change we could make would be to revive this document [1] for
designating a primary list manager for each week. This role does not have
to be terribly burdensome, or even require a committer to fulfill it. Many
of the questions on the list simply need to be marshalled into a JIRA with
enough info for a reproduction of the bug, or in other cases just require a
pointer to a doc page or existing JIRA on the thread to answer a question.

As far as the outstanding reviews are concerned, it might make sense for
the list manager to also try to make sure that contributions have an
assigned reviewer when they are posted.

Thoughts? Does it make sense to try to get something like this going? Is
there something that made this effort fade away the last time we tried it
that we should change?

[1]  -
https://docs.google.com/spreadsheets/d/1bEQKk16Kktb1XeZwKD8xCuhaO8FtNfF1Cr2rcTv1a6M


Re: Avro support in Drill - Missing support for the IN operator and other frustrating things

2016-03-01 Thread Jason Altekruse
Hey Stefan,

It is possible that this is the case. A quick look at the code seems to
indicate that the Avro reader is not overriding the default behavior of
determining approximate row count of files. I believe there is still a
small issue with the code handling tiny files. Are the files you are
dealing with at least a few megabytes?

Can you see how many minor fragments are listed under the scan operation in
the query profile? If there are multiple fragments then the scan is
parallelized.

- Jason

On Mon, Feb 29, 2016 at 1:58 PM, Stefán Baxter <ste...@activitystream.com>
wrote:

> Hi Jason,
>
> Is it possible that the Avro plugin does not use any parallelism and that
> all the target files are scanned sequentially by the same process?  (1.5)
>
> - Stefán
>
> On Fri, Feb 26, 2016 at 8:04 PM, Stefán Baxter <ste...@activitystream.com>
> wrote:
>
> > Thank you Jason.
> >
> > I do realize that this is an OS project and that everyone is doing their
> > best.
> >
> > There are just a few things I wish I had realized before switching over
> > from JSON to Avro that  have caused us a lot of problems and taken a long
> > time.
> >
> > Your work is appreciated and I apologize for letting my frustration get
> > the better of me.
> >
> > - Stefán
> >
> > On Fri, Feb 26, 2016 at 8:00 PM, Jason Altekruse
> > <altekruseja...@gmail.com> wrote:
> >
> >> Stefan,
> >>
> >> I'm sorry that we have not been better about getting back to the issues
> >> you have filed against the Avro reader. We do appreciate all of the effort
> >> you have put into filing thorough bugs and being active in the discussions
> >> on the list. I have responded on the bug you filed on this issue [1] with a
> >> workaround and will be posting a patch shortly with a fix.
> >>
> >> - Jason
> >>
> >> [1] - https://issues.apache.org/jira/browse/DRILL-4441
> >>
> >> On Thu, Feb 25, 2016 at 12:29 PM, Stefán Baxter <
> >> ste...@activitystream.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > This query targets Avro files in the latest 1.5 release:
> >> >
> >> > 0: jdbc:drill:zk=local> select count(*) from
> >> > dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to =
> >> > 'Customer/4-2492847';
> >> > +---------+
> >> > | EXPR$0  |
> >> > +---------+
> >> > | 5788    |
> >> > +---------+
> >> >
> >> > 0: jdbc:drill:zk=local> select count(*) from
> >> > dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to IN
> >> > ('Customer/4-2492847');
> >> > +---------+
> >> > | EXPR$0  |
> >> > +---------+
> >> > | 0       |
> >> > +---------+
> >> >
> >> > It shows that the IN operator does not work with Avro (works with
> >> > Parquet).
> >> >
> >> > This finally tips us over. We have invested hundreds of hours moving all
> >> > streaming/fresh data from JSON to Avro but the Avro part of Drill is
> >> > broken in too many ways to recommend its use to anyone.
> >> >
> >> > Attempts to report Avro errors and shortcomings, like the missing
> >> > support for dirX, have had no results.
> >> >
> >> > I think it would be prudent to warn people on the Drill website that the
> >> > Avro support is experimental, at best.
> >> >
> >> > - Stefán Baxter
> >> >
> >>
> >
> >
>


New system tables to expose information about query and cluster state

2016-02-29 Thread Jason Altekruse
Hello all,

I am going to begin work on a series of JIRAs [1] that were filed last
month around providing more information about cluster state in new system
tables. These new tables will enable some information that is already
available in the REST API and Web UI through the standard SQL interface.
This provides the added benefit of exposing all of this information to the
analytical capabilities of Drill itself.

Additionally this should give a nicer API to build the Web UI based on
going forward. Currently direct access to data structures inside of the
core Drill server is used to populate several of the JSON responses in the
REST API and the pages in the Web UI, as we had not written other
client-facing interfaces for this information previously.

Please take a look at the proposed tables if you have an interest in this
feature and provide feedback on the JIRA discussion if you think that any
of the information should be organized or presented differently.

[1] - https://issues.apache.org/jira/browse/DRILL-4258


[jira] [Created] (DRILL-4451) Improve operator unit tests to allow for direct inspection of the sequence of result batches

2016-02-26 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4451:
--

 Summary: Improve operator unit tests to allow for direct 
inspection of the sequence of result batches
 Key: DRILL-4451
 URL: https://issues.apache.org/jira/browse/DRILL-4451
 Project: Apache Drill
  Issue Type: Test
  Components: Tools, Build & Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse


The first version of the operator test framework allows for comparison of the 
result set with a baseline, but does not give a way to specify the expected 
batch boundaries. All of the batches are combined together before they are 
compared (sharing code with the existing test infrastructure for complete SQL 
queries).

The framework should also include a way to directly inspect SV2 and SV4 batches 
that are produced by operators like filter and sort. These structures are used 
to store a view into the incoming data (an SV2 is a bitmask for everything that 
matched the filter and an SV4 is used to represent cross-batch pointers to 
reflect the sorted order of a series of batches without rewriting them). 
Currently the test just follows the pointers to iterate over the values as they 
would appear after a rewrite of the data (by the SelectionVectorRemover 
operator).
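
To make the distinction above concrete, here is a minimal sketch of the
difference between inspecting a selection vector directly and following it to
the values. It is not Drill code: plain int arrays stand in for record batches,
the selection is modelled as an array of record indexes (a simplification
chosen for this example), and all names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration; plain arrays stand in for Drill's vectors.
final class SelectionVectorSketch {

    /** Returns the indexes of the values that pass a simple "greater than" filter. */
    static int[] filterGreaterThan(int[] batch, int threshold) {
        int[] selected = new int[batch.length];
        int count = 0;
        for (int i = 0; i < batch.length; i++) {
            if (batch[i] > threshold) {
                selected[count++] = i;
            }
        }
        return Arrays.copyOf(selected, count);
    }

    /** Follows the selection to produce the values as they would appear after a rewrite. */
    static int[] materialize(int[] batch, int[] selection) {
        int[] out = new int[selection.length];
        for (int i = 0; i < selection.length; i++) {
            out[i] = batch[selection[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] batch = {5, 12, 7, 30};
        int[] selection = filterGreaterThan(batch, 6);
        System.out.println(Arrays.toString(selection));                     // [1, 2, 3]
        System.out.println(Arrays.toString(materialize(batch, selection))); // [12, 7, 30]
    }
}
```

A test that only compares the materialized values cannot tell whether the
operator produced the selection or rewrote the data, which is the gap this
issue describes.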



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4450) Improve operator unit tests to allow for setting custom options on a test

2016-02-26 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4450:
--

 Summary: Improve operator unit tests to allow for setting custom 
options on a test
 Key: DRILL-4450
 URL: https://issues.apache.org/jira/browse/DRILL-4450
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse


The initial work done on the operator test framework included mocking of the 
system/session options just complete enough to get the first ~10 operators to 
execute a single query. These values are currently shared across all tests. To 
test all code paths we will need a way to set options from individual tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4448) Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for its construction, should also allow for use of the corresponding Calcite Enums

2016-02-26 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4448:
--

 Summary: Specification of Ordering (ASC, DESC) on a sort plan node 
uses Strings for its construction, should also allow for use of the 
corresponding Calcite Enums
 Key: DRILL-4448
 URL: https://issues.apache.org/jira/browse/DRILL-4448
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Avro support in Drill - Missing support for the IN operator and other frustrating things

2016-02-26 Thread Jason Altekruse
Stefan,

I'm sorry that we have not been better about getting back to the issues you
have filed against the Avro reader. We do appreciate all of the effort you
have put into filing thorough bugs and being active in the discussions on
the list. I have responded on the bug you filed on this issue [1] with a
workaround and will be posting a patch shortly with a fix.

- Jason 

[1] - https://issues.apache.org/jira/browse/DRILL-4441


On Thu, Feb 25, 2016 at 12:29 PM, Stefán Baxter
wrote:

> Hi,
>
> This query targets Avro files in the latest 1.5 release:
>
> 0: jdbc:drill:zk=local> select count(*) from
> dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to =
> 'Customer/4-2492847';
> +---------+
> | EXPR$0  |
> +---------+
> | 5788    |
> +---------+
>
> 0: jdbc:drill:zk=local> select count(*) from
> dfs.asa.`/streaming/venuepoint/transactions/` as s where s.sold_to IN
> ('Customer/4-2492847');
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
>
> It shows that the IN operator does not work with Avro (works with Parquet).
>
> This finally tips us over. We have invested hundreds of hours moving all
> streaming/fresh data from JSON to Avro but the Avro part of Drill is broken
> in too many ways to recommend its use to anyone.
>
> Attempts to report Avro errors and shortcomings, like the missing support
> for dirX, have had no results.
>
> I think it would be prudent to warn people on the Drill website that the
> Avro support is experimental, at best.
>
> - Stefán Baxter
>


[jira] [Created] (DRILL-4439) Improve new unit operator tests to handle operators that expect RawBatchBuffers off of the wire, such as the UnorderedReceiver and MergingReceiver

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4439:
--

 Summary: Improve new unit operator tests to handle operators that 
expect RawBatchBuffers off of the wire, such as the UnorderedReceiver and 
MergingReceiver
 Key: DRILL-4439
 URL: https://issues.apache.org/jira/browse/DRILL-4439
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4437) Implement framework for testing operators in isolation

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4437:
--

 Summary: Implement framework for testing operators in isolation
 Key: DRILL-4437
 URL: https://issues.apache.org/jira/browse/DRILL-4437
 Project: Apache Drill
  Issue Type: Test
  Components: Tools, Build & Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse
 Fix For: 1.6.0


Most of the tests written for Drill are end-to-end. We spin up a full instance 
of the server, submit one or more SQL queries and check the results.

While integration tests like this are useful for ensuring that all features are 
guaranteed to not break end-user functionality, overuse of this approach has 
caused a number of pain points.

Overall the tests end up running a lot of the exact same code, parsing and 
planning many similar queries.

Creating consistent reproductions of issues, especially edge cases found in 
clustered environments can be extremely difficult. Even the simpler case of 
testing cases where operators are able to handle a particular series of 
incoming batches of records has required hacks like generating large enough 
files so that the scanners happen to break them up into separate batches. These 
tests are brittle as they make assumptions about how the scanners will work in 
the future. As an example of how this could break, a perf evaluation might 
show that we should be producing larger batches in some cases. Existing tests 
that are trying to test multiple batches by producing a few more records than 
the current threshold for batch size would not be testing the same code paths.

We need to make more parts of the system testable without initializing the 
entire Drill server, as well as making the different internal settings and 
state of the server configurable for tests.

This is a first effort to enable testing the physical operators in Drill by 
mocking the components of the system necessary to enable operators to 
initialize and execute.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4438) Fix out of memory failure identified by new operator unit tests

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4438:
--

 Summary: Fix out of memory failure identified by new operator unit 
tests
 Key: DRILL-4438
 URL: https://issues.apache.org/jira/browse/DRILL-4438
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse
Assignee: Jason Altekruse
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3930) Remove direct references to TopLevelAllocator from unit tests

2016-02-25 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-3930.

   Resolution: Fixed
 Assignee: (was: Chris Westin)
Fix Version/s: 1.3.0

> Remove direct references to TopLevelAllocator from unit tests
> -
>
> Key: DRILL-3930
> URL: https://issues.apache.org/jira/browse/DRILL-3930
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Chris Westin
> Fix For: 1.3.0
>
>
> The RootAllocatorFactory should be used throughout the code to allow us to 
> change allocators via configuration or other software choices. Some unit 
> tests still reference TopLevelAllocator directly. We also need to do a better 
> job of handling exceptions that can be handled by close()ing an allocator 
> that isn't in the proper state (remaining open child allocators, outstanding 
> buffers, etc).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4394) Can’t build the custom functions for Apache Drill 1.5.0

2016-02-24 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4394.

Resolution: Fixed
  Assignee: Jason Altekruse

> Can’t build the custom functions for Apache Drill 1.5.0
> ---
>
> Key: DRILL-4394
> URL: https://issues.apache.org/jira/browse/DRILL-4394
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Kumiko Yada
>    Assignee: Jason Altekruse
>Priority: Critical
>
> I tried to build the custom functions for Drill 1.5.0, but I got the below 
> error:
> Failure to find org.apache.drill.exec:drill-java-exec:jar:1.5.0 in 
> http://repo.maven.apache.org/maven2 was cached in the local repository.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4435) Add YARN jar required for running Drill on cluster with Kerberos

2016-02-24 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4435:
--

 Summary: Add YARN jar required for running Drill on cluster with 
Kerberos
 Key: DRILL-4435
 URL: https://issues.apache.org/jira/browse/DRILL-4435
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


As described here, Drill currently requires adding a YARN jar to the classpath 
to run on Kerberos. If it doesn't conflict with any jars currently included 
with Drill we should just include this in the distribution to make this work 
out of the box.

http://www.dremio.com/blog/securing-sql-on-hadoop-part-2-installing-and-configuring-drill/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3229) Create a new EmbeddedVector

2016-02-24 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-3229.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.4.0

> Create a new EmbeddedVector
> ---
>
> Key: DRILL-3229
> URL: https://issues.apache.org/jira/browse/DRILL-3229
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Codegen, Execution - Data Types, Execution - 
> Relational Operators, Functions - Drill
>Reporter: Jacques Nadeau
>Assignee: Steven Phillips
> Fix For: 1.4.0
>
>
> Embedded Vector will leverage a binary encoding for holding information about 
> type for each individual field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-284) Publish artifacts to maven for Drill

2016-02-24 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-284.
---
   Resolution: Fixed
Fix Version/s: (was: Future)
   1.1.0

> Publish artifacts to maven for Drill
> 
>
> Key: DRILL-284
> URL: https://issues.apache.org/jira/browse/DRILL-284
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Timothy Chen
> Fix For: 1.1.0
>
>
> We need to publish our artifacts and versions to Maven so other projects 
> (Whirr, or others that want a Maven include) can use them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Hangout notes from the last few meetings

2016-02-24 Thread Jason Altekruse
Hey guys,

Sorry I haven't been sending these out, I keep meaning to go back to them
and clean them up before sending them out and I don't get around to it. I
will just post the raw notes after the meeting going forward and provide
clarification on the thread if anyone has questions.


Drill Hangout - 2/9/2016

- Attendees: Yulia, Sean, Vicky, Sudheesh, Neeraja, Karol Potocki, Arina,
Aman, Hakim, Jinfeng

- New community members, Welcome!

- Arina

- working with MapR team

- Karol

- tiny contribution to allow spatial queries in Drill

- interested in sparking interest in geo locations

- PR outstanding for shapefile format

- Neeraja - would be nice for simple doc for users to start

- examples in PR and Karol's github repo

- he could write a blog post for the apache repo

- Discussion topics

- Sudheesh

- 4281 - client impersonation

- post a design doc soon

- some drill deployments

- tableau desktop is presentation layer on top of Drill

- users only use tableau desktop, talking through tableau server

- want to pass user from tableau desktop through the tableau server
  so that impersonation works correctly

- requires a change in Tableau as well, working with the team there

- Yulia

- 4132 - simple queries in parallel

- design doc on JIRA

- 2 goals

- separate planning from execution

- separate fragment plans so that they can be run independently

- those available please review design doc and PR

- 1.5.0

- new vote out soon

- Jinfeng

- 2517 - directory pruning in calcite logical

- vicky seems to have found a bug

- follow up work

- need to separate the rules and run them individually to
  improve planning performance

- Drill user survey

- other projects list who is using them

- just a google survey

- simple questions, I assume all will be considered optional

- current drill version in use

- cluster size

- datasources used

- clients: sqlline, REST, Applications, JDBC, ODBC, BI Tools

- what is your use case?

- why Drill?

- data formats, data types

- are you using any of the security features of Drill to
restrict access of some data to users?

- view chaining, impersonation, Web UI security

- SQL features you would like to see as enhancements soon?

- how many users are querying your Drill cluster

- have you written a storage plugin, UDF or format plugin?

- issues with the build

- jdbc-all jar size enforcement

- jacques made changes to remove proguard and generally fix up
jdbc-all JAR

- 1.4.0 has a large JDBC-all jar that wasn't excluding what it was
supposed to

- Aman

- Dechang - perf regressions on rc2 metadata cache


Drill Hangout - 2/16/2016

- Attendees: Parth, Andries, Arina, Jason, Vitalii

- Topics for discussion

- Release

- issues with publishing the web site

- announcement should be up shortly

- Jacques had mentioned Metadata caching

- follow up if he wants to post thoughts

- Discussion was short today


Drill Hangout - 2/23/2016

- Attendees: Jason, Minji, Laurent, Arina, Parth, Sudheesh, Zelaine


arina -- modify calcite, timestamp related function --> contact calcite
folks/julien


improve c++ client, better distribution of queries across cluster,
randomization routine not distributing uniformly.

session options not allowed since can't maintain sessions if uniformly
distributed

--> c++ client std c library rand() function not always good

--> different random number generator

--> new connection in the pool, then need to keep track of all the
altersessions (temporary tables, new schema, etc.)

--> small number of clients, need foreman workload distributed more
(planning and so on)

--> ping jacques
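
The randomization point above concerns the C++ client, but the idea of
picking a node with equal probability can be sketched in Java as below. This is
an illustration only, not the client code: EndpointPicker and histogram are
hypothetical names, and java.util.Random.nextInt(bound) stands in for whatever
generator the client ends up using.

```java
import java.util.Random;

// Hypothetical illustration of uniform node selection; not the Drill C++ client.
final class EndpointPicker {
    private final Random random;

    EndpointPicker(long seed) {
        this.random = new Random(seed);
    }

    /** Picks an index in [0, endpointCount) with equal probability for each endpoint. */
    int pick(int endpointCount) {
        // nextInt(bound) is specified to return uniformly distributed values in the range.
        return random.nextInt(endpointCount);
    }

    /** Counts how often each endpoint is chosen over a number of simulated connections. */
    static int[] histogram(int endpointCount, int connections, long seed) {
        EndpointPicker picker = new EndpointPicker(seed);
        int[] counts = new int[endpointCount];
        for (int i = 0; i < connections; i++) {
            counts[picker.pick(endpointCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        for (int count : histogram(4, 40000, 1L)) {
            System.out.println(count);
        }
    }
}
```

Printing a histogram like this for the real client would be one way to confirm
whether a replacement generator spreads connections evenly.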


impersonation--> client to impersonate other clients (Delegation?)

--> odbc/jdbc:  provide an api (c++/java) and how they will use it

--> waiting on comments


better testing for operator:  better tests for independent components

--> mock internal parts of systems

--> run operators in isolation (posting soon)

--> exchanges needs a bit more discussion (vector container) - separate way
to mock data coming in


Julien's test changes to run tests on multiple drill bits (?)

--> This actually wasn't Julien's contribution as was said in the meeting;
Sudheesh was actually referring to Andrew's PR here:
https://github.com/apache/drill/pull/135


Re: failing to drill zips of jsons

2016-02-23 Thread Jason Altekruse
Drill needs to know what format is stored underneath the compression. By
default this is accomplished with a compound extension (I don't know
if there is an accepted term for this practice).

You should be able to read the file if you name it data.json.zip.
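
As a sketch of what a compound extension conveys, the helper below splits a
file name into the inner format and the outer compression suffix. It only
illustrates the naming convention; CompoundExtension and split are hypothetical
names, and this is not how Drill itself matches files to format plugins.

```java
// Hypothetical helper illustrating the naming convention; not Drill code.
final class CompoundExtension {

    /**
     * Returns {format, compression} for names such as "data.json.zip",
     * or null when the name does not carry two extensions.
     */
    static String[] split(String fileName) {
        int last = fileName.lastIndexOf('.');
        if (last <= 0) {
            return null;
        }
        int previous = fileName.lastIndexOf('.', last - 1);
        if (previous <= 0) {
            return null;
        }
        return new String[] {
            fileName.substring(previous + 1, last), // inner format, e.g. "json"
            fileName.substring(last + 1)            // outer compression, e.g. "zip"
        };
    }

    public static void main(String[] args) {
        String[] parts = split("data.json.zip");
        System.out.println(parts[0] + " inside " + parts[1]); // json inside zip
        System.out.println(split("json.zip") == null);        // true: no format extension
    }
}
```

With a name like json.zip there is only one extension, so nothing tells the
reader which format sits underneath the compression.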

On Tue, Feb 23, 2016 at 2:53 PM, Sharma, Tapan wrote:

> Hi Gang,
>
> I read the following email:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201412.mbox/%3ccampyv7by21rj1nw5kuhbwn-19mzjpe4np3kzzu294texr96...@mail.gmail.com%3E
>
> And I was trying to query from a zip of JSONs and it just fails with
> Validation error.  Any clue what might be wrong?
>
>
> taps@ubuntu:~/data/temp$ file json.zip
> json.zip: Zip archive data, at least v1.0 to extract
>
> 0: jdbc:drill:zk=local> select count(*) from
> dfs.`/home/taps/data/temp/json.zip`;
> Feb 21, 2016 2:35:26 PM
> org.apache.calcite.sql.validate.SqlValidatorException 
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table
> 'dfs./home/taps/data/temp/json.zip' not found
> Feb 21, 2016 2:35:26 PM org.apache.calcite.runtime.CalciteException 
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> column 22 to line 1, column 24: Table 'dfs./home/taps/data/temp/json.zip'
> not found
> Error: VALIDATION ERROR: From line 1, column 22 to line 1, column 24:
> Table 'dfs./home/taps/data/temp/json.zip' not found
>
>
> [Error Id: f734308a-8c5b-4140-b813-08572f9a54c1 on ubuntu:31010]
> (state=,code=0)
>
> If I unzip the JSONs and try the query it works, I unzipped it in the very
> same temp directory and moved the json.zip file.
> 0: jdbc:drill:zk=local> select count(*) from dfs.`/home/taps/data/temp/`;
> +---------+
> | EXPR$0  |
> +---------+
> | 284     |
> +---------+
> 1 row selected (0.632 seconds)
>
>
> Thanks,
> Tapan
>


Hangout Happening Now!

2016-02-23 Thread Jason Altekruse
https://plus.google.com/hangouts/_/dremio.com/drillhangout?authuser=1


[jira] [Created] (DRILL-4426) Review storage and format plugins like parquet, JSON, Avro, Hive, etc. to ensure they fail with useful error messages including filename, column, etc.

2016-02-23 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4426:
--

 Summary: Review storage and format plugins like parquet, JSON, 
Avro, Hive, etc. to ensure they fail with useful error messages including 
filename, column, etc.
 Key: DRILL-4426
 URL: https://issues.apache.org/jira/browse/DRILL-4426
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


A number of these issues have been fixed in the past in individual instances, 
but we should review any remaining cases where a failure does not produce an 
error message with as much useful context information as possible. Filename 
should always be possible, column or record/line number where possible would be 
good.

One such case with a low level parquet failure was reported here.

http://search-hadoop.com/m/qRVAX48ao4xTDne/drill+Query+Return+Error+because+of+a+single+file=Query+Return+Error+because+of+a+single+file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Move master to 1.6.0-SNAPSHOT?

2016-02-19 Thread Jason Altekruse
Just pushed the version change. Will do it as soon as I create a release
branch in the future.

On Fri, Feb 19, 2016 at 1:47 PM, Jason Altekruse <altekruseja...@gmail.com>
wrote:

> Agreed, running the build for a sanity check and will push when it
> completes.
>
> On Fri, Feb 19, 2016 at 1:31 PM, Aditya <adityakish...@gmail.com> wrote:
>
>> In my opinion this should be done as the next commit on the master branch
>> as soon as a release branch is created.
>>
>> This avoids maven artifacts from one branch polluting another.
>>
>> We should definitely do it now.
>>
>> aditya...
>>
>> On Fri, Feb 19, 2016 at 10:37 AM, Abhishek Girish <agir...@mapr.com>
>> wrote:
>>
>> > Hey Jason,
>> >
>> > Should we update the version info? I built from master and it still
>> > shows up as 1.5.0-SNAPSHOT
>> >
>> > Thanks,
>> > Abhishek
>> >
>>
>
>


Re: Move master to 1.6.0-SNAPSHOT?

2016-02-19 Thread Jason Altekruse
Agreed, running the build for a sanity check and will push when it
completes.

On Fri, Feb 19, 2016 at 1:31 PM, Aditya wrote:

> In my opinion this should be done as the next commit on the master branch
> as soon as a release branch is created.
>
> This avoids maven artifacts from one branch polluting another.
>
> We should definitely do it now.
>
> aditya...
>
> On Fri, Feb 19, 2016 at 10:37 AM, Abhishek Girish 
> wrote:
>
> > Hey Jason,
> >
> > Should we update the version info? I built from master and it still shows
> > up as 1.5.0-SNAPSHOT
> >
> > Thanks,
> > Abhishek
> >
>


[ANNOUNCE] Apache Drill 1.5.0 released

2016-02-16 Thread Jason Altekruse
On behalf of the Apache Drill community, I am happy to announce the
release of Apache Drill 1.5.0.

The source and binary artifacts are available at [1]
Review a complete list of fixes and enhancements at [2]

This release of Drill fixes many issues and introduces a number of
enhancements, including the following highlights:

Web Authentication

Drill 1.5 extends Drill user authentication to the Web Console and
underlying REST API so administrators can control the extent of access to
the Web Console and REST API client applications.

Kudu Support

Drill now includes experimental support for querying the Apache Kudu
(incubating) scalable columnar database.

Improved Memory Allocator

Drill uses a new allocator that improves an operator’s use of direct memory
and tracks the memory use more accurately.

Configurable Caching of Hive Metadata

You can now configure the TTL for the Hive metadata client cache depending
on how frequently the Hive metadata is updated.

Thanks to everyone in the community who contributed in this release.

[1] http://drill.apache.org/download/
[2] http://drill.apache.org/docs/apache-drill-1-5-0-release-notes/

- Jason


Hangout happening now!

2016-02-16 Thread Jason Altekruse
Join us to hear the latest news on Drill. Anyone with an interest in Drill
is welcome.

https://plus.google.com/hangouts/_/dremio.com/drillhangout

- Jason


Question about generating the markdown pages for the Drill site

2016-02-15 Thread Jason Altekruse
Hello Devs,

I am currently trying to publish a blog post for the 1.5 release before I
send out the announcement to the mailing list, and I am having a little
trouble with the Jekyll site generation.

I pulled down the latest gh-pages branch and wrote a new blog post based on
a draft from Bridget:
https://github.com/jaltekruse/incubator-drill/commit/0e88b88a6acd4dcfe61ef849c4ed13aa878485d7

When I tried to generate the site using the commands from the readme on
the gh-pages branch, I got errors for old doc pages that are missing
dates. I can fix each page by editing and committing it, which triggers
the date auto-generation, but quite a few pages appear to be missing
dates.

Are these the up-to-date commands that have been used recently? Is there
something I should be doing to make it skip the missing dates?

(taken from the readme page)

jekyll build --config _config.yml,_config-prod.yml
_tools/createdatadocs.py
jekyll build --config _config.yml,_config-prod.yml

Thanks in advance for any help!
Jason


[RESULT][VOTE] Release Apache Drill 1.5.0 RC3

2016-02-12 Thread Jason Altekruse
The vote passes. Thanks everyone for your time voting. Final Tally:

4 x +1 (binding)
Jason
Jacques
Aman
Jinfeng


4 x +1 (non-binding)
Sudheesh
Rahul
Abhishek
Norris

No -1s.

I'll push the release artifacts and send an announcement once propagated.

Thanks,
Jason

On Fri, Feb 12, 2016 at 10:31 AM, Norris Lee <norr...@simba.com> wrote:

> +1 (non-binding)
>
> Built from source on CentOS. Ran queries against JSON, CSV, TSV, parquet,
> Hive tables through the ODBC driver.
>
> Norris
>
> -Original Message-
> From: Abhishek Girish [mailto:abhishek.gir...@gmail.com]
> Sent: Friday, February 12, 2016 9:12 AM
> To: dev@drill.apache.org
> Subject: Re: [VOTE] Release Apache Drill 1.5.0 RC3
>
> +1 (non-binding)
>
> - Built from source, and did basic manual sanity tests.
> - Ran Functional Regression tests from Drill Test Framework.
>
> -Abhishek
>
> On Friday, February 12, 2016, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>
> > +1 (binding)
> >
> > * Download src tar ball and ran unit tests and full maven build on Mac.
> > * Run yelp tutorial queries against yelp dataset.
> > * Run TPC-DS queries through sqline and WebUi.
> >
> >
> >
> >
> > On Fri, Feb 12, 2016 at 8:46 AM, Aman Sinha <asi...@maprtech.com> wrote:
> > > +1 (binding)
> > > - Downloaded src and built, ran unit tests on my Mac
> > > - Manually ran a few queries against TPC-DS
> > > - Verified partition pruning, metadata caching was working as
> > > expected
> > for
> > > these test queries
> > > - Checked query profile in Web UI
> > >
> > > looks good !
> > > Aman
> > >
> > > On Thu, Feb 11, 2016 at 9:31 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > >
> > >> Download, build, unit tests.
> > >> Deploy on small cluster and verify operation of a few distributed
> > queries.
> > >>
> > >> LGTM
> > >> +1 (binding)
> > >>
> > >> --
> > >> Jacques Nadeau
> > >> CTO and Co-Founder, Dremio
> > >>
> > >> On Thu, Feb 11, 2016 at 4:18 PM, rahul challapalli <
> > >> challapallira...@gmail.com> wrote:
> > >>
> > >> > +1 (non-binding)
> > >> >
> > >> > Built from source and ran the Functional and Advanced suites from
> > >> > the
> > >> test
> > >> > framework.
> > >> > Performed some sanity tests against web ui authentication.
> > >> >
> > >> > On Thu, Feb 11, 2016 at 4:03 PM, Jason Altekruse <
> > >> > altekruseja...@gmail.com> wrote:
> > >> >
> > >> > > Thanks for the vote Sudheesh!
> > >> > >
> > >> > > Please, others with time available try out the candidate, the
> > >> > > vote
> > is
> > >> > > supposed to close tomorrow at 7PM Pacific.
> > >> > >
> > >> > > Thanks,
> > >> > > Jason
> > >> > >
> > >> > > On Thu, Feb 11, 2016 at 3:53 PM, Sudheesh Katkam <
> > >> > > sudhe...@apache.org> wrote:
> > >> > >
> > >> > > > +1 (non-binding; committer)
> > >> > > >
> > >> > > > * downloaded and built from source tar-ball; ran unit tests
> > >> > successfully
> > >> > > on
> > >> > > > Ubuntu
> > >> > > > * ran simple queries (including cancellations) in embedded
> > >> > > > mode on
> > >> Mac;
> > >> > > > verified states in web UI
> > >> > > > * ran simple queries (including cancellations) on a 3 node
> > cluster;
> > >> > > > verified states in web UI
> > >> > > > * verified that queries complete with queuing enabled
> > >> > > > * verified md5 and sha1 checksums on binary and src
> > >> > > > tar-balls, and
> > >> > zipped
> > >> > > > folder
> > >> > > >
> > >> > > > Thank you,
> > >> > > > Sudheesh
> > >> > > >
> > >> > > > On Tue, Feb 9, 2016 at 7:38 PM, Jason Altekruse <
> &

Re: [VOTE] Release Apache Drill 1.5.0 RC3

2016-02-11 Thread Jason Altekruse
Thanks for the vote Sudheesh!

Others with time available, please try out the candidate; the vote is
supposed to close tomorrow at 7 PM Pacific.

Thanks,
Jason

On Thu, Feb 11, 2016 at 3:53 PM, Sudheesh Katkam <sudhe...@apache.org>
wrote:

> +1 (non-binding; committer)
>
> * downloaded and built from source tar-ball; ran unit tests successfully on
> Ubuntu
> * ran simple queries (including cancellations) in embedded mode on Mac;
> verified states in web UI
> * ran simple queries (including cancellations) on a 3 node cluster;
> verified states in web UI
> * verified that queries complete with queuing enabled
> * verified md5 and sha1 checksums on binary and src tar-balls, and zipped
> folder
>
> Thank you,
> Sudheesh
>
> On Tue, Feb 9, 2016 at 7:38 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
> > For anyone who jumped on testing out the release candidate early I'm
> going
> > to have to ask you to re-download the artifacts you verified. I had
> > prepared an earlier version of this candidate (but didn't get a chance to
> > start the vote) before another regression was identified and fixed
> today. I
> > had forgotten to update the source and binary artifacts on my apache web
> > space with the new ones.
> >
> > I just uploaded the corrected versions of the artifacts after verifying
> > their git.properties files to ensure they were the correct versions.
> >
> > The last modified dates on the correct versions are between: 10-Feb-2016
> > 03:30 and 10-Feb-2016 03:34
> >
> > (the copy took a few minutes as my home upload speed isn't great :P)
> >
> > Sorry about the mistake, everything should be good to go now.
> >
> > Thanks,
> > Jason
> >
> >
> >
> > On Tue, Feb 9, 2016 at 7:08 PM, Jason Altekruse <
> altekruseja...@gmail.com>
> > wrote:
> >
> > > Hello all,
> > >
> > > I'd like to propose the fourth release candidate (rc3) of Apache Drill,
> > > version
> > > 1.5.0. It covers a total of 60 resolved JIRAs [1]. Thanks to everyone
> who
> > > contributed to this release. This release candidate includes fixes for
> > > DRILL-4235 and DRILL-4380, both regressions found since the last release
> > > candidate.
> > >
> > > I also pulled in two bug fixes (4230, 4349) that had been merged into
> > > master since making the release branch; they looked useful to include
> > > and both had little risk of introducing regressions.
> > >
> > > The tarball artifacts are hosted at [2] and the maven artifacts are
> > hosted
> > > at
> > > [3]. This release candidate is based on commit
> > > 3f228d34782741457a14e28b0d1fdbc35a4fd958 located at [4].
> > >
> > > The vote will be open for the next 72 hours ending at 7 PM Pacific,
> > > February 12th, 2016.
> > >
> > > [ ] +1
> > > [ ] +0
> > > [ ] -1
> > >
> > > Here's my vote: +1
> > >
> > > Thanks,
> > > Jason
> > >
> > > [1]
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332948
> > >
> > > [2] http://people.apache.org/~json/apache-drill-1.5.0.rc3/
> > > [3]
> > https://repository.apache.org/content/repositories/orgapachedrill-1028
> > > [4] https://github.com/jaltekruse/incubator-drill/tree/drill-1.5.0-rc3
> > >
> >
>


[jira] [Created] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config

2016-02-11 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4383:
--

 Summary: Allow passing custom configuration options to a file 
system through the storage plugin config
 Key: DRILL-4383
 URL: https://issues.apache.org/jira/browse/DRILL-4383
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Reporter: Jason Altekruse
Assignee: Jason Altekruse
 Fix For: 1.6.0


A similar feature already exists in the Hive and HBase plugins. It simply 
provides a key/value map for passing custom configuration options to the 
underlying storage system.

This would be useful for the filesystem plugin to configure S3 without needing 
to create a core-site.xml file or restart Drill.
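
As a rough illustration of the proposal (not the final design), a filesystem
plugin definition carrying such a map might look like the sketch below. The
"config" property name, the bucket, and the credential values are assumptions
made up for the example; fs.s3a.access.key and fs.s3a.secret.key are the
Hadoop S3A property names that would otherwise go in core-site.xml.

```python
import json


def build_fs_plugin_config(connection, extra_options):
    """Return a storage plugin definition with a key/value map of
    file system options. The "config" key is a hypothetical name."""
    return {
        "type": "file",
        "enabled": True,
        "connection": connection,
        # Options that would otherwise live in core-site.xml.
        "config": dict(extra_options),
    }


plugin = build_fs_plugin_config(
    "s3a://example-bucket",  # placeholder bucket name
    {
        "fs.s3a.access.key": "EXAMPLE_ACCESS_KEY",  # placeholder value
        "fs.s3a.secret.key": "EXAMPLE_SECRET_KEY",  # placeholder value
    },
)

# Print the definition as JSON, the form storage plugin configs are edited in.
print(json.dumps(plugin, indent=2))
```

Keeping the options in the plugin definition means they can change along
with the rest of the storage configuration, which is the point of the request.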





Re: [VOTE] Release Apache Drill 1.5.0 RC2

2016-02-09 Thread Jason Altekruse
Alright, that sinks the release, I'll have a new candidate up shortly.

On Tue, Feb 9, 2016 at 5:50 AM, Jacques Nadeau <jacq...@dremio.com> wrote:

> It sounds like a blocker to me.
>
> I'm switching to -1
> On Feb 8, 2016 9:17 PM, "Sudheesh Katkam" <skat...@maprtech.com> wrote:
>
> > I agree that there should be tests with queuing enabled (at least sanity
> > tests). I did not mean to delay the release, but this regression causes
> all
> > queries to fail with an illegal state transition exception (when queueing
> > is enabled).
> >
> > Thank you,
> > Sudheesh
> >
> > > On Feb 8, 2016, at 6:22 PM, Jason Altekruse <altekruseja...@gmail.com>
> > wrote:
> > >
> > > The case that was reported in the JIRA was a failure on a very simple
> > > query:  select * from sys.options;
> > >
> > > I assume this means that any query will fail when queuing is enabled.
> > That
> > > would make a strong case for inclusion in the release, I didn't look
> > > closely at the JIRA before. Hakim, you reviewed the patch, but it
> doesn't
> > > include any new tests. Did Hanifi mention if the change made there was
> > > necessary to pretty much fix any query when queuing was enabled?
> > >
> > > - Jason
> > >
> > > On Mon, Feb 8, 2016 at 4:57 PM, Abdel Hakim Deneche <
> > adene...@maprtech.com>
> > > wrote:
> > >
> > >> Does it mean that any user who's been using queuing won't be able to
> use
> > >> 1.5.0 ?
> > >>
> > >> On Mon, Feb 8, 2016 at 4:40 PM, Jason Altekruse <
> > altekruseja...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hey Sudheesh,
> > >>>
> > >>> I just pushed Venki's fix for the Web UI issue to the master branch.
> > >>>
> > >>> My fix for the build issue I ran into when trying to prepare the
> > release
> > >> is
> > >>> a fair point. The change only has a very limited impact on the build,
> > and
> > >>> only changes the result when running a release itself. I should have
> > been
> > >>> better communicating the change that was made, I have posted an
> > >> update
> > >>> on the JIRA I filed to do a follow-up investigation of the problem
> > [1]. I
> > >>> didn't include it on my merge branch with Venki's change, but I will
> > post
> > >> it
> > >>> shortly associated with this new JIRA [2] for review and kick off the
> > >> tests
> > >>> with the change rebased.
> > >>>
> > >>> As far as 4235 is concerned, I would like the release to be as stable
> > as
> > >>> possible, but the release has taken quite a long time to get to vote.
> > >> This
> > >>> issue was filed at the end of December, and was fixed just 4 days
> ago,
> > >> with
> > >>> no comment on the previous release thread about including the fix in
> > the
> > >>> release. I fully support making queuing a first-class feature of
> Drill,
> > >> but
> > >>> we need to add automated tests for it if we want it to stay stable.
> > >>>
> > >>> I'm open to discussion on the topic, but I'm not sure we should delay
> > the
> > >>> release further for it.
> > >>>
> > >>> - Jason
> > >>>
> > >>> [1] - https://issues.apache.org/jira/browse/DRILL-4336
> > >>> [2] - https://issues.apache.org/jira/browse/DRILL-4375
> > >>>
> > >>> On Mon, Feb 8, 2016 at 2:59 PM, Sudheesh Katkam <
> skat...@maprtech.com>
> > >>> wrote:
> > >>>
> > >>>> Although my vote is non-binding <
> > >>>> http://drill.apache.org/docs/project-bylaws/#actions>, I have two
> > >>>> concerns:
> > >>>>
> > >>>> * DRILL-4187 <https://issues.apache.org/jira/browse/DRILL-4187>
> > >> caused a
> > >>>> critical regression noted in DRILL-4235 <
> > >>>> https://issues.apache.org/jira/browse/DRILL-4235>. There is a patch
> > >> for
> > >>>> DRILL-4235, which is not part of the release candidate. This can
> cause
> > >>>> failures for users that are using the queuing feature.
> > >>>>
> > >>>> * There are commits mad

[jira] [Resolved] (DRILL-4230) NullReferenceException when SELECTing from empty mongo collection

2016-02-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4230.

   Resolution: Fixed
Fix Version/s: 1.5.0

Fixed in ed2f1ca8ed3c0ebac7e33494db6749851fc2c970

This was applied separately to the 1.5 release branch, so the commit there has 
identical content and the same commit message, but will have a different hash.

> NullReferenceException when SELECTing from empty mongo collection
> -
>
> Key: DRILL-4230
> URL: https://issues.apache.org/jira/browse/DRILL-4230
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
>Affects Versions: 1.3.0
>Reporter: Brick Shitting Bird Jr.
>Assignee: Jason Altekruse
> Fix For: 1.5.0
>
>






[VOTE] Release Apache Drill 1.5.0 RC3

2016-02-09 Thread Jason Altekruse
Hello all,

I'd like to propose the fourth release candidate (rc3) of Apache Drill,
version 1.5.0. It covers a total of 60 resolved JIRAs [1]. Thanks to
everyone who contributed to this release. This release candidate includes
fixes for DRILL-4235 and DRILL-4380, both regressions found since the last
release candidate.

I also pulled in two bug fixes (4230, 4349) that had been merged into
master since making the release branch; they looked useful to include and
both had little risk of introducing regressions.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted
at
[3]. This release candidate is based on commit
3f228d34782741457a14e28b0d1fdbc35a4fd958 located at [4].

The vote will be open for the next 72 hours ending at 7 PM Pacific,
February 12th, 2016.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

Thanks,
Jason

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332948

[2] http://people.apache.org/~json/apache-drill-1.5.0.rc3/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1028
[4] https://github.com/jaltekruse/incubator-drill/tree/drill-1.5.0-rc3


Re: [VOTE] Release Apache Drill 1.5.0 RC3

2016-02-09 Thread Jason Altekruse
For anyone who jumped on testing out the release candidate early I'm going
to have to ask you to re-download the artifacts you verified. I had
prepared an earlier version of this candidate (but didn't get a chance to
start the vote) before another regression was identified and fixed today. I
had forgotten to update the source and binary artifacts on my apache web
space with the new ones.

I just uploaded the corrected versions of the artifacts after verifying
their git.properties files to ensure they were the correct versions.

The last modified dates on the correct versions are between: 10-Feb-2016
03:30 and 10-Feb-2016 03:34

(the copy took a few minutes as my home upload speed isn't great :P)

Sorry about the mistake, everything should be good to go now.

Thanks,
Jason
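
For anyone who wants to repeat the git.properties check on a downloaded
artifact, a minimal sketch follows. It assumes the file is a plain key=value
properties file and that the commit is stored under the key git.commit.id
(an assumption about the build plugin's output, not something confirmed in
this thread); the expected hash is the rc3 commit from the vote email.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def matches_release_commit(properties_text, expected_commit,
                           key="git.commit.id"):
    """Return True when the properties text records the expected commit."""
    return parse_properties(properties_text).get(key) == expected_commit


# Commit the rc3 candidate is based on, as stated in the vote email.
RC3_COMMIT = "3f228d34782741457a14e28b0d1fdbc35a4fd958"

# Stand-in for the contents of a git.properties file from an artifact.
sample = "# build metadata\ngit.commit.id=" + RC3_COMMIT + "\n"

print(matches_release_commit(sample, RC3_COMMIT))
```

In practice the text would be read from the git.properties file inside the
unpacked tarball rather than from the sample string.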



On Tue, Feb 9, 2016 at 7:08 PM, Jason Altekruse <altekruseja...@gmail.com>
wrote:

> Hello all,
>
> I'd like to propose the fourth release candidate (rc3) of Apache Drill,
> version
> 1.5.0. It covers a total of 60 resolved JIRAs [1]. Thanks to everyone who
> contributed to this release. This release candidate includes fixes for
> DRILL-4235 and DRILL-4380, both regressions found since the last release
> candidate.
>
> I also pulled in two bug fixes (4230, 4349) that had been merged into
> master since making the release branch; they looked useful to include and
> both had little risk of introducing regressions.
>
> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
> at
> [3]. This release candidate is based on commit
> 3f228d34782741457a14e28b0d1fdbc35a4fd958 located at [4].
>
> The vote will be open for the next 72 hours ending at 7 PM Pacific,
> February 12th, 2016.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> Here's my vote: +1
>
> Thanks,
> Jason
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332948
>
> [2] http://people.apache.org/~json/apache-drill-1.5.0.rc3/
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1028
> [4] https://github.com/jaltekruse/incubator-drill/tree/drill-1.5.0-rc3
>


Hangout starting now!

2016-02-09 Thread Jason Altekruse
Join us to hear the latest Drill news or to bring up any concerns you would
like to see addressed.

https://plus.google.com/hangouts/_/dremio.com/drillhangout?authuser=1


Re: [VOTE] Release Apache Drill 1.5.0 RC2

2016-02-08 Thread Jason Altekruse
The case that was reported in the JIRA was a failure on a very simple
query:  select * from sys.options;

I assume this means that any query will fail when queuing is enabled. That
would make a strong case for inclusion in the release; I didn't look
closely at the JIRA before. Hakim, you reviewed the patch, but it doesn't
include any new tests. Did Hanifi mention whether the change made there was
necessary to fix pretty much any query when queuing is enabled?

- Jason

On Mon, Feb 8, 2016 at 4:57 PM, Abdel Hakim Deneche <adene...@maprtech.com>
wrote:

> Does it mean that any user who's been using queuing won't be able to use
> 1.5.0 ?
>
> On Mon, Feb 8, 2016 at 4:40 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
> > Hey Sudheesh,
> >
> > I just pushed Venki's fix for the Web UI issue to the master branch.
> >
> > My fix for the build issue I ran into when trying to prepare the release
> is
> > a fair point. The change only has a very limited impact on the build, and
> > only changes the result when running a release itself. I should have been
> > better communicating the change that was made, I have posted an
> update
> > on the JIRA I filed to do a follow-up investigation of the problem [1]. I
> > didn't include it on my merge branch with Venki's change, but I will post
> it
> > shortly associated with this new JIRA [2] for review and kick off the
> tests
> > with the change rebased.
> >
> > As far as 4235 is concerned, I would like the release to be as stable as
> > possible, but the release has taken quite a long time to get to vote.
> This
> > issue was filed at the end of December, and was fixed just 4 days ago,
> with
> > no comment on the previous release thread about including the fix in the
> > release. I fully support making queuing a first-class feature of Drill,
> but
> > we need to add automated tests for it if we want it to stay stable.
> >
> > I'm open to discussion on the topic, but I'm not sure we should delay the
> > release further for it.
> >
> > - Jason
> >
> > [1] - https://issues.apache.org/jira/browse/DRILL-4336
> > [2] - https://issues.apache.org/jira/browse/DRILL-4375
> >
> > On Mon, Feb 8, 2016 at 2:59 PM, Sudheesh Katkam <skat...@maprtech.com>
> > wrote:
> >
> > > Although my vote is non-binding <
> > > http://drill.apache.org/docs/project-bylaws/#actions>, I have two
> > > concerns:
> > >
> > > * DRILL-4187 <https://issues.apache.org/jira/browse/DRILL-4187>
> caused a
> > > critical regression noted in DRILL-4235 <
> > > https://issues.apache.org/jira/browse/DRILL-4235>. There is a patch
> for
> > > DRILL-4235, which is not part of the release candidate. This can cause
> > > failures for users that are using the queuing feature.
> > >
> > > * There are commits made to the release branch <
> > > https://github.com/jaltekruse/incubator-drill/commits/1.5-release-rc2>
> > in
> > > Jason's repo that are not checked in to master.
> > >
> > > Thanks,
> > > Sudheesh
> > >
> > > > On Feb 8, 2016, at 2:30 PM, Jason Altekruse <
> altekruseja...@gmail.com>
> > > wrote:
> > > >
> > > > Thanks everyone who has voted so far. The vote closes tomorrow
> morning
> > > and
> > > > right now we're only at the minimum number of binding votes for it to
> > > pass.
> > > > Anyone who has some time available, please try out the release and
> > cast a
> > > > vote.
> > > >
> > > > On Mon, Feb 8, 2016 at 2:02 PM, Jacques Nadeau <jacq...@dremio.com>
> > > wrote:
> > > >
> > > >> Downloaded, built and ran unit tests.
> > > >> Manually tried a few queries.
> > > >>
> > > >> Looks good
> > > >>
> > > >> +1 (binding)
> > > >>
> > > >>
> > > >> --
> > > >> Jacques Nadeau
> > > >> CTO and Co-Founder, Dremio
> > > >>
> > > >> On Sun, Feb 7, 2016 at 10:03 AM, Aman Sinha <amansi...@apache.org>
> > > wrote:
> > > >>
> > > >>> +1
> > > >>> - Downloaded src and built, ran unit tests on my Mac
> > > >>> - Manually ran a few queries against TPC-DS
> > > >>> - Verified partition pruning, metadata caching was working as
> > expected
> > > >> for
> > > >>> these test queries
> > > &g

Re: [VOTE] Release Apache Drill 1.5.0 RC2

2016-02-08 Thread Jason Altekruse
Thanks everyone who has voted so far. The vote closes tomorrow morning and
right now we're only at the minimum number of binding votes for it to pass.
Anyone who has some time available, please try out the release and cast a
vote.

On Mon, Feb 8, 2016 at 2:02 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Downloaded, built and ran unit tests.
> Manually tried a few queries.
>
> Looks good
>
> +1 (binding)
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sun, Feb 7, 2016 at 10:03 AM, Aman Sinha <amansi...@apache.org> wrote:
>
> > +1
> > - Downloaded src and built, ran unit tests on my Mac
> > - Manually ran a few queries against TPC-DS
> > - Verified partition pruning, metadata caching was working as expected
> for
> > these test queries
> > - Checked query profile in Web UI, checked query cancellation
> > - Found 1 performance issue with lots of small parquet files ...filed
> > DRILL-4365 but need confirmation whether it is reproducible for other
> > folks.  At this point, I am not considering it a blocker due to the fact
> I
> > could not reproduce with a more general/bigger dataset.
> >
> > Aman
> >
> > On Fri, Feb 5, 2016 at 12:21 PM, Julien Le Dem <jul...@dremio.com>
> wrote:
> >
> > > +1 (non-binding)
> > > Built and run the tests on linux (took 27 min)
> > >
> > >
> > >
> > > On Fri, Feb 5, 2016 at 11:21 AM, Stefán Baxter <
> > ste...@activitystream.com>
> > > wrote:
> > >
> > > > +1 (non-binding / not a committer)
> > > >
> > > >- Built the project on ubuntu/linux
> > > >- Ran our test suite
> > > >- Verified that the jdbc driver works and is properly shaded (we
> had
> > > >problems with *leakage*)
> > > >
> > > > (I ran into a problem reading a snappy zipped parquet file that was
> > > created
> > > > with the latest parquet-mr/parquet-avro (1.8.1) but i think that is
> out
> > > of
> > > > scope here and I will create a Jira issue once I have tested it
> better)
> > > >
> > > > Thank you
> > > >
> > > > On Fri, Feb 5, 2016 at 6:56 PM, Jason Altekruse <
> > > altekruseja...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I'd like to propose the third release candidate (rc2) of Apache
> > Drill,
> > > > > version
> > > > > 1.5.0. It covers a total of 55 resolved JIRAs [1]. Thanks to
> everyone
> > > who
> > > > > contributed to this release. This release candidate includes a fix
> > for
> > > > > DRILL-4353, a major stability problem with the Rest API that was
> > > > identified
> > > > > during the last vote.
> > > > >
> > > > > The tarball artifacts are hosted at [2] and the maven artifacts are
> > > > hosted
> > > > > at
> > > > > [3]. This release candidate is based on commit
> > > > > 0a64888ba8d374e94435e2518e81352e677255ad located at [4].
> > > > >
> > > > > The vote will be open for the next 96 hours (including an extra day
> > as
> > > > the
> > > > > vote is happening over a weekend) ending at 11AM Pacific, February
> > 9th,
> > > > > 2016.
> > > > >
> > > > > [ ] +1
> > > > > [ ] +0
> > > > > [ ] -1
> > > > >
> > > > > Here's my vote: +1
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332948
> > > > > [2] http://people.apache.org/~json/apache-drill-1.5.0.rc2/
> > > > > [3]
> > > >
> https://repository.apache.org/content/repositories/orgapachedrill-1026
> > > > > [4]
> > https://github.com/jaltekruse/incubator-drill/tree/1.5-release-rc2
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Julien
> > >
> >
>


[jira] [Created] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291

2016-02-08 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4375:
--

 Summary: Fix the maven release profile, broken by jdbc jar size 
enforcer added in DRILL-4291
 Key: DRILL-4375
 URL: https://issues.apache.org/jira/browse/DRILL-4375
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse
Assignee: Jason Altekruse








Re: [VOTE] Release Apache Drill 1.5.0 RC2

2016-02-08 Thread Jason Altekruse
Hey Sudheesh,

I just pushed Venki's fix for the Web UI issue to the master branch.

Your point about my fix for the build issue I ran into when trying to
prepare the release is a fair one. The change has only a very limited
impact on the build, and only changes the result when running a release
itself. I should have communicated the change better; I have posted an
update on the JIRA I filed to do a follow-up investigation of the problem
[1]. I didn't include it on my merge branch with Venki's change, but I will
post it shortly, associated with this new JIRA [2], for review and kick off
the tests with the change rebased.

As far as 4235 is concerned, I would like the release to be as stable as
possible, but the release has taken quite a long time to get to vote. This
issue was filed at the end of December, and was fixed just 4 days ago, with
no comment on the previous release thread about including the fix in the
release. I fully support making queuing a first-class feature of Drill, but
we need to add automated tests for it if we want it to stay stable.

I'm open to discussion on the topic, but I'm not sure we should delay the
release further for it.

- Jason

[1] - https://issues.apache.org/jira/browse/DRILL-4336
[2] - https://issues.apache.org/jira/browse/DRILL-4375

On Mon, Feb 8, 2016 at 2:59 PM, Sudheesh Katkam <skat...@maprtech.com>
wrote:

> Although my vote is non-binding <
> http://drill.apache.org/docs/project-bylaws/#actions>, I have two
> concerns:
>
> * DRILL-4187 <https://issues.apache.org/jira/browse/DRILL-4187> caused a
> critical regression noted in DRILL-4235 <
> https://issues.apache.org/jira/browse/DRILL-4235>. There is a patch for
> DRILL-4235, which is not part of the release candidate. This can cause
> failures for users that are using the queuing feature.
>
> * There are commits made to the release branch <
> https://github.com/jaltekruse/incubator-drill/commits/1.5-release-rc2> in
> Jason's repo that are not checked in to master.
>
> Thanks,
> Sudheesh
>
> > On Feb 8, 2016, at 2:30 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
> >
> > Thanks everyone who has voted so far. The vote closes tomorrow morning
> and
> > right now we're only at the minimum number of binding votes for it to
> pass.
> > Anyone who has some time available, please try out the release and cast a
> > vote.
> >
> > On Mon, Feb 8, 2016 at 2:02 PM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
> >
> >> Downloaded, built and ran unit tests.
> >> Manually tried a few queries.
> >>
> >> Looks good
> >>
> >> +1 (binding)
> >>
> >>
> >> --
> >> Jacques Nadeau
> >> CTO and Co-Founder, Dremio
> >>
> >> On Sun, Feb 7, 2016 at 10:03 AM, Aman Sinha <amansi...@apache.org>
> wrote:
> >>
> >>> +1
> >>> - Downloaded src and built, ran unit tests on my Mac
> >>> - Manually ran a few queries against TPC-DS
> >>> - Verified partition pruning, metadata caching was working as expected
> >> for
> >>> these test queries
> >>> - Checked query profile in Web UI, checked query cancellation
> >>> - Found 1 performance issue with lots of small parquet files ...filed
> >>> DRILL-4365 but need confirmation whether it is reproducible for other
> >>> folks.  At this point, I am not considering it a blocker due to the
> fact
> >> I
> >>> could not reproduce with a more general/bigger dataset.
> >>>
> >>> Aman
> >>>
> >>> On Fri, Feb 5, 2016 at 12:21 PM, Julien Le Dem <jul...@dremio.com>
> >> wrote:
> >>>
> >>>> +1 (non-binding)
> >>>> Built and run the tests on linux (took 27 min)
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Feb 5, 2016 at 11:21 AM, Stefán Baxter <
> >>> ste...@activitystream.com>
> >>>> wrote:
> >>>>
> >>>>> +1 (non-binding / not a committer)
> >>>>>
> >>>>>   - Built the project on ubuntu/linux
> >>>>>   - Ran our test suite
> >>>>>   - Verified that the jdbc driver works and is properly shaded (we
> >> had
> >>>>>   problems with *leakage*)
> >>>>>
> >>>>> (I ran into a problem reading a snappy zipped parquet file that was
> >>>> created
> >>>>> with the latest parquet-mr/parquet-avro (1.8.1) but i think that is
> >> out
> >>>> of
> >>>>> scope here and I w

Re: project build fails -> drill-jdbc-all-1.5.0-SNAPSHOT.jar is outside the expected size range

2016-02-08 Thread Jason Altekruse
Hey Sudheesh,

Unfortunately it will not fix this issue. DRILL-4375 is related
specifically to how the addition of the enforcer (for currently unknown
reasons) caused the release profile to fail in a new way. I hadn't run into
issues with the enforcer itself actually failing with my version of Maven.

I would be in favor of a flag to make it easier to disable this check. We
can even change the message to tell people about the flag (it could be
updated now to suggest upgrading Maven), but I do think we should keep this
enforcer rule on by default, as the 1.4 release had a pretty bloated JAR
because this wasn't being checked.

- Jason
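
To make the idea concrete, here is a small sketch of a size check with an
opt-out flag. It is not the enforcer plugin's implementation; the function
names, the skip parameter, and the 40000000-byte limit are hypothetical, and
the message wording only imitates the error quoted below. The 44664290 figure
is the jar size reported in the thread.

```python
import os


def check_size(name, size, min_bytes, max_bytes, skip=False):
    """Return (ok, message) for a size against an allowed range.

    skip=True models the proposed flag that disables the check.
    """
    if skip:
        return True, ""
    if size > max_bytes:
        return False, "%s size (%d) too large. Max. is %d" % (
            name, size, max_bytes)
    if size < min_bytes:
        return False, "%s size (%d) too small. Min. is %d" % (
            name, size, min_bytes)
    return True, ""


def check_file_size(path, min_bytes, max_bytes, skip=False):
    """Apply check_size to a file on disk."""
    return check_size(path, os.path.getsize(path), min_bytes, max_bytes, skip)


# Jar size from the thread, checked against a hypothetical upper limit.
ok, message = check_size("drill-jdbc-all-1.5.0-SNAPSHOT.jar",
                         44664290, 1, 40000000)
print(ok, message)

# The same check with the opt-out flag set.
print(check_size("drill-jdbc-all-1.5.0-SNAPSHOT.jar",
                 44664290, 1, 40000000, skip=True))
```

Defaulting skip to False matches the suggestion above: the rule stays on
unless someone explicitly turns it off.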

On Mon, Feb 8, 2016 at 4:23 PM, Sudheesh Katkam 
wrote:

> @Jason, does DRILL-4375 address this issue as well?
>
> > On Feb 8, 2016, at 4:19 PM, Sudheesh Katkam 
> wrote:
> >
> > On one of the Linux VMs, when I run mvn clean install -DskipTests
> -Pmapr, I get this error with 3.3.x (but not with 3.2.x). Weird.
> >
> > Should we disable the rule until we figure out the cause?
> >
> > - Sudheesh
> >
> >> On Feb 2, 2016, at 6:11 AM, Jacques Nadeau > wrote:
> >>
> >> This is a bug in Maven; we haven't yet figured out how we're triggering it.
> >> Upgrading to Maven 3.3.x fixes it.
> >> On Feb 2, 2016 2:18 AM, "Arina Yelchiyeva"  >
> >> wrote:
> >>
> >>> Hi all!
> >>>
> >>> Just pulled recent changes from master (revision number
> >>> 1b96174b1e5bafb13a873dd79f03467802d7c929) and mvn clean install
> -DskipTests
> >>> failed with the following error:
> >>>
> >>> *[ERROR] Failed to execute goal
> >>> org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce
> >>> (enforce-jdbc-jar-compactness) on project drill-jdbc-all: Some Enforcer
> >>> rules have failed. Look above for specific messages explaining why the
> rule
> >>> failed.*
> >>>
> >>> *[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireFilesSize
> >>> failed with message:*
> >>> *The file drill-jdbc-all-1.5.0-SNAPSHOT.jar is outside the expected
> size
> >>> range. *
> >>>
> >>> *This is likely due to you adding new dependencies to a java-exec and
> not
> >>> updating the excludes in this module. This is important as it
> minimizes the
> >>> size of the dependency of Drill application users.*
> >>>
> *F:\git_repo\drill\exec\jdbc-all\target\drill-jdbc-all-1.5.0-SNAPSHOT.jar
> >>> size (44664290) too large. Max. is 2000
> >>>
> F:\git_repo\drill\exec\jdbc-all\target\drill-jdbc-all-1.5.0-SNAPSHOT.jar*
> >>>
> >>> Had to change 2000 ->
> 5000 in
> >>> jdbc-all pom.xml to build the project.
> >>>
> >>> Do we need to create a JIRA for this, or is it already being fixed?
> >>>
> >>> Kind regards
> >>> Arina
> >>>
> >
>
>


[jira] [Resolved] (DRILL-4295) Obsolete protobuf generated files under protocol/

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4295.

Resolution: Fixed

Fixed in fbb0165def5e23b6b2f6a690d47dc5fbeb2bdbcb

> Obsolete protobuf generated files under protocol/
> -
>
> Key: DRILL-4295
> URL: https://issues.apache.org/jira/browse/DRILL-4295
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Trivial
> Fix For: 1.6.0
>
>
> The following two files don't have a protobuf definition anymore, and are not 
> generated when running {{mvn process-sources -P proto-compile}} under 
> {{protocol/}}:
> {noformat}
> src/main/java/org/apache/drill/exec/proto/beans/RpcFailure.java
> src/main/java/org/apache/drill/exec/proto/beans/ViewPointer.java
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4331) TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4331.

Resolution: Fixed

Fixed in 32da4675e8bf1358b863532daadd2769f380600f

> TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8
> -
>
> Key: DRILL-4331
> URL: https://issues.apache.org/jira/browse/DRILL-4331
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
> Fix For: 1.6.0
>
>
> This test expects the following Project in the query plan:
> {noformat}
> Project(EXPR$0=[$1], rownum=[$0])
> {noformat}
> In Java 8, for some reason the scan operator exposes the columns in reverse 
> order which causes the project to be different than the one expected:
> {noformat}
> Project(EXPR$0=[$0], rownum=[$1])
> {noformat}
> The plan is still correct, so the test must be fixed





[jira] [Resolved] (DRILL-4359) EndpointAffinity missing equals method

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4359.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 6b1b4d257b89e5579140e75388cd37db5563a6a8

> EndpointAffinity missing equals method
> --
>
> Key: DRILL-4359
> URL: https://issues.apache.org/jira/browse/DRILL-4359
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Trivial
> Fix For: 1.6.0
>
>
> EndpointAffinity is a placeholder class, but has no equals method to allow 
> comparison.
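As an illustration of the request above, here is a minimal sketch of value-based equals and hashCode. AffinitySketch is a hypothetical, simplified stand-in, not Drill's actual EndpointAffinity class; its field names and types are assumptions made only for this example.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for Drill's EndpointAffinity; the field
// names and types here are assumptions made for illustration only.
final class AffinitySketch {
    private final String endpoint;  // assumed: some identifier for a node
    private final double affinity;  // assumed: a numeric weight

    AffinitySketch(String endpoint, double affinity) {
        this.endpoint = endpoint;
        this.affinity = affinity;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof AffinitySketch)) {
            return false;
        }
        AffinitySketch that = (AffinitySketch) other;
        // Double.compare treats NaN as equal to itself and 0.0 as different
        // from -0.0, which keeps equals() consistent with hashCode() below.
        return Double.compare(affinity, that.affinity) == 0
            && Objects.equals(endpoint, that.endpoint);
    }

    @Override
    public int hashCode() {
        return Objects.hash(endpoint, affinity);
    }
}
```

Overriding hashCode together with equals matters whenever instances end up in hash-based collections or are compared in test assertions.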





[jira] [Resolved] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4353.

Resolution: Fixed

Fixed in 282dfd762f1bd6628b293c68b20cdff321bd70a3

This was also merged into the 1.5 release branch. That commit has a different 
hash, because there were other changes already merged into master 
that we didn't want to include in the release.

> Expired sessions in web server are not cleaning up resources, leading to 
> resource leak
> --
>
> Key: DRILL-4353
> URL: https://issues.apache.org/jira/browse/DRILL-4353
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently we store the session resources (including DrillClient) in attribute 
> {{SessionAuthentication}} object which implements 
> {{HttpSessionBindingListener}}. Whenever a session is invalidated, all 
> attributes are removed and if an attribute class implements 
> {{HttpSessionBindingListener}}, listener is informed. 
> {{SessionAuthentication}} implementation of {{HttpSessionBindingListener}} 
> logs out the user which includes cleaning up the resources as well, but 
> {{SessionAuthentication}} relies on ServletContext stored in thread local 
> variable (see 
> [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]).
>  In the thread that cleans up expired sessions there is no 
> {{ServletContext}} in the thread-local variable, so the user is not logged 
> out properly and resources leak.
> Fix: Add {{HttpSessionEventListener}} to clean up the 
> {{SessionAuthentication}} and resources every time an HttpSession expires 
> or is invalidated.





[jira] [Resolved] (DRILL-4361) Allow for FileSystemPlugin subclasses to override FormatCreator

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4361.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 5e57b0e3b44f46aa93bf82f366eb3a3f61990da3

> Allow for FileSystemPlugin subclasses to override FormatCreator
> ---
>
> Key: DRILL-4361
> URL: https://issues.apache.org/jira/browse/DRILL-4361
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Minor
> Fix For: 1.6.0
>
>
> FileSystemPlugin subclasses are not able to customize plugins, as 
> FormatCreator is created in the FileSystemPlugin constructor and immediately used 
> to create SchemaFactory instance.
> FormatCreator instantiation should be moved to a protected method so that 
> subclass can choose to implement it differently.
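The refactoring proposed above can be sketched roughly as follows. All class and method names here (FormatCreatorSketch, FileSystemPluginSketch, newFormatCreator) are hypothetical stand-ins chosen for illustration; they are not Drill's real classes or signatures.

```java
// Hypothetical stand-ins; none of these are Drill's real classes or signatures.
class FormatCreatorSketch {
    String describe() {
        return "default formats";
    }
}

class FileSystemPluginSketch {
    private final String schemaDescription;

    FileSystemPluginSketch() {
        // The constructor asks an overridable hook for the creator instead of
        // instantiating FormatCreatorSketch directly.
        FormatCreatorSketch creator = newFormatCreator();
        this.schemaDescription = "schema built from " + creator.describe();
    }

    // Protected hook: subclasses may return a customized creator.
    // Caveat: it runs before the subclass constructor body, so an override
    // must not rely on subclass fields being initialized.
    protected FormatCreatorSketch newFormatCreator() {
        return new FormatCreatorSketch();
    }

    String schemaDescription() {
        return schemaDescription;
    }
}

class CustomPluginSketch extends FileSystemPluginSketch {
    @Override
    protected FormatCreatorSketch newFormatCreator() {
        return new FormatCreatorSketch() {
            @Override
            String describe() {
                return "custom formats";
            }
        };
    }
}
```

The caveat in the comment is the usual trade-off of calling an overridable method from a constructor; passing the creator in as a constructor argument would be an alternative design.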





[jira] [Resolved] (DRILL-4225) TestDateFunctions#testToChar fails when the locale is non-English

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4225.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 4e9b82562cf0fc46e759b89857ffb85e129a178b

> TestDateFunctions#testToChar fails when the locale is non-English
> -
>
> Key: DRILL-4225
> URL: https://issues.apache.org/jira/browse/DRILL-4225
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.4.0
> Environment: Mac OS X 10.10.5
>Reporter: Akihiko Kusanagi
> Fix For: 1.6.0
>
>
> Set the locale to ja_JP on Mac OS X: 
> {noformat}
> $ defaults read -g AppleLocale
> ja_JP
> {noformat}
> TestDateFunctions#testToChar fails with the following output:
> {noformat}
> Running org.apache.drill.exec.fn.impl.TestDateFunctions#testToChar
> 2008-2-23
> 12 20 30
> 2008 2 23 12:00:00
> ...
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 14.333 sec 
> <<< FAILURE! - in org.apache.drill.exec.fn.impl.TestDateFunctions
> testToChar(org.apache.drill.exec.fn.impl.TestDateFunctions)  Time elapsed: 
> 2.793 sec  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<2008-[Feb]-23> but was:<2008-[2]-23>
>   at 
> org.apache.drill.exec.fn.impl.TestDateFunctions.testCommon(TestDateFunctions.java:66)
>   at 
> org.apache.drill.exec.fn.impl.TestDateFunctions.testToChar(TestDateFunctions.java:139)
> ...
> Failed tests: 
>   TestDateFunctions.testToChar:139->testCommon:66 expected:<2008-[Feb]-23> 
> but was:<2008-[2]-23>
> {noformat}
> Test queries are like this:
> {noformat}
> to_char((cast('2008-2-23' as date)), '-MMM-dd')
> to_char(cast('12:20:30' as time), 'HH mm ss')
> to_char(cast('2008-2-23 12:00:00' as timestamp), ' MMM dd HH:mm:ss')
> {noformat}
> This failure occurs because org.joda.time.format.DateTimeFormat interprets 
> the pattern 'MMM' differently depending on the locale. This will probably 
> occur on other OS platforms as well.
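The locale dependence described above can be illustrated with a small sketch. It uses the JDK's java.time formatter as a stand-in for Joda-Time (a deliberate substitution, so that the example needs no external library), and the pattern "yyyy-MMM-dd" is an assumption chosen to match the expected 2008-Feb-23 output, not a pattern quoted from the test.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustration only: java.time is used here in place of Joda-Time, and the
// pattern is an assumption rather than the one used by TestDateFunctions.
final class LocaleFormatSketch {
    // Uses whatever locale the JVM was started with, so the text produced
    // for "MMM" can vary from machine to machine.
    static String formatWithDefaultLocale(LocalDate date) {
        return DateTimeFormatter.ofPattern("yyyy-MMM-dd").format(date);
    }

    // Pins the locale so that "MMM" always yields English abbreviations.
    static String formatWithEnglishLocale(LocalDate date) {
        return DateTimeFormatter.ofPattern("yyyy-MMM-dd", Locale.ENGLISH).format(date);
    }
}
```

A test that compares against an English month abbreviation would be stable with the second method regardless of the machine's locale, while the first depends on the environment.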





Please vote for proposed Drill talks for the Hadoop Summit

2016-02-05 Thread Jason Altekruse
Hello Drillers,

There are some great proposed talks for this year's Hadoop summit related
to Drill. Please help to promote Drill in the wider Big Data community by
taking a look through the list and voting for talks that sound good.

You don't need to register or anything to vote, it just asks for an e-mail
address.

http://hadoopsummit.uservoice.com/search?filter=ideas=drill

Thanks!
Jason


Re: [VOTE] Release Apache Drill 1.5.0 RC1

2016-02-05 Thread Jason Altekruse
This seems like a major issue: due to a resource leak, consistent usage of
the REST API can make a drillbit crash in less than an hour.

I have run tests on the patch rebased on the release branch, and Venki
reported that a stress test he completed showed his change fixes the file
handle leaks. I am going to call this vote closed and spin another release.

On Thu, Feb 4, 2016 at 2:48 PM, Venki Korukanti <venki.koruka...@gmail.com>
wrote:

> -1. Found a regression DRILL-4353, I think we should include it in 1.5.0
>
> On Wed, Feb 3, 2016 at 1:38 AM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
> > Hello all,
> >
> > I'd like to propose the second release candidate (rc1) of Apache Drill,
> > version
> > 1.5.0. It covers a total of 54 resolved JIRAs [1]. Thanks to everyone who
> > contributed to this release. This release candidate includes a small test
> > modification that was detailed on the vote thread for RC0.
> >
> > The tarball artifacts are hosted at [2] and the maven artifacts are
> hosted
> > at
> > [3]. This release candidate is based on commit
> > c3939c55cf3e274c9bcbc8ca860603e7197cfa16 located at [4].
> >
> > The vote will be open for the next ~72 hours ending at 7AM Pacific,
> > February 6, 2016.
> >
> > [ ] +1
> > [ ] +0
> > [ ] -1
> >
> > Here's my vote: +1
> >
> > Thanks,
> > Jason
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12332948
> > [2] http://people.apache.org/~json/apache-drill-1.5.0.rc1/
> > [3]
> https://repository.apache.org/content/repositories/orgapachedrill-1024
> > <https://repository.apache.org/content/repositories/orgapachedrill-1023/
> >
> > [4] https://github.com/jaltekruse/incubator-drill/tree/1.5-release-rc1
> >
>


[VOTE] Release Apache Drill 1.5.0 RC2

2016-02-05 Thread Jason Altekruse
Hello all,

I'd like to propose the third release candidate (rc2) of Apache Drill,
version
1.5.0. It covers a total of 55 resolved JIRAs [1]. Thanks to everyone who
contributed to this release. This release candidate includes a fix for
DRILL-4353, a major stability problem with the REST API that was identified
during the last vote.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted
at
[3]. This release candidate is based on commit
0a64888ba8d374e94435e2518e81352e677255ad located at [4].

The vote will be open for the next 96 hours (including an extra day as the
vote is happening over a weekend) ending at 11AM Pacific, February 9th,
2016.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

Thanks,
Jason

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12332948
[2] http://people.apache.org/~json/apache-drill-1.5.0.rc2/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1026
[4] https://github.com/jaltekruse/incubator-drill/tree/1.5-release-rc2


Re: where can I find the source code of parquet 1.8.1-drill-r0 ?

2016-02-05 Thread Jason Altekruse
Agreed, I will put this on my list, not sure when I'll get to it, but I
know it would be good to just have it set up.

On Fri, Feb 5, 2016 at 3:27 PM, Abdel Hakim Deneche <adene...@maprtech.com>
wrote:

> Thanks Jason,
>
> We should publish the source code in maven too. This would make it so much
> easier.
>
> On Fri, Feb 5, 2016 at 3:23 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
> >
> >
> https://github.com/dremio/parquet-mr/commit/c74a6b7a0ed7180c5759cea5d2157919c1e80c2b
> >
> > The current version is just the Parquet master branch where the
> bytebuffer
> > patch was merged, with the one new commit to declare the version number
> so
> > that we could deploy it and not be depending on a SNAPSHOT version.
> >
> > On Fri, Feb 5, 2016 at 3:19 PM, Abdel Hakim Deneche <
> adene...@maprtech.com
> > >
> > wrote:
> >
> > > Hey all,
> > >
> > > > Does anyone know where the source code of the parquet library
> > > > currently used by Drill is?
> > >
> > > --
> > >
> > > Abdelhakim Deneche
> > >
> > > Software Engineer
> > >
> > >   <http://www.mapr.com/>
> > >
> > >
> > >
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>
>


Re: setting session params in a rest request

2016-02-04 Thread Jason Altekruse
Venki,

Even for authenticated sessions, do we impose any kind of upper limit on
the number of concurrent connections? It seems like we should issue a
reasonable error message about an over-used Drillbit before the process
dies due to hitting the system file handle limit.

- Jason

On Thu, Feb 4, 2016 at 11:00 AM, Josh Schlesser  wrote:

> Thanks Venki,
> Let me know how I can help further.
>
> > On Feb 2, 2016, at 10:23 PM, Venki Korukanti 
> wrote:
> >
> > I think you made valid points. It makes sense to have session-less REST
> > calls in both auth enabled and disabled cases.
> >
> > In case of auth enabled:
> > 1) Session-less calls can be authenticated using Basic auth (this was
> > already asked on mailing list sometime back) as a start and move onto
> token
> > based auth later. These requests usually come from non-browsers. The only
> > issue is setting options before the query. For this we can implement your
> > suggestion of enhancing the query REST API to accept the options.
> > 2) Session-based call using form auth for browser based access. If we
> > enhance the UI to enter options in the query form, we don't need any
> > session on server actually.
> >
> > I will get a fix ASAP to remove the sessions in anonymous calls, as the
> > sessions are not reused in non-browser cases.
> >
> > Thanks
> > Venki
> >
> > On Tue, Feb 2, 2016 at 9:20 PM, Josh Schlesser 
> wrote:
> >
> >> No, it wasn’t logging out, it was just stopping, obviously that caused
> >> dangling sessions for the authenticated scenario.
> >>
> >> I don’t think that a short timeout for anonymous sessions is a good way to
> >> go for anonymous API calls.  Session management isn’t what anybody would
> >> expect when using a REST API that is anonymous in a server-to-server
> >> context.  I would expect to use a token for authorization for a
> >> server-to-server REST API as well.  I’m not saying that is what it should be
> >> here, but that is my general expectation based on using other APIs.  In the
> >> case of browser-to-server REST APIs, I have run into authentication for a
> >> browser session, with subsequent REST calls leaning on a browser cookie for
> >> persistent authentication.
> >>
> >> Removing sessions for anonymous calls seems like the right path and
> >> possibly an easy one, and I think it would be the behavior most
> >> developers expect.  I would advocate for sessionless, token-authenticated
> >> REST APIs when using authentication in the server-to-server case, and
> >> cookie-based auth with a session for the browser-to-server scenario. But
> >> it's really the browser that has a session, not the API per se; it's just
> >> piggybacking on a regular authenticated web session for the REST API calls.
> >>
> >> This would actually leave me in a quandary for what I am trying to do,
> >> which is to set the session configuration option 'store.format', but I can't
> >> think of any reason that those types of settings shouldn’t just be set on a
> >> per-request basis for a REST API.  In a server-to-server context for a REST
> >> API, keeping it sessionless means you could front a cluster of drillbits
> >> with a load balancer and not worry about dying nodes, sticky sessions,
> >> etc.
> >>
> >> I have to get something up and running quickly right now, so I'm rolling
> >> back to 1.4 and just spinning up a separate drillbit that will have the
> >> store.format system variable set to 'json'. It will be OK for me until a
> >> good long-term solution arrives in Drill.
> >>
> >> I’ll run the test with session_max_idle_secs set to 30 seconds on
> >> 1.5.0-SNAPSHOT to see if that gets rid of the file handle starvation
> >> problem, but keep in mind that means that users of the web console will
> >> have 30 seconds between pages or they have to authenticate again, which
> >> will probably be very annoying.  It doesn't seem like a good long-term
> >> solution either.
> >>
> >> How do you think all of this should work?  I look forward to staying
> >> involved.
> >>
> >> Cheers,
> >> Josh
> >>
> >>> On Feb 2, 2016, at 4:40 PM, Venki Korukanti  >
> >> wrote:
> >>>
> >>> When auth is *enabled*, is the worker process logging out after queries
> >> are
> >>> done? When auth is *disabled* can you set session_max_idle_secs in
> >>> drill.exec.http block in drill-override.conf to something like 30
> (secs)
> >>> and try? This way anonymous sessions are closed quickly and not kept
> for
> >>> 1hr (default value). I think we may need to avoid creating sessions in
> >>> anonymous mode (when auth is disabled).
> >>>
> >>> Thanks
> >>> Venki
> >>>
> >>> On Tue, Feb 2, 2016 at 4:02 PM, Josh Schlesser 
> >> wrote:
> >>>
>  I have a background worker process (on a server, not a browser) that
> >> kicks
>  off every minute or so and issues some queries sequentially to the
> rest
>  query 

[VOTE] Release Apache Drill 1.5.0 RC1

2016-02-03 Thread Jason Altekruse
Hello all,

I'd like to propose the second release candidate (rc1) of Apache Drill,
version
1.5.0. It covers a total of 54 resolved JIRAs [1]. Thanks to everyone who
contributed to this release. This release candidate includes a small test
modification that was detailed on the vote thread for RC0.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted
at
[3]. This release candidate is based on commit
c3939c55cf3e274c9bcbc8ca860603e7197cfa16 located at [4].

The vote will be open for the next ~72 hours ending at 7AM Pacific,
February 6, 2016.

[ ] +1
[ ] +0
[ ] -1

Here's my vote: +1

Thanks,
Jason

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12332948
[2] http://people.apache.org/~json/apache-drill-1.5.0.rc1/
[3] https://repository.apache.org/content/repositories/orgapachedrill-1024

[4] https://github.com/jaltekruse/incubator-drill/tree/1.5-release-rc1


[jira] [Resolved] (DRILL-4032) Drill unable to parse json files with schema changes

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4032.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Drill unable to parse json files with schema changes
> 
>
> Key: DRILL-4032
> URL: https://issues.apache.org/jira/browse/DRILL-4032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Blocker
> Fix For: 1.4.0
>
>
> git.commit.id.abbrev=bb69f22
> {code}
> select d.col2.col3  from reg1 d;
> Error: DATA_READ ERROR: Error parsing JSON - index: 0, length: 4 (expected: 
> range(0, 0))
> File  /drill/testdata/reg1/a.json
> Record  2
> Fragment 0:0
> {code}
> The folder reg1 contains 2 files
> File 1 : a.json
> {code}
> {"col1": "val1","col2": null}
> {"col1": "val1","col2": {"col3":"abc", "col4":"xyz"}}
> {code}
> File 2 : b.json
> {code}
> {"col1": "val1","col2": null}
> {"col1": "val1","col2": null}
> {code}
> Exception from the log file :
> {code}
> [Error Id: a7e3c716-838d-4f8f-9361-3727b98f04cd ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:165)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:205)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:113)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:103)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:130)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_71]
> at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  [hadoop-common-2.7.0-mapr-1506.jar:na]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunn

[jira] [Resolved] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4048.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Parquet reader corrupts dictionary encoded binary columns
> -
>
> Key: DRILL-4048
> URL: https://issues.apache.org/jira/browse/DRILL-4048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Jason Altekruse
>Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The below query returns corrupted data (not even showing up here) for binary 
> columns
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
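> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment               |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | 1          | 1552      | 93        | 1            | 17.0       | 24710.35        | 0.04       | 0.02  |              |              | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PE     | T          | egular courts above the |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+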
> {code}
> The same query from an older build (git.commit.id.abbrev=839f8da)
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
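> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate | l_shipinstruct    | l_shipmode | l_comment               |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+
> | 1          | 1552      | 93        | 1            | 17.0       | 24710.35        | 0.04       | 0.02  | N            | O            | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PERSON | TRUCK      | egular courts above the |
> +------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------+-------------------------+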
> {code}
> Below is the output of the parquet-meta command for this dataset
> {code}
> creator: parquet-mr 
> file schema: root 
> ---
> l_orderkey:  REQUIRED INT32 R:0 D:0
> l_partkey:   REQUIRED INT32 R:0 D:0
> l_suppkey:   REQUIRED INT32 R:0 D:0
> l_linenumber:REQUIRED INT32 R:0 D:0
> l_quantity:  REQUIRED DOUBLE R:0 D:0
> l_extendedprice: REQUIRED DOUBLE R:0 D:0
> l_discount:  REQUIRED DOUBLE R:0 D:0
> l_tax:   REQUIRED DOUBLE R:0 D:0
> l_returnflag:REQUIRED BINARY O:UTF8 R:0 D:0
> l_linestatus:REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipdate:  REQUIRED INT32 O:DATE R:0 D:0
> l_commitdate:REQUIRED INT32 O:DATE R:0 D:0
> l_receiptdate:   REQUIRED INT32 O:

[jira] [Resolved] (DRILL-4243) CTAS with partition by, results in Out Of Memory

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4243.

   Resolution: Fixed
Fix Version/s: 1.5.0

> CTAS with partition by, results in Out Of Memory
> 
>
> Key: DRILL-4243
> URL: https://issues.apache.org/jira/browse/DRILL-4243
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
> Fix For: 1.5.0
>
>
> CTAS with partition by, results in Out Of Memory. It seems to be coming from 
> ExternalSortBatch
> Details of Drill are
> {noformat}
> version:        1.5.0-SNAPSHOT
> commit_id:      e4372f224a4b474494388356355a53808092a67a
> commit_message: DRILL-4242: Updates to storage-mongo
> commit_time:    03.01.2016 @ 15:31:13 PST
> build_email:    Unknown
> build_time:     04.01.2016 @ 01:02:29 PST
>  create table `tpch_single_partition/lineitem` partition by (l_moddate) as 
> select l.*, l_shipdate - extract(day from l_shipdate) + 1 l_moddate from 
> cp.`tpch/lineitem.parquet` l;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while 
> executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010] 
> (state=,code=0)
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73)
>   at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338)
>   at 
> net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
>   at sqlline.Commands.execute(Commands.java:841)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:746)
>   at sqlline.SqlLine.runCommands(SqlLine.java:1651)
>   at sqlline.Commands.run(Commands.java:1304)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
>   at sqlline.SqlLine.dispatch(SqlLine.java:742)
>   at sqlline.SqlLine.initArgs(SqlLine.java:553)
>   at sqlline.SqlLine.begin(SqlLine.java:596)
>   at sqlline.SqlLine.start(SqlLine.java:375)
>   at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: RESOURCE 
> ERROR: One or more nodes ran out of memory while executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69)
>   at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400)
>   at 
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
>   at 
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264)
>   at 
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.j

[jira] [Resolved] (DRILL-4163) Support schema changes for MergeJoin operator.

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4163.

   Resolution: Fixed
Fix Version/s: 1.5.0

Fixed in cc9175c13270660ffd9ec2ddcbc70780dd72dada

> Support schema changes for MergeJoin operator.
> --
>
> Key: DRILL-4163
> URL: https://issues.apache.org/jira/browse/DRILL-4163
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: amit hadke
>    Assignee: Jason Altekruse
> Fix For: 1.5.0
>
>
> Since the external sort operator supports schema changes, allow the use of 
> union types in merge join to support schema changes.
> For now, we assume that merge join always works on record batches from the 
> sort operator. Thus merging schemas and promoting to union vectors is already 
> taken care of by the sort operator.
> Test Cases:
> 1) Only one side changes schema (join on union type and primitive type)
> 2) Both sides change schema on all columns.
> 3) Join between numeric types and string types.
> 4) Missing columns - each batch has different columns. 




