[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

2016-10-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/592


[GitHub] drill issue #634: DRILL-4974: NPE in FindPartitionConditions.analyzeCall() f...

2016-10-27 Thread bitblender
Github user bitblender commented on the issue:

https://github.com/apache/drill/pull/634
  
@amansinha100 Can you please review this change?


[GitHub] drill pull request #631: DRILL-4968: Add column size to ColumnMetadata

2016-10-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/631


Re: isDateCorrect field in ParquetTableMetadata

2016-10-27 Thread Paul Rogers
FWIW: back on the magic flag issue…

I noted Vitalii’s concern about “1.9” and “1.9-SNAPSHOT” being too coarse-grained 
for our needs.

A typical solution is to include the version of the Parquet writer in addition to 
that of Drill. Each time we change something in the writer, increment the 
version number. If we number changes, we can easily handle two changes in the 
same Drill release, or differentiate between the “early 1.9” files with 
old-style dates and the “late 1.9” files with correct dates.

Since we have no version now, start it at some arbitrary point (2?).

Now, if the Parquet file has a Drill Writer version in the header, and that 
version is 2 or greater, the date is in the “correct” format. For anything written 
by Drill before writer version 2, the date is wrong. The “check the data to see 
if it is sane” approach is needed only for files where we can’t tell whether an 
older Drill wrote them.
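
A minimal sketch of that check, assuming the writer version were carried as an 
integer-valued entry in the Parquet footer’s key/value metadata (the key name 
and method below are hypothetical, not Drill’s actual footer schema):

{noformat}
import java.util.Map;

public class WriterVersionCheck {
  // Hypothetical: first Drill writer version that emits correct dates.
  private static final int DATE_CORRECT_WRITER_VERSION = 2;

  // Returns true only when the footer proves the dates are correct; files
  // without a writer version fall back to the "is the data sane" heuristic.
  public static boolean datesKnownCorrect(Map<String, String> footerKeyValues) {
    String version = footerKeyValues.get("drill.writer.version"); // hypothetical key
    if (version == null) {
      return false; // old Drill, or a non-Drill writer
    }
    return Integer.parseInt(version) >= DATE_CORRECT_WRITER_VERSION;
  }
}
{noformat}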

Do other tools label the data? Does Hive say that it wrote the file? If so, we 
don’t need to do the sanity check when we can tell the data comes from Hive (or 
Impala, or anything other than old Drill).

- Paul
 
> On Oct 27, 2016, at 4:03 PM, Zelaine Fong  wrote:
> 
> Vitalii -- are you still planning to open a ticket and pull request for the
> fix you've noted below?
> 
> -- Zelaine
> 
> On Wed, Oct 26, 2016 at 8:28 AM, Vitalii Diravka 
> wrote:
> 
>> @Paul Rogers
>> It may be the undefined case when the file is generated with drill.version
>> = 1.9-SNAPSHOT.
>> It is more easy to determine corrupted date with this flag and there is no
>> need to wait the end of release to merge these changes.
>> 
>> @Jinfeng NI
>> It looks like you are right.
>> With consistent mode (isDateCorrect = true) all tests are passed. So I am
>> going to open a jira ticket for it with next changes
>> https://github.com/vdiravka/drill/commit/ff8d5c7d601915f760d1b0e9618730
>> 3410cac5d3
>> Thanks.
>> 
>> Kind regards
>> Vitalii
>> 
>> 2016-10-25 18:36 GMT+00:00 Jinfeng Ni :
>> 
>>> I'm not sure if I fully understand your answers. The bottom line is
>>> quite simple: given a set of parquet files, the ParquetTableMeta
>>> instance constructed in Drill should have identical value for
>>> "isDateCorrect", whether it comes from parquet footer, or parquet
>>> metadata cache, or whether there is partition pruning or not. However,
>>> the code shows that this flag is not in consistent mode across
>>> different cases.
>>> 
>>> 
>>> 
>>> On Tue, Oct 25, 2016 at 11:24 AM, Vitalii Diravka
>>>  wrote:
 Hi Jinfeng,
 
 1.If the parquet files are generated with Drill after Drill-4203 these
 files have "isDateCorrect = true" property.
 Drill serializes this property from metadata now. When we set this
>>> property
 in the first constructor we will hide the value from metadata.
 IsDateCorrect will be false only if this value equals to the false (no
>>> case
 for it now) or absent in parquet metadata footer.
 
 
 2. I'm not sure the reason to change isDateCorrect metadata property
>> when
 the user disable dates correction.
 If you have some use case it would be great if you provide it.
 
 3. Maybe you are right regarding to when Parquet metadata is cloned.
 Here I added the property in the same manner as Jason's new property
 "drillVersion. So need it a separate unit test?
 
 
 Kind regards
 Vitalii
 
 2016-10-25 16:23 GMT+00:00 Jinfeng Ni :
 
> Forgot to copy the link to the code.
> 
> [1] https://github.com/apache/drill/blob/master/exec/java-
> exec/src/main/java/org/apache/drill/exec/store/parquet/
> Metadata.java#L950-L955
> 
> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni  wrote:
>> @Jason, @Vitalli,
>> 
>> Any thoughts on this question, since both you worked on fix of
> DRILL-4203?
>> 
>> Looking through the code, there is a third case [1], where this flag
>> is set to false when Parquet metadata is cloned (after partition
>> pruning, etc).  That means, for the 2nd case where the flag is set
>> to
>> true, if there is pruning happening, the new parquet metadata will
>> see
>> the flag is flipped to false. This does not make sense to me.
>> 
>> 
>> 
>> On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni  wrote:
>>> Hello All,
>>> 
>>> DRILL-4203 addressed the date field issue.  In the fix, it
>> introduced
>>> a new field in ParquetTableMetadata_v2 : isDateCorrect.  I have
>> some
>>> difficulty in understanding the meaning of this field.
>>> 
>>> According to [1], this field is set to false, when Drill gets
>> parquet
>>> metadata from parquet footer.  This field is  set to true in code
>>> flow
>>> of [2] and [3], when Drill gets parquet metadata from meta data
>>> cache.
>>> 
>>> Questions I have:

[GitHub] drill pull request #634: DRILL-4974: NPE in FindPartitionConditions.analyzeC...

2016-10-27 Thread bitblender
GitHub user bitblender opened a pull request:

https://github.com/apache/drill/pull/634

DRILL-4974: NPE in FindPartitionConditions.analyzeCall() for 'holistic' 
expressions

Changes: Added a missing null check in 
FindPartitionConditions.analyzeCall(), so that the value returned by opStack.peek() 
is dereferenced only after a null check. Without this check, opStack.peek() can 
return null when the expression is 'holistic', and dereferencing that value 
causes an NPE.
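
For illustration, a guarded peek looks roughly like the sketch below (the class 
and field names are made up, not the actual FindPartitionConditions code; 
Deque.peek() returns null on an empty stack, so its result must be checked 
before use):

{noformat}
import java.util.ArrayDeque;
import java.util.Deque;

class PeekNullCheckSketch {
  private final Deque<String> opStack = new ArrayDeque<>();

  void markTopHolistic() {
    String top = opStack.peek();   // null when nothing was pushed
    if (top != null) {             // the missing guard
      System.out.println("holistic op: " + top.toUpperCase());
    }
  }
}
{noformat}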

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bitblender/drill DRILL-4974

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #634


commit a519a0987280abeb00e33a8088d2f7d6c9809eed
Author: karthik 
Date:   2016-10-20T20:43:17Z

DRILL-4974: Add missing null check in FindPartitionConditions.analyzeCall()




[GitHub] drill issue #600: DRILL-4373: Drill and Hive have incompatible timestamp rep...

2016-10-27 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/600
  
Changing this to -1 until the unit test failure is addressed.


[GitHub] drill pull request #600: DRILL-4373: Drill and Hive have incompatible timest...

2016-10-27 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/600#discussion_r85449218
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 ---
@@ -739,30 +741,76 @@ public void runTestAndValidate(String selection, 
String validationSelection, Str
   }
 
   /*
-  Test the reading of an int96 field. Impala encodes timestamps as int96 
fields
+Impala encodes timestamp values as int96 fields. Test the reading of 
an int96 field with two converters:
+the first one converts parquet INT96 into drill VARBINARY and the 
second one (works while
+store.parquet.reader.int96_as_timestamp option is enabled) converts 
parquet INT96 into drill TIMESTAMP.
*/
   @Test
   public void testImpalaParquetInt96() throws Exception {
 compareParquetReadersColumnar("field_impala_ts", 
"cp.`parquet/int96_impala_1.parquet`");
+try {
+  test("alter session set %s = true", 
ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+  compareParquetReadersColumnar("field_impala_ts", 
"cp.`parquet/int96_impala_1.parquet`");
+} finally {
+  test("alter session reset %s", 
ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+}
   }
 
   /*
-  Test the reading of a binary field where data is in dicationary _and_ 
non-dictionary encoded pages
+  Test the reading of a binary field as drill varbinary where data is in 
dicationary _and_ non-dictionary encoded pages
*/
   @Test
-  public void testImpalaParquetVarBinary_DictChange() throws Exception {
+  public void testImpalaParquetBinaryAsVarBinary_DictChange() throws 
Exception {
 compareParquetReadersColumnar("field_impala_ts", 
"cp.`parquet/int96_dict_change.parquet`");
   }
 
   /*
+  Test the reading of a binary field as drill timestamp where data is in 
dicationary _and_ non-dictionary encoded pages
+   */
+  @Test
+  public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws 
Exception {
+final String WORKING_PATH = TestTools.getWorkingPath();
+final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+try {
+  testBuilder()
+  .sqlQuery("select int96_ts from 
dfs_test.`%s/parquet/int96_dict_change`", TEST_RES_PATH)
+  .optionSettingQueriesForTestQuery(
+  "alter session set `%s` = true", 
ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
+  .ordered()
+  
.csvBaselineFile("testframework/testParquetReader/testInt96DictChange/q1.tsv")
+  .baselineTypes(TypeProtos.MinorType.TIMESTAMP)
+  .baselineColumns("int96_ts")
+  .build().run();
+} finally {
+  test("alter system reset `%s`", 
ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+}
+  }
+
+  /*
  Test the conversion from int96 to impala timestamp
*/
   @Test
-  public void testImpalaParquetTimestampAsInt96() throws Exception {
+  public void testTimestampImpalaConvertFrom() throws Exception {
 compareParquetReadersColumnar("convert_from(field_impala_ts, 
'TIMESTAMP_IMPALA')", "cp.`parquet/int96_impala_1.parquet`");
   }
 
   /*
+ Test reading parquet Int96 as TimeStamp and comparing obtained values 
with the
+ old results (reading the same values as VarBinary and 
convert_fromTIMESTAMP_IMPALA function using)
+   */
+  @Test
+  public void testImpalaParquetTimestampInt96AsTimeStamp() throws 
Exception {
--- End diff --

The test testImpalaParquetTimestampInt96AsTimeStamp fails when run in a 
different timezone. Can you mark this as @Ignore unless you can fix the test to 
run across different timezones?


[GitHub] drill issue #631: DRILL-4968: Add column size to ColumnMetadata

2016-10-27 Thread adeneche
Github user adeneche commented on the issue:

https://github.com/apache/drill/pull/631
  
+1, LGTM


Re: isDateCorrect field in ParquetTableMetadata

2016-10-27 Thread Jason Altekruse
Vitalii,

Thank you for looking into this; sorry I missed it in the review. When you
open a request to fix this issue, could you update the check for correctness
in the metadata to check for the is.date.correct flag, or for a version greater
than or equal to 1.9.0 (no snapshot)? This will allow us to stop writing the
flag into the metadata at the release or shortly thereafter.
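
A rough sketch of that condition, with hypothetical names (Drill's real
metadata fields and version handling differ), assuming the flag and the
drill.version string are read from the Parquet metadata:

{noformat}
public class DateCorrectnessCheck {
  // True if the explicit flag is set, or the file was written by a released
  // Drill >= 1.9.0 (a -SNAPSHOT build does not count).
  public static boolean datesCorrect(String isDateCorrectFlag, String drillVersion) {
    if (Boolean.parseBoolean(isDateCorrectFlag)) {
      return true;
    }
    if (drillVersion == null || drillVersion.endsWith("-SNAPSHOT")) {
      return false;
    }
    String[] parts = drillVersion.split("\\.");
    if (parts.length < 2) {
      return false;
    }
    int major = Integer.parseInt(parts[0]);
    int minor = Integer.parseInt(parts[1]);
    return major > 1 || (major == 1 && minor >= 9);
  }
}
{noformat}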

It might be worth looking at how we can catch issues like this related to
plan serialization. There were pretty thorough tests with the patch, but we
still have code paths that only come up with remote fragment usage, which we
could test in other ways to avoid bugs like this.

- Jason

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Oct 27, 2016 at 4:03 PM, Zelaine Fong  wrote:

> Vitalii -- are you still planning to open a ticket and pull request for the
> fix you've noted below?
>
> -- Zelaine
>
> On Wed, Oct 26, 2016 at 8:28 AM, Vitalii Diravka <
> vitalii.dira...@gmail.com>
> wrote:
>
> > @Paul Rogers
> > It may be the undefined case when the file is generated with
> drill.version
> > = 1.9-SNAPSHOT.
> > It is more easy to determine corrupted date with this flag and there is
> no
> > need to wait the end of release to merge these changes.
> >
> > @Jinfeng NI
> > It looks like you are right.
> > With consistent mode (isDateCorrect = true) all tests are passed. So I am
> > going to open a jira ticket for it with next changes
> > https://github.com/vdiravka/drill/commit/ff8d5c7d601915f760d1b0e9618730
> > 3410cac5d3
> > Thanks.
> >
> > Kind regards
> > Vitalii
> >
> > 2016-10-25 18:36 GMT+00:00 Jinfeng Ni :
> >
> > > I'm not sure if I fully understand your answers. The bottom line is
> > > quite simple: given a set of parquet files, the ParquetTableMeta
> > > instance constructed in Drill should have identical value for
> > > "isDateCorrect", whether it comes from parquet footer, or parquet
> > > metadata cache, or whether there is partition pruning or not. However,
> > > the code shows that this flag is not in consistent mode across
> > > different cases.
> > >
> > >
> > >
> > > On Tue, Oct 25, 2016 at 11:24 AM, Vitalii Diravka
> > >  wrote:
> > > > Hi Jinfeng,
> > > >
> > > > 1.If the parquet files are generated with Drill after Drill-4203
> these
> > > > files have "isDateCorrect = true" property.
> > > > Drill serializes this property from metadata now. When we set this
> > > property
> > > > in the first constructor we will hide the value from metadata.
> > > > IsDateCorrect will be false only if this value equals to the false
> (no
> > > case
> > > > for it now) or absent in parquet metadata footer.
> > > >
> > > >
> > > > 2. I'm not sure the reason to change isDateCorrect metadata property
> > when
> > > > the user disable dates correction.
> > > > If you have some use case it would be great if you provide it.
> > > >
> > > > 3. Maybe you are right regarding to when Parquet metadata is cloned.
> > > > Here I added the property in the same manner as Jason's new property
> > > > "drillVersion. So need it a separate unit test?
> > > >
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > > > 2016-10-25 16:23 GMT+00:00 Jinfeng Ni :
> > > >
> > > >> Forgot to copy the link to the code.
> > > >>
> > > >> [1] https://github.com/apache/drill/blob/master/exec/java-
> > > >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> > > >> Metadata.java#L950-L955
> > > >>
> > > >> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni  wrote:
> > > >> > @Jason, @Vitalli,
> > > >> >
> > > >> > Any thoughts on this question, since both you worked on fix of
> > > >> DRILL-4203?
> > > >> >
> > > >> > Looking through the code, there is a third case [1], where this
> flag
> > > >> > is set to false when Parquet metadata is cloned (after partition
> > > >> > pruning, etc).  That means, for the 2nd case where the flag is set
> > to
> > > >> > true, if there is pruning happening, the new parquet metadata will
> > see
> > > >> > the flag is flipped to false. This does not make sense to me.
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni 
> wrote:
> > > >> >> Hello All,
> > > >> >>
> > > >> >> DRILL-4203 addressed the date field issue.  In the fix, it
> > introduced
> > > >> >> a new field in ParquetTableMetadata_v2 : isDateCorrect.  I have
> > some
> > > >> >> difficulty in understanding the meaning of this field.
> > > >> >>
> > > >> >> According to [1], this field is set to false, when Drill gets
> > parquet
> > > >> >> metadata from parquet footer.  This field is  set to true in code
> > > flow
> > > >> >> of [2] and [3], when Drill gets parquet metadata from meta data
> > > cache.
> > > >> >>
> > > >> >> Questions I have:
> > > >> >> 1.  If the parquet files are generated with Drill after
> DRILL-4203,
> > > >> >> Drill still thinks date field is NOT correct (isDateCorrect 

Re: isDateCorrect field in ParquetTableMetadata

2016-10-27 Thread Zelaine Fong
Vitalii -- are you still planning to open a ticket and pull request for the
fix you've noted below?

-- Zelaine

On Wed, Oct 26, 2016 at 8:28 AM, Vitalii Diravka 
wrote:

> @Paul Rogers
> It may be the undefined case when the file is generated with drill.version
> = 1.9-SNAPSHOT.
> It is more easy to determine corrupted date with this flag and there is no
> need to wait the end of release to merge these changes.
>
> @Jinfeng NI
> It looks like you are right.
> With consistent mode (isDateCorrect = true) all tests are passed. So I am
> going to open a jira ticket for it with next changes
> https://github.com/vdiravka/drill/commit/ff8d5c7d601915f760d1b0e9618730
> 3410cac5d3
> Thanks.
>
> Kind regards
> Vitalii
>
> 2016-10-25 18:36 GMT+00:00 Jinfeng Ni :
>
> > I'm not sure if I fully understand your answers. The bottom line is
> > quite simple: given a set of parquet files, the ParquetTableMeta
> > instance constructed in Drill should have identical value for
> > "isDateCorrect", whether it comes from parquet footer, or parquet
> > metadata cache, or whether there is partition pruning or not. However,
> > the code shows that this flag is not in consistent mode across
> > different cases.
> >
> >
> >
> > On Tue, Oct 25, 2016 at 11:24 AM, Vitalii Diravka
> >  wrote:
> > > Hi Jinfeng,
> > >
> > > 1.If the parquet files are generated with Drill after Drill-4203 these
> > > files have "isDateCorrect = true" property.
> > > Drill serializes this property from metadata now. When we set this
> > property
> > > in the first constructor we will hide the value from metadata.
> > > IsDateCorrect will be false only if this value equals to the false (no
> > case
> > > for it now) or absent in parquet metadata footer.
> > >
> > >
> > > 2. I'm not sure the reason to change isDateCorrect metadata property
> when
> > > the user disable dates correction.
> > > If you have some use case it would be great if you provide it.
> > >
> > > 3. Maybe you are right regarding to when Parquet metadata is cloned.
> > > Here I added the property in the same manner as Jason's new property
> > > "drillVersion. So need it a separate unit test?
> > >
> > >
> > > Kind regards
> > > Vitalii
> > >
> > > 2016-10-25 16:23 GMT+00:00 Jinfeng Ni :
> > >
> > >> Forgot to copy the link to the code.
> > >>
> > >> [1] https://github.com/apache/drill/blob/master/exec/java-
> > >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> > >> Metadata.java#L950-L955
> > >>
> > >> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni  wrote:
> > >> > @Jason, @Vitalli,
> > >> >
> > >> > Any thoughts on this question, since both you worked on fix of
> > >> DRILL-4203?
> > >> >
> > >> > Looking through the code, there is a third case [1], where this flag
> > >> > is set to false when Parquet metadata is cloned (after partition
> > >> > pruning, etc).  That means, for the 2nd case where the flag is set
> to
> > >> > true, if there is pruning happening, the new parquet metadata will
> see
> > >> > the flag is flipped to false. This does not make sense to me.
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni  wrote:
> > >> >> Hello All,
> > >> >>
> > >> >> DRILL-4203 addressed the date field issue.  In the fix, it
> introduced
> > >> >> a new field in ParquetTableMetadata_v2 : isDateCorrect.  I have
> some
> > >> >> difficulty in understanding the meaning of this field.
> > >> >>
> > >> >> According to [1], this field is set to false, when Drill gets
> parquet
> > >> >> metadata from parquet footer.  This field is  set to true in code
> > flow
> > >> >> of [2] and [3], when Drill gets parquet metadata from meta data
> > cache.
> > >> >>
> > >> >> Questions I have:
> > >> >> 1.  If the parquet files are generated with Drill after DRILL-4203,
> > >> >> Drill still thinks date field is NOT correct (isDateCorrect =
> false)?
> > >> >> 2.  Why does this field have nothing to do with "autoCorrection"
> flag
> > >> >> [4]?  If someone turns off autoCorrection, will it have impact on
> > this
> > >> >> "isDateCorrect" flag ?
> > >> >>
> > >> >> Thanks in advance for any input,
> > >> >>
> > >> >> Jinfeng
> > >> >>
> > >> >>
> > >> >> [1] https://github.com/apache/drill/blob/master/exec/java-
> > >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> > Metadata.java#L932
> > >> >> [2] https://github.com/apache/drill/blob/master/exec/java-
> > >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> > Metadata.java#L936
> > >> >> [3] https://github.com/apache/drill/blob/master/exec/java-
> > >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> > Metadata.java#L187
> > >> >> [4] https://github.com/apache/drill/blob/master/exec/java-
> > >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> > >> Metadata.java#L354-L355
> > >>
> >
>


[GitHub] drill issue #602: Improve Drill C++ connector

2016-10-27 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/602
  
No need for many PRs. Let's just squash this branch into commits for the 
individual JIRAs, I think.


[GitHub] drill issue #622: DRILL-4369: Exchange name and version infos during handsha...

2016-10-27 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/622
  
Thanks for the responses! Sorry I posted my comments just after the commit. 
The issues are "nice-to-haves" for the original PR and do not justify another PR.


[GitHub] drill issue #602: Improve Drill C++ connector

2016-10-27 Thread laurentgo
Github user laurentgo commented on the issue:

https://github.com/apache/drill/pull/602
  
@parthchandra do you want individual pull requests for each issue, or are 
you okay having this branch squashed and merged?


[GitHub] drill pull request #602: Improve Drill C++ connector

2016-10-27 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/602#discussion_r85436247
  
--- Diff: contrib/native/client/cmakeModules/FindZookeeper.cmake ---
@@ -30,8 +30,10 @@
 if (MSVC)
 if(${CMAKE_BUILD_TYPE} MATCHES "Debug")
 set(ZK_BuildOutputDir "Debug")
+set(ZK_LibName "zookeeper_d")
 else()
 set(ZK_BuildOutputDir "Release")
+set(ZK_LibName "zookeeper_d")
--- End diff --

yes, agreed, it's a typo...


[GitHub] drill pull request #602: Improve Drill C++ connector

2016-10-27 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/602#discussion_r85434357
  
--- Diff: contrib/native/client/cmakeModules/FindZookeeper.cmake ---
@@ -30,8 +30,10 @@
 if (MSVC)
 if(${CMAKE_BUILD_TYPE} MATCHES "Debug")
 set(ZK_BuildOutputDir "Debug")
+set(ZK_LibName "zookeeper_d")
 else()
 set(ZK_BuildOutputDir "Release")
+set(ZK_LibName "zookeeper_d")
--- End diff --

Spent the better part of a day on this now. The Zookeeper project has defined 
only the 32-bit configurations, and these have the lib name set to 
'zookeeper_d.lib' for debug and 'zookeeper.lib' for release. 
When creating a 64-bit configuration, depending on the path you choose, you 
can end up with either zookeeper_d.lib or zookeeper.lib as the import library 
name in both the release and debug build configurations. 
For correctness, I think we should use 'zookeeper' and not 'zookeeper_d' for 
the release build.


[GitHub] drill pull request #602: Improve Drill C++ connector

2016-10-27 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/602#discussion_r85434499
  
--- Diff: contrib/native/client/cmakeModules/FindCppUnit.cmake ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# A simple cmake module to find CppUnit (inspired by
+# 
http://root.cern.ch/viewvc/trunk/cint/reflex/cmake/modules/FindCppUnit.cmake)
+
+#
+# - Find CppUnit
+# This module finds an installed CppUnit package.
+#
+# It sets the following variables:
+#  CPPUNIT_FOUND   - Set to false if CppUnit isn't found.
+#  CPPUNIT_INCLUDE_DIR - The CppUnit include directory.
+#  CPPUNIT_LIBRARY - The CppUnit library to link against.
+
+if (MSVC)
+if (${CMAKE_BUILD_TYPE} MATCHES "Debug")
+set(CPPUNIT_BuildOutputDir "Debug")
+set(CPPUNIT_LibName "cppunitd")
+else()
+set(CPPUNIT_BuildOutputDir "Release")
+set(CPPUNIT_LibName "cppunit")
+endif()
+if ("${CPPUNIT_HOME}_" MATCHES  "^_$")
+message(" ")
+message("- Please set the cache variable CPPUNIT_HOME to point to 
the directory with the zookeeper source.")
--- End diff --

My fault really, but I suppose this should be "cppunit source" and not 
"zookeeper source".


Re: Filter appears above project in query plan - Null Equality Join

2016-10-27 Thread Khurram Faraaz
Ok, if there is not much of a performance gain as suggested in the
responses, then there is not much that can be done in this case.

Thanks,
Khurram

On Thu, Oct 27, 2016 at 3:41 AM, Jinfeng Ni  wrote:

> Also, if the project operator is not doing any expression evaluation,
> the project operator itself would not introduce any big overhead.
> There probably is no big benefit if we push filter past the project
> operator, as Zelaine said.
>
>
> On Wed, Oct 26, 2016 at 3:07 PM, Jinfeng Ni  wrote:
> > The project under filter is for dynamic expansion of * column. Since
> > the join filter is referring to columns in the project's output, it's
> > not possible to push filter past that project.
> >
> >
> >
> > On Wed, Oct 26, 2016 at 1:50 PM, Zelaine Fong 
> wrote:
> >> The filter, I assume you're referring to, is a join filter.  So, at a
> >> minimum, it needs to be applied after the hash join.  I'm not sure
> there's
> >> a lot of benefit in pushing that filter past the project that's on top
> of
> >> the hash join.
> >>
> >> -- Zelaine
> >>
> >> On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz 
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> Filter is seen on top of Project in query plan for a null equality
> join.
> >>> This is over CSV data, shouldn't the filter appear below the project
> in the
> >>> query plan ?
> >>> I am on Drill 1.9.0 git commit id: a29f1e29
> >>>
> >>> Note : t1 has some nulls in it
> >>>t2 does not have any nulls in it.
> >>>
> >>> {noformat}
> >>> 0: jdbc:drill:schema=dfs.tmp> explain plan for
> >>> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2
> >>> ON t1.columns[0] = t2.columns[0]
> >>> WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0]
> >>> OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL );
> >>> +--+--+
> >>> | text | json |
> >>> +--+--+
> >>> | 00-00Screen
> >>> 00-01  ProjectAllowDup(*=[$0], *0=[$1])
> >>> 00-02Project(T43¦¦*=[$0], T44¦¦*=[$2])
> >>> 00-03  SelectionVectorRemover
> >>> 00-04Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)),
> IS
> >>> NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)),
> =(ITEM($1,
> >>> 0), ITEM($3, 0:BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS
> >>> NULL(ITEM($3, 0])
> >>> 00-05  Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3],
> >>> columns0=[$4])
> >>> 00-06HashJoin(condition=[=($2, $5)], joinType=[inner])
> >>> 00-07  Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2])
> >>> 00-09Project(T44¦¦*=[$0], columns=[$1],
> $f2=[ITEM($1,
> >>> 0)])
> >>> 00-11  Project(T44¦¦*=[$0], columns=[$1])
> >>> 00-13Scan(groupscan=[EasyGroupScan
> >>> [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1,
> columns=[`*`],
> >>> files=[maprfs:///tmp/oneColWOnulls.csv]]])
> >>> 00-08  Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1,
> >>> 0)])
> >>> 00-10Project(T43¦¦*=[$0], columns=[$1])
> >>> 00-12  Scan(groupscan=[EasyGroupScan
> >>> [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1,
> >>> columns=[`*`],
> >>> files=[maprfs:///tmp/oneColDupsWnulls.csv]]])
> >>> {noformat}
> >>>
> >>> Results returned by query
> >>>
> >>> {noformat}
> >>> 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1
> JOIN
> >>> `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE
> t1.columns[0]
> >>> IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND
> >>> t2.columns[0] IS NULL );
> >>> +-+-+
> >>> |   columns   |  columns0   |
> >>> +-+-+
> >>> | ["test"]| ["test"]|
> >>> | ["foo"] | ["foo"] |
> >>> | ["foo"] | ["foo"] |
> >>> | ["bar"] | ["bar"] |
> >>> | ["yes"] | ["yes"] |
> >>> | ["yes"] | ["yes"] |
> >>> | ["no"]  | ["no"]  |
> >>> | ["no"]  | ["no"]  |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["never"]   | ["never"]   |
> >>> | ["never"]   | ["never"]   |
> >>> | ["ever"]| ["ever"]|
> >>> | ["ever"]| ["ever"]|
> >>> | ["here"]| ["here"]|
> >>> | ["there"]   | ["there"]   |
> >>> | ["no"]  | ["no"]  |
> >>> | ["no"]  | ["no"]  |
> >>> | ["yes"] | ["yes"] |
> >>> | ["yes"] | ["yes"] |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["temp"]| ["temp"]|
> >>> +-+-+
> >>> 23 rows selected (0.341 seconds)
> >>> {noformat}
> >>>
> >>> Thanks,
> >>> Khurram
> >>>
>


[jira] [Created] (DRILL-4973) Sqlline history

2016-10-27 Thread Andries Engelbrecht (JIRA)
Andries Engelbrecht created DRILL-4973:
--

 Summary: Sqlline history
 Key: DRILL-4973
 URL: https://issues.apache.org/jira/browse/DRILL-4973
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - CLI
Reporter: Andries Engelbrecht
Priority: Minor


Currently the history on sqlline stops working after 500 queries have been 
logged in the user's .sqlline/history file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: TO_TIMESTAMP function returns incorrect results

2016-10-27 Thread Serhii Harnyk
Hello, Khurram

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

s   second of minute   number   55
S   fraction of second   number   978
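
In other words, the trailing ".10" in the input needs the fraction-of-second
letter 'S', not the second-of-minute letter 's'. A minimal Joda-Time sketch of
the intended pattern (the 'UTC' suffix is dropped here because Joda's
DateTimeFormat documents that zone names, 'z', cannot be parsed):

{noformat}
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class FractionOfSecondExample {
  public static void main(String[] args) {
    // 'SS' parses ".10" as 100 milliseconds instead of overwriting the seconds.
    DateTimeFormatter fmt = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss.SS");
    DateTime ts = fmt.parseDateTime("2015-03-30 20:49:59.10");
    System.out.println(ts);  // 2015-03-30T20:49:59.100 in the JVM default zone
  }
}
{noformat}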



2016-10-27 13:54 GMT+03:00 Khurram Faraaz :

> All,
>
> I am on Drill 1.9.0 git commit ID : a29f1e29 on CentOS
>
> TO_TIMESTAMP function does not return correct results, note that the
> minutes, seconds and milliseconds parts of timestamp are incorrect in the
> results
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> VALUES(TO_TIMESTAMP('2015-03-30 20:49:59.10
> UTC', '-MM-dd HH:mm:ss.s z'));
> ++
> | EXPR$0 |
> ++
> | 2015-03-30 20:49:10.0  |
> ++
> 1 row selected (0.228 seconds)
> {noformat}
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> VALUES(CAST(TO_TIMESTAMP('2015-03-30
> 20:49:59.10 UTC', '-MM-dd HH:mm:ss.s z') AS TIMESTAMP));
> ++
> | EXPR$0 |
> ++
> | 2015-03-30 20:49:10.0  |
> ++
> 1 row selected (0.265 seconds)
> {noformat}
>
> This case returns correct results, when the same string used above is given
> as input to CAST function, note that minutes mm, seconds ss and millisecond
> s parts are honored
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> VALUES(CAST('2015-03-30 20:49:59.10 UTC' AS
> TIMESTAMP));
> ++
> | EXPR$0 |
> ++
> | 2015-03-30 20:49:59.1  |
> ++
> 1 row selected (0.304 seconds)
> {noformat}
>
> Thanks,
> Khurram
>


TO_TIMESTAMP function returns incorrect results

2016-10-27 Thread Khurram Faraaz
All,

I am on Drill 1.9.0 git commit ID : a29f1e29 on CentOS

The TO_TIMESTAMP function does not return correct results; note that the
minutes, seconds, and milliseconds parts of the timestamp are incorrect in the
results

{noformat}
0: jdbc:drill:schema=dfs.tmp> VALUES(TO_TIMESTAMP('2015-03-30 20:49:59.10
UTC', '-MM-dd HH:mm:ss.s z'));
++
| EXPR$0 |
++
| 2015-03-30 20:49:10.0  |
++
1 row selected (0.228 seconds)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> VALUES(CAST(TO_TIMESTAMP('2015-03-30
20:49:59.10 UTC', '-MM-dd HH:mm:ss.s z') AS TIMESTAMP));
++
| EXPR$0 |
++
| 2015-03-30 20:49:10.0  |
++
1 row selected (0.265 seconds)
{noformat}

This case returns correct results when the same string used above is given
as input to the CAST function; note that the minutes (mm), seconds (ss), and
fractional-second (s) parts are honored

{noformat}
0: jdbc:drill:schema=dfs.tmp> VALUES(CAST('2015-03-30 20:49:59.10 UTC' AS
TIMESTAMP));
++
| EXPR$0 |
++
| 2015-03-30 20:49:59.1  |
++
1 row selected (0.304 seconds)
{noformat}

Thanks,
Khurram