[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/592 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #634: DRILL-4974: NPE in FindPartitionConditions.analyzeCall() f...
Github user bitblender commented on the issue: https://github.com/apache/drill/pull/634 @amansinha100 Can you please review this change.
[GitHub] drill pull request #631: DRILL-4968: Add column size to ColumnMetadata
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/631
Re: isDateCorrect field in ParquetTableMetadata
FWIW: back on the magic flag issue… I noted Vitalii's concern about "1.9" and "1.9-SNAPSHOT" being too coarse-grained for our needs.

A typical solution is to include the version of the Parquet writer in addition to that of Drill. Each time we change something in the writer, we increment the version number. If we number changes, we can easily handle two changes in the same Drill release, or differentiate between the "early 1.9" files with old-style dates and "late 1.9" files with correct dates. Since we have no version now, start it at some arbitrary point (2?).

Now, if the Parquet file has a Drill Writer version in the header, and that version is 2 or greater, the date is in the "correct" format. For anything written by Drill before writer version 2, the date is wrong.

The "check the data to see if it is sane" approach is needed only for files where we can't tell if an older Drill wrote it. Do other tools label the data? Does Hive say that it wrote the file? If so, we don't need to do the sanity check if we can tell the data comes from Hive (or Impala, or anything other than old Drill).

- Paul
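The decision rule proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, the `createdBy` label and the nullable `drillWriterVersion` header value are hypothetical names, not existing Drill or Parquet APIs, and the threshold 2 is the arbitrary starting point suggested in the message.

```java
import java.util.Locale;

// Hypothetical sketch of the "writer version" rule; not Drill's actual code.
final class WriterVersionCheck {
    /** First writer version assumed to store dates correctly (arbitrary, per the proposal). */
    static final int FIRST_CORRECT_WRITER_VERSION = 2;

    enum DateStatus { CORRECT, CORRUPT, NEEDS_SANITY_CHECK }

    /**
     * @param createdBy          hypothetical label naming the tool that wrote the file, or null if unlabeled
     * @param drillWriterVersion hypothetical Drill writer version from the file header, or null if absent
     */
    static DateStatus classify(String createdBy, Integer drillWriterVersion) {
        if (drillWriterVersion != null) {
            // A versioned Drill file: the version alone decides.
            return drillWriterVersion >= FIRST_CORRECT_WRITER_VERSION
                    ? DateStatus.CORRECT
                    : DateStatus.CORRUPT;
        }
        if (createdBy == null) {
            // Nothing identifies the writer, so only inspecting the data can tell.
            return DateStatus.NEEDS_SANITY_CHECK;
        }
        if (createdBy.toLowerCase(Locale.ROOT).contains("drill")) {
            // Written by a Drill that predates writer versions.
            return DateStatus.CORRUPT;
        }
        // Labeled as coming from another tool (Hive, Impala, ...).
        return DateStatus.CORRECT;
    }
}
```

With a rule like this, the data sanity check runs only for unlabeled files, which is the narrowing the message argues for.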
[GitHub] drill pull request #634: DRILL-4974: NPE in FindPartitionConditions.analyzeC...
GitHub user bitblender opened a pull request:

    https://github.com/apache/drill/pull/634

    DRILL-4974: NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions

    Changes: Added a missing null check in FindPartitionConditions.analyzeCall(), to ensure that opStack.peek() value is dereferenced only after a null-check. Without this check, if the expression is holistic, opStack can be null, so using the value of opStack.peek() without a check can cause an NPE.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bitblender/drill DRILL-4974

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/634.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #634

commit a519a0987280abeb00e33a8088d2f7d6c9809eed
Author: karthik
Date:   2016-10-20T20:43:17Z

    DRILL-4974: Add missing null check in FindPartitionConditions.analyzeCall()
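The kind of guard described in this pull request might look like the sketch below. It assumes the operator stack is a `java.util.Deque`, whose `peek()` returns null when the stack is empty; the `OpState` enum and the `parentName` helper are hypothetical stand-ins, not the actual FindPartitionConditions code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a null-checked peek(); not the actual Drill patch.
final class PeekGuard {
    enum OpState { AND, OR, NOT }

    /** Returns the name of the enclosing operator, or "none" when there is no parent. */
    static String parentName(Deque<OpState> opStack) {
        OpState parent = opStack.peek();   // null when the stack is empty
        if (parent == null) {
            return "none";                 // without this check, parent.name() would throw an NPE
        }
        return parent.name();
    }

    public static void main(String[] args) {
        Deque<OpState> stack = new ArrayDeque<>();
        System.out.println(parentName(stack));
        stack.push(OpState.AND);
        System.out.println(parentName(stack));
    }
}
```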
[GitHub] drill issue #600: DRILL-4373: Drill and Hive have incompatible timestamp rep...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/600 Changing this to -1 until unit test failure is addressed.
[GitHub] drill pull request #600: DRILL-4373: Drill and Hive have incompatible timest...
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/600#discussion_r85449218

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java ---
@@ -739,30 +741,76 @@ public void runTestAndValidate(String selection, String validationSelection, Str
   }

   /*
-  Test the reading of an int96 field. Impala encodes timestamps as int96 fields
+Impala encodes timestamp values as int96 fields. Test the reading of an int96 field with two converters:
+the first one converts parquet INT96 into drill VARBINARY and the second one (works while
+store.parquet.reader.int96_as_timestamp option is enabled) converts parquet INT96 into drill TIMESTAMP.
    */
   @Test
   public void testImpalaParquetInt96() throws Exception {
     compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
+try {
+  test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+  compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
+} finally {
+  test("alter session reset %s", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+}
   }

   /*
-  Test the reading of a binary field where data is in dicationary _and_ non-dictionary encoded pages
+  Test the reading of a binary field as drill varbinary where data is in dicationary _and_ non-dictionary encoded pages
    */
   @Test
-  public void testImpalaParquetVarBinary_DictChange() throws Exception {
+  public void testImpalaParquetBinaryAsVarBinary_DictChange() throws Exception {
     compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_dict_change.parquet`");
   }

   /*
+  Test the reading of a binary field as drill timestamp where data is in dicationary _and_ non-dictionary encoded pages
+   */
+  @Test
+  public void testImpalaParquetBinaryAsTimeStamp_DictChange() throws Exception {
+final String WORKING_PATH = TestTools.getWorkingPath();
+final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
+try {
+  testBuilder()
+  .sqlQuery("select int96_ts from dfs_test.`%s/parquet/int96_dict_change`", TEST_RES_PATH)
+  .optionSettingQueriesForTestQuery(
+  "alter session set `%s` = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP)
+  .ordered()
+  .csvBaselineFile("testframework/testParquetReader/testInt96DictChange/q1.tsv")
+  .baselineTypes(TypeProtos.MinorType.TIMESTAMP)
+  .baselineColumns("int96_ts")
+  .build().run();
+} finally {
+  test("alter system reset `%s`", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
+}
+  }
+
+  /*
   Test the conversion from int96 to impala timestamp
    */
   @Test
-  public void testImpalaParquetTimestampAsInt96() throws Exception {
+  public void testTimestampImpalaConvertFrom() throws Exception {
     compareParquetReadersColumnar("convert_from(field_impala_ts, 'TIMESTAMP_IMPALA')", "cp.`parquet/int96_impala_1.parquet`");
   }

   /*
+  Test reading parquet Int96 as TimeStamp and comparing obtained values with the
+  old results (reading the same values as VarBinary and convert_fromTIMESTAMP_IMPALA function using)
+   */
+  @Test
+  public void testImpalaParquetTimestampInt96AsTimeStamp() throws Exception {
--- End diff --

The test testImpalaParquetTimestampInt96AsTimeStamp fails when run in a different timezone. Can you mark this as @Ignore unless you can fix the test to run across different timezones?
[GitHub] drill issue #631: DRILL-4968: Add column size to ColumnMetadata
Github user adeneche commented on the issue: https://github.com/apache/drill/pull/631 +1, LGTM
Re: isDateCorrect field in ParquetTableMetadata
Vitalii,

Thank you for looking into this, sorry I missed it in the review. When you open up a request to fix this issue, could you update the check for correctness in the metadata to check for the is.date.correct flag, or a version greater than or equal to 1.9.0 (no snapshot)? This will allow us to stop writing the flag into the metadata at the release or shortly thereafter.

It might be worth looking at how we can catch issues like this related to plan serialization. There were pretty thorough tests with the patch, but we still have code paths that only come up with remote fragment usage that we could test in other ways to avoid bugs like this.

- Jason

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer
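The combined check requested above (trust the flag, or a released version at or after 1.9.0) could be sketched as below. The class and method names are hypothetical, and the sketch assumes plain "major.minor.patch" version strings with an optional "-SNAPSHOT" suffix; it is not Drill's actual metadata code.

```java
// Hypothetical sketch of "flag OR release version >= 1.9.0"; not Drill's actual check.
final class DateFlagOrVersion {

    /** Dates count as correct if the flag says so, or a non-snapshot 1.9.0+ Drill wrote the file. */
    static boolean isDateCorrect(Boolean isDateCorrectFlag, String drillVersion) {
        if (Boolean.TRUE.equals(isDateCorrectFlag)) {
            return true;
        }
        return isReleaseAtLeast(drillVersion, 1, 9, 0);
    }

    /** True only for well-formed, non-snapshot versions that are >= major.minor.patch. */
    static boolean isReleaseAtLeast(String version, int major, int minor, int patch) {
        if (version == null || version.endsWith("-SNAPSHOT")) {
            return false;   // snapshots may predate the fix, so they never qualify
        }
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        int[] wanted = {major, minor, patch};
        try {
            for (int i = 0; i < 3; i++) {
                int actual = Integer.parseInt(parts[i]);
                if (actual != wanted[i]) {
                    return actual > wanted[i];   // first differing component decides
                }
            }
        } catch (NumberFormatException e) {
            return false;   // unparseable versions are treated as too old
        }
        return true;        // exactly equal to the threshold
    }
}
```

Comparing components numerically rather than as strings matters here: "1.10.0" must rank above "1.9.0".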
Re: isDateCorrect field in ParquetTableMetadata
Vitalii -- are you still planning to open a ticket and pull request for the fix you've noted below?

-- Zelaine

On Wed, Oct 26, 2016 at 8:28 AM, Vitalii Diravka wrote:

> @Paul Rogers
> It may be the undefined case when the file is generated with drill.version = 1.9-SNAPSHOT.
> It is more easy to determine corrupted date with this flag and there is no need to wait the end of release to merge these changes.
>
> @Jinfeng NI
> It looks like you are right.
> With consistent mode (isDateCorrect = true) all tests are passed. So I am going to open a jira ticket for it with next changes
> https://github.com/vdiravka/drill/commit/ff8d5c7d601915f760d1b0e96187303410cac5d3
> Thanks.
>
> Kind regards
> Vitalii
>
> 2016-10-25 18:36 GMT+00:00 Jinfeng Ni:
>
> > I'm not sure if I fully understand your answers. The bottom line is quite simple: given a set of parquet files, the ParquetTableMeta instance constructed in Drill should have identical value for "isDateCorrect", whether it comes from parquet footer, or parquet metadata cache, or whether there is partition pruning or not. However, the code shows that this flag is not in consistent mode across different cases.
> >
> > On Tue, Oct 25, 2016 at 11:24 AM, Vitalii Diravka wrote:
> > > Hi Jinfeng,
> > >
> > > 1. If the parquet files are generated with Drill after Drill-4203 these files have "isDateCorrect = true" property.
> > > Drill serializes this property from metadata now. When we set this property in the first constructor we will hide the value from metadata.
> > > IsDateCorrect will be false only if this value equals to the false (no case for it now) or absent in parquet metadata footer.
> > >
> > > 2. I'm not sure the reason to change isDateCorrect metadata property when the user disable dates correction.
> > > If you have some use case it would be great if you provide it.
> > >
> > > 3. Maybe you are right regarding to when Parquet metadata is cloned.
> > > Here I added the property in the same manner as Jason's new property "drillVersion. So need it a separate unit test?
> > >
> > > Kind regards
> > > Vitalii
> > >
> > > 2016-10-25 16:23 GMT+00:00 Jinfeng Ni:
> > >
> > >> Forgot to copy the link to the code.
> > >>
> > >> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L950-L955
> > >>
> > >> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni wrote:
> > >> > @Jason, @Vitalli,
> > >> >
> > >> > Any thoughts on this question, since both you worked on fix of DRILL-4203?
> > >> >
> > >> > Looking through the code, there is a third case [1], where this flag is set to false when Parquet metadata is cloned (after partition pruning, etc). That means, for the 2nd case where the flag is set to true, if there is pruning happening, the new parquet metadata will see the flag is flipped to false. This does not make sense to me.
> > >> >
> > >> > On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni wrote:
> > >> >> Hello All,
> > >> >>
> > >> >> DRILL-4203 addressed the date field issue. In the fix, it introduced a new field in ParquetTableMetadata_v2 : isDateCorrect. I have some difficulty in understanding the meaning of this field.
> > >> >>
> > >> >> According to [1], this field is set to false, when Drill gets parquet metadata from parquet footer. This field is set to true in code flow of [2] and [3], when Drill gets parquet metadata from meta data cache.
> > >> >>
> > >> >> Questions I have:
> > >> >> 1. If the parquet files are generated with Drill after DRILL-4203, Drill still thinks date field is NOT correct (isDateCorrect = false)?
> > >> >> 2. Why does this filed have nothing to do with "autoCorrection" flag [4]? If someone turns off autoCorrection, will it have impact on this "isDateCorrect" flag ?
> > >> >>
> > >> >> Thanks in advance for any input,
> > >> >>
> > >> >> Jinfeng
> > >> >>
> > >> >> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L932
> > >> >> [2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L936
> > >> >> [3] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L187
> > >> >> [4] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L354-L355
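The consistency requirement raised in this thread (the flag must survive cloning after partition pruning) can be illustrated with a toy model. Everything below is hypothetical: `TableMetadataSketch` and `withFiles` are made-up names standing in for the real ParquetTableMetadata classes, whose actual copy logic is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: shows a pruning copy that carries the flag over instead of resetting it.
final class TableMetadataSketch {
    final List<String> files;
    final boolean isDateCorrect;

    TableMetadataSketch(List<String> files, boolean isDateCorrect) {
        this.files = new ArrayList<>(files);
        this.isDateCorrect = isDateCorrect;
    }

    /** Copy used after pruning: only the file list changes, the flag is preserved. */
    TableMetadataSketch withFiles(List<String> keptFiles) {
        return new TableMetadataSketch(keptFiles, this.isDateCorrect);
    }

    public static void main(String[] args) {
        TableMetadataSketch full =
                new TableMetadataSketch(Arrays.asList("a.parquet", "b.parquet"), true);
        TableMetadataSketch pruned = full.withFiles(Arrays.asList("a.parquet"));
        System.out.println(pruned.isDateCorrect);   // same value as before pruning
    }
}
```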
[GitHub] drill issue #602: Improve Drill C++ connector
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/602 No need for many PRs. Let's just squash this branch into commits for the individual JIRA's I think.
[GitHub] drill issue #622: DRILL-4369: Exchange name and version infos during handsha...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/622 Thanks for the responses! Sorry I posted my comments just after the commit. The issues are "nice-to-haves" for the original PR, do not justify another PR.
[GitHub] drill issue #602: Improve Drill C++ connector
Github user laurentgo commented on the issue: https://github.com/apache/drill/pull/602 @parthchandra do you want individual pull requests for each issue, or are you okay having this branch squashed and merged?
[GitHub] drill pull request #602: Improve Drill C++ connector
Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/602#discussion_r85436247

--- Diff: contrib/native/client/cmakeModules/FindZookeeper.cmake ---
@@ -30,8 +30,10 @@ if (MSVC)
     if(${CMAKE_BUILD_TYPE} MATCHES "Debug")
         set(ZK_BuildOutputDir "Debug")
+        set(ZK_LibName "zookeeper_d")
     else()
         set(ZK_BuildOutputDir "Release")
+        set(ZK_LibName "zookeeper_d")
--- End diff --

yes, agreed, it's a typo...
[GitHub] drill pull request #602: Improve Drill C++ connector
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/602#discussion_r85434357

--- Diff: contrib/native/client/cmakeModules/FindZookeeper.cmake ---
@@ -30,8 +30,10 @@ if (MSVC)
     if(${CMAKE_BUILD_TYPE} MATCHES "Debug")
         set(ZK_BuildOutputDir "Debug")
+        set(ZK_LibName "zookeeper_d")
     else()
         set(ZK_BuildOutputDir "Release")
+        set(ZK_LibName "zookeeper_d")
--- End diff --

Spent the better part of a day on this now. The Zookeeper project has defined only the 32-bit configurations; these have the lib name set to 'zookeeper_d.lib' for debug and 'zookeeper.lib' for release. When creating a 64-bit configuration, depending on the path you choose, you can end up with either zookeeper_d.lib or zookeeper.lib as the import library name in both the release and debug build configurations. For correctness, I think we should use 'zookeeper' and not 'zookeeper_d' for the release build.
[GitHub] drill pull request #602: Improve Drill C++ connector
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/602#discussion_r85434499

--- Diff: contrib/native/client/cmakeModules/FindCppUnit.cmake ---
@@ -0,0 +1,67 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# A simple cmake module to find CppUnit (inspired by
+# http://root.cern.ch/viewvc/trunk/cint/reflex/cmake/modules/FindCppUnit.cmake)
+
+#
+# - Find CppUnit
+# This module finds an installed CppUnit package.
+#
+# It sets the following variables:
+# CPPUNIT_FOUND - Set to false if CppUnit isn't found.
+# CPPUNIT_INCLUDE_DIR - The CppUnit include directory.
+# CPPUNIT_LIBRARY - The CppUnit library to link against.
+
+if (MSVC)
+    if (${CMAKE_BUILD_TYPE} MATCHES "Debug")
+        set(CPPUNIT_BuildOutputDir "Debug")
+        set(CPPUNIT_LibName "cppunitd")
+    else()
+        set(CPPUNIT_BuildOutputDir "Release")
+        set(CPPUNIT_LibName "cppunit")
+    endif()
+    if ("${CPPUNIT_HOME}_" MATCHES "^_$")
+        message(" ")
+        message("- Please set the cache variable CPPUNIT_HOME to point to the directory with the zookeeper source.")
--- End diff --

My fault really, but I suppose this should be "cppunit source" and not "zookeeper source"
Re: Filter appears above project in query plan - Null Equality Join
Ok, if there is not much of a performance gain as suggested in the responses, then there is not much that can be done in this case.

Thanks,
Khurram

On Thu, Oct 27, 2016 at 3:41 AM, Jinfeng Ni wrote:

> Also, if the project operator is not doing any expression evaluation, the project operator itself would not introduce any big overhead. There probably is no big benefit if we push filter past the project operator, as Zelaine said.
>
> On Wed, Oct 26, 2016 at 3:07 PM, Jinfeng Ni wrote:
> > The project under filter is for dynamic expansion of * column. Since the join filter is referring to columns in the project's output, it's not possible to push filter past that project.
> >
> > On Wed, Oct 26, 2016 at 1:50 PM, Zelaine Fong wrote:
> >> The filter, I assume you're referring to, is a join filter. So, at a minimum, it needs to be applied after the hash join. I'm not sure there's a lot of benefit in pushing that filter past the project that's on top of the hash join.
> >>
> >> -- Zelaine
> >>
> >> On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz wrote:
> >>
> >>> Hi All,
> >>>
> >>> Filter is seen on top of Project in query plan for a null equality join.
> >>> This is over CSV data, shouldn't the filter appear below the project in the query plan ?
> >>> I am on Drill 1.9.0 git commit id: a29f1e29
> >>>
> >>> Note : t1 has some nulls in it
> >>>        t2 does not have any nulls in it.
> >>>
> >>> {noformat}
> >>> 0: jdbc:drill:schema=dfs.tmp> explain plan for
> >>> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2
> >>> ON t1.columns[0] = t2.columns[0]
> >>> WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0]
> >>> OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL );
> >>> +------+------+
> >>> | text | json |
> >>> +------+------+
> >>> | 00-00    Screen
> >>> 00-01      ProjectAllowDup(*=[$0], *0=[$1])
> >>> 00-02        Project(T43¦¦*=[$0], T44¦¦*=[$2])
> >>> 00-03          SelectionVectorRemover
> >>> 00-04            Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1, 0), ITEM($3, 0:BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS NULL(ITEM($3, 0])
> >>> 00-05              Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3], columns0=[$4])
> >>> 00-06                HashJoin(condition=[=($2, $5)], joinType=[inner])
> >>> 00-07                  Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2])
> >>> 00-09                    Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1, 0)])
> >>> 00-11                      Project(T44¦¦*=[$0], columns=[$1])
> >>> 00-13                        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`], files=[maprfs:///tmp/oneColWOnulls.csv]]])
> >>> 00-08                  Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1, 0)])
> >>> 00-10                    Project(T43¦¦*=[$0], columns=[$1])
> >>> 00-12                      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1, columns=[`*`], files=[maprfs:///tmp/oneColDupsWnulls.csv]]])
> >>> {noformat}
> >>>
> >>> Results returned by query
> >>>
> >>> {noformat}
> >>> 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN
> >>> `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0]
> >>> IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND
> >>> t2.columns[0] IS NULL );
> >>> +-------------+-------------+
> >>> |   columns   |  columns0   |
> >>> +-------------+-------------+
> >>> | ["test"]    | ["test"]    |
> >>> | ["foo"]     | ["foo"]     |
> >>> | ["foo"]     | ["foo"]     |
> >>> | ["bar"]     | ["bar"]     |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["no"]      | ["no"]      |
> >>> | ["no"]      | ["no"]      |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["never"]   | ["never"]   |
> >>> | ["never"]   | ["never"]   |
> >>> | ["ever"]    | ["ever"]    |
> >>> | ["ever"]    | ["ever"]    |
> >>> | ["here"]    | ["here"]    |
> >>> | ["there"]   | ["there"]   |
> >>> | ["no"]      | ["no"]      |
> >>> | ["no"]      | ["no"]      |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["temp"]    | ["temp"]    |
> >>> +-------------+-------------+
> >>> 23 rows selected (0.341 seconds)
> >>> {noformat}
> >>>
> >>> Thanks,
> >>> Khurram
[jira] [Created] (DRILL-4973) Sqlline history
Andries Engelbrecht created DRILL-4973:
------------------------------------------

             Summary: Sqlline history
                 Key: DRILL-4973
                 URL: https://issues.apache.org/jira/browse/DRILL-4973
             Project: Apache Drill
          Issue Type: Improvement
          Components: Client - CLI
            Reporter: Andries Engelbrecht
            Priority: Minor

Currently the history on sqlline stops working after 500 queries have been logged in the user's .sqlline/history file.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: TO_TIMESTAMP function returns in-correct results
Hello, Khurram

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

 Symbol  Meaning              Presentation  Examples
 s       second of minute     number        55
 S       fraction of second   number        978
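The distinction in the reply above suggests why the reported result comes out as 20:49:10.0: in the pattern `ss.s`, the trailing lowercase `s` is again "second of minute", so the digits "10" appear to be read as seconds rather than as a fraction. A minimal sketch of the corrected pattern follows. It uses `java.time` from the JDK instead of Joda-Time (an assumption made to keep the example dependency-free; both libraries give `s` and `S` these meanings), spells the year as `yyyy`, and leaves the time zone out for simplicity.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch only: java.time stands in for Joda-Time, and the zone suffix is omitted.
final class FractionPattern {
    // Uppercase S = fraction of second; lowercase s = second of minute.
    private static final DateTimeFormatter WITH_FRACTION =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SS");

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, WITH_FRACTION);
    }

    public static void main(String[] args) {
        LocalDateTime ts = parse("2015-03-30 20:49:59.10");
        // Seconds stay at 59 and ".10" becomes a tenth of a second.
        System.out.println(ts.getSecond() + " s, " + ts.getNano() + " ns");
    }
}
```

In the Drill query itself, the equivalent change would be to use `S` letters for the fractional part of the format string passed to TO_TIMESTAMP.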
TO_TIMESTAMP function returns in-correct results
All,

I am on Drill 1.9.0 git commit ID : a29f1e29 on CentOS

TO_TIMESTAMP function does not return correct results. Note that the seconds and milliseconds parts of the timestamp are incorrect in the results.

{noformat}
0: jdbc:drill:schema=dfs.tmp> VALUES(TO_TIMESTAMP('2015-03-30 20:49:59.10 UTC', '-MM-dd HH:mm:ss.s z'));
+------------------------+
|         EXPR$0         |
+------------------------+
| 2015-03-30 20:49:10.0  |
+------------------------+
1 row selected (0.228 seconds)
{noformat}

{noformat}
0: jdbc:drill:schema=dfs.tmp> VALUES(CAST(TO_TIMESTAMP('2015-03-30 20:49:59.10 UTC', '-MM-dd HH:mm:ss.s z') AS TIMESTAMP));
+------------------------+
|         EXPR$0         |
+------------------------+
| 2015-03-30 20:49:10.0  |
+------------------------+
1 row selected (0.265 seconds)
{noformat}

This case returns correct results: when the same string used above is given as input to the CAST function, the minutes (mm), seconds (ss) and millisecond (s) parts are honored.

{noformat}
0: jdbc:drill:schema=dfs.tmp> VALUES(CAST('2015-03-30 20:49:59.10 UTC' AS TIMESTAMP));
+------------------------+
|         EXPR$0         |
+------------------------+
| 2015-03-30 20:49:59.1  |
+------------------------+
1 row selected (0.304 seconds)
{noformat}

Thanks,
Khurram