[jira] [Commented] (DRILL-4479) JsonReader should pick a less restrictive type when creating the default column
[ https://issues.apache.org/jira/browse/DRILL-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186687#comment-15186687 ]

ASF GitHub Bot commented on DRILL-4479:
---

GitHub user amansinha100 opened a pull request:

    https://github.com/apache/drill/pull/420

    DRILL-4479: Use varchar for default column when all_text_mode is enab…

    …led.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amansinha100/incubator-drill DRILL-4479

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/420.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #420

commit c5b4aef5b35547561ea71ce880391429643a6ee0
Author: Aman Sinha
Date: 2016-03-08T17:27:32Z

    DRILL-4479: Use varchar for default column when all_text_mode is enabled.

> JsonReader should pick a less restrictive type when creating the default column
> ---
>
> Key: DRILL-4479
> URL: https://issues.apache.org/jira/browse/DRILL-4479
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.5.0
> Reporter: Aman Sinha
> Attachments: mostlynulls.json
>
> This JIRA is related to DRILL-3806 but has a narrower scope, so I decided to create a separate one.
> The JsonReader has the method ensureAtLeastOneField() (see https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L91), which creates an empty column when no columns are found; it chooses a nullable int column for this. One consequence is that queries of the following type fail:
> {noformat}
> select c1 from dfs.`mostlynulls.json`;
> ...
> ...
> | null |
> | null |
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar type when you are using a ValueWriter of type NullableIntWriterImpl.
> File /Users/asinha/data/mostlynulls.json
> Record 4097
> {noformat}
> In this file the first 4096 rows have NULL values for c1, followed by rows that have a valid string.
> It would be useful for the JSON reader to choose a less restrictive type such as varchar in order to allow more types of queries to run.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
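The failure described above can be reproduced in miniature. The following is only a toy sketch, not Drill's JsonReader or its vector writers: DefaultColumnSketch, ColumnType, Column and readMostlyNulls are hypothetical names invented for illustration, and the type-selection rule simply mirrors the pull request title (varchar when all_text_mode is enabled).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a reader that must pick a column type before it has seen any non-null value. */
final class DefaultColumnSketch {

    enum ColumnType { NULLABLE_INT, NULLABLE_VARCHAR }

    /** Hypothetical rule mirroring the pull request title: varchar when all_text_mode is on. */
    static ColumnType defaultColumnType(boolean allTextMode) {
        return allTextMode ? ColumnType.NULLABLE_VARCHAR : ColumnType.NULLABLE_INT;
    }

    /** A column whose type is fixed once it has been created. */
    static final class Column {
        final ColumnType type;
        final List<Object> values = new ArrayList<>();

        Column(ColumnType type) {
            this.type = type;
        }

        void writeNull() {
            values.add(null);
        }

        void writeText(String text) {
            // A string can only be written into a varchar column.
            if (type != ColumnType.NULLABLE_VARCHAR) {
                throw new IllegalStateException(
                    "tried to write a VarChar into a column of type " + type);
            }
            values.add(text);
        }
    }

    /** Writes the given number of null rows, then one string row; returns whether the read succeeded. */
    static boolean readMostlyNulls(boolean allTextMode, int nulls) {
        Column c1 = new Column(defaultColumnType(allTextMode));
        for (int i = 0; i < nulls; i++) {
            c1.writeNull();
        }
        try {
            c1.writeText("first non-null value");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("all_text_mode off: " + readMostlyNulls(false, 4096));
        System.out.println("all_text_mode on:  " + readMostlyNulls(true, 4096));
    }
}
```

In this model the int default only fails once the first string arrives, which matches the report's observation that the error appears at record 4097 rather than at the start of the file.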
[jira] [Updated] (DRILL-4484) NPE when querying empty directory
[ https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated DRILL-4484:
---
    Affects Version/s: (was: 1.6.0) 1.5.0

> NPE when querying empty directory
> ---
>
> Key: DRILL-4484
> URL: https://issues.apache.org/jira/browse/DRILL-4484
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Victoria Markman
> Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
> {code}
> 0: jdbc:drill:drillbit=localhost> select count(*) from dfs.`/drill/xyz/201604*`;
> Error: VALIDATION ERROR: null
> SQL Query null
> 0: jdbc:drill:drillbit=localhost> select count(*) from dfs.`/drill/xyz/20160401`;
> Error: VALIDATION ERROR: null
> SQL Query null
> [Error Id: 87366a2d-fc90-42f3-a076-aed5efdd27cb on atsqa4-133.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:drillbit=localhost> select count(*) from dfs.`/drill/xyz/20160401/`;
> Error: VALIDATION ERROR: null
> SQL Query null
> [Error Id: ac122243-488e-4fb8-b89f-dc01c7e5c63a on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> {code}
> [Mon Mar 07 15:00:19 root@/drill/xyz ] # ls -lR
> .:
> total 5
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> drwxr-xr-x 2 root root 2 Feb 26 16:31 20160101
> drwxr-xr-x 2 root root 2 Feb 26 16:31 20160102
> drwxr-xr-x 2 root root 2 Feb 26 16:31 20160103
> drwxr-xr-x 2 root root 2 Feb 26 16:31 20160104
> drwxr-xr-x 2 root root 2 Feb 26 16:31 20160105
> drwxr-xr-x 2 root root 1 Feb 26 16:31 20160201
> drwxr-xr-x 2 root root 3 Feb 26 16:31 20160202
> drwxr-xr-x 2 root root 4 Feb 26 16:31 20160301
> drwxr-xr-x 2 root root 0 Feb 26 16:31 20160401
> ./20160101:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160102:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160103:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160104:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160105:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> ./20160201:
> total 0
> ./20160202:
> total 1
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet
> ./20160301:
> total 2
> -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet
> -rw-r--r-- 1 root root 395 Feb 26 16:31 2_0_0.parquet
> ./20160401:
> total 0
> {code}
> Hakim's analysis:
> {code}
> More details about the NPE: it is actually an IllegalArgumentException. During planning, no file meets the wildcard selection, so the query should fail during planning with a "Table not found" message. Instead, execution starts and the scanners fail because no file was assigned to them.
> {code}
> Drill version:
> {code}
> #Generated by Git-Commit-Id-Plugin
> #Mon Mar 07 19:38:24 UTC 2016
> git.commit.id.abbrev=a2fec78
> git.commit.user.email=adene...@gmail.com
> git.commit.message.full=DRILL-4457\: Difference in results returned by window function over BIGINT data\n\nthis closes \#410\n
> git.commit.id=a2fec78695df979e240231cb9d32c7f18274a333
> git.commit.message.short=DRILL-4457\: Difference in results returned by window function over BIGINT data
> git.commit.user.name=adeneche
> git.build.user.name=Unknown
> git.commit.id.describe=0.9.0-625-ga2fec78-dirty
> git.build.user.email=Unknown
> git.branch=master
> git.commit.time=07.03.2016 @ 17\:38\:42 UTC
> git.build.time=07.03.2016 @ 19\:38\:24 UTC
> git.remote.origin.url=https\://github.com/apache/drill
> {code}
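The behaviour Hakim describes as expected (fail during planning when the selection matches no file) can be sketched as follows. This is an illustration under stated assumptions, not Drill's planner code: SelectionCheckSketch, sampleTree, expandSelection and planScan are hypothetical names, and the directory tree is an in-memory stand-in for part of the ls -lR listing in the report. Only the java.nio.file glob matcher is a real API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: reject an empty file selection at planning time instead of at execution time. */
final class SelectionCheckSketch {

    /** In-memory stand-in for part of the listing in the report: directory name -> files. */
    static Map<String, List<String>> sampleTree() {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        tree.put("20160202", Arrays.asList("0_0_0.parquet", "1_0_0.parquet"));
        tree.put("20160301", Arrays.asList("0_0_0.parquet", "1_0_0.parquet", "2_0_0.parquet"));
        tree.put("20160401", Collections.<String>emptyList());
        return tree;
    }

    /** Collects the files of every directory whose name matches the glob. */
    static List<String> expandSelection(Map<String, List<String>> tree, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> files = new ArrayList<>();
        for (Map.Entry<String, List<String>> dir : tree.entrySet()) {
            if (matcher.matches(Paths.get(dir.getKey()))) {
                for (String file : dir.getValue()) {
                    files.add(dir.getKey() + "/" + file);
                }
            }
        }
        return files;
    }

    /** Planning step: an empty selection is reported immediately as a missing table. */
    static List<String> planScan(Map<String, List<String>> tree, String glob) {
        List<String> files = expandSelection(tree, glob);
        if (files.isEmpty()) {
            throw new IllegalArgumentException("Table '" + glob + "' not found");
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(planScan(sampleTree(), "201603*"));
        try {
            planScan(sampleTree(), "201604*");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the check is ordering: the empty result is detected before any scanner is created, so the user sees a "not found" message rather than an internal error from a scanner with no assigned files.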
[jira] [Updated] (DRILL-4484) NPE when querying empty directory
[ https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated DRILL-4484:
---
    Fix Version/s: 1.7.0

> NPE when querying empty directory
> ---
>
> Key: DRILL-4484
> URL: https://issues.apache.org/jira/browse/DRILL-4484
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.6.0
> Reporter: Victoria Markman
> Assignee: Deneche A. Hakim
> Fix For: 1.7.0
[jira] [Assigned] (DRILL-4484) NPE when querying empty directory
[ https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim reassigned DRILL-4484:
---
    Assignee: Deneche A. Hakim

> NPE when querying empty directory
> ---
>
> Key: DRILL-4484
> URL: https://issues.apache.org/jira/browse/DRILL-4484
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.6.0
> Reporter: Victoria Markman
> Assignee: Deneche A. Hakim
> Fix For: 1.7.0
[jira] [Updated] (DRILL-4473) Removing trivial projects reveals bugs in handling of nonexistent columns in StreamingAggregate
[ https://issues.apache.org/jira/browse/DRILL-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-4473:
---
    Assignee: Sean Hsuan-Yi Chu

> Removing trivial projects reveals bugs in handling of nonexistent columns in StreamingAggregate
> ---
>
> Key: DRILL-4473
> URL: https://issues.apache.org/jira/browse/DRILL-4473
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jacques Nadeau
> Assignee: Sean Hsuan-Yi Chu
>
> We see a couple of unit test failures when working with nonexistent columns once DRILL-4467 is fixed. This is because trivial projects no longer protect StreamingAggregate from nonexistent columns. This is likely due to an incorrect check before throwing an Unsupported error. An unknown/ANY type should probably be allowed in the case of using sum/max/stddev.
> {code:title=Plan before DRILL-4467}
> VOLCANO:Physical Planning (71ms):
> ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 185
>   ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 184
>     StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 183
>       LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 182
>         ProjectPrel(int_col=[$0], bigint_col=[$3], float4_col=[$4], float8_col=[$1], interval_year_col=[$2]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 181
>           ScanPrel(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 160
> {code}
> {code:title=Plan after DRILL-4467}
> VOLCANO:Physical Planning (63ms):
> ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 151
>   ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 150
>     StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 149
>       LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 148
>         ScanPrel(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 141
> {code}
> The tests in TestAggregateFunctions that were disabled with a reference to this bug show multiple examples of this behavior.
[jira] [Closed] (DRILL-4323) Hive Native Reader : A simple count(*) throws Incoming batch has an empty schema error
[ https://issues.apache.org/jira/browse/DRILL-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli closed DRILL-4323.
---

Verified and automated

> Hive Native Reader : A simple count(*) throws Incoming batch has an empty schema error
> ---
>
> Key: DRILL-4323
> URL: https://issues.apache.org/jira/browse/DRILL-4323
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Hive
> Affects Versions: 1.5.0
> Reporter: Rahul Challapalli
> Assignee: Sean Hsuan-Yi Chu
> Priority: Critical
> Fix For: 1.6.0
> Attachments: error.log
>
> git.commit.id.abbrev=3d0b4b0
> A simple count(*) query does not work when the Hive native reader is enabled.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from customer;
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> 1 row selected (3.074 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `store.hive.optimize_scan_with_native_readers` = true;
> +-------+--------------------------------------------------------+
> | ok    | summary                                                |
> +-------+--------------------------------------------------------+
> | true  | store.hive.optimize_scan_with_native_readers updated.  |
> +-------+--------------------------------------------------------+
> 1 row selected (0.2 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from customer;
> Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#1341, ProjectRecordBatch] has an empty schema. This is not allowed.
> Fragment 0:0
> [Error Id: 4c867440-0fd3-4eda-922f-0f5eadcb1463 on qa-node191.qa.lab:31010] (state=,code=0)
> {code}
> Hive DDL for the table:
> {code}
> create table customer
> (
>     c_customer_sk int,
>     c_customer_id string,
>     c_current_cdemo_sk int,
>     c_current_hdemo_sk int,
>     c_current_addr_sk int,
>     c_first_shipto_date_sk int,
>     c_first_sales_date_sk int,
>     c_salutation string,
>     c_first_name string,
>     c_last_name string,
>     c_preferred_cust_flag string,
>     c_birth_day int,
>     c_birth_month int,
>     c_birth_year int,
>     c_birth_country string,
>     c_login string,
>     c_email_address string,
>     c_last_review_date string
> )
> STORED AS PARQUET
> LOCATION '/drill/testdata/customer'
> {code}
> Attached the log file with the stack trace.
[jira] [Commented] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
[ https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186430#comment-15186430 ]

ASF GitHub Bot commented on DRILL-4491:
---

Github user adityakishore commented on the pull request:

    https://github.com/apache/drill/pull/418#issuecomment-194097775

    Until I looked at the code, I was under the assumption that we were using Jackson to extract the serializable properties. We can, and should, definitely go that route.

    The way the code currently works is that it iterates through all the table options and checks whether a Java field is present in the corresponding FormatPluginConfig class. If it does find one, it makes it accessible (`setAccessible(true)`), implying that it is expected to work with non-public fields, and sets the value to the one passed as a parameter. This is why I say it is a bug in the current implementation.

> FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
> ---
>
> Key: DRILL-4491
> URL: https://issues.apache.org/jira/browse/DRILL-4491
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Minor
> Fix For: 1.7.0
>
> The code uses {{getField()}}, which returns only the public fields, instead of {{getDeclaredField()}}.
> {code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
> Field field = pluginConfigClass.getField(paramDef.name);
> {code}
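The difference between the two reflection calls named in the issue can be shown with a small, self-contained sketch. SampleConfig and the helper methods below are hypothetical and are not taken from Drill; only Class.getField, Class.getDeclaredField, Field.setAccessible and Field.set are real JDK calls.

```java
import java.lang.reflect.Field;

/** Sketch of getField() versus getDeclaredField() on a class with a private field. */
final class FieldLookupSketch {

    /** Hypothetical stand-in for a format plugin config with one public and one private option. */
    static class SampleConfig {
        public String delimiter = ",";
        private boolean skipHeader = false;

        boolean isSkipHeader() {
            return skipHeader;
        }
    }

    /** getField() only finds public fields, so a private option looks as if it did not exist. */
    static boolean visibleViaGetField(Class<?> type, String name) {
        try {
            type.getField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    /** getDeclaredField() also finds non-public fields; setAccessible(true) then permits the write. */
    static void setOption(Object config, String name, Object value) {
        try {
            Field field = config.getClass().getDeclaredField(name);
            field.setAccessible(true);
            field.set(config, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot set option " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visibleViaGetField(SampleConfig.class, "delimiter"));
        System.out.println(visibleViaGetField(SampleConfig.class, "skipHeader"));
        SampleConfig config = new SampleConfig();
        setOption(config, "skipHeader", true);
        System.out.println(config.isSkipHeader());
    }
}
```

One caveat of this approach: getDeclaredField() looks only at the class it is called on, so fields inherited from a superclass would need a walk up the class hierarchy. A Jackson-based lookup, as suggested in the comment, would avoid hand-written reflection altogether.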
[jira] [Comment Edited] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
[ https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186425#comment-15186425 ]

Aditya Kishore edited comment on DRILL-4491 at 3/9/16 3:30 AM:
---

Until I looked at the code, I was under the assumption that we were using Jackson to extract the serializable properties. We can definitely go that route.

The way the code currently works is that it iterates through all the table options and checks whether a Java field is present in the corresponding FormatPluginConfig class. If it does find one, it makes it accessible ({{setAccessible(true)}}), implying that it is expected to work with non-public fields, and sets the value to the one passed as a parameter. This is why I say it is a bug in the current implementation.

was (Author: adityakishore):
Until I looked at the code, I was under assumption that we are using Jackson to extract the serializable properties. We can definitely go that route.

The way code currently work is that it iterate

> FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
> ---
>
> Key: DRILL-4491
> URL: https://issues.apache.org/jira/browse/DRILL-4491
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Minor
> Fix For: 1.7.0
>
> The code uses {{getField()}}, which returns only the public fields, instead of {{getDeclaredField()}}.
> {code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
> Field field = pluginConfigClass.getField(paramDef.name);
> {code}
[jira] [Commented] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
[ https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186425#comment-15186425 ]

Aditya Kishore commented on DRILL-4491:
---

Until I looked at the code, I was under assumption that we are using Jackson to extract the serializable properties. We can definitely go that route.

The way code currently work is that it iterate

> FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
> ---
>
> Key: DRILL-4491
> URL: https://issues.apache.org/jira/browse/DRILL-4491
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Minor
> Fix For: 1.7.0
>
> The code uses {{getField()}} instead of {{getDeclaredField()}}, which returns only the public fields.
> {code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
> Field field = pluginConfigClass.getField(paramDef.name);
> {code}
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186293#comment-15186293 ]

ASF GitHub Bot commented on DRILL-4482:
---

Github user StevenMPhillips commented on the pull request:

    https://github.com/apache/drill/pull/419#issuecomment-194060258

    +1

> Avro no longer selects data correctly from a sub-structure
> ---
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Avro
> Affects Versions: 1.6.0
> Reporter: Stefán Baxter
> Assignee: Stefán Baxter
> Priority: Blocker
> Fix For: 1.6.0
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/processed/<>/transactions` as s limit 1;
> +----------------+
> | EXPR$0         |
> +----------------+
> | 87.55.171.210  |
> +----------------+
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | null    |
> +---------+
> 1 row selected (0.29 seconds)
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186280#comment-15186280 ]

ASF GitHub Bot commented on DRILL-4482:
---

GitHub user jaltekruse opened a pull request:

    https://github.com/apache/drill/pull/419

    DRILL-4482: Avro subselection broken by 4382

    This fix includes a number of test updates to ensure Avro files are being read correctly. The branch includes 4441, which is on a different PR but touched some of the same code, so I just based this fix on that branch.

    The actual regression fix is in the AvroRecordReader: in the case of a Union, we should not be creating a child of the fieldSelection, which was properly done in the case with maps and records.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaltekruse/incubator-drill 4441-4482-avro-bugs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/419.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #419

commit 56048dae231a6c11c2650384da6893a8c1011fee
Author: Jason Altekruse
Date: 2016-02-26T17:55:05Z

    DRILL-4441: Fix varchar data read out of Avro filtering incorrectly due to metadata bug

    The precision of the Varchar datatype was not being set, causing inconsistent truncation of values to the default length of 1. Fixed the same issue with varbinary.

    The test framework was previously taking a string as the baseline for a binary value, which cannot express all possible values. Fixed the test to instead use a byte array. This required updating the Hive tests that were using the old method of specifying baselines with a String.

    Fix cast to varbinary when reading from a data source with schema, needed for writing a test.

commit 15209ea07a41b0a7bdccb382950b5738bd229b18
Author: Jason Altekruse
Date: 2016-03-08T22:16:03Z

    DRILL-4482: Fix Avro nested field selection regression

    Update some of the Avro tests to properly verify their results; others still need to be fixed. These will be addressed in DRILL-4110.

> Avro no longer selects data correctly from a sub-structure
> ---
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Avro
> Affects Versions: 1.6.0
> Reporter: Stefán Baxter
> Assignee: Stefán Baxter
> Priority: Blocker
> Fix For: 1.6.0
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/processed/<>/transactions` as s limit 1;
> +----------------+
> | EXPR$0         |
> +----------------+
> | 87.55.171.210  |
> +----------------+
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | null    |
> +---------+
> 1 row selected (0.29 seconds)
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186247#comment-15186247 ]

ASF GitHub Bot commented on DRILL-4474:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/406

> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> ---
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances (3 nodes), CentOS6.5 x64
> Reporter: Shankar
> Assignee: Jacques Nadeau
> Priority: Blocker
>
> {quote}
> * We are using Drill to retrieve business data from game analytics.
> * We are running the queries below on a 50GB table (parquet).
> * We have found major inconsistencies in the data when we use the COUNT function.
> * Below are the case-by-case queries and their output. {color:blue}*Please analyse them carefully for a clear understanding of the behaviour.*{color}
> * Please let me know how to resolve this (or whether an earlier JIRA has already been created).
> * We hope this is fixed in later versions. If not, please do the needful.
> {quote}
> --
> CASE-1 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +-----------+
> | count     |
> +-----------+
> | 27645752  |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-2 (Wrong result)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(sessionid),
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +-----------+-------+
> | EXPR$0    | cnt   |
> +-----------+-------+
> | 37772844  | 2108  |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-3 (Wrong result, only first count is correct)
> --
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(distinct sessionid),
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +---------+-----------+
> | EXPR$0  | cnt       |
> +---------+-----------+
> | 201941  | 37772844  |
> +---------+-----------+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-4 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +------+
> | cnt  |
> +------+
> | 525  |
> +------+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> --
> CASE-5 (Correct result)
> --
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(sessionid),
> . . . . . . .
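For readers comparing the cases in the report above, the following sketch spells out what the three aggregate forms are expected to compute under standard SQL semantics: COUNT(expr) counts rows where expr is not NULL, and a CASE expression without an ELSE branch yields NULL when no WHEN branch matches. It is a plain Java model, not Drill code; CountCaseSketch, Row and the sample rows are hypothetical and bear no relation to the reporter's 50GB table.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Models COUNT(col), COUNT(CASE ... END) and COUNT(DISTINCT CASE ... END) over a small row set. */
final class CountCaseSketch {

    static final class Row {
        final String id;
        final String event;
        final String sessionId;

        Row(String id, String event, String sessionId) {
            this.id = id;
            this.event = event;
            this.sessionId = sessionId;
        }
    }

    /** CASE WHEN id = '/confirmDrop/btnYes/' AND event = 'Click' THEN sessionid END (no ELSE, so NULL). */
    static String caseExpr(Row r) {
        boolean matches = "/confirmDrop/btnYes/".equals(r.id) && "Click".equals(r.event);
        return matches ? r.sessionId : null;
    }

    /** COUNT(sessionid): rows with a non-null sessionid. */
    static long countSessionId(List<Row> rows) {
        return rows.stream().map(r -> r.sessionId).filter(Objects::nonNull).count();
    }

    /** COUNT(CASE ... END): rows where the CASE expression is non-null. */
    static long countCase(List<Row> rows) {
        return rows.stream().map(CountCaseSketch::caseExpr).filter(Objects::nonNull).count();
    }

    /** COUNT(DISTINCT CASE ... END): distinct non-null values of the CASE expression. */
    static long countDistinctCase(List<Row> rows) {
        return rows.stream().map(CountCaseSketch::caseExpr).filter(Objects::nonNull).distinct().count();
    }

    /** Hypothetical sample data, chosen so that the three counts differ. */
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("/confirmDrop/btnYes/", "Click", "s1"),
            new Row("/confirmDrop/btnYes/", "Click", "s1"),
            new Row("/confirmDrop/btnYes/", "Click", "s2"),
            new Row("/other/", "Click", "s3"),
            new Row("/confirmDrop/btnYes/", "View", "s1"),
            new Row("/other/", "Click", null));
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println("count(sessionid)          = " + countSessionId(rows));
        System.out.println("count(case ...)           = " + countCase(rows));
        System.out.println("count(distinct case ...)  = " + countDistinctCase(rows));
    }
}
```

Under these semantics the conditional count can never exceed COUNT(sessionid) and does not depend on which other aggregates appear in the same SELECT, which is the consistency the reporter's CASE-1 to CASE-3 outputs violate.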
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186246#comment-15186246 ]

ASF GitHub Bot commented on DRILL-4474:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/416

> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> ---
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances (3 nodes), CentOS6.5 x64
> Reporter: Shankar
> Assignee: Jacques Nadeau
> Priority: Blocker
[jira] [Commented] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
[ https://issues.apache.org/jira/browse/DRILL-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186220#comment-15186220 ]

ASF GitHub Bot commented on DRILL-4491:
---------------------------------------

GitHub user adityakishore opened a pull request:

    https://github.com/apache/drill/pull/418

    DRILL-4491: FormatPluginOptionsDescriptor requires FormatPluginConfig… … fields to be public

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adityakishore/drill DRILL-4491

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #418

commit cce8467c2476da871891bad7db6cab3236537f7c
Author: Aditya Kishore
Date:   2016-03-09T00:49:55Z

    DRILL-4491: FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public

> FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-4491
>                 URL: https://issues.apache.org/jira/browse/DRILL-4491
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Minor
>             Fix For: 1.7.0
>
>
> The code uses {{getField()}}, which returns only public fields, instead of {{getDeclaredField()}}.
> {code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
> Field field = pluginConfigClass.getField(paramDef.name);
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186213#comment-15186213 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/416#discussion_r55455931

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
    @@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
             } else if (aggCall.getArgList().size() == 1) {
               // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
               int index = aggCall.getArgList().get(0);
    +
    +          if (proj != null) {
    +            // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
    +            // For instance,
    +            // Agg - count($0)
    +            //  \
    +            //  Proj - Exp={$1}
    +            //    \
    +            //   Scan (col1, col2).
    +            // return count of "col2" in Scan's metadata, if found.
    +
    +            if (proj.getProjects().get(index) instanceof RexInputRef) {
    +              index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
    --- End diff --

    Makes sense. Let me add more unit tests to the patch.
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186216#comment-15186216 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/416#issuecomment-194042039

    Overall, LGTM. +1
[jira] [Created] (DRILL-4491) FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
Aditya Kishore created DRILL-4491:
----------------------------------

             Summary: FormatPluginOptionsDescriptor requires FormatPluginConfig fields to be public
                 Key: DRILL-4491
                 URL: https://issues.apache.org/jira/browse/DRILL-4491
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Aditya Kishore
            Assignee: Aditya Kishore
            Priority: Minor

The code uses {{getField()}}, which returns only public fields, instead of {{getDeclaredField()}}.

{code:title=FormatPluginOptionsDescriptor.java:165|borderStyle=solid}
Field field = pluginConfigClass.getField(paramDef.name);
{code}
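The difference between the two reflection calls named in the report can be seen in a small standalone sketch. `FieldLookupDemo` and `SampleConfig` are hypothetical names used only for illustration; they are not Drill classes, and this is not the patch itself.

```java
import java.lang.reflect.Field;

public class FieldLookupDemo {
    /** Hypothetical stand-in for a format plugin config with one non-public field. */
    public static class SampleConfig {
        public String delimiter = ",";
        private String extension = "csv";
    }

    /** Returns true if Class.getField can find the named field (public fields only). */
    public static boolean visibleViaGetField(Class<?> type, String name) {
        try {
            Field field = type.getField(name);
            return field != null;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    /** Returns true if Class.getDeclaredField can find the named field (any visibility). */
    public static boolean visibleViaGetDeclaredField(Class<?> type, String name) {
        try {
            Field field = type.getDeclaredField(name);
            return field != null;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("getField(delimiter): "
            + visibleViaGetField(SampleConfig.class, "delimiter"));
        System.out.println("getField(extension): "
            + visibleViaGetField(SampleConfig.class, "extension"));
        System.out.println("getDeclaredField(extension): "
            + visibleViaGetDeclaredField(SampleConfig.class, "extension"));
    }
}
```

Note that `getDeclaredField()` only looks at the class it is called on and does not search superclasses, and reading a private field through the returned `Field` still requires `setAccessible(true)`.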
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186209#comment-15186209 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/416#discussion_r55455592

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
    @@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
             } else if (aggCall.getArgList().size() == 1) {
               // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
               int index = aggCall.getArgList().get(0);
    +
    +          if (proj != null) {
    +            // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
    +            // For instance,
    +            // Agg - count($0)
    +            //  \
    +            //  Proj - Exp={$1}
    +            //    \
    +            //   Scan (col1, col2).
    +            // return count of "col2" in Scan's metadata, if found.
    +
    +            if (proj.getProjects().get(index) instanceof RexInputRef) {
    +              index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
    --- End diff --

    It might be good to add a test case for that, in case Calcite changes this behavior in the future.
[jira] [Commented] (DRILL-4485) MapR profile - use MapR 5.1.0
[ https://issues.apache.org/jira/browse/DRILL-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186175#comment-15186175 ]

ASF GitHub Bot commented on DRILL-4485:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/417

> MapR profile - use MapR 5.1.0
> -----------------------------
>
>                 Key: DRILL-4485
>                 URL: https://issues.apache.org/jira/browse/DRILL-4485
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Tools, Build & Test
>            Reporter: Patrick Wong
>            Assignee: Parth Chandra
>             Fix For: 1.6.0
>
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186171#comment-15186171 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/416#discussion_r55454105

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
    @@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
             } else if (aggCall.getArgList().size() == 1) {
               // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
               int index = aggCall.getArgList().get(0);
    +
    +          if (proj != null) {
    +            // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
    +            // For instance,
    +            // Agg - count($0)
    +            //  \
    +            //  Proj - Exp={$1}
    +            //    \
    +            //   Scan (col1, col2).
    +            // return count of "col2" in Scan's metadata, if found.
    +
    +            if (proj.getProjects().get(index) instanceof RexInputRef) {
    +              index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
    --- End diff --

    Calcite rewrites count(100) or count(1) into count(), so aggCall.getArgList().isEmpty() is true. Line 113 therefore takes care of those cases.
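The decision discussed in this review thread can be modeled without Calcite as a rough sketch. Everything here is an assumption made for illustration: `Expr`, `InputRef`, `Computed`, `ROW_COUNT` and `resolve` are hypothetical names, `InputRef` merely plays the role of `RexInputRef`, and the sketch ignores nullability, which the real rule also considers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class CountPushdownSketch {
    /** Simplified stand-in for a projection expression; not Calcite's RexNode. */
    public interface Expr {}

    /** Direct reference to a scan column by position (the role RexInputRef plays). */
    public static final class InputRef implements Expr {
        public final int index;
        public InputRef(int index) { this.index = index; }
    }

    /** Any computed expression, for example a CASE expression. */
    public static final class Computed implements Expr {}

    /** Marker meaning "answer from the total row count". */
    public static final int ROW_COUNT = -1;

    /**
     * Decides how a COUNT could be answered from scan metadata.
     * Returns ROW_COUNT for an argument-less count, the scan column index for a
     * count over a plain column, or empty when the count must be evaluated normally.
     * A null projection list means the aggregate sits directly on the scan.
     */
    public static Optional<Integer> resolve(List<Integer> argList, List<Expr> projects) {
        if (argList.isEmpty()) {
            return Optional.of(ROW_COUNT);      // count(*), or count(1) after rewriting
        }
        if (argList.size() != 1) {
            return Optional.empty();
        }
        int index = argList.get(0);
        if (projects == null) {
            return Optional.of(index);          // Agg directly over Scan
        }
        Expr expr = projects.get(index);
        if (expr instanceof InputRef) {
            return Optional.of(((InputRef) expr).index);  // remap through the project
        }
        return Optional.empty();                // computed input: no pushdown
    }

    public static void main(String[] args) {
        List<Expr> plain = Arrays.<Expr>asList(new InputRef(1));
        List<Expr> computed = Arrays.<Expr>asList(new Computed());
        System.out.println(resolve(Collections.<Integer>emptyList(), null));
        System.out.println(resolve(Arrays.asList(0), plain));
        System.out.println(resolve(Arrays.asList(0), computed));
    }
}
```

The third call models the CASE expressions from the bug report: because the projected expression is computed rather than a plain column reference, the sketch declines to answer from metadata.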
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186153#comment-15186153 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/416#discussion_r55453543

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
    @@ -117,6 +117,24 @@ public void onMatch(RelOptRuleCall call) {
             } else if (aggCall.getArgList().size() == 1) {
               // count(columnName) ==> Agg ( Scan )) ==> columnValueCount
               int index = aggCall.getArgList().get(0);
    +
    +          if (proj != null) {
    +            // project in the middle of Agg and Scan : Only when input of AggCall is a RexInputRef in Project, we find the index of Scan's field.
    +            // For instance,
    +            // Agg - count($0)
    +            //  \
    +            //  Proj - Exp={$1}
    +            //    \
    +            //   Scan (col1, col2).
    +            // return count of "col2" in Scan's metadata, if found.
    +
    +            if (proj.getProjects().get(index) instanceof RexInputRef) {
    +              index = ((RexInputRef) proj.getProjects().get(index)).getIndex();
    --- End diff --

    Doesn't this mean count(100) and count(1) still fail to push down?
[jira] [Commented] (DRILL-4485) MapR profile - use MapR 5.1.0
[ https://issues.apache.org/jira/browse/DRILL-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186142#comment-15186142 ]

ASF GitHub Bot commented on DRILL-4485:
---------------------------------------

GitHub user pwong-mapr opened a pull request:

    https://github.com/apache/drill/pull/417

    DRILL-4485 - MapR profile - switch to MapR 5.1.0, and improve compatibility with maprfs storage format and MapR DB storage plugin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwong-mapr/incubator-drill DRILL-4485-4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #417

commit fc076488cb88e1071ef403c300a8681d0b9c584c
Author: Patrick Wong
Date:   2016-03-08T02:22:08Z

    DRILL-4485 - MapR profile - switch to MapR 5.1.0, and improve compatibility with maprfs storage format and MapR DB storage plugin
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186144#comment-15186144 ]

Jinfeng Ni commented on DRILL-4474:
-----------------------------------

I submitted a new PR that puts a patch on top of Jacques's DRILL-4474 patch:

https://github.com/apache/drill/pull/416

The pre-commit functional and unit test runs are complete.

[~amansinha100] or [~jnadeau], could you please review the new PR for DRILL-4474? Thanks!
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186137#comment-15186137 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

GitHub user jinfengni opened a pull request:

    https://github.com/apache/drill/pull/416

    DRILL-4474: Ensure that ConvertCountToDirectScan does not push through project when nullable input of count is not RexInputRef

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill review/DRILL-4474

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #416

commit 0a5f8fab786f931665d9d28ea67cf19ab37c07fb
Author: Jacques Nadeau
Date:   2016-03-04T21:27:26Z

    DRILL-4474: Ensure that ConvertCountToDirectScan only pushes through project when project is trivial.

commit ab00e6aa9563d79e62154ba1f3bbb71dba7d8036
Author: Jinfeng Ni
Date:   2016-03-08T22:15:27Z

    DRILL-4474: Ensure that ConvertCountToDirectScan does not push through project when nullable input of count is not RexInputRef
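Why a count over a CASE expression cannot simply reuse a column's value count can be illustrated with a toy in-memory model. This is only a sketch of standard SQL COUNT semantics; the `Row` class and its sample data are hypothetical, with field names borrowed from the columns mentioned in the report.

```java
import java.util.Arrays;
import java.util.List;

public class CountSemanticsSketch {
    /** Hypothetical row shape; a null field models a SQL NULL. */
    public static final class Row {
        public final String id;
        public final String event;
        public final String sessionid;
        public Row(String id, String event, String sessionid) {
            this.id = id;
            this.event = event;
            this.sessionid = sessionid;
        }
    }

    /** COUNT(sessionid): counts rows whose sessionid is not null. */
    public static long countSessionId(List<Row> rows) {
        long count = 0;
        for (Row row : rows) {
            if (row.sessionid != null) {
                count++;
            }
        }
        return count;
    }

    /** COUNT(CASE WHEN id = ? AND event = ? THEN sessionid END). */
    public static long countMatching(List<Row> rows, String id, String event) {
        long count = 0;
        for (Row row : rows) {
            // The CASE yields sessionid when the predicate holds and NULL otherwise.
            String value = (id.equals(row.id) && event.equals(row.event)) ? row.sessionid : null;
            if (value != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("/confirmDrop/btnYes/", "Click", "s1"),
            new Row("/confirmDrop/btnYes/", "View", "s2"),
            new Row("/home/", "Click", null),
            new Row("/confirmDrop/btnYes/", "Click", null));
        System.out.println("count(sessionid) = " + countSessionId(rows));
        System.out.println("count(case ...)  = "
            + countMatching(rows, "/confirmDrop/btnYes/", "Click"));
    }
}
```

The two counts differ whenever some rows fail the predicate, so answering the CASE form from a stored per-column value count would return the wrong number.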
[jira] [Commented] (DRILL-4485) MapR profile - use MapR 5.1.0
[ https://issues.apache.org/jira/browse/DRILL-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186138#comment-15186138 ]

ASF GitHub Bot commented on DRILL-4485:
---------------------------------------

Github user pwong-mapr closed the pull request at:

    https://github.com/apache/drill/pull/413
[jira] [Created] (DRILL-4490) Count(*) function returns as optional instead of required
Krystal created DRILL-4490:
---------------------------

             Summary: Count(*) function returns as optional instead of required
                 Key: DRILL-4490
                 URL: https://issues.apache.org/jira/browse/DRILL-4490
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types
    Affects Versions: 1.6.0
            Reporter: Krystal
            Assignee: Sean Hsuan-Yi Chu

git.commit.id.abbrev=c8a7840

I have the following CTAS query:
create table test as select count(*) as col1 from cp.`tpch/orders.parquet`;

The schema of the test table shows col1 as optional:
message root {
  optional int64 col1;
}
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186029#comment-15186029 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/406#discussion_r5563

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
    @@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
           return;
         }

    +    if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
    --- End diff --

    I have a patch that works with Jacques's new unit test. It continues to use directScan for simple count queries. The patch is pending the pre-commit and unit test runs. I will post the results shortly.
[jira] [Closed] (DRILL-4489) Add ValueVector tests from Drill
[ https://issues.apache.org/jira/browse/DRILL-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Phillips closed DRILL-4489. -- Resolution: Invalid This jira should be in the Arrow project, not Drill > Add ValueVector tests from Drill > > > Key: DRILL-4489 > URL: https://issues.apache.org/jira/browse/DRILL-4489 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips > > There are some simple ValueVector tests that should be included in the Arrow > project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4489) Add ValueVector tests from Drill
Steven Phillips created DRILL-4489: -- Summary: Add ValueVector tests from Drill Key: DRILL-4489 URL: https://issues.apache.org/jira/browse/DRILL-4489 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips There are some simple ValueVector tests that should be included in the Arrow project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185887#comment-15185887 ] Stefán Baxter commented on DRILL-4482: -- good news, thank you On Tue, Mar 8, 2016 at 9:39 PM, Jason Altekruse (JIRA)> Avro no longer selects data correctly from a sub-structure > -- > > Key: DRILL-4482 > URL: https://issues.apache.org/jira/browse/DRILL-4482 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Avro >Affects Versions: 1.6.0 >Reporter: Stefán Baxter >Assignee: Stefán Baxter >Priority: Blocker > Fix For: 1.6.0 > > > Parquet: > 0: jdbc:drill:zk=local> select s.client_ip.ip from > dfs.asa.`/processed/<>/transactions` as s limit 1; > ++ > | EXPR$0 | > ++ > | 87.55.171.210 | > ++ > 1 row selected (1.184 seconds) > Avro: > 0: jdbc:drill:zk=local> select s.client_ip.ip from > dfs.asa.`/streaming/<>/transactions` as s limit 1; > +-+ > | EXPR$0 | > +-+ > | null| > +-+ > 1 row selected (0.29 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185880#comment-15185880 ]

Jason Altekruse commented on DRILL-4482:
----------------------------------------

I have found and fixed the issue. The regression was introduced by DRILL-4382, and the existing tests were not written in a way that would catch the change. I am adding more tests now; a patch should be posted soon.
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185820#comment-15185820 ] ASF GitHub Bot commented on DRILL-4474: --- Github user amansinha100 commented on the pull request: https://github.com/apache/drill/pull/406#issuecomment-193971973 Agree with @jinfengni that the current fix can cause performance regression for simpler count queries. I will change my review to -1 and let's see how to get the proper nullability check. > Inconsistent behavior while using COUNT in select (Apache drill 1.2.0) > -- > > Key: DRILL-4474 > URL: https://issues.apache.org/jira/browse/DRILL-4474 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.2.0, 1.5.0 > Environment: m3.xlarge AWS instances ( 3 nodes) > CentOS6.5 x64 >Reporter: Shankar >Assignee: Jacques Nadeau >Priority: Blocker > > {quote} > * We are using drill to retrieve the business data from game analytic. > * We are running below queries on table of size 50GB (parquet) > * We have found some major inconsistency in data when we use COUNT function. > * Below is the case by case queries and their output. {color:blue}*Please > analyse it carefully, to for clear understanding of behaviour. *{color} > * Please let me know how to resolve this ? (or any earlier JIRA has been > already created). > * Hope this may be fixed in later versions. If not please do the needful. > {quote} > -- > CASE-1 (Wrong result) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +---+ > | count | > +---+ > | 27645752 | > +---+ > 1 row selected (0.281 seconds) > {noformat} > {quote} > {color} > -- > CASE-2 (Wrong result) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(sessionid), > . . . . . . . 
> count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +---+---+ > | EXPR$0 | cnt | > +---+---+ > | 37772844 | 2108 | > +---+---+ > 1 row selected (12.597 seconds) > {noformat} > {quote} > {color} > -- > CASE-3 (Wrong result, only first count is correct) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(distinct sessionid), > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +-+---+ > | EXPR$0 |cnt| > +-+---+ > | 201941 | 37772844 | > +-+---+ > 1 row selected (8.259 seconds) > {noformat} > {quote} > {color} > -- > CASE-4 (Correct result) > -- > {color:green} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and > t.event = 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +--+ > | cnt | > +--+ > | 525 | > +--+ > 1 row selected (14.318 seconds) > {noformat} > {quote} > {color} > -- > CASE-5 (Correct result) >
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185808#comment-15185808 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/406#discussion_r55428219

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
@@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
       return;
     }
+    if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
--- End diff --

With the patch, the following query will not use directScan.

{code}
select count(*) from cp.`tpch/nation.parquet`;
{code}

{code}
00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {75.1 rows, 425.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 169
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {75.0 rows, 425.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 168
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {75.0 rows, 425.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 167
00-03          Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 25.0, cumulative cost = {50.0 rows, 125.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 166
00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[]]]) : rowType = RecordType(): rowcount = 25.0, cumulative cost = {25.0 rows, 25.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 165
{code}

I debugged a bit. Line 115 seems fine, but something is wrong in the code at lines 117 - 123.
[jira] [Commented] (DRILL-4487) add unit test for DRILL-4449
[ https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185760#comment-15185760 ] ASF GitHub Bot commented on DRILL-4487: --- Github user amansinha100 commented on the pull request: https://github.com/apache/drill/pull/414#issuecomment-193960370 +1 > add unit test for DRILL-4449 > > > Key: DRILL-4487 > URL: https://issues.apache.org/jira/browse/DRILL-4487 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Parquet >Reporter: Deneche A. Hakim >Assignee: Aman Sinha > Fix For: 1.7.0 > > > now that we have a simple reproduction, we should add a unit test to make > sure we don't regress -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185729#comment-15185729 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/406#discussion_r55422075

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
@@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
       return;
     }
+    if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
--- End diff --

We have a check in Line 115 for whether the input to count() is nullable. In theory, if the input is non-nullable, then count(non-nullable expression) = rowcount. My guess is that the incorrect result for the query with the case expression is caused by wrong type resolution for that case expression.
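The nullability argument in this comment can be checked on a toy data set. The following is a minimal Java sketch, not Drill code: the class `CountNullability`, its methods, and the three sample rows are hypothetical. It relies only on standard SQL semantics, namely that COUNT(expr) skips rows where expr is NULL and that a CASE with no ELSE branch yields NULL when no WHEN matches, so count(case ...) can equal the row count only if every row matches.

```java
public class CountNullability {

    /** SQL COUNT(expr): counts only the rows where expr is not NULL. */
    public static long count(String[] column) {
        long n = 0;
        for (String value : column) {
            if (value != null) {
                n++;
            }
        }
        return n;
    }

    /**
     * CASE WHEN id = '/confirmDrop/btnYes/' AND event = 'Click' THEN sessionid END.
     * With no ELSE branch the expression is NULL for every non-matching row,
     * so its type is nullable even when sessionid itself is not.
     */
    public static String caseExpr(String id, String event, String sessionid) {
        boolean matches = "/confirmDrop/btnYes/".equals(id) && "Click".equals(event);
        return matches ? sessionid : null;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {id, event, sessionid}.
        String[][] rows = {
            {"/confirmDrop/btnYes/", "Click", "s1"},
            {"/home/", "Click", "s2"},
            {"/confirmDrop/btnYes/", "View", "s3"},
        };
        String[] sessionIds = new String[rows.length];
        String[] caseValues = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            sessionIds[i] = rows[i][2];
            caseValues[i] = caseExpr(rows[i][0], rows[i][1], rows[i][2]);
        }
        System.out.println("count(sessionid) = " + count(sessionIds));
        System.out.println("count(case ...)  = " + count(caseValues));
    }
}
```

On these rows count(sessionid) is 3, the row count, while count(case ...) is 1. Substituting the row count for the second aggregate is therefore only safe when the planner can prove the argument is non-nullable.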
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185724#comment-15185724 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/406#discussion_r55421789

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ConvertCountToDirectScan.java ---
@@ -103,6 +104,10 @@ public void onMatch(RelOptRuleCall call) {
       return;
     }
+    if (proj != null && !ProjectRemoveRule.isTrivial(proj)) {
--- End diff --

I feel that this check might rule out some optimization opportunities. For example:

select count(100) from `parquetTable`;

In this case, count(100) is equal to the row count of the parquet table. However, the project is not a trivial project, so the new code will disable the optimization.
[jira] [Created] (DRILL-4488) Prefix "-" cause failure (NPE) in constant folding
Sean Hsuan-Yi Chu created DRILL-4488:
----------------------------------------

Summary: Prefix "-" cause failure (NPE) in constant folding
Key: DRILL-4488
URL: https://issues.apache.org/jira/browse/DRILL-4488
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu

For example, a query like this one:
{code}
SELECT -sqrt(5) as col from cp.`tpch/nation.parquet`
{code}
throws an NPE. The cause is the translation of the prefix "-" into -1.
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185672#comment-15185672 ]

Stefán Baxter commented on DRILL-4482:
--------------------------------------

Hi,

This is via dfs. The relevant part of the schema is here:

{"name": "client_ip", "type": ["null", {"name": "ClientIPEntry", "type": "record", "fields": [
    {"name": "ip", "type": "string"},
    {"name": "isp", "type": ["null","string"]},
    {"name": "postal_code", "type": ["null","string"]},
    {"name": "country_code", "type": ["null","string"]},
    {"name": "latitude", "type": ["null","double"]},
    {"name": "longitude", "type": ["null","double"]}
]}]},

- Stefán
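The schema fragment in this message declares `client_ip` as a union of `null` and a `ClientIPEntry` record, so selecting `s.client_ip.ip` would be expected to yield null only when the whole `client_ip` value is null. A minimal sketch of that expected lookup follows. It uses plain `java.util.Map` objects in place of Avro records and is not how Drill's Avro reader is implemented; `NestedFieldLookup` and `resolve` are hypothetical names, and the sample address is one reported elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class NestedFieldLookup {

    /**
     * Resolves a dotted path such as "client_ip.ip" against nested maps.
     * Returns null if any intermediate value is null or is not a map.
     */
    public static Object resolve(Map<String, Object> row, String path) {
        Object current = row;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> clientIp = new HashMap<>();
        clientIp.put("ip", "87.55.171.210");
        clientIp.put("isp", null); // nullable member: ["null","string"]

        Map<String, Object> populated = new HashMap<>();
        populated.put("client_ip", clientIp);

        Map<String, Object> empty = new HashMap<>();
        empty.put("client_ip", null); // the "null" branch of the union

        System.out.println(resolve(populated, "client_ip.ip"));
        System.out.println(resolve(empty, "client_ip.ip"));
    }
}
```

The first lookup returns the address and the second returns null. A reader that returns null for a populated record, as reported for the Avro path in this issue, breaks that expectation.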
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185658#comment-15185658 ]

Jason Altekruse commented on DRILL-4482:
----------------------------------------

I think I may have reproduced the issue. Is this field in your dataset a Map or a Record in the Avro schema? I am seeing nulls in a case with maps and am trying to figure out the cause right now. As part of this change I will improve our test coverage for Avro, to make sure we don't have regressions like this in the future.
[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields
[ https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185648#comment-15185648 ] ASF GitHub Bot commented on DRILL-4184: --- Github user daveoshinsky commented on a diff in the pull request: https://github.com/apache/drill/pull/372#discussion_r55417098 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java --- @@ -69,11 +73,16 @@ protected boolean readAndStoreValueSizeInformation() throws IOException { if ( currDefLevel == -1 ) { currDefLevel = pageReader.definitionLevels.readInteger(); } -if ( columnDescriptor.getMaxDefinitionLevel() > currDefLevel) { + +if (columnDescriptor.getMaxDefinitionLevel() > currDefLevel) { nullsRead++; - // set length of zero, each index in the vector defaults to null so no need to set the nullability - variableWidthVector.getMutator().setValueLengthSafe( - valuesReadInCurrentPass + pageReader.valuesReadyToRead, 0); + // set length of zero, each index in the vector defaults to null so no + // need to set the nullability + if (variableWidthVector == null) { --- End diff -- Regarding the two variables variableWidthVector and fixedWidthVector that I added, here is my reasoning. Either variableWidthVector is set if we have a VariableWidthVector, or fixedWidthVector is set if we have a FixedWidthVector (i.e., decimal). Hence, variableWidthVector is non-null if and only if we are to invoke the pre-existing logic, that assumed a variable width vector. When variableWidthVector is null (fixedWidthVector is non-null, but not currently used), we invoke the new logic to save the length information in decimalLengths. If this is no good, please tell me why, and suggest an alternative. 
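The invariant described in this comment, that exactly one of `variableWidthVector` and `fixedWidthVector` is set and that the null-handling branch is chosen by testing `variableWidthVector == null`, can be sketched in isolation. This is a hedged illustration, not the patch itself: the class `NullLengthRecorder`, the `List<Integer>` stand-ins for Drill's vectors, and the shape of `decimalLengths` are assumptions made for the example. Only the three field names come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

public class NullLengthRecorder {
    // Stand-ins for Drill's vectors (assumption): one recorded length per value.
    private final List<Integer> variableWidthVector; // set for variable-width columns
    private final List<Integer> fixedWidthVector;    // set for fixed-width (decimal) columns
    // Hypothetical shape: one length entry per decimal value read.
    private final List<Integer> decimalLengths = new ArrayList<>();

    private NullLengthRecorder(List<Integer> variable, List<Integer> fixed) {
        // Enforce the invariant from the comment: exactly one reference is set.
        if ((variable == null) == (fixed == null)) {
            throw new IllegalArgumentException("exactly one vector must be non-null");
        }
        this.variableWidthVector = variable;
        this.fixedWidthVector = fixed;
    }

    public static NullLengthRecorder forVariableWidth() {
        return new NullLengthRecorder(new ArrayList<>(), null);
    }

    public static NullLengthRecorder forDecimal() {
        return new NullLengthRecorder(null, new ArrayList<>());
    }

    /** Records a null value as length zero through whichever path applies. */
    public void recordNull() {
        if (variableWidthVector == null) {
            // New path: fixedWidthVector is set but unused here, as in the comment.
            decimalLengths.add(0);
        } else {
            // Pre-existing path that assumed a variable-width vector.
            variableWidthVector.add(0);
        }
    }

    public int variableLengthCount() {
        return variableWidthVector == null ? 0 : variableWidthVector.size();
    }

    public int decimalLengthCount() {
        return decimalLengths.size();
    }

    public boolean isDecimal() {
        return fixedWidthVector != null;
    }
}
```

Checking the invariant once in the constructor keeps the later `variableWidthVector == null` test unambiguous, which is the property the comment argues for.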
> Drill does not support Parquet DECIMAL values in variable length BINARY fields > -- > > Key: DRILL-4184 > URL: https://issues.apache.org/jira/browse/DRILL-4184 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.4.0 > Environment: Windows 7 Professional, Java 1.8.0_66 >Reporter: Dave Oshinsky > > Encoding a DECIMAL logical type in Parquet using the variable length BINARY > primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0. The > problem first surfaces with the ClassCastException shown below, but fixing > the immediate cause of the exception is not sufficient to support this > combination (DECIMAL, BINARY) in a Parquet file. > In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or > FIXED_LEN_BINARY_ARRAY. Are there any plans to support DECIMAL with variable > length BINARY? Avro definitely supports encoding DECIMAL in variable length > bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this > support in Parquet is less clear. > Selecting on a BINARY DECIMAL field in a parquet file throws an exception as > shown below (java.lang.ClassCastException: > org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector). The successful query at > bottom selected on a string field in the same file. > 0: jdbc:drill:zk=local> select count(*) from > dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020; > org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet > recor > d reader. 
> Message: Failure in setting up reader > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr { > required binary ACCT_NO (DECIMAL(20,0)); > optional binary SF_NO (UTF8); > optional binary LF_NO (UTF8); > optional binary BRANCH_NO (DECIMAL(20,0)); > optional binary INTRO_CUST_NO (DECIMAL(20,0)); > optional binary INTRO_ACCT_NO (DECIMAL(20,0)); > optional binary INTRO_SIGN (UTF8); > optional binary TYPE (UTF8); > optional binary OPR_MODE (UTF8); > optional binary CUR_ACCT_TYPE (UTF8); > optional binary TITLE (UTF8); > optional binary CORP_CUST_NO (DECIMAL(20,0)); > optional binary APLNDT (UTF8); > optional binary OPNDT (UTF8); > optional binary VERI_EMP_NO (DECIMAL(20,0)); > optional binary VERI_SIGN (UTF8); > optional binary MANAGER_SIGN (UTF8); > optional binary CURBAL (DECIMAL(8,2)); > optional binary STATUS (UTF8); > } > , metadata: > {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace" > :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal > ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co > lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec >
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185439#comment-15185439 ]

Stefán Baxter commented on DRILL-4482:
--------------------------------------

It still behaves the same. The two queries below read the *same records* from the *same Avro* file:

0: jdbc:drill:zk=local> select s.client_ip from dfs.asa.`/streaming/venuepoint/transactions` as s limit 2;
+---+
| {"ip":"77.106.147.165","postal_code":"2601","country_code":"NO","latitude":61.1151,"longitude":10.4663} |
| {"ip":"unknown","postal_code":"unknown","country_code":"unknown","latitude":0.0,"longitude":0.0,"isp":"unknown"} |
+---+
2 rows selected (0.39 seconds)

0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/streaming/venuepoint/transactions` as s limit 2;
+-+
+-+
+-+
2 rows selected (0.16 seconds)

Notice what happens when a reference to the sub-field is added.

Regards,
-Stefán
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185429#comment-15185429 ] ASF GitHub Bot commented on DRILL-4474: --- Github user amansinha100 closed the pull request at: https://github.com/apache/drill/pull/415 > Inconsistent behavior while using COUNT in select (Apache drill 1.2.0) > -- > > Key: DRILL-4474 > URL: https://issues.apache.org/jira/browse/DRILL-4474 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.2.0, 1.5.0 > Environment: m3.xlarge AWS instances ( 3 nodes) > CentOS6.5 x64 >Reporter: Shankar >Assignee: Jacques Nadeau >Priority: Blocker > > {quote} > * We are using drill to retrieve the business data from game analytic. > * We are running below queries on table of size 50GB (parquet) > * We have found some major inconsistency in data when we use COUNT function. > * Below is the case by case queries and their output. {color:blue}*Please > analyse it carefully, to for clear understanding of behaviour. *{color} > * Please let me know how to resolve this ? (or any earlier JIRA has been > already created). > * Hope this may be fixed in later versions. If not please do the needful. > {quote} > -- > CASE-1 (Wrong result) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +---+ > | count | > +---+ > | 27645752 | > +---+ > 1 row selected (0.281 seconds) > {noformat} > {quote} > {color} > -- > CASE-2 (Wrong result) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(sessionid), > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . 
> ; > +---+---+ > | EXPR$0 | cnt | > +---+---+ > | 37772844 | 2108 | > +---+---+ > 1 row selected (12.597 seconds) > {noformat} > {quote} > {color} > -- > CASE-3 (Wrong result, only first count is correct) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(distinct sessionid), > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +-+---+ > | EXPR$0 |cnt| > +-+---+ > | 201941 | 37772844 | > +-+---+ > 1 row selected (8.259 seconds) > {noformat} > {quote} > {color} > -- > CASE-4 (Correct result) > -- > {color:green} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and > t.event = 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +--+ > | cnt | > +--+ > | 525 | > +--+ > 1 row selected (14.318 seconds) > {noformat} > {quote} > {color} > -- > CASE-5 (Correct result) > -- > {color:green} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(sessionid), > . . . .
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185418#comment-15185418 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/415#issuecomment-193901238

    oops ... sorry, closing this and will reopen against the correct JIRA.
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185392#comment-15185392 ]

Jason Altekruse commented on DRILL-4482:
----------------------------------------

[~acmeguy] Thanks for the quick response, I will continue to try to reproduce the failure.

> Avro no longer selects data correctly from a sub-structure
> ----------------------------------------------------------
>
>                 Key: DRILL-4482
>                 URL: https://issues.apache.org/jira/browse/DRILL-4482
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Avro
>    Affects Versions: 1.6.0
>            Reporter: Stefán Baxter
>            Assignee: Stefán Baxter
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/processed/<>/transactions` as s limit 1;
> +----------------+
> |     EXPR$0     |
> +----------------+
> | 87.55.171.210  |
> +----------------+
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | null    |
> +---------+
> 1 row selected (0.29 seconds)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185380#comment-15185380 ]

Stefán Baxter commented on DRILL-4482:
--------------------------------------

I can pull the latest master/head to verify whether this is still a problem. I will let you know once that is done.
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185371#comment-15185371 ]

Stefán Baxter commented on DRILL-4482:
--------------------------------------

Yes I'm sure. I only included the limit there for the test. It returns null for all values and the underlying data includes no nulls.
[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure
[ https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185364#comment-15185364 ]

Jason Altekruse commented on DRILL-4482:
----------------------------------------

[~acmeguy] I'm trying to reproduce this issue and not seeing it on a small Avro file. There is no guarantee about read order when reading a directory, so running a limit 1 query over the same table in two formats (or even over the same list of files at two different times) is not guaranteed to give the same result.

Is transactions a directory or a file without an extension? Are you sure that there are no null values in this column? Could you try to run a query with a predictable result, like a max/min on the column or a limit with a sort?

It is still possible that this is a Drill bug, and I will try with a distributed query to see if I can reproduce it, but if you have time to confirm any of these things it could help with creating a reproduction.
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185328#comment-15185328 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jaltekruse commented on the pull request:

    https://github.com/apache/drill/pull/415#issuecomment-193884035

    Could you also close this PR and open a new one? The JIRA number was wrong in your commit, so this is posting to the JIRA about incorrect creation of direct scans. The correct number is 4479.
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185318#comment-15185318 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/415#issuecomment-193882461

    Yes, I can do that.
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185313#comment-15185313 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

Github user jacques-n commented on the pull request:

    https://github.com/apache/drill/pull/415#issuecomment-193880193

    Can you generate the test file as part of the test rather than checking in a static file?
[jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185306#comment-15185306 ]

ASF GitHub Bot commented on DRILL-4474:
---------------------------------------

GitHub user amansinha100 opened a pull request:

    https://github.com/apache/drill/pull/415

    DRILL-4474: Use varchar for default column when all_text_mode is enabled.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amansinha100/incubator-drill DRILL-4479

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #415

commit edfbf9bf0acd94fd0e8737f9162ca13281d00906
Author: Aman Sinha
Date:   2016-03-08T17:27:32Z

    DRILL-4474: Use varchar for default column when all_text_mode is enabled.
[jira] [Assigned] (DRILL-4487) add unit test for DRILL-4449
[ https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim reassigned DRILL-4487: --- Assignee: Deneche A. Hakim (was: Aman Sinha) > add unit test for DRILL-4449 > > > Key: DRILL-4487 > URL: https://issues.apache.org/jira/browse/DRILL-4487 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Parquet >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim > Fix For: 1.7.0 > > > now that we have a simple reproduction, we should add a unit test to make > sure we don't regress -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4487) add unit test for DRILL-4449
[ https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated DRILL-4487:
------------------------------------
    Assignee: Aman Sinha  (was: Deneche A. Hakim)
[jira] [Commented] (DRILL-4487) add unit test for DRILL-4449
[ https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185294#comment-15185294 ]

ASF GitHub Bot commented on DRILL-4487:
---------------------------------------

Github user adeneche commented on the pull request:

    https://github.com/apache/drill/pull/414#issuecomment-193876468

    one easy way to reproduce the issue, and fail the unit test, is to change the [following line](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java#L716) from:

        ParquetTableMetadataBase metadata = parquetTableMetadata.clone();

    to

        ParquetTableMetadataBase metadata = parquetTableMetadata;
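The one-line change quoted in this comment swaps a copy for a shared reference. The sketch below illustrates, under assumptions, why that kind of change can make a regression test fail: `TableMetadata` and `filesLeftInCache` are hypothetical stand-ins invented for this example, not Drill's `ParquetTableMetadataBase` or any code from `ParquetGroupScan`, and the pruning step is only one plausible kind of mutation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a cached metadata object; not Drill's actual class.
class TableMetadata implements Cloneable {
    final List<String> files;

    TableMetadata(List<String> files) {
        this.files = new ArrayList<>(files); // private copy of the file list
    }

    // Copies the file list so that changes to the clone do not reach the original.
    @Override
    public TableMetadata clone() {
        return new TableMetadata(files);
    }
}

class MetadataAliasingDemo {
    // Simulates a scan that prunes one file, working either on a clone or on
    // the cached object itself, and reports how many files the cache still holds.
    static int filesLeftInCache(boolean useClone) {
        TableMetadata cached =
            new TableMetadata(Arrays.asList("a.parquet", "b.parquet", "c.parquet"));
        TableMetadata working = useClone ? cached.clone() : cached;
        working.files.remove("b.parquet"); // the pruning step mutates the working object
        return cached.files.size();
    }

    public static void main(String[] args) {
        System.out.println(filesLeftInCache(true));  // prints 3: the cache is untouched
        System.out.println(filesLeftInCache(false)); // prints 2: the mutation leaked into the cache
    }
}
```

A unit test that reuses the cached object across two scans would only notice the problem in the second case, which matches the idea of removing `.clone()` to force a failure.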
[jira] [Commented] (DRILL-4487) add unit test for DRILL-4449
[ https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185287#comment-15185287 ]

ASF GitHub Bot commented on DRILL-4487:
---------------------------------------

GitHub user adeneche opened a pull request:

    https://github.com/apache/drill/pull/414

    DRILL-4487: add unit test for DRILL-4449

    @amansinha100 can you please review ? thanks

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adeneche/incubator-drill DRILL-4487

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #414

commit b1f052d800bae05bbf36b3594fe3c171ea4cede4
Author: adeneche
Date:   2016-03-08T15:54:31Z

    DRILL-4487: add unit test for DRILL-4449
[jira] [Commented] (DRILL-4313) C++ client - Improve method of drillbit selection from cluster
[ https://issues.apache.org/jira/browse/DRILL-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185220#comment-15185220 ] ASF GitHub Bot commented on DRILL-4313: --- Github user parthchandra closed the pull request at: https://github.com/apache/drill/pull/396 > C++ client - Improve method of drillbit selection from cluster > -- > > Key: DRILL-4313 > URL: https://issues.apache.org/jira/browse/DRILL-4313 > Project: Apache Drill > Issue Type: Improvement >Reporter: Parth Chandra >Assignee: Parth Chandra > Fix For: 1.6.0 > > > The current C++ client handles multiple parallel queries over the same > connection, but that creates a bottleneck as the queries get sent to the same > drillbit. > The client can manage this more effectively by choosing from a configurable > pool of connections and round robin queries to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
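The issue description proposes choosing from a pool of connections and sending queries to them in round-robin order. A minimal sketch of that selection policy follows. It is written in Java purely for illustration (the actual client is C++), and `RoundRobinPool` and `pick` are hypothetical names, not part of the Drill client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pool that hands out its members in rotation.
class RoundRobinPool<T> {
    private final List<T> members;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(List<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        this.members = new ArrayList<>(members);
    }

    // Returns the next member; floorMod keeps the index valid even after
    // the counter overflows into negative values.
    T pick() {
        int index = Math.floorMod(next.getAndIncrement(), members.size());
        return members.get(index);
    }

    public static void main(String[] args) {
        RoundRobinPool<String> pool =
            new RoundRobinPool<>(Arrays.asList("drillbit-1", "drillbit-2"));
        for (int i = 0; i < 3; i++) {
            System.out.println(pool.pick()); // drillbit-1, drillbit-2, drillbit-1
        }
    }
}
```

The atomic counter lets several threads submit queries concurrently without a lock, at the cost of strict fairness only in aggregate.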
[jira] [Resolved] (DRILL-4313) C++ client - Improve method of drillbit selection from cluster
[ https://issues.apache.org/jira/browse/DRILL-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Parth Chandra resolved DRILL-4313.
----------------------------------
    Resolution: Fixed

Fixed in df0f0af3d963c1b65eb01c3141fe84532c53f5a5
[jira] [Resolved] (DRILL-4332) tests in TestFrameworkTest fail in Java 8
[ https://issues.apache.org/jira/browse/DRILL-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4332. Resolution: Fixed Fix Version/s: (was: Future) 1.6.0 Fixed in 447b093cd2b05bfeae001844a7e3573935e84389 > tests in TestFrameworkTest fail in Java 8 > - > > Key: DRILL-4332 > URL: https://issues.apache.org/jira/browse/DRILL-4332 > Project: Apache Drill > Issue Type: Sub-task > Components: Tools, Build & Test >Affects Versions: 1.5.0 >Reporter: Deneche A. Hakim >Assignee: Laurent Goujon > Fix For: 1.6.0 > > > the following unit tests fail in Java 8: > {noformat} > TestFrameworkTest.testRepeatedColumnMatching > TestFrameworkTest.testCSVVerificationOfOrder_checkFailure > {noformat} > The tests expect the query to fail with a specific error message. The message > generated by DrillTestWrapper.compareMergedVectors assumes a specific order > in a map keySet (which we shouldn't). In Java 8 it seems the order changed > which causes a slightly different error message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
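The failure described in this issue comes from an error message whose text depends on the iteration order of a map's `keySet`. One common way to make such a message stable is to sort the keys before building it. The sketch below assumes a hypothetical helper `describeColumns`; it is not the code in `DrillTestWrapper.compareMergedVectors`, and it is not necessarily how the issue was fixed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class DeterministicMessage {
    // Lists the map's keys in sorted order, so the resulting text does not
    // depend on the iteration order of the underlying map implementation.
    static String describeColumns(Map<String, ?> columns) {
        return "columns: " + String.join(", ", new TreeSet<String>(columns.keySet()));
    }

    public static void main(String[] args) {
        Map<String, Integer> columns = new HashMap<>();
        columns.put("b", 1);
        columns.put("a", 2);
        System.out.println(describeColumns(columns)); // prints "columns: a, b"
    }
}
```

A test that asserts on this message then passes regardless of the JDK version, since `HashMap` makes no promise about key order.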
[jira] [Resolved] (DRILL-4486) Expression serializer incorrectly serializes escaped characters
[ https://issues.apache.org/jira/browse/DRILL-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4486. Resolution: Fixed Fix Version/s: 1.6.0 Fixed in 80316f3f8bef866720f99e609fe758ec8e0c4612 > Expression serializer incorrectly serializes escaped characters > --- > > Key: DRILL-4486 > URL: https://issues.apache.org/jira/browse/DRILL-4486 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 1.6.0 > > > the drill expression parser requires backslashes to be escaped. But the > ExpressionStringBuilder is not properly escaping them. This causes problems, > especially in the case of regex expressions run with parallel execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
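The description says the parser expects backslashes to be escaped while the serializer does not escape them, so a serialized expression does not survive a round trip. The sketch below shows the general idea with two hypothetical helpers, `escapeBackslashes` and `unescapeBackslashes`. It is not the code in `ExpressionStringBuilder`, and it handles backslashes only (quote characters would need their own treatment).

```java
class ExpressionEscaper {
    // Doubles every backslash so the text survives a parser that treats
    // '\' as an escape character. String.replace works on literal text,
    // not on regular expressions.
    static String escapeBackslashes(String literal) {
        return literal.replace("\\", "\\\\");
    }

    // Reverses escapeBackslashes for text that was produced by it.
    static String unescapeBackslashes(String escaped) {
        return escaped.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String regex = "\\d+";                          // the three characters \d+
        String serialized = escapeBackslashes(regex);   // the four characters \\d+
        System.out.println(unescapeBackslashes(serialized).equals(regex)); // prints true
    }
}
```

Without the escaping step, a regex such as `\d+` would lose its backslash when a remote fragment parses the serialized expression, which is consistent with the problem only showing up under parallel execution.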
[jira] [Resolved] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291
[ https://issues.apache.org/jira/browse/DRILL-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-4375. Resolution: Fixed Fix Version/s: 1.6.0 Fixed in 1f29914fc5c7d1e36651ac28167804c4012501fe > Fix the maven release profile, broken by jdbc jar size enforcer added in > DRILL-4291 > --- > > Key: DRILL-4375 > URL: https://issues.apache.org/jira/browse/DRILL-4375 > Project: Apache Drill > Issue Type: Bug >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2048) Malformed drill stoage config stored in zookeeper will prevent Drill from starting
[ https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse reassigned DRILL-2048: -- Assignee: Jason Altekruse > Malformed drill stoage config stored in zookeeper will prevent Drill from > starting > -- > > Key: DRILL-2048 > URL: https://issues.apache.org/jira/browse/DRILL-2048 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.7.0 > > > We noticed this problem while trying to test dev builds on a common cluster. > When applying changes that added a field to the configuration of a storage > plugin, the new format of the configuration would be persisted in zookeeper. > When a different dev build that did not include the change set tried to be > deployed on the same cluster the config stored in zookeeper would fail to > parse and the drillbit would not be able to start. This is not system > critical configuration so the drillbit should be able to still start with the > plugin disabled. > This fix could also include changing the jackson mapper to allow ignoring > unexpected fields in the configuration. This would give a little better > chance for interoperability between future versions of Drill as we add new > configuration options as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
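The description asks that a storage plugin whose stored configuration fails to parse be disabled rather than block startup. A minimal sketch of that policy follows, using only the standard library. `TolerantPluginLoader`, `LoadResult`, and the `parser` function are hypothetical names for this example and do not correspond to Drill's plugin registry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class TolerantPluginLoader {
    // Outcome of a load: the plugins that parsed, plus the names that were skipped.
    static final class LoadResult {
        final Map<String, Object> enabled = new LinkedHashMap<>();
        final Set<String> disabled = new TreeSet<>();
    }

    // Parses every stored config. A config that fails to parse disables
    // that one plugin instead of aborting the whole load.
    static LoadResult load(Map<String, String> storedConfigs,
                           Function<String, Object> parser) {
        LoadResult result = new LoadResult();
        for (Map.Entry<String, String> entry : storedConfigs.entrySet()) {
            try {
                result.enabled.put(entry.getKey(), parser.apply(entry.getValue()));
            } catch (RuntimeException e) {
                result.disabled.add(entry.getKey()); // keep going with the rest
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("dfs", "42");
        stored.put("custom", "not-a-number");
        LoadResult result = load(stored, Integer::parseInt);
        System.out.println(result.enabled.keySet());  // prints [dfs]
        System.out.println(result.disabled);          // prints [custom]
    }
}
```

For the second suggestion in the description (ignoring unexpected fields), Jackson offers `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES`, which can be turned off on an `ObjectMapper`, and the `@JsonIgnoreProperties(ignoreUnknown = true)` annotation. Whether either is appropriate for Drill's configuration classes is a design decision the issue leaves open.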
[jira] [Updated] (DRILL-2048) Malformed drill stoage config stored in zookeeper will prevent Drill from starting
[ https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Altekruse updated DRILL-2048:
-----------------------------------
    Fix Version/s:     (was: Future)

> Malformed drill storage config stored in zookeeper will prevent Drill from starting
[jira] [Updated] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting
[ https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Altekruse updated DRILL-2048:
-----------------------------------
    Fix Version/s: 1.7.0

> Malformed drill storage config stored in zookeeper will prevent Drill from starting
[jira] [Commented] (DRILL-4449) Wrong results when using metadata cache with specific set of queries
[ https://issues.apache.org/jira/browse/DRILL-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185131#comment-15185131 ]

Deneche A. Hakim commented on DRILL-4449:
-----------------------------------------

A clarification: to reproduce the issue, the table must be partitioned and referenced in both inner queries with different filters. The filters need to trigger parquet partition pruning and leave more than one file after the pruning.

> Wrong results when using metadata cache with specific set of queries
> ---------------------------------------------------------------------
>
> Key: DRILL-4449
> URL: https://issues.apache.org/jira/browse/DRILL-4449
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.5.0
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
> Priority: Critical
> Fix For: 1.6.0
>
> We are still working on a reproduction, but the problem appears with a query similar to this one:
> {noformat}
> with q1 as (
>   select a.field
>   from `table` a
>   where
>   group by a.field
>   having ...
> )
> , q2 as (
>   select a.field
>   from `table` a
>   where
>   group by a.field
> )
> select * from (
>   select count(*) as cnt from q1
>   union all
>   select count(*) as cnt from q2
> );
> {noformat}
> The table is partitioned and both subqueries will force parquet pruning on the table. Because we share the parquet metadata object in ParquetGroupScan, the second query ends up being "over pruned" and we get wrong results.
> The plan doesn't show the problem.
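The "over pruned" effect described above is a general aliasing hazard: if two scans hold the same mutable metadata object and pruning edits it in place, the second scan starts from the first scan's already-pruned file list. The following is a toy model of that hazard only, not Drill's ParquetGroupScan; `FileMeta`, `ToyScan`, the file names, and the integer `discount_pct` partition value are all hypothetical, and the two filters merely mirror the shape of two different predicates on the same partitioned table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    path: str
    discount_pct: int  # hypothetical partition value, one per file


class ToyScan:
    """Toy stand-in for a scan over a partitioned table (not Drill's class)."""

    def __init__(self, files):
        self.files = files  # may be the very list other scans also hold

    def prune_in_place(self, keep):
        # Hazard: mutates the list that other scans reference too.
        self.files[:] = [f for f in self.files if keep(f)]
        return self

    def prune_copy(self, keep):
        # Safe variant: each scan gets its own pruned list.
        return ToyScan([f for f in self.files if keep(f)])


def files_per_query(shared):
    """Return how many files each of two differently filtered scans ends up reading."""
    cache = [FileMeta("t/part_%d.parquet" % i, i) for i in range(11)]
    below_5 = lambda f: f.discount_pct < 5
    above_2 = lambda f: f.discount_pct > 2
    if shared:
        n1 = len(ToyScan(cache).prune_in_place(below_5).files)
        n2 = len(ToyScan(cache).prune_in_place(above_2).files)
    else:
        n1 = len(ToyScan(cache).prune_copy(below_5).files)
        n2 = len(ToyScan(cache).prune_copy(above_2).files)
    return n1, n2
```

In this model, the shared variant leaves the second scan with only the files that satisfy both filters, which matches the symptom of a second query reading too few files while each plan looks correct on its own.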
[jira] [Updated] (DRILL-4487) add unit test for DRILL-4449
[ https://issues.apache.org/jira/browse/DRILL-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated DRILL-4487:
------------------------------------
    Summary: add unit test for DRILL-4449  (was: add unit test fro DRILL-4449)

> add unit test for DRILL-4449
> ----------------------------
>
> Key: DRILL-4487
> URL: https://issues.apache.org/jira/browse/DRILL-4487
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Storage - Parquet
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
> now that we have a simple reproduction, we should add a unit test to make sure we don't regress
[jira] [Created] (DRILL-4487) add unit test fro DRILL-4449
Deneche A. Hakim created DRILL-4487:
---------------------------------------

    Summary: add unit test fro DRILL-4449
    Key: DRILL-4487
    URL: https://issues.apache.org/jira/browse/DRILL-4487
    Project: Apache Drill
    Issue Type: Sub-task
    Reporter: Deneche A. Hakim
    Assignee: Deneche A. Hakim
    Fix For: 1.7.0

now that we have a simple reproduction, we should add a unit test to make sure we don't regress
[jira] [Commented] (DRILL-4449) Wrong results when using metadata cache with specific set of queries
[ https://issues.apache.org/jira/browse/DRILL-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185043#comment-15185043 ]

Deneche A. Hakim commented on DRILL-4449:
-----------------------------------------

I was able to create a reproduction of the issue, in case it is needed later for validation.

Create a partitioned table:
{noformat}
CREATE TABLE dfs.tmp.t PARTITION BY(l_discount) AS SELECT * FROM cp.`tpch/lineitem.parquet`;
{noformat}

The following query will give wrong results if the table has a metadata cache file:
{noformat}
SELECT COUNT(*) FROM (
  SELECT l_orderkey FROM dfs.tmp.t WHERE l_discount < 0.05
  UNION ALL
  SELECT l_orderkey FROM dfs.tmp.t WHERE l_discount > 0.02
);
{noformat}

> Wrong results when using metadata cache with specific set of queries
[jira] [Commented] (DRILL-4443) MIN/MAX on VARCHAR throw a NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184695#comment-15184695 ]

ASF GitHub Bot commented on DRILL-4443:
---------------------------------------

Github user adeneche closed the pull request at: https://github.com/apache/drill/pull/409

> MIN/MAX on VARCHAR throw a NullPointerException
> -----------------------------------------------
>
> Key: DRILL-4443
> URL: https://issues.apache.org/jira/browse/DRILL-4443
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.6.0
> Environment: 4 node cluster CentOS
> Reporter: Khurram Faraaz
> Assignee: Deneche A. Hakim
> Priority: Critical
> Fix For: 1.6.0
> Attachments: DRILL_4443.parquet, test4443.csv
>
> Using a simple csv file that contains at least 2 groups of rows:
> {noformat}
> a,
> a,
> a,
> b,
> {noformat}
> Running a query with min/max throws a NullPointerException:
> {noformat}
> SELECT MIN(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> {noformat}
> SELECT MAX(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> The problem is caused by {{VarCharAggrFunctions.java}}, which does not reset its internal buffer properly.
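The root cause named above is aggregate state that is not reset between groups. As a loose illustration of why that reset matters, here is a toy streaming MAX over rows sorted by group key. It is not the code in VarCharAggrFunctions.java, and it does not reproduce the NullPointerException; in this model a missing reset shows up as a stale value leaking into the next group instead. `MaxVarCharAgg`, `grouped_max`, and the `reset_between_groups` flag are hypothetical names.

```python
class MaxVarCharAgg:
    """Toy MAX aggregate that keeps its running value in a reusable byte buffer."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Must run at every group boundary so no state carries over.
        self.buf = bytearray()
        self.has_value = False

    def add(self, value):
        if value is None:  # nulls do not contribute to MAX
            return
        data = value.encode("utf-8")
        if not self.has_value or data > bytes(self.buf):
            self.buf[:] = data
            self.has_value = True

    def output(self):
        return self.buf.decode("utf-8") if self.has_value else None


def grouped_max(rows, reset_between_groups=True):
    """rows: iterable of (key, value) pairs already sorted by key."""
    agg = MaxVarCharAgg()
    results = []
    current, seen_any = None, False
    for key, value in rows:
        if seen_any and key != current:
            results.append((current, agg.output()))
            if reset_between_groups:
                agg.reset()
        current, seen_any = key, True
        agg.add(value)
    if seen_any:
        results.append((current, agg.output()))
    return results
```

With rows `[("a", "x"), ("a", "z"), ("b", None)]`, the second group should yield null; skipping the reset makes it report the previous group's maximum.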
[jira] [Assigned] (DRILL-4443) MIN/MAX on VARCHAR throw a NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim reassigned DRILL-4443:
---------------------------------------
    Assignee: Deneche A. Hakim  (was: Hanifi Gunes)

> MIN/MAX on VARCHAR throw a NullPointerException
[jira] [Commented] (DRILL-4453) Difference in results over char data, window function query
[ https://issues.apache.org/jira/browse/DRILL-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184652#comment-15184652 ]

Deneche A. Hakim commented on DRILL-4453:
-----------------------------------------

[~khfaraaz] how about the results? Are they still different?

> Difference in results over char data, window function query
> ------------------------------------------------------------
>
> Key: DRILL-4453
> URL: https://issues.apache.org/jira/browse/DRILL-4453
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.6.0
> Environment: 4 node cluster
> Reporter: Khurram Faraaz
> Assignee: Khurram Faraaz
> Labels: window_function
> Attachments: t_alltype.csv, t_alltype.parquet
>
> A window function query with a frame clause returns results that differ from those returned by the same query on Postgres 9.3 over the same data.
> Note that the two tables have the same number of nulls in both Drill and Postgres.
> The length of the result returned by the MIN function is different on Postgres 9.3 vs Drill 1.6.0.
> Drill 1.6.0 returns 1 as the length:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select length(min(c4)) from dfs.tmp.`t_alltype`;
> +---------+
> | EXPR$0  |
> +---------+
> | 1       |
> +---------+
> 1 row selected (0.282 seconds)
> {noformat}
> Postgres 9.3 returns 0 as the length:
> {noformat}
> postgres=# select length(min(c4)) from t_alltype;
>  length
> --------
>       0
> (1 row)
> {noformat}
> {noformat}
> postgres=# \d t_alltype
>              Table "public.t_alltype"
>  Column |            Type             | Modifiers
> --------+-----------------------------+-----------
>  c1     | integer                     |
>  c2     | integer                     |
>  c3     | bigint                      |
>  c4     | character(256)              |
>  c5     | character varying(256)      |
>  c6     | timestamp without time zone |
>  c7     | date                        |
>  c8     | boolean                     |
>  c9     | double precision            |
>
> postgres=# select c4 from t_alltype where c4 is null;
>  c4
> ----
> (3 rows)
> {noformat}
> {noformat}
> postgres=# SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t_alltype;
>       min
> ---------------
>  gwfrW
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
>  ZAFOcferhjkcl
> ...
> ...
>
>  ApKK
>  ApKK
> (145 rows)
> {noformat}
> Parquet schema details
> {noformat}
> [root@centos-01 parquet-tools]# ./parquet-schema ./Datasources/window_functions/t_alltype.parquet
> message root {
>   optional int32 c1;
>   optional int32 c2;
>   optional int64 c3;
>   optional binary c4 (UTF8);
>   optional binary c5 (UTF8);
>   optional int64 c6 (TIMESTAMP_MILLIS);
>   optional int32 c7 (DATE);
>   optional boolean c8;
>   optional double c9;
> }
> {noformat}
> On Drill 1.6.0
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM dfs.tmp.`t_alltype`;
> +----------------+
> | EXPR$0         |
> +----------------+
> | gwfrW          |
> | ZAFOcferhjkcl  |
> | ZAFOcferhjkcl  |
> | ZAFOcferhjkcl  |
> | ZAFOcferhjkcl  |
> ...
> ...
> | ApKK           |
> | ApKK           |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> |                |
> | null           |
> | null           |
> |                |
> |                |
> |                |
> +----------------+
> 145 rows selected (0.409 seconds)
> {noformat}