[GitHub] drill pull request #584: DRILL-4884: Fix bug that drill sometimes produced I...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/584#discussion_r84831079

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java ---
@@ -301,7 +301,7 @@ public IterOutcome next() {
"Incoming batch [#%d, %s] has an empty schema. This is not allowed.", instNum, batchTypeName)); }
-if (incoming.getRecordCount() > MAX_BATCH_SIZE) {
+if (incoming.getRecordCount() >= MAX_BATCH_SIZE) {
--- End diff --

Drill requires that a batch with no selection vector (SV), or a batch with an SV2, is bounded by 65536 records. This requirement holds across the entire Drill code base. What this IteratorValidator tries to enforce, when assertions are enabled, is that every incoming batch meets this requirement. However, it is each operator's responsibility to enforce it. For instance, as you saw, each reader in Drill should produce a batch no larger than 65536 records. If you develop a new storage plugin with a new reader, the new reader should enforce this rule as well. Therefore, in your situation, where you are developing a new reader, the right approach is to make sure the reader produces batches no larger than this threshold. The original code in IteratorValidatorBatchIterator.java should be fine. For the repro I tried, I feel the fix should be in LimitRecordBatch.java. As you indicated earlier, the index "i" is defined as char, which is not right. Would you like to modify your patch?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
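The remark above about the index "i" being declared as char can be shown with a standalone sketch (not Drill code): Java's char is an unsigned 16-bit type, so a char loop index silently wraps to 0 at exactly the 65536 batch-size bound under discussion, while an int does not.

```java
// Standalone illustration (not Drill code): char arithmetic wraps at 65536,
// which is exactly the MAX_BATCH_SIZE bound discussed in the review comment.
public class CharIndexDemo {
    public static void main(String[] args) {
        char c = 65535;   // Character.MAX_VALUE
        c++;              // unsigned 16-bit arithmetic: wraps to 0
        System.out.println((int) c);

        int i = 65535;    // an int index has no such problem here
        i++;
        System.out.println(i);
    }
}
```

This is why an operator indexing a full 64K-record batch with a char index can misbehave even though the batch itself is within Drill's bound.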
[jira] [Created] (DRILL-4961) Schema change error due to a missing column in a Json file
Boaz Ben-Zvi created DRILL-4961:
-----------------------------------

Summary: Schema change error due to a missing column in a Json file
Key: DRILL-4961
URL: https://issues.apache.org/jira/browse/DRILL-4961
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.8.0
Reporter: Boaz Ben-Zvi

A missing column in a batch defaults to a (hard-coded) nullable INT (e.g., see line 128 in ExpressionTreeMaterializer.java), which can cause a schema conflict when that column in another batch has a conflicting type (e.g. VARCHAR).

To recreate (the following test also produced DRILL-4960, which may be related): run a parallel aggregation over two small Json files (e.g. copy contrib/storage-mongo/src/test/resources/emp.json twice), where in one of the files a whole column was eliminated (e.g. "last_name").

0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.091 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` group by first_name, last_name;
Error: SYSTEM ERROR: SchemaChangeException: Incoming batches for merging receiver have different schemas!
Fragment 1:0
[Error Id: 1315ddc5-5c31-404f-917b-c7a082d016cf on 10.250.57.63:31010] (state=,code=0)

The above used a streaming aggregation; when switching to hash aggregation the same error manifests differently:

0: jdbc:drill:zk=local> alter session set `planner.enable_streamagg` = false;
+-------+------------------------------------+
|  ok   |              summary               |
+-------+------------------------------------+
| true  | planner.enable_streamagg updated.  |
+-------+------------------------------------+
1 row selected (0.083 seconds)
0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` group by first_name, last_name;
Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector.
Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.NullableVarCharVector, field= last_name(VARCHAR:OPTIONAL)[$bits$(UINT1:REQUIRED), last_name(VARCHAR:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]
Fragment 2:0
[Error Id: 58d0-3bfe-4197-b4bd-44f9d7604d77 on 10.250.57.63:31010] (state=,code=0)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
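The conflict mechanism described in this report can be sketched in isolation: a column absent from a batch is assigned a hard-coded nullable INT, which then clashes when another batch supplies VARCHAR for the same name. The sketch below models schemas as plain name-to-type-string maps; it is an illustration of the mechanism, not Drill's actual schema code.

```java
import java.util.Map;

// Toy model of the DRILL-4961 conflict: types are plain strings, and a
// missing column defaults to "INT:OPTIONAL" as the report describes.
public class SchemaMerge {
    static String resolve(String column, Map<String, String> batchSchema) {
        // hypothetical stand-in for the hard-coded nullable-INT default
        return batchSchema.getOrDefault(column, "INT:OPTIONAL");
    }

    public static void main(String[] args) {
        // batch1 comes from the file where "last_name" was eliminated
        Map<String, String> batch1 = Map.of("first_name", "VARCHAR:OPTIONAL");
        Map<String, String> batch2 = Map.of("first_name", "VARCHAR:OPTIONAL",
                                            "last_name", "VARCHAR:OPTIONAL");
        String t1 = resolve("last_name", batch1);
        String t2 = resolve("last_name", batch2);
        System.out.println(t1.equals(t2) ? "schemas match"
                                         : "schema change: " + t1 + " vs " + t2);
    }
}
```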
[jira] [Resolved] (DRILL-4954) allTextMode in the MapRDB plugin always return nulls
[ https://issues.apache.org/jira/browse/DRILL-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam resolved DRILL-4954.
------------------------------------
Resolution: Fixed

Fixed in [4efc9f2|https://github.com/apache/drill/commit/4efc9f248ef7ef4b86660a1a73a9f44662c082ba]

> allTextMode in the MapRDB plugin always return nulls
> ----------------------------------------------------
>
> Key: DRILL-4954
> URL: https://issues.apache.org/jira/browse/DRILL-4954
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - MapRDB
> Affects Versions: 1.8.0
> Environment: MapRDB
> Reporter: Boaz Ben-Zvi
> Assignee: Smidth Panchamia
> Fix For: 1.9.0
>
> Setting the "allTextMode" option to "true" in the MapR fs plugin, like:
> "formats": {
>   "maprdb": {
>     "type": "maprdb",
>     "allTextMode": true
>   }
> }
> makes the returned results null. Here's an example:
> << default plugin, unchanged >>
> 0: jdbc:drill:> use mfs.tpch_sf1_maprdb_json;
> +-------+--------------------------------------------------------+
> |  ok   |                        summary                         |
> +-------+--------------------------------------------------------+
> | true  | Default schema changed to [mfs1.tpch_sf1_maprdb_json]  |
> +-------+--------------------------------------------------------+
> 1 row selected (0.153 seconds)
> 0: jdbc:drill:> select typeof(N_REGIONKEY) from nation limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | BIGINT  |
> +---------+
> 1 row selected (0.206 seconds)
> 0: jdbc:drill:> select N_REGIONKEY from nation limit 2;
> +--------------+
> | N_REGIONKEY  |
> +--------------+
> | 0            |
> | 2            |
> +--------------+
> 2 rows selected (0.254 seconds)
> << plugin changed to all text mode (as shown above) >>
> 0: jdbc:drill:> select typeof(N_REGIONKEY) from nation limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | NULL    |
> +---------+
> 1 row selected (0.321 seconds)
> 0: jdbc:drill:> select N_REGIONKEY from nation limit 2;
> +--------------+
> | N_REGIONKEY  |
> +--------------+
> | null         |
> | null         |
> +--------------+
> 2 rows selected (0.25 seconds)
[jira] [Resolved] (DRILL-4894) Fix unit test failure in 'storage-hive/core' module
[ https://issues.apache.org/jira/browse/DRILL-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam resolved DRILL-4894.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.9.0

Fixed in [f3c26e3|https://github.com/apache/drill/commit/f3c26e34e3a72ef338c4dbca1a0204f342176972]

> Fix unit test failure in 'storage-hive/core' module
> ---------------------------------------------------
>
> Key: DRILL-4894
> URL: https://issues.apache.org/jira/browse/DRILL-4894
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Fix For: 1.9.0
>
> As part of DRILL-4886, I added `hbase-server` as a dependency for 'storage-hive/core', which pulled in an older version (2.5.1) of some hadoop jars, incompatible with the other hadoop jars used by Drill (2.7.1). This breaks the unit tests in this module.
[jira] [Resolved] (DRILL-3178) csv reader should allow newlines inside quotes
[ https://issues.apache.org/jira/browse/DRILL-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam resolved DRILL-3178.
------------------------------------
Resolution: Fixed
Fix Version/s: (was: Future) 1.9.0

Fixed in [42948fe|https://github.com/apache/drill/commit/42948feb4a45f98f3d116d2e2a765cc3fadb5937]

> csv reader should allow newlines inside quotes
> ----------------------------------------------
>
> Key: DRILL-3178
> URL: https://issues.apache.org/jira/browse/DRILL-3178
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0
> Environment: Ubuntu Trusty 14.04.2 LTS
> Reporter: Neal McBurnett
> Assignee: F Méthot
> Fix For: 1.9.0
> Attachments: drill-3178.patch
>
> When reading a csv file which contains newlines within quoted strings, e.g. via
> select * from dfs.`/tmp/q.csv`;
> Drill 1.0 says:
> Error: SYSTEM ERROR: com.univocity.parsers.common.TextParsingException: Error processing input: Cannot use newline character within quoted string
> But many tools produce csv files with newlines in quoted strings. Drill should be able to handle them.
> Workaround: the csvquote program (https://github.com/dbro/csvquote) can encode embedded commas and newlines, and even decode them later if desired.
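The parsing behavior this issue asks for boils down to making the record splitter quote-aware: a newline only terminates a record when it occurs outside double quotes. The sketch below is a simplified, self-contained illustration of that idea, not Drill's (univocity-based) parser.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified quote-aware record splitter: newlines inside double quotes are
// treated as data, not record terminators (no escape handling, illustration only).
public class QuotedCsv {
    static List<String> records(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char ch : text.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes;   // toggle quoted state
                cur.append(ch);
            } else if (ch == '\n' && !inQuotes) {
                out.add(cur.toString()); // record boundary only outside quotes
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        String csv = "id,note\n1,\"line one\nline two\"\n2,plain";
        // header + 2 data records: the embedded newline does not split record 1
        System.out.println(records(csv).size());
    }
}
```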
[jira] [Resolved] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
[ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam resolved DRILL-4653.
------------------------------------
Resolution: Fixed
Fix Version/s: (was: Future) 1.9.0

Fixed in [db48298|https://github.com/apache/drill/commit/db48298920575cb1c2283e03bdfc7b50e83ae217]

> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>
> Key: DRILL-4653
> URL: https://issues.apache.org/jira/browse/DRILL-4653
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - JSON
> Affects Versions: 1.6.0
> Reporter: subbu srinivasan
> Fix For: 1.9.0
>
> Currently a Drill query terminates upon the first encounter of an invalid JSON line. Drill should continue progressing after ignoring the bad records. Something similar to a setting of (ignore.malformed.json) would help.
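The behavior the issue requests is the usual "skip and continue" loop: when a record fails to parse, count or log it and move on instead of failing the whole scan. The sketch below illustrates that control flow only; the naive isWellFormed check is a stand-in for a real JSON parser, and the option name mirrors the one suggested in the issue.

```java
import java.util.List;

// Control-flow sketch of an ignore.malformed.json-style option: bad lines are
// skipped and counted rather than aborting the read. The well-formedness check
// here is deliberately crude (a real reader would use an actual JSON parser).
public class LenientScan {
    static boolean isWellFormed(String line) {
        String t = line.trim();
        return t.startsWith("{") && t.endsWith("}");
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "{\"a\": 1}",
            "{\"a\": 2",      // malformed: unterminated object
            "{\"a\": 3}");
        int good = 0, skipped = 0;
        for (String l : lines) {
            if (isWellFormed(l)) good++;
            else skipped++;   // with the option enabled: log and keep going
        }
        System.out.println(good + " good, " + skipped + " skipped");
    }
}
```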
[jira] [Resolved] (DRILL-4369) Database driver fails to report any major or minor version information
[ https://issues.apache.org/jira/browse/DRILL-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam resolved DRILL-4369.
------------------------------------
Resolution: Fixed

> Database driver fails to report any major or minor version information
> ----------------------------------------------------------------------
>
> Key: DRILL-4369
> URL: https://issues.apache.org/jira/browse/DRILL-4369
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - JDBC
> Affects Versions: 1.4.0
> Reporter: N Campbell
> Assignee: Laurent Goujon
> Fix For: 1.9.0
>
> Using Apache Drill 1.4: the DatabaseMetaData getters for the major and minor versions of the server and the JDBC driver return 0 instead of 1.4. This prevents an application from dynamically adjusting how it interacts based on which version of Drill a connection is accessing.
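The application-side pattern the report describes reads the server version through java.sql.DatabaseMetaData (getDatabaseMajorVersion / getDatabaseMinorVersion) and adapts behavior accordingly. Since a live Drill connection is not available here, the sketch below only models the decision an application faces when the driver reports 0.0.

```java
// Sketch of the adaptation logic broken by DRILL-4369: a driver reporting
// major=0, minor=0 (what Drill 1.4's DatabaseMetaData getters returned)
// leaves the application with nothing to branch on.
public class VersionCheck {
    static String describe(int major, int minor) {
        return (major == 0 && minor == 0) ? "unknown" : major + "." + minor;
    }

    public static void main(String[] args) {
        System.out.println(describe(0, 0));  // buggy driver: version unknown
        System.out.println(describe(1, 4));  // fixed driver: usable version
    }
}
```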
[GitHub] drill pull request #584: DRILL-4884: Fix bug that drill sometimes produced I...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/584#discussion_r84801937

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/IteratorValidatorBatchIterator.java ---
@@ -301,7 +301,7 @@ public IterOutcome next() {
"Incoming batch [#%d, %s] has an empty schema. This is not allowed.", instNum, batchTypeName)); }
-if (incoming.getRecordCount() > MAX_BATCH_SIZE) {
+if (incoming.getRecordCount() >= MAX_BATCH_SIZE) {
--- End diff --

I'm not sure this is the right fix for this IOB problem.

1. IteratorValidator is only inserted when assertions are enabled [1]. Fixing this only in IteratorValidatorBatchIterator means the issue will still be there when assertions are disabled.
2. Even if we do turn assertions on, will the query hit an IllegalStateException instead of the IOB? Can you try running the query with assertions off and on, and see if the query succeeds in both cases?

[1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ImplCreator.java#L72-L74
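The one-character diff under review changes which record counts the validator accepts at the boundary. A standalone sketch (not Drill's code; constant and names are taken from the diff for illustration) of the difference at exactly 65536:

```java
// Illustration of the '>' vs '>=' boundary in the diff above: with the strict
// check a full 64K-record batch passes validation; with the inclusive check
// it is rejected. Which is correct depends on the invariant the rest of the
// code base assumes, which is the substance of the review discussion.
public class BoundCheck {
    static final int MAX_BATCH_SIZE = 65536;

    static boolean okStrict(int n)    { return !(n > MAX_BATCH_SIZE); }  // original code
    static boolean okInclusive(int n) { return !(n >= MAX_BATCH_SIZE); } // proposed patch

    public static void main(String[] args) {
        System.out.println(okStrict(65536));     // full batch allowed
        System.out.println(okInclusive(65536));  // full batch rejected
    }
}
```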
isDateCorrect field in ParquetTableMetadata
Hello All,

DRILL-4203 addressed the date field issue. The fix introduced a new field in ParquetTableMetadata_v2: isDateCorrect. I have some difficulty understanding the meaning of this field. According to [1], this field is set to false when Drill gets parquet metadata from the parquet footer. It is set to true in the code flows of [2] and [3], when Drill gets parquet metadata from the metadata cache.

Questions I have:
1. If the parquet files are generated with Drill after DRILL-4203, does Drill still think the date field is NOT correct (isDateCorrect = false)?
2. Why does this field have nothing to do with the "autoCorrection" flag [4]? If someone turns off autoCorrection, will it have an impact on this "isDateCorrect" flag?

Thanks in advance for any input,

Jinfeng

[1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L932
[2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L936
[3] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L187
[4] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L354-L355
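The behavior being asked about, as described in the mail itself, can be condensed into a two-line rule. This sketch encodes the questioner's reading of [1]-[3] only (it is not verified against Metadata.java, and deliberately ignores autoCorrection, since question 2 is precisely that the two appear unrelated):

```java
// The mail's reading of the code paths: isDateCorrect depends solely on
// where the metadata came from, not on when the files were written or on
// the autoCorrection setting. This is a restatement of the question, not
// verified behavior.
public class DateFlag {
    static boolean isDateCorrect(boolean fromMetadataCache) {
        return fromMetadataCache;  // footer-derived metadata: false [1]; cache: true [2][3]
    }

    public static void main(String[] args) {
        System.out.println(isDateCorrect(false)); // footer path, even post-DRILL-4203 files
        System.out.println(isDateCorrect(true));  // metadata-cache path
    }
}
```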
[jira] [Created] (DRILL-4959) Drill 1.8 not able to connect to S3
Gopal Nagar created DRILL-4959:
--
Summary: Drill 1.8 not able to connect to S3
Key: DRILL-4959
URL: https://issues.apache.org/jira/browse/DRILL-4959
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Gopal Nagar

Hi Team,

I have followed the references below to integrate Drill with AWS S3. The query keeps running for hours and doesn't display any output (I am querying a file with only 2 rows from S3).

Reference links
---------------
https://abhishek-tiwari.com/post/reflections-on-apache-drill
https://drill.apache.org/docs/s3-storage-plugin/

Query format (tried from both the UI and the CLI):
select * from `s3`.`hive.csv` LIMIT 10;
select * from `s3`.`bucket_name/hive.csv` LIMIT 10;

After seeing the log below, I tried including jets3t-0.9.3.jar in the jars directory, but it doesn't fix my problem.

Log details
-----------
2016-10-24 17:00:02,461 [27f1c1ec-d82e-ba2a-2840-e7104320418f:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 27f1c1ec-d82e-ba2a-2840-e7104320418f: select * from `s3`.`hive.csv` LIMIT 10
2016-10-24 17:00:02,479 [drill-executor-39] ERROR o.a.d.exec.server.BootStrapContext - org.apache.drill.exec.work.foreman.Foreman.run() leaked an exception.
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
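A NoClassDefFoundError like the one in the log usually means the named class is not visible on the JVM's classpath. A quick way to verify whether a copied jar is actually being picked up is to probe for the class by name; the sketch below uses the exact class from the log (run in this report's context it would print "present" once the jets3t jar is correctly on the classpath).

```java
// Classpath probe for the class named in the DRILL-4959 log. Class.forName
// resolves a class at runtime by fully qualified name; catching
// ClassNotFoundException tells us the jar providing it is not visible.
public class ClasspathProbe {
    public static void main(String[] args) {
        try {
            Class.forName("org.jets3t.service.S3ServiceException");
            System.out.println("present");
        } catch (ClassNotFoundException e) {
            System.out.println("missing");
        }
    }
}
```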
[jira] [Created] (DRILL-4958) Union All query fails stating that parquet files are schema-less
Khurram Faraaz created DRILL-4958:
--
Summary: Union All query fails stating that parquet files are schema-less
Key: DRILL-4958
URL: https://issues.apache.org/jira/browse/DRILL-4958
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.9.0
Reporter: Khurram Faraaz

A UNION ALL query over parquet files fails and reports that the Union-All was over schema-less tables. Parquet files are not without a schema; they do have metadata. We need to fix this. Postgres returns results for the same query on the same data.

Drill 1.9.0 git commit ID: a29f1e29

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `t_alltype.parquet` t1 union all select * from `t_alltype.parquet` t2;
Error: UNSUPPORTED_OPERATION ERROR: Union-All over schema-less tables must specify the columns explicitly
See Apache Drill JIRA: DRILL-2414
[Error Id: b6069bdc-8697-4578-a799-802ff3e80f00 on centos-01.qa.lab:31010] (state=,code=0)
{noformat}
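The error message implies a planner rule of the shape sketched below: a star UNION ALL is rejected when either input reports no declared schema. This is only my reading of the message (not the actual Calcite/Drill rule); the bug, per the report, is that a parquet table ends up on the schema-less branch even though its metadata carries a schema.

```java
// Toy restatement of the rule behind the UNSUPPORTED_OPERATION error:
// '*' over UNION ALL requires both sides to declare a schema. The reported
// bug is the classification feeding this rule, not the rule itself.
public class UnionAllCheck {
    static String validate(boolean starQuery, boolean leftHasSchema, boolean rightHasSchema) {
        if (starQuery && (!leftHasSchema || !rightHasSchema)) {
            return "UNSUPPORTED_OPERATION: Union-All over schema-less tables"
                 + " must specify the columns explicitly";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // what the reporter sees: parquet misclassified as schema-less
        System.out.println(validate(true, false, false));
        // what should happen: parquet metadata supplies the schema
        System.out.println(validate(true, true, true));
    }
}
```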