[jira] [Commented] (DRILL-6204) Pass tables columns without partition columns to empty Hive reader
[ https://issues.apache.org/jira/browse/DRILL-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384792#comment-16384792 ]

ASF GitHub Bot commented on DRILL-6204:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1146

> Pass tables columns without partition columns to empty Hive reader
> ------------------------------------------------------------------
>
>                 Key: DRILL-6204
>                 URL: https://issues.apache.org/jira/browse/DRILL-6204
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>    Affects Versions: 1.12.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.13.0
>
>
> When {{store.hive.optimize_scan_with_native_readers}} is enabled,
> {{HiveDrillNativeScanBatchCreator}} is used to read data from Hive tables
> directly from the file system. When the table is empty or no row groups are
> matched, an empty {{HiveDefaultReader}} is created to output the schema.
> When this happens, Drill currently fails with the following error:
> {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> NullPointerException Setup failed for null
> {noformat}
> This happens because, instead of passing only the table columns to the empty
> reader (as is done when creating a non-empty reader), all columns were passed,
> which may include partition columns as well. The reader then fails to find the
> partition columns in the table schema. As noted on lines 81-82 of
> {{HiveDrillNativeScanBatchCreator}}, partition columns and table columns are
> deliberately separated so that partition columns can be passed separately:
> {noformat}
> // Separate out the partition and non-partition columns. Non-partition columns are passed directly to the
> // ParquetRecordReader. Partition columns are passed to ScanBatch.
> {noformat}
> To fix the problem, the table columns need to be passed instead of all columns:
> {code:java}
> if (readers.size() == 0) {
>   readers.add(new HiveDefaultReader(table, null, null, newColumns, context, conf,
>       ImpersonationUtil.createProxyUgi(config.getUserName(), context.getQueryUserName())));
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
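For context, the partition/non-partition split the fix relies on can be sketched in plain Java. This is an illustrative standalone example, not the actual Drill code: the class, the `split` method, and the string-based column names are all hypothetical; in Drill the split happens inside `HiveDrillNativeScanBatchCreator` on `SchemaPath` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ColumnSplitExample {

  // Separate the requested columns into table (non-partition) columns,
  // which go to the record reader, and partition columns, which are
  // handled separately by ScanBatch.
  static Map<String, List<String>> split(List<String> allColumns, Set<String> partitionNames) {
    List<String> tableColumns = new ArrayList<>();
    List<String> partitionColumns = new ArrayList<>();
    for (String col : allColumns) {
      if (partitionNames.contains(col)) {
        partitionColumns.add(col);  // passed to ScanBatch
      } else {
        tableColumns.add(col);      // passed to the (possibly empty) reader
      }
    }
    Map<String, List<String>> result = new HashMap<>();
    result.put("table", tableColumns);
    result.put("partition", partitionColumns);
    return result;
  }

  public static void main(String[] args) {
    // "dt" plays the role of a Hive partition column here.
    Map<String, List<String>> split =
        split(Arrays.asList("id", "name", "dt"), new HashSet<>(Arrays.asList("dt")));
    // The bug was passing all three columns to the empty reader; the fix
    // passes only the table columns, since "dt" is absent from the table schema.
    System.out.println(split.get("table"));      // [id, name]
    System.out.println(split.get("partition"));  // [dt]
  }
}
```

Passing `split.get("table")` (rather than the full column list) to the empty reader mirrors the one-line change in the pull request: the reader only ever looks up columns that exist in the table schema, so the NPE cannot occur.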
[jira] [Commented] (DRILL-6204) Pass tables columns without partition columns to empty Hive reader
[ https://issues.apache.org/jira/browse/DRILL-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384590#comment-16384590 ]

ASF GitHub Bot commented on DRILL-6204:
---------------------------------------

Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1146

    +1
[jira] [Commented] (DRILL-6204) Pass tables columns without partition columns to empty Hive reader
[ https://issues.apache.org/jira/browse/DRILL-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383965#comment-16383965 ]

ASF GitHub Bot commented on DRILL-6204:
---------------------------------------

Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1146

    @vdiravka, thanks for the code review. Updated PR.
[jira] [Commented] (DRILL-6204) Pass tables columns without partition columns to empty Hive reader
[ https://issues.apache.org/jira/browse/DRILL-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383861#comment-16383861 ]

ASF GitHub Bot commented on DRILL-6204:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1146#discussion_r171912800

    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java ---
    @@ -174,7 +174,7 @@ public ScanBatch getBatch(ExecutorFragmentContext context, HiveDrillNativeParque
         // If there are no readers created (which is possible when the table is empty or no row groups are matched),
         // create an empty RecordReader to output the schema
         if (readers.size() == 0) {
    -      readers.add(new HiveDefaultReader(table, null, null, columns, context, conf,
    +      readers.add(new HiveDefaultReader(table, null, null, newColumns, context, conf,
    --- End diff --

    Could we rename newColumns -> nonPartitionedColumns or tableColumns?
[jira] [Commented] (DRILL-6204) Pass tables columns without partition columns to empty Hive reader
[ https://issues.apache.org/jira/browse/DRILL-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383594#comment-16383594 ]

ASF GitHub Bot commented on DRILL-6204:
---------------------------------------

Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1146

    @parthchandra, @vdiravka please review.
[jira] [Commented] (DRILL-6204) Pass tables columns without partition columns to empty Hive reader
[ https://issues.apache.org/jira/browse/DRILL-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383571#comment-16383571 ]

ASF GitHub Bot commented on DRILL-6204:
---------------------------------------

GitHub user arina-ielchiieva opened a pull request:

    https://github.com/apache/drill/pull/1146

    DRILL-6204: Pass tables columns without partition columns to empty Hi…

    …ve reader

    Details in [DRILL-6204](https://issues.apache.org/jira/browse/DRILL-6204).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/arina-ielchiieva/drill DRILL-6204

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1146.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1146

commit 50dd97c612645d025dc3fa77795e142eabee2b70
Author: Arina Ielchiieva
Date:   2018-03-02T11:38:00Z

    DRILL-6204: Pass tables columns without partition columns to empty Hive reader