[jira] [Created] (DRILL-6776) Drill Web UI takes long time for first time load in network isolated environment
Igor Guzenko created DRILL-6776: --- Summary: Drill Web UI takes long time for first time load in network isolated environment Key: DRILL-6776 URL: https://issues.apache.org/jira/browse/DRILL-6776 Project: Apache Drill Issue Type: Bug Components: Web Server Affects Versions: 1.14.0 Reporter: Igor Guzenko Assignee: Igor Guzenko Fix For: 1.15.0 When a cluster is built on a network isolated from the internet, the first-time load of the Web UI takes about 25 seconds, of which the browser spends about 15 seconds waiting for the request to /ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js to time out. jQuery was added to the static resources in the scope of DRILL-5699, but the dependency on Google's CDN wasn't fully removed: the static resource was added only as a [fallback|https://stackoverflow.com/questions/1014203/best-way-to-use-googles-hosted-jquery-but-fall-back-to-my-hosted-library-on-go]. I guess the main reason the fallback solution was applied is that the library served from Google's CDN has a higher chance of being found and loaded from the browser's cache. Unfortunately, this otherwise graceful solution doesn't work in a truly isolated environment, which is why we need to fully remove the /ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js dependency. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
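The CDN-with-local-fallback pattern referenced above (from the linked StackOverflow answer) boils down to checking whether the CDN script managed to define `window.jQuery` and, if not, injecting the bundled copy. A minimal sketch of that decision logic follows; the helper name `jqueryFallbackTag` and the local path `/static/js/jquery.min.js` are illustrative assumptions, not Drill's actual resource layout:

```javascript
// Decide which script tag (if any) must be injected after the CDN attempt.
// cdnLoaded: whether the CDN script managed to define window.jQuery.
// localPath: assumed location of the bundled jQuery copy (placeholder path).
function jqueryFallbackTag(cdnLoaded, localPath) {
  // When the CDN copy loaded, no fallback tag is needed.
  if (cdnLoaded) {
    return "";
  }
  // Otherwise emit a script tag pointing at the local static resource.
  return '<script src="' + localPath + '"><\/script>';
}

// In a page this is typically wired up as:
//   window.jQuery || document.write(jqueryFallbackTag(false, "/static/js/jquery.min.js"));
```

Note that in an isolated environment the fallback still pays the full CDN timeout before the check runs, which is exactly the 15-second delay described above; that is why removing the CDN reference, not just falling back from it, is the fix.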
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683489#comment-16683489 ] Igor Guzenko commented on DRILL-786: Documentation note: Due to their nature, cross joins can produce extremely large results, and we don't recommend using the feature unless you know the results won't cause out-of-memory errors. That's why cross joins are disabled by default; to allow explicit cross join syntax you'll have to enable it by setting the _*planner.enable_nljoin_for_scalar_only*_ option to _*false*_. There is also another limitation, related to using an aggregation function over a cross join relation: when the input row count for the aggregate function is bigger than the value of the _*planner.slice_target*_ option, the query can't be planned (because a two-phase aggregation can't be created in that case); as a workaround, set *_planner.enable_multiphase_agg_* to _*false*_. This limitation will remain until https://issues.apache.org/jira/browse/DRILL-6839 is fixed. 
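The two options mentioned in the note above can be changed per session. A sketch of the corresponding SQL, using the option names from the comment (the query itself is illustrative):

```sql
-- Allow explicit CROSS JOIN syntax (disabled by default).
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;

-- Workaround for aggregates over a cross join until DRILL-6839 is fixed:
-- disable two-phase aggregation so the logical aggregate can be planned.
ALTER SESSION SET `planner.enable_multiphase_agg` = false;

SELECT COUNT(*)
FROM cp.`tpch/nation.parquet` n
CROSS JOIN cp.`tpch/region.parquet` r;
```

Both settings are session-scoped; `ALTER SYSTEM` would apply them cluster-wide instead.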
> Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Affects Versions: 1.14.0 >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Labels: doc-impacting > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 
150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id =
[jira] [Updated] (DRILL-540) Allow querying hive views in drill
[ https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-540: --- Description: Currently hive views cannot be queried from drill. *Suggested approach* # Drill persists its view metadata in files with the suffix .view.drill, in JSON format. For example: {noformat}
{
  "name" : "view_from_calcite_1_4",
  "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
  "fields" : [ {
    "name" : "*",
    "type" : "ANY",
    "isNullable" : true
  } ],
  "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat} Later Drill parses this metadata and uses it to treat view names in SQL as subqueries. 2. In Apache Hive, metadata about views is stored similarly to tables. Below is an example from metastore.TBLS: {noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                         |
-------|------------|------|-----------------|------|----------|------|---------|-------------|-------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` |
{noformat} 3. So in the Hive metastore, views are treated as tables of a special type. The main benefit is that we also get the expanded SQL definition of the view (just like in .view.drill files). Reading this metadata is already implemented in Drill via the Thrift Metastore API. 4. To enable querying of Hive views I'll reuse the existing code for Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, for a _*HiveReadEntry*_ I'll convert the metadata to an instance of _*View*_ (_which is actually the model for data persisted in .view.drill files_) and then, based on this instance, return a new _*DrillViewTable*_. With this approach Drill will handle Hive views the same way as if they had been defined in Drill and persisted in a .view.drill file. 5. 
For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ I'll reuse existing code from _*DrillHiveTable*_, so the conversion functionality will be extracted and used for both (table and view) fields type conversions. was: Currently hive views cannot be queried from drill. > Allow querying hive views in drill > -- > > Key: DRILL-540 > URL: https://issues.apache.org/jira/browse/DRILL-540 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Hive >Reporter: Ramana Inukonda Nagaraj >Assignee: Igor Guzenko >Priority: Major > Labels: doc-impacting > Fix For: 1.16.0 > > > Currently hive views cannot be queried from drill. > *Suggested approach* > # Drill persists it's views metadata in file with suffix .view.drill using > json format. For example: > {noformat} > { > "name" : "view_from_calcite_1_4", > "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0", > "fields" : [ { > "name" : "*", > "type" : "ANY", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "tmp" ] > }{noformat} > Later drill parses the metadata and uses it to treat view names in > SQL as a subquery. > 2. In Apache Hive metadata about views is stored in similar way to > tables. Below is example from metastore.TBLS : > > {noformat} > TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID > |TBL_NAME |TBL_TYPE |VIEW_EXPANDED_TEXT | > ---||--|-|--|--|--|--|--|---| > 2 |1542111078 |1 |0|mapr |0 |2 |cview >|VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` |{noformat} > 3. So in Hive metastore views are considered as tables of special type. > And main benefit is that we also have expanded SQL definition of views (just > like in view.drill files). Also reading of the metadata is already > implemented in Drill with help of thrift Metastore API. > 4. To enable querying of Hive views I'll reuse existing code for Drill > views as much as possible. 
First in *_HiveSchemaFactory.getDrillTable_* for > _*HiveReadEntry*_ I'll convert the metadata to instance of _*View*_ (_which > is actually model for data persisted in .view.drill files_) and then based on > this instance return new _*DrillViewTable*_. Using this approach drill will > handle hive views the same way as if it was initially defined in Drill and > persisted in .view.drill file. > 5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ > I'll reuse existing code from _*DrillHiveTable*_, so the conversion > functionality will be extracted and used for both
[jira] [Created] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low
Igor Guzenko created DRILL-6839: --- Summary: Failed to plan (aggregate + Hash or NL join) when slice target is low Key: DRILL-6839 URL: https://issues.apache.org/jira/browse/DRILL-6839 Project: Apache Drill Issue Type: Bug Reporter: Igor Guzenko Assignee: Igor Guzenko Case 1. When nested loop join is about to be used:
- Option "planner.enable_nljoin_for_scalar_only" is set to false
- Option "planner.slice_target" is set to a low value to imitate big input tables
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          "CROSS JOIN cp.`tpch/region.parquet` r")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
    }
  }
}
{code} Case 2. 
When hash join is about to be used:
- Option "planner.enable_mergejoin" is set to false, so hash join will be used instead
- Option "planner.slice_target" is set to a low value to imitate big input tables
- Comment out the line ruleList.add(HashJoinPrule.DIST_INSTANCE); in the PlannerPhase.getPhysicalRules method
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          "INNER JOIN cp.`tpch/region.parquet` r " +
          "ON r.nation_id = l.nation_id")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
    }
  }
}
{code}
*Workaround:* To avoid the exception, set the option "planner.enable_multiphase_agg" to false. This avoids the unsuccessful attempts to create a two-phase aggregation plan in StreamAggPrule and guarantees that the logical aggregate will be converted to a physical one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
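The session-level setup described in Case 1 can also be reproduced from plain SQL rather than the Java test harness. A sketch, using the option names given above (the slice target value of 1 mirrors the test; the queries are illustrative):

```sql
-- Imitate big input tables and force nested loop join for a non-scalar case.
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;
ALTER SESSION SET `planner.slice_target` = 1;

-- Fails to plan until DRILL-6839 is fixed, unless the workaround below is applied.
SELECT COUNT(l.nation_id)
FROM cp.`tpch/nation.parquet` l
CROSS JOIN cp.`tpch/region.parquet` r;

-- Workaround: force single-phase aggregation so the logical aggregate plans.
ALTER SESSION SET `planner.enable_multiphase_agg` = false;
```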
[jira] [Updated] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low
[ https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6839: Description: *Case 1.* When nested loop join is about to be used:
- Option "_planner.enable_nljoin_for_scalar_only_" is set to false
- Option "_planner.slice_target_" is set to a low value to imitate big input tables
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          ", cp.`tpch/region.parquet` r")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
    }
  }
}
{code}
*Case 2.* When hash join is about to be used:
- Option "planner.enable_mergejoin" is set to false, so hash join will be used instead
- Option "planner.slice_target" is set to a low value to imitate big input tables
- Comment out the line ruleList.add(HashJoinPrule.DIST_INSTANCE); in the PlannerPhase.getPhysicalRules method
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          "INNER JOIN cp.`tpch/region.parquet` r " +
          "ON r.nation_id = l.nation_id")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
    }
  }
}
{code}
*Workaround:* To avoid the exception, set the option "_planner.enable_multiphase_agg_" to false. This avoids the unsuccessful attempts to create a two-phase aggregation plan in StreamAggPrule and guarantees that the logical aggregate will be converted to a physical one. was: Case 1. When nested loop join is about to be used: -Option "planner.enable_nljoin_for_scalar_only" is set to false -Option "planner.slice_target" is set to low value for imitation of big input tables {code:java} @Category(SqlTest.class) public class CrossJoinTest extends ClusterTest { @BeforeClass public static void setUp() throws Exception { startCluster(ClusterFixture.builder(dirTestWatcher)); } @Test public void testCrossJoinSucceedsForLowSliceTarget() throws Exception { try { client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false); client.alterSession(ExecConstants.SLICE_TARGET, 1); queryBuilder().sql( "SELECT COUNT(l.nation_id) " + "FROM cp.`tpch/nation.parquet` l " + "CROSS JOIN cp.`tpch/region.parquet` r") .run(); } finally { client.resetSession(ExecConstants.SLICE_TARGET); client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName()); } } }{code} Case 2. 
When hash join is about to be used: - Option "planner.enable_mergejoin" is set to false, so hash join will be used instead - Option "planner.slice_target" is set to low value for imitation of big input tables - Comment out //ruleList.add(HashJoinPrule.DIST_INSTANCE); in PlannerPhase.getPhysicalRules method {code:java} @Category(SqlTest.class) public class CrossJoinTest extends ClusterTest { @BeforeClass public static void setUp() throws Exception { startCluster(ClusterFixture.builder(dirTestWatcher)); } @Test public void testInnerJoinSucceedsForLowSliceTarget() throws Exception { try { client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false); client.alterSession(ExecConstants.SLICE_TARGET, 1); queryBuilder().sql( "SELECT COUNT(l.nation_id) " + "FROM cp.`tpch/nation.parquet` l " + "INNER JOIN cp.`tpch/region.parquet` r " + "ON r.nation_id = l.nation_id") .run(); } finally { client.resetSession(ExecConstants.SLICE_TARGET); client.resetSession(PlannerSettings.MERGEJOIN.getOptionName()); } } } {code} *Workaround:* To avoid the exception we need to set option "planner.enable_multiphase_agg" to false. By doing this we avoid unsuccessful attempts to create 2 phase aggregation plan in StreamAggPrule and guarantee that logical aggregate will be converted to physical one. > Failed to plan (aggregate + Hash or NL join) when
[jira] [Assigned] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low
[ https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-6839: --- Assignee: (was: Igor Guzenko) > Failed to plan (aggregate + Hash or NL join) when slice target is low > -- > > Key: DRILL-6839 > URL: https://issues.apache.org/jira/browse/DRILL-6839 > Project: Apache Drill > Issue Type: Bug >Reporter: Igor Guzenko >Priority: Major > > *Case 1.* When nested loop join is about to be used: > - Option "_planner.enable_nljoin_for_scalar_only_" is set to false > - Option "_planner.slice_target_" is set to low value for imitation of big > input tables > > {code:java} > @Category(SqlTest.class) > public class CrossJoinTest extends ClusterTest { > @BeforeClass > public static void setUp() throws Exception { > startCluster(ClusterFixture.builder(dirTestWatcher)); > } > @Test > public void testCrossJoinSucceedsForLowSliceTarget() throws Exception { >try { > client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), > false); > client.alterSession(ExecConstants.SLICE_TARGET, 1); > queryBuilder().sql( > "SELECT COUNT(l.nation_id) " + > "FROM cp.`tpch/nation.parquet` l " + > ", cp.`tpch/region.parquet` r") > .run(); >} finally { > client.resetSession(ExecConstants.SLICE_TARGET); > client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName()); >} > } > }{code} > > *Case 2.* When hash join is about to be used: > - Option "planner.enable_mergejoin" is set to false, so hash join will be > used instead > - Option "planner.slice_target" is set to low value for imitation of big > input tables > - Comment out //ruleList.add(HashJoinPrule.DIST_INSTANCE); in > PlannerPhase.getPhysicalRules method > {code:java} > @Category(SqlTest.class) > public class CrossJoinTest extends ClusterTest { > @BeforeClass > public static void setUp() throws Exception { >startCluster(ClusterFixture.builder(dirTestWatcher)); > } > @Test > public void testInnerJoinSucceedsForLowSliceTarget() throws 
Exception { >try { > client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false); > client.alterSession(ExecConstants.SLICE_TARGET, 1); > queryBuilder().sql( > "SELECT COUNT(l.nation_id) " + > "FROM cp.`tpch/nation.parquet` l " + > "INNER JOIN cp.`tpch/region.parquet` r " + > "ON r.nation_id = l.nation_id") > .run(); >} finally { > client.resetSession(ExecConstants.SLICE_TARGET); > client.resetSession(PlannerSettings.MERGEJOIN.getOptionName()); >} > } > } > {code} > > *Workaround:* To avoid the exception we need to set option > "_planner.enable_multiphase_agg_" to false. By doing this we avoid > unsuccessful attempts to create 2 phase aggregation plan in StreamAggPrule > and guarantee that logical aggregate will be converted to physical one. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639674#comment-16639674 ] Igor Guzenko commented on DRILL-786: [~hanu.ncr] I've addressed all comments in PR. Could you please take a look ? > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Affects Versions: 1.14.0 >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > 
DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > 
DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Sets: > Set#22, type: (DrillRecordRow[*, age, name, studentnum]) > rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, > importance=0.59049001 > rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), > rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 > network} >
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:45 AM: -- We considered three possible options for how the feature could be implemented. Note: in the text below, whenever I mention the option being enabled or disabled, it refers to the *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case):* Allow nested loop join only for nodes that originate from explicit cross join syntax, but prohibit implicit cross joins when the option is enabled. So the following query should fail when the option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} because the cross join of *a* with the result of (*b* x *c*) is implicit and should depend on the option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented.{color}* *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax, regardless of the option value. For example, the following queries would work in this case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain the explicit syntax will still depend on the option. For example, the following query won't work when the option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just a narrower case of the previous one. We could allow explicit cross join when the option is enabled, and prohibit it when it is disabled. was (Author: ihorhuzenko): We considered 3 possible options how the feature could be implemented. 
Note, in text below when I mention option is enabled or disabled it relates to *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case) :* Allow nested loop only for nodes that originated from explicit cross join syntax but prohibit implicit cross joins when option is enabled. So such query should fail when option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} Because cross join of *a* and result of (*b* x *c*) is implicit and should depend on option value. But based on my investigation, {color:#d04437}it's really hard to implement this approach, it requires a lot of time and includes a lot of changes to Apache Calcite.{color} *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax regardless of option value. For example following queries will work in such case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain explicit syntax, will still be dependent on the option. For example the following query won't work when option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just more narrow case of the previous one. We could allow explicit cross join for enabled option, and prohibit it for disabled option. 
> Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 >
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306 ] Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM: -- We considered three possible options for how the feature could be implemented. Note: in the text below, whenever I mention the option being enabled or disabled, it refers to the *planner.enable_nljoin_for_scalar_only* option. *Option 1. (Perfect case):* Allow nested loop join only for nodes that originate from explicit cross join syntax, but prohibit implicit cross joins when the option is enabled. So the following query should fail when the option is true: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} because the cross join of *a* with the result of (*b* x *c*) is implicit and should depend on the option value. But based on my investigation, *{color:#d04437}I didn't find how this could be implemented.{color}* I have provided the results of the investigation in the prior comments. *Option 2. (Allow all queries with explicit cross join syntax)* We can allow nested loop join for all queries that contain explicit cross join syntax, regardless of the option value. For example, the following queries would work in this case: {code:java} SELECT * FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r {code} {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN cp.`tpch/nation.parquet` c {code} But queries that don't contain the explicit syntax will still depend on the option. For example, the following query won't work when the option is enabled: {code:java} SELECT * FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b {code} *Option 3. (Allow cross join syntax only when option enabled)* This approach is just a narrower case of the previous one. We could allow explicit cross join when the option is enabled, and prohibit it when it is disabled. 
Also, we can consider changing the default value of the option to false, so that queries producing a Cartesian product would always succeed.
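For reference, the option discussed above can be inspected and toggled with standard Drill syntax. This is a sketch of the session-scoped workflow; `ALTER SYSTEM` would change the value cluster-wide instead:
{code:sql}
-- Inspect the current value of the option (defaults to true):
SELECT * FROM sys.options WHERE name = 'planner.enable_nljoin_for_scalar_only';

-- Permit Cartesian products (nested loop join for non-scalar inputs)
-- for the current session only:
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;
{code}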
> Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL],
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634323#comment-16634323 ] Igor Guzenko commented on DRILL-786: I tried adding a joinContext map to Calcite's Join class and passing it through each point where a join instance may be copied or recreated: JoinToMultiJoinRule.java, LogicalJoin.java, LoptOptimizeJoinRule.java, MultiJoin.java, MutableRels.java, PigRelFactories.java, RelBuilder.java, RelFactories.java, RelStructuredTypeFlattener.java, SqlToRelConverter.java, SubQueryRemoveRule.java. But even with such extensive changes, I was not able to overcome the problem that arises when both implicit and explicit cross joins are present in one query and the *planner.enable_nljoin_for_scalar_only* option is set to true. Such a query should fail with the exception "This query cannot be planned possibly due to either a cartesian join or an inequality join", but it works. I suggest leaving this case aside and simply enabling NestedLoopJoin when an explicit cross join is present in the original query. That solution can be implemented much more easily and requires no changes to Calcite.
[jira] [Updated] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-786: --- Affects Version/s: 1.14.0 Reviewer: Volodymyr Vysotskyi
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637942#comment-16637942 ] Igor Guzenko commented on DRILL-786: It looks like we agreed to move on with option 3 without changing the default value of planner.enable_nljoin_for_scalar_only. I will also update the error message *from* "This query cannot be planned possibly due to either a cartesian join or an inequality join." *to* "This query cannot be planned possibly due to either a cartesian join or an inequality join. If cartesian or inequality join is used intentionally, set the option 'planner.enable_nljoin_for_scalar_only' to false and try again."
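Following the updated error message above, a user hitting the CannotPlanException on the query from this issue's description could retry as follows. This is a sketch of the intended workflow, not output from an actual run:
{code:sql}
-- Fails with CannotPlanException while
-- planner.enable_nljoin_for_scalar_only is true:
SELECT student.name, student.age, student.studentnum
FROM student CROSS JOIN voter
WHERE student.age = 20 AND voter.age = 20;

-- As the message suggests, relax the restriction and try again:
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;

SELECT student.name, student.age, student.studentnum
FROM student CROSS JOIN voter
WHERE student.age = 20 AND voter.age = 20;
{code}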
[jira] [Created] (DRILL-6944) UnsupportedOperationException thrown for view over MapR-DB binary table
Igor Guzenko created DRILL-6944: --- Summary: UnsupportedOperationException thrown for view over MapR-DB binary table Key: DRILL-6944 URL: https://issues.apache.org/jira/browse/DRILL-6944 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types, Storage - MapRDB Affects Versions: 1.15.0 Reporter: Igor Guzenko Assignee: Igor Guzenko Fix For: 1.16.0

1. Create a MapR-DB binary table and put some data using the HBase shell:
{code:none}
hbase shell
create '/tmp/bintable','name','address'
put '/tmp/bintable','100','name:first_name','john'
put '/tmp/bintable','100','name:last_name','doe'
put '/tmp/bintable','100','address:city','Newark'
put '/tmp/bintable','100','address:state','nj'
scan '/tmp/bintable'
{code}
2. Drill config: ensure that the dfs storage plugin has "connection": "maprfs:///" and contains the format:
{code:java}
"maprdb": { "type": "maprdb", "allTextMode": true, "enablePushdown": false, "disableCountOptimization": true }
{code}
3. Check that the table can be selected from Drill:
{code:java}
select * from dfs.`/tmp/bintable`;
{code}
4. Create a Drill view:
{code:java}
create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`;
{code}
5. Querying the view results in an exception:
{code:java}
0: jdbc:drill:> select * from dfs.tmp.`testview`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast expression CastExpression [input=`address`, type=minor_type: MAP mode: REQUIRED ] into string. Please, refer to logs for more information. [Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] (state=,code=0)
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6977) Improve Hive tests configuration
Igor Guzenko created DRILL-6977: --- Summary: Improve Hive tests configuration Key: DRILL-6977 URL: https://issues.apache.org/jira/browse/DRILL-6977 Project: Apache Drill Issue Type: Bug Reporter: Igor Guzenko Class HiveTestDataGenerator is responsible for initialization of the Hive metadata service and configuration of the Hive storage plugin for the tested drillbit. Originally it was supposed to be initialized once before all tests in the hive module, but actually it's initialized for every test class. Such initialization takes a lot of time, so it's worth spending some time to accelerate the Hive tests. This task has two main aims: # Use HiveTestDataGenerator once for all test classes # Provide flexible configuration of Hive tests that can be used with ClusterFixture for standalone (not bound to HiveTestBase) test classes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
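Aim #1 above (run the expensive HiveTestDataGenerator once for all test classes instead of once per class) can be sketched with the standard initialization-on-demand holder idiom. This is a hypothetical, self-contained illustration; the class and method names below are not Drill's real test API.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HiveTestFixture {
    // counts how many times the expensive initialization actually ran
    static final AtomicInteger initCount = new AtomicInteger();

    // lazy holder idiom: the JVM initializes HOLDER exactly once, thread-safely
    private static class Holder {
        static final HiveTestFixture INSTANCE = new HiveTestFixture();
    }

    private HiveTestFixture() {
        // stands in for metastore startup + storage plugin configuration
        initCount.incrementAndGet();
    }

    // every test class asks for the shared fixture instead of re-creating it
    public static HiveTestFixture getOrInit() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // simulate three test classes requesting the fixture
        HiveTestFixture a = getOrInit();
        HiveTestFixture b = getOrInit();
        HiveTestFixture c = getOrInit();
        System.out.println("initialized " + initCount.get()
                + " time(s), same instance: " + (a == b && b == c));
    }
}
```

The same idea, exposed through a ClusterFixture-style builder, would let a test class that does not extend HiveTestBase opt into the shared Hive metastore.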
[jira] [Updated] (DRILL-6977) Improve Hive tests configuration
[ https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6977: Issue Type: Improvement (was: Bug) > Improve Hive tests configuration > > > Key: DRILL-6977 > URL: https://issues.apache.org/jira/browse/DRILL-6977 > Project: Apache Drill > Issue Type: Improvement >Reporter: Igor Guzenko >Priority: Major > > Class HiveTestDataGenerator is responsible for initialization of the Hive > metadata service and configuration of the Hive storage plugin for the tested > drillbit. Originally it was supposed to be initialized once before all tests > in the hive module, but actually it's initialized for every test class. Such > initialization takes a lot of time, so it's worth spending some time to > accelerate the Hive tests. > This task has two main aims: > # Use HiveTestDataGenerator once for all test classes > # Provide flexible configuration of Hive tests that can be used with > ClusterFixture for standalone (not bound to HiveTestBase) test classes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-6977) Improve Hive tests configuration
[ https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-6977: --- Assignee: Igor Guzenko Fix Version/s: 1.16.0 Component/s: Tools, Build & Test > Improve Hive tests configuration > > > Key: DRILL-6977 > URL: https://issues.apache.org/jira/browse/DRILL-6977 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > Class HiveTestDataGenerator is responsible for initialization of the Hive > metadata service and configuration of the Hive storage plugin for the tested > drillbit. Originally it was supposed to be initialized once before all tests > in the hive module, but actually it's initialized for every test class. Such > initialization takes a lot of time, so it's worth spending some time to > accelerate the Hive tests. > This task has two main aims: > # Use HiveTestDataGenerator once for all test classes > # Provide flexible configuration of Hive tests that can be used with > ClusterFixture for standalone (not bound to HiveTestBase) test classes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6977) Improve Hive tests configuration
[ https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6977: Description: Class HiveTestDataGenerator is responsible for initialization of the Hive metadata service and configuration of the Hive storage plugin for the tested drillbit. Originally it was supposed to be initialized once before all tests in the hive module, but actually it's initialized for every test class. Such initialization takes a lot of time, so it's worth spending some time to accelerate the Hive tests. This task has two main aims: # Use HiveTestDataGenerator once for all test classes # Provide flexible configuration of Hive tests that can be used with ClusterFixture for standalone (not bound to HiveTestBase) test classes was: Class HiveTestDataGenerator is responsible for initialization of hive metadata service and configuration of hive storage plugin for tested drillbit. Originally it was supposed to be initialized once before all tests in hive module, but actually it's initialized for every test class. And such initialization takes a lot of time, so it's worth to spend some time to accelerate hive tests. This task has two main aims: # Use HiveTestDataGenerator once for all test classes # Provide flexible configuration of Hive tests that can be used with ClusterFicture for autonomic(not bounded to HiveTestBase) test classes > Improve Hive tests configuration > > > Key: DRILL-6977 > URL: https://issues.apache.org/jira/browse/DRILL-6977 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > Class HiveTestDataGenerator is responsible for initialization of the Hive > metadata service and configuration of the Hive storage plugin for the tested > drillbit. Originally it was supposed to be initialized once before all tests > in the hive module, but actually it's initialized for every test class.
And such > initialization takes a lot of time, so it's worth spending some time to > accelerate the Hive tests. > This task has two main aims: > # Use HiveTestDataGenerator once for all test classes > # Provide flexible configuration of Hive tests that can be used with > ClusterFixture for standalone (not bound to HiveTestBase) test classes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6898) Web UI cannot be used without internet connection (jquery loaded from ajax.googleapis.com)
[ https://issues.apache.org/jira/browse/DRILL-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720016#comment-16720016 ] Igor Guzenko commented on DRILL-6898: - The issue with jQuery is fixed in PR [https://github.com/apache/drill/pull/1495]. > Web UI cannot be used without internet connection (jquery loaded from > ajax.googleapis.com) > -- > > Key: DRILL-6898 > URL: https://issues.apache.org/jira/browse/DRILL-6898 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.14.0 >Reporter: Paul Bormans >Priority: Major > Fix For: 1.15.0 > > > When opening the web ui in an environment that does not have an internet > connection, the jquery js library is not loaded and the website does not > function as it should. > One solution can be to add a configuration option to use local/packaged > javascript libraries instead of loading these from a CDN. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6898) Web UI cannot be used without internet connection (jquery loaded from ajax.googleapis.com)
[ https://issues.apache.org/jira/browse/DRILL-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6898: Fix Version/s: 1.15.0 > Web UI cannot be used without internet connection (jquery loaded from > ajax.googleapis.com) > -- > > Key: DRILL-6898 > URL: https://issues.apache.org/jira/browse/DRILL-6898 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.14.0 >Reporter: Paul Bormans >Priority: Major > Fix For: 1.15.0 > > > When opening the web ui in an environment that does not have an internet > connection, the jquery js library is not loaded and the website does not > function as it should. > One solution can be to add a configuration option to use local/packaged > javascript libraries instead of loading these from a CDN. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6898) Web UI cannot be used without internet connection (jquery loaded from ajax.googleapis.com)
[ https://issues.apache.org/jira/browse/DRILL-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko closed DRILL-6898. --- Resolution: Fixed > Web UI cannot be used without internet connection (jquery loaded from > ajax.googleapis.com) > -- > > Key: DRILL-6898 > URL: https://issues.apache.org/jira/browse/DRILL-6898 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.14.0 >Reporter: Paul Bormans >Priority: Major > Fix For: 1.15.0 > > > When opening the web ui in an environment that does not have an internet > connection, the jquery js library is not loaded and the website does not > function as it should. > One solution can be to add a configuration option to use local/packaged > javascript libraries instead of loading these from a CDN. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6908) Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()
[ https://issues.apache.org/jira/browse/DRILL-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko closed DRILL-6908. --- Resolution: Won't Fix CSVTestBuilder is not suitable for tests with a wide variety of column datatypes, and it would not be worthwhile to spend a lot of time fixing the cast issues related to the test builder's validation query. > Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile() > -- > > Key: DRILL-6908 > URL: https://issues.apache.org/jira/browse/DRILL-6908 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.16.0 >Reporter: Igor Guzenko >Priority: Minor > > To improve the readability of Hive tests it would be nice to rewrite some > cumbersome tests to use CSVTestBuilder for assertions. This issue was created > because it requires a lot of changes for different Hive tests and also some > additional time may be necessary to ensure that CSVTestBuilder can read all > datatypes (dates with different formats, floating point numbers, big decimals > etc.) into correct values for assertions. > Below is a list of the test methods to be rewritten: > * TestHiveStorage.readAllSupportedHiveDataTypes() > * TestHiveStorage.orderByOnHiveTable() > * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter() > > * TestHiveViewsSupport.selectStarFromView() > * TestHiveViewsSupport.useHiveAndSelectStarFromView() > * TestHiveViewsSupport.viewWithAllSupportedDataTypes() > * > TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet() > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6917) Add test for DRILL-6912
[ https://issues.apache.org/jira/browse/DRILL-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6917: Summary: Add test for DRILL-6912 (was: Add test for test case ) > Add test for DRILL-6912 > --- > > Key: DRILL-6917 > URL: https://issues.apache.org/jira/browse/DRILL-6917 > Project: Apache Drill > Issue Type: Test >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Minor > Attachments: TestTwoDrillbitsWithSamePort.java > > > Currently there is a working test in the attachment, but it needs to be > migrated into the TestGracefulShutdown class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6917) Add test for test case
Igor Guzenko created DRILL-6917: --- Summary: Add test for test case Key: DRILL-6917 URL: https://issues.apache.org/jira/browse/DRILL-6917 Project: Apache Drill Issue Type: Test Reporter: Igor Guzenko Assignee: Igor Guzenko Attachments: TestTwoDrillbitsWithSamePort.java Currently there is a working test in the attachment, but it needs to be migrated into the TestGracefulShutdown class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6917) Add test for DRILL-6912
[ https://issues.apache.org/jira/browse/DRILL-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko closed DRILL-6917. --- Resolution: Done Fix Version/s: 1.15.0 Part of commit: d4771219f6d15cf89f7aeab9357bc3b26d9bf052 > Add test for DRILL-6912 > --- > > Key: DRILL-6917 > URL: https://issues.apache.org/jira/browse/DRILL-6917 > Project: Apache Drill > Issue Type: Test >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Minor > Fix For: 1.15.0 > > Attachments: TestTwoDrillbitsWithSamePort.java > > > Currently there is a working test in the attachment, but it needs to be > migrated into the TestGracefulShutdown class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-540) Allow querying hive views in Drill
[ https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-540: --- Description: Currently Hive views cannot be queried from Drill. This Jira aims to add support for Hive views in Drill. *Implementation details:* # Drill persists its view metadata in files with the suffix .view.drill using JSON format. For example: {noformat} { "name" : "view_from_calcite_1_4", "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0", "fields" : [ { "name" : "*", "type" : "ANY", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "tmp" ] } {noformat} Later Drill parses the metadata and uses it to treat view names in SQL as a subquery. 2. In Apache Hive, metadata about views is stored in a similar way to tables. Below is an example from metastore.TBLS:
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE |VIEW_EXPANDED_TEXT |
-------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
2 |1542111078 |1 |0 |mapr |0 |2 |cview |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` |
{noformat}
3. So in the Hive metastore, views are treated as tables of a special type. The main benefit is that we also have the expanded SQL definition of views (just like in .view.drill files). Also, reading of this metadata is already implemented in Drill with the help of the Thrift Metastore API. 4. To enable querying of Hive views we'll reuse the existing code for Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* for _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which is actually the model for data persisted in .view.drill files_) and then, based on this instance, return a new _*DrillViewTable*_. Using this approach Drill will handle Hive views the same way as if they were initially defined in Drill and persisted in a .view.drill file. 5.
For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ we'll reuse existing code from _*DrillHiveTable*_, so the conversion functionality will be extracted and used for both (table and view) field type conversions. *Security implications* Consider a simple example case where we have users, {code:java} user0 user1 user2 \ / group12 {code} and a sample db where object names contain the user or group who should access them {code:java} db_all tbl_user0 vw_user0 tbl_group12 vw_group12 {code} There are two Hive authorization modes supported by Drill - SQL Standard and Storage Based authorization. For SQL Standard authorization, permissions were granted using SQL: {code:java} SET ROLE admin; GRANT SELECT ON db_all.tbl_user0 TO USER user0; GRANT SELECT ON db_all.vw_user0 TO USER user0; CREATE ROLE group12; GRANT ROLE group12 TO USER user1; GRANT ROLE group12 TO USER user2; GRANT SELECT ON db_all.tbl_group12 TO ROLE group12; GRANT SELECT ON db_all.vw_group12 TO ROLE group12; {code} And for Storage Based authorization, permissions were granted using the commands: {code:java} hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0 hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0 hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12 hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code} Then the following table shows the results of queries for both authorization models.
*SQL Standard vs Storage Based Authorization* (left three user columns: SQL Standard; right three: Storage Based)
||SQL||user0||user1||user2|| ||user0||user1||user2||
|*Queries executed using Drill:*| | | | | | | |
|SHOW TABLES IN hive.db_all;| all| all| all| |Accessible tables + all views|Accessible tables + all views|Accessible tables + all views|
|SELECT * FROM hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)| (x)| (x)|
|SELECT * FROM hive.db_all.vw_user0;| (/)| (x)| (x)| | (/)| (x)| (x)|
|SELECT * FROM hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)| (/)| (/)|
|SELECT * FROM hive.db_all.vw_group12;| (x)| (/)| (/)| | (x)| (/)| (/)|
|SELECT * FROM INFORMATION_SCHEMA.`TABLES` WHERE TABLE_SCHEMA = 'hive.db_all';| all| all| all| |Accessible tables + all views|Accessible tables + all views|Accessible tables + all views|
|DESCRIBE hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)| (x)| (x)|
|DESCRIBE hive.db_all.vw_user0;| (/)| (x)| (x)| | (/)| (/)| (/)|
|DESCRIBE hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)| (/)| (/)|
|DESCRIBE hive.db_all.vw_group12;| (x)| (/)| (/)| | (/)| (/)|
[jira] [Updated] (DRILL-6908) Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()
[ https://issues.apache.org/jira/browse/DRILL-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6908: Description: To improve the readability of Hive tests it would be nice to rewrite some cumbersome tests to use CSVTestBuilder for assertions. This issue was created because it requires a lot of changes for different Hive tests and also some additional time may be necessary to ensure that CSVTestBuilder can read all datatypes (dates with different formats, floating point numbers, big decimals etc.) into correct values for assertions. Below is a list of the test methods to be rewritten: * TestHiveStorage.readAllSupportedHiveDataTypes() * TestHiveStorage.orderByOnHiveTable() * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter() * TestHiveViewsSupport.selectStarFromView() * TestHiveViewsSupport.useHiveAndSelectStarFromView() * TestHiveViewsSupport.viewWithAllSupportedDataTypes() * TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet() was: For improving Hive tests readability it would be nice to rewrite some cumbersome tests to use CSVTestBuilder for assertions. This issue was created because it requires a lot of changes for different Hive tests and also some additional time may be necessary to ensure that CSVTestBuilder can read all datatypes (dates with different formats, floating point numbers, big decimals etc.) into correct values for assertions.
Below is list of test methods to be rewritten: * TestHiveStorage.readAllSupportedHiveDataTypes() * TestHiveStorage.orderByOnHiveTable() * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter() * TestHiveViewsSupport.selectStarFromView() * TestHiveViewsSupport.useHiveAndSelectStarFromView() * TestHiveViewsSupport.viewWithAllSupportedDataTypes() * TestInfoSchemaOnHiveStorage.showTablesFromDb() * TestInfoSchemaOnHiveStorage.showDatabases() * TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema() * TestInfoSchemaOnHiveStorage.showInfoSchema() * TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet() > Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile() > -- > > Key: DRILL-6908 > URL: https://issues.apache.org/jira/browse/DRILL-6908 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.16.0 >Reporter: Igor Guzenko >Priority: Minor > > To improve the readability of Hive tests it would be nice to rewrite some > cumbersome tests to use CSVTestBuilder for assertions. This issue was created > because it requires a lot of changes for different Hive tests and also some > additional time may be necessary to ensure that CSVTestBuilder can read all > datatypes (dates with different formats, floating point numbers, big decimals > etc.) into correct values for assertions. > Below is a list of the test methods to be rewritten: > * TestHiveStorage.readAllSupportedHiveDataTypes() > * TestHiveStorage.orderByOnHiveTable() > * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter() > > * TestHiveViewsSupport.selectStarFromView() > * TestHiveViewsSupport.useHiveAndSelectStarFromView() > * TestHiveViewsSupport.viewWithAllSupportedDataTypes() > > * > TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet() > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6908) Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()
Igor Guzenko created DRILL-6908: --- Summary: Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile() Key: DRILL-6908 URL: https://issues.apache.org/jira/browse/DRILL-6908 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.16.0 Reporter: Igor Guzenko To improve the readability of Hive tests it would be nice to rewrite some cumbersome tests to use CSVTestBuilder for assertions. This issue was created because it requires a lot of changes for different Hive tests and also some additional time may be necessary to ensure that CSVTestBuilder can read all datatypes (dates with different formats, floating point numbers, big decimals etc.) into correct values for assertions. Below is a list of the test methods to be rewritten: * TestHiveStorage.readAllSupportedHiveDataTypes() * TestHiveStorage.orderByOnHiveTable() * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter() * TestHiveViewsSupport.selectStarFromView() * TestHiveViewsSupport.useHiveAndSelectStarFromView() * TestHiveViewsSupport.viewWithAllSupportedDataTypes() * TestInfoSchemaOnHiveStorage.showTablesFromDb() * TestInfoSchemaOnHiveStorage.showDatabases() * TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema() * TestInfoSchemaOnHiveStorage.showInfoSchema() * TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
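The csvBaselineFile-style assertion discussed in DRILL-6908 compares query results against expected rows parsed from a CSV baseline file. The snippet below is only a self-contained sketch of that comparison idea in plain Java; it is not Drill's actual TestBuilder API, and the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvBaseline {
    // parse a CSV baseline (one expected row per line) into rows of cells
    static List<List<String>> parse(String csv) {
        return csv.lines()
                .map(line -> Arrays.asList(line.split(",", -1)))
                .collect(Collectors.toList());
    }

    // true when the actual result rows match the baseline exactly, in order
    static boolean matches(String baselineCsv, List<List<String>> actual) {
        return parse(baselineCsv).equals(actual);
    }

    public static void main(String[] args) {
        String baseline = "john,doe\njane,roe";
        List<List<String>> actual = List.of(
                List.of("john", "doe"),
                List.of("jane", "roe"));
        System.out.println(matches(baseline, actual)); // prints "true"
    }
}
```

The "Won't Fix" rationale above fits this sketch: everything in a CSV baseline is a string, so any non-trivial datatype (dates, decimals) forces casts in the validation path.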
[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema
[ https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6923: Description: SHOW SCHEMAS tries to find the table `information_schema`.`schemata` in the default (user-defined) schema first, and after that attempt fails it resolves the table successfully against the root schema. Please check the description below for details, explained using an example with the Hive plugin. *Abstract* When Drill is used with Hive SQL Standard authorization enabled, execution of queries like {code:sql} USE hive.db_general; SHOW SCHEMAS LIKE 'hive.%'; {code} results in the error DrillRuntimeException: Failed to use the Hive authorization components: Error getting object from metastore for Object [type=TABLE_OR_VIEW, name=db_general.information_schema] . *Details* Consider a showSchemas() test similar to the one defined in TestSqlStdBasedAuthorization: {code:java} @Test public void showSchemas() throws Exception { test("USE " + hivePluginName + "." + db_general); testBuilder() .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'") .unOrdered() .baselineColumns("SCHEMA_NAME") .baselineValues("hive.db_general") .baselineValues("hive.default") .go(); } {code} Currently execution of such a test produces the following stacktrace: {code:none} Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed to use the Hive authorization components: Error getting object from metastore for Object [type=TABLE_OR_VIEW, name=db_general.information_schema] at org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149) at org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134) at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450) at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233) at 
org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214) at org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63) at org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83) at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288) at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143) at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99) at org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203) at org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105) at org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177) at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032) at org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014) at org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284) at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) at 
org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918) at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:628) at org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:192) at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:664) at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:200) at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173) at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:155) at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:90) at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) at
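The resolution order described in DRILL-6923 can be sketched as follows: the validator first qualifies the table name against the default (USE-d) schema, and only after that probe fails does it retry against the root schema; the failing first probe is what hits the Hive authorizer. This is a hedged, self-contained simulation; the names below are illustrative, not Calcite's or Drill's real API.

```java
import java.util.List;
import java.util.Set;

public class SchemaResolution {
    // schemas that really contain the requested table (root-level here)
    static final Set<String> TABLES = Set.of("information_schema.schemata");

    // try defaultSchema.name first, then the bare name against the root schema;
    // every probe is recorded so the lookup order is visible
    static String resolve(String defaultSchema, String name, List<String> probes) {
        String inDefault = defaultSchema + "." + name;
        probes.add(inDefault);               // first probe: against the default schema
        if (TABLES.contains(inDefault)) {
            return inDefault;
        }
        probes.add(name);                    // fallback probe: against the root schema
        return TABLES.contains(name) ? name : null;
    }

    public static void main(String[] args) {
        List<String> probes = new java.util.ArrayList<>();
        String resolved = resolve("hive.db_general", "information_schema.schemata", probes);
        // the first (failing) probe is the one that triggers the authorizer error
        System.out.println(probes);
        System.out.println(resolved);
    }
}
```

In Drill's case the first probe asks the Hive plugin for db_general.information_schema, and the SQL Standard authorizer throws before the root-schema fallback can succeed.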
[jira] [Created] (DRILL-6923) Show schemas after use doesn't work with Hive authorization
Igor Guzenko created DRILL-6923: --- Summary: Show schemas after use doesn't work with Hive authorization Key: DRILL-6923 URL: https://issues.apache.org/jira/browse/DRILL-6923 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Igor Guzenko Assignee: Igor Guzenko *Abstract* When Drill is used with Hive SQL Standard authorization enabled, execution of queries like {code:sql} USE hive.db_general; SHOW SCHEMAS LIKE 'hive.%'; {code} results in the error DrillRuntimeException: Failed to use the Hive authorization components: Error getting object from metastore for Object [type=TABLE_OR_VIEW, name=db_general.information_schema] . *Details* Consider a showSchemas() test similar to the one defined in TestSqlStdBasedAuthorization: {code:java} @Test public void showSchemas() throws Exception { test("USE " + hivePluginName + "." + db_general); testBuilder() .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'") .unOrdered() .baselineColumns("SCHEMA_NAME") .baselineValues("hive.db_general") .baselineValues("hive.default") .go(); } {code} Currently execution of such a test produces the following stacktrace: {code:none} Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed to use the Hive authorization components: Error getting object from metastore for Object [type=TABLE_OR_VIEW, name=db_general.information_schema] at org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149) at org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134) at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450) at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233) at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214) at 
org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63) at org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83) at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288) at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143) at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99) at org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203) at org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105) at org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177) at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032) at org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014) at org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284) at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225) at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918) at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:628) at org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:192) at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:664) at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:200) at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173) at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:155) at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:90) at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584) at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272) at ...(:0) Caused by:
[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema
[ https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6923: Summary: Show schemas uses default(user defined) schema first for resolving table from information_schema (was: Show schemas after use doesn't work with Hive authorization) > Show schemas uses default(user defined) schema first for resolving table from > information_schema > > > Key: DRILL-6923 > URL: https://issues.apache.org/jira/browse/DRILL-6923 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > > *Abstract* > When Drill is used with Hive SQL Standard authorization enabled, execution of > queries like > {code:sql} > USE hive.db_general; > SHOW SCHEMAS LIKE 'hive.%'; {code} > results in the error DrillRuntimeException: Failed to use the Hive authorization > components: Error getting object from metastore for Object > [type=TABLE_OR_VIEW, name=db_general.information_schema] . > *Details* > Consider a showSchemas() test similar to the one defined in > TestSqlStdBasedAuthorization: > {code:java} > @Test > public void showSchemas() throws Exception { > test("USE " + hivePluginName + "." 
+ db_general); > testBuilder() > .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'") > .unOrdered() > .baselineColumns("SCHEMA_NAME") > .baselineValues("hive.db_general") > .baselineValues("hive.default") > .go(); > } > {code} > Currently execution of such test will produce following stacktrace: > {code:none} > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed > to use the Hive authorization components: Error getting object from metastore > for Object [type=TABLE_OR_VIEW, name=db_general.information_schema] > at > org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149) > at > org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134) > at > org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450) > at > org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233) > at > org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214) > at > org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63) > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83) > at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288) > at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143) > at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99) > at > org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203) > at > org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105) > at > org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032) > at > org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014) > at > org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284) > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:628) > at > org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:192) > at >
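The faulty resolution order behind the stack trace above can be reduced to a tiny sketch: the default (USE'd) schema is probed before the root schema, so `information_schema`.`schemata` is first looked up inside hive.db_general, which triggers the failing Hive authorization check. SchemaResolutionSketch and lookupOrder are hypothetical names for illustration only, not Drill code:

```java
import java.util.Arrays;
import java.util.List;

public class SchemaResolutionSketch {
    // Fully qualified candidates, in the order they would be probed:
    // first inside the default (USE'd) schema, then against the root schema.
    static List<String> lookupOrder(String defaultSchema, String table) {
        return Arrays.asList(defaultSchema + "." + table, table);
    }

    public static void main(String[] args) {
        // After USE hive.db_general, SHOW SCHEMAS first probes the Hive schema.
        System.out.println(lookupOrder("hive.db_general", "information_schema.schemata"));
        // → [hive.db_general.information_schema.schemata, information_schema.schemata]
    }
}
```

The fix described in the updated summary is essentially to make the second candidate win for information_schema tables without consulting the user-defined default schema first.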
[jira] [Updated] (DRILL-540) Allow querying hive views in Drill
[ https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-540: --- Description: Currently Hive views cannot be queried from Drill. This Jira aims to add support for Hive views in Drill. *Implementation details:* # Drill persists its view metadata in files with the suffix .view.drill, using JSON format. For example: {noformat} { "name" : "view_from_calcite_1_4", "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0", "fields" : [ { "name" : "*", "type" : "ANY", "isNullable" : true } ], "workspaceSchemaPath" : [ "dfs", "tmp" ] } {noformat} Later Drill parses the metadata and uses it to treat view names in SQL as subqueries. 2. In Apache Hive, metadata about views is stored in a similar way to tables. Below is an example from metastore.TBLS: {noformat} TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE |VIEW_EXPANDED_TEXT | ---||--|-|--|--|--|--|--|---| 2 |1542111078 |1 |0|mapr |0 |2 |cview |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` | {noformat} 3. So in the Hive metastore, views are treated as tables of a special type, and the main benefit is that we also have the expanded SQL definition of each view (just like in .view.drill files). Reading this metadata is already implemented in Drill via the Thrift Metastore API. 4. To enable querying of Hive views we'll reuse the existing code for Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* we'll convert the metadata of a _*HiveReadEntry*_ into an instance of _*View*_ (_which is actually the model for data persisted in .view.drill files_) and then, based on this instance, return a new _*DrillViewTable*_. With this approach Drill will handle a Hive view the same way as if it were initially defined in Drill and persisted in a .view.drill file. 5. 
For the conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ we'll reuse existing code from _*DrillHiveTable*_: the conversion functionality will be extracted and used for both table and view field type conversions. *Security implications* Consider a simple example where we have users {code:java} user0 user1 user2 \ / group12 {code} and a sample database where object names indicate the user or group that should access them: {code:java} db_all tbl_user0 vw_user0 tbl_group12 vw_group12 {code} There are two Hive authorization modes supported by Drill - SQL Standard and Storage Based authorization. For SQL Standard authorization, permissions were granted using SQL: {code:java} SET ROLE admin; GRANT SELECT ON db_all.tbl_user0 TO USER user0; GRANT SELECT ON db_all.vw_user0 TO USER user0; CREATE ROLE group12; GRANT ROLE group12 TO USER user1; GRANT ROLE group12 TO USER user2; GRANT SELECT ON db_all.tbl_group12 TO ROLE group12; GRANT SELECT ON db_all.vw_group12 TO ROLE group12; {code} And for Storage Based authorization, permissions were granted using commands: {code:java} hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0 hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0 hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12 hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code} The following table shows query results for both authorization models. 
*SQL Standard | Storage Based Authorization* ||SQL||user0||user1||user2|| ||user0||user1||user2|| |SHOW TABLES IN hive.db_all;| all| all| all| |Accessible tables + all views|Accessible tables + all views|Accessible tables + all views| |SELECT * FROM hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)| (x)| (x)| |SELECT * FROM hive.db_all.vw_user0;| (/)| (x)| (x)| | (/)| (x)| (x)| |SELECT * FROM hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)| (/)| (/)| |SELECT * FROM hive.db_all.vw_group12;| (x)| (/)| (/)| | (x)| (/)| (/)| |SELECT * FROM INFORMATION_SCHEMA.`TABLES` WHERE TABLE_SCHEMA = 'hive.db_all';| all| all| all| |Accessible tables + all views|Accessible tables + all views|Accessible tables + all views| |DESCRIBE hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)| (x)| (x)| |DESCRIBE hive.db_all.vw_user0;| (/)| (x) | (x)| | (/)| (/)| (/)| |DESCRIBE hive.db_all.tbl_group12;| (x)| (/)| (/)| | (x)| (/) | (/)| |DESCRIBE hive.db_all.vw_group12;| (x)| (/)| (/)| | (/)| (/)| (/)| (!)
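Point 4 of the implementation details (expanding a Hive view into a subquery via Drill's View model) can be illustrated with a deliberately simplified, string-level sketch. ViewInlineSketch and inlineView are hypothetical names, and the real expansion happens on the parsed SQL tree (the metadata is converted to Drill's View and wrapped in a DrillViewTable) rather than on strings:

```java
// Hypothetical sketch (not Drill's API): a view name is treated as a
// subquery by inlining the view's expanded SQL text (for Hive views this
// text comes from the VIEW_EXPANDED_TEXT column of metastore.TBLS).
public class ViewInlineSketch {
    static String inlineView(String query, String viewName, String expandedSql) {
        // Replace the view reference with a parenthesized, aliased subquery.
        return query.replace(viewName, "(" + expandedSql + ") " + viewName.replace('.', '_'));
    }

    public static void main(String[] args) {
        String rewritten = inlineView(
            "SELECT * FROM hive.db_all.vw_group12",
            "hive.db_all.vw_group12",
            "SELECT COUNT(*) FROM `default`.`customers`");
        System.out.println(rewritten);
        // → SELECT * FROM (SELECT COUNT(*) FROM `default`.`customers`) hive_db_all_vw_group12
    }
}
```

This is also why Storage Based Authorization only kicks in at SELECT time: permissions are checked against the tables referenced by the inlined subquery, not against the view object itself.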
[jira] [Updated] (DRILL-6862) Update Calcite to 1.18.0
[ https://issues.apache.org/jira/browse/DRILL-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6862: Summary: Update Calcite to 1.18.0 (was: Migrate to Calcite 1.18.0 ) > Update Calcite to 1.18.0 > - > > Key: DRILL-6862 > URL: https://issues.apache.org/jira/browse/DRILL-6862 > Project: Apache Drill > Issue Type: Task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > > Once the ongoing release of the new Calcite version is finished, we will update our > dependency. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6862) Migrate to Calcite 1.18.0
Igor Guzenko created DRILL-6862: --- Summary: Migrate to Calcite 1.18.0 Key: DRILL-6862 URL: https://issues.apache.org/jira/browse/DRILL-6862 Project: Apache Drill Issue Type: Task Reporter: Igor Guzenko Assignee: Igor Guzenko Once the ongoing release of the new Calcite version is finished, we will update our dependency.
[jira] [Updated] (DRILL-6944) UnsupportedOperationException thrown for view over MapR-DB binary table
[ https://issues.apache.org/jira/browse/DRILL-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6944: Description: 1. Create a MapR-DB binary table and put some data using the HBase shell: {code:none} hbase shell create '/tmp/bintable','name','address' put '/tmp/bintable','100','name:first_name','john' put '/tmp/bintable','100','name:last_name','doe' put '/tmp/bintable','100','address:city','Newark' put '/tmp/bintable','100','address:state','nj' scan '/tmp/bintable' {code} 2. Drill config: ensure that the dfs storage plugin has "connection": "maprfs:///" and contains the format: {code:java} "maprdb": { "type": "maprdb", "allTextMode": true, "enablePushdown": false, "disableCountOptimization": true } {code} 3. Check that the table can be selected from Drill: {code:java} select * from dfs.`/tmp/bintable`; {code} 4. Create a Drill view: {code:java} create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`; {code} 5. Querying the view results in an exception: {code:java} 0: jdbc:drill:> select * from dfs.tmp.`testview`; Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast expression CastExpression [input=`address`, type=minor_type: MAP mode: REQUIRED ] into string. Please, refer to logs for more information. [Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] (state=,code=0) {code} *UPDATE* This issue can also be reproduced when Avro files with map columns are queried using Drill. An appropriate test was added in the PR commit. was: 1. Create a MapR-DB binary table and put some data using the HBase shell: {code:none} hbase shell create '/tmp/bintable','name','address' put '/tmp/bintable','100','name:first_name','john' put '/tmp/bintable','100','name:last_name','doe' put '/tmp/bintable','100','address:city','Newark' put '/tmp/bintable','100','address:state','nj' scan '/tmp/bintable' {code} 2. 
Drill config: ensure that the dfs storage plugin has "connection": "maprfs:///" and contains the format: {code:java} "maprdb": { "type": "maprdb", "allTextMode": true, "enablePushdown": false, "disableCountOptimization": true } {code} 3. Check that the table can be selected from Drill: {code:java} select * from dfs.`/tmp/bintable`; {code} 4. Create a Drill view: {code:java} create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`; {code} 5. Querying the view results in an exception: {code:java} 0: jdbc:drill:> select * from dfs.tmp.`testview`; Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast expression CastExpression [input=`address`, type=minor_type: MAP mode: REQUIRED ] into string. Please, refer to logs for more information. [Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] (state=,code=0) {code} > UnsupportedOperationException thrown for view over MapR-DB binary table > --- > > Key: DRILL-6944 > URL: https://issues.apache.org/jira/browse/DRILL-6944 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types, Storage - MapRDB >Affects Versions: 1.15.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > > 1. Create a MapR-DB binary table and put some data using the HBase shell: > {code:none} > hbase shell > create '/tmp/bintable','name','address' > put '/tmp/bintable','100','name:first_name','john' > put '/tmp/bintable','100','name:last_name','doe' > put '/tmp/bintable','100','address:city','Newark' > put '/tmp/bintable','100','address:state','nj' > scan '/tmp/bintable' > {code} > 2. Drill config: ensure that the dfs storage plugin has "connection": > "maprfs:///" and contains the format: > {code:java} > "maprdb": { > "type": "maprdb", > "allTextMode": true, > "enablePushdown": false, > "disableCountOptimization": true > } > {code} > 3. Check that the table can be selected from Drill: > {code:java} > select * from dfs.`/tmp/bintable`; > {code} > 4. 
Create a Drill view: > {code:java} > create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`; > {code} > 5. Querying the view results in an exception: > {code:java} > 0: jdbc:drill:> select * from dfs.tmp.`testview`; > Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast > expression CastExpression [input=`address`, type=minor_type: MAP > mode: REQUIRED > ] into string. > Please, refer to logs for more information. > [Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] > (state=,code=0) > {code} > *UPDATE* > This issue can also be reproduced when Avro files with map columns are > queried using Drill. An appropriate test was added in the PR commit. >
[jira] [Created] (DRILL-7151) Show only accessible tables when Hive authorization enabled
Igor Guzenko created DRILL-7151: --- Summary: Show only accessible tables when Hive authorization enabled Key: DRILL-7151 URL: https://issues.apache.org/jira/browse/DRILL-7151 Project: Apache Drill Issue Type: Improvement Reporter: Igor Guzenko Assignee: Igor Guzenko SHOW TABLES for Hive has worked inconsistently for a very long time. Before the changes introduced by DRILL-7115, only accessible tables were shown when Hive Storage Based Authorization was enabled, but with SQL Standard Based Authorization all tables were shown to the user ([related discussion|https://github.com/apache/drill/pull/461#discussion_r58753354]). In the scope of DRILL-7115, the accessible-only restriction for Storage Based Authorization was weakened in order to improve query performance. There is still a need to improve the security of the Hive show tables query without violating the performance requirements. For SQL Standard Based Authorization this can be done by asking ```HiveAuthorizationHelper.authorizerV2``` for the table's 'SELECT' permission. For Storage Based Authorization no performance-acceptable approach is known yet; one idea is to try using the appropriate Hive storage based authorizer class for this purpose.
[jira] [Commented] (DRILL-540) Allow querying hive views in Drill
[ https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808496#comment-16808496 ] Igor Guzenko commented on DRILL-540: Hi [~bbevens], I think adding the warning is useful, except for the last sentence 'For current example views were defined as selection over appropriate tables'. Actually, for *Storage Based Authorization* it impacts only the show tables query. As I showed in the comparison table, all views will be returned in the result, because we don't know the permissions until the user tries to select the view; only then is the view expanded (converted to a query) and the underlying tables used in that query are validated for permissions. Thanks, Igor > Allow querying hive views in Drill > -- > > Key: DRILL-540 > URL: https://issues.apache.org/jira/browse/DRILL-540 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Hive >Reporter: Ramana Inukonda Nagaraj >Assignee: Igor Guzenko >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > Currently Hive views cannot be queried from Drill. > This Jira aims to add support for Hive views in Drill. > *Implementation details:* > # Drill persists its view metadata in files with the suffix .view.drill using > JSON format. For example: > {noformat} > { > "name" : "view_from_calcite_1_4", > "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0", > "fields" : [ { > "name" : "*", > "type" : "ANY", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "tmp" ] > } > {noformat} > Later Drill parses the metadata and uses it to treat view names in SQL as > subqueries. > 2. In Apache Hive, metadata about views is stored in a similar way to > tables. 
Below is example from metastore.TBLS : > > {noformat} > TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID > |TBL_NAME |TBL_TYPE |VIEW_EXPANDED_TEXT | > ---||--|-|--|--|--|--|--|---| > 2 |1542111078 |1 |0|mapr |0 |2 |cview >|VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` | > {noformat} > 3. So in Hive metastore views are considered as tables of special type. > And main benefit is that we also have expanded SQL definition of views (just > like in view.drill files). Also reading of the metadata is already > implemented in Drill with help of thrift Metastore API. > 4. To enable querying of Hive views we'll reuse existing code for Drill > views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for > _*HiveReadEntry*_ we'll convert the metadata to instance of _*View*_ (_which > is actually model for data persisted in .view.drill files_) and then based on > this instance return new _*DrillViewTable*_. Using this approach drill will > handle hive views the same way as if it was initially defined in Drill and > persisted in .view.drill file. > 5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ > we'll reuse existing code from _*DrillHiveTable*_, so the conversion > functionality will be extracted and used for both (table and view) fields > type conversions. > *Security implications* > Consider simple example case where we have users, > {code:java} > user0 user1 user2 >\ / > group12 > {code} > and sample db where object names contains user or group who should access > them > {code:java} > db_all > tbl_user0 > vw_user0 > tbl_group12 > vw_group12 > {code} > There are two Hive authorization modes supported by Drill - SQL Standart and > Strorage Based authorization. 
For SQL Standart authorization permissions > were granted using SQL: > {code:java} > SET ROLE admin; > GRANT SELECT ON db_all.tbl_user0 TO USER user0; > GRANT SELECT ON db_all.vw_user0 TO USER user0; > CREATE ROLE group12; > GRANT ROLE group12 TO USER user1; > GRANT ROLE group12 TO USER user2; > GRANT SELECT ON db_all.tbl_group12 TO ROLE group12; > GRANT SELECT ON db_all.vw_group12 TO ROLE group12; > {code} > And for Storage based authorization permissions were granted using commands: > {code:java} > hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12 > hadoop fs -chown user1:group12 > /user/hive/warehouse/db_all.db/tbl_group12{code} > Then the following table shows us results of queries for both authorization > models. > > *SQL Standart | Storage Based >
[jira] [Commented] (DRILL-7087) Integrate Arrow's Gandiva into Drill
[ https://issues.apache.org/jira/browse/DRILL-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789486#comment-16789486 ] Igor Guzenko commented on DRILL-7087: - Hello [~weijie]. As far as I understand, the related changes won't fundamentally change the internal structure of Drill's value vectors. It looks like you're going to pass our vectors along with computations into Apache Arrow, and Arrow will then perform the computation on the existing data inside the vectors. Correct? > Integrate Arrow's Gandiva into Drill > > > Key: DRILL-7087 > URL: https://issues.apache.org/jira/browse/DRILL-7087 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Codegen, Execution - Relational Operators >Reporter: weijie.tong >Assignee: weijie.tong >Priority: Major > > This is preparatory work for integrating Arrow into Drill by invoking Arrow's Gandiva > feature. Comparing Arrow's and Drill's in-memory column representations, the > internal null representations currently differ: Drill uses 1 byte while Arrow uses > 1 bit to indicate one null row. Also, all Arrow columns are nullable now. Apart from > those basic differences, they have the same memory representation for the > different data types. > The integration strategy is to invoke Arrow's JniWrapper native methods > directly, passing the ValueVector's memory address. > I have done an implementation in our own Drill version by integrating Gandiva > into Drill's project operator. The performance shows nearly a 1x performance > gain in expression computation. > So if there's no objection, I will submit a related PR to contribute this > feature. Also, this issue waits for Arrow's related issue [ARROW-4819].
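The null-representation difference described in the issue (Drill: one validity byte per value; Arrow: one validity bit) can be illustrated with a small self-contained sketch. ValidityBuffers is a hypothetical name used only for illustration, not Drill or Arrow code:

```java
public class ValidityBuffers {
    // Arrow-style validity: 1 bit per value, packed into bytes.
    static boolean isSetBit(byte[] buf, int index) {
        return (buf[index >> 3] & (1 << (index & 7))) != 0;
    }

    static void setBit(byte[] buf, int index) {
        buf[index >> 3] |= (1 << (index & 7));
    }

    // Drill-style validity: 1 whole byte per value.
    static boolean isSetByte(byte[] buf, int index) {
        return buf[index] == 1;
    }

    public static void main(String[] args) {
        byte[] arrowValidity = new byte[2];   // 2 bytes cover 16 values
        setBit(arrowValidity, 10);
        System.out.println(isSetBit(arrowValidity, 10)); // true
        System.out.println(isSetBit(arrowValidity, 11)); // false
        // A Drill-style buffer for the same 16 values would need 16 bytes.
    }
}
```

This 8x density difference is why passing Drill vectors into Gandiva requires translating (or reinterpreting) the validity buffer, even though the data buffers share the same layout.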
[jira] [Commented] (DRILL-7096) Develop vector for canonical Map
[ https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791928#comment-16791928 ] Igor Guzenko commented on DRILL-7096: - Hello [~Paul.Rogers], I have a few thoughts about the related concerns: 1) About the problem described in the Jira (efficient get-by-key): I think most probably we will go with sorting the keys before writing maps into the new vector. Since the Map vector will be used for Hive maps, and MapObjectInspector will give us a Map for each row in a column, it won't be an issue to sort keys before writing. 2) The unnest functionality you mentioned may be implemented as a conversion from the new MapVector to the current Map(*Struct*)Vector. All keys across all rows will be converted to strings, and on meeting a new key a new vector will be created to hold the values assigned to that key. Of course users should be aware that their rows must have a limited set of shared keys; otherwise, when all keys in all rows are unique, we will get an OOM error very quickly. I guess we can calculate the rate of new unique key additions while converting each row and so detect the key-uniqueness problem very quickly. 3) As for use cases, the first place where the new vector will be used is reading map columns from Hive, and it looks reasonable to follow Hive's restriction on keys (use only primitives). At a minimum we need to support all existing functionality related to the Map datatype. I started listing use cases in the [Hive Complex Types design document|https://docs.google.com/document/d/1yEcaJi9dyksfMs4w5_GsZCQH_Pffe-HLeLVNNKsV7CA/edit?usp=sharing], which is in progress now and will later be attached to DRILL-3290. Please feel free to add comments in the design doc; everything will be useful for me because I'm writing such a document for the first time. 4) About using unions for values: I guess you're thinking in terms of supporting the flexibility of JSON maps. In that case I'd rather go with all-text mode for map values than pollute memory and code with unions. 
For the case when the type of map values is clearly determined (like in Hive) we have a rich set of datatype-specific vectors; and though Hive unions may also be used as map values, at least we will then know the exact set of necessary types. 5) [~KazydubB] is now working on the new vector design and he'll contribute his results to the design document mentioned previously. Thanks, Igor Guzenko > Develop vector for canonical Map > - > > Key: DRILL-7096 > URL: https://issues.apache.org/jira/browse/DRILL-7096 > Project: Apache Drill > Issue Type: Improvement >Reporter: Igor Guzenko >Assignee: Bohdan Kazydub >Priority: Major > > The canonical Map datatype can be represented using a combination of three > value vectors: > keysVector - vector storing the keys of each map > valuesVector - vector storing the values of each map > offsetsVector - vector storing the start index of each map > So it's not very hard to create such a Map vector, but there is a major issue > with this representation: it's hard to search map values by key in such a > vector. We need to investigate some advanced techniques to make the search > efficient, or find other more suitable options to represent the map datatype in > the world of vectors. > When asked about maps, Apache Arrow developers responded that for Java > they don't have a real Map vector; for now they just have a logical Map type > definition where they define Map like: List<Struct<key: key_type, value: value_type>>. > So an implementation of the value vector would be useful for > Arrow too.
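The three-vector layout from the quoted description, combined with the key-sorting idea from point 1 above, can be sketched in plain Java. MapVectorSketch is a hypothetical illustration of offsets-based storage with per-row binary search, not a proposed Drill class:

```java
import java.util.Arrays;

// Hypothetical sketch of the canonical Map layout described above:
// keys/values hold the entries of all rows back to back, and
// offsets[row]..offsets[row + 1] delimits the entries of one row.
public class MapVectorSketch {
    // Assumes keys are sorted within each row at write time,
    // so get-by-key can binary-search the row's slice.
    static Integer getByKey(String[] keys, int[] values, int[] offsets, int row, String key) {
        int idx = Arrays.binarySearch(keys, offsets[row], offsets[row + 1], key);
        return idx >= 0 ? values[idx] : null;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "c"};
        int[] values = {1, 2, 3, 4};
        int[] offsets = {0, 2, 4};            // row 0: {a:1, b:2}, row 1: {a:3, c:4}
        System.out.println(getByKey(keys, values, offsets, 1, "c")); // 4
        System.out.println(getByKey(keys, values, offsets, 0, "c")); // null
    }
}
```

Sorting each row's keys once at write time makes get-by-key O(log k) per row instead of a linear scan, which is the trade-off discussed in point 1.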
[jira] [Created] (DRILL-7115) Improve Hive schema show tables performance
Igor Guzenko created DRILL-7115: --- Summary: Improve Hive schema show tables performance Key: DRILL-7115 URL: https://issues.apache.org/jira/browse/DRILL-7115 Project: Apache Drill Issue Type: Improvement Components: Storage - Hive, Storage - Information Schema Reporter: Igor Guzenko Assignee: Igor Guzenko In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 minutes. The schema has nearly ~8000 tables, whereas the same command in Beeline (Hive) returns the result in a split second (~0.2 secs). I tested the same in my test cluster by creating 6000 empty tables in Hive and then doing "show tables" in Drill. It took more than 2 mins (~140 secs). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
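A back-of-envelope model suggests why such listings degrade with table count (the latency and per-row numbers below are assumptions for illustration, not measurements of Drill or Hive): fetching N tables with one metastore round trip per table costs N round trips, while a single bulk listing pays the round trip once.

```java
// Illustrative cost model only — not Drill's actual implementation.
public class ShowTablesCost {
    // One metastore round trip per table: N * RTT.
    static double perTableSeconds(int tables, double rttSeconds) {
        return tables * rttSeconds;
    }
    // One bulk call: RTT once, plus a small per-row transfer cost.
    static double bulkSeconds(int tables, double rttSeconds, double perRowSeconds) {
        return rttSeconds + tables * perRowSeconds;
    }
}
```

With an assumed 15 ms round trip, 8000 per-table calls land in the minutes range, while a bulk listing stays sub-second — the same order-of-magnitude gap reported between Drill and Beeline above.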
[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema
[ https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6923: Fix Version/s: (was: 1.16.0) 1.17.0 > Show schemas uses default(user defined) schema first for resolving table from > information_schema > > > Key: DRILL-6923 > URL: https://issues.apache.org/jira/browse/DRILL-6923 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.14.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Minor > Fix For: 1.17.0 > > > Show schemas tries to find the table `information_schema`.`schemata` in the default > (user-defined) schema, and after a failed attempt it resolves the table > successfully against the root schema. Please check the description below for details, > explained using an example with the Hive plugin. > *Abstract* > When Drill is used with Hive SQL Standard authorization enabled, execution of > queries like > {code:sql} > USE hive.db_general; > SHOW SCHEMAS LIKE 'hive.%'; {code} > results in the error DrillRuntimeException: Failed to use the Hive authorization > components: Error getting object from metastore for Object > [type=TABLE_OR_VIEW, name=db_general.information_schema]. > *Details* > Consider a showSchemas() test similar to the one defined in > TestSqlStdBasedAuthorization: > {code:java} > @Test > public void showSchemas() throws Exception { > test("USE " + hivePluginName + "." 
+ db_general); > testBuilder() > .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'") > .unOrdered() > .baselineColumns("SCHEMA_NAME") > .baselineValues("hive.db_general") > .baselineValues("hive.default") > .go(); > } > {code} > Currently execution of such test will produce following stacktrace: > {code:none} > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed > to use the Hive authorization components: Error getting object from metastore > for Object [type=TABLE_OR_VIEW, name=db_general.information_schema] > at > org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149) > at > org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134) > at > org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450) > at > org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233) > at > org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214) > at > org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63) > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83) > at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288) > at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143) > at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99) > at > org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203) > at > org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105) > at > org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032) > at > org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014) > at > org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284) > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918) >
[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema
[ https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-6923: Priority: Minor (was: Major) > Show schemas uses default(user defined) schema first for resolving table from > information_schema > > > Key: DRILL-6923 > URL: https://issues.apache.org/jira/browse/DRILL-6923 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.14.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Minor > Fix For: 1.16.0 > > > Show schemas tries to find the table `information_schema`.`schemata` in the default > (user-defined) schema, and after a failed attempt it resolves the table > successfully against the root schema. Please check the description below for details, > explained using an example with the Hive plugin. > *Abstract* > When Drill is used with Hive SQL Standard authorization enabled, execution of > queries like > {code:sql} > USE hive.db_general; > SHOW SCHEMAS LIKE 'hive.%'; {code} > results in the error DrillRuntimeException: Failed to use the Hive authorization > components: Error getting object from metastore for Object > [type=TABLE_OR_VIEW, name=db_general.information_schema]. > *Details* > Consider a showSchemas() test similar to the one defined in > TestSqlStdBasedAuthorization: > {code:java} > @Test > public void showSchemas() throws Exception { > test("USE " + hivePluginName + "." 
+ db_general); > testBuilder() > .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'") > .unOrdered() > .baselineColumns("SCHEMA_NAME") > .baselineValues("hive.db_general") > .baselineValues("hive.default") > .go(); > } > {code} > Currently execution of such test will produce following stacktrace: > {code:none} > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed > to use the Hive authorization components: Error getting object from metastore > for Object [type=TABLE_OR_VIEW, name=db_general.information_schema] > at > org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149) > at > org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134) > at > org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450) > at > org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233) > at > org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214) > at > org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63) > at > org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83) > at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288) > at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143) > at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99) > at > org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203) > at > org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105) > at > org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032) > at > org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014) > at > org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284) > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943) > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918) > at >
[jira] [Assigned] (DRILL-7096) Develop vector for canonical Map
[ https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7096: --- Assignee: Bohdan Kazydub (was: Igor Guzenko) > Develop vector for canonical Map > - > > Key: DRILL-7096 > URL: https://issues.apache.org/jira/browse/DRILL-7096 > Project: Apache Drill > Issue Type: Improvement >Reporter: Igor Guzenko >Assignee: Bohdan Kazydub >Priority: Major > > The canonical Map datatype can be represented using a combination of three > value vectors: > keysVector - vector for storing the keys of each map > valuesVector - vector for storing the values of each map > offsetsVector - vector for storing the start index of each next map > So it's not very hard to create such a Map vector, but there is a major issue > with this map representation: it's hard to search map values by key in such a > vector. We need to investigate some advanced techniques to make such search > efficient, or find other more suitable options to represent the map datatype in > the world of vectors. > When asked about maps, Apache Arrow developers responded that for Java > they don't have a real Map vector; for now they just have a logical Map type > definition where they define Map like: List<Struct<key: key_type, > value: value_type>>. So an implementation of this value vector would be useful for > Arrow too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-3587) Select hive's struct data gives IndexOutOfBoundsException instead of unsupported error
[ https://issues.apache.org/jira/browse/DRILL-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-3587: --- Assignee: Igor Guzenko > Select hive's struct data gives IndexOutOfBoundsException instead of > unsupported error > -- > > Key: DRILL-3587 > URL: https://issues.apache.org/jira/browse/DRILL-3587 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.2.0 >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: Future > > > I have a hive table that has a STRUCT data column. > hive> select c15 from alltypes; > OK > NULL > {"r":null,"s":null} > {"r":1,"s":{"a":2,"b":"x"}} > From drill: > select c15 from alltypes; > Error: SYSTEM ERROR: IndexOutOfBoundsException: index (1) must be less than > size (1) > Since Drill currently does not support the Hive struct data type, Drill should > display a user-friendly error stating that the Hive struct data type is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7097) Rename MapVector to StructVector
Igor Guzenko created DRILL-7097: --- Summary: Rename MapVector to StructVector Key: DRILL-7097 URL: https://issues.apache.org/jira/browse/DRILL-7097 Project: Apache Drill Issue Type: Improvement Reporter: Igor Guzenko Assignee: Igor Guzenko For a long time Drill's MapVector has actually been more suitable for representing Struct data, and in Apache Arrow it was indeed renamed to StructVector. To align our code with Arrow and make space for the planned implementation of a canonical Map (DRILL-7096), we need to rename the existing MapVector and all related classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7096) Develop vector for canonical Map
Igor Guzenko created DRILL-7096: --- Summary: Develop vector for canonical Map Key: DRILL-7096 URL: https://issues.apache.org/jira/browse/DRILL-7096 Project: Apache Drill Issue Type: Improvement Reporter: Igor Guzenko Assignee: Igor Guzenko The canonical Map datatype can be represented using a combination of three value vectors: keysVector - vector for storing the keys of each map valuesVector - vector for storing the values of each map offsetsVector - vector for storing the start index of each next map So it's not very hard to create such a Map vector, but there is a major issue with this map representation: it's hard to search map values by key in such a vector. We need to investigate some advanced techniques to make such search efficient, or find other more suitable options to represent the map datatype in the world of vectors. When asked about maps, Apache Arrow developers responded that for Java they don't have a real Map vector; for now they just have a logical Map type definition where they define Map like: List<Struct<key, value>>. So an implementation of this value vector would be useful for Arrow too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
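The three-vector layout described above can be sketched in plain Java (lists stand in for Drill value vectors; the names keysVector/valuesVector/offsetsVector follow the description, everything else is illustrative, not Drill code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the proposed Map vector: row i's entries live in
// keysVector/valuesVector at positions [offsetsVector[i], offsetsVector[i+1]).
public class MapVectorSketch {
    final List<String>  keysVector    = new ArrayList<>();
    final List<Integer> valuesVector  = new ArrayList<>();
    final List<Integer> offsetsVector = new ArrayList<>(List.of(0)); // row 0 starts at 0

    // Appends one row's map and records where the next row will start.
    void writeRow(Map<String, Integer> row) {
        row.forEach((k, v) -> { keysVector.add(k); valuesVector.add(v); });
        offsetsVector.add(keysVector.size());
    }

    // Number of entries in row i, derived purely from the offsets.
    int entryCount(int row) {
        return offsetsVector.get(row + 1) - offsetsVector.get(row);
    }
}
```

Appending rows is cheap with this layout; the open design question in the Jira is the lookup side, since finding a key within a row's slice has no index structure by default.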
[jira] [Closed] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions
[ https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko closed DRILL-6856. --- Resolution: Fixed Reviewer: Volodymyr Vysotskyi > Wrong result returned if the query filters a boolean column with both "is > true" and "is null" conditions > > > Key: DRILL-6856 > URL: https://issues.apache.org/jira/browse/DRILL-6856 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > Attachments: 0_0_0.parquet > > > *Data:* > A parquet file with a boolean column that contains null values. > An example is attached. > *Query:* > {code:sql} > select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col > is null > {code} > *Result:* > {noformat} > null > null > {noformat} > *Plan:* > {noformat} > 00-00Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, > cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 > memory}, id = 1980 > 00-01 Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): > rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 > network, 0.0 memory}, id = 1979 > 00-02SelectionVectorRemover : rowType = RecordType(ANY bool_col): > rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 > network, 0.0 memory}, id = 1978 > 00-03 Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY > bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, > 0.0 network, 0.0 memory}, id = 1977 > 00-04Scan(table=[[dfs, tmp, Test_data]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, > numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : > rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 > rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976 > {noformat} > *Notes:* > - 
"true" values were not included in the result though they should have been. > - The result is correct if using "bool_col = true" instead of "is true" > - In the plan you can see that the "is true" condition is absent in the Filter > operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
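The notes above can be made concrete with a small three-valued-logic sketch in plain Java (no Drill APIs; `correct` and `broken` are illustrative names): reducing "IS TRUE OR IS NULL" to just "IS NULL", as the plan's Filter did, drops exactly the TRUE rows.

```java
// Sketch: Boolean models a nullable SQL BOOLEAN column value.
public class BooleanFilterBug {
    // Correct predicate: bool_col IS TRUE OR bool_col IS NULL.
    static boolean correct(Boolean boolCol) {
        return boolCol == null || boolCol;
    }
    // What the broken plan evaluated: bool_col IS NULL only.
    static boolean broken(Boolean boolCol) {
        return boolCol == null;
    }
}
```

Running both predicates over {true, false, null} shows the correct filter keeps true and null rows, while the broken one keeps only null rows — matching the all-null result reported above.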
[jira] [Commented] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions
[ https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758951#comment-16758951 ] Igor Guzenko commented on DRILL-6856: - Fixed by Calcite update in [pull request|https://github.com/apache/drill/pull/1631] (added test for the case). > Wrong result returned if the query filters a boolean column with both "is > true" and "is null" conditions > > > Key: DRILL-6856 > URL: https://issues.apache.org/jira/browse/DRILL-6856 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > Attachments: 0_0_0.parquet > > > *Data:* > A parquet file with a boolean column that contains null values. > An example is attached. > *Query:* > {code:sql} > select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col > is null > {code} > *Result:* > {noformat} > null > null > {noformat} > *Plan:* > {noformat} > 00-00Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, > cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 > memory}, id = 1980 > 00-01 Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): > rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 > network, 0.0 memory}, id = 1979 > 00-02SelectionVectorRemover : rowType = RecordType(ANY bool_col): > rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 > network, 0.0 memory}, id = 1978 > 00-03 Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY > bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, > 0.0 network, 0.0 memory}, id = 1977 > 00-04Scan(table=[[dfs, tmp, Test_data]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, > numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : > rowType = RecordType(ANY 
bool_col): rowcount = 15.0, cumulative cost = {15.0 > rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976 > {noformat} > *Notes:* > - "true" values were not included in the result though they should have been. > - The result is correct if using "bool_col = true" instead of "is true" > - In the plan you can see that the "is true" condition is absent in the Filter > operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions
[ https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758951#comment-16758951 ] Igor Guzenko edited comment on DRILL-6856 at 2/2/19 10:44 AM: -- Fixed by Calcite update in [pull request|https://github.com/apache/drill/pull/1631]. was (Author: ihorhuzenko): Fixed by Calcite update in [pull request|[https://github.com/apache/drill/pull/1631]|https://github.com/apache/drill/pull/1631].] (added test for the case). > Wrong result returned if the query filters a boolean column with both "is > true" and "is null" conditions > > > Key: DRILL-6856 > URL: https://issues.apache.org/jira/browse/DRILL-6856 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.16.0 > > Attachments: 0_0_0.parquet > > > *Data:* > A parquet file with a boolean column that contains null values. > An example is attached. > *Query:* > {code:sql} > select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col > is null > {code} > *Result:* > {noformat} > null > null > {noformat} > *Plan:* > {noformat} > 00-00Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, > cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 > memory}, id = 1980 > 00-01 Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): > rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 > network, 0.0 memory}, id = 1979 > 00-02SelectionVectorRemover : rowType = RecordType(ANY bool_col): > rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 > network, 0.0 memory}, id = 1978 > 00-03 Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY > bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, > 0.0 network, 0.0 memory}, id = 1977 > 00-04Scan(table=[[dfs, tmp, Test_data]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///tmp/Test_data]], 
selectionRoot=maprfs:/tmp/Test_data, > numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : > rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 > rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976 > {noformat} > *Notes:* > - "true" values were not included in the result though they should have been. > - The result is correct if using "bool_col = true" instead of "is true" > - In the plan you can see that the "is true" condition is absent in the Filter > operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-540) Allow querying hive views in Drill
[ https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813007#comment-16813007 ] Igor Guzenko commented on DRILL-540: Hi [~bbevens], Sounds very good, thank you. > Allow querying hive views in Drill > -- > > Key: DRILL-540 > URL: https://issues.apache.org/jira/browse/DRILL-540 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Hive >Reporter: Ramana Inukonda Nagaraj >Assignee: Igor Guzenko >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.16.0 > > > Currently Hive views cannot be queried from Drill. > This Jira aims to add support for Hive views in Drill. > *Implementation details:* > # Drill persists its view metadata in a file with the suffix .view.drill using > JSON format. For example: > {noformat} > { > "name" : "view_from_calcite_1_4", > "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0", > "fields" : [ { > "name" : "*", > "type" : "ANY", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "dfs", "tmp" ] > } > {noformat} > Later Drill parses the metadata and uses it to treat view names in SQL as a > subquery. > 2. In Apache Hive, metadata about views is stored in a similar way to > tables. Below is an example from metastore.TBLS : > > {noformat} > TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID > |TBL_NAME |TBL_TYPE |VIEW_EXPANDED_TEXT | > ---||--|-|--|--|--|--|--|---| > 2 |1542111078 |1 |0|mapr |0 |2 |cview >|VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` | > {noformat} > 3. So in the Hive metastore, views are considered tables of a special type. > And the main benefit is that we also have the expanded SQL definition of views (just > like in view.drill files). Also, reading of this metadata is already > implemented in Drill with the help of the Thrift Metastore API. > 4. To enable querying of Hive views, we'll reuse existing code for Drill > views as much as possible. 
First, in *_HiveSchemaFactory.getDrillTable_* for > _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which > is actually the model for data persisted in .view.drill files_) and then, based on > this instance, return a new _*DrillViewTable*_. Using this approach, Drill will > handle Hive views the same way as if they were initially defined in Drill and > persisted in a .view.drill file. > 5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ > we'll reuse existing code from _*DrillHiveTable*_, so the conversion > functionality will be extracted and used for both table and view field > type conversions. > *Security implications* > Consider a simple example case where we have users, > {code:java} > user0 user1 user2 >\ / > group12 > {code} > and a sample db where object names contain the user or group who should access > them > {code:java} > db_all > tbl_user0 > vw_user0 > tbl_group12 > vw_group12 > {code} > There are two Hive authorization modes supported by Drill - SQL Standard and > Storage Based authorization. For SQL Standard authorization, permissions > were granted using SQL: > {code:java} > SET ROLE admin; > GRANT SELECT ON db_all.tbl_user0 TO USER user0; > GRANT SELECT ON db_all.vw_user0 TO USER user0; > CREATE ROLE group12; > GRANT ROLE group12 TO USER user1; > GRANT ROLE group12 TO USER user2; > GRANT SELECT ON db_all.tbl_group12 TO ROLE group12; > GRANT SELECT ON db_all.vw_group12 TO ROLE group12; > {code} > And for Storage Based authorization, permissions were granted using commands: > {code:java} > hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0 > hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12 > hadoop fs -chown user1:group12 > /user/hive/warehouse/db_all.db/tbl_group12{code} > Then the following table shows the results of queries for both authorization > models. 
> > *SQL Standard | Storage Based > Authorization* > ||SQL||user0||user1||user2|| ||user0||user1||user2|| > |*Queries executed using Drill :*| | | | | | | | > |SHOW TABLES IN hive.db_all;| all| all| all| |Accessible tables + all > views|Accessible tables + all views|Accessible tables + all views| > |SELECT * FROM hive.db_all.tbl_user0;| (/)| (x)| (x)| | (/)| > (x)| (x)| > |SELECT * FROM hive.db_all.vw_user0;| (/)| (x)| (x)| | (/)| > (x)| (x)|
[jira] [Closed] (DRILL-7097) Rename MapVector to StructVector
[ https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko closed DRILL-7097. --- Resolution: Abandoned Abandoned according to discussion [https://lists.apache.org/thread.html/5773447b82c9d6e508a62f66354613b812493cbb8c0c1cc463ccdd9f@%3Cdev.drill.apache.org%3E] . > Rename MapVector to StructVector > > > Key: DRILL-7097 > URL: https://issues.apache.org/jira/browse/DRILL-7097 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > > For a long time Drill's MapVector was actually more suitable for representing > Struct data. And in Apache Arrow it was actually renamed to StructVector. To > align our code with Arrow and give space for planned implementation of > canonical Map (DRILL-7096) we need to rename existing MapVector and all > related classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7280) Support Hive UDFs for arrays
[ https://issues.apache.org/jira/browse/DRILL-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848718#comment-16848718 ] Igor Guzenko commented on DRILL-7280: - Sample tests to add into TestInbuiltHiveUDFs: {code:java} @Test public void arraySize() throws Exception { testBuilder() .sqlQuery("SELECT size(arr_n_0) from hive.int_array order by rid") .ordered() .baselineColumns("EXPR$0") .baselineValuesForSingleColumn(3) .baselineValuesForSingleColumn(0) .baselineValuesForSingleColumn(1) .go(); } @Test public void arrayContains() throws Exception { testBuilder() .sqlQuery("SELECT array_contains(arr_n_0, 0) FROM hive.int_array order by rid") .ordered() .baselineColumns("EXPR$0") .baselineValuesForSingleColumn(true) .baselineValuesForSingleColumn(false) .baselineValuesForSingleColumn(false) .go(); } @Test public void sortArray() throws Exception { testBuilder() .sqlQuery("SELECT sort_array(arr_n_0) FROM hive.int_array order by rid") .ordered() .baselineColumns("EXPR$0") .baselineValues(asList(-1,0,1)) .baselineValues(asList()) .baselineValues(asList(100500)) .go(); } @Test public void concatWs() throws Exception { testBuilder() .sqlQuery("SELECT concat_ws(',',arr_n_0) FROM hive.string_array order by rid") .ordered() .baselineColumns("EXPR$0") .baselineValues("First Value Of Array,komlnp,The Last Value") .baselineValues("") .baselineValues("ABCaBcA-1-2-3") .go(); } {code} > Support Hive UDFs for arrays > > > Key: DRILL-7280 > URL: https://issues.apache.org/jira/browse/DRILL-7280 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Minor > > Add support for Hive UDFs accepting or returning arrays: > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] . 
Some > examples of such UDFs are: > > ||Hive UDF||Drill alternative|| > |size(array)|repeated_count(array)| > |array_contains(array, value)|repeated_contains(array, value)| > |sort_array(arr_n_0)|NA| > |concat_ws(string SEP, array)|NA| > etc. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7280) Support Hive UDFs for arrays
Igor Guzenko created DRILL-7280: --- Summary: Support Hive UDFs for arrays Key: DRILL-7280 URL: https://issues.apache.org/jira/browse/DRILL-7280 Project: Apache Drill Issue Type: Sub-task Reporter: Igor Guzenko Assignee: Igor Guzenko Add support for Hive UDFs accepting or returning arrays: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] . Some examples of such UDFs are: ||Hive UDF||Drill alternative|| |size(array)|repeated_count(array)| |array_contains(array, value)|repeated_contains(array, value)| |sort_array(arr_n_0)|NA| |concat_ws(string SEP, array)|NA| etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7251) Read Hive array w/o nulls
Igor Guzenko created DRILL-7251: --- Summary: Read Hive array w/o nulls Key: DRILL-7251 URL: https://issues.apache.org/jira/browse/DRILL-7251 Project: Apache Drill Issue Type: Sub-task Components: Storage - Hive Reporter: Igor Guzenko -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7254) Read Hive union w/o nulls
Igor Guzenko created DRILL-7254: --- Summary: Read Hive union w/o nulls Key: DRILL-7254 URL: https://issues.apache.org/jira/browse/DRILL-7254 Project: Apache Drill Issue Type: Sub-task Reporter: Igor Guzenko -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7252) Read Hive map using canonical Map vector
Igor Guzenko created DRILL-7252: --- Summary: Read Hive map using canonical Map vector Key: DRILL-7252 URL: https://issues.apache.org/jira/browse/DRILL-7252 Project: Apache Drill Issue Type: Sub-task Reporter: Igor Guzenko -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7253) Read Hive struct w/o nulls
Igor Guzenko created DRILL-7253: --- Summary: Read Hive struct w/o nulls Key: DRILL-7253 URL: https://issues.apache.org/jira/browse/DRILL-7253 Project: Apache Drill Issue Type: Sub-task Reporter: Igor Guzenko -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7252) Read Hive map using canonical Map vector
[ https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7252: --- Assignee: Igor Guzenko > Read Hive map using canonical Map vector > - > > Key: DRILL-7252 > URL: https://issues.apache.org/jira/browse/DRILL-7252 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7253) Read Hive struct w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7253: --- Assignee: Igor Guzenko > Read Hive struct w/o nulls > -- > > Key: DRILL-7253 > URL: https://issues.apache.org/jira/browse/DRILL-7253 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7254) Read Hive union w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7254: --- Assignee: Igor Guzenko > Read Hive union w/o nulls > - > > Key: DRILL-7254 > URL: https://issues.apache.org/jira/browse/DRILL-7254 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7097) Rename MapVector to StructVector
[ https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7097: Issue Type: Sub-task (was: Improvement) Parent: DRILL-3290 > Rename MapVector to StructVector > > > Key: DRILL-7097 > URL: https://issues.apache.org/jira/browse/DRILL-7097 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > > For a long time Drill's MapVector was actually more suitable for representing > Struct data. And in Apache Arrow it was actually renamed to StructVector. To > align our code with Arrow and give space for planned implementation of > canonical Map (DRILL-7096) we need to rename existing MapVector and all > related classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-4782) TO_TIME function cannot separate time from date time string
[ https://issues.apache.org/jira/browse/DRILL-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-4782: Labels: ready-to-commit (was: ) > TO_TIME function cannot separate time from date time string > --- > > Key: DRILL-4782 > URL: https://issues.apache.org/jira/browse/DRILL-4782 > Project: Apache Drill > Issue Type: Improvement > Components: Server >Affects Versions: 1.6.0, 1.7.0 > Environment: CentOS 7 >Reporter: Matt Keranen >Assignee: Dmytriy Grinchenko >Priority: Minor > Labels: ready-to-commit > Fix For: 1.17.0 > > > TO_TIME('2016-03-03 00:00', 'yyyy-MM-dd HH:mm') returns "05:14:46.656" > instead of the expected "00:00:00" > Adding an additional split does work as expected: TO_TIME(SPLIT('2016-03-03 > 00:00', ' ')[1], 'HH:mm') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7096) Develop vector for canonical Map
[ https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7096: Issue Type: Sub-task (was: Improvement) Parent: DRILL-3290 > Develop vector for canonical Map > - > > Key: DRILL-7096 > URL: https://issues.apache.org/jira/browse/DRILL-7096 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Bohdan Kazydub >Priority: Major > > A canonical Map datatype can be represented using a combination of three > value vectors: > keysVector - vector storing the keys of each map > valuesVector - vector storing the values of each map > offsetsVector - vector storing the start index of each map > Creating such a Map vector is not very hard, but this representation has a major > issue: it is hard to search map values by key in such a vector. We need to > investigate advanced techniques to make such searches efficient, or find other, > more suitable options for representing the map datatype in the world of vectors. > When asked about maps, Apache Arrow developers responded that for Java they > don't have a real Map vector; for now they only have a logical Map type > definition where Map is defined as List<Struct<key: key_type, value: value_type>>. > So an implementation of the value vector would be useful for Arrow too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
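The three-vector layout described in the issue can be sketched as follows. This is an illustrative Python model, not Drill code (Drill's real vectors are Java off-heap buffers); it also shows why key lookup is inefficient in this layout:

```python
# Model of a canonical Map vector built from three flat vectors:
# keys, values, and per-map start offsets. The entries of map i occupy
# positions offsets[i] .. offsets[i+1] of the keys/values vectors.
maps = [{"a": 1, "b": 2}, {}, {"c": 3}]

keys, values, offsets = [], [], [0]
for m in maps:
    for k, v in m.items():
        keys.append(k)
        values.append(v)
    offsets.append(len(keys))  # end offset of this map = start of the next

def get(map_index, key):
    # Lookup by key requires a linear scan of the map's slice --
    # the inefficiency the issue text calls out.
    start, end = offsets[map_index], offsets[map_index + 1]
    for i in range(start, end):
        if keys[i] == key:
            return values[i]
    return None

print(offsets)       # [0, 2, 2, 3]
print(get(0, "b"))   # 2
print(get(2, "c"))   # 3
```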
[jira] [Assigned] (DRILL-7251) Read Hive array w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7251: --- Assignee: Igor Guzenko > Read Hive array w/o nulls > - > > Key: DRILL-7251 > URL: https://issues.apache.org/jira/browse/DRILL-7251 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Hive >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7255) Support nulls for all levels of nesting
Igor Guzenko created DRILL-7255: --- Summary: Support nulls for all levels of nesting Key: DRILL-7255 URL: https://issues.apache.org/jira/browse/DRILL-7255 Project: Apache Drill Issue Type: Sub-task Reporter: Igor Guzenko -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7255) Support nulls for all levels of nesting
[ https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7255: --- Assignee: Igor Guzenko > Support nulls for all levels of nesting > --- > > Key: DRILL-7255 > URL: https://issues.apache.org/jira/browse/DRILL-7255 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-2000) Hive generated parquet files with maps show up in drill as map(key value)
[ https://issues.apache.org/jira/browse/DRILL-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-2000: --- Assignee: Bohdan Kazydub > Hive generated parquet files with maps show up in drill as map(key value) > - > > Key: DRILL-2000 > URL: https://issues.apache.org/jira/browse/DRILL-2000 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 0.7.0 >Reporter: Ramana Inukonda Nagaraj >Assignee: Bohdan Kazydub >Priority: Major > Fix For: Future > > > Created a parquet file in hive having the following DDL > hive> desc alltypesparquet; > OK > c1 int > c2 boolean > c3 double > c4 string > c5 array > c6 map > c7 map > c8 struct > c9 tinyint > c10 smallint > c11 float > c12 bigint > c13 array> > c15 struct> > c16 array,n:int>> > Time taken: 0.076 seconds, Fetched: 15 row(s) > Columns which are maps such as c6 map > show up as > 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`; > ++ > | c6 | > ++ > | {"map":[]} | > | {"map":[]} | > | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} | > ++ > 3 rows selected (0.078 seconds) > hive> select c6 from alltypesparquet; > NULL > NULL > {1:"x",2:"y"} > Ignore the wrong values, I have raised DRILL-1997 for the same. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
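The mismatch shown above (Drill rendering a Hive map column as repeated key/value records, with binary values base64-encoded, versus Hive's map literal) can be illustrated with a small Python transform. This is a sketch of the two presentations, not Drill code:

```python
import base64

# Drill renders the Hive map column c6 as a repeated list of {"key", "value"}
# pairs, with the binary payloads base64-encoded ("eA==" is "x", "eQ==" is "y").
drill_row = {"map": [{"key": 1, "value": "eA=="}, {"key": 2, "value": "eQ=="}]}

def to_hive_map(drill_value):
    # Collapse the repeated key/value entries into a plain dict and decode
    # the base64 payloads, matching Hive's {1:"x", 2:"y"} presentation.
    return {e["key"]: base64.b64decode(e["value"]).decode()
            for e in drill_value["map"]}

print(to_hive_map(drill_row))  # {1: 'x', 2: 'y'}
```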
[jira] [Created] (DRILL-7268) Read Hive array with parquet native reader
Igor Guzenko created DRILL-7268: --- Summary: Read Hive array with parquet native reader Key: DRILL-7268 URL: https://issues.apache.org/jira/browse/DRILL-7268 Project: Apache Drill Issue Type: Sub-task Reporter: Igor Guzenko -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-7268) Read Hive array with parquet native reader
[ https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7268: --- Assignee: Igor Guzenko > Read Hive array with parquet native reader > -- > > Key: DRILL-7268 > URL: https://issues.apache.org/jira/browse/DRILL-7268 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7315) Revise precision and scale order in the method arguments
[ https://issues.apache.org/jira/browse/DRILL-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7315: Labels: ready-to-commit (was: ) > Revise precision and scale order in the method arguments > > > Key: DRILL-7315 > URL: https://issues.apache.org/jira/browse/DRILL-7315 > Project: Apache Drill > Issue Type: Task >Affects Versions: 1.16.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > The current code has different variations of scale and precision orderings in > the method arguments. The goal for this Jira is to make it more consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7268) Read Hive array with parquet native reader
[ https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7268: Description: When Hive stores array data in parquet format, it creates schema for such columns, like: arr_n_0 ARRAY {code:java} optional group arr_n_0 (LIST) { repeated group bag { optional int32 array_element; } } {code} Sample result before the changes was: {code:java} {"bag":[{"array_element":1},\{"array_element":2}]} {code} After the changes Drill reads only array elements data without additional keys like "bag" or "array_element": {code}[1,2] \{code} . Please read Design Doc linked to parent task for more details. was: When Hive stores array data in parquet format, it creates schema for such columns, like: arr_n_0 ARRAY {code} optional group arr_n_0 (LIST) { repeated group bag { optional int32 array_element; } } {code} Sample result before the changes was: {code}\{"bag":[{"array_element":1},\{"array_element":2}]} \{code} After the changes Drill reads only array elements data without additional keys like "bag" or "array_element": {code} [1,2] \{code} . Please read Design Doc linked to parent task for more details. > Read Hive array with parquet native reader > -- > > Key: DRILL-7268 > URL: https://issues.apache.org/jira/browse/DRILL-7268 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > When Hive stores array data in parquet format, it creates schema for such > columns, like: > arr_n_0 ARRAY > {code:java} > optional group arr_n_0 (LIST) { > repeated group bag { > optional int32 array_element; > } > } > {code} > Sample result before the changes was: > {code:java} > {"bag":[{"array_element":1},\{"array_element":2}]} > {code} > After the changes Drill reads only array elements data without additional > keys like "bag" or "array_element": > {code}[1,2] \{code} . 
> > Please read Design Doc linked to parent task for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7268) Read Hive array with parquet native reader
[ https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7268: Description: When Hive stores array data in parquet format, it creates schema for such columns, like: arr_n_0 ARRAY {code} optional group arr_n_0 (LIST) { repeated group bag { optional int32 array_element; } } {code} Sample result before the changes was: {code}\{"bag":[{"array_element":1},\{"array_element":2}]} \{code} After the changes Drill reads only array elements data without additional keys like "bag" or "array_element": {code} [1,2] \{code} . Please read Design Doc linked to parent task for more details. > Read Hive array with parquet native reader > -- > > Key: DRILL-7268 > URL: https://issues.apache.org/jira/browse/DRILL-7268 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > When Hive stores array data in parquet format, it creates schema for such > columns, like: > arr_n_0 ARRAY > {code} > optional group arr_n_0 (LIST) { > repeated group bag { > optional int32 array_element; > } > } > {code} > Sample result before the changes was: > {code}\{"bag":[{"array_element":1},\{"array_element":2}]} \{code} > After the changes Drill reads only array elements data without additional > keys like "bag" or "array_element": > {code} [1,2] \{code} . > > Please read Design Doc linked to parent task for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7268) Read Hive array with parquet native reader
[ https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7268: Description: When Hive stores array data in parquet format, it creates schema for such columns, like: arr_n_0 ARRAY {code:java} optional group arr_n_0 (LIST) { repeated group bag { optional int32 array_element; } } {code} Sample result before the changes was: {code:java} {"bag":[{"array_element":1},{"array_element":2}]} {code} After the changes Drill reads only array elements data without additional keys like "bag" or "array_element": {code:java} [1,2]{code} Please read Design Doc linked to parent task for more details. was: When Hive stores array data in parquet format, it creates schema for such columns, like: arr_n_0 ARRAY {code:java} optional group arr_n_0 (LIST) { repeated group bag { optional int32 array_element; } } {code} Sample result before the changes was: {code:java} {"bag":[{"array_element":1},\{"array_element":2}]} {code} After the changes Drill reads only array elements data without additional keys like "bag" or "array_element": {code}[1,2] \{code} . Please read Design Doc linked to parent task for more details. 
> Read Hive array with parquet native reader > -- > > Key: DRILL-7268 > URL: https://issues.apache.org/jira/browse/DRILL-7268 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > When Hive stores array data in parquet format, it creates schema for such > columns, like: > arr_n_0 ARRAY > {code:java} > optional group arr_n_0 (LIST) { >repeated group bag { > optional int32 array_element; >} > } > {code} > Sample result before the changes was: > {code:java} > {"bag":[{"array_element":1},{"array_element":2}]} > {code} > After the changes Drill reads only array elements data without additional > keys like "bag" or "array_element": > {code:java} > [1,2]{code} > > > Please read Design Doc linked to parent task for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
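The before/after behavior in the DRILL-7268 description can be modeled with a short sketch. This is illustrative Python, not the native reader code:

```python
# Hive writes arrays to parquet wrapped in a "bag" repeated group of
# "array_element" fields; the fix makes Drill return the plain elements.
hive_parquet_value = {"bag": [{"array_element": 1}, {"array_element": 2}]}

def unwrap(value):
    # Strip the synthetic "bag"/"array_element" levels so that the
    # reader yields [1, 2] instead of the nested record structure.
    return [e["array_element"] for e in value["bag"]]

print(unwrap(hive_parquet_value))  # [1, 2]
```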
[jira] [Created] (DRILL-7215) Null error shown when select file without format
Igor Guzenko created DRILL-7215: --- Summary: Null error shown when select file without format Key: DRILL-7215 URL: https://issues.apache.org/jira/browse/DRILL-7215 Project: Apache Drill Issue Type: Bug Affects Versions: 1.15.0, 1.16.0 Reporter: Igor Guzenko A null error message is shown when querying a file without a format after a USE schema statement: {code:none} select * from `dir/noformat`; Error: VALIDATION ERROR: null [Error Id: b9e3e3a4-f60a-4836-97e9-6078c742f7ad ] (state=,code=0) {code} Steps to reproduce: # Create a dir and a file w/o format: {code:java} mkdir /tmp/dir && touch /tmp/dir/noformat{code} # Run Drill in embedded mode # Use the tmp schema {code:sql} USE dfs.tmp; {code} # Query the created file {code:sql} SELECT * FROM `dir/noformat`;{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7345) Strange Behavior for UDFs with ComplexWriter Output
[ https://issues.apache.org/jira/browse/DRILL-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905362#comment-16905362 ] Igor Guzenko commented on DRILL-7345: - Hi [~cgivre], could you please check whether the issue is caused by the changes that were added as part of DRILL-6810? The javadoc [here|https://github.com/apache/drill/blob/85c77134d5d1bb9f96a5417036cccfb263ae8ae7/exec/java-exec/src/main/java/org/apache/drill/exec/expr/annotations/FunctionTemplate.java#L150] describes some limitations related to ComplexWriter output. > Strange Behavior for UDFs with ComplexWriter Output > --- > > Key: DRILL-7345 > URL: https://issues.apache.org/jira/browse/DRILL-7345 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Charles Givre >Priority: Minor > > I wrote some UDFs recently and noticed some strange behavior when debugging > them. > This behavior only occurs when there is ComplexWriter as output. > Basically, if the input to the UDF is nullable, Drill doesn't recognize the > UDF at all. I've found that the only way to get Drill to recognize UDFs that > have ComplexWriters as output is: > * Use a non-nullable holder as input > * Remove the null setting completely from the function parameters. > This approach has a drawback in that if the function receives a null value, > it will throw an error and halt execution. My preference would be to allow > null handling, but I've not figured out how to make that happen. > Note: This behavior ONLY occurs when using a ComplexWriter as output. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (DRILL-7326) "Unsupported Operation Exception" appears on attempting to create table in Drill from json, with double nested array
[ https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7326: --- Assignee: Igor Guzenko > "Unsupported Operation Exception" appears on attempting to create table in > Drill from json, with double nested array > > > Key: DRILL-7326 > URL: https://issues.apache.org/jira/browse/DRILL-7326 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Pavel Semenov >Assignee: Igor Guzenko >Priority: Major > > *STEPS TO REPRODUCE* > # Create a json file which has a double nested array as a value, e.g. > {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]] > {code} > # Use CTAS to create a table in Drill from the created json file > # Observe the result > *EXPECTED RESULT* > Table is created > *ACTUAL RESULT* > UnsupportedOperationException appears on attempting to create the table > *ADDITIONAL INFO* > It is possible to create a table with a *single* nested array > Error log > {code:java} > Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > Please, refer to logs for more information. 
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010] > (java.lang.UnsupportedOperationException) Unsupported type LIST > org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211 > org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160 > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 (state=,code=0) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (DRILL-7326) Support repeated lists for CTAS parquet format
[ https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7326: Summary: Support repeated lists for CTAS parquet format (was: "Unsupported Operation Exception" appears on attempting to create table in Drill from json, with double nested array) > Support repeated lists for CTAS parquet format > -- > > Key: DRILL-7326 > URL: https://issues.apache.org/jira/browse/DRILL-7326 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Pavel Semenov >Assignee: Igor Guzenko >Priority: Major > > *STEPS TO REPRODUCE* > # Create json file with which has double nesting array as a value e.g. > {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]] > {code} > # Use CTAS to create table in drill with created json file > # Observe the result > *EXPECTED RESULT* > Table is created > *ACTUAL RESULT* > UnsupportedOperationException appears on attempting to create the table > *ADDITIONAL INFO* > It is possible to create table with with *single* nested array > Error log > {code:java} > Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > Please, refer to logs for more information. 
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010] > (java.lang.UnsupportedOperationException) Unsupported type LIST > org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211 > org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160 > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 (state=,code=0) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (DRILL-7326) Support repeated lists for CTAS parquet format
[ https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7326: Issue Type: New Feature (was: Bug) > Support repeated lists for CTAS parquet format > -- > > Key: DRILL-7326 > URL: https://issues.apache.org/jira/browse/DRILL-7326 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.16.0 >Reporter: Pavel Semenov >Assignee: Igor Guzenko >Priority: Major > > *STEPS TO REPRODUCE* > # Create json file with which has double nesting array as a value e.g. > {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]] > {code} > # Use CTAS to create table in drill with created json file > # Observe the result > *EXPECTED RESULT* > Table is created > *ACTUAL RESULT* > UnsupportedOperationException appears on attempting to create the table > *ADDITIONAL INFO* > It is possible to create table with with *single* nested array > Error log > {code:java} > Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > Please, refer to logs for more information. 
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010] > (java.lang.UnsupportedOperationException) Unsupported type LIST > org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211 > org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160 > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 (state=,code=0) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (DRILL-7326) Support repeated lists for CTAS parquet format
[ https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920355#comment-16920355 ] Igor Guzenko commented on DRILL-7326: - Merged to Apache master with commit id [ffab527|https://github.com/apache/drill/commit/ffab527451e0a23eca96f38bce52c790553cc47e]. > Support repeated lists for CTAS parquet format > -- > > Key: DRILL-7326 > URL: https://issues.apache.org/jira/browse/DRILL-7326 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.16.0 >Reporter: Pavel Semenov >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > *STEPS TO REPRODUCE* > # Create json file with which has double nesting array as a value e.g. > {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]] > {code} > # Use CTAS to create table in drill with created json file > # Observe the result > *EXPECTED RESULT* > Table is created > *ACTUAL RESULT* > UnsupportedOperationException appears on attempting to create the table > *ADDITIONAL INFO* > It is possible to create table with with *single* nested array > Error log > {code:java} > Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > Please, refer to logs for more information. 
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010] > (java.lang.UnsupportedOperationException) Unsupported type LIST > org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211 > org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160 > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 (state=,code=0) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Issue Comment Deleted] (DRILL-7326) Support repeated lists for CTAS parquet format
[ https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7326: Comment: was deleted (was: Merged to Apache master with commit id [ffab527|https://github.com/apache/drill/commit/ffab527451e0a23eca96f38bce52c790553cc47e].) > Support repeated lists for CTAS parquet format > -- > > Key: DRILL-7326 > URL: https://issues.apache.org/jira/browse/DRILL-7326 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.16.0 >Reporter: Pavel Semenov >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > *STEPS TO REPRODUCE* > # Create json file with which has double nesting array as a value e.g. > {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]] > {code} > # Use CTAS to create table in drill with created json file > # Observe the result > *EXPECTED RESULT* > Table is created > *ACTUAL RESULT* > UnsupportedOperationException appears on attempting to create the table > *ADDITIONAL INFO* > It is possible to create table with with *single* nested array > Error log > {code:java} > Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > Please, refer to logs for more information. 
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010] > (java.lang.UnsupportedOperationException) Unsupported type LIST > org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226 > org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211 > org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160 > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1669 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 (state=,code=0) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (DRILL-6181) CTAS should support writing nested structures (nested lists) to parquet.
[ https://issues.apache.org/jira/browse/DRILL-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko resolved DRILL-6181. - Resolution: Fixed Done in scope of DRILL-7326. > CTAS should support writing nested structures (nested lists) to parquet. > > > Key: DRILL-6181 > URL: https://issues.apache.org/jira/browse/DRILL-6181 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.12.0 >Reporter: Khurram Faraaz >Priority: Major > > Both Parquet and Hive support writing nested structures into parquet > https://issues.apache.org/jira/browse/HIVE-8909 > https://issues.apache.org/jira/browse/PARQUET-113 > A CTAS from Drill fails when there is a nested list of lists, in one of the > columns in the project. > JSON data used in the test, note that "arr" is a nested list of lists > > {noformat} > [root@qa102-45 ~]# cat jsonToParquet_02.json > {"id":"123","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"3","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"13","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"12","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"2","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"1","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"230","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"1230","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"1123","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"2123","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > {"id":"1523","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]} > [root@qa102-45 ~]# > {noformat} > CTAS fails with UnsupportedOperationException on Drill 1.12.0-mapr commit id > bb07ebbb9ba8742f44689f8bd8efb5853c5edea0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_prq_from_json_02 as select > id, arr from `jsonToParquet_02.json`; > Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > 
[Error Id: 7e5b3c2d-9cf1-4e87-96c8-e7e7e8055ddf on qa102-45.qa.lab:31010] > (state=,code=0) > {noformat} > Stack trace from drillbit.log > {noformat} > 2018-02-22 09:56:54,368 [2570fb99-62da-a516-2c1f-0381e21723ae:frag:0:0] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: > UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > [Error Id: 7e5b3c2d-9cf1-4e87-96c8-e7e7e8055ddf on qa102-45.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > UnsupportedOperationException: Unsupported type LIST > Fragment 0:0 > [Error Id: 7e5b3c2d-9cf1-4e87-96c8-e7e7e8055ddf on qa102-45.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) > ~[drill-common-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301) > [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) > [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) > [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.12.0-mapr.jar:1.12.0-mapr] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_161] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_161] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] > Caused by: java.lang.UnsupportedOperationException: Unsupported type LIST > at > org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType(ParquetRecordWriter.java:253) > ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema(ParquetRecordWriter.java:205) > ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > 
org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema(ParquetRecordWriter.java:190) > ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema(WriterRecordBatch.java:157) > ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:103) > ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) > ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) > ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr] > at
[jira] [Resolved] (DRILL-2241) CTAS fails when writing a repeated list
[ https://issues.apache.org/jira/browse/DRILL-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko resolved DRILL-2241. - Resolution: Fixed Done in scope of DRILL-7326. > CTAS fails when writing a repeated list > --- > > Key: DRILL-2241 > URL: https://issues.apache.org/jira/browse/DRILL-2241 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Parquet >Affects Versions: 0.8.0 >Reporter: Abhishek Girish >Priority: Major > Fix For: Future > > Attachments: drillbit_replist.log > > > Drill can read the following JSON file with a repeated list: > { > "a" : null, > "b" : [ ["B1", "B2"] ] > } > Writing this to Parquet via a simple CTAS fails: > > create table temp as select * from `replist.json`; > The log indicates this is unsupported (UnsupportedOperationException: > Unsupported type LIST) > Log attached. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (DRILL-2768) Improve error message for CTAS fails when writing a repeated list
[ https://issues.apache.org/jira/browse/DRILL-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko resolved DRILL-2768. - Resolution: Won't Fix Not relevant after DRILL-7326. > Improve error message for CTAS fails when writing a repeated list > - > > Key: DRILL-2768 > URL: https://issues.apache.org/jira/browse/DRILL-2768 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Parquet >Affects Versions: 1.0.0 >Reporter: Deneche A. Hakim >Assignee: Bohdan Kazydub >Priority: Major > Fix For: Future > > > Using the following json file: > {code} > { "a" : null, "b" : [ ["B1", "B2"] ] } > {code} > The following CTAS query fails because parquet doesn't support a list > directly nested inside another. We should improve the error message to better > explain this: > {noformat} > 0: jdbc:drill:zk=local> create table t2241 as select * from `2241.json`; > Query failed: Unsupported type LIST > [04423be8-706d-47c2-b73f-384201163d10 on abdel-11.qa.lab:31010] > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (DRILL-7255) Support nulls for all levels of nesting in complex types
[ https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7255: Summary: Support nulls for all levels of nesting in complex types (was: Support nulls for all levels of nesting in complex type) > Support nulls for all levels of nesting in complex types > > > Key: DRILL-7255 > URL: https://issues.apache.org/jira/browse/DRILL-7255 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (DRILL-7255) Support nulls for all levels of nesting in complex type
[ https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7255: Summary: Support nulls for all levels of nesting in complex type (was: Support nulls for all levels of nesting) > Support nulls for all levels of nesting in complex type > --- > > Key: DRILL-7255 > URL: https://issues.apache.org/jira/browse/DRILL-7255 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (DRILL-7365) Failed to read column added to existing Hive partition
[ https://issues.apache.org/jira/browse/DRILL-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7365: Description: *Prerequisites:* Enable ACID in Hive [https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions]. *Steps to reproduce:* 1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc tblproperties ('transactional'='true'); 2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (1, 'A'),(2, 'B'); 3) alter table hive_bucketed2 add columns (age INT); 4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (11, '1A', 10),(12, '1B', 22); 5) select * from hive.hive_bucketed2; *Workaround* (may be a little bit {color:#de350b}risky{color}:) : 1. Connect to the Hive metastore database. [https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png] 2. Find the SDS rows linked to the desired PARTITIONS; you actually need the CD_IDs of those SDS rows. 3. Insert your column into COLUMNS_V2 with the CD_ID found in the previous step. was: Prerequisities: Enable ACID in Hive https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions. Steps to reproduce: 1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc tblproperties ('transactional'='true'); 2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (1, 'A'),(2, 'B'); 3) alter table hive_bucketed2 add columns (age INT); 4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (11, '1A', 10),(12, '1B', 22); 5) select * from hive.hive_bucketed2; Workaround (may be a little bit risky:) : 1. Connect to Hive metastore database. https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png 2. Find SDS linked to desired PARTITIONS . 
Actually you need CD_ID's for such SDS. 3. Insert your column into COLUMNS_V2 with CD_ID found at previous step. > Failed to read column added to existing Hive partition > -- > > Key: DRILL-7365 > URL: https://issues.apache.org/jira/browse/DRILL-7365 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Reporter: Igor Guzenko >Priority: Major > > *Prerequisites:* > Enable ACID in Hive > [https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions]. > *Steps to reproduce:* > 1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY > (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc > tblproperties ('transactional'='true'); > 2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values > (1, 'A'),(2, 'B'); > 3) alter table hive_bucketed2 add columns (age INT); > 4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values > (11, '1A', 10),(12, '1B', 22); > 5) select * from hive.hive_bucketed2; > *Workaround* (may be a little bit {color:#de350b}risky{color}:) : > 1. Connect to the Hive metastore database. > [https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png] > 2. Find the SDS rows linked to the desired PARTITIONS; you actually need the CD_IDs of those > SDS rows. > 3. Insert your column into COLUMNS_V2 with the CD_ID found in the previous step. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (DRILL-7365) Failed to read column added to existing Hive partition
Igor Guzenko created DRILL-7365: --- Summary: Failed to read column added to existing Hive partition Key: DRILL-7365 URL: https://issues.apache.org/jira/browse/DRILL-7365 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Igor Guzenko Prerequisites: Enable ACID in Hive https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions. Steps to reproduce: 1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc tblproperties ('transactional'='true'); 2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (1, 'A'),(2, 'B'); 3) alter table hive_bucketed2 add columns (age INT); 4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (11, '1A', 10),(12, '1B', 22); 5) select * from hive.hive_bucketed2; Workaround (may be a little bit risky:) : 1. Connect to the Hive metastore database. https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png 2. Find the SDS rows linked to the desired PARTITIONS; you actually need the CD_IDs of those SDS rows. 3. Insert your column into COLUMNS_V2 with the CD_ID found in the previous step. -- This message was sent by Atlassian Jira (v8.3.2#803003)
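The workaround's metastore surgery can be rehearsed safely before touching a real cluster. Below is a sketch that models the relevant fragment of the Hive metastore schema (PARTITIONS, SDS, COLUMNS_V2) in an in-memory sqlite database; the table and column names follow the standard metastore layout, but the real tables have many more columns, and the concrete values (IDs, partition name) are illustrative assumptions. Always back up the metastore database before editing it by hand.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Heavily simplified fragments of the Hive metastore schema.
cur.executescript("""
CREATE TABLE PARTITIONS (PART_ID INTEGER, PART_NAME TEXT, SD_ID INTEGER);
CREATE TABLE SDS        (SD_ID INTEGER, CD_ID INTEGER);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT, INTEGER_IDX INTEGER);
""")

# Old partition: its column descriptor (CD_ID 100) knows only the two original columns,
# which is why reading the later-added 'age' column fails for this partition.
cur.execute("INSERT INTO PARTITIONS VALUES (1, 'col_year_month=2019-09', 10)")
cur.execute("INSERT INTO SDS VALUES (10, 100)")
cur.executemany("INSERT INTO COLUMNS_V2 VALUES (?, ?, ?, ?)",
                [(100, 'emp_id', 'int', 0), (100, 'first_name', 'string', 1)])

# Workaround step 2: find the CD_ID behind the partition's storage descriptor.
cd_id, = cur.execute("""
    SELECT s.CD_ID
    FROM PARTITIONS p JOIN SDS s ON p.SD_ID = s.SD_ID
    WHERE p.PART_NAME = 'col_year_month=2019-09'
""").fetchone()

# Workaround step 3: register the new 'age' column for that descriptor.
cur.execute("INSERT INTO COLUMNS_V2 VALUES (?, 'age', 'int', 2)", (cd_id,))

cols = [r[0] for r in cur.execute(
    "SELECT COLUMN_NAME FROM COLUMNS_V2 WHERE CD_ID = ? ORDER BY INTEGER_IDX", (cd_id,))]
```

After the insert, the partition's column descriptor lists all three columns, which is the state the workaround aims for.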
[jira] [Updated] (DRILL-7252) Read Hive map using Dict vector
[ https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7252: Summary: Read Hive map using Dict vector (was: Read Hive map using canonical Map vector) > Read Hive map using Dict vector > > > Key: DRILL-7252 > URL: https://issues.apache.org/jira/browse/DRILL-7252 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > > Described in DRILL-3290 design doc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (DRILL-7253) Read Hive struct w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7253: Description: Described in DRILL-3290 design doc. > Read Hive struct w/o nulls > -- > > Key: DRILL-7253 > URL: https://issues.apache.org/jira/browse/DRILL-7253 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.16.0 >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.17.0 > > > Described in DRILL-3290 design doc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (DRILL-7251) Read Hive array w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7251: Description: Described in DRILL-3290 design doc. > Read Hive array w/o nulls > - > > Key: DRILL-7251 > URL: https://issues.apache.org/jira/browse/DRILL-7251 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Hive >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > Described in DRILL-3290 design doc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (DRILL-7373) Fix problems involving reading from DICT type
[ https://issues.apache.org/jira/browse/DRILL-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7373: Labels: ready-to-commit (was: ) > Fix problems involving reading from DICT type > - > > Key: DRILL-7373 > URL: https://issues.apache.org/jira/browse/DRILL-7373 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > Add better support for different key types ({{boolean}}, {{decimal}}, > {{float}}, {{double}} etc.) when retrieving values by key from a {{DICT}} > column while querying a data source with field types known during the query > validation phase (such as a Hive table), so that the actual key object instance is > created in generated code and passed to the given {{DICT}} reader, instead of > generating its value for every row based on an {{int}} ({{ArraySegment}}) or > {{String}} ({{NamedSegment}}) value. > This may be achieved by storing the original literal value of the passed key (as > {{Object}}) together with its type (as {{MajorType}}) in {{PathSegment}}, and using them > during code generation when reading {{DICT}}'s values by key in > {{EvaluationVisitor}}. > Also, fix an NPE in some cases when reading values from {{DICT}}, > and fix a wrong result when reading complex structures through many ITEM > operators (i.e. {{[]}} brackets), e.g. > {code} > SELECT rid, mc.map_arr_map['key01'][1]['key01.1'] p16 FROM > hive.map_complex_tbl mc > {code} > where {{map_arr_map}} is of type {{MAP<STRING, ARRAY<MAP<STRING, INT>>>}} -- This message was sent by Atlassian Jira (v8.3.2#803003)
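The chained ITEM operators in the example query can be pictured in plain Python. This sketch only illustrates the logical shape of a row's {{map_arr_map}} value and the {{['key01'][1]['key01.1']}} access path; the keys and numbers are made up for illustration, and this is not Drill's generated code:

```python
# Hypothetical value of map_arr_map for one row:
# a map from string keys to arrays of maps from string keys to ints.
map_arr_map = {
    "key01": [
        {"key01.0": 0},
        {"key01.1": 16},  # array index 1 holds the map we want
    ]
}

# The ITEM-operator chain map_arr_map['key01'][1]['key01.1'] is,
# logically, a map lookup, then an array index, then another map lookup.
p16 = map_arr_map["key01"][1]["key01.1"]
```

The bug described above concerned Drill generating the wrong code for exactly this kind of chained lookup, not the logical semantics shown here.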
[jira] [Updated] (DRILL-7252) Read Hive map using canonical Map vector
[ https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7252: Description: Described in DRILL-3290 design doc. > Read Hive map using canonical Map vector > - > > Key: DRILL-7252 > URL: https://issues.apache.org/jira/browse/DRILL-7252 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > > Described in DRILL-3290 design doc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (DRILL-7380) Query of a field inside of an array of structs returns null
[ https://issues.apache.org/jira/browse/DRILL-7380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7380: --- Assignee: Igor Guzenko > Query of a field inside of an array of structs returns null > --- > > Key: DRILL-7380 > URL: https://issues.apache.org/jira/browse/DRILL-7380 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Attachments: customer_complex.zip > > > *Query:* > {code:sql} > select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10; > {code} > *Expected results (given from Hive):* > {noformat} > OK > O > F > NULL > O > O > NULL > O > O > NULL > F > {noformat} > *Actual results:* > {noformat} > null > null > null > null > null > null > null > null > null > null > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (DRILL-7381) Query to a map field returns nulls with hive native reader
[ https://issues.apache.org/jira/browse/DRILL-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7381: --- Assignee: Igor Guzenko > Query to a map field returns nulls with hive native reader > -- > > Key: DRILL-7381 > URL: https://issues.apache.org/jira/browse/DRILL-7381 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Attachments: customer_complex.zip > > > *Query:* > {code:sql} > select t.c_nation.n_region.r_name from hive.customer_complex t limit 5 > {code} > *Expected results:* > {noformat} > AFRICA > MIDDLE EAST > AMERICA > MIDDLE EAST > AMERICA > {noformat} > *Actual results:* > {noformat} > null > null > null > null > null > {noformat} > *Workaround:* > {code:sql} > set store.hive.optimize_scan_with_native_readers = false; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7381) Query to a map field returns nulls with hive native reader
[ https://issues.apache.org/jira/browse/DRILL-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7381: Labels: ready-to-commit (was: ) > Query to a map field returns nulls with hive native reader > -- > > Key: DRILL-7381 > URL: https://issues.apache.org/jira/browse/DRILL-7381 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Attachments: customer_complex.zip > > > *Query:* > {code:sql} > select t.c_nation.n_region.r_name from hive.customer_complex t limit 5 > {code} > *Expected results:* > {noformat} > AFRICA > MIDDLE EAST > AMERICA > MIDDLE EAST > AMERICA > {noformat} > *Actual results:* > {noformat} > null > null > null > null > null > {noformat} > *Workaround:* > {code:sql} > set store.hive.optimize_scan_with_native_readers = false; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (DRILL-7380) Query of a field inside of an array of structs returns null
[ https://issues.apache.org/jira/browse/DRILL-7380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko updated DRILL-7380: Labels: ready-to-commit (was: ) > Query of a field inside of an array of structs returns null > --- > > Key: DRILL-7380 > URL: https://issues.apache.org/jira/browse/DRILL-7380 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Attachments: customer_complex.zip > > > *Query:* > {code:sql} > select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10; > {code} > *Expected results (given from Hive):* > {noformat} > OK > O > F > NULL > O > O > NULL > O > O > NULL > F > {noformat} > *Actual results:* > {noformat} > null > null > null > null > null > null > null > null > null > null > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)