[jira] [Created] (DRILL-6776) Drill Web UI takes long time for first time load in network isolated environment

2018-10-05 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6776:
---

 Summary: Drill Web UI takes long time for first time load in 
network isolated environment
 Key: DRILL-6776
 URL: https://issues.apache.org/jira/browse/DRILL-6776
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Affects Versions: 1.14.0
Reporter: Igor Guzenko
Assignee: Igor Guzenko
 Fix For: 1.15.0


When the cluster is built on a network isolated from the internet, first-time 
loading of the Web UI takes about 25 seconds, of which the browser spends about 
15 seconds waiting for the request to 
/ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js
to time out.

jQuery was added to the static resources in the scope of DRILL-5699, but the 
dependency on Google's CDN wasn't fully removed: the static resource was added 
only as a 
[fallback|https://stackoverflow.com/questions/1014203/best-way-to-use-googles-hosted-jquery-but-fall-back-to-my-hosted-library-on-go]. 
I guess the main reason the fallback solution was applied is that there is a 
higher probability that the library served from Google's CDN will be found and 
loaded from the browser's cache.
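
For reference, the CDN-with-local-fallback pattern mentioned above typically 
looks like this (a generic sketch of the technique from the linked answer, not 
Drill's exact markup; the local path is illustrative):
{code:html}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script>
  // If the CDN request failed (e.g. no internet access), window.jQuery is
  // undefined, so fall back to the locally served copy.
  window.jQuery || document.write('<script src="/static/js/jquery.min.js"><\/script>');
</script>
{code}
In a network-isolated cluster the first script tag has to time out before the 
fallback runs, which is exactly the delay described above.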


Unfortunately, this graceful solution doesn't work in a truly isolated 
environment, which is why we need to fully remove the 
/ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js dependency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-11-12 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683489#comment-16683489
 ] 

Igor Guzenko commented on DRILL-786:


Documentation note: 

By their nature, cross joins can produce extremely large results, and we don't 
recommend using the feature unless you know the results won't cause 
out-of-memory errors. That's why cross joins are disabled by default; to allow 
explicit cross join syntax you have to enable it by setting the 
_*planner.enable_nljoin_for_scalar_only*_ option to _*false*_. There is also 
another limitation related to using an aggregation function over a cross join 
relation: when the input row count for the aggregate function is bigger than the 
value of the _*planner.slice_target*_ option, the query can't be planned 
(because a two-phase aggregation can't be created in that case). As a workaround 
you should set *_planner.enable_multiphase_agg_* to _*false*_. This limitation 
will remain until https://issues.apache.org/jira/browse/DRILL-6839 is fixed. 
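
The two options mentioned in the note can be changed per session; a minimal 
sketch of the corresponding statements (standard Drill ALTER SESSION syntax; 
the sample query reuses the classpath TPC-H files that appear elsewhere in this 
thread):
{code:sql}
-- Allow explicit CROSS JOIN syntax (nested loop join is no longer
-- restricted to scalar inputs)
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;

-- Workaround for the aggregation limitation described above (DRILL-6839)
ALTER SESSION SET `planner.enable_multiphase_agg` = false;

SELECT COUNT(*)
FROM cp.`tpch/nation.parquet` n
CROSS JOIN cp.`tpch/region.parquet` r;
{code}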

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Affects Versions: 1.14.0
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 

[jira] [Updated] (DRILL-540) Allow querying hive views in drill

2018-11-14 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-540:
---
Description: 
Currently Hive views cannot be queried from Drill.

*Suggested approach*
 # Drill persists its view metadata in a file with the suffix .view.drill, 
using JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}{noformat}
        Later Drill parses this metadata and uses it to treat the view name in 
SQL as a subquery.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS:

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                         |
-------|------------|------|-----------------|------|----------|------|---------|-------------|-------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` |{noformat}
      3. So in the Hive metastore, views are treated as tables of a special 
type. The main benefit is that we also get the expanded SQL definition of the 
view (just like in .view.drill files). Reading this metadata is already 
implemented in Drill with the help of the thrift Metastore API.

      4. To enable querying of Hive views I'll reuse the existing code for 
Drill views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, 
for a _*HiveReadEntry*_ I'll convert the metadata to an instance of _*View*_ 
(_which is actually the model for the data persisted in .view.drill files_) and 
then, based on this instance, return a new _*DrillViewTable*_. With this 
approach Drill will handle a Hive view the same way as if it had originally 
been defined in Drill and persisted in a .view.drill file. 

     5. For the conversion of Hive types from _*FieldSchema*_ to 
_*RelDataType*_ I'll reuse the existing code from _*DrillHiveTable*_: the 
conversion functionality will be extracted and shared by both table and view 
field type conversions. 
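
For illustration, once a view name resolves to a _*DrillViewTable*_, querying 
it is planned roughly as if the stored SQL were inlined as a subquery. A sketch 
using the view_from_calcite_1_4 example above (the query text is illustrative, 
not taken from the implementation):
{code:sql}
-- Query against the persisted view ...
SELECT * FROM dfs.tmp.`view_from_calcite_1_4`;

-- ... is treated approximately like its definition inlined as a subquery:
SELECT * FROM (
  SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0
);
{code}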

 

  was:
Currently hive views cannot be queried from drill.



> Allow querying hive views in drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> *Suggested approach*
>  # Drill persists it's views metadata in file with suffix .view.drill using 
> json format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json`WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }{noformat}
>         Later drill parses the metadata and uses it to treat view names in 
> SQL as a subquery.
>       2. In Apache Hive metadata about views is stored in similar way to 
> tables. Below is example from metastore.TBLS :
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID 
> |TBL_NAME  |TBL_TYPE  |VIEW_EXPANDED_TEXT |
> ---||--|-|--|--|--|--|--|---|
> 2  |1542111078  |1 |0|mapr  |0 |2 |cview  
>|VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |{noformat}
>       3. So in Hive metastore views are considered as tables of special type. 
> And main benefit is that we also have expanded SQL definition of views (just 
> like in view.drill files). Also reading of the metadata is already 
> implemented in Drill with help of thrift Metastore API.
>       4. To enable querying of Hive views I'll reuse existing code for Drill 
> views as much as possible. First in *_HiveSchemaFactory.getDrillTable_* for 
> _*HiveReadEntry*_ I'll convert the metadata to instance of _*View*_ (_which 
> is actually model for data persisted in .view.drill files_) and then based on 
> this instance return new _*DrillViewTable*_. Using this approach drill will 
> handle hive views the same way as if it was initially defined in Drill and 
> persisted in .view.drill file. 
>      5. For conversion of Hive types: from _*FieldSchema*_ to _*RelDataType*_ 
> I'll reuse existing code from _*DrillHiveTable*_, so the conversion 
> functionality will be extracted and used for both 

[jira] [Created] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low

2018-11-09 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6839:
---

 Summary: Failed to plan (aggregate + Hash or NL join) when slice 
target is low 
 Key: DRILL-6839
 URL: https://issues.apache.org/jira/browse/DRILL-6839
 Project: Apache Drill
  Issue Type: Bug
Reporter: Igor Guzenko
Assignee: Igor Guzenko


Case 1. When nested loop join is about to be used:
 - Option "planner.enable_nljoin_for_scalar_only" is set to false
 - Option "planner.slice_target" is set to a low value to imitate big input 
tables

 
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          "CROSS JOIN cp.`tpch/region.parquet` r")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
    }
  }
}{code}
 

Case 2. When hash join is about to be used:
 - Option "planner.enable_mergejoin" is set to false, so hash join will be used 
instead
 - Option "planner.slice_target" is set to a low value to imitate big input 
tables
 - The line ruleList.add(HashJoinPrule.DIST_INSTANCE); in the 
PlannerPhase.getPhysicalRules method is commented out
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          "INNER JOIN cp.`tpch/region.parquet` r " +
          "ON r.nation_id = l.nation_id")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
    }
  }
}
{code}
*Workaround:* To avoid the exception we need to set the option 
"planner.enable_multiphase_agg" to false. This avoids the unsuccessful attempts 
to create a two-phase aggregation plan in StreamAggPrule and guarantees that 
the logical aggregate will be converted to a physical one. 
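
As session statements, the workaround amounts to the following (standard Drill 
syntax, shown here as a sketch):
{code:sql}
-- Disable multi-phase aggregation so planning of aggregate-over-join succeeds
ALTER SESSION SET `planner.enable_multiphase_agg` = false;

-- Restore the default afterwards
ALTER SESSION RESET `planner.enable_multiphase_agg`;
{code}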

 





[jira] [Updated] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low

2018-11-09 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6839:

Description: 
*Case 1.* When nested loop join is about to be used:
 - Option "_planner.enable_nljoin_for_scalar_only_" is set to false
 - Option "_planner.slice_target_" is set to a low value to imitate big input 
tables

 
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          ", cp.`tpch/region.parquet` r")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
    }
  }
}{code}
 

*Case 2.* When hash join is about to be used:
 - Option "planner.enable_mergejoin" is set to false, so hash join will be used 
instead
 - Option "planner.slice_target" is set to a low value to imitate big input 
tables
 - The line ruleList.add(HashJoinPrule.DIST_INSTANCE); in the 
PlannerPhase.getPhysicalRules method is commented out
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {

  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          "INNER JOIN cp.`tpch/region.parquet` r " +
          "ON r.nation_id = l.nation_id")
        .run();
    } finally {
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
    }
  }
}
{code}
 

*Workaround:* To avoid the exception we need to set the option 
"_planner.enable_multiphase_agg_" to false. This avoids the unsuccessful 
attempts to create a two-phase aggregation plan in StreamAggPrule and 
guarantees that the logical aggregate will be converted to a physical one. 

 

  was:
Case 1. When nested loop join is about to be used:
-Option "planner.enable_nljoin_for_scalar_only" is set to false
-Option "planner.slice_target" is set to low value for imitation of big input 
tables

 
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {
 @BeforeClass
 public static void setUp() throws Exception {
 startCluster(ClusterFixture.builder(dirTestWatcher));
 }

 @Test
 public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
   try {
 client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), 
false);
 client.alterSession(ExecConstants.SLICE_TARGET, 1);
 queryBuilder().sql(
"SELECT COUNT(l.nation_id) " +
"FROM cp.`tpch/nation.parquet` l " +
"CROSS JOIN cp.`tpch/region.parquet` r")
 .run();
   } finally {
client.resetSession(ExecConstants.SLICE_TARGET);
client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
   }
 }
}{code}
 

Case 2. When hash join is about to be used:
- Option "planner.enable_mergejoin" is set to false, so hash join will be used 
instead
- Option "planner.slice_target" is set to low value for imitation of big input 
tables
- Comment out //ruleList.add(HashJoinPrule.DIST_INSTANCE); in 
PlannerPhase.getPhysicalRules method
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {
 @BeforeClass
 public static void setUp() throws Exception {
   startCluster(ClusterFixture.builder(dirTestWatcher));
 }

 @Test
 public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
   try {
client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
client.alterSession(ExecConstants.SLICE_TARGET, 1);
queryBuilder().sql(
  "SELECT COUNT(l.nation_id) " +
  "FROM cp.`tpch/nation.parquet` l " +
  "INNER JOIN cp.`tpch/region.parquet` r " +
  "ON r.nation_id = l.nation_id")
.run();
   } finally {
client.resetSession(ExecConstants.SLICE_TARGET);
client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
   }
 }
}
{code}
*Workaround:* To avoid the exception we need to set option
"planner.enable_multiphase_agg" to false. By doing this we avoid unsuccessful 
attempts to create 2 phase aggregation plan 
in StreamAggPrule and guarantee that logical aggregate will be converted to 
physical one. 

 


> Failed to plan (aggregate + Hash or NL join) when 

[jira] [Assigned] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low

2018-11-09 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-6839:
---

Assignee: (was: Igor Guzenko)

> Failed to plan (aggregate + Hash or NL join) when slice target is low 
> --
>
> Key: DRILL-6839
> URL: https://issues.apache.org/jira/browse/DRILL-6839
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Igor Guzenko
>Priority: Major
>
> *Case 1.* When nested loop join is about to be used:
>  - Option "_planner.enable_nljoin_for_scalar_only_" is set to false
>  - Option "_planner.slice_target_" is set to low value for imitation of big 
> input tables
>  
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>  @BeforeClass
>  public static void setUp() throws Exception {
>  startCluster(ClusterFixture.builder(dirTestWatcher));
>  }
>  @Test
>  public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
>try {
>  client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), 
> false);
>  client.alterSession(ExecConstants.SLICE_TARGET, 1);
>  queryBuilder().sql(
> "SELECT COUNT(l.nation_id) " +
> "FROM cp.`tpch/nation.parquet` l " +
> ", cp.`tpch/region.parquet` r")
>  .run();
>} finally {
> client.resetSession(ExecConstants.SLICE_TARGET);
> client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
>}
>  }
> }{code}
>  
> *Case 2.* When hash join is about to be used:
>  - Option "planner.enable_mergejoin" is set to false, so hash join will be 
> used instead
>  - Option "planner.slice_target" is set to low value for imitation of big 
> input tables
>  - Comment out //ruleList.add(HashJoinPrule.DIST_INSTANCE); in 
> PlannerPhase.getPhysicalRules method
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>  @BeforeClass
>  public static void setUp() throws Exception {
>startCluster(ClusterFixture.builder(dirTestWatcher));
>  }
>  @Test
>  public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
>try {
> client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
> client.alterSession(ExecConstants.SLICE_TARGET, 1);
> queryBuilder().sql(
>   "SELECT COUNT(l.nation_id) " +
>   "FROM cp.`tpch/nation.parquet` l " +
>   "INNER JOIN cp.`tpch/region.parquet` r " +
>   "ON r.nation_id = l.nation_id")
> .run();
>} finally {
> client.resetSession(ExecConstants.SLICE_TARGET);
> client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
>}
>  }
> }
> {code}
>  
> *Workaround:* To avoid the exception we need to set option 
> "_planner.enable_multiphase_agg_" to false. By doing this we avoid 
> unsuccessful attempts to create 2 phase aggregation plan in StreamAggPrule 
> and guarantee that logical aggregate will be converted to physical one. 
>  





[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-05 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639674#comment-16639674
 ] 

Igor Guzenko commented on DRILL-786:


[~hanu.ncr] I've addressed all comments in the PR. Could you please take a look? 

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Affects Versions: 1.14.0
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.59049001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 
> network}
> 

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:45 AM:
--

We considered three possible options for how the feature could be implemented. 
Note: in the text below, whenever I mention the option being enabled or 
disabled, it refers to the *planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case):*

Allow nested loop join only for nodes that originated from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So a 
query like this should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
This is because the cross join of *a* with the result of (*b* x *c*) is 
implicit and should depend on the option value. But based on my investigation, 
*{color:#d04437}I didn't find how this could be implemented.{color}*

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries will 
work in that case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain the explicit syntax will still depend on the 
option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when the option is enabled)*

This approach is just a narrower case of the previous one. We could allow 
explicit cross join when the option is enabled and prohibit it when the option 
is disabled. 

 

 


was (Author: ihorhuzenko):
We considered 3 possible options how the feature could be implemented. Note, in 
text below when I mention option is enabled or disabled it relates to 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because cross join of *a* and result of (*b* x *c*) is implicit and should 
depend on option value. But based on my investigation, {color:#d04437}it's 
really hard to implement this approach, it requires a lot of time and includes 
a lot of changes to Apache Calcite.{color}

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax regardless of option value. For example following queries will work in 
such case: 

 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain explicit syntax, will still be dependent on the 
option. For example the following query won't work when option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is just more narrow case of the previous one. We could allow 
explicit cross join for enabled option, and prohibit it for disabled option. 

 

 

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning  Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> 

[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306
 ] 

Igor Guzenko edited comment on DRILL-786 at 10/2/18 11:53 AM:
--

We considered three possible options for how the feature could be implemented. 
Note: in the text below, whenever I mention the option being enabled or 
disabled, it refers to the *planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop only for nodes that originated from explicit cross join 
syntax but prohibit implicit cross joins when option is enabled. So such query 
should fail when option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because the cross join of *a* with the result of (*b* x *c*) is implicit, it 
should depend on the option value. But based on my investigation, *I didn't 
find a way to implement this.* I have provided the results of the 
investigation in the prior comments.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries 
would work in this case: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain the explicit syntax will still depend on the 
option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is a narrower variant of the previous one: allow explicit cross 
join when the option is enabled, and prohibit it when the option is disabled. 

Also, we could consider changing the option's default value to false, so that 
queries producing a Cartesian product would always succeed.
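The interplay between the session option and the cross join syntax can be sketched as follows (ALTER SESSION is standard Drill syntax; the per-query behavior noted in the comments reflects the proposals above, not necessarily current Drill behavior):

```sql
-- Lift the scalar-only nested loop join restriction for this session;
-- under Option 3 this is what makes explicit CROSS JOIN plannable.
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;

-- Explicit cross join: planned via nested loop join.
SELECT *
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r;

-- Implicit cross join (comma-separated tables without a join condition):
-- remains governed by the option under every proposed variant.
SELECT *
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b;
```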

 




[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-01 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634323#comment-16634323
 ] 

Igor Guzenko commented on DRILL-786:


I've tried adding a joinContext map to Calcite's Join class and passing it 
through every point where a join instance may be copied or recreated: 
JoinToMultiJoinRule.java
LogicalJoin.java
LoptOptimizeJoinRule.java
MultiJoin.java
MutableRels.java
PigRelFactories.java
RelBuilder.java
RelFactories.java
RelStructuredTypeFlattener.java
SqlToRelConverter.java
SubQueryRemoveRule.java 

But even with such extensive changes I wasn't able to handle the case where 
both implicit and explicit cross joins are present in one query and the option 
*planner.enable_nljoin_for_scalar_only* is set to true. Such a query should 
fail with the exception "This query cannot be planned possibly due to either a 
cartesian join or an inequality join", *but it works*...  I suggest leaving 
this case aside and simply enabling NestedLoopJoin whenever an explicit cross 
join is present in the original query. That solution is easier to implement 
and requires no changes to Calcite. 


[jira] [Updated] (DRILL-786) Implement CROSS JOIN

2018-10-04 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-786:
---
Affects Version/s: 1.14.0
 Reviewer: Volodymyr Vysotskyi


[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-04 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637942#comment-16637942
 ] 

Igor Guzenko commented on DRILL-786:


It looks like we agreed to move on with option 3, without changing the default 
value of planner.enable_nljoin_for_scalar_only. I'll also update the error 
message *from* 

"This query cannot be planned possibly due to either a cartesian join or an 
inequality join. " 

*to* 

"This query cannot be planned possibly due to either a cartesian join or an 
inequality join. 
If cartesian or inequality join is used intentionally, set option 
'planner.enable_nljoin_for_scalar_only' to false and try again."
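For reference, the remediation that the updated message points users to is a single session setting (standard Drill ALTER SESSION syntax; shown here as a usage sketch):

```sql
-- Allow nested loop joins for non-scalar inputs, permitting
-- cartesian and inequality joins in this session.
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;
```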


[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635306#comment-16635306
 ] 

Igor Guzenko commented on DRILL-786:


We considered three possible ways the feature could be implemented. Note: in 
the text below, "option is enabled/disabled" refers to the 
*planner.enable_nljoin_for_scalar_only* option.

*Option 1. (Perfect case) :*

Allow nested loop join only for nodes that originate from explicit cross join 
syntax, but prohibit implicit cross joins when the option is enabled. So the 
following query should fail when the option is true: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
Because the cross join of *a* with the result of (*b* x *c*) is implicit, it 
should depend on the option value. But based on my investigation, it's really 
hard to implement this approach; it requires a lot of time and extensive 
changes to Apache Calcite.

*Option 2. (Allow all queries with explicit cross join syntax)*

We can allow nested loop join for all queries that contain explicit cross join 
syntax, regardless of the option value. For example, the following queries 
would work in this case: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` l CROSS JOIN cp.`tpch/nation.parquet` r  
{code}
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b CROSS JOIN 
cp.`tpch/nation.parquet` c
{code}
But queries that don't contain the explicit syntax will still depend on the 
option. For example, the following query won't work when the option is enabled: 
{code:java}
SELECT * 
FROM cp.`tpch/nation.parquet` a, cp.`tpch/nation.parquet` b
{code}
*Option 3. (Allow cross join syntax only when option enabled)*

This approach is a narrower variant of the previous one: allow explicit cross 
join when the option is enabled, and prohibit it when the option is disabled. 


[jira] [Created] (DRILL-6944) UnsupportedOperationException thrown for view over MapR-DB binary table

2019-01-03 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6944:
---

 Summary: UnsupportedOperationException thrown for view over 
MapR-DB binary table
 Key: DRILL-6944
 URL: https://issues.apache.org/jira/browse/DRILL-6944
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - MapRDB
Affects Versions: 1.15.0
Reporter: Igor Guzenko
Assignee: Igor Guzenko
 Fix For: 1.16.0


1. Create MapR-DB binary table and put some data using HBase shell:
{code:none}
hbase shell
create '/tmp/bintable','name','address'
put '/tmp/bintable','100','name:first_name','john'
put '/tmp/bintable','100','name:last_name','doe'
put '/tmp/bintable','100','address:city','Newark'
put '/tmp/bintable','100','address:state','nj'
scan '/tmp/bintable'
{code}
2. Drill config: ensure that dfs storage plugin has "connection": "maprfs:///" 
and contains format: 
{code:java}
"maprdb": {
"type": "maprdb",
"allTextMode": true,
"enablePushdown": false,
"disableCountOptimization": true
}
{code}
3. Check that the table can be selected from Drill: 
{code:java}
select * from dfs.`/tmp/bintable`;
{code}
4. Create Drill view 
{code:java}
create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`;
{code}
5. Querying the view results in an exception:
{code:java}
0: jdbc:drill:> select * from dfs.tmp.`testview`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast 
expression CastExpression [input=`address`, type=minor_type: MAP
mode: REQUIRED
] into string.


Please, refer to logs for more information.

[Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] 
(state=,code=0)

{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6977) Improve Hive tests configuration

2019-01-15 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6977:
---

 Summary: Improve Hive tests configuration
 Key: DRILL-6977
 URL: https://issues.apache.org/jira/browse/DRILL-6977
 Project: Apache Drill
  Issue Type: Bug
Reporter: Igor Guzenko


Class HiveTestDataGenerator is responsible for initialization of the Hive metadata 
service and configuration of the Hive storage plugin for the tested drillbit. 
Originally it was supposed to be initialized once before all tests in the hive 
module, but in fact it is initialized for every test class. Such initialization 
takes a lot of time, so it is worth spending some effort to speed up the Hive 
tests.

This task has two main aims: 
 # Use HiveTestDataGenerator once for all test classes 
 # Provide flexible configuration of Hive tests that can be used with 
ClusterFixture for autonomous (not bound to HiveTestBase) test classes 
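The first aim — paying the initialization cost only once per JVM instead of once per test class — can be sketched as a lazily initialized shared singleton. The class below is an illustrative stand-in for HiveTestDataGenerator, not Drill's actual test code:

```java
// Sketch: initialize an expensive shared test fixture once per JVM instead of
// once per test class. Names are illustrative, not Drill's actual test code.
public class SharedHiveFixture {
    private static volatile SharedHiveFixture instance;
    static int initCount = 0;                 // visible for the demo below

    private SharedHiveFixture() {
        initCount++;                          // stands in for slow metastore setup
    }

    /** Every test class calls this; only the first call pays the setup cost. */
    public static SharedHiveFixture get() {
        if (instance == null) {
            synchronized (SharedHiveFixture.class) {
                if (instance == null) {
                    instance = new SharedHiveFixture();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        SharedHiveFixture a = get();          // simulates the first test class
        SharedHiveFixture b = get();          // simulates a second test class
        System.out.println("same instance: " + (a == b) + ", inits: " + initCount);
    }
}
```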



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6977) Improve Hive tests configuration

2019-01-15 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6977:

Issue Type: Improvement  (was: Bug)

> Improve Hive tests configuration
> 
>
> Key: DRILL-6977
> URL: https://issues.apache.org/jira/browse/DRILL-6977
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Igor Guzenko
>Priority: Major
>
> Class HiveTestDataGenerator is responsible for initialization of the Hive 
> metadata service and configuration of the Hive storage plugin for the tested 
> drillbit. Originally it was supposed to be initialized once before all tests 
> in the hive module, but in fact it is initialized for every test class. Such 
> initialization takes a lot of time, so it is worth spending some effort to 
> speed up the Hive tests.
> This task has two main aims: 
>  # Use HiveTestDataGenerator once for all test classes 
>  # Provide flexible configuration of Hive tests that can be used with 
> ClusterFixture for autonomous (not bound to HiveTestBase) test classes 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6977) Improve Hive tests configuration

2019-01-15 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-6977:
---

 Assignee: Igor Guzenko
Fix Version/s: 1.16.0
  Component/s: Tools, Build & Test

> Improve Hive tests configuration
> 
>
> Key: DRILL-6977
> URL: https://issues.apache.org/jira/browse/DRILL-6977
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> Class HiveTestDataGenerator is responsible for initialization of the Hive 
> metadata service and configuration of the Hive storage plugin for the tested 
> drillbit. Originally it was supposed to be initialized once before all tests 
> in the hive module, but in fact it is initialized for every test class. Such 
> initialization takes a lot of time, so it is worth spending some effort to 
> speed up the Hive tests.
> This task has two main aims: 
>  # Use HiveTestDataGenerator once for all test classes 
>  # Provide flexible configuration of Hive tests that can be used with 
> ClusterFixture for autonomous (not bound to HiveTestBase) test classes 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6977) Improve Hive tests configuration

2019-01-15 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6977:

Description: 
Class HiveTestDataGenerator is responsible for initialization of the Hive metadata 
service and configuration of the Hive storage plugin for the tested drillbit. 
Originally it was supposed to be initialized once before all tests in the hive 
module, but in fact it is initialized for every test class. Such initialization 
takes a lot of time, so it is worth spending some effort to speed up the Hive 
tests.

This task has two main aims: 
 # Use HiveTestDataGenerator once for all test classes 
 # Provide flexible configuration of Hive tests that can be used with 
ClusterFixture for autonomous (not bound to HiveTestBase) test classes 

  was:
Class HiveTestDataGenerator is responsible for initialization of hive metadata 
service and configuration of hive storage plugin for tested drillbit. 
Originally it was supposed to be initialized once before all tests in hive 
module, but actually it's initialized for every test class. And such 
initialization takes a lot of time, so it's worth to spend some time to 
accelerate hive tests.

This task has two main aims: 
 # Use HiveTestDataGenerator once for all test classes 
 # Provide flexible configuration of Hive tests that can be used with 
ClusterFicture for autonomic(not bounded to HiveTestBase) test classes 


> Improve Hive tests configuration
> 
>
> Key: DRILL-6977
> URL: https://issues.apache.org/jira/browse/DRILL-6977
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> Class HiveTestDataGenerator is responsible for initialization of the Hive 
> metadata service and configuration of the Hive storage plugin for the tested 
> drillbit. Originally it was supposed to be initialized once before all tests 
> in the hive module, but in fact it is initialized for every test class. Such 
> initialization takes a lot of time, so it is worth spending some effort to 
> speed up the Hive tests.
> This task has two main aims: 
>  # Use HiveTestDataGenerator once for all test classes 
>  # Provide flexible configuration of Hive tests that can be used with 
> ClusterFixture for autonomous (not bound to HiveTestBase) test classes 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6898) Web UI cannot be used without internet connection (jquery loaded from ajax.googleapis.com)

2018-12-13 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720016#comment-16720016
 ] 

Igor Guzenko commented on DRILL-6898:
-

The issue with jQuery was fixed in PR [https://github.com/apache/drill/pull/1495]. 

> Web UI cannot be used without internet connection (jquery loaded from 
> ajax.googleapis.com)
> --
>
> Key: DRILL-6898
> URL: https://issues.apache.org/jira/browse/DRILL-6898
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Bormans
>Priority: Major
> Fix For: 1.15.0
>
>
> When opening the Web UI in an environment that does not have an internet 
> connection, the jQuery JS library is not loaded and the website does not 
> function as it should.
> One solution would be to add a configuration option to use local/packaged 
> JavaScript libraries instead of loading them from a CDN.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6898) Web UI cannot be used without internet connection (jquery loaded from ajax.googleapis.com)

2018-12-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6898:

Fix Version/s: 1.15.0

> Web UI cannot be used without internet connection (jquery loaded from 
> ajax.googleapis.com)
> --
>
> Key: DRILL-6898
> URL: https://issues.apache.org/jira/browse/DRILL-6898
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Bormans
>Priority: Major
> Fix For: 1.15.0
>
>
> When opening the Web UI in an environment that does not have an internet 
> connection, the jQuery JS library is not loaded and the website does not 
> function as it should.
> One solution would be to add a configuration option to use local/packaged 
> JavaScript libraries instead of loading them from a CDN.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6898) Web UI cannot be used without internet connection (jquery loaded from ajax.googleapis.com)

2018-12-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko closed DRILL-6898.
---
Resolution: Fixed

> Web UI cannot be used without internet connection (jquery loaded from 
> ajax.googleapis.com)
> --
>
> Key: DRILL-6898
> URL: https://issues.apache.org/jira/browse/DRILL-6898
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Paul Bormans
>Priority: Major
> Fix For: 1.15.0
>
>
> When opening the Web UI in an environment that does not have an internet 
> connection, the jQuery JS library is not loaded and the website does not 
> function as it should.
> One solution would be to add a configuration option to use local/packaged 
> JavaScript libraries instead of loading them from a CDN.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6908) Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()

2018-12-17 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko closed DRILL-6908.
---
Resolution: Won't Fix

 CSVTestBuilder is not suitable for tests with a wide variety of column data 
types, and it would not be worthwhile to spend a lot of time fixing the cast 
issues related to the test builder's validation query.

> Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()
> --
>
> Key: DRILL-6908
> URL: https://issues.apache.org/jira/browse/DRILL-6908
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Igor Guzenko
>Priority: Minor
>
> To improve the readability of the Hive tests, it would be nice to rewrite some 
> cumbersome tests to use CSVTestBuilder for assertions. This issue was created 
> because it requires many changes across different Hive tests, and some 
> additional time may be needed to ensure that CSVTestBuilder can read all data 
> types (dates in different formats, floating-point numbers, big decimals, etc.) 
> into the correct values for assertions. 
> Below is the list of test methods to be rewritten: 
>  * TestHiveStorage.readAllSupportedHiveDataTypes()
>  * TestHiveStorage.orderByOnHiveTable()
>  * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter()
>  
>  * TestHiveViewsSupport.selectStarFromView()
>  * TestHiveViewsSupport.useHiveAndSelectStarFromView()
>  * TestHiveViewsSupport.viewWithAllSupportedDataTypes()
>  * 
> TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet()
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6917) Add test for DRILL-6912

2018-12-20 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6917:

Summary: Add test for DRILL-6912  (was: Add test for test case )

> Add test for DRILL-6912
> ---
>
> Key: DRILL-6917
> URL: https://issues.apache.org/jira/browse/DRILL-6917
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
> Attachments: TestTwoDrillbitsWithSamePort.java
>
>
> Currently there is a working test in the attachment, but it needs to be 
> migrated into the TestGracefulShutdown class. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6917) Add test for test case

2018-12-20 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6917:
---

 Summary: Add test for test case 
 Key: DRILL-6917
 URL: https://issues.apache.org/jira/browse/DRILL-6917
 Project: Apache Drill
  Issue Type: Test
Reporter: Igor Guzenko
Assignee: Igor Guzenko
 Attachments: TestTwoDrillbitsWithSamePort.java

Currently there is a working test in the attachment, but it needs to be migrated 
into the TestGracefulShutdown class. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6917) Add test for DRILL-6912

2018-12-20 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko closed DRILL-6917.
---
   Resolution: Done
Fix Version/s: 1.15.0

Part of commit: d4771219f6d15cf89f7aeab9357bc3b26d9bf052

> Add test for DRILL-6912
> ---
>
> Key: DRILL-6917
> URL: https://issues.apache.org/jira/browse/DRILL-6917
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
> Fix For: 1.15.0
>
> Attachments: TestTwoDrillbitsWithSamePort.java
>
>
> Currently there is a working test in the attachment, but it needs to be 
> migrated into the TestGracefulShutdown class. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-540) Allow querying hive views in Drill

2018-12-17 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-540:
---
Description: 
Currently Hive views cannot be queried from Drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists its view metadata in files with the suffix .view.drill in 
JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later Drill parses this metadata and uses it to treat view names in SQL as 
subqueries.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS:

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                          |
-------|------------|------|-----------------|------|----------|------|---------|-------------|--------------------------------------------|
2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers`  |

{noformat}
      3. So in the Hive metastore, views are treated as tables of a special type, 
and the main benefit is that we also have the expanded SQL definition of each 
view (just like in .view.drill files). Reading this metadata is already 
implemented in Drill with the help of the Thrift Metastore API.

      4. To enable querying of Hive views, we'll reuse the existing code for Drill 
views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, for a 
_*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which is 
actually the model for data persisted in .view.drill files_) and then, based on 
this instance, return a new _*DrillViewTable*_. With this approach Drill will 
handle Hive views the same way as if they had been defined in Drill and persisted 
in a .view.drill file. 

     5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_, 
we'll reuse the existing code from _*DrillHiveTable*_: the conversion 
functionality will be extracted and used for both table and view field type 
conversions. 
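The conversion in point 4 can be sketched as mapping a Hive metastore view record (its name plus VIEW_EXPANDED_TEXT, as in the TBLS example above) into the same model Drill persists in .view.drill files. The classes below are simplified stand-ins, not Drill's real View/DrillViewTable API:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: wrap a Hive view's expanded SQL in the same model Drill uses for
// .view.drill files. Simplified stand-ins for Drill's actual classes.
public class HiveViewMapping {
    /** Minimal analogue of the model persisted in .view.drill files. */
    static class View {
        final String name;
        final String sql;
        final List<String> workspaceSchemaPath;
        View(String name, String sql, List<String> path) {
            this.name = name; this.sql = sql; this.workspaceSchemaPath = path;
        }
    }

    /** Convert a Hive metastore view record (db, name, VIEW_EXPANDED_TEXT)
     *  into the Drill-style view model, so existing view handling applies. */
    static View toDrillView(String dbName, String tblName, String viewExpandedText) {
        return new View(tblName, viewExpandedText, Arrays.asList("hive", dbName));
    }

    public static void main(String[] args) {
        View v = toDrillView("default", "cview",
            "SELECT COUNT(*) FROM `default`.`customers`");
        System.out.println(v.name + " -> " + v.sql);
    }
}
```

Once wrapped this way, the expanded SQL is expanded into the query plan as a subquery, exactly like a Drill-native view.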

*Security implications*

Consider a simple example where we have users, 
{code:java}
user0  user1 user2
   \ /
  group12
{code}
and a sample database where object names contain the user or group who should 
access them:
{code:java}
db_all
tbl_user0
vw_user0
tbl_group12
vw_group12
{code}
There are two Hive authorization modes supported by Drill: SQL Standard and 
Storage Based authorization. For SQL Standard authorization, permissions were 
granted using SQL: 
{code:java}
SET ROLE admin;
GRANT SELECT ON db_all.tbl_user0 TO USER user0;
GRANT SELECT ON db_all.vw_user0 TO USER user0;
CREATE ROLE group12;
GRANT ROLE group12 TO USER user1;
GRANT ROLE group12 TO USER user2;
GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
{code}
And for Storage Based authorization, permissions were granted using these commands: 
{code:java}
hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code}
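Under Storage Based authorization, the access decision reduces to ordinary POSIX permission checks on the table's warehouse directory. A minimal sketch of the read check for the modes used above (700 and 750), with simplified owner/group semantics:

```java
import java.util.Set;

// Sketch: storage-based authorization boils down to POSIX permission checks
// on the table's warehouse directory. Simplified owner/group read check only.
public class PosixReadCheck {
    /**
     * @param mode octal permission bits, e.g. 0700 or 0750
     * @return true if the user may read the directory
     */
    static boolean canRead(int mode, String owner, String group,
                           String user, Set<String> userGroups) {
        if (user.equals(owner))         return (mode & 0400) != 0; // owner read bit
        if (userGroups.contains(group)) return (mode & 0040) != 0; // group read bit
        return (mode & 0004) != 0;                                 // other read bit
    }

    public static void main(String[] args) {
        Set<String> g12 = Set.of("group12");
        // tbl_user0: mode 700, owned by user0 -> only user0 can read
        System.out.println(canRead(0700, "user0", "user0", "user0", Set.of("user0")));
        System.out.println(canRead(0700, "user0", "user0", "user1", g12));
        // tbl_group12: mode 750, owner user1, group group12 -> user2 can read
        System.out.println(canRead(0750, "user1", "group12", "user2", g12));
    }
}
```

This is why, in the table below, user1 and user2 can read tbl_group12 while user0 cannot.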
The following table shows the query results for both authorization models. 

                          *SQL Standard                              Storage Based Authorization*
||SQL||user0||user1||user2|| ||user0||user1||user2||
|*Queries executed using Drill:*| | | | | | | |
|SHOW TABLES IN hive.db_all;|all|all|all| |Accessible tables + all views|Accessible tables + all views|Accessible tables + all views|
|SELECT * FROM hive.db_all.tbl_user0;|(/)|(x)|(x)| |(/)|(x)|(x)|
|SELECT * FROM hive.db_all.vw_user0;|(/)|(x)|(x)| |(/)|(x)|(x)|
|SELECT * FROM hive.db_all.tbl_group12;|(x)|(/)|(/)| |(x)|(/)|(/)|
|SELECT * FROM hive.db_all.vw_group12;|(x)|(/)|(/)| |(x)|(/)|(/)|
|SELECT * FROM INFORMATION_SCHEMA.`TABLES` WHERE TABLE_SCHEMA = 'hive.db_all';|all|all|all| |Accessible tables + all views|Accessible tables + all views|Accessible tables + all views|
|DESCRIBE hive.db_all.tbl_user0;|(/)|(x)|(x)| |(/)|(x)|(x)|
|DESCRIBE hive.db_all.vw_user0;|(/)|(x)|(x)| |(/)|(/)|(/)|
|DESCRIBE hive.db_all.tbl_group12;|(x)|(/)|(/)| |(x)|(/)|(/)|
|DESCRIBE hive.db_all.vw_group12;|(x)|(/)|(/)| |(/)|(/)|    

[jira] [Updated] (DRILL-6908) Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()

2018-12-17 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6908:

Description: 
To improve the readability of the Hive tests, it would be nice to rewrite some 
cumbersome tests to use CSVTestBuilder for assertions. This issue was created 
because it requires many changes across different Hive tests, and some 
additional time may be needed to ensure that CSVTestBuilder can read all data 
types (dates in different formats, floating-point numbers, big decimals, etc.) 
into the correct values for assertions. 

Below is the list of test methods to be rewritten: 
 * TestHiveStorage.readAllSupportedHiveDataTypes()

 * TestHiveStorage.orderByOnHiveTable()

 * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter()

 
 * TestHiveViewsSupport.selectStarFromView()

 * TestHiveViewsSupport.useHiveAndSelectStarFromView()

 * TestHiveViewsSupport.viewWithAllSupportedDataTypes()

 
 * 
TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet()

 

  was:
For improving Hive tests readability it would be nice to rewrite some 
cumbersome tests to use CSVTestBuilder for assertions. This issue was created 
because it requires a lot of changes for different Hive tests and also some 
additional time may be necessary to ensure that CSVTestBuilder can read all 
datatypes (dates with different formats, floating point numbers, big decimals 
etc.) into correct values for assertions. 

Below is list of test methods to be rewritten: 
 * TestHiveStorage.readAllSupportedHiveDataTypes()

 * TestHiveStorage.orderByOnHiveTable()

 * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter()

 

 * TestHiveViewsSupport.selectStarFromView()

 * TestHiveViewsSupport.useHiveAndSelectStarFromView()

 * TestHiveViewsSupport.viewWithAllSupportedDataTypes()

 

 * TestInfoSchemaOnHiveStorage.showTablesFromDb()

 * TestInfoSchemaOnHiveStorage.showDatabases()

 * TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema()

 * TestInfoSchemaOnHiveStorage.showInfoSchema()

 

 * 
TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet()

 


> Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()
> --
>
> Key: DRILL-6908
> URL: https://issues.apache.org/jira/browse/DRILL-6908
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Igor Guzenko
>Priority: Minor
>
> To improve the readability of the Hive tests, it would be nice to rewrite some 
> cumbersome tests to use CSVTestBuilder for assertions. This issue was created 
> because it requires many changes across different Hive tests, and some 
> additional time may be needed to ensure that CSVTestBuilder can read all data 
> types (dates in different formats, floating-point numbers, big decimals, etc.) 
> into the correct values for assertions. 
> Below is the list of test methods to be rewritten: 
>  * TestHiveStorage.readAllSupportedHiveDataTypes()
>  * TestHiveStorage.orderByOnHiveTable()
>  * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter()
>  
>  * TestHiveViewsSupport.selectStarFromView()
>  * TestHiveViewsSupport.useHiveAndSelectStarFromView()
>  * TestHiveViewsSupport.viewWithAllSupportedDataTypes()
>  
>  * 
> TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet()
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6908) Hive Testing: Replace cumbersome assertions with TestBuilder.csvBaselineFile()

2018-12-17 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6908:
---

 Summary: Hive Testing: Replace cumbersome assertions with 
TestBuilder.csvBaselineFile()
 Key: DRILL-6908
 URL: https://issues.apache.org/jira/browse/DRILL-6908
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.16.0
Reporter: Igor Guzenko


To improve the readability of the Hive tests, it would be nice to rewrite some 
cumbersome tests to use CSVTestBuilder for assertions. This issue was created 
because it requires many changes across different Hive tests, and some 
additional time may be needed to ensure that CSVTestBuilder can read all data 
types (dates in different formats, floating-point numbers, big decimals, etc.) 
into the correct values for assertions. 

Below is the list of test methods to be rewritten: 
 * TestHiveStorage.readAllSupportedHiveDataTypes()

 * TestHiveStorage.orderByOnHiveTable()

 * TestHiveStorage.readingFromSmallTableWithSkipHeaderAndFooter()

 

 * TestHiveViewsSupport.selectStarFromView()

 * TestHiveViewsSupport.useHiveAndSelectStarFromView()

 * TestHiveViewsSupport.viewWithAllSupportedDataTypes()

 

 * TestInfoSchemaOnHiveStorage.showTablesFromDb()

 * TestInfoSchemaOnHiveStorage.showDatabases()

 * TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema()

 * TestInfoSchemaOnHiveStorage.showInfoSchema()

 

 * 
TestHiveDrillNativeParquetReader.testReadAllSupportedHiveDataTypesNativeParquet()

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema

2018-12-21 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6923:

Description: 
SHOW SCHEMAS tries to find the table `information_schema`.`schemata` in the 
default (user-defined) schema, and only after that attempt fails does it resolve 
the table successfully against the root schema. See the details below, explained 
using an example with the Hive plugin. 
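The faulty lookup order can be sketched as a two-step resolution: try the default (session) schema first, then fall back to the root schema. With Hive authorization enabled, the first (failing) lookup is what raises the error. The maps below are illustrative stand-ins for Drill's schema tree:

```java
import java.util.Map;

// Sketch of the resolution order described above: a table name is first tried
// against the session's default schema and only then against the root schema.
public class SchemaResolution {
    /** Returns which schema resolved the table, or null if neither did. */
    static String resolve(String table,
                          Map<String, Boolean> defaultSchema,
                          Map<String, Boolean> rootSchema) {
        // 1. Attempt in the default (user-defined) schema, e.g. hive.db_general.
        //    For Hive this lookup triggers an authorization check and fails.
        if (defaultSchema.getOrDefault(table, false)) return "default";
        // 2. Fall back to the root schema, where information_schema lives.
        if (rootSchema.getOrDefault(table, false)) return "root";
        return null;
    }

    public static void main(String[] args) {
        Map<String, Boolean> hiveDbGeneral = Map.of("customers", true);
        Map<String, Boolean> root = Map.of("information_schema.schemata", true);
        System.out.println(resolve("information_schema.schemata", hiveDbGeneral, root));
    }
}
```

The fix would be to route well-known system tables such as information_schema.schemata straight to the root schema instead of probing the default schema first.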

*Abstract* 

When Drill is used with Hive SQL Standard authorization enabled, execution of 
queries like
{code:sql}
USE hive.db_general;
SHOW SCHEMAS LIKE 'hive.%'; {code}
results in the error DrillRuntimeException: Failed to use the Hive authorization 
components: Error getting object from metastore for Object [type=TABLE_OR_VIEW, 
name=db_general.information_schema]. 

*Details* 

Consider a showSchemas() test similar to the one defined in 
TestSqlStdBasedAuthorization: 
{code:java}
@Test
public void showSchemas() throws Exception {
  test("USE " + hivePluginName + "." + db_general);
  testBuilder()
  .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
  .unOrdered()
  .baselineColumns("SCHEMA_NAME")
  .baselineValues("hive.db_general")
  .baselineValues("hive.default")
  .go();
}
{code}
Currently, execution of such a test produces the following stack trace: 
{code:none}
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed to 
use the Hive authorization components: Error getting object from metastore for 
Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
at 
org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
at 
org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
at 
org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
at 
org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
at 
org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
at 
org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
at 
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
at 
org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
at 
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
at 
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
at 
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
at 
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
at 
org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:628)
at 
org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:192)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:664)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:200)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:155)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:90)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584)
at 

[jira] [Created] (DRILL-6923) Show schemas after use doesn't work with Hive authorization

2018-12-21 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6923:
---

 Summary: Show schemas after use doesn't work with Hive 
authorization
 Key: DRILL-6923
 URL: https://issues.apache.org/jira/browse/DRILL-6923
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Reporter: Igor Guzenko
Assignee: Igor Guzenko


*Abstract*

When Drill is used with Hive SQL Standard authorization enabled, execution of 
queries like
{code:sql}
USE hive.db_general;
SHOW SCHEMAS LIKE 'hive.%'; {code}
results in the error: DrillRuntimeException: Failed to use the Hive authorization 
components: Error getting object from metastore for Object [type=TABLE_OR_VIEW, 
name=db_general.information_schema]. 
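
As the stack trace below shows, the validator first tries to resolve 
information_schema relative to the default schema set by USE, which produces the 
failing lookup db_general.information_schema. A minimal, self-contained sketch of 
that resolution order (the class and method names are hypothetical illustrations, 
not Drill's actual API):
{code:java}
import java.util.List;
import java.util.Set;

public class SchemaResolver {

    // Fully qualified table names known to the catalog.
    private final Set<String> tables;

    public SchemaResolver(Set<String> tables) {
        this.tables = tables;
    }

    // Resolves a table name by trying the default (USE'd) schema first,
    // then the root schema -- the order that yields the failing lookup
    // "db_general.information_schema" seen in the stack trace below.
    public String resolve(String defaultSchema, String tableName) {
        for (String candidate : List.of(defaultSchema + "." + tableName, tableName)) {
            if (tables.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SchemaResolver resolver =
            new SchemaResolver(Set.of("information_schema.schemata"));
        // With default schema hive.db_general, the first candidate tried is
        // "hive.db_general.information_schema.schemata", which is what
        // triggers the metastore authorization check that fails.
        System.out.println(resolver.resolve("hive.db_general",
                                            "information_schema.schemata"));
    }
}
{code}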

*Details* 

Consider a showSchemas() test similar to the one defined in 
TestSqlStdBasedAuthorization: 
{code:java}
@Test
public void showSchemas() throws Exception {
  test("USE " + hivePluginName + "." + db_general);
  testBuilder()
  .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
  .unOrdered()
  .baselineColumns("SCHEMA_NAME")
  .baselineValues("hive.db_general")
  .baselineValues("hive.default")
  .go();
}
{code}
Currently, execution of such a test produces the following stack trace: 
{code:none}
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed to 
use the Hive authorization components: Error getting object from metastore for 
Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
at 
org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
at 
org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
at 
org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
at 
org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
at 
org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
at 
org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
at 
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
at 
org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
at 
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
at 
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
at 
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
at 
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
at 
org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:628)
at 
org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:192)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:664)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:200)
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:155)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:90)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:584)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:272)
at ...(:0)
Caused by: 

[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema

2018-12-21 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6923:

Summary: Show schemas uses default(user defined) schema first for resolving 
table from information_schema  (was: Show schemas after use doesn't work with 
Hive authorization)

> Show schemas uses default(user defined) schema first for resolving table from 
> information_schema
> 
>
> Key: DRILL-6923
> URL: https://issues.apache.org/jira/browse/DRILL-6923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> *Abstract*
> When Drill is used with Hive SQL Standard authorization enabled, execution of 
> queries like
> {code:sql}
> USE hive.db_general;
> SHOW SCHEMAS LIKE 'hive.%'; {code}
> results in the error: DrillRuntimeException: Failed to use the Hive authorization 
> components: Error getting object from metastore for Object 
> [type=TABLE_OR_VIEW, name=db_general.information_schema]. 
> *Details* 
> Consider a showSchemas() test similar to the one defined in 
> TestSqlStdBasedAuthorization: 
> {code:java}
> @Test
> public void showSchemas() throws Exception {
>   test("USE " + hivePluginName + "." + db_general);
>   testBuilder()
>   .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
>   .unOrdered()
>   .baselineColumns("SCHEMA_NAME")
>   .baselineValues("hive.db_general")
>   .baselineValues("hive.default")
>   .go();
> }
> {code}
> Currently, execution of such a test produces the following stack trace: 
> {code:none}
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed 
> to use the Hive authorization components: Error getting object from metastore 
> for Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
> at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
> at 
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
> at 
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
> at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
> at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
> at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
> at 
> org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
> at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:628)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:192)
> at 
> 

[jira] [Updated] (DRILL-540) Allow querying hive views in Drill

2018-11-29 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-540:
---
Description: 
Currently, Hive views cannot be queried from Drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists its view metadata in files with the suffix .view.drill, using 
JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later, Drill parses the metadata and uses it to treat view names in SQL as 
subqueries.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS: 

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in the Hive metastore, views are considered tables of a special type, 
and the main benefit is that we also have the expanded SQL definition of views (just 
like in .view.drill files). Reading of this metadata is already implemented 
in Drill with the help of the Thrift Metastore API.

      4. To enable querying of Hive views, we'll reuse the existing code for Drill 
views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, for a 
_*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which is 
actually the model for data persisted in .view.drill files_) and then, based on this 
instance, return a new _*DrillViewTable*_. Using this approach, Drill will handle 
Hive views the same way as if they were initially defined in Drill and persisted 
in a .view.drill file. 

     5. For the conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_; the conversion 
functionality will be extracted and used for both table and view field type 
conversions. 

*Security implications*

Consider a simple example case where we have the users

 
{code:java}
user0  user1 user2
   \ /
  group12
{code}
and a sample database where object names contain the user or group that should access them: 
{code:java}
db_all
tbl_user0
vw_user0
tbl_group12
vw_group12
{code}
 

There are two Hive authorization modes supported by Drill: SQL Standard and 
Storage Based authorization. 

For SQL Standard authorization, permissions were granted using SQL: 

 
{code:java}
SET ROLE admin;
GRANT SELECT ON db_all.tbl_user0 TO USER user0;
GRANT SELECT ON db_all.vw_user0 TO USER user0;
CREATE ROLE group12;
GRANT ROLE group12 TO USER user1;
GRANT ROLE group12 TO USER user2;
GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
{code}
And for Storage Based authorization, permissions were granted using these commands: 
{code:java}
hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code}
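
Storage Based authorization derives access rights from file-system ownership and 
mode bits like those set above. A minimal sketch of the POSIX owner/group/other 
read check implied by modes 700 and 750 (the class and method are illustrative, 
not Hive's authorizer API):
{code:java}
public class StorageBasedAuthz {

    // Returns true if the user may read a file with the given owner, group
    // and octal mode (e.g. 0750), following POSIX owner/group/other semantics.
    public static boolean canRead(String user, java.util.Set<String> userGroups,
                                  String owner, String group, int mode) {
        if (user.equals(owner)) {
            return (mode & 0400) != 0;      // owner read bit
        }
        if (userGroups.contains(group)) {
            return (mode & 0040) != 0;      // group read bit
        }
        return (mode & 0004) != 0;          // other read bit
    }

    public static void main(String[] args) {
        java.util.Set<String> g12 = java.util.Set.of("group12");
        // tbl_group12 is owned by user1:group12 with mode 750
        System.out.println(canRead("user1", g12, "user1", "group12", 0750)); // true
        System.out.println(canRead("user2", g12, "user1", "group12", 0750)); // true
        System.out.println(canRead("user0", java.util.Set.of("user0"),
                                   "user1", "group12", 0750));               // false
    }
}
{code}
This is why, in the table below, user1 and user2 can select tbl_group12 while 
user0 cannot.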
 

 
Then the following table shows the results of queries for both authorization 
models. 

                                                                    *SQL 
Standard                    Storage Based Authorization*
||SQL||user0||user1||user2||   ||user0||user1||user2||
|SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessible tables + all 
views|Accessible tables + all views|Accessible tables + all views|
|SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|       
 (x)|         (x)|
|SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|        
(x)|         (x)|
|SELECT * FROM hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|     
   (/)|         (/)|
|SELECT * FROM hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|      
  (/)|         (/)|
|SELECT * FROM INFORMATION_SCHEMA.`TABLES`
 WHERE TABLE_SCHEMA = 'hive.db_all';|   all|  all|  all| |Accessible tables + 
all views|Accessible tables + all views|Accessible tables + all views|
|DESCRIBE hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|         
(x)|         (x)|
|DESCRIBE hive.db_all.vw_user0;|   (/)|   (x) |   (x)| |        (/)|         
(/)|         (/)|
|DESCRIBE hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/) |         (/)|
|DESCRIBE hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (/)|         
(/)|         (/)|

 

 

(!)  

[jira] [Updated] (DRILL-540) Allow querying hive views in Drill

2018-11-29 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-540:
---
Description: 
Currently, Hive views cannot be queried from Drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists its view metadata in files with the suffix .view.drill, using 
JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later, Drill parses the metadata and uses it to treat view names in SQL as 
subqueries.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS: 

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in the Hive metastore, views are considered tables of a special type, 
and the main benefit is that we also have the expanded SQL definition of views (just 
like in .view.drill files). Reading of this metadata is already implemented 
in Drill with the help of the Thrift Metastore API.

      4. To enable querying of Hive views, we'll reuse the existing code for Drill 
views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, for a 
_*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which is 
actually the model for data persisted in .view.drill files_) and then, based on this 
instance, return a new _*DrillViewTable*_. Using this approach, Drill will handle 
Hive views the same way as if they were initially defined in Drill and persisted 
in a .view.drill file. 

     5. For the conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_; the conversion 
functionality will be extracted and used for both table and view field type 
conversions. 

*Security implications*

Consider a simple example case where we have the users
{code:java}
user0  user1 user2
   \ /
  group12
{code}
and a sample database where object names contain the user or group that should access them: 
{code:java}
db_all
tbl_user0
vw_user0
tbl_group12
vw_group12
{code}
There are two Hive authorization modes supported by Drill: SQL Standard and 
Storage Based authorization. For SQL Standard authorization, permissions were 
granted using SQL: 
{code:java}
SET ROLE admin;
GRANT SELECT ON db_all.tbl_user0 TO USER user0;
GRANT SELECT ON db_all.vw_user0 TO USER user0;
CREATE ROLE group12;
GRANT ROLE group12 TO USER user1;
GRANT ROLE group12 TO USER user2;
GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
{code}
And for Storage Based authorization, permissions were granted using these commands: 
{code:java}
hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code}
 Then the following table shows the results of queries for both authorization 
models. 

                                                                    *SQL 
Standard                    Storage Based Authorization*
||SQL||user0||user1||user2||   ||user0||user1||user2||
|SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessible tables + all 
views|Accessible tables + all views|Accessible tables + all views|
|SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|       
 (x)|         (x)|
|SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|        
(x)|         (x)|
|SELECT * FROM hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|     
   (/)|         (/)|
|SELECT * FROM hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|      
  (/)|         (/)|
|SELECT * FROM INFORMATION_SCHEMA.`TABLES`
 WHERE TABLE_SCHEMA = 'hive.db_all';|   all|  all|  all| |Accessible tables + 
all views|Accessible tables + all views|Accessible tables + all views|
|DESCRIBE hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|         
(x)|         (x)|
|DESCRIBE hive.db_all.vw_user0;|   (/)|   (x) |   (x)| |        (/)|         
(/)|         (/)|
|DESCRIBE hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/) |         (/)|
|DESCRIBE hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (/)|         
(/)|         (/)|

 (!)  *Warning:*  Because views in 

[jira] [Updated] (DRILL-6862) Update Calcite to 1.18.0

2018-11-20 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6862:

Summary: Update Calcite to 1.18.0   (was: Migrate to Calcite 1.18.0 )

> Update Calcite to 1.18.0 
> -
>
> Key: DRILL-6862
> URL: https://issues.apache.org/jira/browse/DRILL-6862
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> After the ongoing release of the new Calcite version, we will update our 
> dependency.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6862) Migrate to Calcite 1.18.0

2018-11-20 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-6862:
---

 Summary: Migrate to Calcite 1.18.0 
 Key: DRILL-6862
 URL: https://issues.apache.org/jira/browse/DRILL-6862
 Project: Apache Drill
  Issue Type: Task
Reporter: Igor Guzenko
Assignee: Igor Guzenko


After the ongoing release of the new Calcite version, we will update our dependency. 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6944) UnsupportedOperationException thrown for view over MapR-DB binary table

2019-01-08 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6944:

Description: 
1. Create MapR-DB binary table and put some data using HBase shell:
{code:none}
hbase shell
create '/tmp/bintable','name','address'
put '/tmp/bintable','100','name:first_name','john'
put '/tmp/bintable','100','name:last_name','doe'
put '/tmp/bintable','100','address:city','Newark'
put '/tmp/bintable','100','address:state','nj'
scan '/tmp/bintable'
{code}
2. Drill config: ensure that the dfs storage plugin has "connection": "maprfs:///" 
and contains the format: 
{code:java}
"maprdb": {
"type": "maprdb",
"allTextMode": true,
"enablePushdown": false,
"disableCountOptimization": true
}
{code}
3. Check that the table can be selected from Drill: 
{code:java}
select * from dfs.`/tmp/bintable`;
{code}
4. Create Drill view 
{code:java}
create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`;
{code}
5. Querying the view results in an exception:
{code:java}
0: jdbc:drill:> select * from dfs.tmp.`testview`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast 
expression CastExpression [input=`address`, type=minor_type: MAP
mode: REQUIRED
] into string.


Please, refer to logs for more information.

[Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] 
(state=,code=0)

{code}
*UPDATE*

This issue can also be reproduced when Avro files with map columns are queried 
using Drill. An appropriate test was added in the PR commit. 

 

  was:
1. Create MapR-DB binary table and put some data using HBase shell:
{code:none}
hbase shell
create '/tmp/bintable','name','address'
put '/tmp/bintable','100','name:first_name','john'
put '/tmp/bintable','100','name:last_name','doe'
put '/tmp/bintable','100','address:city','Newark'
put '/tmp/bintable','100','address:state','nj'
scan '/tmp/bintable'
{code}
2. Drill config: ensure that the dfs storage plugin has "connection": "maprfs:///" 
and contains the format: 
{code:java}
"maprdb": {
"type": "maprdb",
"allTextMode": true,
"enablePushdown": false,
"disableCountOptimization": true
}
{code}
3. Check that the table can be selected from Drill: 
{code:java}
select * from dfs.`/tmp/bintable`;
{code}
4. Create Drill view 
{code:java}
create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`;
{code}
5. Querying the view results in an exception:
{code:java}
0: jdbc:drill:> select * from dfs.tmp.`testview`;
Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast 
expression CastExpression [input=`address`, type=minor_type: MAP
mode: REQUIRED
] into string.


Please, refer to logs for more information.

[Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] 
(state=,code=0)

{code}
 


> UnsupportedOperationException thrown for view over MapR-DB binary table
> ---
>
> Key: DRILL-6944
> URL: https://issues.apache.org/jira/browse/DRILL-6944
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - MapRDB
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> 1. Create MapR-DB binary table and put some data using HBase shell:
> {code:none}
> hbase shell
> create '/tmp/bintable','name','address'
> put '/tmp/bintable','100','name:first_name','john'
> put '/tmp/bintable','100','name:last_name','doe'
> put '/tmp/bintable','100','address:city','Newark'
> put '/tmp/bintable','100','address:state','nj'
> scan '/tmp/bintable'
> {code}
> 2. Drill config: ensure that the dfs storage plugin has "connection": 
> "maprfs:///" and contains the format: 
> {code:java}
> "maprdb": {
> "type": "maprdb",
> "allTextMode": true,
> "enablePushdown": false,
> "disableCountOptimization": true
> }
> {code}
> 3. Check that the table can be selected from Drill: 
> {code:java}
> select * from dfs.`/tmp/bintable`;
> {code}
> 4. Create Drill view 
> {code:java}
> create view dfs.tmp.`testview` as select * from dfs.`/tmp/bintable`;
> {code}
> 5. Querying the view results in an exception:
> {code:java}
> 0: jdbc:drill:> select * from dfs.tmp.`testview`;
> Error: SYSTEM ERROR: UnsupportedOperationException: Unable to convert cast 
> expression CastExpression [input=`address`, type=minor_type: MAP
> mode: REQUIRED
> ] into string.
> Please, refer to logs for more information.
> [Error Id: 109acd00-7456-4a74-8a17-485f8999000f on node1.cluster.com:31010] 
> (state=,code=0)
> {code}
> *UPDATE*
> This issue can also be reproduced when Avro files with map columns are 
> queried using Drill. An appropriate test was added in the PR commit. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7151) Show only accessible tables when Hive authorization enabled

2019-04-02 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7151:
---

 Summary: Show only accessible tables when Hive authorization 
enabled
 Key: DRILL-7151
 URL: https://issues.apache.org/jira/browse/DRILL-7151
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Igor Guzenko
Assignee: Igor Guzenko


The SHOW TABLES command for Hive has worked inconsistently for a very long time.

Before the changes introduced by DRILL-7115, only accessible tables were shown 
when Hive Storage Based Authorization was enabled, but for SQL Standard Based 
Authorization all tables were shown to the user ([related 
discussion|https://github.com/apache/drill/pull/461#discussion_r58753354]). 

In the scope of DRILL-7115, the accessible-only restriction for Storage Based 
Authorization was relaxed in order to improve query performance.

There is still a need to improve the security of the Hive SHOW TABLES query 
while at the same time not violating performance requirements. 

For SQL Standard Based Authorization, this can be done by asking 
```HiveAuthorizationHelper.authorizerV2``` for the table's 'SELECT' permission.

For Storage Based Authorization, a performance-acceptable approach is not known 
yet; one idea is to try using the appropriate Hive storage-based authorizer 
class for this purpose. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-540) Allow querying hive views in Drill

2019-04-03 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-540:
---
Description: 
Currently, Hive views cannot be queried from Drill.

This Jira aims to add support for Hive views in Drill.

*Implementation details:*
 # Drill persists its view metadata in files with the suffix .view.drill, using 
JSON format. For example: 

{noformat}
{
 "name" : "view_from_calcite_1_4",
 "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
 "fields" : [ {
 "name" : "*",
 "type" : "ANY",
 "isNullable" : true
 } ],
 "workspaceSchemaPath" : [ "dfs", "tmp" ]
}
{noformat}
Later, Drill parses the metadata and uses it to treat view names in SQL as 
subqueries.

      2. In Apache Hive, metadata about views is stored in a similar way to 
tables. Below is an example from metastore.TBLS: 

 
{noformat}
TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME 
 |TBL_TYPE  |VIEW_EXPANDED_TEXT |
---||--|-|--|--|--|--|--|---|
2  |1542111078  |1 |0|mapr  |0 |2 |cview
 |VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |

{noformat}
      3. So in the Hive metastore, views are considered tables of a special type, 
and the main benefit is that we also have the expanded SQL definition of views (just 
like in .view.drill files). Reading of this metadata is already implemented 
in Drill with the help of the Thrift Metastore API.

      4. To enable querying of Hive views, we'll reuse the existing code for Drill 
views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_*, for a 
_*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which is 
actually the model for data persisted in .view.drill files_) and then, based on this 
instance, return a new _*DrillViewTable*_. Using this approach, Drill will handle 
Hive views the same way as if they were initially defined in Drill and persisted 
in a .view.drill file. 

     5. For the conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
we'll reuse existing code from _*DrillHiveTable*_; the conversion 
functionality will be extracted and used for both table and view field type 
conversions. 

*Security implications*

Consider a simple example case where we have the users
{code:java}
user0  user1 user2
   \ /
  group12
{code}
and a sample database where object names contain the user or group that should access them: 
{code:java}
db_all
tbl_user0
vw_user0
tbl_group12
vw_group12
{code}
There are two Hive authorization modes supported by Drill: SQL Standard and 
Storage Based authorization. For SQL Standard authorization, permissions were 
granted using SQL: 
{code:java}
SET ROLE admin;
GRANT SELECT ON db_all.tbl_user0 TO USER user0;
GRANT SELECT ON db_all.vw_user0 TO USER user0;
CREATE ROLE group12;
GRANT ROLE group12 TO USER user1;
GRANT ROLE group12 TO USER user2;
GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
{code}
And for Storage Based authorization, permissions were granted using these commands: 
{code:java}
hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
hadoop fs -chown user1:group12 /user/hive/warehouse/db_all.db/tbl_group12{code}
 Then the following table shows the results of queries for both authorization 
models. 

                                                                                
                        *SQL Standard     |            Storage Based 
Authorization*
||SQL||user0||user1||user2||   ||user0||user1||user2||
|*Queries executed using Drill :*| | | | | | | |
|SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessible tables + all 
views|Accessible tables + all views|Accessible tables + all views|
|SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|       
 (x)|         (x)|
|SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|        
(x)|         (x)|
|SELECT * FROM hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|     
   (/)|         (/)|
|SELECT * FROM hive.db_all.vw_group12;|   (x)|   (/)|   (/)| |        (x)|      
  (/)|         (/)|
|SELECT * FROM INFORMATION_SCHEMA.`TABLES`
 WHERE TABLE_SCHEMA = 'hive.db_all';|   all|  all|  all| |Accessibe tables + 
all views|Accessibe tables + all views|Accessibe tables + all views|
|DESCRIBE hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|         
(x)|         (x)|
|DESCRIBE hive.db_all.vw_user0;|   (/)|   (x) |   (x)| |        (/)|         
(/)|         (/)|
|DESCRIBE hive.db_all.tbl_group12;|   (x)|   (/)|   (/)| |        (x)|         
(/) |         (/)|
|DESCRIBE hive.db_all.vw_group12;|   (x)|   (/)|   

[jira] [Commented] (DRILL-540) Allow querying hive views in Drill

2019-04-03 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808496#comment-16808496
 ] 

Igor Guzenko commented on DRILL-540:


Hi [~bbevens],

I think adding the warning is useful, except for the last sentence 'For current 
example views were defined as selection over appropriate tables'. Actually, for 
*Storage Based Authorization* it impacts only the SHOW TABLES query. As I showed 
in the comparison table, all views will be returned as a result, because we don't 
know the permissions until the user tries to select from the view; only then is 
the view expanded (converted to a query) and the underlying tables used in the 
query validated for permissions. 

Thanks, 

Igor

> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
>  # Drill persists its view metadata in a file with the suffix .view.drill using 
> json format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a 
> subquery.
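The "treat a view name as a subquery" step can be sketched with plain strings. This is a deliberately simplified, hypothetical illustration (Drill does this on the SQL parse tree, not via text substitution); the view name and SQL are taken from the metadata example above.

```java
public class ViewExpansionSketch {
    // values taken from the .view.drill example above
    static final String VIEW_NAME = "view_from_calcite_1_4";
    static final String VIEW_SQL = "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0";

    // Rewrites a query so the view reference becomes an inline subquery.
    static String expand(String query) {
        return query.replace(VIEW_NAME, "(" + VIEW_SQL + ") AS " + VIEW_NAME);
    }

    public static void main(String[] args) {
        String q = "SELECT store_id FROM view_from_calcite_1_4";
        System.out.println(expand(q));
        // SELECT store_id FROM (SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0) AS view_from_calcite_1_4
    }
}
```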
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS :
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID |TBL_NAME |TBL_TYPE     |VIEW_EXPANDED_TEXT                         |
> -------|------------|------|-----------------|------|----------|------|---------|-------------|-------------------------------------------|
> 2      |1542111078  |1     |0                |mapr  |0         |2     |cview    |VIRTUAL_VIEW |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
>       3. So in Hive metastore views are considered as tables of special type. 
> And main benefit is that we also have expanded SQL definition of views (just 
> like in view.drill files). Also reading of the metadata is already 
> implemented in Drill with help of thrift Metastore API.
>       4. To enable querying of Hive views we'll reuse the existing code for Drill 
> views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* for 
> _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which 
> is actually the model for data persisted in .view.drill files_) and then, based on 
> this instance, return a new _*DrillViewTable*_. Using this approach Drill will 
> handle Hive views the same way as if they were initially defined in Drill and 
> persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
> we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
> functionality will be extracted and used for both table and view field 
> type conversions. 
> *Security implications*
> Consider simple example case where we have users, 
> {code:java}
> user0  user1 user2
>\ /
>   group12
> {code}
> and a sample db where object names contain the user or group who should access 
> them 
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill - SQL Standard and 
> Storage Based authorization. For SQL Standard authorization, permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage based authorization permissions were granted using commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows the results of queries for both authorization 
> models. 
>                           *SQL Standard     |            Storage Based 
> 

[jira] [Commented] (DRILL-7087) Integrate Arrow's Gandiva into Drill

2019-03-11 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789486#comment-16789486
 ] 

Igor Guzenko commented on DRILL-7087:
-

Hello [~weijie]. As far as I understand, the related changes won't fundamentally 
change the internal structure of Drill's value vectors. It looks like 
you're going to pass our vectors along with computations into Apache Arrow, so 
that Arrow will perform the computation on the existing data inside the vector. Correct?  
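One representation gap raised in this thread is the validity encoding: Drill marks nulls with one byte per value, Arrow with one bit per value. A minimal, self-contained conversion sketch (illustrative only; not Drill's or Arrow's actual buffer classes):

```java
public class ValiditySketch {
    // Converts a Drill-style validity buffer (one byte per value, non-zero =
    // non-null) into an Arrow-style validity bitmap (one bit per value).
    static byte[] bytesToBitmap(byte[] byteValidity) {
        byte[] bitmap = new byte[(byteValidity.length + 7) / 8];
        for (int i = 0; i < byteValidity.length; i++) {
            if (byteValidity[i] != 0) {
                // least-significant-bit-first order, as Arrow specifies
                bitmap[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return bitmap;
    }

    static boolean isSet(byte[] bitmap, int i) {
        return (bitmap[i / 8] & (1 << (i % 8))) != 0;
    }

    public static void main(String[] args) {
        byte[] perByte = {1, 0, 1, 1, 0};            // rows 0, 2, 3 are non-null
        byte[] bits = bytesToBitmap(perByte);
        System.out.println(isSet(bits, 0) + " " + isSet(bits, 1)); // true false
    }
}
```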

> Integrate Arrow's Gandiva into Drill
> 
>
> Key: DRILL-7087
> URL: https://issues.apache.org/jira/browse/DRILL-7087
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen, Execution - Relational Operators
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This is prior work to integrate Arrow into Drill by invoking its Gandiva 
> feature. Comparing Arrow's and Drill's in-memory column representations, 
> the null representation differs internally: Drill uses 1 byte while 
> Arrow uses 1 bit to indicate a null row. Also, all Arrow columns are 
> nullable now. Apart from those basic differences, they have the same memory 
> representation for the different data types. 
> The integration strategy is to invoke Arrow's JniWrapper native methods 
> directly by passing the ValueVector's memory address. 
> I have done an implementation in our own Drill version by integrating Gandiva 
> into Drill's project operator. The performance shows nearly a 1x 
> performance gain in expression computation.
> So if there's no objection , I will submit a related PR to contribute this 
> feature. Also this issue waits for arrow's related issue[ARROW-4819].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-03-13 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791928#comment-16791928
 ] 

Igor Guzenko commented on DRILL-7096:
-

Hello [~Paul.Rogers],

I have a few thoughts about the related concerns:

1) About the problem described in the Jira (efficient get by key): I think most 
probably we will go with sorting the keys before writing 
maps into the new vector. Since the Map vector will be used for Hive Map and 
MapObjectInspector will give us a Map for 
each row in a column, it won't be an issue to sort keys before writing.
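The payoff of sorting each row's keys is that a per-row lookup becomes a binary search over that row's slice of the keys vector. A sketch of the idea, using plain arrays in place of value vectors (names here are illustrative, not Drill's real classes):

```java
public class SortedKeyLookup {
    // keys for row r occupy keys[offsets[r] .. offsets[r+1]) and are sorted
    static int find(String[] keys, int[] offsets, int row, String key) {
        int lo = offsets[row], hi = offsets[row + 1] - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = keys[mid].compareTo(key);
            if (cmp == 0) return mid;       // index into the values vector
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;                          // key absent from this row's map
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c", "b", "d"};  // row0: {a,b,c}, row1: {b,d}
        int[] offsets = {0, 3, 5};
        System.out.println(find(keys, offsets, 1, "d")); // 4
        System.out.println(find(keys, offsets, 0, "d")); // -1
    }
}
```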


2) The unnest functionality that you mentioned may be implemented as a conversion 
from the new MapVector to the current Map(*Struct*)Vector. 
All keys across all rows will be converted to strings, and on meeting a new key a 
new vector will be created for holding the values assigned to that key. 
Of course users should be aware that their rows must have a limited set of shared 
keys; otherwise, when all keys in all rows are unique, we 
will get an OOM error very quickly. I guess we can calculate the rate of new unique 
key additions while converting each row and detect the 
key uniqueness problem very quickly.
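The conversion and the unique-key-rate guard described above can be sketched with plain collections standing in for vectors. The threshold values are made up for illustration; real heuristics would need tuning.

```java
import java.util.*;

public class MapToStructSketch {
    // Turns a list of per-row maps into a column-per-key (struct-like) layout,
    // aborting when the rate of newly seen keys suggests keys are unique per
    // row, which would explode into one column per row.
    static Map<String, List<Object>> toStruct(List<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        int newKeys = 0, seenEntries = 0;
        for (int r = 0; r < rows.size(); r++) {
            for (Map.Entry<String, Object> e : rows.get(r).entrySet()) {
                seenEntries++;
                if (!columns.containsKey(e.getKey())) newKeys++;
                // a new column starts padded with nulls for every row
                List<Object> col = columns.computeIfAbsent(e.getKey(),
                    k -> new ArrayList<>(Collections.nCopies(rows.size(), null)));
                col.set(r, e.getValue());
            }
            // made-up heuristic: too many entries introduce a brand-new key
            if (seenEntries > 16 && newKeys > 0.9 * seenEntries) {
                throw new IllegalStateException("keys look unique per row; conversion would explode");
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.of("a", 1, "b", 2), Map.of("a", 3), Map.of("b", 4));
        // column order may vary: {a=[1, 3, null], b=[2, null, 4]}
        System.out.println(toStruct(rows));
    }
}
```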


3) As for use cases, the first place where the new vector will be used is 
reading map columns from Hive, and it looks reasonable to follow 
their restriction on keys (use only primitives). Also, at least we need to 
support all existing functionality related to the Map datatype. I started 
listing use cases in the [Hive Complex Types design 
document|https://docs.google.com/document/d/1yEcaJi9dyksfMs4w5_GsZCQH_Pffe-HLeLVNNKsV7CA/edit?usp=sharing],
 which is in progress now and will later be attached to DRILL-3290. Please feel 
free to add comments in the design doc; everything will be useful for me because 
I'm writing such a document for the first time. 

4) About using unions for values, I guess you're thinking in terms of supporting 
JSON maps' flexibility. In that case I'd rather go with all-text mode 
for map values than pollute memory and code with unions. For the case when the type of 
map values is clearly determined (like in Hive) we have a rich set of 
datatype-specific vectors; though Hive unions may also be used as map values, 
at least we will clearly know the number of necessary types for them.

5) Now [~KazydubB] is working on the new vector design and he'll contribute his 
results to design document mentioned previously.

Thanks, Igor Guzenko

 

> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start index of each map
> So it's not very hard to create such a Map vector, but there is a major issue 
> with such a map representation: it's hard to search map values by key in such a 
> vector; we need to investigate some advanced techniques to make such a search 
> efficient, or find other more suitable options to represent the map datatype in 
> the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. 
> So an implementation of such a value vector would be useful for 
> Arrow too.





[jira] [Created] (DRILL-7115) Improve Hive schema show tables performance

2019-03-19 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7115:
---

 Summary: Improve Hive schema show tables performance
 Key: DRILL-7115
 URL: https://issues.apache.org/jira/browse/DRILL-7115
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Hive, Storage - Information Schema
Reporter: Igor Guzenko
Assignee: Igor Guzenko


In Sqlline (Drill), "show tables" on a Hive schema takes nearly 15 to 20 minutes. 
The schema has nearly ~8000 tables.
Whereas the same in beeline (Hive) returns the result in a split second (~0.2 
secs).

I tested the same in my test cluster by creating 6000 tables(empty!) in Hive 
and then doing "show tables" in Drill. It took more than 2 mins(~140 secs).





[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema

2019-03-19 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6923:

Fix Version/s: (was: 1.16.0)
   1.17.0

> Show schemas uses default(user defined) schema first for resolving table from 
> information_schema
> 
>
> Key: DRILL-6923
> URL: https://issues.apache.org/jira/browse/DRILL-6923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.14.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
> Fix For: 1.17.0
>
>
> Show tables tries to find the table `information_schema`.`schemata` in the default 
> (user-defined) schema, and after a failed attempt it resolves the table 
> successfully against the root schema. Please check the description below for details, 
> explained using an example with the hive plugin. 
> *Abstract* 
> When Drill is used with Hive SQL Standard authorization enabled, execution of 
> queries like
> {code:sql}
> USE hive.db_general;
> SHOW SCHEMAS LIKE 'hive.%'; {code}
> results in the error DrillRuntimeException: Failed to use the Hive authorization 
> components: Error getting object from metastore for Object 
> [type=TABLE_OR_VIEW, name=db_general.information_schema]. 
> *Details* 
> Consider a showSchemas() test similar to the one defined in 
> TestSqlStdBasedAuthorization : 
> {code:java}
> @Test
> public void showSchemas() throws Exception {
>   test("USE " + hivePluginName + "." + db_general);
>   testBuilder()
>   .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
>   .unOrdered()
>   .baselineColumns("SCHEMA_NAME")
>   .baselineValues("hive.db_general")
>   .baselineValues("hive.default")
>   .go();
> }
> {code}
> Currently execution of such test will produce following stacktrace: 
> {code:none}
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed 
> to use the Hive authorization components: Error getting object from metastore 
> for Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
> at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
> at 
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
> at 
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
> at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
> at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
> at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
> at 
> org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
> at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
> 

[jira] [Updated] (DRILL-6923) Show schemas uses default(user defined) schema first for resolving table from information_schema

2019-03-19 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-6923:

Priority: Minor  (was: Major)

> Show schemas uses default(user defined) schema first for resolving table from 
> information_schema
> 
>
> Key: DRILL-6923
> URL: https://issues.apache.org/jira/browse/DRILL-6923
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.14.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
> Fix For: 1.16.0
>
>
> Show tables tries to find the table `information_schema`.`schemata` in the default 
> (user-defined) schema, and after a failed attempt it resolves the table 
> successfully against the root schema. Please check the description below for details, 
> explained using an example with the hive plugin. 
> *Abstract* 
> When Drill is used with Hive SQL Standard authorization enabled, execution of 
> queries like
> {code:sql}
> USE hive.db_general;
> SHOW SCHEMAS LIKE 'hive.%'; {code}
> results in the error DrillRuntimeException: Failed to use the Hive authorization 
> components: Error getting object from metastore for Object 
> [type=TABLE_OR_VIEW, name=db_general.information_schema]. 
> *Details* 
> Consider a showSchemas() test similar to the one defined in 
> TestSqlStdBasedAuthorization : 
> {code:java}
> @Test
> public void showSchemas() throws Exception {
>   test("USE " + hivePluginName + "." + db_general);
>   testBuilder()
>   .sqlQuery("SHOW SCHEMAS LIKE 'hive.%'")
>   .unOrdered()
>   .baselineColumns("SCHEMA_NAME")
>   .baselineValues("hive.db_general")
>   .baselineValues("hive.default")
>   .go();
> }
> {code}
> Currently execution of such test will produce following stacktrace: 
> {code:none}
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Failed 
> to use the Hive authorization components: Error getting object from metastore 
> for Object [type=TABLE_OR_VIEW, name=db_general.information_schema]
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorize(HiveAuthorizationHelper.java:149)
> at 
> org.apache.drill.exec.store.hive.HiveAuthorizationHelper.authorizeReadTable(HiveAuthorizationHelper.java:134)
> at 
> org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:450)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:233)
> at 
> org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:214)
> at 
> org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:63)
> at 
> org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
> at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:288)
> at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
> at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
> at 
> org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
> at 
> org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3032)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3014)
> at 
> org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom(SqlConverter.java:274)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3284)
> at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
> at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:967)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:943)
> at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
> at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:918)
> at 
> 

[jira] [Assigned] (DRILL-7096) Develop vector for canonical Map

2019-03-12 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7096:
---

Assignee: Bohdan Kazydub  (was: Igor Guzenko)

> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start index of each map
> So it's not very hard to create such a Map vector, but there is a major issue 
> with such a map representation: it's hard to search map values by key in such a 
> vector; we need to investigate some advanced techniques to make such a search 
> efficient, or find other more suitable options to represent the map datatype in 
> the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. 
> So an implementation of such a value vector would be useful for 
> Arrow too.





[jira] [Assigned] (DRILL-3587) Select hive's struct data gives IndexOutOfBoundsException instead of unsupported error

2019-03-12 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-3587:
---

Assignee: Igor Guzenko

> Select hive's struct data gives IndexOutOfBoundsException instead of 
> unsupported error
> --
>
> Key: DRILL-3587
> URL: https://issues.apache.org/jira/browse/DRILL-3587
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.2.0
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: Future
>
>
> I have a hive table that has a STRUCT data column.
> hive> select c15 from alltypes;
> OK
> NULL
> {"r":null,"s":null}
> {"r":1,"s":{"a":2,"b":"x"}}
> From drill:
> select c15 from alltypes;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index (1) must be less than 
> size (1)
> Since Drill currently does not support the Hive struct data type, Drill should 
> display a user-friendly error that the Hive struct data type is not supported.





[jira] [Created] (DRILL-7097) Rename MapVector to StructVector

2019-03-12 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7097:
---

 Summary: Rename MapVector to StructVector
 Key: DRILL-7097
 URL: https://issues.apache.org/jira/browse/DRILL-7097
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Igor Guzenko
Assignee: Igor Guzenko


For a long time Drill's MapVector has actually been more suitable for representing 
Struct data, and in Apache Arrow it was in fact renamed to StructVector. To 
align our code with Arrow and make room for the planned implementation of a 
canonical Map (DRILL-7096), we need to rename the existing MapVector and all 
related classes. 





[jira] [Created] (DRILL-7096) Develop vector for canonical Map

2019-03-12 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7096:
---

 Summary: Develop vector for canonical Map
 Key: DRILL-7096
 URL: https://issues.apache.org/jira/browse/DRILL-7096
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Igor Guzenko
Assignee: Igor Guzenko


Canonical Map datatype can be represented using a combination of three value 
vectors:

keysVector - vector for storing the keys of each map
valuesVector - vector for storing the values of each map
offsetsVector - vector for storing the start index of each map

So it's not very hard to create such a Map vector, but there is a major issue 
with such a map representation: it's hard to search map values by key in such a 
vector; we need to investigate some advanced techniques to make such a search 
efficient, or find other more suitable options to represent the map datatype in 
the world of vectors.

After a question about maps, Apache Arrow developers responded that for Java they 
don't have a real Map vector; for now they just have a logical Map type definition 
where they define Map like: List< Struct<key:key_type, value:value_type> >. So an 
implementation of such a value vector would be useful for Arrow too.
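The three-vector layout can be illustrated with plain arrays standing in for the vectors. The linear scan in `get` is exactly the lookup inefficiency this issue wants to improve on; names are illustrative only.

```java
public class CanonicalMapSketch {
    // row r's map occupies indices [OFFSETS[r], OFFSETS[r+1]) of KEYS/VALUES
    static final String[] KEYS = {"x", "y", "x"};   // row0: {x, y}, row1: {x}
    static final int[] VALUES = {10, 20, 30};
    static final int[] OFFSETS = {0, 2, 3};

    // Linear scan over one row's slice of the keys vector.
    static Integer get(int row, String key) {
        for (int i = OFFSETS[row]; i < OFFSETS[row + 1]; i++) {
            if (KEYS[i].equals(key)) return VALUES[i];
        }
        return null;  // key not present in this row's map
    }

    public static void main(String[] args) {
        System.out.println(get(0, "y"));  // 20
        System.out.println(get(1, "y"));  // null
    }
}
```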





[jira] [Closed] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions

2019-02-02 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko closed DRILL-6856.
---
Resolution: Fixed
  Reviewer: Volodymyr Vysotskyi

> Wrong result returned if the query filters a boolean column with both "is 
> true" and "is null" conditions
> 
>
> Key: DRILL-6856
> URL: https://issues.apache.org/jira/browse/DRILL-6856
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 0_0_0.parquet
>
>
> *Data:*
> A parquet file with a boolean column that contains null values.
> An example is attached.
> *Query:*
> {code:sql}
> select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col 
> is null
> {code}
> *Result:*
> {noformat}
> null
> null
> {noformat}
> *Plan:*
> {noformat}
> 00-00Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, 
> cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 
> memory}, id = 1980
> 00-01  Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): 
> rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 
> network, 0.0 memory}, id = 1979
> 00-02SelectionVectorRemover : rowType = RecordType(ANY bool_col): 
> rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 
> network, 0.0 memory}, id = 1978
> 00-03  Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY 
> bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, 
> 0.0 network, 0.0 memory}, id = 1977
> 00-04Scan(table=[[dfs, tmp, Test_data]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : 
> rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 
> rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976
> {noformat}
> *Notes:* 
> - "true" values were not included in the result though they should have been.
> - The result is correct if "bool_col = true" is used instead of "is true".
> - In the plan you can see that the "is true" condition is absent in the Filter 
> operator





[jira] [Commented] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions

2019-02-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758951#comment-16758951
 ] 

Igor Guzenko commented on DRILL-6856:
-

Fixed by Calcite update in [pull 
request|https://github.com/apache/drill/pull/1631] (added test for the case).  

> Wrong result returned if the query filters a boolean column with both "is 
> true" and "is null" conditions
> 
>
> Key: DRILL-6856
> URL: https://issues.apache.org/jira/browse/DRILL-6856
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 0_0_0.parquet
>
>
> *Data:*
> A parquet file with a boolean column that contains null values.
> An example is attached.
> *Query:*
> {code:sql}
> select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col 
> is null
> {code}
> *Result:*
> {noformat}
> null
> null
> {noformat}
> *Plan:*
> {noformat}
> 00-00Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, 
> cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 
> memory}, id = 1980
> 00-01  Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): 
> rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 
> network, 0.0 memory}, id = 1979
> 00-02SelectionVectorRemover : rowType = RecordType(ANY bool_col): 
> rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 
> network, 0.0 memory}, id = 1978
> 00-03  Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY 
> bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, 
> 0.0 network, 0.0 memory}, id = 1977
> 00-04Scan(table=[[dfs, tmp, Test_data]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : 
> rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 
> rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976
> {noformat}
> *Notes:* 
> - "true" values were not included in the result though they should have been.
> - The result is correct if "bool_col = true" is used instead of "is true".
> - In the plan you can see that the "is true" condition is absent in the Filter 
> operator





[jira] [Comment Edited] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions

2019-02-02 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758951#comment-16758951
 ] 

Igor Guzenko edited comment on DRILL-6856 at 2/2/19 10:44 AM:
--

Fixed by Calcite update in [pull 
request|https://github.com/apache/drill/pull/1631]. 


was (Author: ihorhuzenko):
Fixed by Calcite update in [pull 
request|[https://github.com/apache/drill/pull/1631]|https://github.com/apache/drill/pull/1631].]
 (added test for the case).  

> Wrong result returned if the query filters a boolean column with both "is 
> true" and "is null" conditions
> 
>
> Key: DRILL-6856
> URL: https://issues.apache.org/jira/browse/DRILL-6856
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 0_0_0.parquet
>
>
> *Data:*
> A parquet file with a boolean column that contains null values.
> An example is attached.
> *Query:*
> {code:sql}
> select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col 
> is null
> {code}
> *Result:*
> {noformat}
> null
> null
> {noformat}
> *Plan:*
> {noformat}
> 00-00Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, 
> cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 
> memory}, id = 1980
> 00-01  Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): 
> rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 
> network, 0.0 memory}, id = 1979
> 00-02SelectionVectorRemover : rowType = RecordType(ANY bool_col): 
> rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 
> network, 0.0 memory}, id = 1978
> 00-03  Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY 
> bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, 
> 0.0 network, 0.0 memory}, id = 1977
> 00-04Scan(table=[[dfs, tmp, Test_data]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : 
> rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 
> rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976
> {noformat}
> *Notes:* 
> - "true" values were not included in the result though they should have been.
> - Result is correct if use "bool_col = true" instead of "is true"
> - In the plan you can see that "is true" condition is absent in the Filter 
> operator
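For reference, the expected three-valued semantics of the filter can be sketched in plain Java (a minimal illustration only, not Drill code; the column is modeled as a List of nullable Booleans):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IsTrueOrIsNullFilter {
    // SQL three-valued logic: "x IS TRUE" is false for NULL, while
    // "x IS NULL" matches exactly the NULL rows.
    static boolean isTrueOrIsNull(Boolean b) {
        return b == null || Boolean.TRUE.equals(b);
    }

    public static void main(String[] args) {
        List<Boolean> boolCol = Arrays.asList(Boolean.TRUE, null, Boolean.FALSE, Boolean.TRUE, null);
        List<Boolean> result = boolCol.stream()
                .filter(IsTrueOrIsNullFilter::isTrueOrIsNull)
                .collect(Collectors.toList());
        // The correct result keeps both the true values and the nulls; the
        // reported bug returned only the nulls because the planner dropped
        // the "is true" branch of the filter.
        System.out.println(result); // [true, null, true, null]
    }
}
```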



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-540) Allow querying hive views in Drill

2019-04-08 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813007#comment-16813007
 ] 

Igor Guzenko commented on DRILL-540:


Hi [~bbevens], 

Sounds very good, thank you. 

> Allow querying hive views in Drill
> --
>
> Key: DRILL-540
> URL: https://issues.apache.org/jira/browse/DRILL-540
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently hive views cannot be queried from drill.
> This Jira aims to add support for Hive views in Drill.
> *Implementation details:*
> # Drill persists its view metadata in a file with the suffix .view.drill, using 
> json format. For example: 
> {noformat}
> {
>  "name" : "view_from_calcite_1_4",
>  "sql" : "SELECT * FROM `cp`.`store.json` WHERE `store_id` = 0",
>  "fields" : [ {
>  "name" : "*",
>  "type" : "ANY",
>  "isNullable" : true
>  } ],
>  "workspaceSchemaPath" : [ "dfs", "tmp" ]
> }
> {noformat}
> Later Drill parses the metadata and uses it to treat view names in SQL as a 
> subquery.
>       2. In Apache Hive, metadata about views is stored in a similar way to 
> tables. Below is an example from metastore.TBLS :
>  
> {noformat}
> TBL_ID |CREATE_TIME |DB_ID |LAST_ACCESS_TIME |OWNER |RETENTION |SD_ID 
> |TBL_NAME  |TBL_TYPE  |VIEW_EXPANDED_TEXT |
> ---||--|-|--|--|--|--|--|---|
> 2  |1542111078  |1 |0|mapr  |0 |2 |cview  
>|VIRTUAL_VIEW  |SELECT COUNT(*) FROM `default`.`customers` |
> {noformat}
>       3. So in the Hive metastore, views are considered tables of a special type. 
> The main benefit is that we also have the expanded SQL definition of views (just 
> like in view.drill files). Also, reading of this metadata is already 
> implemented in Drill with the help of the thrift Metastore API.
>       4. To enable querying of Hive views we'll reuse the existing code for Drill 
> views as much as possible. First, in *_HiveSchemaFactory.getDrillTable_* for 
> _*HiveReadEntry*_ we'll convert the metadata to an instance of _*View*_ (_which 
> is actually the model for data persisted in .view.drill files_) and then, based on 
> this instance, return a new _*DrillViewTable*_. Using this approach Drill will 
> handle Hive views the same way as if they were initially defined in Drill and 
> persisted in a .view.drill file. 
>      5. For conversion of Hive types from _*FieldSchema*_ to _*RelDataType*_ 
> we'll reuse existing code from _*DrillHiveTable*_, so the conversion 
> functionality will be extracted and used for both table and view field 
> type conversions. 
> *Security implications*
> Consider simple example case where we have users, 
> {code:java}
> user0   user1   user2
>           \   /
>          group12
> {code}
> and a sample db where object names contain the user or group who should access 
> them      
> {code:java}
> db_all
> tbl_user0
> vw_user0
> tbl_group12
> vw_group12
> {code}
> There are two Hive authorization modes supported by Drill - SQL Standard and 
> Storage Based authorization. For SQL Standard authorization, permissions 
> were granted using SQL: 
> {code:java}
> SET ROLE admin;
> GRANT SELECT ON db_all.tbl_user0 TO USER user0;
> GRANT SELECT ON db_all.vw_user0 TO USER user0;
> CREATE ROLE group12;
> GRANT ROLE group12 TO USER user1;
> GRANT ROLE group12 TO USER user2;
> GRANT SELECT ON db_all.tbl_group12 TO ROLE group12;
> GRANT SELECT ON db_all.vw_group12 TO ROLE group12;
> {code}
> And for Storage based authorization permissions were granted using commands: 
> {code:java}
> hadoop fs -chown user0:user0 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 700 /user/hive/warehouse/db_all.db/tbl_user0
> hadoop fs -chmod 750 /user/hive/warehouse/db_all.db/tbl_group12
> hadoop fs -chown user1:group12 
> /user/hive/warehouse/db_all.db/tbl_group12{code}
>  Then the following table shows the results of queries for both authorization 
> models. 
>                           *SQL Standard     |            Storage Based 
> Authorization*
> ||SQL||user0||user1||user2||   ||user0||user1||user2||
> |*Queries executed using Drill :*| | | | | | | |
> |SHOW TABLES IN hive.db_all;|   all|    all|   all| |Accessible tables + all 
> views|Accessible tables + all views|Accessible tables + all views|
> |SELECT * FROM hive.db_all.tbl_user0;|   (/)|   (x)|   (x)| |        (/)|     
>    (x)|         (x)|
> |SELECT * FROM hive.db_all.vw_user0;|   (/)|   (x)|   (x)| |        (/)|      
>   (x)|         (x)|

[jira] [Closed] (DRILL-7097) Rename MapVector to StructVector

2019-06-03 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko closed DRILL-7097.
---
Resolution: Abandoned

Abandoned according to discussion 
[https://lists.apache.org/thread.html/5773447b82c9d6e508a62f66354613b812493cbb8c0c1cc463ccdd9f@%3Cdev.drill.apache.org%3E]
 .

> Rename MapVector to StructVector
> 
>
> Key: DRILL-7097
> URL: https://issues.apache.org/jira/browse/DRILL-7097
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> For a long time Drill's MapVector was actually more suitable for representing 
> Struct data. And in Apache Arrow it was actually renamed to StructVector. To 
> align our code with Arrow and give space for planned implementation of 
> canonical Map (DRILL-7096) we need to rename existing MapVector and all 
> related classes. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7280) Support Hive UDFs for arrays

2019-05-27 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848718#comment-16848718
 ] 

Igor Guzenko commented on DRILL-7280:
-

Sample tests to add to TestInbuiltHiveUDFs:

{code:java}

  @Test
  public void arraySize() throws Exception {
    testBuilder()
        .sqlQuery("SELECT size(arr_n_0) from hive.int_array order by rid")
        .ordered()
        .baselineColumns("EXPR$0")
        .baselineValuesForSingleColumn(3)
        .baselineValuesForSingleColumn(0)
        .baselineValuesForSingleColumn(1)
        .go();
  }

  @Test
  public void arrayContains() throws Exception {
    testBuilder()
        .sqlQuery("SELECT array_contains(arr_n_0, 0) FROM hive.int_array order by rid")
        .ordered()
        .baselineColumns("EXPR$0")
        .baselineValuesForSingleColumn(true)
        .baselineValuesForSingleColumn(false)
        .baselineValuesForSingleColumn(false)
        .go();
  }

  @Test
  public void sortArray() throws Exception {
    testBuilder()
        .sqlQuery("SELECT sort_array(arr_n_0) FROM hive.int_array order by rid")
        .ordered()
        .baselineColumns("EXPR$0")
        .baselineValues(asList(-1, 0, 1))
        .baselineValues(asList())
        .baselineValues(asList(100500))
        .go();
  }

  @Test
  public void concatWs() throws Exception {
    testBuilder()
        .sqlQuery("SELECT concat_ws(',', arr_n_0) FROM hive.string_array order by rid")
        .ordered()
        .baselineColumns("EXPR$0")
        .baselineValues("First Value Of Array,komlnp,The Last Value")
        .baselineValues("")
        .baselineValues("ABCaBcA-1-2-3")
        .go();
  }

{code}

 

> Support Hive UDFs for arrays
> 
>
> Key: DRILL-7280
> URL: https://issues.apache.org/jira/browse/DRILL-7280
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Minor
>
> Add support for Hive UDFs accepting or returning arrays: 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] . Some 
> examples of such UDFs are: 
>  
> ||Hive UDF||Drill alternative||
> |size(array)|repeated_count(array)|
> |array_contains(array, value)|repeated_contains(array, value)|
> |sort_array(arr_n_0)|NA|
> |concat_ws(string SEP, array)|NA|
> etc. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7280) Support Hive UDFs for arrays

2019-05-27 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7280:
---

 Summary: Support Hive UDFs for arrays
 Key: DRILL-7280
 URL: https://issues.apache.org/jira/browse/DRILL-7280
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko
Assignee: Igor Guzenko


Add support for Hive UDFs accepting or returning arrays: 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF] . Some 
examples of such UDFs are: 

 
||Hive UDF||Drill alternative||
|size(array)|repeated_count(array)|
|array_contains(array, value)|repeated_contains(array, value)|
|sort_array(arr_n_0)|NA|
|concat_ws(string SEP, array)|NA|

etc. 
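As an illustration of the semantics these UDFs are expected to provide, here is a plain-Java sketch over ordinary lists (illustrative only — this is neither Hive nor Drill code, and the method names are just mirrors of the UDF names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HiveArrayUdfSemantics {
    // size(array): number of elements
    static int size(List<?> arr) { return arr.size(); }

    // array_contains(array, value): membership test
    static boolean arrayContains(List<?> arr, Object value) { return arr.contains(value); }

    // sort_array(array): ascending natural order, input untouched
    static <T extends Comparable<T>> List<T> sortArray(List<T> arr) {
        List<T> copy = new ArrayList<>(arr);
        Collections.sort(copy);
        return copy;
    }

    // concat_ws(sep, array): join elements with a separator
    static String concatWs(String sep, List<String> arr) { return String.join(sep, arr); }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 0, -1);
        System.out.println(size(ints));                             // 3
        System.out.println(arrayContains(ints, 0));                 // true
        System.out.println(sortArray(ints));                        // [-1, 0, 1]
        System.out.println(concatWs(",", Arrays.asList("a", "b"))); // a,b
    }
}
```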

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7251) Read Hive array w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7251:
---

 Summary: Read Hive array w/o nulls
 Key: DRILL-7251
 URL: https://issues.apache.org/jira/browse/DRILL-7251
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Storage - Hive
Reporter: Igor Guzenko






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7254) Read Hive union w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7254:
---

 Summary: Read Hive union w/o nulls
 Key: DRILL-7254
 URL: https://issues.apache.org/jira/browse/DRILL-7254
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7252) Read Hive map using canonical Map vector

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7252:
---

 Summary: Read Hive map using canonical Map vector
 Key: DRILL-7252
 URL: https://issues.apache.org/jira/browse/DRILL-7252
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7253) Read Hive struct w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7253:
---

 Summary: Read Hive struct w/o nulls
 Key: DRILL-7253
 URL: https://issues.apache.org/jira/browse/DRILL-7253
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7252) Read Hive map using canonical Map vector

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7252:
---

Assignee: Igor Guzenko

> Read Hive map using canonical Map vector
> -
>
> Key: DRILL-7252
> URL: https://issues.apache.org/jira/browse/DRILL-7252
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7253) Read Hive struct w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7253:
---

Assignee: Igor Guzenko

> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7254) Read Hive union w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7254:
---

Assignee: Igor Guzenko

> Read Hive union w/o nulls
> -
>
> Key: DRILL-7254
> URL: https://issues.apache.org/jira/browse/DRILL-7254
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7097) Rename MapVector to StructVector

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7097:

Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-3290

> Rename MapVector to StructVector
> 
>
> Key: DRILL-7097
> URL: https://issues.apache.org/jira/browse/DRILL-7097
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> For a long time Drill's MapVector was actually more suitable for representing 
> Struct data. And in Apache Arrow it was actually renamed to StructVector. To 
> align our code with Arrow and give space for planned implementation of 
> canonical Map (DRILL-7096) we need to rename existing MapVector and all 
> related classes. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4782) TO_TIME function cannot separate time from date time string

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-4782:

Labels: ready-to-commit  (was: )

> TO_TIME function cannot separate time from date time string
> ---
>
> Key: DRILL-4782
> URL: https://issues.apache.org/jira/browse/DRILL-4782
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.6.0, 1.7.0
> Environment: CentOS 7
>Reporter: Matt Keranen
>Assignee: Dmytriy Grinchenko
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> TO_TIME('2016-03-03 00:00', 'yyyy-MM-dd HH:mm') returns "05:14:46.656" 
> instead of the expected "00:00:00"
> Adding an additional split does work as expected: TO_TIME(SPLIT('2016-03-03 
> 00:00', ' ')[1], 'HH:mm')
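For reference, the expected behavior — parse the full date-time string with its pattern, then keep only the time part — can be sketched with plain java.time (an illustration of the semantics, not the Drill implementation):

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class TimeFromDateTime {
    // Parse the whole value with the supplied pattern, then drop the date
    // part; this is what TO_TIME is expected to do for the reported input.
    static LocalTime timeOf(String value, String pattern) {
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern)).toLocalTime();
    }

    public static void main(String[] args) {
        System.out.println(timeOf("2016-03-03 00:00", "yyyy-MM-dd HH:mm")); // 00:00
    }
}
```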



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7096) Develop vector for canonical Map

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7096:

Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-3290

> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing the start index of each next map
> So it's not very hard to create such a Map vector, but there is a major issue 
> with such a map representation: it's hard to search map values by key in such a 
> vector. We need to investigate some advanced techniques to make such search 
> efficient, or find other more suitable options to represent the map datatype in 
> the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List<Struct<key: key_type, 
> value: value_type>>. So an implementation of such a value vector would be 
> useful for Arrow too.
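The three-vector layout described above can be sketched in plain Java (a toy illustration with primitive arrays standing in for value vectors; the linear scan in get() shows why key lookup is the hard part):

```java
public class CanonicalMapVectorSketch {
    // Two maps stored flat: {a:1, b:2} and {c:3}
    final String[] keysVector    = {"a", "b", "c"};
    final int[]    valuesVector  = {1, 2, 3};
    // offsetsVector[i]..offsetsVector[i+1] bounds the entries of map i
    final int[]    offsetsVector = {0, 2, 3};

    // Key lookup degenerates to a linear scan over one map's slice;
    // this is the costly search the description warns about.
    Integer get(int mapIndex, String key) {
        for (int i = offsetsVector[mapIndex]; i < offsetsVector[mapIndex + 1]; i++) {
            if (keysVector[i].equals(key)) {
                return valuesVector[i];
            }
        }
        return null; // key absent in this map
    }

    public static void main(String[] args) {
        CanonicalMapVectorSketch v = new CanonicalMapVectorSketch();
        System.out.println(v.get(0, "b")); // 2
        System.out.println(v.get(1, "a")); // null
    }
}
```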



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7251) Read Hive array w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7251:
---

Assignee: Igor Guzenko

> Read Hive array w/o nulls
> -
>
> Key: DRILL-7251
> URL: https://issues.apache.org/jira/browse/DRILL-7251
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7255) Support nulls for all levels of nesting

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7255:
---

 Summary: Support nulls for all levels of nesting
 Key: DRILL-7255
 URL: https://issues.apache.org/jira/browse/DRILL-7255
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7255) Support nulls for all levels of nesting

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7255:
---

Assignee: Igor Guzenko

> Support nulls for all levels of nesting
> ---
>
> Key: DRILL-7255
> URL: https://issues.apache.org/jira/browse/DRILL-7255
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-2000) Hive generated parquet files with maps show up in drill as map(key value)

2019-05-20 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-2000:
---

Assignee: Bohdan Kazydub

> Hive generated parquet files with maps show up in drill as map(key value)
> -
>
> Key: DRILL-2000
> URL: https://issues.apache.org/jira/browse/DRILL-2000
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: Future
>
>
> Created a parquet file in hive having the following DDL
> hive> desc alltypesparquet; 
> OK
> c1 int 
> c2 boolean 
> c3 double 
> c4 string 
> c5 array 
> c6 map 
> c7 map 
> c8 struct
> c9 tinyint 
> c10 smallint 
> c11 float 
> c12 bigint 
> c13 array>  
> c15 struct>
> c16 array,n:int>> 
> Time taken: 0.076 seconds, Fetched: 15 row(s)
> Columns which are maps such as c6 map 
> show up as 
> 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`;
> ++
> | c6 |
> ++
> | {"map":[]} |
> | {"map":[]} |
> | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} |
> ++
> 3 rows selected (0.078 seconds)
> hive> select c6 from alltypesparquet;   
> NULL
> NULL
> {1:"x",2:"y"}
> Ignore the wrong values, I have raised DRILL-1997 for the same. 
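The gap between the two readouts can be illustrated with a small Java sketch that folds Drill's key/value-entry form back into a plain map, matching the shape Hive prints (an illustrative helper only, not Drill code):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DrillMapReadout {
    // Drill renders a Hive map column as a list of {"key":k,"value":v}
    // entries; this helper collapses that form into a plain map, which is
    // closer to Hive's own output (e.g. {1:"x",2:"y"}).
    static Map<Object, Object> toPlainMap(List<Map<String, Object>> drillEntries) {
        Map<Object, Object> result = new LinkedHashMap<>();
        for (Map<String, Object> entry : drillEntries) {
            result.put(entry.get("key"), entry.get("value"));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> entries = Arrays.asList(
                Map.of("key", 1, "value", "x"),
                Map.of("key", 2, "value", "y"));
        System.out.println(toPlainMap(entries)); // {1=x, 2=y}
    }
}
```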



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7268) Read Hive array with parquet native reader

2019-05-20 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7268:
---

 Summary: Read Hive array with parquet native reader
 Key: DRILL-7268
 URL: https://issues.apache.org/jira/browse/DRILL-7268
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7268) Read Hive array with parquet native reader

2019-05-20 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7268:
---

Assignee: Igor Guzenko

> Read Hive array with parquet native reader
> --
>
> Key: DRILL-7268
> URL: https://issues.apache.org/jira/browse/DRILL-7268
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7315) Revise precision and scale order in the method arguments

2019-07-05 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7315:

Labels: ready-to-commit  (was: )

> Revise precision and scale order in the method arguments
> 
>
> Key: DRILL-7315
> URL: https://issues.apache.org/jira/browse/DRILL-7315
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> The current code has different variations of scale and precision orderings in 
> the method arguments. The goal for this Jira is to make it more consistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7268) Read Hive array with parquet native reader

2019-06-26 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7268:

Description: 
When Hive stores array data in parquet format, it creates schema for such 
columns, like: 
 arr_n_0 ARRAY
{code:java}
 optional group arr_n_0 (LIST) {
 repeated group bag {
 optional int32 array_element;
 }
 }
{code}
Sample result before the changes was:
{code:java}
{"bag":[{"array_element":1},\{"array_element":2}]}
{code}
After the changes Drill reads only array elements data without additional keys 
like "bag" or "array_element": 

{code}[1,2]{code}

 

Please read Design Doc linked to parent task for more details. 

  was:
When Hive stores array data in parquet format, it creates schema for such 
columns, like: 
arr_n_0 ARRAY

{code}
 optional group arr_n_0 (LIST) {
 repeated group bag {
 optional int32 array_element;
 }
 }
{code}

Sample result before the changes was:

{code}\{"bag":[{"array_element":1},\{"array_element":2}]} \{code}

After the changes Drill reads only array elements data without additional keys 
like "bag" or "array_element":

{code} [1,2] \{code} . 

 

Please read Design Doc linked to parent task for more details. 


> Read Hive array with parquet native reader
> --
>
> Key: DRILL-7268
> URL: https://issues.apache.org/jira/browse/DRILL-7268
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> When Hive stores array data in parquet format, it creates schema for such 
> columns, like: 
>  arr_n_0 ARRAY
> {code:java}
>  optional group arr_n_0 (LIST) {
>  repeated group bag {
>  optional int32 array_element;
>  }
>  }
> {code}
> Sample result before the changes was:
> {code:java}
> {"bag":[{"array_element":1},\{"array_element":2}]}
> {code}
> After the changes Drill reads only array elements data without additional 
> keys like "bag" or "array_element": 
> {code}[1,2]{code}
>  
> Please read Design Doc linked to parent task for more details. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7268) Read Hive array with parquet native reader

2019-06-26 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7268:

Description: 
When Hive stores array data in parquet format, it creates schema for such 
columns, like: 
arr_n_0 ARRAY

{code}
 optional group arr_n_0 (LIST) {
 repeated group bag {
 optional int32 array_element;
 }
 }
{code}

Sample result before the changes was:

{code}\{"bag":[{"array_element":1},\{"array_element":2}]} \{code}

After the changes Drill reads only array elements data without additional keys 
like "bag" or "array_element":

{code}[1,2]{code}

 

Please read Design Doc linked to parent task for more details. 

> Read Hive array with parquet native reader
> --
>
> Key: DRILL-7268
> URL: https://issues.apache.org/jira/browse/DRILL-7268
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> When Hive stores array data in parquet format, it creates schema for such 
> columns, like: 
> arr_n_0 ARRAY
> {code}
>  optional group arr_n_0 (LIST) {
>  repeated group bag {
>  optional int32 array_element;
>  }
>  }
> {code}
> Sample result before the changes was:
> {code}{"bag":[{"array_element":1},{"array_element":2}]}{code}
> After the changes Drill reads only array elements data without additional 
> keys like "bag" or "array_element":
> {code}[1,2]{code}
>  
> Please read Design Doc linked to parent task for more details. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7268) Read Hive array with parquet native reader

2019-06-26 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7268:

Description: 
When Hive stores array data in parquet format, it creates schema for such 
columns, like: 
 arr_n_0 ARRAY
{code:java}
 optional group arr_n_0 (LIST) {
   repeated group bag {
 optional int32 array_element;
   }
 }
{code}
Sample result before the changes was:
{code:java}
{"bag":[{"array_element":1},{"array_element":2}]}
{code}
After the changes Drill reads only array elements data without additional keys 
like "bag" or "array_element":
{code:java}
[1,2]{code}
 

 

Please read Design Doc linked to parent task for more details. 

  was:
When Hive stores array data in parquet format, it creates schema for such 
columns, like: 
 arr_n_0 ARRAY
{code:java}
 optional group arr_n_0 (LIST) {
 repeated group bag {
 optional int32 array_element;
 }
 }
{code}
Sample result before the changes was:
{code:java}
{"bag":[{"array_element":1},\{"array_element":2}]}
{code}
After the changes Drill reads only array elements data without additional keys 
like "bag" or "array_element": 

{code}[1,2] \{code} . 

 

Please read Design Doc linked to parent task for more details. 


> Read Hive array with parquet native reader
> --
>
> Key: DRILL-7268
> URL: https://issues.apache.org/jira/browse/DRILL-7268
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> When Hive stores array data in parquet format, it creates schema for such 
> columns, like: 
>  arr_n_0 ARRAY
> {code:java}
>  optional group arr_n_0 (LIST) {
>repeated group bag {
>  optional int32 array_element;
>}
>  }
> {code}
> Sample result before the changes was:
> {code:java}
> {"bag":[{"array_element":1},{"array_element":2}]}
> {code}
> After the changes Drill reads only array elements data without additional 
> keys like "bag" or "array_element":
> {code:java}
> [1,2]{code}
>  
>  
> Please read Design Doc linked to parent task for more details. 
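The before/after shapes can be illustrated with a small Java sketch (a toy model of one row, not the reader implementation):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HiveParquetArrayFlatten {
    // Old reader output for one row: {"bag":[{"array_element":1},{"array_element":2}]}
    // The fixed native reader keeps only the element values: [1, 2]
    static List<Object> flatten(Map<String, List<Map<String, Object>>> oldForm) {
        return oldForm.get("bag").stream()
                .map(e -> e.get("array_element"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<Map<String, Object>>> row =
                Map.of("bag", Arrays.asList(
                        Map.of("array_element", 1),
                        Map.of("array_element", 2)));
        System.out.println(flatten(row)); // [1, 2]
    }
}
```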



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7215) Null error shown when select file without format

2019-04-25 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7215:
---

 Summary: Null error shown when select file without format
 Key: DRILL-7215
 URL: https://issues.apache.org/jira/browse/DRILL-7215
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0, 1.16.0
Reporter: Igor Guzenko


A null error message is shown while querying a file without a format after a USE 
schema statement: 
{code:none}
select * from `dir/noformat`; 

Error: VALIDATION ERROR: null


[Error Id: b9e3e3a4-f60a-4836-97e9-6078c742f7ad ] (state=,code=0)
{code}
 

Steps to reproduce:
 # Create dir and file w/o format: 
{code:java}
 mkdir /tmp/dir && touch /tmp/dir/noformat{code}

 # Run drill in embedded mode 
 # Use tmp schema 
{code:sql}
USE dfs.tmp; {code}

 # Query created file 
{code:sql}
SELECT * FROM `dir/noformat`;{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7345) Strange Behavior for UDFs with ComplexWriter Output

2019-08-12 Thread Igor Guzenko (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905362#comment-16905362
 ] 

Igor Guzenko commented on DRILL-7345:
-

Hi [~cgivre], could you please check that the issue is not caused by the changes 
which were added as part of DRILL-6810? 
[Here|https://github.com/apache/drill/blob/85c77134d5d1bb9f96a5417036cccfb263ae8ae7/exec/java-exec/src/main/java/org/apache/drill/exec/expr/annotations/FunctionTemplate.java#L150]
 the javadoc describes some limitations related to ComplexWriter output. 

> Strange Behavior for UDFs with ComplexWriter Output
> ---
>
> Key: DRILL-7345
> URL: https://issues.apache.org/jira/browse/DRILL-7345
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Minor
>
> I wrote some UDFs recently and noticed some strange behavior when debugging 
> them. 
> This behavior only occurs when there is ComplexWriter as output.  
> Basically, if the input to the UDF is nullable, Drill doesn't recognize the 
> UDF at all.  I've found that the only way to get Drill to recognize UDFs that 
> have ComplexWriters as output is:
> * Use a non-nullable holder as input
> * Remove the null setting completely from the function parameters.
> This approach has a drawback in that if the function receives a null value, 
> it will throw an error and halt execution.  My preference would be to allow 
> null handling, but I've not figured out how to make that happen.
> Note:  This behavior ONLY occurs when using a ComplexWriter as output.  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (DRILL-7326) "Unsupported Operation Exception" appears on attempting to create table in Drill from json, with double nested array

2019-08-14 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7326:
---

Assignee: Igor Guzenko

> "Unsupported Operation Exception" appears on attempting to create table in 
> Drill from json, with double nested array
> 
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a double nested array as a value e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create table in drill with created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}
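The reproduction file from step 1 can be generated with a short script. A minimal sketch in plain Python (the file name `nested.json` is an arbitrary choice, not from the report):

```python
import json

# Build one record whose "stringx2" field is a doubly nested array,
# matching step 1 of the reproduction above.
record = {
    "stringx2": [["asd", "фывфы", "asg"], ["as", "acz", "gte"], ["as", "tdf", "dsd"]]
}

# Drill reads JSON as one object per line, so write a single line.
with open("nested.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read it back to confirm the double nesting that the parquet writer rejects.
with open("nested.json", encoding="utf-8") as f:
    loaded = json.loads(f.readline())
```

Running a CTAS over such a file on an affected version (1.16.0) should reproduce the `Unsupported type LIST` error above.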



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (DRILL-7326) Support repeated lists for CTAS parquet format

2019-08-20 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7326:

Summary: Support repeated lists for CTAS parquet format  (was: "Unsupported 
Operation Exception" appears on attempting to create table in Drill from json, 
with double nested array)

> Support repeated lists for CTAS parquet format
> --
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a double nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7326) Support repeated lists for CTAS parquet format

2019-08-20 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7326:

Issue Type: New Feature  (was: Bug)

> Support repeated lists for CTAS parquet format
> --
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a double nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7326) Support repeated lists for CTAS parquet format

2019-09-01 Thread Igor Guzenko (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920355#comment-16920355
 ] 

Igor Guzenko commented on DRILL-7326:
-

Merged to Apache master with commit id 
[ffab527|https://github.com/apache/drill/commit/ffab527451e0a23eca96f38bce52c790553cc47e].

> Support repeated lists for CTAS parquet format
> --
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a double nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Issue Comment Deleted] (DRILL-7326) Support repeated lists for CTAS parquet format

2019-09-01 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7326:

Comment: was deleted

(was: Merged to Apache master with commit id 
[ffab527|https://github.com/apache/drill/commit/ffab527451e0a23eca96f38bce52c790553cc47e].)

> Support repeated lists for CTAS parquet format
> --
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a double nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (DRILL-6181) CTAS should support writing nested structures (nested lists) to parquet.

2019-09-01 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko resolved DRILL-6181.
-
Resolution: Fixed

Done in scope of DRILL-7326.



> CTAS should support writing nested structures (nested lists) to parquet.
> 
>
> Key: DRILL-6181
> URL: https://issues.apache.org/jira/browse/DRILL-6181
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.12.0
>Reporter: Khurram Faraaz
>Priority: Major
>
> Both Parquet and Hive support writing nested structures into parquet
> https://issues.apache.org/jira/browse/HIVE-8909
> https://issues.apache.org/jira/browse/PARQUET-113
> A CTAS from Drill fails when there is a nested list of lists in one of the 
> projected columns.
> JSON data used in the test; note that "arr" is a nested list of lists: 
>  
> {noformat} 
> [root@qa102-45 ~]# cat jsonToParquet_02.json
> {"id":"123","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"3","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"13","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"12","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"2","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"1","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"230","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"1230","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"1123","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"2123","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> {"id":"1523","arr":[[1,2,3,4],[5,6,7,8,9,10],[11,12,13,14,15]]}
> [root@qa102-45 ~]#
> {noformat}
> CTAS fails with UnsupportedOperationException on Drill 1.12.0-mapr commit id 
> bb07ebbb9ba8742f44689f8bd8efb5853c5edea0
> {noformat}
>  0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_prq_from_json_02 as select 
> id, arr from `jsonToParquet_02.json`;
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> [Error Id: 7e5b3c2d-9cf1-4e87-96c8-e7e7e8055ddf on qa102-45.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-02-22 09:56:54,368 [2570fb99-62da-a516-2c1f-0381e21723ae:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: 
> UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> [Error Id: 7e5b3c2d-9cf1-4e87-96c8-e7e7e8055ddf on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> [Error Id: 7e5b3c2d-9cf1-4e87-96c8-e7e7e8055ddf on qa102-45.qa.lab:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301)
>  [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_161]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_161]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.UnsupportedOperationException: Unsupported type LIST
>  at 
> org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType(ParquetRecordWriter.java:253)
>  ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema(ParquetRecordWriter.java:205)
>  ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema(ParquetRecordWriter.java:190)
>  ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema(WriterRecordBatch.java:157)
>  ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:103)
>  ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164)
>  ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
>  at 

[jira] [Resolved] (DRILL-2241) CTAS fails when writing a repeated list

2019-09-01 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko resolved DRILL-2241.
-
Resolution: Fixed

Done in scope of DRILL-7326. 

> CTAS fails when writing a repeated list
> ---
>
> Key: DRILL-2241
> URL: https://issues.apache.org/jira/browse/DRILL-2241
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Parquet
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Priority: Major
> Fix For: Future
>
> Attachments: drillbit_replist.log
>
>
> Drill can read the following JSON file with a repeated list:
> {
>   "a" : null,
>   "b" : [ ["B1", "B2"] ]
> }
> Writing this to Parquet via a simple CTAS fails. 
> > create table temp as select * from `replist.json`;
> Log indicates this to be unsupported (UnsupportedOperationException: 
> Unsupported type LIST)
> Log attached. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (DRILL-2768) Improve error message for CTAS fails when writing a repeated list

2019-09-01 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko resolved DRILL-2768.
-
Resolution: Won't Fix

Not relevant after DRILL-7326.

> Improve error message for CTAS fails when writing a repeated list
> -
>
> Key: DRILL-2768
> URL: https://issues.apache.org/jira/browse/DRILL-2768
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Parquet
>Affects Versions: 1.0.0
>Reporter: Deneche A. Hakim
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: Future
>
>
> Using the following json file:
> {code}
> { "a" : null, "b" : [ ["B1", "B2"] ] }
> {code}
> The following CTAS query fails because parquet doesn't support a list 
> directly nested inside another. We should improve the error message to better 
> explain this:
> {noformat}
> 0: jdbc:drill:zk=local> create table t2241 as select * from `2241.json`;
> Query failed: Unsupported type LIST
> [04423be8-706d-47c2-b73f-384201163d10 on abdel-11.qa.lab:31010]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7255) Support nulls for all levels of nesting in complex types

2019-08-28 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7255:

Summary: Support nulls for all levels of nesting in complex types  (was: 
Support nulls for all levels of nesting in complex type)

> Support nulls for all levels of nesting in complex types
> 
>
> Key: DRILL-7255
> URL: https://issues.apache.org/jira/browse/DRILL-7255
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7255) Support nulls for all levels of nesting in complex type

2019-08-28 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7255:

Summary: Support nulls for all levels of nesting in complex type  (was: 
Support nulls for all levels of nesting)

> Support nulls for all levels of nesting in complex type
> ---
>
> Key: DRILL-7255
> URL: https://issues.apache.org/jira/browse/DRILL-7255
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7365) Failed to read column added to existing Hive partition

2019-09-04 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7365:

Description: 
*Prerequisites:*

Enable ACID in Hive 
[https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions].

*Steps to reproduce:*

1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY 
(`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc 
tblproperties ('transactional'='true');
 2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values 
(1, 'A'),(2, 'B');
 3) alter table hive_bucketed2 add columns (age INT);
 4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values 
(11, '1A', 10),(12, '1B', 22);
 5) select * from hive.hive_bucketed2;

*Workaround* (may be a little bit {color:#de350b}risky{color}):

1. Connect to Hive metastore database.

[https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png]

2. Find the SDS rows linked to the desired PARTITIONS; you actually need the 
CD_ID values for those SDS rows.

3. Insert your column into COLUMNS_V2 with the CD_ID found in the previous step.

  was:
Prerequisites:

Enable ACID in Hive 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions.

Steps to reproduce:

1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY 
(`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc 
tblproperties ('transactional'='true');
2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (1, 
'A'),(2, 'B');
3) alter table hive_bucketed2 add columns (age INT);
4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values 
(11, '1A', 10),(12, '1B', 22);
5) select * from hive.hive_bucketed2;


Workaround (may be a little bit risky):

1. Connect to Hive metastore database.

https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png

2. Find the SDS rows linked to the desired PARTITIONS; you actually need the 
CD_ID values for those SDS rows.

3. Insert your column into COLUMNS_V2 with the CD_ID found in the previous step.


> Failed to read column added to existing Hive partition
> --
>
> Key: DRILL-7365
> URL: https://issues.apache.org/jira/browse/DRILL-7365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Priority: Major
>
> *Prerequisites:*
> Enable ACID in Hive 
> [https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions].
> *Steps to reproduce:*
> 1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY 
> (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc 
> tblproperties ('transactional'='true');
>  2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values 
> (1, 'A'),(2, 'B');
>  3) alter table hive_bucketed2 add columns (age INT);
>  4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values 
> (11, '1A', 10),(12, '1B', 22);
>  5) select * from hive.hive_bucketed2;
> *Workaround* (may be a little bit {color:#de350b}risky{color}):
> 1. Connect to Hive metastore database.
> [https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png]
> 2. Find the SDS rows linked to the desired PARTITIONS; you actually need the 
> CD_ID values for those SDS rows.
> 3. Insert your column into COLUMNS_V2 with the CD_ID found in the previous step.
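The insert in step 3 can be sketched against an in-memory SQLite stand-in for the metastore database. The `COLUMNS_V2` column layout below and the CD_ID value 42 are assumptions for illustration; real metastore backends (MySQL, Postgres, Derby) should be checked against their actual schema before being touched:

```python
import sqlite3

# In-memory stand-in for the Hive metastore database (assumption: the
# standard COLUMNS_V2 layout; verify against your metastore's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE COLUMNS_V2 (
    CD_ID INTEGER, COMMENT TEXT, COLUMN_NAME TEXT,
    TYPE_NAME TEXT, INTEGER_IDX INTEGER)""")

cd_id = 42  # hypothetical CD_ID found for the stale partition in step 2

# Step 3: register the missing column against that column descriptor.
conn.execute(
    "INSERT INTO COLUMNS_V2 (CD_ID, COLUMN_NAME, TYPE_NAME, INTEGER_IDX) "
    "VALUES (?, ?, ?, ?)",
    (cd_id, "age", "int", 2))

rows = conn.execute(
    "SELECT COLUMN_NAME, TYPE_NAME FROM COLUMNS_V2 WHERE CD_ID = ?",
    (cd_id,)).fetchall()
```

After the insert, the partition's column descriptor lists the added column, which is what lets the read succeed.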



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (DRILL-7365) Failed to read column added to existing Hive partition

2019-09-04 Thread Igor Guzenko (Jira)
Igor Guzenko created DRILL-7365:
---

 Summary: Failed to read column added to existing Hive partition
 Key: DRILL-7365
 URL: https://issues.apache.org/jira/browse/DRILL-7365
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Reporter: Igor Guzenko


Prerequisites:

Enable ACID in Hive 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions.

Steps to reproduce:

1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED BY 
(`col_year_month` string) clustered by (emp_id) into 4 buckets stored as orc 
tblproperties ('transactional'='true');
2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values (1, 
'A'),(2, 'B');
3) alter table hive_bucketed2 add columns (age INT);
4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values 
(11, '1A', 10),(12, '1B', 22);
5) select * from hive.hive_bucketed2;


Workaround (may be a little bit risky):

1. Connect to Hive metastore database.

https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png

2. Find the SDS rows linked to the desired PARTITIONS; you actually need the 
CD_ID values for those SDS rows.

3. Insert your column into COLUMNS_V2 with the CD_ID found in the previous step.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7252) Read Hive map using Dict vector

2019-09-13 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7252:

Summary: Read Hive map using Dict vector  (was: Read Hive map using 
canonical Map vector)

> Read Hive map using Dict vector
> 
>
> Key: DRILL-7252
> URL: https://issues.apache.org/jira/browse/DRILL-7252
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> Described in DRILL-3290 design doc. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7253) Read Hive struct w/o nulls

2019-09-05 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7253:

Description: Described in DRILL-3290 design doc. 

> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Described in DRILL-3290 design doc. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7251) Read Hive array w/o nulls

2019-09-05 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7251:

Description: Described in DRILL-3290 design doc. 

> Read Hive array w/o nulls
> -
>
> Key: DRILL-7251
> URL: https://issues.apache.org/jira/browse/DRILL-7251
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Described in DRILL-3290 design doc. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7373) Fix problems involving reading from DICT type

2019-09-16 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7373:

Labels: ready-to-commit  (was: )

> Fix problems involving reading from DICT type
> -
>
> Key: DRILL-7373
> URL: https://issues.apache.org/jira/browse/DRILL-7373
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Add better support for different key types ({{boolean}}, {{decimal}}, 
> {{float}}, {{double}}, etc.) when retrieving values by key from a {{DICT}} 
> column while querying a data source whose field types are known during the 
> query validation phase (such as a Hive table), so that the actual key object 
> instance is created in generated code and passed to the given {{DICT}} 
> reader, instead of generating its value for every row based on an {{int}} 
> ({{ArraySegment}}) or {{String}} ({{NamedSegment}}) value.
> This may be achieved by storing the original literal value of the passed key 
> (as {{Object}}) and its type (as {{MajorType}}) in {{PathSegment}}, and using 
> them during code generation when reading {{DICT}} values by key in 
> {{EvaluationVisitor}}.
> Also, fix an NPE in some cases when reading values from {{DICT}}, and fix 
> wrong results when reading complex structures using many ITEM operators 
> (i.e. [] brackets), e.g.
> {code}
> SELECT rid, mc.map_arr_map['key01'][1]['key01.1'] p16 FROM 
> hive.map_complex_tbl mc
> {code}
> where {{map_arr_map}} is of the following type: {{MAP<VARCHAR, ARRAY<MAP<VARCHAR, INT>>>}}
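The intent above — build the typed key object once rather than re-deriving it for every row — can be sketched in plain Python (not Drill's generated Java; the rows and the float key are illustrative):

```python
# Rows with a map keyed by a non-string type (here: float keys).
rows = [{"m": {1.5: 10}}, {"m": {1.5: 20}}]

# The key literal arrives as text from the SQL path expression; convert it
# once, with the type known at validation time, rather than once per row.
key_literal = "1.5"
key = float(key_literal)

# Every per-row lookup reuses the prepared key object.
values = [row["m"].get(key) for row in rows]
```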



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7252) Read Hive map using canonical Map vector

2019-09-05 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7252:

Description: Described in DRILL-3290 design doc. 

> Read Hive map using canonical Map vector
> -
>
> Key: DRILL-7252
> URL: https://issues.apache.org/jira/browse/DRILL-7252
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> Described in DRILL-3290 design doc. 





[jira] [Assigned] (DRILL-7380) Query of a field inside of an array of structs returns null

2019-09-19 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7380:
---

Assignee: Igor Guzenko

> Query of a field inside of an array of structs returns null
> ---
>
> Key: DRILL-7380
> URL: https://issues.apache.org/jira/browse/DRILL-7380
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: customer_complex.zip
>
>
> *Query:*
> {code:sql}
> select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10;
> {code}
> *Expected results (given from Hive):*
> {noformat}
> OK
> O
> F
> NULL
> O
> O
> NULL
> O
> O
> NULL
> F
> {noformat}
> *Actual results:*
> {noformat}
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7381) Query to a map field returns nulls with hive native reader

2019-09-19 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7381:
---

Assignee: Igor Guzenko

> Query to a map field returns nulls with hive native reader
> --
>
> Key: DRILL-7381
> URL: https://issues.apache.org/jira/browse/DRILL-7381
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: customer_complex.zip
>
>
> *Query:*
> {code:sql}
> select t.c_nation.n_region.r_name from hive.customer_complex t limit 5
> {code}
> *Expected results:*
> {noformat}
> AFRICA
> MIDDLE EAST
> AMERICA
> MIDDLE EAST
> AMERICA
> {noformat}
> *Actual results:*
> {noformat}
> null
> null
> null
> null
> null
> {noformat}
> *Workaround:*
> {code:sql}
> set store.hive.optimize_scan_with_native_readers = false;
> {code}





[jira] [Updated] (DRILL-7381) Query to a map field returns nulls with hive native reader

2019-09-26 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7381:

Labels: ready-to-commit  (was: )

> Query to a map field returns nulls with hive native reader
> --
>
> Key: DRILL-7381
> URL: https://issues.apache.org/jira/browse/DRILL-7381
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Attachments: customer_complex.zip
>
>
> *Query:*
> {code:sql}
> select t.c_nation.n_region.r_name from hive.customer_complex t limit 5
> {code}
> *Expected results:*
> {noformat}
> AFRICA
> MIDDLE EAST
> AMERICA
> MIDDLE EAST
> AMERICA
> {noformat}
> *Actual results:*
> {noformat}
> null
> null
> null
> null
> null
> {noformat}
> *Workaround:*
> {code:sql}
> set store.hive.optimize_scan_with_native_readers = false;
> {code}





[jira] [Updated] (DRILL-7380) Query of a field inside of an array of structs returns null

2019-09-26 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7380:

Labels: ready-to-commit  (was: )

> Query of a field inside of an array of structs returns null
> ---
>
> Key: DRILL-7380
> URL: https://issues.apache.org/jira/browse/DRILL-7380
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Attachments: customer_complex.zip
>
>
> *Query:*
> {code:sql}
> select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10;
> {code}
> *Expected results (given from Hive):*
> {noformat}
> OK
> O
> F
> NULL
> O
> O
> NULL
> O
> O
> NULL
> F
> {noformat}
> *Actual results:*
> {noformat}
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> {noformat}




