[jira] [Created] (HIVE-15785) Add S3 support for druid storage handler

2017-02-01 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15785:
-

 Summary: Add S3 support for druid storage handler
 Key: HIVE-15785
 URL: https://issues.apache.org/jira/browse/HIVE-15785
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
 Fix For: 2.2.0


Add S3 support for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15809) Typo in the PostgreSQL database name for druid service

2017-02-03 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15809:
-

 Summary: Typo in the PostgreSQL database name for druid service
 Key: HIVE-15809
 URL: https://issues.apache.org/jira/browse/HIVE-15809
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Assignee: slim bouguerra
Priority: Trivial
 Fix For: 2.2.0






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15727) Add pre insert work to give storage handler the possibility to perform pre insert checking

2017-01-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15727:
-

 Summary: Add pre insert work to give storage handler the 
possibility to perform pre insert checking
 Key: HIVE-15727
 URL: https://issues.apache.org/jira/browse/HIVE-15727
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 2.2.0


Add a pre-insert work stage to give the storage handler the possibility to perform 
pre-insert checking. For instance, for the Druid storage handler this will block 
the INSERT INTO statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15951) Make sure base persist directory is unique and deleted

2017-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15951:
-

 Summary: Make sure base persist directory is unique and deleted
 Key: HIVE-15951
 URL: https://issues.apache.org/jira/browse/HIVE-15951
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Priority: Critical
 Fix For: 2.2.0


In some cases the base persist directory will contain old data or be shared 
between reducers on the same physical VM.
That will lead to job failures until the directory is cleaned.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16025) Where IN clause throws exception

2017-02-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16025:
-

 Summary: Where IN clause throws exception
 Key: HIVE-16025
 URL: https://issues.apache.org/jira/browse/HIVE-16025
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Priority: Critical


{code}
select * from login_druid where userid IN ("user1", "user2");
Exception in thread "main" java.lang.AssertionError: cannot translate filter: 
IN($1, _UTF-16LE'user1', _UTF-16LE'user2')
at 
org.apache.calcite.adapter.druid.DruidQuery$Translator.translateFilter(DruidQuery.java:886)
at 
org.apache.calcite.adapter.druid.DruidQuery$Translator.access$000(DruidQuery.java:786)
at 
org.apache.calcite.adapter.druid.DruidQuery.getQuery(DruidQuery.java:424)
at 
org.apache.calcite.adapter.druid.DruidQuery.deriveQuerySpec(DruidQuery.java:402)
at 
org.apache.calcite.adapter.druid.DruidQuery.getQuerySpec(DruidQuery.java:351)
at 
org.apache.calcite.adapter.druid.DruidQuery.deriveRowType(DruidQuery.java:271)
at 
org.apache.calcite.rel.AbstractRelNode.getRowType(AbstractRelNode.java:219)
at 
org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:343)
at 
org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
at 
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
at 
org.apache.calcite.adapter.druid.DruidRules$DruidFilterRule.onMatch(DruidRules.java:142)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:314)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:502)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:381)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:247)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:206)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:193)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1775)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1504)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1260)
at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1068)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1084)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:363)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11026)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:285)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16026) Generated query will timeout and/or kill the druid cluster.

2017-02-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16026:
-

 Summary: Generated query will timeout and/or kill the druid 
cluster.
 Key: HIVE-16026
 URL: https://issues.apache.org/jira/browse/HIVE-16026
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Grouping by `__time` and another dimension generates a query with granularity 
NONE over an interval from 1970 to 3000. This can kill the Druid cluster 
because the Druid group-by strategy creates a cursor for every millisecond, and 
there are a lot of milliseconds between 1970 and 3000. Such a query can instead 
be turned into a select, with the group by performed within Hive. This should 
only happen when we don't know the `__time` granularity.
{code}
explain select `__time`, userid from login_druid group by `__time`, userid
> ;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0","_col1"]
  TableScan [TS_0]

Output:["__time","userid"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_user_login\",\"granularity\":\"NONE\",\"dimensions\":[\"userid\"],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"dummy_agg\",\"fieldName\":\"dummy_agg\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15877:
-

 Summary: Upload dependency jars for druid storage handler
 Key: HIVE-15877
 URL: https://issues.apache.org/jira/browse/HIVE-15877
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-11-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15277:
-

 Summary: Teach Hive how to create/delete Druid segments 
 Key: HIVE-15277
 URL: https://issues.apache.org/jira/browse/HIVE-15277
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Assignee: slim bouguerra




We want to extend the DruidStorageHandler to support CTAS queries.
In this implementation Hive will generate the Druid segment files and insert the 
metadata to signal the handoff to Druid.

The syntax will be as follows:

CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "datasourcename")
AS <select query>;

This statement stores the results of the select query in a Druid datasource 
named 'datasourcename'. One of the columns of the query needs to be the time 
dimension, which is mandatory in Druid. In particular, we use the same 
convention that is used in Druid: there needs to be a column named 
'__time' in the result of the executed query, which will act as the time 
dimension column in Druid. Currently, the time dimension column needs to be a 
'timestamp' type column.
Metrics can be of type long, double, or float, while dimensions are strings. 
Keep in mind that Druid has a clear separation between dimensions and metrics; 
therefore, if you have a column in Hive that contains numbers and needs to be 
presented as a dimension, use the cast operator to cast it as string.
This initial implementation interacts with the Druid metadata storage to 
add/remove the table in Druid; the user needs to supply the metadata config as 
--hiveconf hive.druid.metadata.password=XXX --hiveconf 
hive.druid.metadata.username=druid --hiveconf 
hive.druid.metadata.uri=jdbc:mysql://host/druid
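
For illustration, a minimal sketch of such a CTAS, reusing table and column names 
that appear in later threads (the table names and datasource name are placeholders, 
not the final syntax):

{code}
-- Source Hive table (illustrative); num_i is cast to string in the CTAS below so
-- Druid indexes it as a dimension, while num_l stays numeric and becomes a metric.
CREATE TABLE login_hive (`timecolumn` timestamp, `userid` string, `num_i` int, `num_l` double);

CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "datasourcename")
AS
SELECT `timecolumn` AS `__time`, `userid`, CAST(`num_i` AS string) AS num_i, `num_l`
FROM login_hive;
{code}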



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15273) Http Client not configured correctly

2016-11-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15273:
-

 Summary: Http Client not configured correctly
 Key: HIVE-15273
 URL: https://issues.apache.org/jira/browse/HIVE-15273
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
Priority: Minor


The http client currently used by the druid-hive record reader is constructed 
with default values. The default values of numConnection and ReadTimeout are 
very small, which can lead to the following exception: "ERROR 
[2ee34a2b-c8a5-4748-ab91-db3621d2aa5c main] CliDriver: Failed with exception 
java.io.IOException:java.io.IOException: java.io.IOException: 
org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Channel disconnected"
The full stack can be found here: 
https://gist.github.com/b-slim/384ca6a96698f5b51ad9b171cff556a2
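
A hedged sketch of what tuning those values could look like at the session level; 
the property names below are assumptions introduced for illustration and are not 
confirmed by this thread:

{code}
-- Hypothetical session-level overrides for the record reader's http client
-- (the property names are assumptions, shown only to illustrate the intent):
SET hive.druid.http.numConnection=20;
SET hive.druid.http.read.timeout=PT5M;
{code}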



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15274) wrong results on the column __time

2016-11-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15274:
-

 Summary: wrong results on the column __time
 Key: HIVE-15274
 URL: https://issues.apache.org/jira/browse/HIVE-15274
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez
Priority: Minor


Issuing select * from table returns wrong values for the time column.
Expected results:
┌─────────────────────────────────────────┬────────────┬─────────┐
│ __time                                  │ dimension1 │ metric1 │
├─────────────────────────────────────────┼────────────┼─────────┤
│ Wed Dec 31 2014 16:00:00 GMT-0800 (PST) │ value1     │ 1       │
│ Wed Dec 31 2014 16:00:00 GMT-0800 (PST) │ value1.1   │ 1       │
│ Sun May 31 2015 19:00:00 GMT-0700 (PDT) │ value2     │ 20.5    │
│ Sun May 31 2015 19:00:00 GMT-0700 (PDT) │ value2.1   │ 32      │
└─────────────────────────────────────────┴────────────┴─────────┘

Returned results:

2014-12-31 19:00:00   value1     1.0
2014-12-31 19:00:00   value1.1   1.0
2014-12-31 19:00:00   value2     20.5
2014-12-31 19:00:00   value2.1   32.0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15393) Update Guava version

2016-12-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15393:
-

 Summary: Update Guava version
 Key: HIVE-15393
 URL: https://issues.apache.org/jira/browse/HIVE-15393
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Priority: Blocker


The Druid code base is using a newer version of Guava (16.0.1) that is not 
compatible with the current version used by Hive.
FYI, the Hadoop project is moving to Guava 18; not sure if it is better to move 
to Guava 18 or even 19.
https://issues.apache.org/jira/browse/HADOOP-10101



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2016-12-15 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15439:
-

 Summary: Support INSERT OVERWRITE for internal druid datasources.
 Key: HIVE-15439
 URL: https://issues.apache.org/jira/browse/HIVE-15439
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
Assignee: slim bouguerra


Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
In order to add this support we will need to add a new post-insert hook to 
update the Druid metadata. Creation of the segments will be the same as for CTAS.
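
A minimal sketch of the target statement, assuming a Druid-backed table such as 
the login_druid/login_hive pair used in later threads (names are illustrative):

{code}
-- Replaces the current contents of the Druid datasource backing login_druid
-- with the result of the select.
INSERT OVERWRITE TABLE login_druid
SELECT `timecolumn` AS `__time`, `userid`, `num_l`
FROM login_hive;
{code}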
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15571) Support Insert into for druid storage handler

2017-01-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15571:
-

 Summary: Support Insert into for druid storage handler
 Key: HIVE-15571
 URL: https://issues.apache.org/jira/browse/HIVE-15571
 Project: Hive
  Issue Type: New Feature
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15586) Make Insert and Create statement Transactional

2017-01-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-15586:
-

 Summary: Make Insert and Create statement Transactional
 Key: HIVE-15586
 URL: https://issues.apache.org/jira/browse/HIVE-15586
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently insert/create returns the handle to the user without waiting for the 
data to be loaded by the Druid cluster. In order to avoid that, we will add a 
passive wait until the segments are loaded by the historicals, in case the 
coordinator is up.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-16210) Use jvm temporary tmp dir by default

2017-03-14 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16210:
-

 Summary: Use jvm temporary tmp dir by default
 Key: HIVE-16210
 URL: https://issues.apache.org/jira/browse/HIVE-16210
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Instead of using "/tmp" by default, it makes more sense to use the JVM default 
tmp dir. Using "/tmp" can have dramatic consequences if the indexed files are 
huge. For instance, applications run in containers can be provisioned with a 
dedicated tmp dir.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16371) Add bitmap selection strategy for druid storage handler

2017-04-04 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16371:
-

 Summary: Add bitmap selection strategy for druid storage handler
 Key: HIVE-16371
 URL: https://issues.apache.org/jira/browse/HIVE-16371
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently only the Concise bitmap strategy is supported.
This PR makes Roaring bitmap encoding the default and keeps Concise as an option 
if needed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16404) Renaming of public classes in Calcite 12 breaking druid integration

2017-04-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16404:
-

 Summary: Renaming of public classes in Calcite 12 breaking druid 
integration
 Key: HIVE-16404
 URL: https://issues.apache.org/jira/browse/HIVE-16404
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: slim bouguerra
 Fix For: 3.0.0


The renaming of the Druid rules is backward incompatible with the current 
implementation.
https://github.com/apache/calcite/commit/a89c62cd6d6cc181c90881afa0bf099746739a91



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16482) Druid SerDe needs to use dimension output name in order to work with extraction functions

2017-04-19 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16482:
-

 Summary: Druid SerDe needs to use dimension output name in order 
to work with extraction functions
 Key: HIVE-16482
 URL: https://issues.apache.org/jira/browse/HIVE-16482
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


The Druid SerDe needs to use the dimension output name in order to work with 
extraction functions.
Some parts of the SerDe code use the method {code} 
DimensionSpec.getDimension(){code}, although when an extraction function is in 
play the name of the dimension is defined by 
{code}DimensionSpec.getOutputName(){code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16149) Druid query path fails when using LLAP mode

2017-03-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16149:
-

 Summary: Druid query path fails when using LLAP mode
 Key: HIVE-16149
 URL: https://issues.apache.org/jira/browse/HIVE-16149
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Ashutosh Chauhan


{code}
hive> select i_item_desc ,i_category ,i_class ,i_current_price ,i_item_id 
,sum(ss_ext_sales_price)
> as itemrevenue ,sum(ss_ext_sales_price)*100/sum(sum(ss_ext_sales_price)) 
over (partition by i_class) as revenueratio
> from tpcds_store_sales_sold_time_1000_day_all
> where  (i_category ='Jewelry' or  i_category = 'Sports' or i_category 
='Books') and `__time` >= cast('2001-01-12' as date) and `__time` <= 
cast('2001-02-11' as date)
> group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price 
order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio limit 10;
Query ID = sbouguerra_20170308131436_225330b7-1142-4e4e-a05a-46ef544c8ee8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id 
application_1488231257387_1862)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1                 llap        INITED      1          0        0        1       0       0
Reducer 2             llap        INITED      2          0        0        2       0       0
Reducer 3             llap        INITED      1          0        0        1       0       0
----------------------------------------------------------------------------------------------
VERTICES: 00/03  [>>----------------------------] 0%   ELAPSED TIME: 59.68 s
----------------------------------------------------------------------------------------------
Status: Failed
Dag received [DAG_TERMINATE, SERVICE_PLUGIN_ERROR] in RUNNING state.
Error reported by TaskScheduler [[2:LLAP]][SERVICE_UNAVAILABLE] No LLAP Daemons 
are running
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1488231257387_1862_3_02, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:1, Vertex 
vertex_1488231257387_1862_3_02 [Reducer 3] killed/failed due to:DAG_TERMINATED]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1488231257387_1862_3_01, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:2, Vertex 
vertex_1488231257387_1862_3_01 [Reducer 2] killed/failed due to:DAG_TERMINATED]
Vertex killed, vertexName=Map 1, vertexId=vertex_1488231257387_1862_3_00, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:1, Vertex 
vertex_1488231257387_1862_3_00 [Map 1] killed/failed due to:DAG_TERMINATED]
DAG did not succeed due to SERVICE_PLUGIN_ERROR. failedVertices:0 
killedVertices:3
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Dag received [DAG_TERMINATE, 
SERVICE_PLUGIN_ERROR] in RUNNING state.Error reported by TaskScheduler 
[[2:LLAP]][SERVICE_UNAVAILABLE] No LLAP Daemons are runningVertex killed, 
vertexName=Reducer 3, vertexId=vertex_1488231257387_1862_3_02, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not 
succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:1, Vertex 
vertex_1488231257387_1862_3_02 [Reducer 3] killed/failed due 
to:DAG_TERMINATED]Vertex killed, vertexName=Reducer 2, 
vertexId=vertex_1488231257387_1862_3_01, diagnostics=[Vertex received Kill 
while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, 
failedTasks:0 killedTasks:2, Vertex vertex_1488231257387_1862_3_01 [Reducer 2] 
killed/failed due to:DAG_TERMINATED]Vertex killed, vertexName=Map 1, 
vertexId=vertex_1488231257387_1862_3_00, diagnostics=[Vertex received Kill 
while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, 
failedTasks:0 killedTasks:1, Vertex vertex_1488231257387_1862_3_00 [Map 1] 
killed/failed due to:DAG_TERMINATED]DAG did not succeed due to 
SERVICE_PLUGIN_ERROR. failedVertices:0 killedVertices:3
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16124) Drop the segments data as soon as it is pushed to HDFS

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16124:
-

 Summary: Drop the segments data as soon as it is pushed to HDFS
 Key: HIVE-16124
 URL: https://issues.apache.org/jira/browse/HIVE-16124
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Drop the pushed segments from the indexer as soon as the HDFS push is done.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16123) Let user choose the granularity of bucketing.

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16123:
-

 Summary: Let user choose the granularity of bucketing.
 Key: HIVE-16123
 URL: https://issues.apache.org/jira/browse/HIVE-16123
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Currently we index the data with a granularity of NONE, which puts a lot of 
pressure on the indexer.
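
A sketch of what a user-chosen granularity could look like, using the 
segment/query granularity table properties that appear in later threads (values 
are illustrative):

{code}
-- One segment per day instead of NONE, rows rolled up to hourly buckets.
CREATE TABLE login_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_login_test",
               "druid.segment.granularity" = "DAY",
               "druid.query.granularity" = "HOUR")
AS SELECT `timecolumn` AS `__time`, `userid`, `num_l` FROM login_hive;
{code}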



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16122:
-

 Summary: NPE Hive Druid split introduced by HIVE-15928
 Key: HIVE-16122
 URL: https://issues.apache.org/jira/browse/HIVE-16122
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16126) push all the time extraction to druid

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16126:
-

 Summary: push all the time extraction to druid
 Key: HIVE-16126
 URL: https://issues.apache.org/jira/browse/HIVE-16126
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Currently we don't push most of the time extractions to Druid, which leads to 
selecting all the data, which is bad.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16125) Split work between reducers.

2017-03-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16125:
-

 Summary: Split work between reducers.
 Key: HIVE-16125
 URL: https://issues.apache.org/jira/browse/HIVE-16125
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Split work between reducers.
Currently we have one reducer per segment granularity, even if the interval will 
be partitioned over multiple partitions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16095) Filter generation is not taking into account the column type.

2017-03-02 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16095:
-

 Summary: Filter generation is not taking into account the column 
type.
 Key: HIVE-16095
 URL: https://issues.apache.org/jira/browse/HIVE-16095
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


We are supposed to get an alphanumeric comparison when there is a cast to a 
numeric type. This looks like a Calcite issue.
{code}
hive> explain select * from login_druid where userid < 2
> ;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0","_col1","_col2"]
  TableScan [TS_0]

Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"filter\":{\"type\":\"bound\",\"dimension\":\"userid\",\"upper\":\"2\",\"upperStrict\":true,\"alphaNumeric\":false},\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}

Time taken: 1.548 seconds, Fetched: 10 row(s)
hive> explain select * from login_druid where cast (userid as int) < 2;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0","_col1","_col2"]
  TableScan [TS_0]

Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"filter\":{\"type\":\"bound\",\"dimension\":\"userid\",\"upper\":\"2\",\"upperStrict\":true,\"alphaNumeric\":false},\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}

Time taken: 0.27 seconds, Fetched: 10 row(s)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16096) Predicate `__time` IN ("date", "date") is not pushed

2017-03-02 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16096:
-

 Summary: Predicate `__time` IN ("date", "date") is not pushed
 Key: HIVE-16096
 URL: https://issues.apache.org/jira/browse/HIVE-16096
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


{code}
 explain select * from login_druid where `__time` in ("2003-1-1", "2004-1-1" );
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_2]
  Output:["_col0","_col1","_col2"]
  Filter Operator [FIL_4]
predicate:(__time) IN ('2003-1-1', '2004-1-1')
TableScan [TS_0]
  
Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}

{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16519) Fix exception thrown by checkOutputSpecs

2017-04-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16519:
-

 Summary: Fix exception thrown by checkOutputSpecs
 Key: HIVE-16519
 URL: https://issues.apache.org/jira/browse/HIVE-16519
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Do not throw an exception from checkOutputSpecs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message

2017-08-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17302:
-

 Summary: ReduceRecordSource should not add batch string to 
Exception message
 Key: HIVE-17302
 URL: https://issues.apache.org/jira/browse/HIVE-17302
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


ReduceRecordSource is adding the batch data as a string to the exception stack; 
this can lead to an OOM of the query AM when the query fails due to some other issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez

2017-08-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17303:
-

 Summary: Mismatch between roaring bitmap library used by druid 
and the one coming from tez
 Key: HIVE-17303
 URL: https://issues.apache.org/jira/browse/HIVE-17303
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


{code}
Caused by: java.util.concurrent.ExecutionException: 
java.lang.NoSuchMethodError: 
org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
  at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
  at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
  at 
org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
  at 
org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165)
  ... 25 more
Caused by: java.lang.NoSuchMethodError: 
org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
  at 
org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65)
  at 
org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88)
  at 
org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348)
  at 
org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218)
  at 
org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438)
  at 
org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186)
  at 
org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152)
  at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996)
  at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93)
  at 
org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385)
  at 
org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44)
  ... 4 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] 
killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to 
VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17160:
-

 Summary: Adding kerberos Authorization to the Druid hive 
integration
 Key: HIVE-17160
 URL: https://issues.apache.org/jira/browse/HIVE-17160
 Project: Hive
  Issue Type: New Feature
  Components: Druid integration
Reporter: slim bouguerra


The goal of this feature is to allow Hive to query a secured Druid cluster 
using Kerberos credentials.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16522) Hive query timer is not keeping track of the fetch task execution

2017-04-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16522:
-

 Summary: Hive query timer is not keeping track of the fetch 
task execution
 Key: HIVE-16522
 URL: https://issues.apache.org/jira/browse/HIVE-16522
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently the Hive CLI query execution time does not include the fetch task 
execution time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-17372) update druid dependency to druid 0.10.1

2017-08-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17372:
-

 Summary: update druid dependency to druid 0.10.1
 Key: HIVE-17372
 URL: https://issues.apache.org/jira/browse/HIVE-17372
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Update to most recent druid version to be released August 23.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16816) Chained Group by support for druid.

2017-06-02 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16816:
-

 Summary: Chained Group by support for druid.
 Key: HIVE-16816
 URL: https://issues.apache.org/jira/browse/HIVE-16816
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra


This is more likely to be a Calcite enhancement, but I am logging it here to 
track it anyway.
Currently a query like {code} select count (distinct dim) from table {code} is 
pushed partially to Druid as a group by on dim, followed by a count executed by 
the Hive QE. This can be enhanced by using a nested (i.e. chained execution) 
group by query, where the first (inner) GB query groups by the key and the 
second (outer) query does the count.
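
In SQL terms, the chained execution corresponds to a rewrite along these lines 
(a sketch; `dim` and `tbl` are placeholders):

{code}
-- Original query, currently only partially pushed to Druid:
SELECT count(DISTINCT dim) FROM tbl;

-- Equivalent chained form: the inner group by deduplicates the key,
-- the outer query counts the resulting groups.
SELECT count(*) FROM (SELECT dim FROM tbl GROUP BY dim) t;
{code}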



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16588) Resource leak by druid http client

2017-05-04 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-16588:
-

 Summary: Resource leak by druid http client
 Key: HIVE-16588
 URL: https://issues.apache.org/jira/browse/HIVE-16588
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
 Fix For: 3.0.0


The current implementation of the Druid storage handler leaks some resources if 
the creation of the http client fails due to a too-many-open-files exception.
The reason this is leaking is that the cleaning hook is registered after the 
client starts.
In order to fix this we will extract the creation of the HTTP client so it 
becomes static and reusable, instead of being created per query.
 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-17581) Replace some calcite dependencies with native ones

2017-09-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17581:
-

 Summary: Replace some calcite dependencies with native ones
 Key: HIVE-17581
 URL: https://issues.apache.org/jira/browse/HIVE-17581
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra


This is a followup of HIVE-17468. This patch excludes some unwanted 
druid-calcite dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17582) Followup of HIVE-15708

2017-09-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17582:
-

 Summary: Followup of HIVE-15708
 Key: HIVE-17582
 URL: https://issues.apache.org/jira/browse/HIVE-17582
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-15708 (commit be59e024420ed5ca970e87a6dec402fecee21f06) introduced some 
unwanted bugs.
It changed the following code at 
org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat#169
{code}
  builder.intervals(Arrays.asList(DruidTable.DEFAULT_INTERVAL));
{code}
to
{code}
final List intervals = Arrays.asList();
builder.intervals(intervals);
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler

2017-09-06 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17468:
-

 Summary: Shade and package appropriate jackson version for druid 
storage handler
 Key: HIVE-17468
 URL: https://issues.apache.org/jira/browse/HIVE-17468
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
 Fix For: 3.0.0


Currently we are excluding all the Jackson core dependencies coming from Druid. 
This is wrong in my opinion, since it leads to packaging an unwanted Jackson 
library from other projects.
As you can see in the file hive-druid-deps.txt, Jackson core currently comes 
from Calcite and the version is 2.6.3, which is very different from the 2.4.6 
used by Druid. This patch excludes the unwanted jars and makes sure to bring in 
the Jackson dependency from Druid itself.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinite loop

2017-09-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17523:
-

 Summary: Insert into druid table hangs Hive server2 in an infinite 
loop
 Key: HIVE-17523
 URL: https://issues.apache.org/jira/browse/HIVE-17523
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


Inserting data via INSERT INTO a table backed by Druid can lead to a Hive server 
hang.
This is due to a bug in the naming of Druid segment partitions.
To reproduce the issue:
{code}
drop table login_hive;
create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
double);
insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);

insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);

insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);

insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);


drop table login_druid;
CREATE TABLE login_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
"druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
AS
select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
select * FROM login_druid;

insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
{code}

This patch unifies the logic of pushing and segment naming by using the Druid 
data segment pusher as much as possible.
This patch also includes some minor code refactoring and test enhancements.
 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17653) Druid storage handler CTAS with boolean type columns fails.

2017-09-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17653:
-

 Summary: Druid storage handler CTAS with boolean type columns 
fails. 
 Key: HIVE-17653
 URL: https://issues.apache.org/jira/browse/HIVE-17653
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
 Fix For: 3.0.0


Druid storage handler CTAS fails with the exception below when a boolean column 
is included.
A simple workaround is to add a cast to string over the boolean column; this 
leads to the column being indexed as a Druid dimension with the values `true` 
or `false`.
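
A sketch of that workaround, using the boolean column `bo` named in the error 
below (table and other column names are illustrative):

{code}
-- Cast the boolean column to string so it is indexed as a Druid dimension
-- with the values 'true' / 'false'.
CREATE TABLE druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
SELECT `__time`, CAST(`bo` AS string) AS `bo`, `metric1`
FROM source_table;
{code}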

{code}
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Reducer 3, 
vertexId=vertex_1506230948023_0005_9_02, diagnostics=[Task failed, 
taskId=task_1506230948023_0005_9_02_03, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_1506230948023_0005_9_02_03_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 2)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing vector batch (tag=0) (vectorizedVertexNum 2)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:406)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:248)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing vector batch (tag=0) (vectorizedVertexNum 2)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:492)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:397)
... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
Dimension bo does not have STRING type: BOOLEAN
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:564)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:664)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
at 
org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:479)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: Dimension bo does not have STRING type: BOOLEAN
at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:272)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:609)
...
{code}

[jira] [Created] (HIVE-17623) Fix Select query, fix Double column serde, and some refactoring

2017-09-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17623:
-

 Summary: Fix Select query, fix Double column serde, and some 
refactoring
 Key: HIVE-17623
 URL: https://issues.apache.org/jira/browse/HIVE-17623
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra


This PR has two fixes.
First, it fixes the limit on the results returned by the Select query, which 
used to be capped at 16K rows.
Second, it fixes the type inference for the double type newly added to Druid.
It uses Jackson polymorphism to infer types and parse results from the Druid 
nodes, and removes duplicate code from the RecordReaders.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17627) Use druid scan query instead of the select query.

2017-09-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17627:
-

 Summary: Use druid scan query instead of the select query.
 Key: HIVE-17627
 URL: https://issues.apache.org/jira/browse/HIVE-17627
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


The biggest difference between the select query and the scan query is that the 
scan query doesn't retain all rows in memory before rows can be returned to the 
client.
The select query causes memory pressure when too many rows are required; the 
scan query doesn't have this issue.
The scan query can also return all rows without issuing another pagination 
query, which is extremely useful when querying a historical or realtime node 
directly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18156) Provide smooth migration path for CTAS when time column is not with timezone

2017-11-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18156:
-

 Summary: Provide smooth migration path for CTAS when time column 
is not with timezone 
 Key: HIVE-18156
 URL: https://issues.apache.org/jira/browse/HIVE-18156
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


Currently, the default recommended CTAS and most legacy documentation do not 
specify that the __time column needs to be with a time zone. Thus the CTAS will 
fail with:
{code} 
2017-11-27T17:13:10,241 ERROR [e5f708c8-df4e-41a4-b8a1-d18ac13123d2 main] 
ql.Driver: FAILED: SemanticException No column with timestamp with local 
time-zone type on query result; one column should be of timestamp with local 
time-zone type
org.apache.hadoop.hive.ql.parse.SemanticException: No column with timestamp 
with local time-zone type on query result; one column should be of timestamp 
with local time-zone type
at 
org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionTimeGranularityOptimizer$SortedDynamicPartitionProc.getGranularitySelOp(SortedDynPartitionTimeGranularityOptimizer.java:242)
at 
org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionTimeGranularityOptimizer$SortedDynamicPartitionProc.process(SortedDynPartitionTimeGranularityOptimizer.java:163)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionTimeGranularityOptimizer.transform(SortedDynPartitionTimeGranularityOptimizer.java:103)
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:250)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11683)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:298)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:592)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1589)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1356)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1346)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:342)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1300)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1274)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
...
{code}
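
A sketch of the migration-friendly form, casting the plain timestamp to 
timestamp with local time zone in the CTAS select list (this mirrors the DDL 
used in HIVE-19474 below; table names are illustrative):

{code}
CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
SELECT CAST(`timecolumn` AS timestamp with local time zone) AS `__time`,
       `interval_marker`, `num_l`
FROM test_base_table;
{code}
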
[jira] [Created] (HIVE-18196) Druid Mini Cluster to run Qtests integrations tests.

2017-12-01 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18196:
-

 Summary: Druid Mini Cluster to run Qtests integrations tests.
 Key: HIVE-18196
 URL: https://issues.apache.org/jira/browse/HIVE-18196
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: Ashutosh Chauhan


The overall goal is to add a new module that can fork a Druid cluster to run 
integration tests as part of the Mini Clusters QTest suite.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18197) Fix issue with wrong segments identifier usage.

2017-12-01 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18197:
-

 Summary: Fix issue with wrong segments identifier usage.
 Key: HIVE-18197
 URL: https://issues.apache.org/jira/browse/HIVE-18197
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


We have two different issues that can make checking of the load status fail for 
Druid segments.
Both are due to the usage of the wrong segment identifier in a couple of locations.

# We are constructing the segment identifier with the UTC timezone, which can be 
wrong if the segments were built in a different timezone. The way to fix this is 
to use the segment identifier itself instead of re-making it on the client side.
# We are using outdated segment identifiers for the INSERT INTO case. The way 
to fix this is to use the segment metadata produced by the metadata commit 
phase.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18226) handle UDF to double/int over aggregate

2017-12-05 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18226:
-

 Summary: handle UDF to double/int over aggregate
 Key: HIVE-18226
 URL: https://issues.apache.org/jira/browse/HIVE-18226
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra


In cases like the following query, the Hive planner adds an extra UDFToDouble 
over integer columns.
This kind of UDF can be pushed to Druid as a doubleSum instead of a longSum, and 
vice versa.
{code}
PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
FROM druid_table GROUP BY floor_year(`__time`)
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
FROM druid_table GROUP BY floor_year(`__time`)
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: druid_table
properties:
  druid.query.json 
{"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
  druid.query.type timeseries
Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column 
stats: NONE
Select Operator
  expressions: __time (type: timestamp with local time zone), 
(UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL 
Column stats: NONE
table:
input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18254) Use proper AVG Calcite primitive instead of Other_FUNCTION

2017-12-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18254:
-

 Summary: Use proper AVG Calcite primitive instead of Other_FUNCTION
 Key: HIVE-18254
 URL: https://issues.apache.org/jira/browse/HIVE-18254
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


Currently the Hive-Calcite operator tree treats the AVG function as an unknown 
function with a Calcite SqlKind of OTHER_FUNCTION. This can get in the way of 
rules like 
{{org.apache.calcite.rel.rules.AggregateReduceFunctionsRule}}.
This patch adds the AVG function to the list of known aggregate functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17871) Add non nullability flag to druid time column

2017-10-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17871:
-

 Summary: Add non nullability flag to druid time column
 Key: HIVE-17871
 URL: https://issues.apache.org/jira/browse/HIVE-17871
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra


The Druid time column is always non-null.
Adding the non-nullability flag will enable extra Calcite goodness, like 
transforming 
{code} select count(`__time`) from table {code} into {code} select count(*) from 
table {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-19443) Issue with Druid timestamp with timezone handling

2018-05-07 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19443:
-

 Summary: Issue with Druid timestamp with timezone handling
 Key: HIVE-19443
 URL: https://issues.apache.org/jira/browse/HIVE-19443
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
 Attachments: test_resutls.out, test_timestamp.q

As you can see in the attached file [^test_resutls.out], when switching the 
current timezone to UTC the insert of values from a Hive table into a Druid 
table misses some rows.

You can use [^test_timestamp.q] to reproduce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19474:
-

 Summary: Decimal type should be casted as part of the CTAS or 
INSERT Clause.
 Key: HIVE-19474
 URL: https://issues.apache.org/jira/browse/HIVE-19474
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-18569 introduced a runtime config variable to allow indexing Decimal as Double. 
This leads to a messy state: the Hive metadata thinks the column is still decimal while 
it is actually stored as double. Since the Hive metadata of the column is Decimal, the 
logical optimizer will not push down aggregates. I tried to fix this by adding some 
logic to the application layer, but it makes the code very clumsy with a lot of 
branches. Instead, I propose to revert that patch and let the user introduce an explicit 
cast. This is better since the metadata then reflects the actual storage type, push down 
of aggregates kicks in, and no config is needed.

cc [~ashutoshc] and [~nishantbangarwa]

You can see the difference with the following DDL

{code}

create table test_base_table(`timecolumn` timestamp, `interval_marker` string, 
`num_l` DECIMAL(10,2));
insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5);
set hive.druid.approx.result=true;
CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, cast(`num_l` as double)
FROM test_base_table;
describe druid_test_table;
explain select sum(num_l), min(num_l) FROM druid_test_table;
CREATE TABLE druid_test_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, `num_l`
FROM test_base_table;
describe druid_test_table_2;
explain select sum(num_l), min(num_l) FROM druid_test_table_2;

{code}

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19490) Locking on Insert into for non native and managed tables.

2018-05-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19490:
-

 Summary: Locking on Insert into for non native and managed tables.
 Key: HIVE-19490
 URL: https://issues.apache.org/jira/browse/HIVE-19490
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra


Current state of the art: 

Managed non native tables, like Druid tables, need to acquire a lock on INSERT INTO or 
INSERT OVERWRITE. The nature of this lock is set to Exclusive by default for any non 
native table.

This implies that inserts into a Druid table will also block any read query during the 
execution of the insert. IMO this lock (on INSERT INTO) is not needed, since the insert 
statement is appending data and the state of loading it is managed partially by the 
Hive storage handler hook and partially by Druid.

What I am proposing is to relax the lock level to Shared for all non native tables on 
INSERT INTO operations, and keep it as Exclusive Write for INSERT OVERWRITE for now.

 

Any feedback is welcome.

cc [~ekoifman] / [~ashutoshc] / [~jdere] / [~hagleitn]

Also, I am not sure what the best way to unit test this is; currently I am using the 
debugger to check that the locks are what I expect. Please let me know if there is a 
better way to do this.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19441) Add support for float aggregator and use LLAP test Driver

2018-05-07 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19441:
-

 Summary: Add support for float aggregator and use LLAP test Driver
 Key: HIVE-19441
 URL: https://issues.apache.org/jira/browse/HIVE-19441
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Adding support for the float kind of aggregator.

Use LLAP as the test driver to reduce the execution time of the tests from about 2 hours 
to 15 min.

This patch also unveils an issue with timezones; maybe it is fixed by 
[~jcamachorodriguez]'s upcoming set of patches.

 

Before

{code}

[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ 
hive-it-qfile ---
[INFO] Compiling 21 source files to 
/Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile ---
[INFO]
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6,654.117 s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:51 h
[INFO] Finished at: 2018-05-04T12:43:19-07:00
[INFO] 

{code}

After

{code}

INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ 
hive-it-qfile ---
[INFO] Compiling 22 source files to 
/Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile ---
[INFO]
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 907.167 
s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 15:31 min
[INFO] Finished at: 2018-05-04T13:15:11-07:00
[INFO] 

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19462) Fix mapping for char_length function to enable pushdown to Druid.

2018-05-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19462:
-

 Summary: Fix mapping for char_length function to enable pushdown 
to Druid. 
 Key: HIVE-19462
 URL: https://issues.apache.org/jira/browse/HIVE-19462
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently char_length is not pushed down to Druid because of a missing mapping from/to 
Calcite.

This patch adds that mapping.
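
A sketch of a query that should benefit once the mapping is in place (the table and 
column names are just examples reused from the other issues in this list; whether the 
whole filter is pushed depends on the rest of the plan):

{code}
-- With the Calcite mapping added, char_length(cstring2) can be translated
-- into a Druid expression instead of being evaluated on the Hive side.
SELECT COUNT(*)
FROM druid_table
WHERE CHAR_LENGTH(cstring2) > 10;
{code}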

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19586:
-

 Summary: Optimize Count(distinct X) pushdown based on the storage 
capabilities 
 Key: HIVE-19586
 URL: https://issues.apache.org/jira/browse/HIVE-19586
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration, Logical Optimizer
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


h1. Goal
Provide a way to rewrite queries that combine COUNT(DISTINCT) and aggregates like SUM 
as a series of GROUP BYs.
This can be used to push down to Druid queries like
{code}
select count(DISTINCT interval_marker), count(distinct dim), sum(num_l) FROM 
druid_test_table GROUP BY `__time`, `zone`;
{code}
In general this is useful in cases where storage handlers cannot perform 
count(distinct column).
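
As a rough, hand-written sketch of the rewrite (not the exact plan the rule emits, and 
ignoring NULL-handling details), the query above can be answered by independently 
pushable group-bys stitched together with a join on the grouping keys:

{code}
SELECT t1.`__time`, t1.`zone`, t1.cnt_marker, t2.cnt_dim, t1.sum_num
FROM (SELECT `__time`, `zone`, COUNT(DISTINCT interval_marker) AS cnt_marker,
             SUM(num_l) AS sum_num
      FROM druid_test_table GROUP BY `__time`, `zone`) t1
JOIN (SELECT `__time`, `zone`, COUNT(DISTINCT dim) AS cnt_dim
      FROM druid_test_table GROUP BY `__time`, `zone`) t2
  ON t1.`__time` = t2.`__time` AND t1.`zone` = t2.`zone`;
{code}

Each branch now carries at most one distinct aggregate, which in turn can be expanded 
into a plain GROUP BY over (`__time`, `zone`, interval_marker) or (`__time`, `zone`, dim) 
followed by a count, i.e. something a Druid groupBy query can evaluate.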

h1. How to do it.
Use the Calcite rule 
{code}org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule{code}, which 
breaks down a count distinct into either a single Group By with Grouping Sets or 
multiple series of Group Bys that may be linked with joins if multiple counts are 
present.
FYI, today Hive has a similar rule, 
{code}org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule{code},
 but it only provides a rewrite to a Grouping Sets based plan.
I am planning to use the actual Calcite rule; [~ashutoshc], any concerns or caveats to 
be aware of?

h2. Concerns/questions
We need a way to switch between Grouping Sets and a simple chained Group By based on the 
plan cost. For instance, for a Druid based scan it always makes sense (at least today) 
to push down a series of Group Bys and stitch the result sets together in Hive later 
(as opposed to scanning everything).
But this might not be true for other storage handlers: if a handler can evaluate 
Grouping Sets, it is better to push down the Grouping Sets as one table scan.
I am still unsure how I can lean on the cost optimizer to select the best plan; 
[~ashutoshc]/[~jcamachorodriguez], any inputs?






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19684) Hive stats optimizer wrongly uses stats against non native tables

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19684:
-

 Summary: Hive stats optimizer wrongly uses stats against non 
native tables
 Key: HIVE-19684
 URL: https://issues.apache.org/jira/browse/HIVE-19684
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Stats of non native tables are inaccurate, thus queries over non native tables cannot 
be optimized by the stats optimizer.
Take the example of the query
{code}
Explain select count(*) from (select `__time` from druid_test_table limit 1) as 
src ;
{code} 

the plan will be reduced to 
{code}
POSTHOOK: query: explain extended select count(*) from (select `__time` from 
druid_test_table limit 1) as src
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: 1
  Processor Tree:
ListSink
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19672) Column Names mismatch between native Druid Tables and Hive External map

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19672:
-

 Summary: Column Names mismatch between native Druid Tables and 
Hive External map
 Key: HIVE-19672
 URL: https://issues.apache.org/jira/browse/HIVE-19672
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
 Fix For: 4.0.0


Druid column names are case sensitive while Hive is case insensitive.
This implies that any Druid datasource that has upper-case characters in a column name 
will not return the expected results.
One possible fix is to remap the column names before issuing the Json query 
to Druid.
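
A sketch of the failure mode (the datasource and column names here are hypothetical):

{code}
-- Hypothetical Druid datasource "wikipedia" with a camel-cased column "userId".
CREATE EXTERNAL TABLE druid_wiki
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "wikipedia");

-- Hive lower-cases the identifier to "userid"; the Json query sent to Druid then
-- references a column the datasource does not have, so the expected values never
-- come back.
SELECT userId, COUNT(*) FROM druid_wiki GROUP BY userId;
{code}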



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19674) Group by Decimal Constants push down to Druid tables.

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19674:
-

 Summary: Group by Decimal Constants push down to Druid tables.
 Key: HIVE-19674
 URL: https://issues.apache.org/jira/browse/HIVE-19674
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Queries like the following get generated by Tableau.
{code}
SELECT SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`
 FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY 1.1001;
{code}

The group key is pushed down to Druid as a constant column, which leads to an exception 
while parsing back the results, since the Druid input format does not allow decimals.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19675) Cast to timestamps on Druid time column leads to an exception

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19675:
-

 Summary: Cast to timestamps on Druid time column leads to an 
exception
 Key: HIVE-19675
 URL: https://issues.apache.org/jira/browse/HIVE-19675
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


The following query fails due to a formatting issue.
{code}
SELECT CAST(`ssb_druid_100`.`__time` AS TIMESTAMP) AS `x_time`,
. . . . . . . . . . . . . . . .>   SUM(`ssb_druid_100`.`lo_revenue`) AS 
`sum_lo_revenue_ok`
. . . . . . . . . . . . . . . .> FROM `druid_ssb`.`ssb_druid_100` 
`ssb_druid_100`
. . . . . . . . . . . . . . . .> GROUP BY CAST(`ssb_druid_100`.`__time` AS 
TIMESTAMP);
{code} 
Exception
{code} 
Error: java.io.IOException: java.lang.NumberFormatException: For input string: 
"1991-12-31 19:00:00" (state=,code=0)
{code}
[~jcamachorodriguez] maybe this is fixed by your upcoming patches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19695) Year Month Day extraction functions need to add an implicit cast for column that are String types

2018-05-24 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19695:
-

 Summary: Year Month Day extraction functions need to add an 
implicit cast for column that are String types
 Key: HIVE-19695
 URL: https://issues.apache.org/jira/browse/HIVE-19695
 Project: Hive
  Issue Type: Bug
  Components: Druid integration, Query Planning
Affects Versions: 3.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.1.0


To avoid surprising/wrong results, the Hive query plan should add an explicit cast over 
non date/timestamp column types when the user tries to extract Year/Month/Hour etc.
This is an example of misleading results (a sketch of the explicit cast appears after 
the output below).
{code}
create table test_base_table(`timecolumn` timestamp, `date_c` string, 
`timestamp_c` string,  `metric_c` double);
insert into test_base_table values ('2015-03-08 00:00:00', '2015-03-10', 
'2015-03-08 00:00:00', 5.0);
CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS select
cast(`timecolumn` as timestamp with local time zone) as `__time`, `date_c`, 
`timestamp_c`, `metric_c` FROM test_base_table;
select
year(date_c), month(date_c),day(date_c), hour(date_c),
year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c)
from druid_test_table;
{code} 

will return the following wrong results:
{code}
PREHOOK: query: select
year(date_c), month(date_c),day(date_c), hour(date_c),
year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c)
from druid_test_table
PREHOOK: type: QUERY
PREHOOK: Input: default@druid_test_table
 A masked pattern was here 
POSTHOOK: query: select
year(date_c), month(date_c),day(date_c), hour(date_c),
year(timestamp_c), month(timestamp_c),day(timestamp_c), hour(timestamp_c)
from druid_test_table
POSTHOOK: type: QUERY
POSTHOOK: Input: default@druid_test_table
 A masked pattern was here 
1969	12	31	16	1969	12	31	16
{code}
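
Until the planner adds the cast itself, the workaround (and essentially what the patch 
makes implicit) is to cast the string columns to date/timestamp before extracting; a 
sketch against the table defined above:

{code}
SELECT year(CAST(date_c AS date)), month(CAST(date_c AS date)), day(CAST(date_c AS date)),
       year(CAST(timestamp_c AS timestamp)), month(CAST(timestamp_c AS timestamp)),
       day(CAST(timestamp_c AS timestamp)), hour(CAST(timestamp_c AS timestamp))
FROM druid_test_table;
{code}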



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19607) Pushing Aggregates on Top of Aggregates

2018-05-18 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19607:
-

 Summary: Pushing Aggregates on Top of Aggregates
 Key: HIVE-19607
 URL: https://issues.apache.org/jira/browse/HIVE-19607
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
 Fix For: 3.1.0


This plan shows an instance where the count aggregate can be pushed to Druid, which 
would eliminate the last-stage reducer.

{code}
+PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM 
druid_table
+PREHOOK: type: QUERY
+POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM 
druid_table
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+Tez
+ A masked pattern was here 
+  Edges:
+Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+ A masked pattern was here 
+  Vertices:
+Map 1
+Map Operator Tree:
+TableScan
+  alias: druid_table
+  properties:
+druid.fieldNames cstring2,$f1
+druid.fieldTypes string,double
+druid.query.json 
{"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
+druid.query.type groupBy
+  Statistics: Num rows: 9173 Data size: 1673472 Basic stats: 
COMPLETE Column stats: NONE
+  Select Operator
+expressions: cstring2 (type: string), $f1 (type: double)
+outputColumnNames: cstring2, $f1
+Statistics: Num rows: 9173 Data size: 1673472 Basic stats: 
COMPLETE Column stats: NONE
+Group By Operator
+  aggregations: count(cstring2), sum($f1)
+  mode: hash
+  outputColumnNames: _col0, _col1
+  Statistics: Num rows: 1 Data size: 208 Basic stats: 
COMPLETE Column stats: NONE
+  Reduce Output Operator
+sort order:
+Statistics: Num rows: 1 Data size: 208 Basic stats: 
COMPLETE Column stats: NONE
+value expressions: _col0 (type: bigint), _col1 (type: 
double)
+Reducer 2
+Reduce Operator Tree:
+  Group By Operator
+aggregations: count(VALUE._col0), sum(VALUE._col1)
+mode: mergepartial
+outputColumnNames: _col0, _col1
+Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE 
Column stats: NONE
+File Output Operator
+  compressed: false
+  Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE 
Column stats: NONE
+  table:
+  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
+  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19601) Unsupported Post join function

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19601:
-

 Summary: Unsupported Post join function
 Key: HIVE-19601
 URL: https://issues.apache.org/jira/browse/HIVE-19601
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra


As part of trying to use the Calcite rule 
{code}org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#JOIN{code}
I got the following Calcite plan:
{code}
2018-05-17T09:26:02,781 DEBUG [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] 
translator.PlanModifierForASTConv: Final plan after modifier
 HiveProject(_c0=[$1], _c1=[$2])
  HiveProject(zone=[$0], $f1=[$1], $f2=[$3])
HiveJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner], 
algorithm=[none], cost=[not available])
  HiveProject(zone=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count($1)])
  HiveProject(zone=[$0], interval_marker=[$1])
HiveAggregate(group=[{0, 1}])
  HiveProject(zone=[$3], interval_marker=[$1])
HiveTableScan(table=[[druid_test_dst.test_base_table]], 
table:alias=[test_base_table])
  HiveProject(zone=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count($1)])
  HiveProject(zone=[$0], dim=[$1])
HiveAggregate(group=[{0, 1}])
  HiveProject(zone=[$3], dim=[$4])
HiveTableScan(table=[[druid_test_dst.test_base_table]], 
table:alias=[test_base_table])
{code}
I ran into this issue:
{code} 
2018-05-17T09:26:02,876 ERROR [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] 
parse.CalcitePlanner: CBO failed, skipping CBO.
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid function 
'IS NOT DISTINCT FROM'
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1069)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1464)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{code}
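
For context, IS NOT DISTINCT FROM is null-safe equality; Hive already expresses the same 
thing with the <=> operator, so one possible direction is to translate the join 
condition accordingly. A hand-written sketch of the equivalence (t1/t2 are placeholders, 
not tables from this plan):

{code}
-- a IS NOT DISTINCT FROM b  is equivalent to the null-safe comparison below:
SELECT * FROM t1 JOIN t2 ON t1.zone <=> t2.zone;

-- or, spelled out without <=>:
--   (t1.zone = t2.zone) OR (t1.zone IS NULL AND t2.zone IS NULL)
{code}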



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19600) Hive and Calcite have different semantics for Grouping sets

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19600:
-

 Summary: Hive and Calcite have different semantics for Grouping 
sets
 Key: HIVE-19600
 URL: https://issues.apache.org/jira/browse/HIVE-19600
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
 Fix For: 3.1.0


h1. Issue:
I tried to use the Calcite rule 
{code}org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#AggregateExpandDistinctAggregatesRule(java.lang.Class, boolean, org.apache.calcite.tools.RelBuilderFactory){code}
 to replace the current rule used by Hive, 
{code}org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule#HiveExpandDistinctAggregatesRule{code},
but I got an exception when generating the Operator tree out of the Calcite plan.
This is the Calcite plan 
{code} 
HiveProject.HIVE.[](input=rel#50:HiveAggregate.HIVE.[](input=rel#48:HiveProject.HIVE.[](input=rel#44:HiveAggregate.HIVE.[](input=rel#38:HiveProject.HIVE.[](input=rel#0:HiveTableScan.HIVE.[]
(table=[druid_test_dst.test_base_table],table:alias=test_base_table)[false],$f0=$3,$f1=$1,$f2=$4),group={0,
 1, 2},groups=[{0, 1}, {0, 2}],$g=GROUPING($0, $1, 
$2)),$f0=$0,$f1=$1,$f2=$2,$g_1==($3, 1),$g_2==($3, 
2)),group={0},agg#0=count($1) FILTER $3,agg#1=count($2) FILTER 
$4),_o__c0=$1,_o__c1=$2)
{code}

This is the exception stack 
{code} 
2018-05-17T08:46:48,604 ERROR [649a61b0-d8c7-45d8-962d-b1d38397feb4 main] 
ql.Driver: FAILED: SemanticException Line 0:-1 Argument type mismatch 'zone': 
The first argument to grouping() must be an int/long. Got: STRING
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Argument type 
mismatch 'zone': The first argument to grouping() must be an int/long. Got: 
STRING
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1467)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:239)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:185)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12566)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12521)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4525)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4298)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10487)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10426)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11339)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11196)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11223)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11209)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:517)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12074)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:164)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:643)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1686)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1633)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1628)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at 

[jira] [Created] (HIVE-19615) Proper handling of is null and not is null predicate when pushed to Druid

2018-05-18 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19615:
-

 Summary: Proper handling of is null and not is null predicate when 
pushed to Druid
 Key: HIVE-19615
 URL: https://issues.apache.org/jira/browse/HIVE-19615
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


Recent development in Druid introduced new null-handling semantics 
[here|https://github.com/b-slim/druid/commit/219e77aeac9b07dc20dd9ab2dd537f3f17498346]

Based on those changes, we need to honor push down of expressions with IS NULL / 
IS NOT NULL predicates.
The proposed fix overrides the mapping of the Calcite function to the Druid expression 
to match the correct semantics.
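
A sketch of the kind of predicate affected (table and column names are just examples 
reused from other issues here); with the corrected mapping these filters should 
translate to Druid's isnull/notnull expressions:

{code}
SELECT COUNT(*) FROM druid_table WHERE cstring2 IS NULL;
SELECT COUNT(*) FROM druid_table WHERE cstring2 IS NOT NULL;
{code}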





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19680) Push down limit is not applied for Druid storage handler.

2018-05-23 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19680:
-

 Summary: Push down limit is not applied for Druid storage handler.
 Key: HIVE-19680
 URL: https://issues.apache.org/jira/browse/HIVE-19680
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


A query like
{code}
select `__time` from druid_test_table limit 1;
{code}
returns more than one row.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19869) Remove double formatting bug followup of HIVE-19382

2018-06-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19869:
-

 Summary: Remove double formatting bug followup of HIVE-19382
 Key: HIVE-19869
 URL: https://issues.apache.org/jira/browse/HIVE-19869
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-19382 has a minor bug that happens when users provide a custom format as part of 
the FROM_UNIXTIME function.
Here is an example query:
{code}
SELECT SUM(`ssb_druid_100`.`lo_revenue`) AS `sum_lo_revenue_ok`,
CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP)), 
'yyyy-MM-dd HH:00:00') AS TIMESTAMP) AS `thr___time_ok`
 FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(`ssb_druid_100`.`__time` AS 
TIMESTAMP)), 'yyyy-MM-dd HH:00:00') AS TIMESTAMP);
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19868) Extract support for float aggregator

2018-06-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19868:
-

 Summary: Extract support for float aggregator
 Key: HIVE-19868
 URL: https://issues.apache.org/jira/browse/HIVE-19868
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19923) Follow up of HIVE-19615, use UnaryFunction instead of prefix

2018-06-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19923:
-

 Summary: Follow up of HIVE-19615, use UnaryFunction instead of 
prefix
 Key: HIVE-19923
 URL: https://issues.apache.org/jira/browse/HIVE-19923
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra


The correct usage of the Druid isnull function is {code}isnull(exp){code}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19879) Remove unused calcite sql operator.

2018-06-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19879:
-

 Summary: Remove unused calcite sql operator.
 Key: HIVE-19879
 URL: https://issues.apache.org/jira/browse/HIVE-19879
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


HIVE-19796 introduced an unused SQL operator by mistake.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19721) Druid Storage handler throws exception when query has a Cast to Date

2018-05-26 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19721:
-

 Summary: Druid Storage handler throws exception when query has a 
Cast to Date 
 Key: HIVE-19721
 URL: https://issues.apache.org/jira/browse/HIVE-19721
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.1


{code}
SELECT CAST(`ssb_druid_100`.`__time` AS DATE) AS `x_time`,
SUM(`ssb_druid_100`.`metric_c`) AS `sum_lo_revenue_ok`
FROM `default`.`druid_test_table` `ssb_druid_100`
GROUP BY CAST(`ssb_druid_100`.`__time` AS DATE);
{code}

{code}
2018-05-26T06:54:56,570 DEBUG [HttpClient-Netty-Worker-5] 
client.NettyHttpClient: [POST http://localhost:8082/druid/v2/] Got chunk: 0B, 
last=true
2018-05-26T06:54:56,572 ERROR [1917f624-7b94-4990-9e3a-bbfff3656365 main] 
CliDriver: Failed with exception 
java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Unknown type: 
DATE
java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Unknown 
type: DATE
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:602)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2509)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1514)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1488)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
at 
org.apache.hadoop.hive.cli.TestMiniDruidLocalCliDriver.testCliDriver(TestMiniDruidLocalCliDriver.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runners.Suite.runChild(Suite.java:127)
at org.junit.runners.Suite.runChild(Suite.java:26)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:73)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 

[jira] [Created] (HIVE-19796) Push Down TRUNC Fn to Druid Storage Handler

2018-06-05 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19796:
-

 Summary: Push Down TRUNC Fn to Druid Storage Handler
 Key: HIVE-19796
 URL: https://issues.apache.org/jira/browse/HIVE-19796
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


Push down queries with the TRUNC date function, such as
{code}
SELECT SUM((`ssb_druid_100`.`discounted_price` * 
`ssb_druid_100`.`net_revenue`)) AS `sum_calculation_4998925219892510720_ok`,
  CAST(TRUNC(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP),'MM') AS DATE) AS 
`tmn___time_ok`
FROM `druid_ssb`.`ssb_druid_100` `ssb_druid_100`
GROUP BY CAST(TRUNC(CAST(`ssb_druid_100`.`__time` AS TIMESTAMP),'MM') AS DATE)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18573) Use proper Calcite operator instead of UDFs

2018-01-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18573:
-

 Summary: Use proper Calcite operator instead of UDFs
 Key: HIVE-18573
 URL: https://issues.apache.org/jira/browse/HIVE-18573
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: slim bouguerra


Currently, Hive is mostly using user-defined black-box SQL operators during query 
planning. It would be more beneficial to use proper Calcite operators.

Also, use a single name for the Extract operator instead of a different name for every 
unit; the same goes for the Floor function. This will allow unifying the treatment per 
operator.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18595) UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone

2018-01-31 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18595:
-

 Summary: UNIX_TIMESTAMP  UDF fails when type is Timestamp with 
local timezone
 Key: HIVE-18595
 URL: https://issues.apache.org/jira/browse/HIVE-18595
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


{code}

2018-01-31T12:59:45,464 ERROR [10e97c86-7f90-406b-a8fa-38be5d3529cc main] 
ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:456 Wrong arguments 
''yyyy-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
string/date/timestamp types
org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:456 Wrong arguments 
''yyyy-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
string/date/timestamp types
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394)
 at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11780)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBLogicalPlan(CalcitePlanner.java:3140)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4330)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
 at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343)
 at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331)
 at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305)
 at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
 at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
 at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 

[jira] [Created] (HIVE-18594) DATEDIFF UDF fails when type is timestamp with Local timezone.

2018-01-31 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18594:
-

 Summary: DATEDIFF UDF fails when type is timestamp with Local 
timezone.
 Key: HIVE-18594
 URL: https://issues.apache.org/jira/browse/HIVE-18594
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: slim bouguerra


{code}

2018-01-31T12:45:08,488 ERROR [9b5c5020-b1f5-4703-8c2e-bac4aa01a578 main] 
ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:88 Wrong arguments 
''2004-07-04'': DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE types as 
1-th argument, got TIMESTAMPLOCALTZ
org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:88 Wrong arguments 
''2004-07-04'': DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE types as 
1-th argument, got TIMESTAMPLOCALTZ
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394)
 at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235)
 at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11802)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:4005)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4336)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
 at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343)
 at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331)
 at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305)
 at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
 at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
 at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at 

[jira] [Created] (HIVE-18730) Use LLAP as execution engine for Druid mini Cluster Tests

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18730:
-

 Summary: Use LLAP as execution engine for Druid mini Cluster Tests
 Key: HIVE-18730
 URL: https://issues.apache.org/jira/browse/HIVE-18730
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


Currently, we are using local MR to run the mini cluster tests. It would be better to 
use an LLAP cluster or Tez.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18731) Add Documentations about this feature.

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18731:
-

 Summary: Add Documentations about this feature. 
 Key: HIVE-18731
 URL: https://issues.apache.org/jira/browse/HIVE-18731
 Project: Hive
  Issue Type: Sub-task
  Components: Druid integration
Reporter: slim bouguerra


We need to add basic docs about the new table properties and what they mean in 
practice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18732) Push order/limit to Druid historical when approximate results are allowed

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18732:
-

 Summary: Push order/limit to Druid historical when approximate 
results are allowed 
 Key: HIVE-18732
 URL: https://issues.apache.org/jira/browse/HIVE-18732
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra


Druid 0.11 allows forcing push down of Order By / Limit to the historicals using the 
query context flag {code}forcePushDownLimit{code}.
[http://druid.io/docs/latest/querying/groupbyquery.html]

As per the docs, this is a great optimization that can be used when approximate results 
are allowed.
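
A sketch of the target query shape, reusing the hive.druid.approx.result flag that 
already appears in these issues to signal that approximate results are acceptable 
(whether that exact flag ends up gating this optimization is an assumption):

{code}
set hive.druid.approx.result=true;

-- An ORDER BY ... LIMIT over a Druid-backed table; with forcePushDownLimit set in the
-- generated groupBy query context, the ordering and limit are applied on the
-- historicals instead of on the broker or in Hive.
SELECT `zone`, SUM(num_l) AS total
FROM druid_test_table
GROUP BY `zone`
ORDER BY total DESC
LIMIT 10;
{code}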



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18729) Druid Time column type

2018-02-16 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18729:
-

 Summary: Druid Time column type
 Key: HIVE-18729
 URL: https://issues.apache.org/jira/browse/HIVE-18729
 Project: Hive
  Issue Type: Task
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Jesus Camacho Rodriguez


I have talked offline with [~jcamachorodriguez] about this, and we agreed that the best 
way to go is to support both cases, where the Druid time column can be Timestamp or 
Timestamp with local time zone.

In fact, for Hive-Druid internal tables this makes perfect sense: since we have Hive 
metadata about the time column during the CTAS statement, we can handle both cases as 
we do for other types of storage, e.g. ORC.

For Druid external tables, we can have a default type and allow the user to override it 
via table properties.

CC [~ashutoshc] and [~nishantbangarwa].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18780) Improve schema discovery For Druid Storage Handler

2018-02-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18780:
-

 Summary: Improve schema discovery For Druid Storage Handler
 Key: HIVE-18780
 URL: https://issues.apache.org/jira/browse/HIVE-18780
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently, the Druid storage adapter issues a segment metadata query every time the 
query is of type Select or Scan. Not only that, but every input split (map) will then 
do the same, since it uses the same SerDe. This is very expensive and puts a lot of 
pressure on the Druid cluster. The way to fix this is to pass along the schema from the 
Calcite plan instead of serializing the query itself as part of the Hive query context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18331) Renew the Kerberos ticket used by Druid Query runner

2017-12-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18331:
-

 Summary: Renew the Kerberos ticket used by Druid Query runner
 Key: HIVE-18331
 URL: https://issues.apache.org/jira/browse/HIVE-18331
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


The Druid HTTP client has to renew the current user's Kerberos ticket when it is close 
to expiring.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-20375) Json SerDe ignoring the timestamp.formats property

2018-08-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20375:
-

 Summary: Json SerDe ignoring the timestamp.formats property
 Key: HIVE-20375
 URL: https://issues.apache.org/jira/browse/HIVE-20375
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: slim bouguerra


The Json SerDe is supposed to accept the "timestamp.formats" SerDe property to allow 
different timestamp formats; after a recent refactor I see that this is not working 
anymore.

Looking at the code I can see that the SerDe is not using the constructed parser with 
the added formats
https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L82

but is instead using a Converter
https://github.com/apache/hive/blob/1105ef3974d8a324637d3d35881a739af3aeb382/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonStructReader.java#L324

That converter is 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter.TimestampConverter

This converter does not have any knowledge about user formats whatsoever.
It uses the static converter 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils#getTimestampFromString
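
For reference, a sketch of a table declaration that exercises the property (the table 
name, columns and pattern are only examples):

{code}
CREATE TABLE json_events (ts timestamp, page string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
WITH SERDEPROPERTIES ("timestamp.formats" = "yyyy-MM-dd'T'HH:mm:ss")
STORED AS TEXTFILE;

-- With the property honored, a record such as {"ts":"2018-08-13T01:02:33","page":"x"}
-- should parse; after the refactor the custom pattern is ignored and parsing fails.
{code}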



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20376) Timestamp Timezone parser doesn't handle ISO formats "2013-08-31T01:02:33Z"

2018-08-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20376:
-

 Summary: Timestamp Timezone parser doesn't handle ISO formats 
"2013-08-31T01:02:33Z"
 Key: HIVE-20376
 URL: https://issues.apache.org/jira/browse/HIVE-20376
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


It would be nice to add ISO formats to the timezone utils parser so it can handle 
strings like "2013-08-31T01:02:33Z":
org.apache.hadoop.hive.common.type.TimestampTZUtil#parse(java.lang.String)
CC [~jcamachorodriguez] / [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20377) Hive Kafka Storage Handler

2018-08-13 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20377:
-

 Summary: Hive Kafka Storage Handler
 Key: HIVE-20377
 URL: https://issues.apache.org/jira/browse/HIVE-20377
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: slim bouguerra
Assignee: slim bouguerra


h1. Goal
* Read streaming data from a Kafka queue as an external table.
* Allow streaming navigation by pushing down filters on the Kafka record partition 
id, offset and timestamp.
* Insert streaming data from Kafka into an actual Hive internal table, using a CTAS 
statement.
h1. Example
h2. Create the external table
{code} 
CREATE EXTERNAL TABLE kafka_table (`timestamp` timestamp, page string, `user` 
string, language string, added int, deleted int, flags string, comment string, 
namespace string)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES 
("kafka.topic" = "wikipedia", 
"kafka.bootstrap.servers"="brokeraddress:9092",
"kafka.serde.class"="org.apache.hadoop.hive.serde2.JsonSerDe");
{code}
h2. Kafka Metadata
In order to keep track of Kafka records, the storage handler automatically adds the 
Kafka row metadata, e.g. partition id, record offset and record timestamp.
{code}
DESCRIBE EXTENDED kafka_table

timestamp   timestamp   from deserializer   
pagestring  from deserializer   
userstring  from deserializer   
languagestring  from deserializer   
country string  from deserializer   
continent   string  from deserializer   
namespace   string  from deserializer   
newpage boolean from deserializer   
unpatrolled boolean from deserializer   
anonymous   boolean from deserializer   
robot   boolean from deserializer   
added   int from deserializer   
deleted int from deserializer   
delta   bigint  from deserializer   
__partition int from deserializer   
__offsetbigint  from deserializer   
__timestamp bigint  from deserializer   

{code}

h2. Filter push down.
Newer Kafka consumers, 0.11.0 and higher, allow seeking on the stream based on a given 
offset. The proposed storage handler will be able to leverage such an API by pushing 
down filters over the metadata columns, namely __partition (int), __offset (long) and 
__timestamp (long).
For instance, a query like
{code} 
select `__offset` from kafka_table where (`__offset` < 10 and `__offset`>3 and 
`__partition` = 0) or (`__partition` = 0 and `__offset` < 105 and `__offset` > 
99) or (`__offset` = 109);
{code}
will result in a scan of partition 0 only, then read only the records between offsets 
4 and 109.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20426) Upload Druid Test Runner logs from Build Slaves

2018-08-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20426:
-

 Summary: Upload Druid Test Runner logs from Build Slaves
 Key: HIVE-20426
 URL: https://issues.apache.org/jira/browse/HIVE-20426
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: Vineet Garg


Currently only the Hive log is uploaded from "hive/itests/qtest/tmp/log/".
It would be very valuable if we could add the following Druid logs:
* coordinator.log
* broker.log
* historical.log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20427) Remove Druid Mock tests from CliDrive

2018-08-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20427:
-

 Summary: Remove Druid Mock tests from CliDrive 
 Key: HIVE-20427
 URL: https://issues.apache.org/jira/browse/HIVE-20427
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra


As per the comment 
https://issues.apache.org/jira/browse/HIVE-20425?focusedCommentId=16586272=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16586272
we do not need to run those mock Druid tests anymore, since 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver covers most of these cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20425) Use a custom range of port for embedded Derby used by Druid.

2018-08-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20425:
-

 Summary: Use a custom range of port for embedded Derby used by 
Druid.
 Key: HIVE-20425
 URL: https://issues.apache.org/jira/browse/HIVE-20425
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


It seems like a good amount of the flakiness of the Druid tests is due to port 
collisions between the Derby used by Hive and the one used by Druid.
The goal of this patch is to use a custom range, 60000 to 65535, and find the first 
available port to be used by the Druid Derby process.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20481) Add the Kafka Key record as part of the row.

2018-08-28 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20481:
-

 Summary: Add the Kafka Key record as part of the row.
 Key: HIVE-20481
 URL: https://issues.apache.org/jira/browse/HIVE-20481
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


Kafka records are keyed; in most cases this key is null or is used to route records to 
the same partition. This patch exposes the key as a binary column, 
{code}__record_key{code}.
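
A sketch of how the new column would be consumed (the cast is only needed because the 
key is exposed as binary):

{code}
SELECT `__partition`, `__offset`, CAST(`__record_key` AS string) AS record_key
FROM kafka_table
LIMIT 10;
{code}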




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20485) Test Storage Handler with Secured Kafka Cluster

2018-08-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20485:
-

 Summary: Test Storage Handler with Secured Kafka Cluster
 Key: HIVE-20485
 URL: https://issues.apache.org/jira/browse/HIVE-20485
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


We need to test this with a secured Kafka cluster:
* Kerberos
* SSL support



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20094) Update Druid to 0.12.1 version

2018-07-05 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20094:
-

 Summary: Update Druid to 0.12.1 version
 Key: HIVE-20094
 URL: https://issues.apache.org/jira/browse/HIVE-20094
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


As per the JIRA title.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18959) Avoid creating extra pool of threads within LLAP

2018-03-14 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18959:
-

 Summary: Avoid creating extra pool of threads within LLAP
 Key: HIVE-18959
 URL: https://issues.apache.org/jira/browse/HIVE-18959
 Project: Hive
  Issue Type: Task
  Components: Druid integration
 Environment: Kerberos Cluster
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


The current Druid Kerberos HTTP client uses an external single-threaded pool to handle 
authentication retries (e.g. when a cookie expires or on other transient auth issues). 

First, this is not buying us anything, since the whole Druid task is executed as one 
synchronous call.

Second, it can cause a major issue if an exception thrown from that pool ends up 
shutting down the LLAP main thread.

Thus, to fix this we should avoid using an external thread pool and handle retries 
synchronously, as in the sketch below.
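A minimal sketch of the synchronous retry idea; the {code}HttpCall{code}/{code}Response{code} 
types, method names, and retry policy are illustrative placeholders, not the actual Druid HTTP 
client API:
{code}
import java.io.IOException;

public final class SyncRetry {
  interface Response {}
  interface HttpCall { Response execute() throws IOException; }

  /** Retries the call on the caller's thread instead of handing it to a separate pool. */
  static Response executeWithRetry(HttpCall call, int maxAttempts) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.execute();                // runs on the caller's thread, no extra pool
      } catch (IOException transientAuthFailure) {
        lastFailure = transientAuthFailure;   // e.g. an expired auth cookie; retry in place
      }
    }
    throw lastFailure;
  }
}
{code}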



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19155) Day time saving cause Druid inserts to fail with org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping segments

2018-04-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19155:
-

 Summary: Day time saving cause Druid inserts to fail with 
org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping 
segments
 Key: HIVE-19155
 URL: https://issues.apache.org/jira/browse/HIVE-19155
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


If you try to insert data around the daylight saving time change, the query fails 
with the following exception:
{code}
2018-04-10T11:24:58,836 ERROR [065fdaa2-85f9-4e49-adaf-3dc14d51be90 main] 
exec.DDLTask: Failed
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hive.druid.io.druid.java.util.common.UOE: Cannot add overlapping 
segments [2015-03-08T05:00:00.000Z/2015-03-09T05:00:00.000Z and 
2015-03-09T04:00:00.000Z/2015-03-10T04:00:00.000Z] with the same version 
[2018-04-10T11:24:48.388-07:00]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:914) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:919) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4831) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:394) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2443) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2114) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1797) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1538) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1532) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:204) 
[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) 
[hive-cli-3.1.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1455) 
[hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1429) 
[hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177)
 [hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) 
[hive-it-util-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
 [test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_92]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_92]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_92]
{code}

You can reproduce this using the following DDL 
{code}
create database druid_test;
use druid_test;

create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float);

insert into test_table values ('2015-03-08 00:00:00', 'i1-start', 4);
insert into test_table values ('2015-03-08 23:59:59', 'i1-end', 1);

insert into test_table values ('2015-03-09 00:00:00', 'i2-start', 4);
insert into test_table values ('2015-03-09 23:59:59', 'i2-end', 1);

insert into test_table values ('2015-03-10 00:00:00', 'i3-start', 2);
insert into test_table values ('2015-03-10 23:59:59', 'i3-end', 2);

CREATE TABLE druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`userid`, `num_l` FROM test_table;
{code}

The fix is to always compute the Druid segment identifiers in UTC; a short sketch of why the local-time buckets overlap is below.
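A small, self-contained illustration (not Hive/Druid code) of the overlap, assuming the table's 
local time zone is America/New_York where DST starts on 2015-03-08: taking each local midnight 
plus a fixed 24 hours reproduces exactly the overlapping intervals from the error above, while 
bucketing the same instants by UTC day keeps the intervals disjoint.
{code}
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public final class DstBuckets {
  public static void main(String[] args) {
    ZoneId eastern = ZoneId.of("America/New_York");                      // DST starts 2015-03-08 here

    ZonedDateTime day1 = LocalDate.of(2015, 3, 8).atStartOfDay(eastern); // 05:00Z
    ZonedDateTime day2 = LocalDate.of(2015, 3, 9).atStartOfDay(eastern); // 04:00Z after DST

    // Local midnight plus a fixed 24 hours: the two "DAY" intervals overlap by one hour.
    System.out.println(day1.toInstant() + "/" + day1.toInstant().plus(Duration.ofDays(1)));
    // 2015-03-08T05:00:00Z/2015-03-09T05:00:00Z
    System.out.println(day2.toInstant() + "/" + day2.toInstant().plus(Duration.ofDays(1)));
    // 2015-03-09T04:00:00Z/2015-03-10T04:00:00Z  -> overlaps the previous interval

    // Bucketing the same instants by UTC day instead yields disjoint intervals.
    System.out.println(day1.toInstant().truncatedTo(ChronoUnit.DAYS));   // 2015-03-08T00:00:00Z
    System.out.println(day2.toInstant().truncatedTo(ChronoUnit.DAYS));   // 2015-03-09T00:00:00Z
  }
}
{code}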





[jira] [Created] (HIVE-19157) Assert that Insert into Druid Table it fails.

2018-04-10 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19157:
-

 Summary: Assert that Insert into Druid Table it fails. 
 Key: HIVE-19157
 URL: https://issues.apache.org/jira/browse/HIVE-19157
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


The usual workflow of loading data into Druid relies on the fact that HS2 is 
able to load the segment metadata from HDFS that is produced by the LLAP/Tez workers.
In some cases where HS2 is not able to perform `ls` on the HDFS path, the INSERT 
INTO query will return success and will not insert any data.
This bug was introduced in {code} 
org.apache.hadoop.hive.druid.DruidStorageHandlerUtils#getCreatedSegments{code} 
when we added the feature that allows creating empty tables.
{code}
 try {
  fss = fs.listStatus(taskDir);
} catch (FileNotFoundException e) {
  // This is a CREATE TABLE statement or query executed for CTAS/INSERT
  // did not produce any result. We do not need to do anything, this is
  // expected behavior.
  return publishedSegmentsBuilder.build();
}
{code}

I am still looking for the right way to fix this. [~jcamachorodriguez]/[~ashutoshc], any 
idea what the best way is to detect that this is an empty CREATE TABLE statement? 

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19187) Update Druid Storage Handler to Druid 0.12.0

2018-04-11 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19187:
-

 Summary: Update Druid Storage Handler to Druid 0.12.0
 Key: HIVE-19187
 URL: https://issues.apache.org/jira/browse/HIVE-19187
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.1.0


The currently used Druid version is 0.11.0.
This patch updates the Druid version to the most recent release, 0.12.0.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19239) Check for possible null timestamp fields during SerDe from Druid events

2018-04-18 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19239:
-

 Summary: Check for possible null timestamp fields during SerDe 
from Druid events
 Key: HIVE-19239
 URL: https://issues.apache.org/jira/browse/HIVE-19239
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently we do not check for possibly null timestamp fields when deserializing Druid events.

This might lead to an NPE.

This patch adds a check for that case; a hedged sketch of the idea is below.
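A minimal sketch of the null guard, assuming a generic event-to-row conversion; the event map, 
column name, and class are illustrative placeholders, not the actual DruidSerDe internals:
{code}
import java.sql.Timestamp;
import java.util.Map;

public final class TimestampGuard {
  /** Returns null instead of throwing an NPE when the event has no timestamp value. */
  static Timestamp readTimestamp(Map<String, Object> event, String timestampColumn) {
    Object value = event.get(timestampColumn);
    if (value == null) {
      return null;                                     // propagate null to the row
    }
    return new Timestamp(((Number) value).longValue()); // epoch millis -> SQL timestamp
  }
}
{code}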



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19298) Fix operator tree of CTAS for Druid Storage Handler

2018-04-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19298:
-

 Summary: Fix operator tree of CTAS for Druid Storage Handler
 Key: HIVE-19298
 URL: https://issues.apache.org/jira/browse/HIVE-19298
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.1.0


The current operator plan of CTAS for the Druid storage handler is broken when the user 
enables the property {code}hive.exec.parallel{code} (i.e. sets it to {code}true{code}); a reproduction sketch is below.
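A hedged reproduction sketch, reusing the CTAS pattern from the other issues in this thread; 
the table and source names are placeholders:
{code}
SET hive.exec.parallel=true;

CREATE TABLE druid_ctas_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
SELECT CAST(`timecolumn` AS timestamp with local time zone) AS `__time`,
       `userid`,
       `num_l`
FROM test_table;
{code}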



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19044) Duplicate field names within Druid Query Generated by Calcite plan

2018-03-25 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19044:
-

 Summary: Duplicate field names within Druid Query Generated by 
Calcite plan
 Key: HIVE-19044
 URL: https://issues.apache.org/jira/browse/HIVE-19044
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


This is the query plan; as you can see, "$f4" is duplicated (it is used both as an aggregation name and as a post-aggregation name).
{code}
PREHOOK: query: EXPLAIN SELECT Calcs.key AS none_key_nk,   SUM(Calcs.num0) AS 
temp_z_stdevp_num0___1723718801__0_,   COUNT(Calcs.num0) AS 
temp_z_stdevp_num0___2730138885__0_,   SUM((Calcs.num0 * Calcs.num0)) AS 
temp_z_stdevp_num0___4071133194__0_,   STDDEV_POP(Calcs.num0) AS stp_num0_ok 
FROM druid_tableau.calcs Calcs GROUP BY Calcs.key
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT Calcs.key AS none_key_nk,   SUM(Calcs.num0) AS 
temp_z_stdevp_num0___1723718801__0_,   COUNT(Calcs.num0) AS 
temp_z_stdevp_num0___2730138885__0_,   SUM((Calcs.num0 * Calcs.num0)) AS 
temp_z_stdevp_num0___4071133194__0_,   STDDEV_POP(Calcs.num0) AS stp_num0_ok 
FROM druid_tableau.calcs Calcs GROUP BY Calcs.key
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: calcs
  properties:
druid.fieldNames key,$f1,$f2,$f3,$f4
druid.fieldTypes string,double,bigint,double,double
druid.query.json 
{"queryType":"groupBy","dataSource":"druid_tableau.calcs","granularity":"all","dimensions":[{"type":"default","dimension":"key","outputName":"key","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"num0"},{"type":"filtered","filter":{"type":"not","field":{"type":"selector","dimension":"num0","value":null}},"aggregator":{"type":"count","name":"$f2","fieldName":"num0"}},{"type":"doubleSum","name":"$f3","expression":"(\"num0\"
 * \"num0\")"},{"type":"doubleSum","name":"$f4","expression":"(\"num0\" * 
\"num0\")"}],"postAggregations":[{"type":"expression","name":"$f4","expression":"pow(((\"$f4\"
 - ((\"$f1\" * \"$f1\") / \"$f2\")) / 
\"$f2\"),0.5)"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
druid.query.type groupBy
  Select Operator
expressions: key (type: string), $f1 (type: double), $f2 (type: 
bigint), $f3 (type: double), $f4 (type: double)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
ListSink
{code}
Table DDL 
{code}
create database druid_tableau;
use druid_tableau;
drop table if exists calcs;
create table calcs
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "MONTH",
  "druid.query.granularity" = "DAY")
AS SELECT
  cast(datetime0 as timestamp with local time zone) `__time`,
  key,
  str0, str1, str2, str3,
  date0, date1, date2, date3,
  time0, time1,
  datetime1,
  zzz,
  cast(bool0 as string) bool0,
  cast(bool1 as string) bool1,
  cast(bool2 as string) bool2,
  cast(bool3 as string) bool3,
  int0, int1, int2, int3,
  num0, num1, num2, num3, num4
from default.calcs_orc;
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19070) Add More Test To Druid Mini Cluster 200 Tableau kind queries.

2018-03-28 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19070:
-

 Summary: Add More Test To Druid Mini Cluster 200 Tableau kind 
queries.
 Key: HIVE-19070
 URL: https://issues.apache.org/jira/browse/HIVE-19070
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


In this patch I am adding 200 new Tableau-style queries that run over a new data set 
called calcs.
The data set is very small.
I have also consolidated 3 different tests to run as one test, which helps keep the 
execution time low.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19023) Druid storage Handler still using old select query when the CBO fails

2018-03-22 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19023:
-

 Summary: Druid storage Handler still using old select query when 
the CBO fails
 Key: HIVE-19023
 URL: https://issues.apache.org/jira/browse/HIVE-19023
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: slim bouguerra
Assignee: slim bouguerra


See the usage of {code} 
org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat#createSelectStarQuery{code};
this can be replaced by a Druid scan query, which is more efficient.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18993) Use Druid Expressions

2018-03-19 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18993:
-

 Summary: Use Druid Expressions
 Key: HIVE-18993
 URL: https://issues.apache.org/jira/browse/HIVE-18993
 Project: Hive
  Issue Type: Task
Reporter: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19011) Druid Storage Handler returns conflicting results for Qtest druidmini_dynamic_partition.q

2018-03-21 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19011:
-

 Summary: Druid Storage Handler returns conflicting results for 
Qtest druidmini_dynamic_partition.q
 Key: HIVE-19011
 URL: https://issues.apache.org/jira/browse/HIVE-19011
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


This git diff shows the conflicting results
{code}
diff --git 
a/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out 
b/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out
index 714778ebfc..cea9b7535c 100644
--- a/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out
+++ b/ql/src/test/results/clientpositive/druid/druidmini_dynamic_partition.q.out
@@ -243,7 +243,7 @@ POSTHOOK: query: SELECT  sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  4139540644  10992545287 165393120
+1408069801800  3272553822  10992545287 -648527473
 PREHOOK: query: SELECT  sum(cint), max(cbigint),  sum(cbigint), max(cint) FROM 
druid_partitioned_table_0
 PREHOOK: type: QUERY
 PREHOOK: Input: default@druid_partitioned_table_0
@@ -429,7 +429,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-2857395071862  4139540644  -1661313883124  885815256
+2857395071862  3728054572  -1661313883124  71894663
 PREHOOK: query: EXPLAIN INSERT OVERWRITE TABLE druid_partitioned_table
   SELECT cast (`ctimestamp1` as timestamp with local time zone) as `__time`,
 cstring1,
@@ -566,7 +566,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  7115092987  10992545287 1232243564
+1408069801800  4584782821  10992545287 -1808876374
 PREHOOK: query: SELECT  sum(cint), max(cbigint),  sum(cbigint), max(cint) FROM 
druid_partitioned_table_0
 PREHOOK: type: QUERY
 PREHOOK: Input: default@druid_partitioned_table_0
@@ -659,7 +659,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  7115092987  10992545287 1232243564
+1408069801800  4584782821  10992545287 -1808876374
 PREHOOK: query: EXPLAIN SELECT  sum(cint), max(cbigint),  sum(cbigint), 
max(cint)  FROM druid_max_size_partition
 PREHOOK: type: QUERY
 POSTHOOK: query: EXPLAIN SELECT  sum(cint), max(cbigint),  sum(cbigint), 
max(cint)  FROM druid_max_size_partition
@@ -758,7 +758,7 @@ POSTHOOK: query: SELECT sum(cint), max(cbigint),  
sum(cbigint), max(cint) FROM d
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@druid_partitioned_table
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1408069801800  7115092987  10992545287 1232243564
+1408069801800  4584782821  10992545287 -1808876374
 PREHOOK: query: DROP TABLE druid_partitioned_table_0
 PREHOOK: type: DROPTABLE
 PREHOOK: Input: default@druid_partitioned_table_0
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18996) SubString Druid convertor assuming that index is always constant literal value

2018-03-19 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-18996:
-

 Summary: SubString Druid convertor assuming that index is always 
constant literal value
 Key: HIVE-18996
 URL: https://issues.apache.org/jira/browse/HIVE-18996
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra


A query like the following
{code}
SELECT substring(namespace, CAST(deleted AS INT), 4)
FROM druid_table_1;
{code}
will fail with 
{code}
java.lang.AssertionError: not a literal: $13
at org.apache.calcite.rex.RexLiteral.findValue(RexLiteral.java:963)
at org.apache.calcite.rex.RexLiteral.findValue(RexLiteral.java:955)
at org.apache.calcite.rex.RexLiteral.intValue(RexLiteral.java:938)
at 
org.apache.calcite.adapter.druid.SubstringOperatorConversion.toDruidExpression(SubstringOperatorConversion.java:46)
at 
org.apache.calcite.adapter.druid.DruidExpressions.toDruidExpression(DruidExpressions.java:120)
at 
org.apache.calcite.adapter.druid.DruidQuery.computeProjectAsScan(DruidQuery.java:746)
at 
org.apache.calcite.adapter.druid.DruidRules$DruidProjectRule.onMatch(DruidRules.java:308)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
{code}

because it assumes that the index is always a constant literal. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20735) Address some of the review comments.

2018-10-12 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20735:
-

 Summary: Address some of the review comments.
 Key: HIVE-20735
 URL: https://issues.apache.org/jira/browse/HIVE-20735
 Project: Hive
  Issue Type: Sub-task
  Components: kafka integration
Reporter: slim bouguerra
Assignee: slim bouguerra


As part of the review comments we agreed to:
# remove the start and end offsets columns
# remove the best-effort mode
# make two-phase commit (2PC) the default protocol for exactly-once semantics (EOS)

Also, this patch will include an additional enhancement to add Kerberos support.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

