[jira] [Commented] (HIVE-15442) Driver.java has redundant code

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753737#comment-15753737
 ] 

Hive QA commented on HIVE-15442:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843543/HIVE-15442.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10818 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2605/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2605/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2605/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843543 - PreCommit-HIVE-Build

> Driver.java has redundant code
> --
>
> Key: HIVE-15442
> URL: https://issues.apache.org/jira/browse/HIVE-15442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Gold huang
>Assignee: Gold huang
>Priority: Minor
> Attachments: HIVE-15442.patch
>
>
> Driver.java has redundant code; I think the third if statement could be 
> removed.
> if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> String explainOutput = getExplainOutput(sem, plan, tree.dump());
> if (explainOutput != null) {
>   if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> LOG.info("EXPLAIN output for queryid " + queryId + " : " + 
> explainOutput);
>   }
>   if (conf.isWebUiQueryInfoCacheEnabled()) {
> queryDisplay.setExplainPlan(explainOutput);
>   }
> }
>   }
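
For illustration, a minimal sketch of the same block with the duplicate inner check dropped (this assumes the surrounding Driver.java context and the variable names quoted above; the attached patch may differ):

{code}
if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
  String explainOutput = getExplainOutput(sem, plan, tree.dump());
  if (explainOutput != null) {
    // The outer check already guarantees HIVE_LOG_EXPLAIN_OUTPUT is true,
    // so the inner re-test is unnecessary.
    LOG.info("EXPLAIN output for queryid " + queryId + " : " + explainOutput);
    if (conf.isWebUiQueryInfoCacheEnabled()) {
      queryDisplay.setExplainPlan(explainOutput);
    }
  }
}
{code}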



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753663#comment-15753663
 ] 

Hive QA commented on HIVE-15192:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843522/HIVE-15192.10.patch

{color:green}SUCCESS:{color} +1 due to 10 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10819 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=92)
org.apache.hive.hcatalog.api.TestHCatClientNotification.createTable 
(batchId=220)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2604/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2604/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2604/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843522 - PreCommit-HIVE-Build

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into SEMI JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before the logical plan is generated. 
> These transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations cannot handle many kinds of subqueries; as a result, 
> Hive imposes various restrictions on the queries it can handle, e.g. Hive 
> disallows nested subqueries. All current restrictions are detailed in the 
> document linked above.
> This patch is the first phase of getting rid of these transformations and 
> leveraging Calcite's functionality to plan such queries. 
> Subsequent phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (the LHS 
> in a subquery must have all its column references qualified).
> Known issues with this patch:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>  * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix it and re-enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753561#comment-15753561
 ] 

Hive QA commented on HIVE-15441:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843512/HIVE-15441.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10773 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=112)

[vectorization_16.q,load_dyn_part5.q,join_casesensitive.q,transform_ppr2.q,join23.q,groupby7_map_skew.q,ppd_outer_join5.q,create_merge_compressed.q,louter_join_ppr.q,sample9.q,smb_mapjoin_16.q,vectorization_not.q,having.q,ppd_outer_join1.q,union_remove_12.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[schemeAuthority]
 (batchId=160)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2603/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2603/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2603/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843512 - PreCommit-HIVE-Build

> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries that may need to scan 
> thousands of partitions or more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}}, where 
> it needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> be useful, but {{getInputSummary}} still holds a global lock. In this case, it 
> makes sense to provide the Hive admin with a config that puts a timeout limit 
> on compilation, so that these "bad" queries can be stopped.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to 
> address a similar issue. However, it cancels the queries that are waiting for 
> the compile lock, which I think is not so useful for our case, since the 
> *query under compile is the one to blame.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

2016-12-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753530#comment-15753530
 ] 

Rajesh Balamohan commented on HIVE-15339:
-

Thanks [~jcamachorodriguez]. I agree that it would be better to have this in 
HiveRelFieldTrimmer. Will revise the patch and share it.
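
As a sketch of the batching idea from the description below, the referenced columns could be fetched in one metastore call. {{BatchedStatsFetcher}} is a hypothetical helper for illustration only; the actual patch would integrate with the planner rather than call the client directly:

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;

// Hypothetical helper: fetch statistics for all referenced columns of a table
// in one remote call, instead of one metastore round trip per column.
public final class BatchedStatsFetcher {
  public static Map<String, ColumnStatisticsObj> fetchColumnStats(
      IMetaStoreClient client, String dbName, String tableName,
      List<String> referencedColumns) throws Exception {
    // Single remote call covering the whole column set.
    List<ColumnStatisticsObj> stats =
        client.getTableColumnStatistics(dbName, tableName, referencedColumns);
    Map<String, ColumnStatisticsObj> byColumn = new HashMap<>();
    for (ColumnStatisticsObj s : stats) {
      byColumn.put(s.getColName(), s);
    }
    return byColumn; // the selectivity estimator can then look up columns locally
  }
}
{code}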

> Batch metastore calls to get column stats for fields needed in 
> FilterSelectivityEstimator
> -
>
> Key: HIVE-15339
> URL: https://issues.apache.org/jira/browse/HIVE-15339
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch
>
>
> Depending on the query pattern, {{FilterSelectivityEstimator}} gets column 
> statistics from the metastore in multiple calls. For instance, in the 
> following query, it ends up getting individual column statistics for 
> {{flights}} multiple times.
> When the table has a large number of partitions, getting column statistics 
> via multiple calls can be very expensive. This would adversely impact the 
> overall compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = 
> `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = 
> `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may be helpful to club together all columns that need statistics and fetch 
> these details in a single remote call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

2016-12-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753530#comment-15753530
 ] 

Rajesh Balamohan edited comment on HIVE-15339 at 12/16/16 5:38 AM:
---

Thanks [~jcamachorodriguez]. I agree that it would be better to have this in 
HiveRelFieldTrimmer. I will revise the patch and share it.


was (Author: rajesh.balamohan):
Thanks [~jcamachorodriguez]. I agree that it would be better to have this in 
HiveRelFieldTrimmer. Will revise the patch and share it.

> Batch metastore calls to get column stats for fields needed in 
> FilterSelectivityEstimator
> -
>
> Key: HIVE-15339
> URL: https://issues.apache.org/jira/browse/HIVE-15339
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch
>
>
> Depending on the query pattern, {{FilterSelectivityEstimator}} gets column 
> statistics from the metastore in multiple calls. For instance, in the 
> following query, it ends up getting individual column statistics for 
> {{flights}} multiple times.
> When the table has a large number of partitions, getting column statistics 
> via multiple calls can be very expensive. This would adversely impact the 
> overall compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = 
> `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = 
> `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may be helpful to club together all columns that need statistics and fetch 
> these details in a single remote call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6425) Unable to create external table with 3000+ columns

2016-12-15 Thread Sangita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753400#comment-15753400
 ] 

Sangita commented on HIVE-6425:
---

I was facing the same issue. It was resolved by making the following change in 
the Hive metastore DB.

-- log into Hive Metastore DB

-- >alter table SERDE_PARAMS MODIFY PARAM_VALUE VARCHAR(4);

https://community.hortonworks.com/questions/33311/number-column-limitations-in-hive-over-hbase-table.html


> Unable to create external table with 3000+ columns
> --
>
> Key: HIVE-6425
> URL: https://issues.apache.org/jira/browse/HIVE-6425
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
> Environment: Linux, CDH 4.2.0
>Reporter: Anurag
>  Labels: patch
> Attachments: Hive_Script.txt
>
>
> While creating an external table in Hive over an HBase table with 3000+ 
> columns, Hive shows an error:
> FAILED: Error in metadata: 
> MetaException(message:javax.jdo.JDODataStoreException: Put request failed : 
> INSERT INTO "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES 
> (?,?,?)
> NestedThrowables:
> org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO 
> "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES (?,?,?) )
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15443) There is a table stored as orc format, it contains a column with the type of array. When each cell of this column contains tens of strings, the queries reported Ar

2016-12-15 Thread mortalee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mortalee updated HIVE-15443:

Attachment: orc_array.patch

> There is a table stored in ORC format that contains a column of array type. 
> When each cell of this column contains tens of strings, the 
> queries reported ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-15443
> URL: https://issues.apache.org/jira/browse/HIVE-15443
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
> Environment: centos+hive2.1.0+hadoop2.7.2
>Reporter: mortalee
>  Labels: patch
> Attachments: orc_array.patch
>
>
> java.lang.Exception: java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:230)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:106)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:228)
>   ... 12 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
>   at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
>   at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
>   at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
>   at 
> 

[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-12-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Attachment: HIVE-15147.04.WIP.noout.patch

This makes splitting work for anything using LineRecordReader; however, a hack 
is involved. The cache needs to be adjusted to account for torn rows instead.


> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15147.01.WIP.noout.patch, 
> HIVE-15147.02.WIP.noout.patch, HIVE-15147.04.WIP.noout.patch, 
> HIVE-15147.WIP.noout.patch
>
>
> The primary goal for the first pass is caching text files. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will transform data into 
> columnar ORC.
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor compression codec, such as csv. Using the original 
> IF and serde, as well as an ORC writer (with some heavyweight optimizations 
> disabled, potentially), we can "uncompress" the csv/whatever data into its 
> "original" ORC representation, then cache it efficiently, by column, and also 
> reuse a lot of the existing code.
> Various other points:
> 1) Caching granularity will have to be somehow determined (i.e. how do we 
> slice the file horizontally, to avoid caching entire columns). As with ORC 
> uncompressed files, the specific offsets don't really matter as long as they 
> are consistent between reads. The problem is that the file offsets will 
> actually need to be propagated to the new reader from the original 
> inputformat. Row counts are easier to use but there's a problem of how to 
> actually map them to missing ranges to read from disk.
> 2) Obviously, for row-based formats, if any one column that is to be read has 
> been evicted or is otherwise missing, "all the columns" have to be read for 
> the corresponding slice to cache and read that one column. The vague plan is 
> to handle this implicitly, similarly to how ORC reader handles CB-RG overlaps 
> - it will just so happen that a missing column in disk range list to retrieve 
> will expand the disk-range-to-read into the whole horizontal slice of the 
> file.
> 3) Granularity/etc. won't work for gzipped text. If anything at all is 
> evicted, the entire file has to be re-read. Gzipped text is a ridiculous 
> feature, so this is by design.
> 4) In future, it would be possible to also build some form or 
> metadata/indexes for this cached data to do PPD, etc. This is out of the 
> scope for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14688) Hive drop call fails in presence of TDE

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753313#comment-15753313
 ] 

Hive QA commented on HIVE-14688:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843504/HIVE-14688.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10803 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=123)

[groupby_complex_types.q,multigroupby_singlemr.q,mapjoin_decimal.q,groupby7.q,join5.q,bucketmapjoin_negative2.q,vectorization_div0.q,union_script.q,add_part_multiple.q,limit_pushdown.q,union_remove_17.q,uniquejoin.q,metadata_only_queries_with_filters.q,union25.q,load_dyn_part13.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2602/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2602/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2602/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843504 - PreCommit-HIVE-Build

> Hive drop call fails in presence of TDE
> ---
>
> Key: HIVE-14688
> URL: https://issues.apache.org/jira/browse/HIVE-14688
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Deepesh Khandelwal
>Assignee: Wei Zheng
> Attachments: HIVE-14688.1.patch
>
>
> In Hadoop 2.8.0, TDE trash collection was fixed through HDFS-8831. This 
> enables us to make drop table calls for Hive managed tables where the Hive 
> metastore warehouse directory is in an encryption zone. However, even with 
> the feature in HDFS, Hive drop table currently fails:
> {noformat}
> $ hdfs crypto -listZones
> /apps/hive/warehouse  key2 
> $ hdfs dfs -ls /apps/hive/warehouse
> Found 1 items
> drwxrwxrwt   - hdfs hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> hive> create table abc(a string, b int);
> OK
> Time taken: 5.538 seconds
> hive> dfs -ls /apps/hive/warehouse;
> Found 2 items
> drwxrwxrwt   - hdfs   hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> drwxrwxrwx   - deepesh hdfs  0 2016-09-01 17:15 
> /apps/hive/warehouse/abc
> hive> drop table if exists abc;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> default.abc because it is in an encryption zone and trash is enabled.  Use 
> PURGE option to skip trash.)
> {noformat}
> The problem lies here:
> {code:title=metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java}
> private void checkTrashPurgeCombination(Path pathToData, String objectName, 
> boolean ifPurge)
> ...
>   if (trashEnabled) {
> try {
>   HadoopShims.HdfsEncryptionShim shim =
> 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(FileSystem.get(hiveConf),
>  hiveConf);
>   if (shim.isPathEncrypted(pathToData)) {
> throw new MetaException("Unable to drop " + objectName + " 
> because it is in an encryption zone" +
>   " and trash is enabled.  Use PURGE option to skip trash.");
>   }
> } catch (IOException ex) {
>   MetaException e = new MetaException(ex.getMessage());
>   e.initCause(ex);
>   throw e;
> }
>   }
> {code}
> As we 

[jira] [Updated] (HIVE-15442) Driver.java has redundant code

2016-12-15 Thread Gold huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gold huang updated HIVE-15442:
--
Status: Patch Available  (was: Open)

> Driver.java has redundant code
> --
>
> Key: HIVE-15442
> URL: https://issues.apache.org/jira/browse/HIVE-15442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Gold huang
>Assignee: Gold huang
>Priority: Minor
> Attachments: HIVE-15442.patch
>
>
> Driver.java has redundant code; I think the third if statement could be 
> removed.
> if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> String explainOutput = getExplainOutput(sem, plan, tree.dump());
> if (explainOutput != null) {
>   if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> LOG.info("EXPLAIN output for queryid " + queryId + " : " + 
> explainOutput);
>   }
>   if (conf.isWebUiQueryInfoCacheEnabled()) {
> queryDisplay.setExplainPlan(explainOutput);
>   }
> }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15442) Driver.java has redundant code

2016-12-15 Thread Gold huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gold huang updated HIVE-15442:
--
Status: Open  (was: Patch Available)

> Driver.java has redundant code
> --
>
> Key: HIVE-15442
> URL: https://issues.apache.org/jira/browse/HIVE-15442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Gold huang
>Assignee: Gold huang
>Priority: Minor
> Attachments: HIVE-15442.patch
>
>
> Driver.java has redundant code; I think the third if statement could be 
> removed.
> if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> String explainOutput = getExplainOutput(sem, plan, tree.dump());
> if (explainOutput != null) {
>   if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> LOG.info("EXPLAIN output for queryid " + queryId + " : " + 
> explainOutput);
>   }
>   if (conf.isWebUiQueryInfoCacheEnabled()) {
> queryDisplay.setExplainPlan(explainOutput);
>   }
> }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15442) Driver.java has redundant code

2016-12-15 Thread Gold huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gold huang updated HIVE-15442:
--
Attachment: HIVE-15442.patch

> Driver.java has redundant code
> --
>
> Key: HIVE-15442
> URL: https://issues.apache.org/jira/browse/HIVE-15442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Gold huang
>Assignee: Gold huang
>Priority: Minor
> Attachments: HIVE-15442.patch
>
>
> Driver.java has redundant code; I think the third if statement could be 
> removed.
> if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> String explainOutput = getExplainOutput(sem, plan, tree.dump());
> if (explainOutput != null) {
>   if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> LOG.info("EXPLAIN output for queryid " + queryId + " : " + 
> explainOutput);
>   }
>   if (conf.isWebUiQueryInfoCacheEnabled()) {
> queryDisplay.setExplainPlan(explainOutput);
>   }
> }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15442) Driver.java has redundant code

2016-12-15 Thread Gold huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gold huang updated HIVE-15442:
--
Attachment: (was: patch)

> Driver.java has redundant code
> --
>
> Key: HIVE-15442
> URL: https://issues.apache.org/jira/browse/HIVE-15442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Gold huang
>Assignee: Gold huang
>Priority: Minor
>
> Driver.java has redundant code; I think the third if statement could be 
> removed.
> if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> String explainOutput = getExplainOutput(sem, plan, tree.dump());
> if (explainOutput != null) {
>   if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> LOG.info("EXPLAIN output for queryid " + queryId + " : " + 
> explainOutput);
>   }
>   if (conf.isWebUiQueryInfoCacheEnabled()) {
> queryDisplay.setExplainPlan(explainOutput);
>   }
> }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15442) Driver.java has redundant code

2016-12-15 Thread Gold huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gold huang updated HIVE-15442:
--
Status: Patch Available  (was: Open)

> Driver.java has redundant code
> --
>
> Key: HIVE-15442
> URL: https://issues.apache.org/jira/browse/HIVE-15442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Gold huang
>Assignee: Gold huang
>Priority: Minor
> Attachments: patch
>
>
> Driver.java has redundant code; I think the third if statement could be 
> removed.
> if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> String explainOutput = getExplainOutput(sem, plan, tree.dump());
> if (explainOutput != null) {
>   if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> LOG.info("EXPLAIN output for queryid " + queryId + " : " + 
> explainOutput);
>   }
>   if (conf.isWebUiQueryInfoCacheEnabled()) {
> queryDisplay.setExplainPlan(explainOutput);
>   }
> }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15442) Driver.java has redundant code

2016-12-15 Thread Gold huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gold huang updated HIVE-15442:
--
Attachment: patch

> Driver.java has redundant code
> --
>
> Key: HIVE-15442
> URL: https://issues.apache.org/jira/browse/HIVE-15442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Gold huang
>Assignee: Gold huang
>Priority: Minor
> Attachments: patch
>
>
> Driver.java has redundant code; I think the third if statement could be 
> removed.
> if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> String explainOutput = getExplainOutput(sem, plan, tree.dump());
> if (explainOutput != null) {
>   if (conf.getBoolVar(ConfVars.HIVE_LOG_EXPLAIN_OUTPUT)) {
> LOG.info("EXPLAIN output for queryid " + queryId + " : " + 
> explainOutput);
>   }
>   if (conf.isWebUiQueryInfoCacheEnabled()) {
> queryDisplay.setExplainPlan(explainOutput);
>   }
> }
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-15 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753254#comment-15753254
 ] 

Chao Sun commented on HIVE-15441:
-

Thanks [~sershe] for taking a look. Regarding 1), yes, it doesn't have to - the 
only purpose is to quit the loop when {{shouldStop}} is true, but I guess I can 
just interrupt the thread in that case. Will change it.
Regarding 2), it won't kill the thread, just interrupt it. Yes, in CLI it's the 
main thread, while in HS2 it's the handler thread. In both cases it will 
recover. However, the error message is a little misleading, since it's wrapped 
with potentially many other exceptions on top of it. I'll see if we can improve 
the message.

I do have one concern about this patch, though: the interrupted exception could 
literally happen at ANY point during the compilation process, and I'm not sure 
it can be handled gracefully in all places. Any thoughts on that?
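
For reference, the interrupt-based timeout being discussed could look roughly like the sketch below. This is illustrative only: the class and method names are hypothetical, and as noted above, the hard part is handling the resulting InterruptedException wherever it surfaces during compilation:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog: interrupt the compiling thread once the configured
// timeout elapses, instead of waking up every second to poll shouldStop.
public final class CompileWatchdog {
  private static final ScheduledExecutorService TIMER =
      Executors.newSingleThreadScheduledExecutor();

  public static ScheduledFuture<?> arm(final Thread compilingThread, long timeoutMs) {
    return TIMER.schedule(new Runnable() {
      @Override
      public void run() {
        compilingThread.interrupt();
      }
    }, timeoutMs, TimeUnit.MILLISECONDS);
  }
}

// Usage sketch: cancel the watchdog if compilation finishes in time.
//   ScheduledFuture<?> w = CompileWatchdog.arm(Thread.currentThread(), timeoutMs);
//   try { compile(command); } finally { w.cancel(false); }
{code}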

> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries that may need to scan 
> thousands of partitions or more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}}, where 
> it needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> be useful, but {{getInputSummary}} still holds a global lock. In this case, it 
> makes sense to provide the Hive admin with a config that puts a timeout limit 
> on compilation, so that these "bad" queries can be stopped.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to 
> address a similar issue. However, it cancels the queries that are waiting for 
> the compile lock, which I think is not so useful for our case, since the 
> *query under compile is the one to blame.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15347) LLAP: Executor memory and Xmx should have some headroom for other services

2016-12-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15347:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

This was already committed; I forgot to close the JIRA.

> LLAP: Executor memory and Xmx should have some headroom for other services
> --
>
> Key: HIVE-15347
> URL: https://issues.apache.org/jira/browse/HIVE-15347
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15347.1.patch
>
>
> If executor memory + cache memory is configured close to or equal to Xmx, a 
> task attempt that causes an OOM can take down the LLAP daemon. Provide some 
> leeway for other services during a memory crunch. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753231#comment-15753231
 ] 

Anthony Hsu commented on HIVE-15438:


Thanks, Ashutosh!

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.2.0
>
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing using Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753219#comment-15753219
 ] 

Ashutosh Chauhan commented on HIVE-15192:
-

+1 pending tests

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into SEMI JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before the logical plan is generated. 
> These transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations cannot handle many kinds of subqueries; as a result, 
> Hive imposes various restrictions on the queries it can handle, e.g. Hive 
> disallows nested subqueries. All current restrictions are detailed in the 
> document linked above.
> This patch is the first phase of getting rid of these transformations and 
> leveraging Calcite's functionality to plan such queries. 
> Subsequent phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (the LHS 
> in a subquery must have all its column references qualified).
> Known issues with this patch:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>  * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix it and re-enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark

2016-12-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-15272:
-

Assignee: Rui Li

> "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
> --
>
> Key: HIVE-15272
> URL: https://issues.apache.org/jira/browse/HIVE-15272
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0, CentOS, Cloudera 5.7.4
>Reporter: Vikash Pareek
>Assignee: Rui Li
>
> I ran the following Hive query multiple times, with the execution engine set 
> to Hive on Spark and to Hive on MapReduce.
> {code}
> SELECT COUNT(DISTINCT t1.region, t1.amount)
> FROM my_db.my_table1 t1
> LEFT OUTER
> JOIN my_db.my_table2 t2 ON (t1.id = t2.id
> AND t1.name = t2.name)
> {code}
> With Hive on Spark: the result (count) was different on every execution.
> With Hive on MapReduce: the result (count) was the same on every execution.
> It seems Hive on Spark behaves differently on each execution and does not 
> populate the correct result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark

2016-12-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753212#comment-15753212
 ] 

Rui Li commented on HIVE-15272:
---

OK, I'll look into this.
[~VPareek], I think the two tables have the same DDL, right? Do they contain the 
same data? Could you upload some sample data that reproduces the issue? Thanks!

> "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
> --
>
> Key: HIVE-15272
> URL: https://issues.apache.org/jira/browse/HIVE-15272
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0, CentOS, Cloudera 5.7.4
>Reporter: Vikash Pareek
>
> I ran the following Hive query multiple times, with the execution engine set 
> to Hive on Spark and to Hive on MapReduce.
> {code}
> SELECT COUNT(DISTINCT t1.region, t1.amount)
> FROM my_db.my_table1 t1
> LEFT OUTER
> JOIN my_db.my_table2 t2 ON (t1.id = t2.id
> AND t1.name = t2.name)
> {code}
> With Hive on Spark: the result (count) was different on every execution.
> With Hive on MapReduce: the result (count) was the same on every execution.
> It seems Hive on Spark behaves differently on each execution and does not 
> populate the correct result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753204#comment-15753204
 ] 

Hive QA commented on HIVE-15200:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843489/HIVE-15200.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 41 failed/errored test(s), 10819 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_parse] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_cp] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[limit_pushdown_negative] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin6] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoinopt10] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_inline] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_lateralview] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_distinct_gby] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_navfn] 
(batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_windowing_2]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf_streaming]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_ptf]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=92)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[join_alt_syntax_comma_on]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[ptf_negative_JoinWithAmbigousAlias]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[right_side_join] 
(batchId=84)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf] (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf_streaming] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[skewjoinopt10] 
(batchId=103)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_lateralview] 
(batchId=105)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[windowing] 
(batchId=116)
org.apache.hadoop.hive.ql.parse.TestIUD.testSelectStarFromAnonymousVirtTable1Row
 (batchId=257)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2601/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2601/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2601/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 41 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843489 - PreCommit-HIVE-Build

> Support setOp in subQuery with parentheses
> 

[jira] [Commented] (HIVE-15428) HoS DPP doesn't remove cyclic dependency

2016-12-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753161#comment-15753161
 ] 

Rui Li commented on HIVE-15428:
---

Test failures are not related.

> HoS DPP doesn't remove cyclic dependency
> 
>
> Key: HIVE-15428
> URL: https://issues.apache.org/jira/browse/HIVE-15428
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15428.1.patch
>
>
> More details in HIVE-15357



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753143#comment-15753143
 ] 

Rui Li edited comment on HIVE-13278 at 12/16/16 1:57 AM:
-

Hi [~csun], sorry maybe I was being misleading. What I have in mind is 
something like this:
{code}
  // In Utilities::setMapWork
  public static Path setMapWork(Configuration conf, MapWork w, Path 
hiveScratchDir, boolean useCache) {
conf.setBoolean(HAS_MAP_WORK, true);
return setBaseWork(conf, w, hiveScratchDir, MAP_PLAN_NAME, useCache);
  }

  // In Utilities::getMapWork
  public static MapWork getMapWork(Configuration conf) {
if (!conf.getBoolean(HAS_MAP_WORK, false)) {
  return null;
}

{code}
Similar for set/get ReduceWork. So if we haven't called set work, we'll just 
get null when getting the work. Do you think it makes sense?


was (Author: lirui):
Hi [~csun], sorry maybe I was being misleading. What I have in mind is 
something like this:
{code}
  // In Utilities::setMapWork
  public static Path setMapWork(Configuration conf, MapWork w, Path 
hiveScratchDir, boolean useCache) {
conf.setBoolean(HAS_REDUCE_WORK, true);
return setBaseWork(conf, w, hiveScratchDir, MAP_PLAN_NAME, useCache);
  }

  // In Utilities::getMapWork
  public static MapWork getMapWork(Configuration conf) {
if (!conf.getBoolean(HAS_MAP_WORK, false)) {
  return null;
}

{code}
Similar for set/get ReduceWork. So if we haven't called set work, we'll just 
get null when getting the work. Do you think it makes sense?

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch, HIVE-13278.4.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> It certainly doesn't prevent the query from running successfully, so it is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753143#comment-15753143
 ] 

Rui Li commented on HIVE-13278:
---

Hi [~csun], sorry maybe I was being misleading. What I have in mind is 
something like this:
{code}
  // In Utilities::setMapWork
  public static Path setMapWork(Configuration conf, MapWork w, Path 
hiveScratchDir, boolean useCache) {
conf.setBoolean(HAS_MAP_WORK, true);
return setBaseWork(conf, w, hiveScratchDir, MAP_PLAN_NAME, useCache);
  }

  // In Utilities::getMapWork
  public static MapWork getMapWork(Configuration conf) {
if (!conf.getBoolean(HAS_MAP_WORK, false)) {
  return null;
}

{code}
The same applies to set/get ReduceWork: if set work was never called, getting the 
work simply returns null. Do you think it makes sense?
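
For completeness, the ReduceWork pair referred to above would look analogous (a 
sketch assuming a HAS_REDUCE_WORK flag and the existing REDUCE_PLAN_NAME and 
getBaseWork helpers):
{code}
  // In Utilities::setReduceWork -- mirror of the MapWork version above
  public static Path setReduceWork(Configuration conf, ReduceWork w, Path
hiveScratchDir, boolean useCache) {
    conf.setBoolean(HAS_REDUCE_WORK, true);
    return setBaseWork(conf, w, hiveScratchDir, REDUCE_PLAN_NAME, useCache);
  }

  // In Utilities::getReduceWork -- return null without touching the scratch dir
  // when the flag was never set, avoiding the FileNotFoundException
  public static ReduceWork getReduceWork(Configuration conf) {
    if (!conf.getBoolean(HAS_REDUCE_WORK, false)) {
      return null;
    }
    return (ReduceWork) getBaseWork(conf, REDUCE_PLAN_NAME);
  }
{code}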

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch, HIVE-13278.4.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked 
> as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753136#comment-15753136
 ] 

Sergey Shelukhin commented on HIVE-15441:
-

1) Is there a need to wake up every second? It could just sleep for the duration 
and check when awoken, to avoid spurious wake-ups. It's also nice to honor 
interrupts.
2) What is the thread that it actually kills? E.g. in the CLI, would it kill the 
main thread? And in HS2, would the handler it kills be recovered, and would the 
user receive a useful error message?
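
For illustration, a minimal sketch of the single-sleep monitor suggested in 1) 
(variable names are hypothetical; {{compilationDone}} would be an AtomicBoolean 
shared with the compiling thread):
{code}
Thread monitor = new Thread(() -> {
  try {
    Thread.sleep(timeoutMs);       // one sleep for the whole duration, no 1s polling
  } catch (InterruptedException ie) {
    return;                        // interrupted = compilation finished in time
  }
  if (!compilationDone.get()) {
    compilingThread.interrupt();   // see 2): which thread this interrupts matters
  }
});
monitor.setDaemon(true);
monitor.start();
// ... compile ...
compilationDone.set(true);
monitor.interrupt();               // cancel the monitor once compilation returns
{code}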

> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries which may need to scan 
> thousands of partitions or even more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}} where it 
> needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> be useful, but {{getInputSummary}} still holds a global lock. In this case, it 
> makes sense to provide Hive admins with a config that puts a timeout limit on 
> compilation, so that these "bad" queries can be stopped.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to address a 
> similar issue. However, it cancels the queries that are waiting for the 
> compile lock, which I think is not so useful for our case, since the *query 
> under compilation is the one to blame.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-15 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15192:
---
Status: Open  (was: Patch Available)

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into a SEMI JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before generating the logical plan. These 
> transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations cannot handle many subqueries; as a result, 
> Hive imposes various restrictions on the types of queries it can handle, e.g. 
> Hive disallows nested subqueries. All current restrictions are detailed in the 
> above linked document.
> This patch is the 1st phase of getting rid of these transformations and leveraging 
> Calcite's functionality to plan such queries. 
> The next phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (The LHS 
> in a SubQuery must have all its Column References be qualified).
> Known issues with this patch are:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>   * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix this and re-enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-15 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15192:
---
Status: Patch Available  (was: Open)

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into a SEMI JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before generating the logical plan. These 
> transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations cannot handle many subqueries; as a result, 
> Hive imposes various restrictions on the types of queries it can handle, e.g. 
> Hive disallows nested subqueries. All current restrictions are detailed in the 
> above linked document.
> This patch is the 1st phase of getting rid of these transformations and leveraging 
> Calcite's functionality to plan such queries. 
> The next phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (The LHS 
> in a SubQuery must have all its Column References be qualified).
> Known issues with this patch are:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>   * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix this and re-enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-15 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15192:
---
Attachment: HIVE-15192.10.patch

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into a SEMI JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before generating the logical plan. These 
> transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations cannot handle many subqueries; as a result, 
> Hive imposes various restrictions on the types of queries it can handle, e.g. 
> Hive disallows nested subqueries. All current restrictions are detailed in the 
> above linked document.
> This patch is the 1st phase of getting rid of these transformations and leveraging 
> Calcite's functionality to plan such queries. 
> The next phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (The LHS 
> in a SubQuery must have all its Column References be qualified).
> Known issues with this patch are:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>   * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix this and re-enable it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753119#comment-15753119
 ] 

Hive QA commented on HIVE-15376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843487/HIVE-15376.8.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10814 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2600/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2600/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2600/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843487 - PreCommit-HIVE-Build

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; an issue 
> may still remain, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-15 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753102#comment-15753102
 ] 

Matt McCline commented on HIVE-15335:
-

To be clear, I do not want to support 3rd parties writing vectorized UDFs on 
our current data structures without any design thought whatsoever being 
given to it.  I've always considered the vector classes to be internal, 
non-shared data structures.  Certainly not public APIs.

And clearly, an alternative to examine is to say that early versions of ORC 
using the old HiveDecimal are compatible with an early range of Hive, and 
newer ORC versions are only compatible with newer Hive versions.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753038#comment-15753038
 ] 

Lefty Leverenz edited comment on HIVE-15277 at 12/16/16 1:13 AM:
-

The new table property should be documented here as well as in the Druid 
Integration doc:

* [DDL -- Table Properties | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-listTableProperties]

Also document the new configuration parameters:

*  *hive.druid.indexer.segments.granularity*
*  *hive.druid.indexer.partition.size.max*
*  *hive.druid.indexer.memory.rownum.max*
*  *hive.druid.basePersistDirectory*
*  *hive.druid.storage.storageDirectory*
*  *hive.druid.metadata.base*
*  *hive.druid.metadata.db.type*
*  *hive.druid.metadata.username*
*  *hive.druid.metadata.password*
*  *hive.druid.metadata.uri*
*  *hive.druid.working.directory*

At this point there are enough Druid configuration parameters for a separate 
subsection in the Configuration Properties doc.  (Also see HIVE-14217 and 
HIVE-15273.)

* [Hive Configuration Properties | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveConfigurationProperties]

Added a TODOC2.2 label.


was (Author: le...@hortonworks.com):
Also document the new configuration parameters:

*  *hive.druid.indexer.segments.granularity*
*  *hive.druid.indexer.partition.size.max*
*  *hive.druid.indexer.memory.rownum.max*
*  *hive.druid.basePersistDirectory*
*  *hive.druid.storage.storageDirectory*
*  *hive.druid.metadata.base*
*  *hive.druid.metadata.db.type*
*  *hive.druid.metadata.username*
*  *hive.druid.metadata.password*
*  *hive.druid.metadata.uri*
*  *hive.druid.working.directory*

At this point there are enough Druid configuration parameters for a separate 
subsection in the Configuration Properties doc.  (Also see HIVE-14217 and 
HIVE-15273.)

* [Hive Configuration Properties | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveConfigurationProperties]

Added a TODOC2.2 label.

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate the Druid segment files and insert 
> the metadata to signal the handoff to Druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the select query in a Druid datasource 
> named 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named '__time' 
> in the result of the executed query, which will act as the time dimension 
> column in Druid. Currently, the time dimension column needs to be a 
> 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as 
> string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; users need to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid
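
For illustration, a hypothetical CTAS following the convention above (made-up 
table and column names), including the cast-to-string trick for a numeric 
column meant to act as a dimension:
{code:sql}
CREATE TABLE druid_pageviews
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "pageviews")
AS
SELECT
  `ts` AS `__time`,                        -- mandatory time column, timestamp type
  `page`,                                  -- string column: becomes a Druid dimension
  CAST(`user_id` AS string) AS `user_id`,  -- numeric column cast so it acts as a dimension
  `num_views`                              -- long column: becomes a Druid metric
FROM web_logs;
{code}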



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-15 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15441:

Status: Patch Available  (was: Open)

> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries which may need to scan 
> thousands of partitions or even more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}} where it 
> needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> be useful, but {{getInputSummary}} still holds a global lock. In this case, it 
> makes sense to provide Hive admins with a config that puts a timeout limit on 
> compilation, so that these "bad" queries can be stopped.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to address a 
> similar issue. However, it cancels the queries that are waiting for the 
> compile lock, which I think is not so useful for our case, since the *query 
> under compilation is the one to blame.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-15 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15441:

Attachment: HIVE-15441.1.patch

> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries which may need to scan 
> thousands of partitions or even more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}} where it 
> needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> be useful, but {{getInputSummary}} still holds a global lock. In this case, it 
> makes sense to provide Hive admins with a config that puts a timeout limit on 
> compilation, so that these "bad" queries can be stopped.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to address a 
> similar issue. However, it cancels the queries that are waiting for the 
> compile lock, which I think is not so useful for our case, since the *query 
> under compilation is the one to blame.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15273) Druid http client not configured correctly

2016-12-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753055#comment-15753055
 ] 

Lefty Leverenz commented on HIVE-15273:
---

The Configuration Properties doc should have a separate section for Druid 
because HIVE-15277 adds 11 more.

> Druid http client not configured correctly
> --
>
> Key: HIVE-15273
> URL: https://issues.apache.org/jira/browse/HIVE-15273
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: 0001-adding-confing-to-http-client.patch, 
> HIVE-15273.patch
>
>
> The http client currently used by the druid-hive record reader is constructed 
> with default values. The default values of numConnection and ReadTimeout are 
> very small, which can lead to the following exception: " ERROR 
> [2ee34a2b-c8a5-4748-ab91-db3621d2aa5c main] CliDriver: Failed with exception 
> java.io.IOException:java.io.IOException: java.io.IOException: 
> org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Channel disconnected"
> The full stack can be found here: 
> https://gist.github.com/b-slim/384ca6a96698f5b51ad9b171cff556a2
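
A sketch of what configuring the client explicitly might look like (builder 
names per Druid's HttpClientConfig, which Hive shades under 
org.apache.hive.druid.*; the config knobs feeding the values are assumptions):
{code}
HttpClientConfig config = HttpClientConfig.builder()
    .withNumConnections(numConnections)            // raise from the tiny default
    .withReadTimeout(new Duration(readTimeoutMs))  // long enough for slow Druid responses
    .build();
HttpClient client = HttpClientInit.createClient(config, lifecycle);
{code}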



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753038#comment-15753038
 ] 

Lefty Leverenz commented on HIVE-15277:
---

Also document the new configuration parameters:

*  *hive.druid.indexer.segments.granularity*
*  *hive.druid.indexer.partition.size.max*
*  *hive.druid.indexer.memory.rownum.max*
*  *hive.druid.basePersistDirectory*
*  *hive.druid.storage.storageDirectory*
*  *hive.druid.metadata.base*
*  *hive.druid.metadata.db.type*
*  *hive.druid.metadata.username*
*  *hive.druid.metadata.password*
*  *hive.druid.metadata.uri*
*  *hive.druid.working.directory*

At this point there are enough Druid configuration parameters for a separate 
subsection in the Configuration Properties doc.  (Also see HIVE-14217 and 
HIVE-15273.)

* [Hive Configuration Properties | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveConfigurationProperties]

Added a TODOC2.2 label.

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate the Druid segment files and insert 
> the metadata to signal the handoff to Druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the select query in a Druid datasource 
> named 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named '__time' 
> in the result of the executed query, which will act as the time dimension 
> column in Druid. Currently, the time dimension column needs to be a 
> 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as 
> string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; users need to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753034#comment-15753034
 ] 

Hive QA commented on HIVE-13278:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843458/HIVE-13278.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10788 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2599/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2599/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2599/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843458 - PreCommit-HIVE-Build

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch, HIVE-13278.4.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked 
> as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> 

[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15277:
--
Labels: TODOC2.2  (was: )

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate the Druid segment files and insert 
> the metadata to signal the handoff to Druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the select query in a Druid datasource 
> named 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named '__time' 
> in the result of the executed query, which will act as the time dimension 
> column in Druid. Currently, the time dimension column needs to be a 
> 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as 
> string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; users need to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15438:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Anthony!

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.2.0
>
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing using Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.
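
For reference, the fix is a one-line marker at the top of the qtest, which makes 
the test driver sort query output before diffing (sketch):
{noformat}
-- SORT_QUERY_RESULTS

-- ... the existing queries in avrocountemptytbl.q stay unchanged ...
{noformat}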



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14688) Hive drop call fails in presence of TDE

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14688:
-
Status: Patch Available  (was: Open)

> Hive drop call fails in presence of TDE
> ---
>
> Key: HIVE-14688
> URL: https://issues.apache.org/jira/browse/HIVE-14688
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Deepesh Khandelwal
>Assignee: Wei Zheng
> Attachments: HIVE-14688.1.patch
>
>
> In Hadoop 2.8.0, TDE trash collection was fixed through HDFS-8831. This 
> enables us to make drop table calls for Hive managed tables where the Hive 
> metastore warehouse directory is in an encrypted zone. However, even with the 
> feature in HDFS, Hive drop table currently fails:
> {noformat}
> $ hdfs crypto -listZones
> /apps/hive/warehouse  key2 
> $ hdfs dfs -ls /apps/hive/warehouse
> Found 1 items
> drwxrwxrwt   - hdfs hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> hive> create table abc(a string, b int);
> OK
> Time taken: 5.538 seconds
> hive> dfs -ls /apps/hive/warehouse;
> Found 2 items
> drwxrwxrwt   - hdfs   hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> drwxrwxrwx   - deepesh hdfs  0 2016-09-01 17:15 
> /apps/hive/warehouse/abc
> hive> drop table if exists abc;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> default.abc because it is in an encryption zone and trash is enabled.  Use 
> PURGE option to skip trash.)
> {noformat}
> The problem lies here:
> {code:title=metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java}
> private void checkTrashPurgeCombination(Path pathToData, String objectName, 
> boolean ifPurge)
> ...
>   if (trashEnabled) {
> try {
>   HadoopShims.HdfsEncryptionShim shim =
> 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(FileSystem.get(hiveConf),
>  hiveConf);
>   if (shim.isPathEncrypted(pathToData)) {
> throw new MetaException("Unable to drop " + objectName + " 
> because it is in an encryption zone" +
>   " and trash is enabled.  Use PURGE option to skip trash.");
>   }
> } catch (IOException ex) {
>   MetaException e = new MetaException(ex.getMessage());
>   e.initCause(ex);
>   throw e;
> }
>   }
> {code}
> As we can see, we are making an assumption that the delete wouldn't be 
> successful in an encrypted zone. We need to modify this logic.
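
One possible direction (a hypothetical sketch, not the actual patch): keep the 
encryption check, but only fail when the underlying HDFS cannot trash files 
inside an encryption zone, i.e. pre-HDFS-8831 clusters 
({{supportsTrashInEncryptionZone}} is a made-up helper):
{code}
if (shim.isPathEncrypted(pathToData)
    && !supportsTrashInEncryptionZone(hiveConf)) {   // hypothetical capability check
  throw new MetaException("Unable to drop " + objectName + " because it is in an" +
      " encryption zone and trash is enabled.  Use PURGE option to skip trash.");
}
{code}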



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14688) Hive drop call fails in presence of TDE

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14688:
-
Attachment: HIVE-14688.1.patch

[~ekoifman] Can you review please?

> Hive drop call fails in presence of TDE
> ---
>
> Key: HIVE-14688
> URL: https://issues.apache.org/jira/browse/HIVE-14688
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Deepesh Khandelwal
>Assignee: Wei Zheng
> Attachments: HIVE-14688.1.patch
>
>
> In Hadoop 2.8.0, TDE trash collection was fixed through HDFS-8831. This 
> enables us to make drop table calls for Hive managed tables where the Hive 
> metastore warehouse directory is in an encrypted zone. However, even with the 
> feature in HDFS, Hive drop table currently fails:
> {noformat}
> $ hdfs crypto -listZones
> /apps/hive/warehouse  key2 
> $ hdfs dfs -ls /apps/hive/warehouse
> Found 1 items
> drwxrwxrwt   - hdfs hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> hive> create table abc(a string, b int);
> OK
> Time taken: 5.538 seconds
> hive> dfs -ls /apps/hive/warehouse;
> Found 2 items
> drwxrwxrwt   - hdfs   hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> drwxrwxrwx   - deepesh hdfs  0 2016-09-01 17:15 
> /apps/hive/warehouse/abc
> hive> drop table if exists abc;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> default.abc because it is in an encryption zone and trash is enabled.  Use 
> PURGE option to skip trash.)
> {noformat}
> The problem lies here:
> {code:title=metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java}
> private void checkTrashPurgeCombination(Path pathToData, String objectName, 
> boolean ifPurge)
> ...
>   if (trashEnabled) {
> try {
>   HadoopShims.HdfsEncryptionShim shim =
> 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(FileSystem.get(hiveConf),
>  hiveConf);
>   if (shim.isPathEncrypted(pathToData)) {
> throw new MetaException("Unable to drop " + objectName + " 
> because it is in an encryption zone" +
>   " and trash is enabled.  Use PURGE option to skip trash.");
>   }
> } catch (IOException ex) {
>   MetaException e = new MetaException(ex.getMessage());
>   e.initCause(ex);
>   throw e;
> }
>   }
> {code}
> As we can see, we are making an assumption that the delete wouldn't be 
> successful in an encrypted zone. We need to modify this logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14496) Enable Calcite rewriting with materialized views

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752884#comment-15752884
 ] 

Hive QA commented on HIVE-14496:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843459/HIVE-14496.09.patch

{color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view_partitioned] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cteViews] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_join_pushdown] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_unionDistinct_2]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[unionDistinct_2]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[unionDistinct_2] 
(batchId=92)
org.apache.hadoop.hive.ql.metadata.TestHive.testTable (batchId=259)
org.apache.hadoop.hive.ql.metadata.TestHive.testThriftTable (batchId=259)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testTable (batchId=260)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testThriftTable (batchId=260)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2598/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2598/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2598/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843459 - PreCommit-HIVE-Build

> Enable Calcite rewriting with materialized views
> 
>
> Key: HIVE-14496
> URL: https://issues.apache.org/jira/browse/HIVE-14496
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14496.01.patch, HIVE-14496.02.patch, 
> HIVE-14496.03.patch, HIVE-14496.04.patch, HIVE-14496.05.patch, 
> HIVE-14496.07.patch, HIVE-14496.08.patch, HIVE-14496.09.patch, 
> HIVE-14496.patch
>
>
> Calcite already supports query rewriting using materialized views. We will 
> use it to support this feature in Hive.
> In order to do that, we need to register the existing materialized views with 
> the Calcite view service and enable the materialized view rewriting rules. 
> We should include a HiveConf flag to completely disable query rewriting using 
> materialized views if necessary.
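
For illustration, the kind of rewriting this enables (hypothetical tables; the 
exact HiveConf flag name is not specified here):
{code:sql}
CREATE MATERIALIZED VIEW mv_sales_by_day AS
SELECT sold_date, SUM(amount) AS total
FROM sales
GROUP BY sold_date;

-- With rewriting enabled, Calcite can answer this from mv_sales_by_day
-- instead of scanning sales:
SELECT sold_date, SUM(amount) FROM sales GROUP BY sold_date;
{code}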



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14688) Hive drop call fails in presence of TDE

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng reassigned HIVE-14688:


Assignee: Wei Zheng

> Hive drop call fails in presence of TDE
> ---
>
> Key: HIVE-14688
> URL: https://issues.apache.org/jira/browse/HIVE-14688
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Deepesh Khandelwal
>Assignee: Wei Zheng
>
> In Hadoop 2.8.0, TDE trash collection was fixed through HDFS-8831. This 
> enables us to make drop table calls for Hive managed tables where the Hive 
> metastore warehouse directory is in an encrypted zone. However, even with the 
> feature in HDFS, Hive drop table currently fails:
> {noformat}
> $ hdfs crypto -listZones
> /apps/hive/warehouse  key2 
> $ hdfs dfs -ls /apps/hive/warehouse
> Found 1 items
> drwxrwxrwt   - hdfs hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> hive> create table abc(a string, b int);
> OK
> Time taken: 5.538 seconds
> hive> dfs -ls /apps/hive/warehouse;
> Found 2 items
> drwxrwxrwt   - hdfs   hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> drwxrwxrwx   - deepesh hdfs  0 2016-09-01 17:15 
> /apps/hive/warehouse/abc
> hive> drop table if exists abc;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> default.abc because it is in an encryption zone and trash is enabled.  Use 
> PURGE option to skip trash.)
> {noformat}
> The problem lies here:
> {code:title=metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java}
> private void checkTrashPurgeCombination(Path pathToData, String objectName, 
> boolean ifPurge)
> ...
>   if (trashEnabled) {
> try {
>   HadoopShims.HdfsEncryptionShim shim =
> 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(FileSystem.get(hiveConf),
>  hiveConf);
>   if (shim.isPathEncrypted(pathToData)) {
> throw new MetaException("Unable to drop " + objectName + " 
> because it is in an encryption zone" +
>   " and trash is enabled.  Use PURGE option to skip trash.");
>   }
> } catch (IOException ex) {
>   MetaException e = new MetaException(ex.getMessage());
>   e.initCause(ex);
>   throw e;
> }
>   }
> {code}
> As we can see, we are making an assumption that the delete wouldn't be 
> successful in an encrypted zone. We need to modify this logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15437) avro tables join fails when - tbl join tbl_postfix

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752778#comment-15752778
 ] 

Hive QA commented on HIVE-15437:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843453/HIVE-15437.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10819 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_partitioned] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2597/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2597/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2597/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843453 - PreCommit-HIVE-Build

> avro tables join fails when - tbl join tbl_postfix
> --
>
> Key: HIVE-15437
> URL: https://issues.apache.org/jira/browse/HIVE-15437
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15437.1.patch
>
>
> The following queries return good results:
> select * from table1 where col1=key1; 
> select * from table1_1 where col1=key1; 
> When joining them together, it gets the following error:
> {noformat}
> Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found 
> long, expecting union
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:141)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}
> Both avro tables are defined using an avro schema, and the first 
> table's name is a prefix of the second table's name. 
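
A plausible illustration of the prefix pitfall (hypothetical code, not the 
actual Hive logic): matching an input file to its table by a bare string prefix 
picks up table1's schema for table1_1's files, which would produce exactly this 
kind of AvroTypeException:
{code}
// Buggy: "/wh/table1_1/f0".startsWith("/wh/table1") is true, so files of
// table1_1 can be deserialized with table1's avro schema.
boolean buggyMatch = filePath.startsWith(tableDir);

// Safer: match whole path components by requiring a separator after the prefix.
boolean safeMatch = filePath.equals(tableDir)
    || filePath.startsWith(tableDir + Path.SEPARATOR);
{code}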



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752759#comment-15752759
 ] 

Pengcheng Xiong commented on HIVE-15200:


[~ashutoshc], sounds like this patch works... let's wait for QA. Thanks.

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}
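
For comparison, dropping the parentheses around the set-operation operands 
should parse today (a workaround until this sub-task lands):
{code:sql}
explain select key from (select key from src union select key from src) subq;
{code}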



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Status: Patch Available  (was: Open)

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Status: Open  (was: Patch Available)

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Attachment: HIVE-15200.02.patch

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15331) Decimal multiplication with high precision/scale often returns NULL

2016-12-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752735#comment-15752735
 ] 

Sergey Shelukhin commented on HIVE-15331:
-

Decimal_precision:
{noformat}
-1234.56789012351524157.87532399036884525225
+1234.56789012351524157.87532399036884525
{noformat}

> Decimal multiplication with high precision/scale often returns NULL
> ---
>
> Key: HIVE-15331
> URL: https://issues.apache.org/jira/browse/HIVE-15331
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15331.1.patch, HIVE-15331.2.patch, 
> HIVE-15331.3.patch
>
>
> {noformat}
> create temporary table dec (a decimal(38,18));
> insert into dec values(100.0);
> hive> select a*a from dec;
> OK
> NULL
> Time taken: 0.165 seconds, Fetched: 1 row(s)
> {noformat}
> Looks like the reason is because the result of decimal(38,18) * 
> decimal(38,18) only has 2 digits of precision for integers:
> {noformat}
> hive> set hive.explain.user=false;
> hive> explain select a*a from dec;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: dec
>   Select Operator
> expressions: (a * a) (type: decimal(38,36))
> outputColumnNames: _col0
> ListSink
> Time taken: 0.039 seconds, Fetched: 15 row(s)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-15 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752727#comment-15752727
 ] 

Anthony Hsu commented on HIVE-15411:


Test failures look unrelated. The same qtests have failed in other precommit 
builds around the same time.

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15411.1.patch
>
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.
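To make the non-atomicity concrete, here is a minimal sketch (table name, partition spec, and serde property are made up for illustration; this is not code from the patch) of the three-statement sequence a user currently has to issue through the Driver. A concurrent reader between the first and the last statement can observe the partition with the table's default format:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

public class AddPartitionSketch {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    SessionState.start(conf);
    Driver driver = new Driver(conf);
    // Step 1: the partition becomes visible with the table's defaults.
    driver.run("alter table t add partition (ds='2016-12-15') "
        + "location '/data/t/ds=2016-12-15'");
    // Steps 2 and 3: separate, non-atomic follow-up statements.
    driver.run("alter table t partition (ds='2016-12-15') set fileformat orc");
    driver.run("alter table t partition (ds='2016-12-15') "
        + "set serdeproperties ('field.delim'=',')");
  }
}
{code}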



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15331) Decimal multiplication with high precision/scale often returns NULL

2016-12-15 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752726#comment-15752726
 ] 

Jason Dere commented on HIVE-15331:
---

Which file in particular? These are all cases where the resulting 
precision/scale exceeds 38. For example, in decimal_precision.q.out the 
operation is decimal(20,10) * decimal(20,10); the result type (assuming 
unlimited precision/scale) would be decimal(41,20), but we're already at the 
Hive limit of 38. So this is where the updated logic kicks in to make the 
precision/scale fit within 38 digits.
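For reference, a minimal sketch of fitting a multiply result type into a maximum precision of 38 by trading away scale digits. This is not Hive's actual implementation; the MIN_SCALE floor is an assumption for illustration:

{code:java}
import java.util.Arrays;

public class DecimalTypeSketch {
  static final int MAX_PRECISION = 38;
  static final int MIN_SCALE = 6; // assumed floor for retained scale digits

  // Result type of decimal(p1,s1) * decimal(p2,s2), capped at MAX_PRECISION.
  static int[] multiplyResultType(int p1, int s1, int p2, int s2) {
    int precision = p1 + p2 + 1; // unlimited-precision result type
    int scale = s1 + s2;
    if (precision > MAX_PRECISION) {
      // Preserve the integer digits; round away scale digits, but keep at
      // least MIN_SCALE of them.
      int intDigits = precision - scale;
      scale = Math.max(MIN_SCALE, MAX_PRECISION - intDigits);
      precision = MAX_PRECISION;
    }
    return new int[] { precision, scale };
  }

  public static void main(String[] args) {
    // decimal(20,10) * decimal(20,10): the unlimited result is decimal(41,20),
    // fitted to decimal(38,17) here, so the last few scale digits of the
    // exact product get rounded away.
    System.out.println(Arrays.toString(multiplyResultType(20, 10, 20, 10)));
  }
}
{code}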

> Decimal multiplication with high precision/scale often returns NULL
> ---
>
> Key: HIVE-15331
> URL: https://issues.apache.org/jira/browse/HIVE-15331
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15331.1.patch, HIVE-15331.2.patch, 
> HIVE-15331.3.patch
>
>
> {noformat}
> create temporary table dec (a decimal(38,18));
> insert into dec values(100.0);
> hive> select a*a from dec;
> OK
> NULL
> Time taken: 0.165 seconds, Fetched: 1 row(s)
> {noformat}
> Looks like the reason is because the result of decimal(38,18) * 
> decimal(38,18) only has 2 digits of precision for integers:
> {noformat}
> hive> set hive.explain.user=false;
> hive> explain select a*a from dec;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: dec
>   Select Operator
> expressions: (a * a) (type: decimal(38,36))
> outputColumnNames: _col0
> ListSink
> Time taken: 0.039 seconds, Fetched: 15 row(s)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Patch Available  (was: Open)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be issues, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Attachment: HIVE-15376.8.patch

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be issues, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Open  (was: Patch Available)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be issues, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752662#comment-15752662
 ] 

Hive QA commented on HIVE-15376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843450/HIVE-15376.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10818 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout (batchId=275)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2596/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2596/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2596/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843450 - PreCommit-HIVE-Build

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be issues, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15331) Decimal multiplication with high precision/scale often returns NULL

2016-12-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752628#comment-15752628
 ] 

Sergey Shelukhin commented on HIVE-15331:
-

Looks good; it seems like it reduces precision even in cases where it's not 
needed though... is that intended? See the .q.out file changes, where the last 
few (non-zero) digits of the result are rounded away

> Decimal multiplication with high precision/scale often returns NULL
> ---
>
> Key: HIVE-15331
> URL: https://issues.apache.org/jira/browse/HIVE-15331
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15331.1.patch, HIVE-15331.2.patch, 
> HIVE-15331.3.patch
>
>
> {noformat}
> create temporary table dec (a decimal(38,18));
> insert into dec values(100.0);
> hive> select a*a from dec;
> OK
> NULL
> Time taken: 0.165 seconds, Fetched: 1 row(s)
> {noformat}
> Looks like the reason is because the result of decimal(38,18) * 
> decimal(38,18) only has 2 digits of precision for integers:
> {noformat}
> hive> set hive.explain.user=false;
> hive> explain select a*a from dec;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: dec
>   Select Operator
> expressions: (a * a) (type: decimal(38,36))
> outputColumnNames: _col0
> ListSink
> Time taken: 0.039 seconds, Fetched: 15 row(s)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752616#comment-15752616
 ] 

Wei Zheng commented on HIVE-15376:
--

ctx is needed to set the heartbeater, just as we did in 
acquireLocksWithHeartbeatDelay before

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be issues, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752574#comment-15752574
 ] 

Ashutosh Chauhan commented on HIVE-15438:
-

+1

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing with Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15331) Decimal multiplication with high precision/scale often returns NULL

2016-12-15 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752568#comment-15752568
 ] 

Jason Dere commented on HIVE-15331:
---

[~sershe] can you review?

> Decimal multiplication with high precision/scale often returns NULL
> ---
>
> Key: HIVE-15331
> URL: https://issues.apache.org/jira/browse/HIVE-15331
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15331.1.patch, HIVE-15331.2.patch, 
> HIVE-15331.3.patch
>
>
> {noformat}
> create temporary table dec (a decimal(38,18));
> insert into dec values(100.0);
> hive> select a*a from dec;
> OK
> NULL
> Time taken: 0.165 seconds, Fetched: 1 row(s)
> {noformat}
> Looks like the reason is because the result of decimal(38,18) * 
> decimal(38,18) only has 2 digits of precision for integers:
> {noformat}
> hive> set hive.explain.user=false;
> hive> explain select a*a from dec;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: dec
>   Select Operator
> expressions: (a * a) (type: decimal(38,36))
> outputColumnNames: _col0
> ListSink
> Time taken: 0.039 seconds, Fetched: 15 row(s)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752558#comment-15752558
 ] 

Hive QA commented on HIVE-15438:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843448/HIVE-15438.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10803 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=129)

[groupby6_map.q,groupby2_noskew_multi_distinct.q,load_dyn_part12.q,scriptfile1.q,join15.q,auto_join17.q,join_hive_626.q,tez_join_tests.q,auto_join21.q,join_view.q,join_cond_pushdown_4.q,vectorization_0.q,union_null.q,auto_join3.q,vectorization_decimal_date.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2595/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2595/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2595/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843448 - PreCommit-HIVE-Build

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing with Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15048) Update/Delete statement using wrong WriteEntity when subqueries are involved

2016-12-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15048:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.
Thanks Alan for the review.

> Update/Delete statement using wrong WriteEntity when subqueries are involved
> 
>
> Key: HIVE-15048
> URL: https://issues.apache.org/jira/browse/HIVE-15048
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15048.01.patch, HIVE-15048.02.patch, 
> HIVE-15048.03.patch, HIVE-15048.04.patch
>
>
> See TestDbTxnManager2 for referenced methods
> {noformat}
> checkCmdOnDriver(driver.run("create table target (a int, b int) " +
>   "partitioned by (p int, q int) clustered by (a) into 2  buckets " +
>   "stored as orc TBLPROPERTIES ('transactional'='true')"));
> checkCmdOnDriver(driver.run("create table source (a1 int, b1 int, p1 int, 
> q1 int) clustered by (a1) into 2  buckets stored as orc TBLPROPERTIES 
> ('transactional'='true')"));
> checkCmdOnDriver(driver.run("insert into target partition(p,q) values 
> (1,2,1,2), (3,4,1,2), (5,6,1,3), (7,8,2,2)"));
> checkCmdOnDriver(driver.run(
>   "update source set b1 = 1 where p1 in (select t.q from target t where 
> t.p=2)"));
> {noformat}
> The last Update stmt creates the following Entity objects in the QueryPlan
> inputs: [default@source, default@target, default@target@p=2/q=2]
> outputs: [default@target@p=2/q=2]
> This is clearly wrong for outputs - the table being updated is not even 
> partitioned (or called 'target').
> This happens in UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze().
> I suspect a query of the form 
> update T ... where T.p IN (select d from T where ...) 
> would also get messed up (but not necessarily fail) if T is partitioned and 
> the subquery filters out some partitions, but that does not mean that the 
> same partitions are filtered out in the parent query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752490#comment-15752490
 ] 

Jesus Camacho Rodriguez commented on HIVE-15277:


We should update the Druid integration wiki with information about new features 
introduced in this patch.

https://cwiki.apache.org/confluence/display/Hive/Druid+Integration

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Fix For: 2.2.0
>
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, 
> `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named 
> '__time' in the result of the executed query, which will act as the time 
> dimension column in Druid. Currently, the time dimension column needs to be 
> a 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it to 
> string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config 
> as --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14474) Create datasource in Druid from Hive

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14474:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

HIVE-15277 can create segments from Hive, thus it is more robust. Now that it 
has gone in, there is no need for this patch anymore. Closing as duplicate.

> Create datasource in Druid from Hive
> 
>
> Key: HIVE-14474
> URL: https://issues.apache.org/jira/browse/HIVE-14474
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14474.01.patch, HIVE-14474.02.patch, 
> HIVE-14474.03.patch, HIVE-14474.04.patch, HIVE-14474.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In the initial implementation proposed in this issue, we will write the 
> results of the query to HDFS (or the location specified in the CTAS 
> statement), and submit a HadoopIndexing task to the Druid overlord. The task 
> will contain the path where data was stored, it will read it and create the 
> segments in Druid. Once this is done, the results are removed from Hive.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
> AS <select_query>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'my_query_based_datasource'. One of the columns of the query needs to be the 
> time dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named 
> '\_\_time' in the result of the executed query, which will act as the time 
> dimension column in Druid. Currently, the time dimension column needs to be 
> a 'timestamp' type column.
> This initial implementation interacts with the Druid API as it is currently 
> exposed to the user. In a follow-up issue, we should propose an 
> implementation that integrates more tightly with Druid. In particular, we 
> would like to store segments directly in Druid from Hive, thus avoiding the 
> overhead of writing Hive results to HDFS and then launching an MR job that 
> basically reads them again to create the segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-15303) Upgrade to Druid 0.9.2

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-15303.

   Resolution: Fixed
Fix Version/s: 2.2.0

Pushed within HIVE-15277.

> Upgrade to Druid 0.9.2
> --
>
> Key: HIVE-15303
> URL: https://issues.apache.org/jira/browse/HIVE-15303
> Project: Hive
>  Issue Type: Improvement
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: slim bouguerra
> Fix For: 2.2.0
>
>
> Upgrading to latest Druid release once it is done. HIVE-15277 has 
> dependencies on this new release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15277:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Test fails are unrelated.

Pushed to master, thanks [~bslim]!

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Fix For: 2.2.0
>
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, 
> `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named 
> '__time' in the result of the executed query, which will act as the time 
> dimension column in Druid. Currently, the time dimension column needs to be 
> a 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it to 
> string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config 
> as --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752461#comment-15752461
 ] 

slim bouguerra commented on HIVE-15277:
---

Ran some of the tests locally and they passed:
https://gist.github.com/b-slim/dfa29b07ee901b5f0c8437975488436f


> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, 
> `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named 
> '__time' in the result of the executed query, which will act as the time 
> dimension column in Druid. Currently, the time dimension column needs to be 
> a 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it to 
> string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config 
> as --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752412#comment-15752412
 ] 

Hive QA commented on HIVE-15277:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843449/HIVE-15277.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10817 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[5]
 (batchId=173)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[0]
 (batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2594/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2594/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2594/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843449 - PreCommit-HIVE-Build

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, 
> `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named 
> '__time' in the result of the executed query, which will act as the time 
> dimension column in Druid. Currently, the time dimension column needs to be 
> a 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it to 
> string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config 
> as --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752397#comment-15752397
 ] 

Eugene Koifman commented on HIVE-15376:
---

DbTxnManager already has a "conf" reference.  Is there any other reason you 
need to pass Context in?

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be issues, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-15 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752379#comment-15752379
 ] 

Vaibhav Gumashta commented on HIVE-15294:
-

[~thejas] Failures have been reported in the umbrella jira: HIVE-15058

> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch, HIVE-15294.2.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at destination. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14496) Enable Calcite rewriting with materialized views

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14496:
---
Attachment: HIVE-14496.09.patch

> Enable Calcite rewriting with materialized views
> 
>
> Key: HIVE-14496
> URL: https://issues.apache.org/jira/browse/HIVE-14496
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14496.01.patch, HIVE-14496.02.patch, 
> HIVE-14496.03.patch, HIVE-14496.04.patch, HIVE-14496.05.patch, 
> HIVE-14496.07.patch, HIVE-14496.08.patch, HIVE-14496.09.patch, 
> HIVE-14496.patch
>
>
> Calcite already supports query rewriting using materialized views. We will 
> use it to support this feature in Hive.
> In order to do that, we need to register the existing materialized views with 
> the Calcite view service and enable the materialized view rewriting rules. 
> We should include a HiveConf flag to completely disable query rewriting using 
> materialized views if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-15 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: HIVE-13278.4.patch

Thanks guys. Attaching patch v4 following [~lirui]'s idea. Please take a look.

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch, HIVE-13278.4.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully, so it is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15048) Update/Delete statement using wrong WriteEntity when subqueries are involved

2016-12-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752322#comment-15752322
 ] 

Alan Gates commented on HIVE-15048:
---

Sorry, I missed that the new method updateOutputs was actually part of 
breaking analyzeMerge up into multiple methods.  I was reading it as a whole 
new method.  OK, given that:

+1

> Update/Delete statement using wrong WriteEntity when subqueries are involved
> 
>
> Key: HIVE-15048
> URL: https://issues.apache.org/jira/browse/HIVE-15048
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-15048.01.patch, HIVE-15048.02.patch, 
> HIVE-15048.03.patch, HIVE-15048.04.patch
>
>
> See TestDbTxnManager2 for referenced methods
> {noformat}
> checkCmdOnDriver(driver.run("create table target (a int, b int) " +
>   "partitioned by (p int, q int) clustered by (a) into 2  buckets " +
>   "stored as orc TBLPROPERTIES ('transactional'='true')"));
> checkCmdOnDriver(driver.run("create table source (a1 int, b1 int, p1 int, 
> q1 int) clustered by (a1) into 2  buckets stored as orc TBLPROPERTIES 
> ('transactional'='true')"));
> checkCmdOnDriver(driver.run("insert into target partition(p,q) values 
> (1,2,1,2), (3,4,1,2), (5,6,1,3), (7,8,2,2)"));
> checkCmdOnDriver(driver.run(
>   "update source set b1 = 1 where p1 in (select t.q from target t where 
> t.p=2)"));
> {noformat}
> The last Update stmt creates the following Entity objects in the QueryPlan
> inputs: [default@source, default@target, default@target@p=2/q=2]
> outputs: [default@target@p=2/q=2]
> This is clearly wrong for outputs - the table being updated is not even 
> partitioned (or called 'target').
> This happens in UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze().
> I suspect a query of the form 
> update T ... where T.p IN (select d from T where ...) 
> would also get messed up (but not necessarily fail) if T is partitioned and 
> the subquery filters out some partitions, but that does not mean that the 
> same partitions are filtered out in the parent query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752287#comment-15752287
 ] 

Hive QA commented on HIVE-15122:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843446/HIVE-15122.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10818 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_casts]
 (batchId=152)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2593/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2593/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2593/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843446 - PreCommit-HIVE-Build

> Hive: Upcasting types should not obscure stats (min/max/ndv)
> 
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in 
> LLAP.
> Snippet from the bad plan.
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), 
> | Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:

[jira] [Updated] (HIVE-15437) avro tables join fails when - tbl join tbl_postfix

2016-12-15 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-15437:

Status: Patch Available  (was: Open)

Need code review.

> avro tables join fails when - tbl join tbl_postfix
> --
>
> Key: HIVE-15437
> URL: https://issues.apache.org/jira/browse/HIVE-15437
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15437.1.patch
>
>
> The following queries return good results:
> select * from table1 where col1=key1; 
> select * from table1_1 where col1=key1; 
> When they are joined together, the query fails with the following error:
> {noformat}
> Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found 
> long, expecting union
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:141)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}
> Both avro tables are defined using an avro schema, and the first table's 
> name is a prefix of the second table's name. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15437) avro tables join fails when - tbl join tbl_postfix

2016-12-15 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-15437:

Attachment: HIVE-15437.1.patch

The root cause is that when getting the schema for an avro table, the split's 
path is compared with the table path by only matching the starting string. This 
makes table1_1 get table1's schema, which then causes the exception later. 
Attaching a patch to fix the issue.
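A minimal sketch of the pitfall (made-up paths; not the actual Hive code): matching paths with startsWith() also matches sibling directories that share a name prefix, while an exact per-component comparison does not:

{code:java}
import org.apache.hadoop.fs.Path;

public class PathMatchSketch {
  // Buggy: "/warehouse/table1_1/000000_0" starts with "/warehouse/table1",
  // so a split of table1_1 gets matched against table1's schema.
  static boolean prefixMatch(Path split, Path table) {
    return split.toString().startsWith(table.toString());
  }

  // Safer: walk up the split's ancestors and require an exact path match.
  static boolean exactMatch(Path split, Path table) {
    for (Path p = split; p != null; p = p.getParent()) {
      if (p.equals(table)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Path table1 = new Path("/warehouse/table1");
    Path split = new Path("/warehouse/table1_1/000000_0");
    System.out.println(prefixMatch(split, table1)); // true  -> wrong schema
    System.out.println(exactMatch(split, table1));  // false -> correct
  }
}
{code}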

> avro tables join fails when - tbl join tbl_postfix
> --
>
> Key: HIVE-15437
> URL: https://issues.apache.org/jira/browse/HIVE-15437
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15437.1.patch
>
>
> The following queries return good results:
> select * from table1 where col1=key1; 
> select * from table1_1 where col1=key1; 
> When they are joined together, the query fails with the following error:
> {noformat}
> Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found 
> long, expecting union
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>  ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:141)
>  ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}
> Both avro tables are defined using an avro schema, and the first table's
> name is a prefix of the second table's name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752189#comment-15752189
 ] 

Wei Zheng commented on HIVE-15376:
--

You're right. I hid the delay param behind a test-only visible method.

In the startHeartbeat logic, if the passed-in delay value is 0 we will always
use half of HIVE_TXN_TIMEOUT. This is the same as before.

I've added an isOpenTxn check in releaseLocks before quitting the heartbeat.
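
A compact sketch of the scheduling rule described above, with assumed names
(the real logic is wired through Hive's lock manager, not a standalone class):
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final long txnTimeoutMs;

  public HeartbeatSketch(long txnTimeoutMs) {
    this.txnTimeoutMs = txnTimeoutMs;
  }

  /** A passed-in delay of 0 means "use the default", i.e. half of hive.txn.timeout. */
  public void startHeartbeat(long initialDelayMs, Runnable heartbeat) {
    long intervalMs = txnTimeoutMs / 2;
    if (initialDelayMs == 0) {
      initialDelayMs = intervalMs;
    }
    // Heartbeats repeat at half the timeout so the transaction never expires
    // between two consecutive beats.
    scheduler.scheduleAtFixedRate(heartbeat, initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
  }
}
{code}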

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there may 
> still be issues, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
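
A concrete illustration of the race with assumed numbers, taking
hive.txn.timeout = 300s:
{noformat}
A = 0s    transaction opened
B = 5s    acquireLocks called; the call blocks while the system is busy
C = 400s  acquireLocks returns and the first heartbeat is sent
C - A = 400s > 300s, so the transaction has already timed out and been aborted
{noformat}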



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Patch Available  (was: Open)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there may 
> still be issues, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Attachment: HIVE-15376.7.patch

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there may 
> still be issues, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Open  (was: Patch Available)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there may 
> still be issues, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15277:
--
Attachment: HIVE-15277.patch

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the select query in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention that is used for Druid: there needs to be a column 
> named '__time' in the result of the executed query, which will act as the 
> time dimension column in Druid. Currently, the time column dimension needs to 
> be a 'timestamp' type column.
> Metrics can be of type long, double, and float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as a 
> string.
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15438:
---
Status: Patch Available  (was: Open)

Uploaded patch.

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing using Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15438) avrocountemptytbl.q should use SORT_QUERY_RESULTS

2016-12-15 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15438:
---
Attachment: HIVE-15438.1.patch

> avrocountemptytbl.q should use SORT_QUERY_RESULTS
> -
>
> Key: HIVE-15438
> URL: https://issues.apache.org/jira/browse/HIVE-15438
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15438.1.patch
>
>
> In Hive 1.1.0, when building and testing using Java 1.8, I've noticed that 
> avrocountemptytbl.q fails due to ordering issues:
> {noformat}
> 57d56
> < 100
> 58a58
> > 100
> {noformat}
> This can be fixed by adding {{-- SORT_QUERY_RESULTS}} to the qtest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15409:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.2.0
>
> Attachments: HIVE-15409.01.patch, HIVE-15409.02.patch, 
> HIVE-15409.patch
>
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15122:
---
Attachment: HIVE-15122.patch

> Hive: Upcasting types should not obscure stats (min/max/ndv)
> 
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in 
> LLAP.
> Snippet from the bad plan.
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:
> |       Vertices:
> |         Map 1
> |           Map Operator Tree:
> |             TableScan
> |               alias: supplier
> |               filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |               Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |               Filter Operator
> |                 predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                 Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |                 Select Operator
> |                   expressions: s_suppkey (type: bigint), s_nationkey (type: bigint)

[jira] [Assigned] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-15122:
--

Assignee: Jesus Camacho Rodriguez

> Hive: Upcasting types should not obscure stats (min/max/ndv)
> 
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in 
> LLAP.
> Snippet from the bad plan.
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:
> |       Vertices:
> |         Map 1
> |           Map Operator Tree:
> |             TableScan
> |               alias: supplier
> |               filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |               Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |               Filter Operator
> |                 predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                 Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |                 Select Operator
> |                   expressions: s_suppkey (type: bigint), s_nationkey (type: bigint)

[jira] [Work started] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15122 started by Jesus Camacho Rodriguez.
--
> Hive: Upcasting types should not obscure stats (min/max/ndv)
> 
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in 
> LLAP.
> Snippet from the bad plan.
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:
> |       Vertices:
> |         Map 1
> |           Map Operator Tree:
> |             TableScan
> |               alias: supplier
> |               filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |               Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |               Filter Operator
> |                 predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                 Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |                 Select Operator
> |                   expressions: s_suppkey (type: bigint), s_nationkey (type: bigint)
> |                   outputColumnNames: _col0, 

[jira] [Updated] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15122:
---
Status: Patch Available  (was: In Progress)

> Hive: Upcasting types should not obscure stats (min/max/ndv)
> 
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in 
> LLAP.
> Snippet from the bad plan.
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:
> |       Vertices:
> |         Map 1
> |           Map Operator Tree:
> |             TableScan
> |               alias: supplier
> |               filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |               Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |               Filter Operator
> |                 predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                 Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |                 Select Operator
> |                   expressions: s_suppkey (type: bigint), s_nationkey (type: bigint)

[jira] [Commented] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751966#comment-15751966
 ] 

Ashutosh Chauhan commented on HIVE-15409:
-

+1

> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15409.01.patch, HIVE-15409.02.patch, 
> HIVE-15409.patch
>
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751899#comment-15751899
 ] 

Xuefu Zhang commented on HIVE-13278:


Yeah, let's try [~lirui]'s idea to cover more cases if possible. It's important 
to note that the current patch at least improves Hive (though it might be 
incomplete) and does no harm.

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully, so it is 
> marked as Minor currently.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14496) Enable Calcite rewriting with materialized views

2016-12-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751891#comment-15751891
 ] 

Ashutosh Chauhan commented on HIVE-14496:
-

In metastore/if/hive_metastore.thrift, can you declare rewrite_enabled as 
optional and place that declaration last in the struct?
+1 with that change, pending QA run.

> Enable Calcite rewriting with materialized views
> 
>
> Key: HIVE-14496
> URL: https://issues.apache.org/jira/browse/HIVE-14496
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14496.01.patch, HIVE-14496.02.patch, 
> HIVE-14496.03.patch, HIVE-14496.04.patch, HIVE-14496.05.patch, 
> HIVE-14496.07.patch, HIVE-14496.08.patch, HIVE-14496.patch
>
>
> Calcite already supports query rewriting using materialized views. We will 
> use it to support this feature in Hive.
> In order to do that, we need to register the existing materialized views with 
> Calcite view service and enable the materialized views rewriting rules. 
> We should include a HiveConf flag to completely disable query rewriting using 
> materialized views if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751887#comment-15751887
 ] 

Aihua Xu commented on HIVE-15425:
-

Looks good to me. +1

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch, HIVE-15425.patch
>
>
> Running the schemaTool's validate command against Derby DB yields the 
> following output.
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above creates some confusion when there are extra tables (not part 
> of the hive schema) in the database. The intention was to report the total 
> tables found; the schema namespace was not expected to contain additional 
> tables. Even though the validation is successful, the output is confusing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751777#comment-15751777
 ] 

Jesus Camacho Rodriguez commented on HIVE-15277:


[~bslim], some of the test failures (age=1) seem related to this patch. Could 
you take a look?

The change in _LineageLogger_ causes changes in the lineage golden files. It is 
better to tackle that in a follow-up and remove it from this patch, as it is 
not part of this issue.

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the select query in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention that is used for Druid: there needs to be a column 
> named '__time' in the result of the executed query, which will act as the 
> time dimension column in Druid. Currently, the time column dimension needs to 
> be a 'timestamp' type column.
> Metrics can be of type long, double, and float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as a 
> string.
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751741#comment-15751741
 ] 

Hive QA commented on HIVE-15277:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843418/HIVE-15277.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10814 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2591/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2591/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2591/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843418 - PreCommit-HIVE-Build

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, 
> file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the select query in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention that is used for Druid: there needs to be a column 
> named '__time' in the result of the executed query, which will act as the 
> time dimension column in Druid. Currently, the time column dimension needs to 
> be a 'timestamp' type column.
> Metrics can be of type long, double, and float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as a 
> string.
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751619#comment-15751619
 ] 

Hive QA commented on HIVE-15434:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843408/HIVE-15434.01.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10837 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=66)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2590/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2590/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2590/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843408 - PreCommit-HIVE-Build

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
> Attachments: HIVE-15434.01.patch
>
>
> h2. Overview
> As stated in the documention:
> {quote}
> UNIONTYPE support is incomplete. The UNIONTYPE datatype was introduced in Hive 
> 0.7.0 (HIVE-537), but full support for this type in Hive remains incomplete. 
> Queries that reference UNIONTYPE fields in JOIN (HIVE-2508), WHERE, and GROUP 
> BY clauses will fail, and Hive does not define syntax to extract the tag or 
> value fields of a UNIONTYPE. This means that UNIONTYPEs are effectively 
> look-at-only.
> {quote}
> It is essential to have a usable uniontype. Until full support is added to 
> Hive, users should at least have the ability to inspect and extract values for 
> further comparison or transformation.
> h2. Proposal
> I propose to add a GenericUDF that has 2 modes of operation. Consider the 
> following schema and data that contains a union:
> Schema:
> {code}
> struct>
> {code}
> Query:
> {code}
> hive> select field1 from thing;
> {0:0}
> {1:"one"}
> {code}
> h4. Explode to Struct
> This method will recursively convert all unions within the type to structs 
> with fields named {{tag_n}}, {{n}} being the tag number. Only the {{tag_*}} 
> field that matches the tag of the union will be populated with the value. In 
> the case above the schema of field1 will be converted to:
> {code}
> struct
> {code}
> {code}
> hive> select extract_union(field1) from thing;
> {"tag_0":0,"tag_1":null}
> {"tag_0":null,"tag_1":one}
> {code}
> {code}
> hive> select extract_union(field1).tag_0 from thing;
> 0
> null
> {code}
> h4. Extract the specified tag
> This method will simply extract the value of the specified tag. If the tag 
> number matches then the value is returned; if it does not, then null is 
> returned.
> {code}
> hive> select extract_union(field1, 0) from thing;
> 0
> null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14735) Build Infra: Spark artifacts download takes a long time

2016-12-15 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751612#comment-15751612
 ] 

Zoltan Haindrich commented on HIVE-14735:
-

Hello [~spena], thank you for taking a look! :)

* skipSparkAssemblyDeploy - there is a single leftover setting of this variable 
to true - sorry for that: it was part of the previous patch version; I'll remove 
it... because now it's not necessary, as the thirdparty project does the 
unpacking - it will even skip downloading/unpacking if the tests are being 
skipped

* in its current form the publish doesn't work, because it tries to use my own 
private server - in its current form gradle can upload the artifacts using ssh 
access to any host - to make it work with another server, both of the rxd.hu 
references should be changed.

gradle / etc topic:

* the simplest would be to move this gradle project outside the project... into 
a custom repo, and place pointers to it in the readme file.
* if the spark project were willing to publish the 'spark-without-hive' 
artifact as a zip into the central maven repo - that would make this whole 
gradle/etc thing unnecessary; but in this case they would need to publish 
this new artifact for spark-2.0.0 - because hive currently uses that version - 
this has other "+" sides too, as it doesn't need an extra repository declaration.
* I will look into alternatives... possibly using maven... or some shell scripts 
to achieve the same results as with gradle...

[~spena], which one of the above would you prefer?



> Build Infra: Spark artifacts download takes a long time
> ---
>
> Key: HIVE-14735
> URL: https://issues.apache.org/jira/browse/HIVE-14735
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Vaibhav Gumashta
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14735.1.patch, HIVE-14735.1.patch, 
> HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.2.patch, HIVE-14735.3.patch
>
>
> In particular this command:
> {{curl -Sso ./../thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz 
> http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.6.0-bin-hadoop2-without-hive.tgz}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14998) Fix and update test: TestPluggableHiveSessionImpl

2016-12-15 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751549#comment-15751549
 ] 

Zoltan Haindrich commented on HIVE-14998:
-

Thank you for reviewing this change, [~ashutoshc]!

> Fix and update test: TestPluggableHiveSessionImpl
> -
>
> Key: HIVE-14998
> URL: https://issues.apache.org/jira/browse/HIVE-14998
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14998.1.patch, HIVE-14998.2.patch
>
>
> this test either prints an exception to stdout ... or not - in its 
> current form it isn't really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15383) Add additional info to 'desc function extended' output

2016-12-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15383:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Yongzhi for reviewing.

> Add additional info to 'desc function extended' output
> --
>
> Key: HIVE-15383
> URL: https://issues.apache.org/jira/browse/HIVE-15383
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch, 
> HIVE-15383.3.patch
>
>
> Add additional info to the output of 'desc function extended'. The resources 
> would be helpful for the user to check which jars are referenced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751519#comment-15751519
 ] 

Hive QA commented on HIVE-15335:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843373/HIVE-15335.093.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10901 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2589/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2589/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2589/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843373 - PreCommit-HIVE-Build

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
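
A usage sketch of the mutable API named above (method names are taken from the 
description; the exact signatures are an assumption):
{code:java}
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;

public class MutableDecimalSketch {
  public static void main(String[] args) {
    // One writable is reused and mutated in place, instead of allocating
    // a new BigDecimal-backed object for every addition.
    HiveDecimalWritable sum = new HiveDecimalWritable(0);
    HiveDecimalWritable value = new HiveDecimalWritable(0);
    for (long i = 1; i <= 3; i++) {
      value.setFromLong(i);   // assumed setter that reuses the same writable
      sum.mutableAdd(value);  // in-place add from the mutable* family
    }
    System.out.println(sum);  // 6
  }
}
{code}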



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14960) Improve the stability of TestNotificationListener

2016-12-15 Thread Marta Kuczora (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751515#comment-15751515
 ] 

Marta Kuczora commented on HIVE-14960:
--

Thanks a lot for committing the patch.

> Improve the stability of TestNotificationListener
> -
>
> Key: HIVE-14960
> URL: https://issues.apache.org/jira/browse/HIVE-14960
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 2.1.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14960.2.patch, HIVE-14960.patch
>
>
> The TestNotificationListener.testAMQListener test case fails occasionally 
> with the following error:
> {noformat}
> Error Message
> expected:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, 
> DROP_PARTITION, ALTER_TABLE, DROP_TABLE, DROP_DATABASE]> but 
> was:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, 
> DROP_PARTITION, ALTER_TABLE, DROP_TABLE]>
> Stacktrace
> java.lang.AssertionError: expected:<[CREATE_DATABASE, CREATE_TABLE, 
> ADD_PARTITION, ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE, 
> DROP_DATABASE]> but was:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, 
> ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE]>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hive.hcatalog.listener.TestNotificationListener.tearDown(TestNotificationListener.java:114)
> {noformat}
> This error can happen if the testAMQListener method completes before the 
> last DROP_TABLE message gets processed and put into the actualMessages list by 
> the onMessage method. This can happen if there is a small delay in receiving 
> the message, since message receiving is not synchronized with the 
> testAMQListener method.
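
A common pattern for closing this kind of race, as a hedged sketch with assumed 
names (not necessarily the committed fix): have the listener count down a latch 
per event, and make the test await the latch before asserting.
{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchSketch {
  static final int EXPECTED_EVENTS = 8;  // CREATE_DATABASE ... DROP_DATABASE
  static final CountDownLatch latch = new CountDownLatch(EXPECTED_EVENTS);
  static final List<String> actualMessages = new CopyOnWriteArrayList<>();

  // Invoked from the JMS delivery thread, analogous to onMessage(...).
  static void onMessage(String eventType) {
    actualMessages.add(eventType);
    latch.countDown();
  }

  // Invoked by the test before its assertions; a bounded wait replaces the race.
  static boolean awaitAllEvents() throws InterruptedException {
    return latch.await(30, TimeUnit.SECONDS);
  }
}
{code}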



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15383) Add additional info to 'desc function extended' output

2016-12-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751495#comment-15751495
 ] 

Aihua Xu commented on HIVE-15383:
-

Those test failures don't seem related.

> Add additional info to 'desc function extended' output
> --
>
> Key: HIVE-15383
> URL: https://issues.apache.org/jira/browse/HIVE-15383
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch, 
> HIVE-15383.3.patch
>
>
> Add additional info to the output of 'desc function extended'. The resources 
> would be helpful for the user to check which jars are referenced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

