[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769383#comment-15769383
 ] 

Hive QA commented on HIVE-15335:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844322/HIVE-15335.0992.patch

{color:green}SUCCESS:{color} +1 due to 15 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10850 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=122)

[groupby3_map.q,union11.q,union26.q,mapreduce1.q,mapjoin_addjar.q,bucket_map_join_spark1.q,udf_example_add.q,multi_insert_with_join.q,sample7.q,auto_join_nulls.q,ppd_outer_join4.q,load_dyn_part8.q,sample6.q,bucket_map_join_1.q,auto_sortmerge_join_9.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_expressions]
 (batchId=48)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_expressions]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2690/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2690/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2690/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844322 - PreCommit-HIVE-Build

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch, HIVE-15335.0992.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
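A minimal sketch of the allocation-free "mutable accumulator" pattern the description refers to. `FastDecimal` below is a hypothetical stand-in, not Hive's actual HiveDecimalWritable: it keeps the value in primitive fields and mutates in place, so a tight loop (e.g. a vectorized SUM) allocates no per-row objects.

```java
// Hypothetical illustration of the mutable* pattern, NOT Hive's implementation.
public class FastDecimal {
    private long unscaled;   // unscaled value (e.g. 1050 means 10.50 at scale 2)
    private final int scale; // number of fractional digits

    public FastDecimal(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    // Reuse this holder for the next row instead of allocating a new object.
    public void set(long unscaled) {
        this.unscaled = unscaled;
    }

    // mutableAdd: add 'other' into this object in place, mirroring the
    // mutable* calls (mutableAdd, mutableEnforcePrecisionScale, ...) named in
    // the issue description. Assumes equal scales for brevity.
    public void mutableAdd(FastDecimal other) {
        if (other.scale != this.scale) {
            throw new IllegalArgumentException("scales differ");
        }
        this.unscaled += other.unscaled;
    }

    public long unscaledValue() {
        return unscaled;
    }

    public static void main(String[] args) {
        FastDecimal sum = new FastDecimal(0, 2);  // accumulator, reused
        FastDecimal row = new FastDecimal(0, 2);  // per-row holder, reused
        long[] cents = {1050, 225, 999};          // 10.50 + 2.25 + 9.99
        for (long c : cents) {
            row.set(c);                           // no per-row allocation
            sum.mutableAdd(row);
        }
        if (sum.unscaledValue() != 2274) {
            throw new AssertionError("got " + sum.unscaledValue());
        }
        System.out.println("sum unscaled = " + sum.unscaledValue() + " (22.74)");
    }
}
```

The contrast with the BigDecimal-backed version is that every arithmetic call there returns a fresh immutable object, which is what this change avoids.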



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8373) OOM for a simple query with spark.master=local [Spark Branch]

2016-12-21 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769353#comment-15769353
 ] 

liyunzhang_intel commented on HIVE-8373:


[~lirui]: yes, this problem only happens on JDK 7.
After upgrading to JDK 8, we do not need to increase the value of MaxPermSize. So 
my question is: is MaxPermSize increased by default in JDK 8?


> OOM for a simple query with spark.master=local [Spark Branch]
> -
>
> Key: HIVE-8373
> URL: https://issues.apache.org/jira/browse/HIVE-8373
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: liyunzhang_intel
>
> I have a straightforward query to run in Spark local mode, but get an OOM 
> even though the data volume is tiny:
> {code}
> Exception in thread "Spark Context Cleaner" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Spark Context Cleaner"
> Exception in thread "Executor task launch worker-1" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Executor task launch worker-1"
> Exception in thread "Keep-Alive-Timer" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Keep-Alive-Timer"
> Exception in thread "Driver Heartbeater" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Driver Heartbeater"
> {code}
> The query is:
> {code}
> select product_name, avg(item_price) as avg_price from product join item on 
> item.product_pk=product.product_pk group by product_name order by avg_price;
> {code}





[jira] [Commented] (HIVE-8373) OOM for a simple query with spark.master=local [Spark Branch]

2016-12-21 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769309#comment-15769309
 ] 

Rui Li commented on HIVE-8373:
--

Thanks [~kellyzly] for working on this. Have you verified that the OOM is related 
to perm size? If so, yes, let's document this in our wiki. And I suppose it's only 
needed for JDK 7, right?

> OOM for a simple query with spark.master=local [Spark Branch]
> -
>
> Key: HIVE-8373
> URL: https://issues.apache.org/jira/browse/HIVE-8373
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: liyunzhang_intel
>
> I have a straightforward query to run in Spark local mode, but get an OOM 
> even though the data volume is tiny:
> {code}
> Exception in thread "Spark Context Cleaner" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Spark Context Cleaner"
> Exception in thread "Executor task launch worker-1" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Executor task launch worker-1"
> Exception in thread "Keep-Alive-Timer" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Keep-Alive-Timer"
> Exception in thread "Driver Heartbeater" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Driver Heartbeater"
> {code}
> The query is:
> {code}
> select product_name, avg(item_price) as avg_price from product join item on 
> item.product_pk=product.product_pk group by product_name order by avg_price;
> {code}





[jira] [Assigned] (HIVE-8373) OOM for a simple query with spark.master=local [Spark Branch]

2016-12-21 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned HIVE-8373:
--

Assignee: liyunzhang_intel

> OOM for a simple query with spark.master=local [Spark Branch]
> -
>
> Key: HIVE-8373
> URL: https://issues.apache.org/jira/browse/HIVE-8373
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: liyunzhang_intel
>
> I have a straightforward query to run in Spark local mode, but get an OOM 
> even though the data volume is tiny:
> {code}
> Exception in thread "Spark Context Cleaner" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Spark Context Cleaner"
> Exception in thread "Executor task launch worker-1" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Executor task launch worker-1"
> Exception in thread "Keep-Alive-Timer" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Keep-Alive-Timer"
> Exception in thread "Driver Heartbeater" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Driver Heartbeater"
> {code}
> The query is:
> {code}
> select product_name, avg(item_price) as avg_price from product join item on 
> item.product_pk=product.product_pk group by product_name order by avg_price;
> {code}





[jira] [Commented] (HIVE-8373) OOM for a simple query with spark.master=local [Spark Branch]

2016-12-21 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769284#comment-15769284
 ] 

liyunzhang_intel commented on HIVE-8373:


[~xuefuz] and [~lirui]: I have run spark.master=local successfully, avoiding the 
"OOM" exception, by appending the following in conf/hive-env.sh:
{code}
 export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=128m"
{code}
See the 
[answer|http://stackoverflow.com/questions/34476195/why-does-creating-hivecontext-fail-with-java-lang-outofmemoryerror-permgen-spa]
 on Stack Overflow for how this OOM exception was solved.

Do we need to increase the value of "MaxPermSize" to make HoS run successfully 
in local mode? If yes, we should add this to the wiki.

> OOM for a simple query with spark.master=local [Spark Branch]
> -
>
> Key: HIVE-8373
> URL: https://issues.apache.org/jira/browse/HIVE-8373
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Xuefu Zhang
>
> I have a straightforward query to run in Spark local mode, but get an OOM 
> even though the data volume is tiny:
> {code}
> Exception in thread "Spark Context Cleaner" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Spark Context Cleaner"
> Exception in thread "Executor task launch worker-1" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Executor task launch worker-1"
> Exception in thread "Keep-Alive-Timer" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Keep-Alive-Timer"
> Exception in thread "Driver Heartbeater" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Driver Heartbeater"
> {code}
> The query is:
> {code}
> select product_name, avg(item_price) as avg_price from product join item on 
> item.product_pk=product.product_pk group by product_name order by avg_price;
> {code}





[jira] [Commented] (HIVE-15477) Provide options to adjust filter stats when column stats are not available

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769252#comment-15769252
 ] 

Hive QA commented on HIVE-15477:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844319/HIVE-15477.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 444 failed/errored test(s), 10807 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=218)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[allcolref_in_udf] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_filter] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk]
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join0] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join11] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join14] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join16] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join20] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join21] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join23] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join27] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join28] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join29] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join33] (batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join4] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join5] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join6] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join7] (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join8] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_10] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join0] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_outer_join_ppr] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer10] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer13] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer8] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer9] 
(batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_colname] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[distinct_windowing_no_cbo]
 (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[except_all] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_cond_pushdown] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[folder_predicate] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fouter_join_ppr] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[gby_star] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_multi_single_reducer2]
 (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_multi_single_reducer]
 (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_position] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[having2] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[having] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact]
 (batchId=33)

[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769145#comment-15769145
 ] 

Hive QA commented on HIVE-15376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844320/HIVE-15376.14.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10807 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2688/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2688/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2688/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}


ATTACHMENT ID: 12844320 - PreCommit-HIVE-Build

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.14.patch, HIVE-15376.2.patch, HIVE-15376.3.patch, 
> HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch, 
> HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there may 
> still be an issue, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.





[jira] [Commented] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769070#comment-15769070
 ] 

Hive QA commented on HIVE-15491:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844316/HIVE-15491.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10777 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2687/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2687/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2687/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}


ATTACHMENT ID: 12844316 - PreCommit-HIVE-Build

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.
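The failure mode described above can be shown in a self-contained way. The demo below is an illustration of the pattern, not the HIVE-15491 patch itself: `catch (Throwable)` silently masks a failure thrown by {{forward()}}, while a narrowed handler rethrows it so the task fails visibly. The class and method names here are stand-ins.

```java
// Illustration only (stand-in names, not Hive's classes): why catching
// Throwable around forward() swallows real task failures.
public class SwallowDemo {
    static class HiveException extends Exception {
        HiveException(String m) { super(m); }
    }

    static void forward(Object cols) throws HiveException {
        // Simulates the RecordWriter failure from the exhausted-quota incident.
        throw new HiveException("could not create RecordWriter: quota exceeded");
    }

    // The suspect pattern from process(): the error is silently masked.
    static boolean broadCatchSwallows() {
        try {
            forward(new Object[0]);
            return false;
        } catch (Throwable t) {
            return true;               // failure swallowed, record skipped
        }
    }

    // Narrowed handling: only non-fatal (e.g. JSON) errors fall back to
    // nullCols; failures from forward() propagate and fail the task.
    static void narrowedCatch() throws HiveException {
        try {
            forward(new Object[0]);
        } catch (HiveException he) {
            throw he;                  // must surface
        } catch (Exception e) {
            // JSON parsing/evaluation fallback (forward(nullCols)) would go here.
        }
    }

    public static void main(String[] args) {
        if (!broadCatchSwallows()) throw new AssertionError();
        boolean surfaced = false;
        try {
            narrowedCatch();
        } catch (HiveException expected) {
            surfaced = true;
        }
        if (!surfaced) throw new AssertionError("narrowed catch should rethrow");
        System.out.println("broad catch masked the error; narrowed catch surfaced it");
    }
}
```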





[jira] [Updated] (HIVE-15471) LLAP UI: NPE when getting thread metrics

2016-12-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15471:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~gopalv]. Committed to master.

> LLAP UI: NPE when getting thread metrics
> 
>
> Key: HIVE-15471
> URL: https://issues.apache.org/jira/browse/HIVE-15471
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15471.1.patch, HIVE-15471.2.patch
>
>
> When tasks are interrupted/killed in the middle of a job in LLAP, 
> {{LlapDaemonExecutorMetrics.updateThreadMetrics}} can end up throwing an NPE.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonExecutorMetrics.updateThreadMetrics(LlapDaemonExecutorMetrics.java:307)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonExecutorMetrics.getExecutorStats(LlapDaemonExecutorMetrics.java:269)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.metrics.LlapDaemonExecutorMetrics.getMetrics(LlapDaemonExecutorMetrics.java:187)
>  ~[hive-llap-server-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:194)
>  ~[hadoop-common-2.7.1.jar:?]
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>  ~[hadoop-common-2.7.1.jar:?]
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>  ~[hadoop-common-2.7.1.jar:?]
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1378)
>  ~[?:1.8.0_77]
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(JmxMBeanServer.java:920) 
> ~[?:1.8.0_77]
>   at 
> org.apache.hive.http.JMXJsonServlet.listBeans(JMXJsonServlet.java:228) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at org.apache.hive.http.JMXJsonServlet.doGet(JMXJsonServlet.java:194) 
> ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}
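One plausible way such an NPE arises (the actual root cause in HIVE-15471 may differ): {{ThreadMXBean.getThreadInfo(id)}} returns null once a thread has terminated, so a metrics loop that dereferences the result without a null check fails exactly when tasks are killed mid-job. A guard that skips vanished threads avoids it:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Demonstrates that getThreadInfo() is null for a dead thread, and the
// null-guard a metrics collector needs. Illustrative, not Hive's code.
public class DeadThreadMetricsDemo {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Thread t = new Thread(() -> { });
        t.start();
        long id = t.getId();
        t.join();                      // the thread has now terminated

        ThreadInfo info = mx.getThreadInfo(id);
        // For a terminated thread this is null -- the NPE trigger if a
        // collector calls info.getThreadName() etc. unconditionally.
        if (info != null) {
            System.out.println("still alive: " + info.getThreadName());
        } else {
            System.out.println("thread " + id + " gone; skipping its metrics");
        }
    }
}
```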





[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768928#comment-15768928
 ] 

Hive QA commented on HIVE-15489:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844312/HIVE-15489.wip.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 10807 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join3]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join4]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join13] 
(batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join22] 
(batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join27] 
(batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join2] 
(batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_filters] 
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_stats2] 
(batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_stats] 
(batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_10]
 (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_12]
 (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_6]
 (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_9]
 (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_spark4]
 (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cross_product_check_2]
 (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[identity_project_remove_skip]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join28] 
(batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32] 
(batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32_lessSize] 
(batchId=97)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join33] 
(batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join34] 
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join35] 
(batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_star] 
(batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_mapjoin] 
(batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_subquery2] 
(batchId=97)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_subquery] 
(batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_join_union] 
(batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[runtime_skewjoin_mapjoin_spark]
 (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_17] 
(batchId=96)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_25] 
(batchId=98)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multiinsert]
 (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union22] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_left_outer_join]
 (batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_nested_mapjoin]
 (batchId=101)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2686/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2686/console
Test logs: 

[jira] [Commented] (HIVE-15487) LLAP: Improvements to random selection while scheduling

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768859#comment-15768859
 ] 

Hive QA commented on HIVE-15487:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844309/HIVE-15487.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10808 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=214)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2685/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2685/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844309 - PreCommit-HIVE-Build

> LLAP: Improvements to random selection while scheduling
> ---
>
> Key: HIVE-15487
> URL: https://issues.apache.org/jira/browse/HIVE-15487
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15487.1.patch
>
>
> Currently llap scheduler, picks up random host when no locality information 
> is specified or when all requested hosts are busy serving other requests with 
> forced locality. In such cases, we can pick up the next available node in 
> consistent order to get better locality instead of random selection. 
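The "next available node in consistent order" idea above can be sketched as follows. This is a hypothetical illustration, not the actual LLAP scheduler API: the class and method names are invented, and real scheduling would also have to account for node capacity and liveness.

```java
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical sketch: instead of picking a random host when no locality
// information is available (or all requested hosts are busy), keep the hosts
// in a consistently sorted order and take the next one at or after the
// requested host, wrapping around. Repeated requests then land on stable
// nodes rather than scattering randomly.
public class ConsistentHostSelector {
    private final TreeSet<String> hosts;

    public ConsistentHostSelector(Collection<String> allHosts) {
        this.hosts = new TreeSet<>(allHosts);
    }

    // First host >= requested in sorted order; wraps to the lowest host
    // when the requested host sorts after every known host.
    public String nextAvailable(String requested) {
        String candidate = hosts.ceiling(requested);
        return candidate != null ? candidate : hosts.first();
    }
}
```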



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15112) Implement Parquet vectorization reader for Struct type

2016-12-21 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768803#comment-15768803
 ] 

Ferdinand Xu commented on HIVE-15112:
-

Hi [~csun], can you take a look at the latest patch? It just renames 
TestVectorizedColumnReaderBase -> VectorizedColumnReaderTestBase, since a Java 
file with the "Test" prefix is treated as a unit test.
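For context: with Maven Surefire's default include patterns, a class whose name starts with `Test` is picked up and run as a test on its own, which is why the rename avoids that. The fragment below is illustrative only (Surefire's defaults, abridged), not part of the Hive build files.

```
<!-- Maven Surefire default includes (abridged): TestVectorizedColumnReaderBase
     matches Test*.java and would be executed as a test by itself, while
     VectorizedColumnReaderTestBase matches none of these patterns. -->
<includes>
  <include>**/Test*.java</include>
  <include>**/*Test.java</include>
  <include>**/*TestCase.java</include>
</includes>
```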

> Implement Parquet vectorization reader for Struct type
> --
>
> Key: HIVE-15112
> URL: https://issues.apache.org/jira/browse/HIVE-15112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15112.1.patch, HIVE-15112.2.patch, 
> HIVE-15112.3.patch, HIVE-15112.patch
>
>
> Like HIVE-14815, we need to support the Parquet vectorized reader for the struct type.





[jira] [Commented] (HIVE-15112) Implement Parquet vectorization reader for Struct type

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768779#comment-15768779
 ] 

Hive QA commented on HIVE-15112:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844234/HIVE-15112.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10800 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[1]
 (batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2684/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2684/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2684/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844234 - PreCommit-HIVE-Build

> Implement Parquet vectorization reader for Struct type
> --
>
> Key: HIVE-15112
> URL: https://issues.apache.org/jira/browse/HIVE-15112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15112.1.patch, HIVE-15112.2.patch, 
> HIVE-15112.3.patch, HIVE-15112.patch
>
>
> Like HIVE-14815, we need to support the Parquet vectorized reader for the struct type.





[jira] [Updated] (HIVE-14707) ACID: Insert shuffle sort-merges on blank KEY

2016-12-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14707:
--
Status: Patch Available  (was: Open)

> ACID: Insert shuffle sort-merges on blank KEY
> -
>
> Key: HIVE-14707
> URL: https://issues.apache.org/jira/browse/HIVE-14707
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Eugene Koifman
> Attachments: HIVE-14707.01.patch
>
>
> The ACID insert codepath uses a sorted shuffle, while the key used for the 
> shuffle is always 0 bytes long.
> {code}
> hive (sales_acid)> explain insert into sales values(1, 2, 
> '3400---009', 1, null);
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: gopal_20160906172626_80261c4c-79cc-4e02-87fe-3133be404e55:2
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> ...
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: values__tmp__table__2
>   Statistics: Num rows: 1 Data size: 28 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: tmp_values_col1 (type: string), 
> tmp_values_col2 (type: string), tmp_values_col3 (type: string), 
> tmp_values_col4 (type: string), tmp_values_col5 (type: string)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4
> Statistics: Num rows: 1 Data size: 28 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order: 
>   Map-reduce partition columns: UDFToLong(_col1) (type: 
> bigint)
>   Statistics: Num rows: 1 Data size: 28 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: string), _col1 (type: 
> string), _col2 (type: string), _col3 (type: string), _col4 (type: string)
> Execution mode: vectorized, llap
> LLAP IO: no inputs
> {code}
> Note the missing "+" / "-" in the Sort Order fields.





[jira] [Updated] (HIVE-14707) ACID: Insert shuffle sort-merges on blank KEY

2016-12-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14707:
--
Attachment: HIVE-14707.01.patch

> ACID: Insert shuffle sort-merges on blank KEY
> -
>
> Key: HIVE-14707
> URL: https://issues.apache.org/jira/browse/HIVE-14707
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Eugene Koifman
> Attachments: HIVE-14707.01.patch
>
>
> The ACID insert codepath uses a sorted shuffle, while the key used for the 
> shuffle is always 0 bytes long.
> {code}
> hive (sales_acid)> explain insert into sales values(1, 2, 
> '3400---009', 1, null);
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: gopal_20160906172626_80261c4c-79cc-4e02-87fe-3133be404e55:2
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> ...
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: values__tmp__table__2
>   Statistics: Num rows: 1 Data size: 28 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: tmp_values_col1 (type: string), 
> tmp_values_col2 (type: string), tmp_values_col3 (type: string), 
> tmp_values_col4 (type: string), tmp_values_col5 (type: string)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4
> Statistics: Num rows: 1 Data size: 28 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order: 
>   Map-reduce partition columns: UDFToLong(_col1) (type: 
> bigint)
>   Statistics: Num rows: 1 Data size: 28 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: string), _col1 (type: 
> string), _col2 (type: string), _col3 (type: string), _col4 (type: string)
> Execution mode: vectorized, llap
> LLAP IO: no inputs
> {code}
> Note the missing "+" / "-" in the Sort Order fields.





[jira] [Commented] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768685#comment-15768685
 ] 

Hive QA commented on HIVE-15360:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844302/HIVE-15360.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10806 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] 
(batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite 
(batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2683/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2683/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2683/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844302 - PreCommit-HIVE-Build

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.





[jira] [Comment Edited] (HIVE-15399) Parser change for UniqueJoin

2016-12-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768637#comment-15768637
 ] 

Pengcheng Xiong edited comment on HIVE-15399 at 12/22/16 12:57 AM:
---

included in HIVE-15200


was (Author: pxiong):
included in HIVE-15220

> Parser change for UniqueJoin
> 
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov via 
> namit). It sounds like there is only one q test for unique join, i.e., 
> uniquejoin.q. In that q test, the unique join source comes only from a table. 
> However, in the parser, its source can come not only from tableSource but also
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and limit it to match users' 
> real requirements.





[jira] [Updated] (HIVE-15399) Parser change for UniqueJoin

2016-12-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15399:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

included in HIVE-15220

> Parser change for UniqueJoin
> 
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
>
> UniqueJoin was introduced in HIVE-591 ("Add Unique Join", Emil Ibrishimov via 
> namit). It sounds like there is only one q test for unique join, i.e., 
> uniquejoin.q. In that q test, the unique join source comes only from a table. 
> However, in the parser, its source can come not only from tableSource but also
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and limit it to match users' 
> real requirements.





[jira] [Updated] (HIVE-15445) Subquery failing with ClassCastException

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15445:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~vgarg] and [~pxiong]!

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.2.0
>
> Attachments: HIVE-15445.01.patch, HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.





[jira] [Updated] (HIVE-15494) Create perfLogger in method execute instead of class initialization for SparkTask

2016-12-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-15494:
-
Summary: Create perfLogger in method execute instead of class 
initialization for SparkTask  (was: Create perfLogger in method execute instead 
class initialization for SparkTask)

> Create perfLogger in method execute instead of class initialization for 
> SparkTask
> -
>
> Key: HIVE-15494
> URL: https://issues.apache.org/jira/browse/HIVE-15494
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15494.000.patch
>
>
> Create perfLogger in the execute method instead of at class initialization 
> for SparkTask, so that perfLogger can be shared with SparkJobMonitor in the 
> same thread.
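The pattern described above can be sketched as follows. `PerfLogger` here is a minimal thread-local stand-in written for this example, not Hive's actual class, and the task/method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for a per-thread perf logger: one instance per thread,
// obtained through a static accessor (an assumption mirroring the pattern the
// issue describes, not Hive's real PerfLogger).
class PerfLogger {
    private static final ThreadLocal<PerfLogger> LOCAL =
            ThreadLocal.withInitial(PerfLogger::new);
    final Map<String, Long> startTimes = new HashMap<>();

    static PerfLogger getPerfLogger() { return LOCAL.get(); }

    void begin(String key) { startTimes.put(key, System.nanoTime()); }
}

// Fetching the logger inside execute() (rather than in a field initializer)
// means the instance belongs to the thread that actually runs the task, so a
// job monitor running on the same thread observes the same timers.
public class SparkTaskSketch {
    public PerfLogger execute() {
        PerfLogger perfLogger = PerfLogger.getPerfLogger();
        perfLogger.begin("SPARK_SUBMIT_TO_RUNNING");
        // ... submit the Spark job and hand perfLogger to the monitor ...
        return perfLogger;
    }
}
```

Had the field been initialized at class-construction time on a different thread, `getPerfLogger()` in the monitor would return a different instance.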





[jira] [Commented] (HIVE-15494) Create perfLogger in method execute instead class initialization for SparkTask

2016-12-21 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768610#comment-15768610
 ] 

Chao Sun commented on HIVE-15494:
-

+1

> Create perfLogger in method execute instead class initialization for SparkTask
> --
>
> Key: HIVE-15494
> URL: https://issues.apache.org/jira/browse/HIVE-15494
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15494.000.patch
>
>
> Create perfLogger in the execute method instead of at class initialization 
> for SparkTask, so that perfLogger can be shared with SparkJobMonitor in the 
> same thread.





[jira] [Updated] (HIVE-15493) Wrong result for LEFT outer join in Tez using MapJoinOperator

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15493:
---
Attachment: HIVE-15493.patch

> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -
>
> Key: HIVE-15493
> URL: https://issues.apache.org/jira/browse/HIVE-15493
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-15493.patch
>
>
> To reproduce, we can run in Tez:
> {code:sql}
> set hive.auto.convert.join=true;
> DROP TABLE IF EXISTS test_1; 
> CREATE TABLE test_1 
> ( 
> member BIGINT 
> , age VARCHAR (100) 
> ) 
> STORED AS TEXTFILE 
> ; 
> DROP TABLE IF EXISTS test_2; 
> CREATE TABLE test_2 
> ( 
> member BIGINT 
> ) 
> STORED AS TEXTFILE 
> ; 
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40'); 
> INSERT INTO test_2 VALUES (1), (2), (3); 
> SELECT 
> t2.member 
> , t1.age_1 
> , t1.age_2 
> FROM 
> test_2 t2 
> LEFT JOIN ( 
> SELECT 
> member 
> , age as age_1 
> , age as age_2 
> FROM 
> test_1 
> ) t1 
> ON t2.member = t1.member 
> ;
> {code}
> Result is:
> {noformat}
> 1 20  NULL
> 3 40  NULL
> 2 30  NULL
> {noformat}
> Correct result is:
> {noformat}
> 1 20  20
> 3 40  40
> 2 30  30
> {noformat}
> Bug was introduced by HIVE-10582. Though the fix in HIVE-10582 does not 
> contain tests, it does look legit. In fact, the problem seems to be in the 
> MapJoinOperator itself. It only happens for LEFT outer join (not with RIGHT 
> outer or FULL outer). Although I am still trying to understand part of the 
> MapJoinOperator code path, the bug could be in the initialization of the 
> operator. It only happens when we have duplicate values in the right part of 
> the output.
> Till we have more time to study the problem in detail and fix the 
> MapJoinOperator, I will submit a fix that removes the code in 
> SemanticAnalyzer that reuses duplicated value expressions from RS to create 
> multiple columns in the join output (this is equivalent to reverting 
> HIVE-10582). 
> Once this is pushed, I will create a follow-up issue to take this code back 
> and tackle the problem in the MapJoinOperator.





[jira] [Updated] (HIVE-15494) Create perfLogger in method execute instead class initialization for SparkTask

2016-12-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-15494:
-
Status: Patch Available  (was: Open)

> Create perfLogger in method execute instead class initialization for SparkTask
> --
>
> Key: HIVE-15494
> URL: https://issues.apache.org/jira/browse/HIVE-15494
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15494.000.patch
>
>
> Create perfLogger in the execute method instead of at class initialization 
> for SparkTask, so that perfLogger can be shared with SparkJobMonitor in the 
> same thread.





[jira] [Updated] (HIVE-15494) Create perfLogger in method execute instead class initialization for SparkTask

2016-12-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-15494:
-
Attachment: HIVE-15494.000.patch

> Create perfLogger in method execute instead class initialization for SparkTask
> --
>
> Key: HIVE-15494
> URL: https://issues.apache.org/jira/browse/HIVE-15494
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-15494.000.patch
>
>
> Create perfLogger in the execute method instead of at class initialization 
> for SparkTask, so that perfLogger can be shared with SparkJobMonitor in the 
> same thread.





[jira] [Commented] (HIVE-15488) Native Vector MapJoin fails when trying to serialize BigTable rows that have (unreferenced) complex types

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768594#comment-15768594
 ] 

Hive QA commented on HIVE-15488:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844295/HIVE-15488.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10791 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=116)

[join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,join11.q,union13.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,windowing.q,union_remove_3.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,ptf_streaming.q,join_1to1.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testPreemptionStateOnTaskMoveToNonFinishableState
 (batchId=282)
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus 
(batchId=212)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2682/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2682/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2682/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844295 - PreCommit-HIVE-Build

> Native Vector MapJoin fails when trying to serialize BigTable rows that have 
> (unreferenced) complex types
> -
>
> Key: HIVE-15488
> URL: https://issues.apache.org/jira/browse/HIVE-15488
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15488.01.patch
>
>
> When creating VectorSerializeRow we need to exclude any complex types.





[jira] [Updated] (HIVE-15493) Wrong result for LEFT outer join in Tez using MapJoinOperator

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15493:
---
Status: Patch Available  (was: In Progress)

> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -
>
> Key: HIVE-15493
> URL: https://issues.apache.org/jira/browse/HIVE-15493
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> To reproduce, we can run in Tez:
> {code:sql}
> set hive.auto.convert.join=true;
> DROP TABLE IF EXISTS test_1; 
> CREATE TABLE test_1 
> ( 
> member BIGINT 
> , age VARCHAR (100) 
> ) 
> STORED AS TEXTFILE 
> ; 
> DROP TABLE IF EXISTS test_2; 
> CREATE TABLE test_2 
> ( 
> member BIGINT 
> ) 
> STORED AS TEXTFILE 
> ; 
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40'); 
> INSERT INTO test_2 VALUES (1), (2), (3); 
> SELECT 
> t2.member 
> , t1.age_1 
> , t1.age_2 
> FROM 
> test_2 t2 
> LEFT JOIN ( 
> SELECT 
> member 
> , age as age_1 
> , age as age_2 
> FROM 
> test_1 
> ) t1 
> ON t2.member = t1.member 
> ;
> {code}
> Result is:
> {noformat}
> 1 20  NULL
> 3 40  NULL
> 2 30  NULL
> {noformat}
> Correct result is:
> {noformat}
> 1 20  20
> 3 40  40
> 2 30  30
> {noformat}
> Bug was introduced by HIVE-10582. Though the fix in HIVE-10582 does not 
> contain tests, it does look legit. In fact, the problem seems to be in the 
> MapJoinOperator itself. It only happens for LEFT outer join (not with RIGHT 
> outer or FULL outer). Although I am still trying to understand part of the 
> MapJoinOperator code path, the bug could be in the initialization of the 
> operator. It only happens when we have duplicate values in the right part of 
> the output.
> Till we have more time to study the problem in detail and fix the 
> MapJoinOperator, I will submit a fix that removes the code in 
> SemanticAnalyzer that reuses duplicated value expressions from RS to create 
> multiple columns in the join output (this is equivalent to reverting 
> HIVE-10582). 
> Once this is pushed, I will create a follow-up issue to take this code back 
> and tackle the problem in the MapJoinOperator.





[jira] [Commented] (HIVE-15470) Catch Throwable instead of Exception in driver.execute.

2016-12-21 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768589#comment-15768589
 ] 

zhihai xu commented on HIVE-15470:
--

Thanks [~jxiang] for reviewing and committing the patch!

> Catch Throwable instead of Exception in driver.execute.
> ---
>
> Key: HIVE-15470
> URL: https://issues.apache.org/jira/browse/HIVE-15470
> Project: Hive
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15470.000.patch
>
>
> Catch Throwable instead of Exception in driver.execute, so that a query 
> failing with a Throwable (not just an Exception) is also logged and reported.
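A minimal sketch of the widened catch; the wrapper method and logging below are illustrative, not Hive's actual Driver code. An `Error` such as `AssertionError` escapes a `catch (Exception)` block but is caught and reported by `catch (Throwable)`.

```java
// Illustrative execute wrapper: widening catch(Exception) to catch(Throwable)
// means failures that surface as Errors (e.g. OutOfMemoryError,
// AssertionError) are also logged and turned into a failure return code
// instead of propagating unreported.
public class ExecuteSketch {
    public static int run(Runnable query) {
        try {
            query.run();
            return 0;                       // success
        } catch (Throwable t) {             // was: catch (Exception e)
            System.err.println("FAILED: " + t);
            return 1;                       // failure is reported either way
        }
    }
}
```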





[jira] [Work started] (HIVE-15493) Wrong result for LEFT outer join in Tez using MapJoinOperator

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15493 started by Jesus Camacho Rodriguez.
--
> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -
>
> Key: HIVE-15493
> URL: https://issues.apache.org/jira/browse/HIVE-15493
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> To reproduce, we can run in Tez:
> {code:sql}
> set hive.auto.convert.join=true;
> DROP TABLE IF EXISTS test_1; 
> CREATE TABLE test_1 
> ( 
> member BIGINT 
> , age VARCHAR (100) 
> ) 
> STORED AS TEXTFILE 
> ; 
> DROP TABLE IF EXISTS test_2; 
> CREATE TABLE test_2 
> ( 
> member BIGINT 
> ) 
> STORED AS TEXTFILE 
> ; 
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40'); 
> INSERT INTO test_2 VALUES (1), (2), (3); 
> SELECT 
> t2.member 
> , t1.age_1 
> , t1.age_2 
> FROM 
> test_2 t2 
> LEFT JOIN ( 
> SELECT 
> member 
> , age as age_1 
> , age as age_2 
> FROM 
> test_1 
> ) t1 
> ON t2.member = t1.member 
> ;
> {code}
> Result is:
> {noformat}
> 1 20  NULL
> 3 40  NULL
> 2 30  NULL
> {noformat}
> Correct result is:
> {noformat}
> 1 20  20
> 3 40  40
> 2 30  30
> {noformat}
> Bug was introduced by HIVE-10582. Though the fix in HIVE-10582 does not 
> contain tests, it does look legit. In fact, the problem seems to be in the 
> MapJoinOperator itself. It only happens for LEFT outer join (not with RIGHT 
> outer or FULL outer). Although I am still trying to understand part of the 
> MapJoinOperator code path, the bug could be in the initialization of the 
> operator. It only happens when we have duplicate values in the right part of 
> the output.
> Till we have more time to study the problem in detail and fix the 
> MapJoinOperator, I will submit a fix that removes the code in 
> SemanticAnalyzer that reuses duplicated value expressions from RS to create 
> multiple columns in the join output (this is equivalent to reverting 
> HIVE-10582). 
> Once this is pushed, I will create a follow-up issue to take this code back 
> and tackle the problem in the MapJoinOperator.





[jira] [Updated] (HIVE-14956) Parallelize TestHCatLoader

2016-12-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14956:
---
Attachment: HIVE-14956.02.patch

Attaching the second iteration of the patch, which fixes the test failures.

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch, HIVE-14956.02.patch
>
>






[jira] [Updated] (HIVE-15493) Wrong result for LEFT outer join in Tez using MapJoinOperator

2016-12-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15493:
---
Description: 
To reproduce, we can run in Tez:
{code:sql}
set hive.auto.convert.join=true;

DROP TABLE IF EXISTS test_1; 
CREATE TABLE test_1 
( 
member BIGINT 
, age VARCHAR (100) 
) 
STORED AS TEXTFILE 
; 

DROP TABLE IF EXISTS test_2; 
CREATE TABLE test_2 
( 
member BIGINT 
) 
STORED AS TEXTFILE 
; 

INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40'); 
INSERT INTO test_2 VALUES (1), (2), (3); 

SELECT 
t2.member 
, t1.age_1 
, t1.age_2 
FROM 
test_2 t2 
LEFT JOIN ( 
SELECT 
member 
, age as age_1 
, age as age_2 
FROM 
test_1 
) t1 
ON t2.member = t1.member 
;
{code}

Result is:
{noformat}
1   20  NULL
3   40  NULL
2   30  NULL
{noformat}

Correct result is:
{noformat}
1   20  20
3   40  40
2   30  30
{noformat}

The bug was introduced by HIVE-10582. Although the change in HIVE-10582 does not 
contain tests, it does look legitimate; in fact, the problem seems to be in the 
MapJoinOperator itself. It only happens for LEFT outer joins (not for RIGHT 
outer or FULL outer). Although I am still trying to understand part of the 
MapJoinOperator code path, the bug could be in the initialization of the 
operator. It only happens when we have duplicate values in the right part of 
the output.

Until we have more time to study the problem in detail and fix the 
MapJoinOperator, I will submit a fix that removes the code in SemanticAnalyzer 
that reuses duplicated value expressions from the RS to create multiple columns 
in the join output (this is equivalent to reverting HIVE-10582).

Once this is pushed, I will create a follow-up issue to bring this code back and 
tackle the problem in the MapJoinOperator.
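As an illustration (not Hive code), the deduplication described above can be sketched as follows: when two output columns (age_1, age_2) come from the same value expression, the reduce sink can store the value once, but it must keep an index mapping so every output column still gets filled; losing that mapping reproduces the reported symptom, where the second column comes back NULL.

```python
def dedup_value_exprs(exprs):
    """Return (stored_exprs, col_to_slot), deduplicating identical expressions."""
    stored, col_to_slot, seen = [], [], {}
    for e in exprs:
        if e not in seen:
            seen[e] = len(stored)
            stored.append(e)
        col_to_slot.append(seen[e])
    return stored, col_to_slot

def project_row(stored_values, col_to_slot):
    """Expand the deduplicated stored row back into the full output columns."""
    return [stored_values[slot] for slot in col_to_slot]

# age_1 and age_2 both read the "age" expression; only one slot is stored.
stored, mapping = dedup_value_exprs(["age", "age"])
assert stored == ["age"] and mapping == [0, 0]

# Both output columns are still populated via the mapping.
print(project_row(["20"], mapping))  # ['20', '20']
```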

  was:
To reproduce, we can run in Tez:
{code:sql}
set hive.auto.convert.join=true;

DROP TABLE IF EXISTS test_1; 
CREATE TABLE test_1 
( 
member BIGINT 
, age VARCHAR (100) 
) 
STORED AS TEXTFILE 
; 

DROP TABLE IF EXISTS test_2; 
CREATE TABLE test_2 
( 
member BIGINT 
) 
STORED AS TEXTFILE 
; 

INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40'); 
INSERT INTO test_2 VALUES (1), (2), (3); 

SELECT 
t2.member 
, t1.age_1 
, t1.age_2 
FROM 
test_2 t2 
LEFT JOIN ( 
SELECT 
member 
, age as age_1 
, age as age_2 
FROM 
test_1 
) t1 
ON t2.member = t1.member 
;
{code}

Result is:
{format}
1   20  NULL
3   40  NULL
2   30  NULL
{format}

Correct result is:
{format}
1   20  20
3   40  40
2   30  30
{format}

Bug was introduced by HIVE-10582. Though the fix in HIVE-10582 does not contain 
tests, it does look legit. In fact, the problem seems to be in the 
MapJoinOperator itself. It only happens for LEFT outer join (not with RIGHT 
outer or FULL outer). Although I am still trying to understand part of the 
MapJoinOperator code path, the bug could be in the initialization of the 
operator. It only happens when we have duplicate values in the right part of 
the output.

Till we have more time to study the problem in detail and fix the 
MapJoinOperator, I will submit a fix that removes the code in SemanticAnalyzer 
that reuses duplicated value expressions from RS to create multiple columns in 
the join output (this is equivalent to reverting HIVE-10582). 

Once this is pushed, I will create a follow-up issue to take this code back and 
tackle the problem in the MapJoinOperator.


> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -
>
> Key: HIVE-15493
> URL: https://issues.apache.org/jira/browse/HIVE-15493
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> To reproduce, we can run in Tez:
> {code:sql}
> set hive.auto.convert.join=true;
> DROP TABLE IF EXISTS test_1; 
> CREATE TABLE test_1 
> ( 
> member BIGINT 
> , age VARCHAR (100) 
> ) 
> STORED AS TEXTFILE 
> ; 
> DROP TABLE IF EXISTS test_2; 
> CREATE TABLE test_2 
> ( 
> member BIGINT 
> ) 
> STORED AS TEXTFILE 
> ; 
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40'); 
> INSERT INTO test_2 VALUES (1), (2), (3); 
> SELECT 
> t2.member 
> , t1.age_1 
> , t1.age_2 
> FROM 
> test_2 t2 
> LEFT JOIN ( 
> SELECT 
> member 
> , age as age_1 
> , age as age_2 
> FROM 
> test_1 
> ) t1 
> ON t2.member = t1.member 
> ;
> {code}
> Result is:
> {noformat}
> 1 20  NULL
> 3 40  NULL
> 2 30  NULL
> {noformat}
> Correct result is:
> {noformat}
> 1 20  20
> 3 40  40
> 2 30  30
> {noformat}
> Bug was introduced by HIVE-10582. Though the fix in HIVE-10582 does not 
> contain tests, it does look legit. In fact, the problem seems to be in the 
> MapJoinOperator 

[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: Patch Available  (was: In Progress)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch, HIVE-15335.0992.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
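The mutable-accumulator pattern the description names can be sketched as below. This is illustrative only: the class and method names (`MutableDecimalWritable`, `mutable_add`) are hypothetical stand-ins, not the actual HiveDecimalWritable API; the point is that in-place mutation avoids allocating a fresh object per arithmetic operation.

```python
from decimal import Decimal

class MutableDecimalWritable:
    """Fixed-scale decimal stored as a plain int, mutated in place."""

    def __init__(self, unscaled=0, scale=2):
        self.unscaled, self.scale = unscaled, scale

    def mutable_add(self, other):
        assert other.scale == self.scale
        self.unscaled += other.unscaled  # mutate in place: no new object
        return self

    def to_decimal(self):
        # Convert to an immutable Decimal only at the boundary.
        return Decimal(self.unscaled).scaleb(-self.scale)

# Sum 10.50 + 25.99 + 3.99 without allocating per addition.
acc = MutableDecimalWritable(0)
for cents in (1050, 2599, 399):
    acc.mutable_add(MutableDecimalWritable(cents))

print(acc.to_decimal())  # 40.48
```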





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Attachment: HIVE-15335.0992.patch

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch, HIVE-15335.0992.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: In Progress  (was: Patch Available)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Commented] (HIVE-15488) Native Vector MapJoin fails when trying to serialize BigTable rows that have (unreferenced) complex types

2016-12-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768573#comment-15768573
 ] 

Wei Zheng commented on HIVE-15488:
--

+1 pending test

> Native Vector MapJoin fails when trying to serialize BigTable rows that have 
> (unreferenced) complex types
> -
>
> Key: HIVE-15488
> URL: https://issues.apache.org/jira/browse/HIVE-15488
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15488.01.patch
>
>
> When creating VectorSerializeRow we need to exclude any complex types.





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Patch Available  (was: Open)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.14.patch, HIVE-15376.2.patch, HIVE-15376.3.patch, 
> HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch, 
> HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by narrowing the gap between 
> lock acquisition and the first heartbeat, but that's not enough; there may 
> still be an issue, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (a blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: when acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, the transaction will be timed out and 
> aborted, causing the query to fail.
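The timing window described above can be sketched as a simple check: the transaction opens at time A, acquireLocks blocks until time C, and the first heartbeat is only sent at C, so a long enough lock wait alone gets the transaction reaped before it ever heartbeats.

```python
def times_out(open_ts, first_heartbeat_ts, txn_timeout):
    """True when the gap A..C exceeds hive.txn.timeout (all in seconds)."""
    return (first_heartbeat_ts - open_ts) > txn_timeout

txn_timeout = 300          # hive.txn.timeout, e.g. 300s
a, c = 0, 420              # lock acquisition took 7 minutes on a busy system

assert times_out(a, c, txn_timeout)            # aborted despite doing no work
assert not times_out(a, a + 10, txn_timeout)   # fast acquisition is fine
print("timed out:", times_out(a, c, txn_timeout))  # timed out: True
```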





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Attachment: HIVE-15376.14.patch

Patch 14 fixes a bug in TestDbTxnManager.testLockExpiration.

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.14.patch, HIVE-15376.2.patch, HIVE-15376.3.patch, 
> HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch, 
> HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Open  (was: Patch Available)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.14.patch, HIVE-15376.2.patch, HIVE-15376.3.patch, 
> HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch, 
> HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.





[jira] [Updated] (HIVE-15477) Provide options to adjust filter stats when column stats are not available

2016-12-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15477:

Attachment: HIVE-15477.2.patch

> Provide options to adjust filter stats when column stats are not available
> --
>
> Key: HIVE-15477
> URL: https://issues.apache.org/jira/browse/HIVE-15477
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15477.1.patch, HIVE-15477.2.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" 
> case by setting the number of output rows to 1/2 of the number of input rows 
> for each predicate expression. This can be inaccurate, especially in the 
> presence of multiple predicates chained by AND. We have found that in some 
> cases this causes a map join to pick the wrong ordering and thus fail with a 
> memory issue.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) 
> that can be used to control the percentage of rows emitted by a predicate 
> expression. 
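The estimate described above compounds: with no column stats, a fixed per-predicate selectivity of 1/2 means n AND-ed predicates yield rows * 0.5**n. A sketch of how the proposed factor (name taken from the suggestion above, not an existing Hive config) would change that:

```python
def estimate_rows(input_rows, num_predicates, factor=0.5):
    """Rows surviving num_predicates AND-ed filters at a fixed selectivity."""
    return input_rows * (factor ** num_predicates)

rows = 1_000_000
# Default: each predicate halves the estimate, so 3 predicates -> 1/8.
print(estimate_rows(rows, 3))        # 125000.0
# A more conservative factor keeps the estimate high, making it less likely
# that a big branch is mistaken for the small side of a map join.
print(estimate_rows(rows, 3, 0.9))   # ~729000
```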





[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Status: Patch Available  (was: Open)

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.





[jira] [Assigned] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-15491:
---

Assignee: Mithun Radhakrishnan

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.





[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Attachment: HIVE-15491.patch

The patch.

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.





[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Affects Version/s: 2.1.1

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.





[jira] [Updated] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15491:

Description: 
I draw your attention to the following piece of code in 
{{GenericUDTFJSONTuple::process()}}:

{code:java}
  @Override
  public void process(Object[] o) throws HiveException {
  ...
for (int i = 0; i < numCols; ++i) {
if (retCols[i] == null) {
  retCols[i] = cols[i]; // use the object pool rather than creating a 
new object
}
Object extractObject = ((Map)jsonObj).get(paths[i]);
if (extractObject instanceof Map || extractObject instanceof List) {
  retCols[i].set(MAPPER.writeValueAsString(extractObject));
} else if (extractObject != null) {
  retCols[i].set(extractObject.toString());
} else {
  retCols[i] = null;
}
  }
  forward(retCols);
  return;
} catch (Throwable e) {  <= Yikes.
  LOG.error("JSON parsing/evaluation exception" + e);
  forward(nullCols);
}
  }
{code}

The error-handling here seems suspect. Judging from the error message, the 
intention here seems to be to catch JSON-specific errors arising from 
{{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
{{Throwable}}, this code masks the errors that arise from the call to 
{{forward(retCols)}}.

I just ran into this in production. A user with a nearly exhausted HDFS quota 
attempted to use {{json_tuple}} to extract fields from json strings in his 
data. The data turned out to have large record counts and the query used over 
25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
exhausted quota. But the thrown exception was swallowed in the code above. 
{{process()}} ignored the failure for the record and proceeded to the next one. 
Eventually, this resulted in DDoS-ing the name-node.

I'll have a patch for this shortly.
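The fix direction implied above can be sketched in miniature (illustrative Python, not the Java patch): confine the broad catch to the JSON work so that failures from the downstream write (the forward() call in the Java code) propagate instead of being swallowed once per record. WriteFailure here is a hypothetical stand-in for the RecordWriter/quota failure described above.

```python
import json

class WriteFailure(RuntimeError):
    """Stand-in for the RecordWriter creation failure described above."""

def forward(cols):
    # Simulate the production failure: every write hits the exhausted quota.
    raise WriteFailure("quota exceeded")

def process(line):
    try:                        # narrow scope: only JSON errors are absorbed
        obj = json.loads(line)
    except ValueError:
        return forward([None])  # bad JSON -> forward null columns
    forward([obj.get("a")])     # write errors now propagate to the caller

try:
    process('{"a": 1}')
except WriteFailure as e:
    print("surfaced:", e)       # the failure is no longer silently swallowed
```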

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Reporter: Mithun Radhakrishnan
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
> for (int i = 0; i < numCols; ++i) {
> if (retCols[i] == null) {
>   retCols[i] = cols[i]; // use the object pool rather than creating a 
> new object
> }
> Object extractObject = ((Map)jsonObj).get(paths[i]);
> if (extractObject instanceof Map || extractObject instanceof List) {
>   retCols[i].set(MAPPER.writeValueAsString(extractObject));
> } else if (extractObject != null) {
>   retCols[i].set(extractObject.toString());
> } else {
>   retCols[i] = null;
> }
>   }
>   forward(retCols);
>   return;
> } catch (Throwable e) {  <= Yikes.
>   LOG.error("JSON parsing/evaluation exception" + e);
>   forward(nullCols);
> }
>   }
> {code}
> The error-handling here seems suspect. Judging from the error message, the 
> intention here seems to be to catch JSON-specific errors arising from 
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching 
> {{Throwable}}, this code masks the errors that arise from the call to 
> {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota 
> attempted to use {{json_tuple}} to extract fields from json strings in his 
> data. The data turned out to have large record counts and the query used over 
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the 
> exhausted quota. But the thrown exception was swallowed in the code above. 
> {{process()}} ignored the failure for the record and proceeded to the next 
> one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.





[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768481#comment-15768481
 ] 

Hive QA commented on HIVE-14731:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844294/HIVE-14731.13.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10779 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,vectorized_context.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=72)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[3]
 (batchId=173)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY 
(batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2681/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2681/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2681/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844294 - PreCommit-HIVE-Build

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Status: Patch Available  (was: Open)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to use only the stats in the 
> TS rather than the stats populated along each join branch. This could be 
> fairly conservative but more reliable.
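As an illustration only (this is not how HoS actually wires its stats; the class and method names below are hypothetical), choosing the map-join small side directly from raw table-scan row counts could look like:

```java
// Illustrative sketch: pick the map-join "small" side from raw table-scan
// row counts instead of stats derived along each join branch.
public class MapJoinSideChooser {
    // Returns the index of the branch with the fewest scanned rows.
    public static int smallSide(long[] tableScanRows) {
        int best = 0;
        for (int i = 1; i < tableScanRows.length; i++) {
            if (tableScanRows[i] < tableScanRows[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(smallSide(new long[] {1_000_000, 5_000})); // 1
    }
}
```

Raw scan counts can overestimate the small side (no filter selectivity applied), which is why the ticket calls this conservative but more reliable.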





[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.wip.patch

Attaching WIP patch for testing.

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to use only the stats in the 
> TS rather than the stats populated along each join branch. This could be 
> fairly conservative but more reliable.





[jira] [Commented] (HIVE-15345) Spelling errors in logging and exceptions for query language code

2016-12-21 Thread Grant Sohn (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768435#comment-15768435
 ] 

Grant Sohn commented on HIVE-15345:
---

Thanks, Prasanth and Wei.

> Spelling errors in logging and exceptions for query language code
> -
>
> Key: HIVE-15345
> URL: https://issues.apache.org/jira/browse/HIVE-15345
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Grant Sohn
>Assignee: Grant Sohn
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15345.1.patch
>
>
> Obvious typos and misspellings in the exceptions and messages.
> modifified -> modified
> commnad -> command





[jira] [Updated] (HIVE-15487) LLAP: Improvements to random selection while scheduling

2016-12-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15487:
-
Status: Patch Available  (was: Open)

> LLAP: Improvements to random selection while scheduling
> ---
>
> Key: HIVE-15487
> URL: https://issues.apache.org/jira/browse/HIVE-15487
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15487.1.patch
>
>
> Currently the LLAP scheduler picks a random host when no locality information 
> is specified, or when all requested hosts are busy serving other requests with 
> forced locality. In such cases, we can pick the next available node in a 
> consistent order to get better locality instead of using random selection. 
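The "consistent order" fallback can be sketched as follows. This is not the actual LLAP task scheduler code; it assumes a host list kept in a stable (e.g. sorted) order, and the names are hypothetical:

```java
import java.util.List;

// Illustrative sketch only (not the actual LLAP scheduler): when the
// requested hosts are busy, probe the remaining hosts in a consistent
// order derived from the request, instead of picking one at random.
public class ConsistentFallback {
    // hosts must be in the same stable (e.g. sorted) order on every node.
    public static String pickHost(List<String> hosts, List<String> busy,
                                  int requestId) {
        int n = hosts.size();
        int start = Math.floorMod(requestId, n);
        // Probe hosts in a deterministic ring order starting at `start`.
        for (int i = 0; i < n; i++) {
            String h = hosts.get((start + i) % n);
            if (!busy.contains(h)) {
                return h;
            }
        }
        return null; // every host is busy
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("node1", "node2", "node3");
        System.out.println(pickHost(hosts, List.of("node2"), 1)); // node3
    }
}
```

Because the probe order is a deterministic function of the request, repeated attempts for the same work land on the same nodes, which is what improves locality over random selection.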





[jira] [Updated] (HIVE-15487) LLAP: Improvements to random selection while scheduling

2016-12-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15487:
-
Attachment: HIVE-15487.1.patch

[~gopalv] Can you please review this patch?


> LLAP: Improvements to random selection while scheduling
> ---
>
> Key: HIVE-15487
> URL: https://issues.apache.org/jira/browse/HIVE-15487
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15487.1.patch
>
>
> Currently the LLAP scheduler picks a random host when no locality information 
> is specified, or when all requested hosts are busy serving other requests with 
> forced locality. In such cases, we can pick the next available node in a 
> consistent order to get better locality instead of using random selection. 





[jira] [Updated] (HIVE-15112) Implement Parquet vectorization reader for Struct type

2016-12-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15112:

Status: Patch Available  (was: Reopened)

> Implement Parquet vectorization reader for Struct type
> --
>
> Key: HIVE-15112
> URL: https://issues.apache.org/jira/browse/HIVE-15112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15112.1.patch, HIVE-15112.2.patch, 
> HIVE-15112.3.patch, HIVE-15112.patch
>
>
> Like HIVE-14815, we need to support the Parquet vectorized reader for the struct type.





[jira] [Commented] (HIVE-14956) Parallelize TestHCatLoader

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768347#comment-15768347
 ] 

Hive QA commented on HIVE-14956:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844281/HIVE-14956.01.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10819 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hive.hcatalog.pig.TestHCatLoader.testColumnarStorePushdown2[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testColumnarStorePushdown[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testDatePartitionPushUp[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testGetInputBytes[4] (batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[4] (batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadMissingPartitionBasicNeg[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testSchemaLoadBasic[4] (batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testSchemaLoadComplex[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestHCatLoader.testSchemaLoadPrimitiveTypes[4] 
(batchId=171)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testColumnarStorePushdown 
(batchId=173)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testConvertBooleanToInt 
(batchId=173)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testGetInputBytes 
(batchId=173)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testReadDataPrimitiveTypes 
(batchId=173)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testSchemaLoadBasic 
(batchId=173)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testSchemaLoadComplex 
(batchId=173)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testSchemaLoadPrimitiveTypes 
(batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2680/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2680/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2680/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844281 - PreCommit-HIVE-Build

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch
>
>






[jira] [Updated] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15360:

Attachment: HIVE-15360.2.patch

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.





[jira] [Commented] (HIVE-15477) Provide options to adjust filter stats when column stats are not available

2016-12-21 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768298#comment-15768298
 ] 

Chao Sun commented on HIVE-15477:
-

[~prasanth_j] Thanks. Yes, let me fix that and submit a new patch.

> Provide options to adjust filter stats when column stats are not available
> --
>
> Key: HIVE-15477
> URL: https://issues.apache.org/jira/browse/HIVE-15477
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15477.1.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" case 
> by setting the # of output rows to 1/2 of the # of input rows for each 
> predicate expression. This can be inaccurate, especially in the presence of 
> multiple predicates chained by AND. We have found that in some cases this 
> causes a map join to pick the wrong ordering and thus fail with memory 
> issues.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) 
> that can be used to control the percentage of rows emitted by a predicate 
> expression. 
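A worked sketch of the proposed knob. The config name comes from the ticket, but the multiplicative semantics below (each AND-ed predicate keeps `factor` of its input rows) is an assumption for illustration, not the committed behavior:

```java
// Hypothetical semantics for a config like hive.stats.filter.factor:
// with no column stats, assume each predicate keeps `factor` of its input
// rows, so k AND-ed predicates keep factor^k of the rows.
public class FilterStatsEstimate {
    public static long estimateRows(long inputRows, double factor,
                                    int numPredicates) {
        double selectivity = 1.0;
        for (int i = 0; i < numPredicates; i++) {
            selectivity *= factor;
        }
        return (long) (inputRows * selectivity);
    }

    public static void main(String[] args) {
        // Today's fixed behavior corresponds to factor = 0.5:
        System.out.println(estimateRows(10000, 0.5, 3));  // 1250
        // A less aggressive factor keeps more rows per predicate:
        System.out.println(estimateRows(10000, 0.75, 3)); // 4218
    }
}
```

The example shows why chained AND predicates are the pain point: with the fixed 1/2 factor, three predicates already cut the estimate to 12.5% of the input, which can flip the map-join side choice.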





[jira] [Commented] (HIVE-15477) Provide options to adjust filter stats when column stats are not available

2016-12-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768290#comment-15768290
 ] 

Prasanth Jayachandran commented on HIVE-15477:
--

Do you want to fix the inequality case as well? That one divides by 3 in the 
worst case.
Other than that the patch looks good to me. +1

> Provide options to adjust filter stats when column stats are not available
> --
>
> Key: HIVE-15477
> URL: https://issues.apache.org/jira/browse/HIVE-15477
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15477.1.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" case 
> by setting the # of output rows to 1/2 of the # of input rows for each 
> predicate expression. This can be inaccurate, especially in the presence of 
> multiple predicates chained by AND. We have found that in some cases this 
> causes a map join to pick the wrong ordering and thus fail with memory 
> issues.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) 
> that can be used to control the percentage of rows emitted by a predicate 
> expression. 





[jira] [Commented] (HIVE-15477) Provide options to adjust filter stats when column stats are not available

2016-12-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768281#comment-15768281
 ] 

Prasanth Jayachandran commented on HIVE-15477:
--

Looks like HIVE-0 already fixed this mis-estimation. Before that, it used to 
consider the IS NOT NULL predicate as a constant and reduce the rows by half. 
Sorry, I lost track of the change that fixed it.

> Provide options to adjust filter stats when column stats are not available
> --
>
> Key: HIVE-15477
> URL: https://issues.apache.org/jira/browse/HIVE-15477
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15477.1.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" case 
> by setting the # of output rows to 1/2 of the # of input rows for each 
> predicate expression. This can be inaccurate, especially in the presence of 
> multiple predicates chained by AND. We have found that in some cases this 
> causes a map join to pick the wrong ordering and thus fail with memory 
> issues.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) 
> that can be used to control the percentage of rows emitted by a predicate 
> expression. 





[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-21 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768257#comment-15768257
 ] 

Chao Sun commented on HIVE-15489:
-

cc [~xuefuz]

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>
> For MapJoin in HoS, we should provide an option to use only the stats in the 
> TS rather than the stats populated along each join branch. This could be 
> fairly conservative but more reliable.





[jira] [Commented] (HIVE-15453) Fix failing tests in master

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768223#comment-15768223
 ] 

Hive QA commented on HIVE-15453:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844282/HIVE-15453.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10776 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite 
(batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2679/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2679/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2679/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844282 - PreCommit-HIVE-Build

> Fix failing tests in master
> ---
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch, HIVE-15453.4.patch, HIVE-15453.5.patch
>
>
> This test has been failing in a couple of ptests of late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-15453) Fix failing tests in master

2016-12-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15453:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

The test updates seemed to work in the last run. Committed the patch to master.

> Fix failing tests in master
> ---
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch, HIVE-15453.4.patch, HIVE-15453.5.patch
>
>
> This test has been failing in a couple of ptests of late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-15488) Native Vector MapJoin fails when trying to serialize BigTable rows that have (unreferenced) complex types

2016-12-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15488:

Status: Patch Available  (was: Open)

> Native Vector MapJoin fails when trying to serialize BigTable rows that have 
> (unreferenced) complex types
> -
>
> Key: HIVE-15488
> URL: https://issues.apache.org/jira/browse/HIVE-15488
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15488.01.patch
>
>
> When creating VectorSerializeRow we need to exclude any complex types.
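The fix described above can be sketched as a column filter applied before building the serializer. This is a simplified illustration, not the actual VectorSerializeRow change; the enum is a hypothetical stand-in for Hive's ObjectInspector categories:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration (not the actual VectorSerializeRow fix): keep
// only primitive columns when choosing what the row serializer will write.
public class SerializeColumnFilter {
    enum Category { PRIMITIVE, LIST, MAP, STRUCT, UNION }

    // Returns the indices of columns the serializer may handle.
    public static List<Integer> serializableColumns(Category[] columnTypes) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < columnTypes.length; i++) {
            if (columnTypes[i] == Category.PRIMITIVE) { // skip complex types
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        Category[] cols = {
            Category.PRIMITIVE, Category.STRUCT, Category.PRIMITIVE
        };
        System.out.println(serializableColumns(cols)); // [0, 2]
    }
}
```

The point of the filter is that unreferenced complex-typed columns never reach the serializer at all, so it cannot fail on them.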





[jira] [Updated] (HIVE-15488) Native Vector MapJoin fails when trying to serialize BigTable rows that have (unreferenced) complex types

2016-12-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15488:

Attachment: HIVE-15488.01.patch

> Native Vector MapJoin fails when trying to serialize BigTable rows that have 
> (unreferenced) complex types
> -
>
> Key: HIVE-15488
> URL: https://issues.apache.org/jira/browse/HIVE-15488
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15488.01.patch
>
>
> When creating VectorSerializeRow we need to exclude any complex types.





[jira] [Commented] (HIVE-15477) Provide options to adjust filter stats when column stats are not available

2016-12-21 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768176#comment-15768176
 ] 

Chao Sun commented on HIVE-15477:
-

[~prasanth_j] can you elaborate on what mis-estimation can happen with 
"join_key_column IS NOT NULL" predicates? I'm also curious why it was added to 
Hive. I was looking at {{evaluateNotNullExpr}}, but it seems to just return the 
input # of rows when column stats are not present? (see here: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L586)

Yeah, I totally agree that we are going to make wrong estimates even with 
configs. It's very difficult to get 100% accurate stats. But with some configs 
we can at least allow some manual intervention. :)


> Provide options to adjust filter stats when column stats are not available
> --
>
> Key: HIVE-15477
> URL: https://issues.apache.org/jira/browse/HIVE-15477
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15477.1.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" case 
> by setting the # of output rows to 1/2 of the # of input rows for each 
> predicate expression. This can be inaccurate, especially in the presence of 
> multiple predicates chained by AND. We have found that in some cases this 
> causes a map join to pick the wrong ordering and thus fail with memory 
> issues.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) 
> that can be used to control the percentage of rows emitted by a predicate 
> expression. 





[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-21 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768148#comment-15768148
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

Thanks [~hagleitn] for the review! The final patch has been uploaded.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given that the cartesian product edge is now available in Tez (see TEZ-3230), 
> let's integrate it into Hive on Tez. This allows us to use more than one 
> reducer for cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-21 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.13.patch

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given that the cartesian product edge is now available in Tez (see TEZ-3230), 
> let's integrate it into Hive on Tez. This allows us to use more than one 
> reducer for cross product queries.





[jira] [Commented] (HIVE-15357) Fix and re-enable the spark-only tests

2016-12-21 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768138#comment-15768138
 ] 

Chao Sun commented on HIVE-15357:
-

Thanks [~lirui] for working on this. Patch looks good to me. +1

> Fix and re-enable the spark-only tests
> --
>
> Key: HIVE-15357
> URL: https://issues.apache.org/jira/browse/HIVE-15357
> Project: Hive
>  Issue Type: Test
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15357.1.patch
>
>
> Defined by {{spark.only.query.files}}.





[jira] [Commented] (HIVE-15453) Fix failing tests in master

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768081#comment-15768081
 ] 

Hive QA commented on HIVE-15453:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844282/HIVE-15453.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10806 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=150)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2678/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2678/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2678/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844282 - PreCommit-HIVE-Build

> Fix failing tests in master
> ---
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch, HIVE-15453.4.patch, HIVE-15453.5.patch
>
>
> This test has been failing in a couple of ptests of late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15335) Fast Decimal

2016-12-21 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768058#comment-15768058
 ] 

Matt McCline edited comment on HIVE-15335 at 12/21/16 8:22 PM:
---

Query benchmarking on V1 showed very high cost in HiveDecimalWritable 
(serialization/deserialization, creation of HiveDecimal for getHiveDecimal) and 
in ORC decimal deserialization (BigInteger).

The cost of the V1 decimal add turns out not to be the add itself but the cost 
of HiveDecimalWritable.getHiveDecimal() and then serializing it back into 
BigInteger bytes for HiveDecimalWritable.set.  Everywhere, code was doing a 
getHiveDecimal to pass the value around between components.

Making HiveDecimalWritable a fast, first-class citizen was a major part of this 
change.  That included making HiveDecimalWritable the object of choice to pass 
around or operate on directly.  E.g. vectorized SUM aggregation eliminated 
almost all calls to HiveDecimalWritable.getHiveDecimal() for its summing.

One query benchmark on the new code showed a 3X improvement, and the add method 
cost was in the noise.  So storing decimals in 1 long instead of 3 (i.e. the 
so-called fast path) isn't the place to look.  Microbenchmarks on add cost miss 
the boat.  The fast path is using HiveDecimalWritable.mutableAdd and the fast 
V2 serialization/deserialization methods, including the HiveDecimal.create 
family / HiveDecimalWritable.set family.  Another way of thinking about the 
fast path is not using BigInteger / BigDecimal.
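The mutable-accumulation idea above can be sketched in plain Java (illustrative stand-ins, not the real Hive classes; only the method name mutableAdd is taken from the comment):

```java
import java.math.BigDecimal;

// V1-style summing: every add round-trips through immutable BigDecimal,
// allocating a new object per row.
final class SlowSum {
    private BigDecimal sum = BigDecimal.ZERO;
    void add(BigDecimal v) { sum = sum.add(v); } // fresh BigDecimal each call
    BigDecimal get() { return sum; }
}

// V2-style summing: a mutable accumulator updates in place, mirroring the
// HiveDecimalWritable.mutableAdd idea. The single-long arithmetic is a
// simplified stand-in (it assumes unscaled values fit in one long).
final class FastSum {
    private long unscaled;
    void mutableAdd(long v) { unscaled += v; } // no allocation per row
    long get() { return unscaled; }
}
```

Both produce the same sum; the difference is purely allocation behavior per row, which is where the comment says the V1 cost was hiding.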


was (Author: mmccline):
Query benchmarking on V1 showed very high cost in HiveDecimalWritable 
(serialization/deserialization, creation of HiveDecimal for getHiveDecimal) and 
in ORC decimal deserialization (BigInteger).

The cost of the V1 decimal add turns out not to be the add itself but the cost 
of HiveDecimalWritable.getHiveDecimal() and then serializing it back into 
BigInteger bytes for HiveDecimalWritable.set.  Everywhere, code was doing a 
getHiveDecimal to pass the value around between components.

Making HiveDecimalWritable a fast, first-class citizen was a major part of this 
change.  That included making HiveDecimalWritable the object of choice to pass 
around or operate on directly.  E.g. vectorized SUM aggregation eliminated 
almost all calls to HiveDecimalWritable.getHiveDecimal() for its summing.

One query benchmark on the new code showed a 3X improvement, and the add method 
cost was in the noise.  So storing decimals in 1 long instead of 3 (i.e. the 
so-called fast path) isn't the place to look.  Microbenchmarks on add cost miss 
the boat.  The fast path is using HiveDecimalWritable.mutableAdd and the fast 
V2 serialization/deserialization methods, including the HiveDecimal.create 
family / HiveDecimalWritable.set family.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the 
> decimal internally as a BigDecimal, with a faster version that does not 
> allocate extra objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-21 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768058#comment-15768058
 ] 

Matt McCline commented on HIVE-15335:
-

Query benchmarking on V1 showed very high cost in HiveDecimalWritable 
(serialization/deserialization, creation of HiveDecimal for getHiveDecimal) and 
in ORC decimal deserialization (BigInteger).

The cost of the V1 decimal add turns out not to be the add itself but the cost 
of HiveDecimalWritable.getHiveDecimal() and then serializing it back into 
BigInteger bytes for HiveDecimalWritable.set.  Everywhere, code was doing a 
getHiveDecimal to pass the value around between components.

Making HiveDecimalWritable a fast, first-class citizen was a major part of this 
change.  That included making HiveDecimalWritable the object of choice to pass 
around or operate on directly.  E.g. vectorized SUM aggregation eliminated 
almost all calls to HiveDecimalWritable.getHiveDecimal() for its summing.

One query benchmark on the new code showed a 3X improvement, and the add method 
cost was in the noise.  So storing decimals in 1 long instead of 3 (i.e. the 
so-called fast path) isn't the place to look.  Microbenchmarks on add cost miss 
the boat.  The fast path is using HiveDecimalWritable.mutableAdd and the fast 
V2 serialization/deserialization methods, including the HiveDecimal.create 
family / HiveDecimalWritable.set family.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the 
> decimal internally as a BigDecimal, with a faster version that does not 
> allocate extra objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15481) Support multiple and nested subqueries

2016-12-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15481:
---
Issue Type: Sub-task  (was: Task)
Parent: HIVE-15456

> Support multiple and nested subqueries
> --
>
> Key: HIVE-15481
> URL: https://issues.apache.org/jira/browse/HIVE-15481
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15481.1.patch
>
>
> This is a continuation of the work done in HIVE-15192. As listed in  
> [Restrictions | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf ], 
> it is currently not possible to execute queries that either have more than 
> one subquery or have a nested subquery.





[jira] [Resolved] (HIVE-14963) Fix org.apache.hive.jdbc.* test failures

2016-12-21 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere resolved HIVE-14963.
---
Resolution: Cannot Reproduce

> Fix org.apache.hive.jdbc.* test failures
> 
>
> Key: HIVE-14963
> URL: https://issues.apache.org/jira/browse/HIVE-14963
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Jason Dere
>
> Looks like they have been failing for the last several test runs:
> {noformat}
>  org.apache.hive.jdbc.TestNoSaslAuth.org.apache.hive.jdbc.TestNoSaslAuth  
> 2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestHS2AuthzContext.org.apache.hive.jdbc.authorization.TestHS2AuthzContext
> 2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext.org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext
>   2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth
> 2.6 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage
>2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization
>   2.7 sec 8
> {noformat}





[jira] [Commented] (HIVE-14963) Fix org.apache.hive.jdbc.* test failures

2016-12-21 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767989#comment-15767989
 ] 

Jason Dere commented on HIVE-14963:
---

I don't have the particular failure details for these tests.
If they are no longer failing in pre-commit, then I guess we can close out this 
JIRA.

> Fix org.apache.hive.jdbc.* test failures
> 
>
> Key: HIVE-14963
> URL: https://issues.apache.org/jira/browse/HIVE-14963
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Jason Dere
>
> Looks like they have been failing for the last several test runs:
> {noformat}
>  org.apache.hive.jdbc.TestNoSaslAuth.org.apache.hive.jdbc.TestNoSaslAuth  
> 2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestHS2AuthzContext.org.apache.hive.jdbc.authorization.TestHS2AuthzContext
> 2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext.org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext
>   2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth
> 2.6 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage
>2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization
>   2.7 sec 8
> {noformat}





[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767964#comment-15767964
 ] 

Hive QA commented on HIVE-15376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844266/HIVE-15376.13.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10334 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=112)

[vectorization_16.q,load_dyn_part5.q,join_casesensitive.q,transform_ppr2.q,join23.q,groupby7_map_skew.q,ppd_outer_join5.q,create_merge_compressed.q,louter_join_ppr.q,sample9.q,smb_mapjoin_16.q,vectorization_not.q,having.q,ppd_outer_join1.q,union_remove_12.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[str_to_map] (batchId=58)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=92)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exchange_partition_neg_incomplete_partition]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_00_unsupported_schema]
 (batchId=85)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2677/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2677/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2677/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: OutOfMemoryError: Java heap space
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844266 - PreCommit-HIVE-Build

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.2.patch, HIVE-15376.3.patch, HIVE-15376.4.patch, 
> HIVE-15376.5.patch, HIVE-15376.6.patch, HIVE-15376.7.patch, 
> HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be an issue, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (a blocking call), but it can take a long 
> time to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
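One direction the fix could take (a hypothetical sketch; the class and method names below are invented, not Hive's actual API) is to start heartbeating at transaction open, time A, so a slow acquireLocks can no longer outlive hive.txn.timeout:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: schedule heartbeats when the transaction opens (time A) instead
// of after acquireLocks returns (time C), so lock acquisition delay no
// longer starves the first heartbeat.
final class TxnHeartbeater implements AutoCloseable {
    private final ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor();
    private final CountDownLatch firstBeat = new CountDownLatch(1);

    // Called at txn open: the first heartbeat fires immediately (delay 0),
    // then repeats every heartbeatIntervalMs.
    void openTxn(long heartbeatIntervalMs) {
        pool.scheduleAtFixedRate(this::sendHeartbeat,
                0, heartbeatIntervalMs, TimeUnit.MILLISECONDS);
    }

    private void sendHeartbeat() {
        // In Hive this would renew the txn/locks in the metastore.
        firstBeat.countDown();
    }

    // Returns true if at least one heartbeat was sent within timeoutMs.
    boolean awaitFirstBeat(long timeoutMs) {
        try {
            return firstBeat.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

With this ordering, even if acquireLocks blocks past hive.txn.timeout, the transaction has already been heartbeating since time A.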





[jira] [Commented] (HIVE-15448) ChangeManager for replication

2016-12-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767970#comment-15767970
 ] 

Sushanth Sowmyan commented on HIVE-15448:
-

Agreed, cancelling out (a) - the only places the non-signature path would get 
used are cases where the user should be calling recycle anyway.

> ChangeManager for replication
> -
>
> Key: HIVE-15448
> URL: https://issues.apache.org/jira/browse/HIVE-15448
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15448.1.patch
>
>
> The change manager implementation as described in 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-Changemanagement.
>  This issue tracks the infrastructure code. Hooking into actions will be 
> tracked in another ticket.
> ReplChangeManager includes:
> * method to generate checksum
> * method to convert file path to cm path
> * method to move table/partition/file into cm
> * thread to clear cm files when they expire
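The "convert file path to cm path" idea above can be sketched as content-addressed naming (an assumption about the scheme; the cm root and MD5-based naming below are invented for illustration, not Hive's actual layout):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: name a recycled file under the change-management root by the
// checksum of its contents, so identical replicated files dedupe naturally.
final class CmPathSketch {
    static String toCmPath(String cmRoot, byte[] fileContents) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(fileContents);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // lowercase hex
            }
            return cmRoot + "/" + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e); // JDKs ship MD5
        }
    }
}
```

A checksum-derived path makes the cm location reproducible from file contents alone, which is what lets a replication consumer find a moved file after the original path is reused.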





[jira] [Commented] (HIVE-14963) Fix org.apache.hive.jdbc.* test failures

2016-12-21 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767953#comment-15767953
 ] 

Vihang Karajgaonkar commented on HIVE-14963:


Hi [~jdere], I see that these tests are passing in the latest pre-commit 
builds. Do you remember what error was shown for these tests in the logs?

> Fix org.apache.hive.jdbc.* test failures
> 
>
> Key: HIVE-14963
> URL: https://issues.apache.org/jira/browse/HIVE-14963
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Jason Dere
>
> Looks like they have been failing for the last several test runs:
> {noformat}
>  org.apache.hive.jdbc.TestNoSaslAuth.org.apache.hive.jdbc.TestNoSaslAuth  
> 2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestHS2AuthzContext.org.apache.hive.jdbc.authorization.TestHS2AuthzContext
> 2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext.org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext
>   2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth
> 2.6 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage
>2.5 sec 8
>  
> org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization
>   2.7 sec 8
> {noformat}





[jira] [Updated] (HIVE-15453) Fix failing tests in master

2016-12-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15453:
-
Summary: Fix failing tests in master  (was: Failing test : 
TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision)

> Fix failing tests in master
> ---
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch, HIVE-15453.4.patch, HIVE-15453.5.patch
>
>
> This test has been failing in a couple of ptests of late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision

2016-12-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767949#comment-15767949
 ] 

Prasanth Jayachandran commented on HIVE-15453:
--

The tez_union_view.q file does not exist. Removed it from the properties file as well. 

> Failing test : TestMiniLlapLocalCliDriver.testCliDriver : 
> stats_based_fetch_decision
> 
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch, HIVE-15453.4.patch, HIVE-15453.5.patch
>
>
> This test has been failing in a couple of ptests of late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-14956) Parallelize TestHCatLoader

2016-12-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14956:
---
Attachment: HIVE-14956.01.patch

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch
>
>






[jira] [Updated] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision

2016-12-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15453:
-
Attachment: HIVE-15453.5.patch

Removing some tests from the minillap test suite. These tests use script and 
transform functionality, which is not supported in Tez and hence cannot run in 
LLAP either.

> Failing test : TestMiniLlapLocalCliDriver.testCliDriver : 
> stats_based_fetch_decision
> 
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch, HIVE-15453.4.patch, HIVE-15453.5.patch
>
>
> This test has been failing in a couple of ptests of late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-14956) Parallelize TestHCatLoader

2016-12-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14956:
---
Status: Patch Available  (was: Open)

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch
>
>






[jira] [Resolved] (HIVE-15416) CAST to string does not work for large decimal numbers

2016-12-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved HIVE-15416.
---
Resolution: Invalid

> CAST to string does not work for large decimal numbers
> --
>
> Key: HIVE-15416
> URL: https://issues.apache.org/jira/browse/HIVE-15416
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Pavel Benes
>
> The cast of large decimal values to string does not work and produces NULL 
> values. 
> Steps to reproduce:
> {code}
> hive> create table test_hive_bug30(decimal_col DECIMAL(30,0));
> OK
> {code}
> {code}
> hive> insert into test_hive_bug30 VALUES (123), 
> (9), 
> (99),(999);
> Query ID = benesp_20161212135717_5d16d7f4-7b84-409e-ad00-36085deaae54
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1480833176011_2469)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 7.69 s
> 
> Loading data to table default.test_hive_bug30
> Table default.test_hive_bug30 stats: [numFiles=1, numRows=4, totalSize=68, 
> rawDataSize=64]
> OK
> Time taken: 8.239 seconds
> {code}
> {code}
> hive> select CAST(decimal_col AS STRING) from test_hive_bug30;
> OK
> 123
> NULL
> NULL
> NULL
> Time taken: 0.043 seconds, Fetched: 4 row(s)
> {code}
> The numbers with 29 and 30 digits should be exported, but they are converted 
> to NULL instead. 
> The values are stored correctly as can be seen here:
> {code}
> hive> select * from test_hive_bug30;
> OK
> 123
> 9
> 99
> NULL
> Time taken: 0.447 seconds, Fetched: 4 row(s)
> {code}
> The same issue does not exist for smaller numbers (e.g. DECIMAL(10)).





[jira] [Assigned] (HIVE-15416) CAST to string does not work for large decimal numbers

2016-12-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-15416:
-

Assignee: Daniel Dai

> CAST to string does not work for large decimal numbers
> --
>
> Key: HIVE-15416
> URL: https://issues.apache.org/jira/browse/HIVE-15416
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Pavel Benes
>Assignee: Daniel Dai
>
> The cast of large decimal values to string does not work and produces NULL 
> values. 
> Steps to reproduce:
> {code}
> hive> create table test_hive_bug30(decimal_col DECIMAL(30,0));
> OK
> {code}
> {code}
> hive> insert into test_hive_bug30 VALUES (123), 
> (9), 
> (99),(999);
> Query ID = benesp_20161212135717_5d16d7f4-7b84-409e-ad00-36085deaae54
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1480833176011_2469)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 7.69 s
> 
> Loading data to table default.test_hive_bug30
> Table default.test_hive_bug30 stats: [numFiles=1, numRows=4, totalSize=68, 
> rawDataSize=64]
> OK
> Time taken: 8.239 seconds
> {code}
> {code}
> hive> select CAST(decimal_col AS STRING) from test_hive_bug30;
> OK
> 123
> NULL
> NULL
> NULL
> Time taken: 0.043 seconds, Fetched: 4 row(s)
> {code}
> The numbers with 29 and 30 digits should be exported, but they are converted 
> to NULL instead. 
> The values are stored correctly as can be seen here:
> {code}
> hive> select * from test_hive_bug30;
> OK
> 123
> 9
> 99
> NULL
> Time taken: 0.447 seconds, Fetched: 4 row(s)
> {code}
> The same issue does not exist for smaller numbers (e.g. DECIMAL(10)).





[jira] [Commented] (HIVE-15416) CAST to string does not work for large decimal numbers

2016-12-21 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767927#comment-15767927
 ] 

Daniel Dai commented on HIVE-15416:
---

You should use the following statement instead:
select CAST(decimal_col AS VARCHAR(38)) from test_hive_bug30;
The reason is that UDFToString inherits from UDF rather than GenericUDF. According to 
https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf,
 a UDF does not support decimal with precision/scale and always assumes the maximum 
precision and scale. GenericUDFToVarchar handles precision and scale gracefully.
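The precision/scale mismatch can be sketched in plain Java (my reading of the linked document, not Hive source; DecimalCastSketch and enforce are invented names, and (38,18) is the assumed default): a value whose integer digits exceed precision minus scale cannot be represented and becomes NULL.

```java
import java.math.BigDecimal;

// Sketch: a legacy-style UDF assumes the maximum default precision/scale,
// e.g. (38,18). That leaves room for only 38 - 18 = 20 integer digits, so
// a decimal with 29 or 30 integer digits is nulled out on cast.
final class DecimalCastSketch {
    // Returns the string form, or null when v does not fit (precision, scale).
    static String enforce(BigDecimal v, int precision, int scale) {
        int integerDigits = v.precision() - v.scale();
        if (integerDigits > precision - scale) {
            return null; // mirrors Hive returning NULL
        }
        return v.toPlainString();
    }
}
```

Casting through VARCHAR(38) avoids this because GenericUDFToVarchar sees the column's actual DECIMAL(30,0) type instead of assuming (38,18).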

> CAST to string does not work for large decimal numbers
> --
>
> Key: HIVE-15416
> URL: https://issues.apache.org/jira/browse/HIVE-15416
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Pavel Benes
>
> The cast of large decimal values to string does not work and produces NULL 
> values. 
> Steps to reproduce:
> {code}
> hive> create table test_hive_bug30(decimal_col DECIMAL(30,0));
> OK
> {code}
> {code}
> hive> insert into test_hive_bug30 VALUES (123), 
> (9), 
> (99),(999);
> Query ID = benesp_20161212135717_5d16d7f4-7b84-409e-ad00-36085deaae54
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1480833176011_2469)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 7.69 s
> 
> Loading data to table default.test_hive_bug30
> Table default.test_hive_bug30 stats: [numFiles=1, numRows=4, totalSize=68, 
> rawDataSize=64]
> OK
> Time taken: 8.239 seconds
> {code}
> {code}
> hive> select CAST(decimal_col AS STRING) from test_hive_bug30;
> OK
> 123
> NULL
> NULL
> NULL
> Time taken: 0.043 seconds, Fetched: 4 row(s)
> {code}
> The numbers with 29 and 30 digits should be exported, but they are converted 
> to NULL instead. 
> The values are stored correctly as can be seen here:
> {code}
> hive> select * from test_hive_bug30;
> OK
> 123
> 9
> 99
> NULL
> Time taken: 0.447 seconds, Fetched: 4 row(s)
> {code}
> The same issue does not exist for smaller numbers (e.g. DECIMAL(10)).





[jira] [Commented] (HIVE-15419) Separate out storage-api to be released independently

2016-12-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767869#comment-15767869
 ] 

Owen O'Malley commented on HIVE-15419:
--

Once we start this process, the how-to-release page will be updated so that 
release managers update the pom when the release branch is cut.

> Separate out storage-api to be released independently
> -
>
> Key: HIVE-15419
> URL: https://issues.apache.org/jira/browse/HIVE-15419
> Project: Hive
>  Issue Type: Task
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15419.patch
>
>
> Currently, the Hive project releases a single monolithic release, but this 
> makes file formats reading directly into Hive's vector row batches a circular 
> dependence. Storage-api is a small module with the vectorized row batches and 
> SearchArgument that are necessary for efficient vectorized read and write. By 
> releasing storage-api independently, we can make an interface that the file 
> formats can read and write from.





[jira] [Commented] (HIVE-15419) Separate out storage-api to be released independently

2016-12-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767865#comment-15767865
 ] 

Owen O'Malley commented on HIVE-15419:
--

Generally, we should release storage-api and ORC when they are ready, so they 
shouldn't gate the Hive release. Take the fast decimal patch, for example: we 
would release a new version of storage-api and ORC that incorporates that 
change.

If a last-minute bug fix comes in that is blocking the Hive release, we could 
hold the three votes concurrently, with each vote contingent on the others. If 
the storage-api vote fails, you want the votes for ORC and Hive to fail as 
well.

> Separate out storage-api to be released independently
> -
>
> Key: HIVE-15419
> URL: https://issues.apache.org/jira/browse/HIVE-15419
> Project: Hive
>  Issue Type: Task
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15419.patch
>
>
> Currently, the Hive project releases a single monolithic release, but this 
> makes file formats reading directly into Hive's vector row batches a circular 
> dependence. Storage-api is a small module with the vectorized row batches and 
> SearchArgument that are necessary for efficient vectorized read and write. By 
> releasing storage-api independently, we can make an interface that the file 
> formats can read and write from.





[jira] [Updated] (HIVE-15482) LLAP: When pre-emption is disabled task scheduler gets into loop

2016-12-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15482:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.

> LLAP: When pre-emption is disabled task scheduler gets into loop
> 
>
> Key: HIVE-15482
> URL: https://issues.apache.org/jira/browse/HIVE-15482
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15482.1.patch, HIVE-15482.2.patch
>
>
> When pre-emption is disabled and the number of slots is 0, the scheduler can 
> get into a loop trying to schedule tasks without actually waiting for free 
> slots.
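The fix direction described above can be sketched generically (this is not LLAP's actual task scheduler code, just an illustration of blocking on a condition until a slot frees instead of spinning over pending tasks):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a slot gate that parks the scheduler thread while no slots are free.
public class SlotGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition slotFreed = lock.newCondition();
    private int freeSlots;

    public SlotGate(int slots) { this.freeSlots = slots; }

    public void acquire() throws InterruptedException {
        lock.lock();
        try {
            while (freeSlots == 0) {
                slotFreed.await(); // sleeps instead of re-looping over the task queue
            }
            freeSlots--;
        } finally {
            lock.unlock();
        }
    }

    public void release() {
        lock.lock();
        try {
            freeSlots++;
            slotFreed.signal(); // wake one waiting scheduler thread
        } finally {
            lock.unlock();
        }
    }
}
```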





[jira] [Commented] (HIVE-15448) ChangeManager for replication

2016-12-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767852#comment-15767852
 ] 

Thejas M Nair commented on HIVE-15448:
--

[~sushanth] Where would this "Path getCMPath(Path path, Configuration conf)" 
call be made from? During 'repl load', only 
getCMPath(Path path, Configuration conf, String signature) should be used.


> ChangeManager for replication
> -
>
> Key: HIVE-15448
> URL: https://issues.apache.org/jira/browse/HIVE-15448
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15448.1.patch
>
>
> The change manager implementation as described in 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-Changemanagement.
>  This issue tracks the infrastructure code. Hooking into actions will be 
> tracked in another ticket.
> ReplChangeManager includes:
> * method to generate checksum
> * method to convert file path to cm path
> * method to move table/partition/file into cm
> * thread to clear cm files when they expire





[jira] [Commented] (HIVE-15482) LLAP: When pre-emption is disabled task scheduler gets into loop

2016-12-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767841#comment-15767841
 ] 

Prasanth Jayachandran commented on HIVE-15482:
--

The test failures are handled in HIVE-15453. Other failures are in umbrella 
jira HIVE-14547. No related failures.

> LLAP: When pre-emption is disabled task scheduler gets into loop
> 
>
> Key: HIVE-15482
> URL: https://issues.apache.org/jira/browse/HIVE-15482
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15482.1.patch, HIVE-15482.2.patch
>
>
> When pre-emption is disabled and the number of slots is 0, the scheduler can 
> get into a loop trying to schedule tasks without actually waiting for free 
> slots.





[jira] [Comment Edited] (HIVE-15335) Fast Decimal

2016-12-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767775#comment-15767775
 ] 

Owen O'Malley edited comment on HIVE-15335 at 12/21/16 6:48 PM:


Review comments:
* Thank you for addressing my API compatibility concerns.
* Do you have benchmark numbers for the new and old implementation? It should 
be faster, but it is worth looking at the numbers before making such a huge 
change to Hive.
* The test cases in storage-api have a lot of commented out printlns, those 
should either be removed or converted into LOG.debug.
* The tests in VersionTestBase should provide a message rather than just call 
fail().
* There is a commented out fail. That should be removed or uncommented.
* It would be great to have a fast path for decimals where the precision is 
less than or equal to 18, which fit into a long. Based on my casual observation 
more than 90% of Hive schemas with decimals fit. What would be the right 
approach to fit that in as a later patch?


was (Author: owen.omalley):
Review comments:
* Thank you for addressing my API compatibility concerns.
* Do you have benchmark numbers for the new and old implementation? It should 
be faster, but it is worth looking at the numbers before making such a huge 
change to Hive.
* The test cases in storage-api have a lot of commented out printlns, those 
should either be removed or converted into LOG.debug.
* The tests in VersionTestBase should provide a message rather than just call 
fail().
* There is a commented out fail. That should be removed or uncommented.
* It would be great to have a fast path for decimals where the precision is 
less than or equal to 18, that fit into a long. Based on my casual observation 
more than 90% of Hive schemas with decimals fit. What would be the right 
approach to fit that in as a later patch?

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision

2016-12-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15453:
-
Attachment: HIVE-15453.4.patch

> Failing test : TestMiniLlapLocalCliDriver.testCliDriver : 
> stats_based_fetch_decision
> 
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch, HIVE-15453.4.patch
>
>
> This test has been failing in a couple of ptests off late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-15448) ChangeManager for replication

2016-12-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767778#comment-15767778
 ] 

Sushanth Sowmyan commented on HIVE-15448:
-

Hi Daniel, I have 2 suggestions:

a) In addition to "Path getCMPath(Path path, Configuration conf, String 
signature)", can we also have a "Path getCMPath(Path path, Configuration 
conf)" which simply calls getCMPath(path, conf, getSignature(path, conf))? 
That makes it easier to call from outside.

b) hive.repl.cm.enabled should not default to true in HiveConf - CM is a 
costly system to enable for users who don't care about replication.
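Suggestion (a) amounts to a simple delegating overload. A hedged sketch with stand-in types (Path, Configuration, and getSignature below are illustrative strings and a toy checksum, not the real Hadoop/Hive API):

```java
import java.util.Objects;

// Sketch of the proposed two-argument convenience overload for ReplChangeManager.
public class CmPathSketch {
    // Stand-in for a real file-content checksum.
    static String getSignature(String path, String conf) {
        return Integer.toHexString(Objects.hash(path, conf));
    }

    // CM layout sketched as content-addressed by signature (illustrative only).
    static String getCMPath(String path, String conf, String signature) {
        return "/cmroot/" + signature;
    }

    // The proposed overload: computes the signature itself and delegates,
    // which is easier to call from outside.
    static String getCMPath(String path, String conf) {
        return getCMPath(path, conf, getSignature(path, conf));
    }
}
```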

> ChangeManager for replication
> -
>
> Key: HIVE-15448
> URL: https://issues.apache.org/jira/browse/HIVE-15448
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15448.1.patch
>
>
> The change manager implementation as described in 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-Changemanagement.
>  This issue tracks the infrastructure code. Hooking into actions will be 
> tracked in another ticket.
> ReplChangeManager includes:
> * method to generate checksum
> * method to convert file path to cm path
> * method to move table/partition/file into cm
> * thread to clear cm files when they expire





[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767775#comment-15767775
 ] 

Owen O'Malley commented on HIVE-15335:
--

Review comments:
* Thank you for addressing my API compatibility concerns.
* Do you have benchmark numbers for the new and old implementation? It should 
be faster, but it is worth looking at the numbers before making such a huge 
change to Hive.
* The test cases in storage-api have a lot of commented out printlns, those 
should either be removed or converted into LOG.debug.
* The tests in VersionTestBase should provide a message rather than just call 
fail().
* There is a commented out fail. That should be removed or uncommented.
* It would be great to have a fast path for decimals where the precision is 
less than or equal to 18, that fit into a long. Based on my casual observation 
more than 90% of Hive schemas with decimals fit. What would be the right 
approach to fit that in as a later patch?
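The 18-digit cutoff works because the largest 18-digit unscaled value, 10^18 - 1, is below Long.MAX_VALUE (about 9.22 * 10^18). A hypothetical sketch of such a long-backed fast path for addition at a shared scale (my illustration, not Hive's actual implementation):

```java
// Sketch: decimals with precision <= 18 stored as a plain long unscaled value.
public class LongDecimal {
    static final long MAX_18_DIGITS = 999_999_999_999_999_999L; // 10^18 - 1

    // Add two decimals stored as unscaled longs at the same scale; throws if
    // the result needs more than 18 digits (caller would fall back to the
    // slow, object-based path).
    static long add(long a, long b) {
        long sum = Math.addExact(a, b);
        if (Math.abs(sum) > MAX_18_DIGITS) {
            throw new ArithmeticException("overflow: fall back to slow path");
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(MAX_18_DIGITS < Long.MAX_VALUE); // true: 18 digits fit
        System.out.println(add(12345L, 100L)); // 12445, i.e. 123.45 + 1.00 at scale 2
    }
}
```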

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
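The "mutable*" idea in the description can be sketched as follows (a simplified illustration with a single long unscaled value plus a scale, not Hive's actual field layout or method set):

```java
// Sketch: mutate primitive fields in place instead of allocating a new object
// per operation, which is what BigDecimal.add forces.
public class MutableDecimalSketch {
    private long unscaled;
    private final int scale;

    public MutableDecimalSketch(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    // mutableAdd: no allocation; simplified to require matching scales.
    public void mutableAdd(MutableDecimalSketch other) {
        if (other.scale != scale) {
            throw new IllegalArgumentException("scale mismatch");
        }
        unscaled = Math.addExact(unscaled, other.unscaled);
    }

    public long unscaledValue() { return unscaled; }
}
```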





[jira] [Updated] (HIVE-15447) Log session ID in ATSHook

2016-12-21 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15447:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Log session ID in ATSHook
> -
>
> Key: HIVE-15447
> URL: https://issues.apache.org/jira/browse/HIVE-15447
> Project: Hive
>  Issue Type: Bug
>  Components: Hooks
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 2.2.0
>
> Attachments: HIVE-15447.1.patch
>
>
> Log the SessionID in addition the log trace ID (which can be different).





[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767755#comment-15767755
 ] 

Pengcheng Xiong commented on HIVE-15445:


+1

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.01.patch, HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.





[jira] [Commented] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision

2016-12-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767729#comment-15767729
 ] 

Prasanth Jayachandran commented on HIVE-15453:
--

Looks like it is already taken care of in HIVE-15376.

> Failing test : TestMiniLlapLocalCliDriver.testCliDriver : 
> stats_based_fetch_decision
> 
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch
>
>
> This test has been failing in a couple of ptests off late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision

2016-12-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767721#comment-15767721
 ] 

Prasanth Jayachandran commented on HIVE-15453:
--

Sure. Will do.

> Failing test : TestMiniLlapLocalCliDriver.testCliDriver : 
> stats_based_fetch_decision
> 
>
> Key: HIVE-15453
> URL: https://issues.apache.org/jira/browse/HIVE-15453
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15453.1.patch, HIVE-15453.2.patch, 
> HIVE-15453.3.patch
>
>
> This test has been failing in a couple of ptests off late. A recent example 
> is in 
> https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/
> {noformat}
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_239_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
> 2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
> 2016-12-16 09:42:14 Completed running task attempt: 
> attempt_1481909974530_0001_240_00_00_0
> 2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
> Running: diff -a 
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
>  
> /home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
> 153c153
> <   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> stats: COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 2000 Data size: 1092000 Basic 
> > stats: COMPLETE Column stats: COMPLETE
> 156c156
> < Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 546 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 160c160
> <   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> >   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> 163c163
> < Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: PARTIAL
> ---
> > Statistics: Num rows: 1 Data size: 543 Basic stats: 
> > COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Attachment: HIVE-15376.13.patch

Patch 13 updates dbtxnmgr_showlocks.q.out for the typo fixes.

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.2.patch, HIVE-15376.3.patch, HIVE-15376.4.patch, 
> HIVE-15376.5.patch, HIVE-15376.6.patch, HIVE-15376.7.patch, 
> HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; there 
> may still be issues, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
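One way to close the A-to-C gap described above is to start heartbeating when the transaction is opened rather than after acquireLocks returns, so a slow blocking lock acquisition can no longer exceed hive.txn.timeout unheartbeated. A generic sketch under that assumption (not Hive's actual Heartbeater code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: schedule the heartbeat at txn open (time A), not after locks are
// acquired (time C).
public class TxnHeartbeatSketch implements AutoCloseable {
    private final ScheduledExecutorService exec =
            Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger beats = new AtomicInteger();

    // Call at transaction open, before the blocking acquireLocks call.
    void openTxn(long heartbeatIntervalMs) {
        exec.scheduleAtFixedRate(beats::incrementAndGet,
                0, heartbeatIntervalMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        exec.shutdownNow(); // stop heartbeating at commit/abort
    }
}
```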





[jira] [Commented] (HIVE-15345) Spelling errors in logging and exceptions for query language code

2016-12-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767708#comment-15767708
 ] 

Wei Zheng commented on HIVE-15345:
--

dbtxnmgr_showlocks is also a related test failure, which will be fixed in 
HIVE-15376

> Spelling errors in logging and exceptions for query language code
> -
>
> Key: HIVE-15345
> URL: https://issues.apache.org/jira/browse/HIVE-15345
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Grant Sohn
>Assignee: Grant Sohn
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-15345.1.patch
>
>
> Obvious typos and misspellings in the exceptions and messages.
> modifified -> modified
> commnad -> command





[jira] [Updated] (HIVE-15331) Decimal multiplication with high precision/scale often returns NULL

2016-12-21 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15331:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
 Release Note: 
The resulting precision/scale of decimal arithmetic has been changed in the 
case where the precision/scale exceeds the maximum precision of 38.

When reducing the precision/scale down to the Hive limit of 38, the new 
behavior is to reduce the scale to preserve the integer portion of the 
precision, while leaving a minimum of 6 digits for the scale.

The previous behavior was to always shrink the integer portion of the precision 
first before the scale.
   Status: Resolved  (was: Patch Available)

Committed to master

> Decimal multiplication with high precision/scale often returns NULL
> ---
>
> Key: HIVE-15331
> URL: https://issues.apache.org/jira/browse/HIVE-15331
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 2.2.0
>
> Attachments: HIVE-15331.1.patch, HIVE-15331.2.patch, 
> HIVE-15331.3.patch
>
>
> {noformat}
> create temporary table dec (a decimal(38,18));
> insert into dec values(100.0);
> hive> select a*a from dec;
> OK
> NULL
> Time taken: 0.165 seconds, Fetched: 1 row(s)
> {noformat}
> Looks like the reason is because the result of decimal(38,18) * 
> decimal(38,18) only has 2 digits of precision for integers:
> {noformat}
> hive> set hive.explain.user=false;
> hive> explain select a*a from dec;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: dec
>   Select Operator
> expressions: (a * a) (type: decimal(38,36))
> outputColumnNames: _col0
> ListSink
> Time taken: 0.039 seconds, Fetched: 15 row(s)
> {noformat}
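The release note's adjustment rule (preserve the integer digits, keep at least 6 digits of scale, cap precision at 38) can be sketched as follows; this is my reading of the note, not the exact Hive code:

```java
// Sketch: shrink an ideal decimal(precision, scale) down to Hive's limit of 38.
public class DecimalAdjust {
    static final int MAX_PRECISION = 38;
    static final int MIN_SCALE = 6;

    // Returns {precision, scale} after adjustment.
    static int[] adjust(int precision, int scale) {
        if (precision <= MAX_PRECISION) {
            return new int[]{precision, scale};
        }
        int intDigits = precision - scale;
        // Shrink the scale to make room for the integer digits,
        // but never drop below 6 digits of scale.
        int newScale = Math.max(MIN_SCALE, Math.min(scale, MAX_PRECISION - intDigits));
        return new int[]{MAX_PRECISION, newScale};
    }

    public static void main(String[] args) {
        // decimal(38,18) * decimal(38,18) would ideally be decimal(77,36);
        // the new behavior trims it to decimal(38,6), so 100.0 * 100.0 fits.
        int[] r = adjust(77, 36);
        System.out.println(r[0] + "," + r[1]); // 38,6
    }
}
```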





[jira] [Commented] (HIVE-14688) Hive drop call fails in presence of TDE

2016-12-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767670#comment-15767670
 ] 

Wei Zheng commented on HIVE-14688:
--

Don't think so. The last run for patch 4 was: 
https://builds.apache.org/job/PreCommit-HIVE-Build/2656/testReport/
dbtxnmgr_showlocks already has Age 2 for the failure.

Regarding that failure, it looks like HIVE-15345 fixed a typo in 
DDLTask.dumpLockInfo (Hearbeat -> Heartbeat) but didn't update the q.out 
file. I can include this q.out update in HIVE-15376.

> Hive drop call fails in presence of TDE
> ---
>
> Key: HIVE-14688
> URL: https://issues.apache.org/jira/browse/HIVE-14688
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Deepesh Khandelwal
>Assignee: Wei Zheng
> Attachments: HIVE-14688.1.patch, HIVE-14688.2.patch, 
> HIVE-14688.3.patch, HIVE-14688.4.patch
>
>
> This should be committed when Hive moves to Hadoop 2.8.
> In Hadoop 2.8.0 TDE trash collection was fixed through HDFS-8831. This 
> enables us to make drop table calls for Hive managed tables where Hive 
> metastore warehouse directory is in encrypted zone. However even with the 
> feature in HDFS, Hive drop table currently fail:
> {noformat}
> $ hdfs crypto -listZones
> /apps/hive/warehouse  key2 
> $ hdfs dfs -ls /apps/hive/warehouse
> Found 1 items
> drwxrwxrwt   - hdfs hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> hive> create table abc(a string, b int);
> OK
> Time taken: 5.538 seconds
> hive> dfs -ls /apps/hive/warehouse;
> Found 2 items
> drwxrwxrwt   - hdfs   hdfs  0 2016-09-01 02:54 
> /apps/hive/warehouse/.Trash
> drwxrwxrwx   - deepesh hdfs  0 2016-09-01 17:15 
> /apps/hive/warehouse/abc
> hive> drop table if exists abc;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> default.abc because it is in an encryption zone and trash is enabled.  Use 
> PURGE option to skip trash.)
> {noformat}
> The problem lies here:
> {code:title=metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java}
> private void checkTrashPurgeCombination(Path pathToData, String objectName, 
> boolean ifPurge)
> ...
>   if (trashEnabled) {
> try {
>   HadoopShims.HdfsEncryptionShim shim =
> 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(FileSystem.get(hiveConf),
>  hiveConf);
>   if (shim.isPathEncrypted(pathToData)) {
> throw new MetaException("Unable to drop " + objectName + " 
> because it is in an encryption zone" +
>   " and trash is enabled.  Use PURGE option to skip trash.");
>   }
> } catch (IOException ex) {
>   MetaException e = new MetaException(ex.getMessage());
>   e.initCause(ex);
>   throw e;
> }
>   }
> {code}
> As we can see, we are assuming that the delete wouldn't succeed in an 
> encryption zone. We need to modify this logic.





[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767657#comment-15767657
 ] 

Wei Zheng commented on HIVE-15376:
--

[~ekoifman] Can you take a look at patch 12? It retains 
acquireLocksWithHeartbeatDelay.

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by reducing the gap between 
> lock acquisition and the first heartbeat, but that is not enough; an issue 
> remains, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (a blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, the transaction will be timed out and 
> aborted, causing failure.
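
One way to close the gap in the timeline above, shown as a toy sketch rather than the actual patch: schedule the periodic heartbeater when the transaction is opened (Time A), before the blocking acquireLocks call, so the C - A window is covered. The class and parameter names here are illustrative assumptions:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: heartbeats start as soon as the txn opens, so a slow, blocking
// lock acquisition cannot leave the transaction unheartbeated past the timeout.
public class EarlyHeartbeater {
    private final ScheduledExecutorService pool =
        Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger heartbeats = new AtomicInteger();

    void openTxnAndLock(long heartbeatIntervalMs, long lockAcquisitionMs)
            throws InterruptedException {
        // Time A: txn opened -> heartbeater scheduled immediately.
        ScheduledFuture<?> hb = pool.scheduleAtFixedRate(
            heartbeats::incrementAndGet,
            heartbeatIntervalMs, heartbeatIntervalMs, TimeUnit.MILLISECONDS);
        try {
            // Time B..C: stand-in for the blocking acquireLocks call.
            Thread.sleep(lockAcquisitionMs);
        } finally {
            hb.cancel(false);   // real code keeps heartbeating until commit/abort
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EarlyHeartbeater h = new EarlyHeartbeater();
        h.openTxnAndLock(50, 300);  // locks take 300 ms; heartbeat every 50 ms
        // Heartbeats fired while "acquireLocks" was still blocked.
        System.out.println(h.heartbeats.get() > 0);
    }
}
```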





[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767523#comment-15767523
 ] 

Owen O'Malley commented on HIVE-15335:
--

To create a pull request, push your change to a clone of Hive's repository in a 
branch. When you go to Hive's github repository, it will offer you the option to 
create a pull request from the branch you just pushed.

And yes, please port the patch to ORC and create a pull request over there too.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
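
To illustrate the mutable* calling convention described above, here is a toy sketch, not the HIVE-15335 implementation: a mutable holder accumulates results in place, where BigDecimal would allocate a new object per operation. The real fast decimal spreads 38-digit precision across several longs; this sketch assumes a single long and a fixed scale, and the names MutableDec and mutableAdd are illustrative only:

```java
// Toy mutable decimal: value stored as an unscaled long at a fixed scale.
// Demonstrates the allocation-free, mutate-in-place style of the mutable* API.
public class MutableDec {
    private long unscaled;   // value * 10^scale
    private final int scale;

    public MutableDec(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    // Adds other into this object; no new objects are created.
    public void mutableAdd(MutableDec other) {
        if (other.scale != scale) {
            throw new IllegalArgumentException("scale mismatch");
        }
        unscaled = Math.addExact(unscaled, other.unscaled);
    }

    @Override
    public String toString() {
        long pow = (long) Math.pow(10, scale);
        return (unscaled / pow) + "."
            + String.format("%0" + scale + "d", Math.abs(unscaled % pow));
    }

    public static void main(String[] args) {
        MutableDec sum = new MutableDec(0, 2);    // 0.00
        MutableDec x = new MutableDec(1050, 2);   // 10.50
        for (int i = 0; i < 3; i++) {
            sum.mutableAdd(x);                    // accumulate in place
        }
        System.out.println(sum);                  // 31.50
    }
}
```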




