[jira] [Commented] (HIVE-12669) Need a way to analyze tables in the background

2015-12-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058444#comment-15058444
 ] 

Alan Gates commented on HIVE-12669:
---

Thanks Sergey for pointing this out.  HIVE-12075 doesn't have an 
implementation, just a subtask for it (HIVE-12052) which hasn't been 
implemented yet, correct?  Should I mark this a duplicate of HIVE-12052 and 
then move HIVE-12672 to be a subtask of that?

> Need a way to analyze tables in the background
> --
>
> Key: HIVE-12669
> URL: https://issues.apache.org/jira/browse/HIVE-12669
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> Currently analyze must be run by users manually.  It would be useful to have 
> an option for certain or all tables to be automatically analyzed on a regular 
> basis.  The system can do this in the background as a metastore thread 
> (similar to the compactor threads).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12666) PCRExprProcFactory.GenericFuncExprProcessor.process() aggressively removes dynamic partition pruner generated synthetic join predicates.

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058273#comment-15058273
 ] 

Hive QA commented on HIVE-12666:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777629/HIVE-12666.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9885 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6357/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6357/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6357/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777629 - PreCommit-HIVE-TRUNK-Build

> PCRExprProcFactory.GenericFuncExprProcessor.process() aggressively removes 
> dynamic partition pruner generated synthetic join predicates.
> 
>
> Key: HIVE-12666
> URL: https://issues.apache.org/jira/browse/HIVE-12666
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Blocker
> Attachments: HIVE-12666.1.patch
>
>
> Introduced by HIVE-11634. The original idea in HIVE-11634 was to remove the 
> IN partition conditions from the predicate list since the static dynamic 
> partitioning would kick in and push these predicates down to metastore. 
> However, the check is too aggressive and removes events such as the one below:
> {code}
> -Select Operator
> -  expressions: UDFToDouble(UDFToInteger((hr / 2))) 
> (type: double)
> -  outputColumnNames: _col0
> -  Statistics: Num rows: 1 Data size: 7 Basic stats: 
> COMPLETE Column stats: NONE
> -  Group By Operator
> -keys: _col0 (type: double)
> -mode: hash
> -outputColumnNames: _col0
> -Statistics: Num rows: 1 Data size: 7 Basic stats: 
> COMPLETE Column stats: NONE
> -Dynamic Partitioning Event Operator
> -  Target Input: srcpart
> -  Partition key expr: UDFToDouble(hr)
> -  Statistics: Num rows: 1 Data size: 7 Basic stats: 
> COMPLETE Column stats: NONE
> -  Target column: hr
> -  Target Vertex: Map 1
> {code}





[jira] [Commented] (HIVE-7672) Potential resource leak in EximUtil#createExportDump()

2015-12-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058358#comment-15058358
 ] 

Ted Yu commented on HIVE-7672:
--

lgtm

nit: indentation is off - please use two spaces

> Potential resource leak in EximUtil#createExportDump()
> --
>
> Key: HIVE-7672
> URL: https://issues.apache.org/jira/browse/HIVE-7672
> Project: Hive
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: SUYEON LEE
>Priority: Minor
> Attachments: HIVE-7672.patch
>
>
> Here is related code:
> {code}
>   OutputStream out = fs.create(metadataPath);
>   out.write(jsonContainer.toString().getBytes("UTF-8"));
>   out.close();
> {code}
> If out.write() throws an exception, out would be left unclosed.
> out.close() should be enclosed in a finally block.
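The fix described above can be sketched as follows (a minimal illustration, not the attached patch; the class and method names here are hypothetical). try-with-resources compiles to a finally block that calls close(), which is exactly what the report asks for:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ExportDumpSketch {
    // Writes the JSON metadata and guarantees the stream is closed even if
    // write() throws, because try-with-resources always invokes close().
    static void writeMetadata(OutputStream out, String json) throws IOException {
        try (OutputStream os = out) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeMetadata(buf, "{\"ok\":true}");
        System.out.println(buf.toString("UTF-8"));
    }
}
```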





[jira] [Commented] (HIVE-11927) Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058442#comment-15058442
 ] 

Hive QA commented on HIVE-11927:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777625/HIVE-11927.12.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 65 failed/errored test(s), 9902 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_const
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantfolding
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_multilevels
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_and
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_not
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_or
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_folder_constants
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion

[jira] [Updated] (HIVE-12666) PCRExprProcFactory.GenericFuncExprProcessor.process() aggressively removes dynamic partition pruner generated synthetic join predicates.

2015-12-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12666:
-
Attachment: HIVE-12666.2.patch

Updating the golden files (expected changes).

> PCRExprProcFactory.GenericFuncExprProcessor.process() aggressively removes 
> dynamic partition pruner generated synthetic join predicates.
> 
>
> Key: HIVE-12666
> URL: https://issues.apache.org/jira/browse/HIVE-12666
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Blocker
> Attachments: HIVE-12666.1.patch, HIVE-12666.2.patch
>
>
> Introduced by HIVE-11634. The original idea in HIVE-11634 was to remove the 
> IN partition conditions from the predicate list since the static dynamic 
> partitioning would kick in and push these predicates down to metastore. 
> However, the check is too aggressive and removes events such as the one below:
> {code}
> -Select Operator
> -  expressions: UDFToDouble(UDFToInteger((hr / 2))) 
> (type: double)
> -  outputColumnNames: _col0
> -  Statistics: Num rows: 1 Data size: 7 Basic stats: 
> COMPLETE Column stats: NONE
> -  Group By Operator
> -keys: _col0 (type: double)
> -mode: hash
> -outputColumnNames: _col0
> -Statistics: Num rows: 1 Data size: 7 Basic stats: 
> COMPLETE Column stats: NONE
> -Dynamic Partitioning Event Operator
> -  Target Input: srcpart
> -  Partition key expr: UDFToDouble(hr)
> -  Statistics: Num rows: 1 Data size: 7 Basic stats: 
> COMPLETE Column stats: NONE
> -  Target column: hr
> -  Target Vertex: Map 1
> {code}





[jira] [Updated] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12663:
---
Fix Version/s: (was: 2.0.0)
   2.1.0

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Commented] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058632#comment-15058632
 ] 

Sergey Shelukhin commented on HIVE-12663:
-

2.0.0 currently doesn't require approval.

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Commented] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()

2015-12-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058348#comment-15058348
 ] 

Ted Yu commented on HIVE-7305:
--

lgtm

nit:
indentation is off
please add space between if and '('

> Return value from in.read() is ignored in SerializationUtils#readLongLE()
> -
>
> Key: HIVE-7305
> URL: https://issues.apache.org/jira/browse/HIVE-7305
> Project: Hive
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: skrho
>Priority: Minor
> Attachments: HIVE-7305_001.patch
>
>
> {code}
>   long readLongLE(InputStream in) throws IOException {
> in.read(readBuffer, 0, 8);
> return (((readBuffer[0] & 0xff) << 0)
> + ((readBuffer[1] & 0xff) << 8)
> {code}
> The return value from read() may indicate that fewer than 8 bytes were 
> read, so it should be checked.
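A sketch of the check being requested (helper names are illustrative, not Hive's actual API): loop until the buffer is full and fail loudly on a short stream, instead of silently decoding stale buffer contents:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
    // Reads exactly len bytes or throws: a single read() call may return
    // fewer bytes than requested, so its return value must drive a loop.
    static void readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int got = 0;
        while (got < len) {
            int n = in.read(buf, off + got, len - got);
            if (n < 0) {
                throw new EOFException("expected " + len + " bytes, got " + got);
            }
            got += n;
        }
    }

    // Little-endian long, as in readLongLE(), but built on readFully().
    static long readLongLE(InputStream in) throws IOException {
        byte[] b = new byte[8];
        readFully(in, b, 0, 8);
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[i] & 0xffL);
        }
        return v;
    }

    public static void main(String[] args) throws IOException {
        byte[] le = {1, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(readLongLE(new ByteArrayInputStream(le))); // 1
    }
}
```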





[jira] [Updated] (HIVE-7862) close of InputStream in Utils#copyToZipStream() should be placed in finally block

2015-12-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-7862:
-
Attachment: HIVE-7862.1.patch

> close of InputStream in Utils#copyToZipStream() should be placed in finally 
> block
> -
>
> Key: HIVE-7862
> URL: https://issues.apache.org/jira/browse/HIVE-7862
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Ted Yu
>Assignee: skrho
>Priority: Minor
>  Labels: patch
> Attachments: HIVE-7862.1.patch, HIVE-7862_001.txt
>
>
> In accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java , 
> line 278 :
> {code}
>   private static void copyToZipStream(InputStream is, ZipEntry entry, 
> ZipOutputStream zos)
>   throws IOException {
> zos.putNextEntry(entry);
> byte[] arr = new byte[4096];
> int read = is.read(arr);
> while (read > -1) {
>   zos.write(arr, 0, read);
>   read = is.read(arr);
> }
> is.close();
> {code}
> If read() throws an IOException, is would be left unclosed.
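A sketch of the suggested fix, with close() moved into a finally block so the stream is released even when read() or write() throws (illustrative only; the attached patch may be structured differently):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipCopySketch {
    static void copyToZipStream(InputStream is, ZipEntry entry,
            ZipOutputStream zos) throws IOException {
        try {
            zos.putNextEntry(entry);
            byte[] arr = new byte[4096];
            int read;
            // Copy until EOF; any IOException still reaches the finally.
            while ((read = is.read(arr)) > -1) {
                zos.write(arr, 0, read);
            }
        } finally {
            is.close();
        }
    }

    public static void main(String[] args) throws IOException {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(out);
        copyToZipStream(
            new java.io.ByteArrayInputStream("hello".getBytes("UTF-8")),
            new ZipEntry("a.txt"), zos);
        zos.close();
        System.out.println(out.size() > 0); // true
    }
}
```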





[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction

2015-12-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12366:
--
Component/s: (was: Hive)
 Transactions

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12366.1.patch, HIVE-12366.2.patch, 
> HIVE-12366.3.patch, HIVE-12366.4.patch, HIVE-12366.5.patch, 
> HIVE-12366.6.patch, HIVE-12366.7.patch, HIVE-12366.8.patch, HIVE-12366.9.patch
>
>
> Currently there is a gap between the time of lock acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it is 
> large it will cause the query to fail, since the locks have timed out by 
> the time the heartbeat is sent.
> We need to remove this gap.
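One way to close that gap can be sketched as follows (hypothetical names, not the actual patch): schedule the first heartbeat with zero initial delay at the moment the locks are acquired, rather than waiting one full interval:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatSketch {
    // Starts heartbeating immediately (initialDelay = 0), so there is no
    // window between lock acquisition and the first heartbeat in which the
    // locks could time out.
    static ScheduledExecutorService startHeartbeat(Runnable sendHeartbeat,
            long intervalMs) {
        ScheduledExecutorService ses =
            Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(sendHeartbeat, 0, intervalMs,
            TimeUnit.MILLISECONDS);
        return ses;
    }
}
```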





[jira] [Updated] (HIVE-9907) insert into table values() when UTF-8 character is not correct

2015-12-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-9907:
-
Assignee: niklaus xiao

> insert into table values()   when UTF-8 character is not correct
> 
>
> Key: HIVE-9907
> URL: https://issues.apache.org/jira/browse/HIVE-9907
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Clients, JDBC
>Affects Versions: 0.14.0, 0.13.1, 1.0.0
> Environment: centos 6   LANG=zh_CN.UTF-8
> hadoop 2.6
> hive 1.1.0
>Reporter: Fanhong Li
>Assignee: niklaus xiao
>Priority: Critical
> Attachments: HIVE-9907.1.patch
>
>
> insert into table test_acid partition(pt='pt_2')
> values( 2, '中文_2' , 'city_2' )
> ;
> hive> select *
> > from test_acid 
> > ;
> OK
> 2 -�_2city_2  pt_2
> Time taken: 0.237 seconds, Fetched: 1 row(s)
> hive> 
> CREATE TABLE test_acid(id INT, 
> name STRING, 
> city STRING) 
> PARTITIONED BY (pt STRING)
> clustered by (id) into 1 buckets
> stored as ORCFILE
> TBLPROPERTIES('transactional'='true')
> ;





[jira] [Commented] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058600#comment-15058600
 ] 

Eugene Koifman commented on HIVE-12663:
---

[~pxiong] did you push to master or 2.0.0?  Master is now 2.1.0, but you set 
FixVersion to 2.0.0.
BTW, assuming the non-ACID fix for this is in 2.0.0, this should be there as 
well if [~sershe] approves.

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction

2015-12-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12366:
--
Target Version/s: 1.3.0, 2.1.0  (was: 2.1.0)

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12366.1.patch, HIVE-12366.2.patch, 
> HIVE-12366.3.patch, HIVE-12366.4.patch, HIVE-12366.5.patch, 
> HIVE-12366.6.patch, HIVE-12366.7.patch, HIVE-12366.8.patch, HIVE-12366.9.patch
>
>
> Currently there is a gap between the time of lock acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it is 
> large it will cause the query to fail, since the locks have timed out by 
> the time the heartbeat is sent.
> We need to remove this gap.





[jira] [Commented] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058612#comment-15058612
 ] 

Pengcheng Xiong commented on HIVE-12663:


I changed it to 2.1.0. [~sershe] do you want it to be in 2.0.0? Thanks.

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Updated] (HIVE-12658) Task rejection by an llap daemon spams the log with RejectedExecutionExceptions

2015-12-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12658:
-
Attachment: HIVE-12658.2.patch

Addressed review comments.

> Task rejection by an llap daemon spams the log with 
> RejectedExecutionExceptions
> ---
>
> Key: HIVE-12658
> URL: https://issues.apache.org/jira/browse/HIVE-12658
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12658.1.patch, HIVE-12658.2.patch
>
>
> The execution queue throws a RejectedExecutionException - which is logged by 
> the hadoop IPC layer.
> Instead of relying on an Exception in the protocol - move to sending back an 
> explicit response to indicate a rejected fragment.
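The direction described can be sketched like this (all names hypothetical): catch the executor's RejectedExecutionException at the submission boundary and turn it into an explicit status in the response, so the exception never escapes through the IPC layer:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

public class SubmitSketch {
    enum SubmissionState { ACCEPTED, REJECTED }

    // Converts a queue-full rejection into a response value instead of an
    // exception propagated (and noisily logged) by the RPC machinery.
    static SubmissionState submit(Executor executor, Runnable fragment) {
        try {
            executor.execute(fragment);
            return SubmissionState.ACCEPTED;
        } catch (RejectedExecutionException e) {
            return SubmissionState.REJECTED;
        }
    }

    public static void main(String[] args) {
        Executor alwaysFull = r -> {
            throw new RejectedExecutionException("queue full");
        };
        System.out.println(submit(alwaysFull, () -> {})); // REJECTED
    }
}
```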





[jira] [Commented] (HIVE-12677) StackOverflowError with kryo

2015-12-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058564#comment-15058564
 ] 

Prasanth Jayachandran commented on HIVE-12677:
--

Most likely due to the vector operator classes not being registered with Kryo. 

> StackOverflowError with kryo
> 
>
> Key: HIVE-12677
> URL: https://issues.apache.org/jira/browse/HIVE-12677
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> Env: Master branch
> {noformat}
> explain formatted insert overwrite table default.test  select entry_date,
> regexp_replace(
> regexp_replace(
> regexp_replace(
> regexp_replace(
> regexp_replace(random_string
> ,"\\b(A\;S|A|Tours)\\b","Destination Services")
> ,"\\b(PPV/3PP)\\b","Third Party Package")
> ,"\\b(Flight)\\b","Air")
> ,"\\b(Rail)\\b","Train")
> ,"\\b(Hotel)\\b","Lodging") as rn from transactions where 
> effective_date between '2015-12-01' AND '2015-12-31' limit 10;
> {"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-2":{"DEPENDENT 
> STAGES":"Stage-1"},"Stage-0":{"DEPENDENT 
> STAGES":"Stage-2"},"Stage-3":{"DEPENDENT STAGES":"Stage-0"}},"STAGE 
> PLANS":{"Stage-1":{"Tez":{"Edges:":{"Reducer 2":{"parent":"Map 
> 1","type":"SIMPLE_EDGE"}},"DagName:":"rajesh_20151215120344_69fa6465-22ed-4fe2-83b5-20782e45d3f7:2","Vertices:":{"Map
>  1":{"Map Operator 
> Tree:":[{"TableScan":{"alias:":"transactions","filterExpr:":"effective_date 
> BETWEEN '2015-12-01' AND '2015-12-31' (type: boolean)","Statistics:":"Num 
> rows: 197642628 Data size: 59095145772 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Select Operator":{"expressions:":"entry_date (type: 
> date), 
> regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(random_string,
>  '\\b(AS|A|Tours)\\b', 'Destination Services'), '\\b(PPV/3PP)\\b', 
> 'Third Party Package'), '\\b(Flight)\\b', 'Air'), '\\b(Rail)\\b', 'Train'), 
> '\\b(Hotel)\\b', 'Lodging') (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 
> 197642628 Data size: 47434230720 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num 
> rows: 10 Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Reduce Output Operator":{"sort 
> order:":"","Statistics:":"Num rows: 10 Data size: 2400 Basic stats: COMPLETE 
> Column stats: COMPLETE","TopN Hash Memory Usage:":"0.04","value 
> expressions:":"_col0 (type: date), _col1 (type: string)"]},"Reducer 
> 2":{"Execution mode:":"vectorized","Reduce Operator Tree:":{"Select 
> Operator":{"expressions:":"VALUE._col0 (type: date), VALUE._col1 (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
> Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num 
> rows: 10 Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Select Operator":{"expressions:":"UDFToString(_col0) 
> (type: string), _col1 (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
> Data size: 3680 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"File Output 
> Operator":{"compressed:":"false","Statistics:":"Num rows: 10 Data size: 3680 
> Basic stats: COMPLETE Column stats: COMPLETE","table:":{"input 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test"},"Stage-2":{"Dependency
>  Collection":{}},"Stage-0":{"Move 
> Operator":{"tables:":{"replace:":"true","table:":{"input 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test","Stage-3":{"Stats-Aggr
>  Operator":{
> {noformat}
> {noformat}
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:450)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:305)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor$1.call(ReduceRecordProcessor.java:106)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
> ... 

[jira] [Commented] (HIVE-12663) Support quoted table names/columns when ACID is on

2015-12-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058550#comment-15058550
 ] 

Pengcheng Xiong commented on HIVE-12663:


Test case failures are not related. Pushed to master. Since special characters 
in table names are not supported in branch-1, we did not push there. Thanks 
[~ekoifman] for the review!

> Support quoted table names/columns when ACID is on
> --
>
> Key: HIVE-12663
> URL: https://issues.apache.org/jira/browse/HIVE-12663
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.2.1
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-12663.01.patch, HIVE-12663.02.patch, 
> HIVE-12663.03.patch
>
>
> Right now the rewrite part in UpdateDeleteSemanticAnalyzer does not support 
> quoted names.





[jira] [Commented] (HIVE-12667) Proper fix for HIVE-12473

2015-12-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058559#comment-15058559
 ] 

Sergey Shelukhin commented on HIVE-12667:
-

The target column in all cases is reported as string. Is that what it's cast 
to? A cast to string should be a no-op.


> Proper fix for HIVE-12473
> -
>
> Key: HIVE-12667
> URL: https://issues.apache.org/jira/browse/HIVE-12667
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-12667.1.patch
>
>
> HIVE-12473 has added an incorrect comment and also lacks a test case.
> Benefits of this fix:
>* Does not say: "Probably doesn't work"
>* Does not use grammar like "subquery columns and such"
>* Adds test cases that let you verify the fix
>* Doesn't rely on certain structure of key expr, just takes the type at 
> compile time
>* Doesn't require an additional walk of each key expression
>* Shows the type used in explain





[jira] [Updated] (HIVE-12625) Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-12-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12625:

Attachment: HIVE-12625.4-branch1.patch

> Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, 
> ACID, and Non-Vectorized)
> --
>
> Key: HIVE-12625
> URL: https://issues.apache.org/jira/browse/HIVE-12625
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12625.1-branch1.patch, HIVE-12625.2-branch1.patch, 
> HIVE-12625.3-branch1.patch, HIVE-12625.4-branch1.patch
>
>






[jira] [Commented] (HIVE-12676) [hive+impala] Alter table Rename to + Set location in a single step

2015-12-15 Thread Egmont Koblinger (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057709#comment-15057709
 ] 

Egmont Koblinger commented on HIVE-12676:
-

(Please let me know if I should file a separate jira for the same feature 
request in Impala.)

> [hive+impala] Alter table Rename to + Set location in a single step
> ---
>
> Key: HIVE-12676
> URL: https://issues.apache.org/jira/browse/HIVE-12676
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Egmont Koblinger
>Assignee: Dmitry Tolpeko
>Priority: Minor
>
> Assume a nonstandard table location, let's say /foo/bar/table1. You might 
> want to rename from table1 to table2 and move the underlying data accordingly 
> to /foo/bar/table2.
> The "alter table ... rename to ..." clause alters the table name, but in the 
> same step moves the data into the standard location 
> /user/hive/warehouse/table2. Then a subsequent "alter table ... set location 
> ..." can move it back to the desired location /foo/bar/table2.
> This is problematic if there's any permission issue involved, e.g. not 
> being able to write to /user/hive/warehouse. So it should be possible to move 
> the underlying data to its desired final place without intermediate locations 
> in between.
> A probably hard to discover workaround is to set the table to external, then 
> rename it, then set back to internal and then change its location.
> It would be great to be able to do an "alter table ... rename to ... set 
> location ..." operation in a single step.
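
The workaround described above can be spelled out as DDL (a sketch only; table and path names are hypothetical, the data files may still need to be moved separately, and exact semantics can vary by Hive version):

```sql
-- Detach the data from Hive's lifecycle so RENAME does not relocate it
ALTER TABLE table1 SET TBLPROPERTIES('EXTERNAL'='TRUE');
ALTER TABLE table1 RENAME TO table2;
-- Re-attach the table, then point it at the desired final location
ALTER TABLE table2 SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE table2 SET LOCATION '/foo/bar/table2';
```

The feature request is for a single statement that makes this four-step dance unnecessary.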





[jira] [Commented] (HIVE-11528) incrementally read query results when there's no ORDER BY

2015-12-15 Thread Keisuke Ogiwara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057766#comment-15057766
 ] 

Keisuke Ogiwara commented on HIVE-11528:


Sorry, I submitted an incomplete comment. Please ignore it; I will rewrite it 
later.

> incrementally read query results when there's no ORDER BY
> -
>
> Key: HIVE-11528
> URL: https://issues.apache.org/jira/browse/HIVE-11528
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Keisuke Ogiwara
>
> May require HIVE-11527. When there's no ORDER BY and there's more than one 
> reducer on the last stage of the query, it should be possible to return data 
> to the user as it is produced, instead of waiting for all reducers to finish.





[jira] [Commented] (HIVE-11528) incrementally read query results when there's no ORDER BY

2015-12-15 Thread Keisuke Ogiwara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057735#comment-15057735
 ] 

Keisuke Ogiwara commented on HIVE-11528:


Hi Sergey,
I have a few questions about how to proceed with this ticket.
1. Should we wait for each reducer to finish writing its results to a file, and 
then output the results to the console in order of arrival?
2. I think FileSinkOperator writes the results to a file; is that right?
3. If #2 is correct, 

> incrementally read query results when there's no ORDER BY
> -
>
> Key: HIVE-11528
> URL: https://issues.apache.org/jira/browse/HIVE-11528
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Keisuke Ogiwara
>
> May require HIVE-11527. When there's no ORDER BY and there's more than one 
> reducer on the last stage of the query, it should be possible to return data 
> to the user as it is produced, instead of waiting for all reducers to finish.





[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-15 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Fix Version/s: 1.2.1

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch
>
>
> 1. In fact, SymbolicTextInputFormat already supports paths with a regex; I add 
> some test SQL for this.
> 2. But when CombineHiveInputFormat combines input files, it cannot resolve a 
> path with a regex, so it gets a wrong result. I give an example and fix the 
> problem.
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains a single path, and that path contains a regex.
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0.
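
The distinction the patch addresses, between treating a link-file entry as a literal path and expanding it as a glob, can be illustrated outside Hadoop with a small local-filesystem sketch in Python (file names here are made up):

```python
import glob
import os
import tempfile

# Create two data files that a glob pattern should match
d = tempfile.mkdtemp()
for name in ("symlink1.txt", "symlink2.txt"):
    with open(os.path.join(d, name), "w") as f:
        f.write("data\n")

# This is what the link file contains: a pattern, not a literal path
pattern = os.path.join(d, "symlink*")

# Treating the pattern as a literal path finds nothing ...
assert not os.path.exists(pattern)

# ... while expanding it as a glob finds both files
assert sorted(glob.glob(pattern)) == [os.path.join(d, "symlink1.txt"),
                                      os.path.join(d, "symlink2.txt")]
```

An input format that skips the expansion step sees zero matching files, which is consistent with the `count(*)` of 0 reported above.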





[jira] [Commented] (HIVE-12677) StackOverflowError with kryo

2015-12-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057952#comment-15057952
 ] 

Rajesh Balamohan commented on HIVE-12677:
-

Works fine when "set hive.vectorized.execution.enabled=false;"

> StackOverflowError with kryo
> 
>
> Key: HIVE-12677
> URL: https://issues.apache.org/jira/browse/HIVE-12677
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> Env: Master branch
> {noformat}
> explain formatted insert overwrite table default.test  select entry_date,
> regexp_replace(
> regexp_replace(
> regexp_replace(
> regexp_replace(
> regexp_replace(random_string
> ,"\\b(A\;S|A|Tours)\\b","Destination Services")
> ,"\\b(PPV/3PP)\\b","Third Party Package")
> ,"\\b(Flight)\\b","Air")
> ,"\\b(Rail)\\b","Train")
> ,"\\b(Hotel)\\b","Lodging") as rn from transactions where 
> effective_date between '2015-12-01' AND '2015-12-31' limit 10;
> {"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-2":{"DEPENDENT 
> STAGES":"Stage-1"},"Stage-0":{"DEPENDENT 
> STAGES":"Stage-2"},"Stage-3":{"DEPENDENT STAGES":"Stage-0"}},"STAGE 
> PLANS":{"Stage-1":{"Tez":{"Edges:":{"Reducer 2":{"parent":"Map 
> 1","type":"SIMPLE_EDGE"}},"DagName:":"rajesh_20151215120344_69fa6465-22ed-4fe2-83b5-20782e45d3f7:2","Vertices:":{"Map
>  1":{"Map Operator 
> Tree:":[{"TableScan":{"alias:":"transactions","filterExpr:":"effective_date 
> BETWEEN '2015-12-01' AND '2015-12-31' (type: boolean)","Statistics:":"Num 
> rows: 197642628 Data size: 59095145772 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Select Operator":{"expressions:":"entry_date (type: 
> date), 
> regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(random_string,
>  '\\b(AS|A|Tours)\\b', 'Destination Services'), '\\b(PPV/3PP)\\b', 
> 'Third Party Package'), '\\b(Flight)\\b', 'Air'), '\\b(Rail)\\b', 'Train'), 
> '\\b(Hotel)\\b', 'Lodging') (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 
> 197642628 Data size: 47434230720 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num 
> rows: 10 Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Reduce Output Operator":{"sort 
> order:":"","Statistics:":"Num rows: 10 Data size: 2400 Basic stats: COMPLETE 
> Column stats: COMPLETE","TopN Hash Memory Usage:":"0.04","value 
> expressions:":"_col0 (type: date), _col1 (type: string)"]},"Reducer 
> 2":{"Execution mode:":"vectorized","Reduce Operator Tree:":{"Select 
> Operator":{"expressions:":"VALUE._col0 (type: date), VALUE._col1 (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
> Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num 
> rows: 10 Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Select Operator":{"expressions:":"UDFToString(_col0) 
> (type: string), _col1 (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
> Data size: 3680 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"File Output 
> Operator":{"compressed:":"false","Statistics:":"Num rows: 10 Data size: 3680 
> Basic stats: COMPLETE Column stats: COMPLETE","table:":{"input 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test"},"Stage-2":{"Dependency
>  Collection":{}},"Stage-0":{"Move 
> Operator":{"tables:":{"replace:":"true","table:":{"input 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test","Stage-3":{"Stats-Aggr
>  Operator":{
> {noformat}
> {noformat}
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:450)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:305)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor$1.call(ReduceRecordProcessor.java:106)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
> ... 16 more
> 

[jira] [Updated] (HIVE-12677) StackOverflowError with kryo

2015-12-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-12677:

Component/s: Vectorization

> StackOverflowError with kryo
> 
>
> Key: HIVE-12677
> URL: https://issues.apache.org/jira/browse/HIVE-12677
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> Env: Master branch
> {noformat}
> explain formatted insert overwrite table default.test  select entry_date,
> regexp_replace(
> regexp_replace(
> regexp_replace(
> regexp_replace(
> regexp_replace(random_string
> ,"\\b(A\;S|A|Tours)\\b","Destination Services")
> ,"\\b(PPV/3PP)\\b","Third Party Package")
> ,"\\b(Flight)\\b","Air")
> ,"\\b(Rail)\\b","Train")
> ,"\\b(Hotel)\\b","Lodging") as rn from transactions where 
> effective_date between '2015-12-01' AND '2015-12-31' limit 10;
> {"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-2":{"DEPENDENT 
> STAGES":"Stage-1"},"Stage-0":{"DEPENDENT 
> STAGES":"Stage-2"},"Stage-3":{"DEPENDENT STAGES":"Stage-0"}},"STAGE 
> PLANS":{"Stage-1":{"Tez":{"Edges:":{"Reducer 2":{"parent":"Map 
> 1","type":"SIMPLE_EDGE"}},"DagName:":"rajesh_20151215120344_69fa6465-22ed-4fe2-83b5-20782e45d3f7:2","Vertices:":{"Map
>  1":{"Map Operator 
> Tree:":[{"TableScan":{"alias:":"transactions","filterExpr:":"effective_date 
> BETWEEN '2015-12-01' AND '2015-12-31' (type: boolean)","Statistics:":"Num 
> rows: 197642628 Data size: 59095145772 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Select Operator":{"expressions:":"entry_date (type: 
> date), 
> regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(random_string,
>  '\\b(AS|A|Tours)\\b', 'Destination Services'), '\\b(PPV/3PP)\\b', 
> 'Third Party Package'), '\\b(Flight)\\b', 'Air'), '\\b(Rail)\\b', 'Train'), 
> '\\b(Hotel)\\b', 'Lodging') (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 
> 197642628 Data size: 47434230720 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num 
> rows: 10 Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Reduce Output Operator":{"sort 
> order:":"","Statistics:":"Num rows: 10 Data size: 2400 Basic stats: COMPLETE 
> Column stats: COMPLETE","TopN Hash Memory Usage:":"0.04","value 
> expressions:":"_col0 (type: date), _col1 (type: string)"]},"Reducer 
> 2":{"Execution mode:":"vectorized","Reduce Operator Tree:":{"Select 
> Operator":{"expressions:":"VALUE._col0 (type: date), VALUE._col1 (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
> Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num 
> rows: 10 Data size: 2400 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"Select Operator":{"expressions:":"UDFToString(_col0) 
> (type: string), _col1 (type: 
> string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
> Data size: 3680 Basic stats: COMPLETE Column stats: 
> COMPLETE","children":{"File Output 
> Operator":{"compressed:":"false","Statistics:":"Num rows: 10 Data size: 3680 
> Basic stats: COMPLETE Column stats: COMPLETE","table:":{"input 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test"},"Stage-2":{"Dependency
>  Collection":{}},"Stage-0":{"Move 
> Operator":{"tables:":{"replace:":"true","table:":{"input 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test","Stage-3":{"Stats-Aggr
>  Operator":{
> {noformat}
> {noformat}
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:450)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:305)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor$1.call(ReduceRecordProcessor.java:106)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
> ... 16 more
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> 

[jira] [Commented] (HIVE-12375) ensure hive.compactor.check.interval cannot be set too low

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057794#comment-15057794
 ] 

Hive QA commented on HIVE-12375:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777604/HIVE-12375.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 9871 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-tez_smb_empty.q-transform_ppr2.q-vector_outer_join5.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_decimal_round.q-cbo_windowing.q-tez_schema_evolution.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_with_concurrency
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_insert_into3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_insert_into4
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testNewConnectionConfiguration
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6356/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6356/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6356/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777604 - PreCommit-HIVE-TRUNK-Build

> ensure hive.compactor.check.interval cannot be set too low
> --
>
> Key: HIVE-12375
> URL: https://issues.apache.org/jira/browse/HIVE-12375
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-12375.2.patch, HIVE-12375.3.patch, HIVE-12375.patch
>
>
> hive.compactor.check.interval can currently be set to as low as 0, which 
> makes the Initiator spin needlessly, filling up logs, etc.
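
A minimal sketch of the kind of guard this issue asks for, clamping the configured interval to a floor (the constant and function names here are hypothetical and not taken from the actual patch):

```python
# Hypothetical floor: never let the Initiator poll more often than once a second
MIN_CHECK_INTERVAL_S = 1

def effective_check_interval(configured_s: int) -> int:
    # Values below the floor (including 0) are clamped up to prevent spinning
    return max(configured_s, MIN_CHECK_INTERVAL_S)

assert effective_check_interval(0) == 1      # a zero interval is clamped
assert effective_check_interval(300) == 300  # sane values pass through
```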





[jira] [Updated] (HIVE-12677) StackOverflowError with kryo

2015-12-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-12677:

Description: 
Env: Master branch

{noformat}

explain formatted insert overwrite table default.test  select entry_date,
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(random_string
,"\\b(A\;S|A|Tours)\\b","Destination Services")
,"\\b(PPV/3PP)\\b","Third Party Package")
,"\\b(Flight)\\b","Air")
,"\\b(Rail)\\b","Train")
,"\\b(Hotel)\\b","Lodging") as rn from transactions where 
effective_date between '2015-12-01' AND '2015-12-31' limit 10;

{"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-2":{"DEPENDENT 
STAGES":"Stage-1"},"Stage-0":{"DEPENDENT 
STAGES":"Stage-2"},"Stage-3":{"DEPENDENT STAGES":"Stage-0"}},"STAGE 
PLANS":{"Stage-1":{"Tez":{"Edges:":{"Reducer 2":{"parent":"Map 
1","type":"SIMPLE_EDGE"}},"DagName:":"rajesh_20151215120344_69fa6465-22ed-4fe2-83b5-20782e45d3f7:2","Vertices:":{"Map
 1":{"Map Operator 
Tree:":[{"TableScan":{"alias:":"transactions","filterExpr:":"effective_date 
BETWEEN '2015-12-01' AND '2015-12-31' (type: boolean)","Statistics:":"Num rows: 
197642628 Data size: 59095145772 Basic stats: COMPLETE Column stats: 
COMPLETE","children":{"Select Operator":{"expressions:":"entry_date (type: 
date), 
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(random_string,
 '\\b(AS|A|Tours)\\b', 'Destination Services'), '\\b(PPV/3PP)\\b', 
'Third Party Package'), '\\b(Flight)\\b', 'Air'), '\\b(Rail)\\b', 'Train'), 
'\\b(Hotel)\\b', 'Lodging') (type: 
string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 
197642628 Data size: 47434230720 Basic stats: COMPLETE Column stats: 
COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num rows: 
10 Data size: 2400 Basic stats: COMPLETE Column stats: 
COMPLETE","children":{"Reduce Output Operator":{"sort 
order:":"","Statistics:":"Num rows: 10 Data size: 2400 Basic stats: COMPLETE 
Column stats: COMPLETE","TopN Hash Memory Usage:":"0.04","value 
expressions:":"_col0 (type: date), _col1 (type: string)"]},"Reducer 
2":{"Execution mode:":"vectorized","Reduce Operator Tree:":{"Select 
Operator":{"expressions:":"VALUE._col0 (type: date), VALUE._col1 (type: 
string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
Data size: 2400 Basic stats: COMPLETE Column stats: 
COMPLETE","children":{"Limit":{"Number of rows:":"10","Statistics:":"Num rows: 
10 Data size: 2400 Basic stats: COMPLETE Column stats: 
COMPLETE","children":{"Select Operator":{"expressions:":"UDFToString(_col0) 
(type: string), _col1 (type: 
string)","outputColumnNames:":["_col0","_col1"],"Statistics:":"Num rows: 10 
Data size: 3680 Basic stats: COMPLETE Column stats: COMPLETE","children":{"File 
Output Operator":{"compressed:":"false","Statistics:":"Num rows: 10 Data size: 
3680 Basic stats: COMPLETE Column stats: COMPLETE","table:":{"input 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test"},"Stage-2":{"Dependency
 Collection":{}},"Stage-0":{"Move 
Operator":{"tables:":{"replace:":"true","table:":{"input 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde","name:":"default.test","Stage-3":{"Stats-Aggr
 Operator":{

{noformat}

{noformat}
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:450)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:305)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor$1.call(ReduceRecordProcessor.java:106)
at 
org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:75)
... 16 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
Serialization trace:
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
reducer 

[jira] [Commented] (HIVE-10790) orc write on viewFS throws exception

2015-12-15 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057972#comment-15057972
 ] 

Xiaowei Wang commented on HIVE-10790:
-

[~wisgood]

> orc write on viewFS throws exception
> 
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>  Labels: patch
> Fix For: 2.0.0
>
> Attachments: HIVE-10790.0.patch.txt
>
>
> Inserting from a text table into an ORC table, as in
> {code:sql}
> insert overwrite table custom.rank_less_orc_none 
> partition(logdate='2015051500') 
> select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
> {code}
> throws an error:
> {noformat}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
> getDefaultReplication on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
> ... 8 more
> {noformat}





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058967#comment-15058967
 ] 

Hitesh Shah commented on HIVE-12683:


Additional info such as the query text as well as the explain would be useful. 

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started looking into the Tez query engine. From initial results, we 
> are getting a 30% performance boost over Hive on smaller data sets (1-10 GB), 
> but Hive starts to perform better than Tez as the data size increases. For 
> example, when we run a Hive query with Tez on about 2.3 TB of data, it 
> performs ~20% worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring something 
> properly? Also, I am using an older version of Tez for now; could that be the 
> issue too? I still have to bootstrap the latest version of Tez on EMR and 
> test whether it does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059023#comment-15059023
 ] 

Gopal V commented on HIVE-12683:


Pretty sure all of that math is for 10k RPM disks - SSDs don't exactly follow 
the same rules.

For r3.8xl, my recommendation from measurement was 24 containers of 8 GB each 
for optimum performance.

EMR's default configs for Hive might not be the best for Tez; you might want to 
reconfigure hive-site.xml based on the Ambari default install instead.

Something like your Test 2 might be OOM'ing due to lack of S3 file closures - 
instead of increasing the memory Xmx so high, you might want to turn on the 
scalable partitioned insert from HIVE-6455.

Most of what you describe here isn't necessarily a bug.

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started looking into the Tez query engine. From initial results, we 
> are getting a 30% performance boost over Hive on smaller data sets (1-10 GB), 
> but Hive starts to perform better than Tez as the data size increases. For 
> example, when we run a Hive query with Tez on about 2.3 TB of data, it 
> performs ~20% worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring something 
> properly? Also, I am using an older version of Tez for now; could that be the 
> issue too? I still have to bootstrap the latest version of Tez on EMR and 
> test whether it does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Updated] (HIVE-12442) Refactor/repackage HiveServer2's Thrift code so that it can be used in the tasks

2015-12-15 Thread Rohit Dholakia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Dholakia updated HIVE-12442:
--
Description: 
For implementing HIVE-12427, the tasks will need to have knowledge of thrift 
types from HS2's thrift API. This jira will look at the least invasive way to 
do that.

https://reviews.apache.org/r/41379


  was:For implementing HIVE-12427, the tasks will need to have knowledge of 
thrift types from HS2's thrift API. This jira will look at the least invasive 
way to do that.


> Refactor/repackage HiveServer2's Thrift code so that it can be used in the 
> tasks
> 
>
> Key: HIVE-12442
> URL: https://issues.apache.org/jira/browse/HIVE-12442
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Rohit Dholakia
>
> For implementing HIVE-12427, the tasks will need to have knowledge of thrift 
> types from HS2's thrift API. This jira will look at the least invasive way to 
> do that.
> https://reviews.apache.org/r/41379





[jira] [Commented] (HIVE-9907) insert into table values() when UTF-8 character is not correct

2015-12-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058772#comment-15058772
 ] 

Alan Gates commented on HIVE-9907:
--

I'm confused about what is being reported here.  The initial description seems to 
say that this doesn't work with insert values, but there's a test 
(insert_values_nonascii.q) that explicitly tests this case and passes.  Does 
this just not work on some OS or JVM configurations?  A later comment implies 
this doesn't work with update.

In general the patch looks good, but I'd like to know exactly what we're fixing 
first.  The patch should address insert values, but won't help update.

Niklaus, I've assigned the JIRA to you.  The next step is for you to mark it 
patch available so it can be run through the tests.

> insert into table values()   when UTF-8 character is not correct
> 
>
> Key: HIVE-9907
> URL: https://issues.apache.org/jira/browse/HIVE-9907
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Clients, JDBC
>Affects Versions: 0.14.0, 0.13.1, 1.0.0
> Environment: centos 6   LANG=zh_CN.UTF-8
> hadoop 2.6
> hive 1.1.0
>Reporter: Fanhong Li
>Assignee: niklaus xiao
>Priority: Critical
> Attachments: HIVE-9907.1.patch
>
>
> insert into table test_acid partition(pt='pt_2')
> values( 2, '中文_2' , 'city_2' )
> ;
> hive> select *
> > from test_acid 
> > ;
> OK
> 2 -�_2city_2  pt_2
> Time taken: 0.237 seconds, Fetched: 1 row(s)
> hive> 
> CREATE TABLE test_acid(id INT, 
> name STRING, 
> city STRING) 
> PARTITIONED BY (pt STRING)
> clustered by (id) into 1 buckets
> stored as ORCFILE
> TBLPROPERTIES('transactional'='true')
> ;





[jira] [Commented] (HIVE-11687) TaskExecutorService can reject work even if capacity is available

2015-12-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058841#comment-15058841
 ] 

Prasanth Jayachandran commented on HIVE-11687:
--

[~sseth] Are you still seeing this issue recently? WaitQueueThread gets 
notified as soon as a new element is added to the wait queue, and executor 
threads fill up as new work arrives in the wait queue. My guess is the 
notification is sent out late (probably because of the kill of the evicted 
task, state updates, etc.) while the wait queue fills up and starts rejecting. 
I think we should pull the notification up in the schedule() method, before 
killing the evicted task, inside the same synchronization block. Thoughts?
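
The suggested reordering can be sketched in a few lines; the class and method names below are hypothetical placeholders, not the actual TaskExecutorService code. The point is that the wait-queue thread is signalled as soon as a slot is known to open, inside the same synchronized block, rather than only after the slower kill/cleanup of the evicted task:

```java
// Hypothetical sketch of the proposed fix: signal the wait-queue thread
// first, then do the slow eviction cleanup, all under the same lock.
public class SchedulerSketch {
    private final Object lock = new Object();
    private int freeSlots = 0;

    public void onTaskEvicted(Runnable killEvictedTask) {
        synchronized (lock) {
            freeSlots++;
            lock.notifyAll();            // wake the wait-queue thread first
            if (killEvictedTask != null) {
                killEvictedTask.run();   // then do the slow kill/cleanup work
            }
        }
    }

    public int freeSlots() {
        synchronized (lock) {
            return freeSlots;
        }
    }

    public static void main(String[] args) {
        SchedulerSketch s = new SchedulerSketch();
        s.onTaskEvicted(() -> System.out.println("killed evicted task"));
        System.out.println("free slots: " + s.freeSlots());
    }
}
```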

> TaskExecutorService can reject work even if capacity is available
> -
>
> Key: HIVE-11687
> URL: https://issues.apache.org/jira/browse/HIVE-11687
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: llap
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Fix For: llap
>
>
> The waitQueue has a fixed capacity - which is the wait queue size. Addition 
> of new work does not factor in the capacity available to execute work. This 
> ends up being left to the race between work getting scheduled for execution 
> and added to the waitQueue.
> cc [~prasanth_j]





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058980#comment-15058980
 ] 

Gopal V commented on HIVE-12683:


From your blog - are you using 59GB Tez containers?

set hive.tez.container.size=59205;

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing tez query engine. From initial results, 
> we are getting 30% performance boost over Hive on smaller data set(1-10 GB) 
> but Hive starts to perform better than Tez as data size increases. Like when 
> we run a hive query with Tez on about 2.3 TB worth of data, it performs worse 
> than hive alone.(~20% less performance) Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is it normal or I am missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap latest version of Tez on EMR and 
> test it and see if that could do any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread rohit garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058975#comment-15058975
 ] 

rohit garg commented on HIVE-12683:
---

ohh yes the query is there

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing tez query engine. From initial results, 
> we are getting 30% performance boost over Hive on smaller data set(1-10 GB) 
> but Hive starts to perform better than Tez as data size increases. Like when 
> we run a hive query with Tez on about 2.3 TB worth of data, it performs worse 
> than hive alone.(~20% less performance) Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is it normal or I am missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap latest version of Tez on EMR and 
> test it and see if that could do any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12431) Support timeout for compile lock

2015-12-15 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058850#comment-15058850
 ] 

Mohit Sabharwal commented on HIVE-12431:


[~sershe] / [~szehon], could either of you please help commit this patch? 
Thanks!

> Support timeout for compile lock
> 
>
> Key: HIVE-12431
> URL: https://issues.apache.org/jira/browse/HIVE-12431
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Query Processor
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Mohit Sabharwal
> Attachments: HIVE-12431.1.patch, HIVE-12431.2.patch, 
> HIVE-12431.3.patch, HIVE-12431.3.patch, HIVE-12431.patch
>
>
> To help with HiveServer2 scalability, it would be useful to allow users to 
> configure a timeout value for queries waiting to be compiled. If the timeout 
> value is reached then the query would abort. One option to achieve this would 
> be to update the compile lock to use a try-lock with the timeout value.
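
The try-lock idea from the description can be sketched as follows; `acquireCompileLock` and the static lock field are hypothetical names for illustration, not HiveServer2's actual implementation:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: tryLock() with a timeout lets a query give up on compilation
// after a bounded wait instead of queueing indefinitely behind the lock.
public class CompileLockSketch {
    private static final ReentrantLock compileLock = new ReentrantLock();

    // Returns true if the lock was acquired within timeoutMs; false means
    // the caller should abort the query rather than keep waiting.
    public static boolean acquireCompileLock(long timeoutMs) {
        try {
            return compileLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void releaseCompileLock() {
        compileLock.unlock();
    }

    public static void main(String[] args) {
        if (acquireCompileLock(1000)) {
            try {
                System.out.println("compiling query");
            } finally {
                releaseCompileLock();
            }
        } else {
            System.out.println("compile lock timeout, aborting query");
        }
    }
}
```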





[jira] [Moved] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-3002 to HIVE-12683:
-

Key: HIVE-12683  (was: TEZ-3002)
Project: Hive  (was: Apache Tez)

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing tez query engine. From initial results, 
> we are getting 30% performance boost over Hive on smaller data set(1-10 GB) 
> but Hive starts to perform better than Tez as data size increases. Like when 
> we run a hive query with Tez on about 2.3 TB worth of data, it performs worse 
> than hive alone.(~20% less performance) Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is it normal or I am missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap latest version of Tez on EMR and 
> test it and see if that could do any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058963#comment-15058963
 ] 

Hitesh Shah commented on HIVE-12683:


\cc [~hagleitn] [~gopalv]

[~rohitgarg1989] Can you attach the YARN application logs for the query that 
was slow?

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing tez query engine. From initial results, 
> we are getting 30% performance boost over Hive on smaller data set(1-10 GB) 
> but Hive starts to perform better than Tez as data size increases. Like when 
> we run a hive query with Tez on about 2.3 TB worth of data, it performs worse 
> than hive alone.(~20% less performance) Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is it normal or I am missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap latest version of Tez on EMR and 
> test it and see if that could do any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12682) Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay

2015-12-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058985#comment-15058985
 ] 

Prasanth Jayachandran commented on HIVE-12682:
--

I don't think we need the task id for the sorted dynamic partition 
optimization. Since sorted dynamic partition already has the bucket number in 
the key, we can just pass the "00_0" string to the replace function along 
with the bucket number.
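
A minimal illustration of the idea, with hypothetical names (this is not Hive's actual FileSinkOperator code): the bucket file name is derived from a constant zero-padded template plus the bucket number already carried in the sort key, rather than reading the task id from the Configuration for every row (the getOverlay hot spot in the profile). Hive bucket files conventionally use a six-digit task id such as "000003_0":

```java
// Sketch only: substitute the bucket number into a fixed task-id template.
public class BucketFileNameSketch {
    // Zero-pad the bucket id to the usual six-digit width, attempt 0.
    public static String bucketFileName(int bucketId) {
        return String.format("%06d_0", bucketId);
    }

    public static void main(String[] args) {
        System.out.println(bucketFileName(3));
    }
}
```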

> Reducers in dynamic partitioning job spend a lot of time running 
> hadoop.conf.Configuration.getOverlay
> -
>
> Key: HIVE-12682
> URL: https://issues.apache.org/jira/browse/HIVE-12682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Carter Shanklin
>Assignee: Gopal V
> Attachments: reducer.png
>
>
> I tested this on Hive 1.2.1 but looks like it's still applicable to 2.0.
> I ran this query:
> {code}
> create table flights (
> …
> )
> PARTITIONED BY (Year int)
> CLUSTERED BY (Month)
> SORTED BY (DayofMonth) into 12 buckets
> STORED AS ORC
> TBLPROPERTIES("orc.bloom.filter.columns"="*")
> ;
> {code}
> (Taken from here: 
> https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)
> I profiled just the reduce phase and noticed something odd, the attached 
> graph shows where time was spent during the reducer phase.
> !reducer.png!
> Problem seems to relate to 
> https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903
> /cc [~gopalv]





[jira] [Commented] (HIVE-12172) LLAP: Cache metrics are incorrect

2015-12-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058735#comment-15058735
 ] 

Sergey Shelukhin commented on HIVE-12172:
-

HIVE-12591?

> LLAP: Cache metrics are incorrect
> -
>
> Key: HIVE-12172
> URL: https://issues.apache.org/jira/browse/HIVE-12172
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
>
> CacheCapacityUsed goes negative, CacheCapacityRemaining is a lot higher than 
> the original cache size.
> This was while using the LRFU cache policy.
> Reproduces when running a query which reads far too much data to store in 
> cache.
> {code}
> name: "Hadoop:service=LlapDaemon,name=LlapDaemonCacheMetrics-",
> tag.ProcessName: "LlapDaemon",
> tag.SessionId: "37c590b9-cccd-4b81-8b5f-36b3659c3454",
> CacheCapacityRemaining: 15958381947264,
> CacheCapacityTotal: 10737418240,
> CacheCapacityUsed: -15947644529024,
> CacheReadRequests: 5837,
> CacheRequestedBytes: 28996627387,
> CacheHitBytes: 8266607002,
> CacheAllocatedArena: 79,
> CacheNumLockedBuffers: 315,
> CacheHitRatio: 0.28508857
> {code}





[jira] [Commented] (HIVE-12528) don't start HS2 Tez sessions in a single thread

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058783#comment-15058783
 ] 

Hive QA commented on HIVE-12528:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777631/HIVE-12528.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9885 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-script_pipe.q-insert_values_non_partitioned.q-subquery_in.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testReturn
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testSessionPoolGetInOrder
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6359/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6359/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6359/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777631 - PreCommit-HIVE-TRUNK-Build

> don't start HS2 Tez sessions in a single thread
> ---
>
> Key: HIVE-12528
> URL: https://issues.apache.org/jira/browse/HIVE-12528
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12528.01.patch, HIVE-12528.patch
>
>
> Starting sessions in parallel would improve the startup time.
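
The parallel start can be sketched with a plain thread pool; `startAll` and the inlined stand-in for session startup are hypothetical, not the actual TezSessionPool code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: submit every session start to a thread pool and wait for all
// of them, instead of opening each Tez session serially in one thread.
public class ParallelSessionStartSketch {
    public static int startAll(int sessionCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < sessionCount; i++) {
                final int id = i;
                // The lambda stands in for the real session open() call.
                futures.add(pool.submit(() -> id));
            }
            int started = 0;
            for (Future<Integer> f : futures) {
                f.get();          // propagate any startup failure
                started++;
            }
            return started;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sessions started: " + startAll(8, 4));
    }
}
```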





[jira] [Assigned] (HIVE-12682) Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay

2015-12-15 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-12682:
--

Assignee: Gopal V

> Reducers in dynamic partitioning job spend a lot of time running 
> hadoop.conf.Configuration.getOverlay
> -
>
> Key: HIVE-12682
> URL: https://issues.apache.org/jira/browse/HIVE-12682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Carter Shanklin
>Assignee: Gopal V
> Attachments: reducer.png
>
>
> I tested this on Hive 1.2.1 but looks like it's still applicable to 2.0.
> I ran this query:
> {code}
> create table flights (
> …
> )
> PARTITIONED BY (Year int)
> CLUSTERED BY (Month)
> SORTED BY (DayofMonth) into 12 buckets
> STORED AS ORC
> TBLPROPERTIES("orc.bloom.filter.columns"="*")
> ;
> {code}
> (Taken from here: 
> https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)
> I profiled just the reduce phase and noticed something odd, the attached 
> graph shows where time was spent during the reducer phase.
> Problem seems to relate to 
> https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903
> /cc [~gopalv]





[jira] [Updated] (HIVE-12682) Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay

2015-12-15 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12682:
---
Description: 
I tested this on Hive 1.2.1 but looks like it's still applicable to 2.0.

I ran this query:
{code}
create table flights (
…
)
PARTITIONED BY (Year int)
CLUSTERED BY (Month)
SORTED BY (DayofMonth) into 12 buckets
STORED AS ORC
TBLPROPERTIES("orc.bloom.filter.columns"="*")
;
{code}

(Taken from here: 
https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)

I profiled just the reduce phase and noticed something odd, the attached graph 
shows where time was spent during the reducer phase.

!reducer.png!

Problem seems to relate to 
https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903

/cc [~gopalv]

  was:
I tested this on Hive 1.2.1 but looks like it's still applicable to 2.0.

I ran this query:
{code}
create table flights (
…
)
PARTITIONED BY (Year int)
CLUSTERED BY (Month)
SORTED BY (DayofMonth) into 12 buckets
STORED AS ORC
TBLPROPERTIES("orc.bloom.filter.columns"="*")
;
{code}

!reducer.png!

(Taken from here: 
https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)

I profiled just the reduce phase and noticed something odd, the attached graph 
shows where time was spent during the reducer phase.

Problem seems to relate to 
https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903

/cc [~gopalv]


> Reducers in dynamic partitioning job spend a lot of time running 
> hadoop.conf.Configuration.getOverlay
> -
>
> Key: HIVE-12682
> URL: https://issues.apache.org/jira/browse/HIVE-12682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Carter Shanklin
>Assignee: Gopal V
> Attachments: reducer.png
>
>
> I tested this on Hive 1.2.1 but looks like it's still applicable to 2.0.
> I ran this query:
> {code}
> create table flights (
> …
> )
> PARTITIONED BY (Year int)
> CLUSTERED BY (Month)
> SORTED BY (DayofMonth) into 12 buckets
> STORED AS ORC
> TBLPROPERTIES("orc.bloom.filter.columns"="*")
> ;
> {code}
> (Taken from here: 
> https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)
> I profiled just the reduce phase and noticed something odd, the attached 
> graph shows where time was spent during the reducer phase.
> !reducer.png!
> Problem seems to relate to 
> https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903
> /cc [~gopalv]





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread rohit garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058972#comment-15058972
 ] 

rohit garg commented on HIVE-12683:
---

Thanks for the reply Hitesh. I didn't save the logs, but I will re-launch the 
EMR cluster, re-run the application, and provide you with logs ASAP.

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing tez query engine. From initial results, 
> we are getting 30% performance boost over Hive on smaller data set(1-10 GB) 
> but Hive starts to perform better than Tez as data size increases. Like when 
> we run a hive query with Tez on about 2.3 TB worth of data, it performs worse 
> than hive alone.(~20% less performance) Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is it normal or I am missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap latest version of Tez on EMR and 
> test it and see if that could do any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058968#comment-15058968
 ] 

Hitesh Shah commented on HIVE-12683:


Nevermind - just noticed that the query text is on the blog. 

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing tez query engine. From initial results, 
> we are getting 30% performance boost over Hive on smaller data set(1-10 GB) 
> but Hive starts to perform better than Tez as data size increases. Like when 
> we run a hive query with Tez on about 2.3 TB worth of data, it performs worse 
> than hive alone.(~20% less performance) Details are in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is it normal or I am missing some property / not configuring some property 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap latest version of Tez on EMR and 
> test it and see if that could do any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-5025) Column aliases for input argument of GenericUDFs

2015-12-15 Thread Nick Karpov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059003#comment-15059003
 ] 

Nick Karpov commented on HIVE-5025:
---

Was there any discussion about getting this into an official release?

> Column aliases for input argument of GenericUDFs 
> -
>
> Key: HIVE-5025
> URL: https://issues.apache.org/jira/browse/HIVE-5025
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D12093.2.patch, D12093.3.patch, HIVE-5025.4.patch.txt, 
> HIVE-5025.D12093.1.patch
>
>
> In some cases, column aliases for input argument are very useful to know. But 
> I cannot sure of this in the sense that UDFs should not be dependent to 
> contextual information like column alias.





[jira] [Commented] (HIVE-12675) PerfLogger should log performance metrics at debug level

2015-12-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058825#comment-15058825
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-12675:
--

[~sershe] It would be possible to do the following from the user side if he/she 
needs the performance metrics:
{code}
log4j.logger.org.apache.hadoop.hive.ql.log.PerfLogger=DEBUG
{code}

This way you can make sure that we get DEBUG level logs only for PerfLogger and 
INFO level logs for others (assuming that hive.root.logger=INFO). I will update 
the documentation in 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties and 
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-HiveLogging
 so that the user is aware of this. 

Thanks
Hari

> PerfLogger should log performance metrics at debug level
> 
>
> Key: HIVE-12675
> URL: https://issues.apache.org/jira/browse/HIVE-12675
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12675.1.patch
>
>
> As more and more subcomponents of Hive (Tez, Optimizer) etc are using 
> PerfLogger to track the performance metrics, it will be more meaningful to 
> set the PerfLogger logging level to DEBUG. Otherwise, we will print the 
> performance metrics unnecessarily for each and every query if the underlying 
> subcomponent does not control the PerfLogging via a parameter on its own.





[jira] [Commented] (HIVE-12667) Proper fix for HIVE-12473

2015-12-15 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058824#comment-15058824
 ] 

Gunther Hagleitner commented on HIVE-12667:
---

In dynamic_partition_pruning_2.q it's actually an int. All other tests have 
defined the part col as string. String to string doesn't need a special code 
path, is that what you're suggesting?

> Proper fix for HIVE-12473
> -
>
> Key: HIVE-12667
> URL: https://issues.apache.org/jira/browse/HIVE-12667
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-12667.1.patch
>
>
> HIVE-12473 has added an incorrect comment and also lacks a test case.
> Benefits of this fix:
>* Does not say: "Probably doesn't work"
>* Does not use grammar like "subquery columns and such"
>* Adds test cases, that let you verify the fix
>* Doesn't rely on certain structure of key expr, just takes the type at 
> compile time
>* Doesn't require an additional walk of each key expression
>* Shows the type used in explain





[jira] [Updated] (HIVE-11107) Support for Performance regression test suite with TPCDS

2015-12-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11107:
-
Attachment: HIVE-11107.8.patch

rebased the original patch with master before commit

> Support for Performance regression test suite with TPCDS
> 
>
> Key: HIVE-11107
> URL: https://issues.apache.org/jira/browse/HIVE-11107
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, 
> HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, 
> HIVE-11107.6.patch, HIVE-11107.7.patch, HIVE-11107.8.patch
>
>
> Support to add TPCDS queries to the performance regression test suite with 
> Hive CBO turned on.
> This benchmark is intended to make sure that subsequent changes to the 
> optimizer or any hive code do not yield any unexpected plan changes. i.e.  
> the intention is to not run the entire TPCDS query set, but just "explain 
> plan" for the TPCDS queries.
> As part of this jira, we will manually verify that expected hive 
> optimizations kick in for the queries (for given stats/dataset). If there is 
> a difference in plan within this test suite due to a future commit, it needs 
> to be analyzed and we need to make sure that it is not a regression.
> The test suite can be run in master branch from itests by 
> {code}
> mvn test -Dtest=TestPerfCliDriver 
> {code}





[jira] [Commented] (HIVE-12172) LLAP: Cache metrics are incorrect

2015-12-15 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058723#comment-15058723
 ] 

Siddharth Seth commented on HIVE-12172:
---

[~prasanth_j] - is this fixed now via a different jira ? Vaguely remember a 
similar jira being closed recently.

> LLAP: Cache metrics are incorrect
> -
>
> Key: HIVE-12172
> URL: https://issues.apache.org/jira/browse/HIVE-12172
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
>
> CacheCapacityUsed goes negative, CacheCapacityRemaining is a lot higher than 
> the original cache size.
> This was while using the LRFU cache policy.
> Reproduces when running a query which reads far too much data to store in 
> cache.
> {code}
> name: "Hadoop:service=LlapDaemon,name=LlapDaemonCacheMetrics-",
> tag.ProcessName: "LlapDaemon",
> tag.SessionId: "37c590b9-cccd-4b81-8b5f-36b3659c3454",
> CacheCapacityRemaining: 15958381947264,
> CacheCapacityTotal: 10737418240,
> CacheCapacityUsed: -15947644529024,
> CacheReadRequests: 5837,
> CacheRequestedBytes: 28996627387,
> CacheHitBytes: 8266607002,
> CacheAllocatedArena: 79,
> CacheNumLockedBuffers: 315,
> CacheHitRatio: 0.28508857
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11107) Support for Performance regression test suite with TPCDS

2015-12-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058933#comment-15058933
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11107:
--

HIVE-12681 is the follow-up jira to address the above comments from Ashutosh.

> Support for Performance regression test suite with TPCDS
> 
>
> Key: HIVE-11107
> URL: https://issues.apache.org/jira/browse/HIVE-11107
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, 
> HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, 
> HIVE-11107.6.patch, HIVE-11107.7.patch, HIVE-11107.8.patch
>
>
> Support to add TPCDS queries to the performance regression test suite with 
> Hive CBO turned on.
> This benchmark is intended to make sure that subsequent changes to the 
> optimizer or any hive code do not yield any unexpected plan changes. i.e.  
> the intention is to not run the entire TPCDS query set, but just "explain 
> plan" for the TPCDS queries.
> As part of this jira, we will manually verify that expected hive 
> optimizations kick in for the queries (for given stats/dataset). If there is 
> a difference in plan within this test suite due to a future commit, it needs 
> to be analyzed and we need to make sure that it is not a regression.
> The test suite can be run in master branch from itests by 
> {code}
> mvn test -Dtest=TestPerfCliDriver 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread rohit garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058999#comment-15058999
 ] 

rohit garg commented on HIVE-12683:
---

I tried a lot of different settings. This was one of them, and it worked. It 
came from the formulas below. Not sure if that's the right way to estimate it.

We used the following formulae to guide us in determining YARN and MapReduce 
memory configurations:

Number of containers =  min (2 * cores, 1.8 * disks, (Total available RAM) / 
min_container_size)
Reserved Memory = Memory for stack memory
Total available RAM = Total RAM of the cluster – Reserved Memory
Disks = Number of data disks per machine
min_container_size = Minimum container size (in RAM). Its value is dependent on 
RAM available
RAM-per-container = max(min_container_size, (Total Available RAM) / containers)

For example, for our cluster, we had 32 CPU cores, 244 GB RAM, and 2 disks per 
node.

Reserved Memory = 38 GB
Container Size = 2 GB
Available RAM = (244-38) GB = 206 GB
Number of containers = min (2*32, 1.8*2, 206/2) = min (64, 3.6, 103) = ~4
RAM-per-container = max (2, 206/4) = max (2, 51.5) = ~52 GB
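The formulas and the worked example above can be sketched as a small helper. 
This is only an illustration of the arithmetic in this comment; the function 
name and the rounding behavior are my assumptions (the original only shows 
hand-rounded values), not part of any Hive or YARN API:

```python
# Sketch of the container-sizing heuristic quoted above.
# Rounding to the nearest whole container is an assumption.

def size_containers(cores, disks, total_ram_gb, reserved_gb, min_container_gb):
    """Return (container count, RAM per container in GB)."""
    available = total_ram_gb - reserved_gb
    containers = min(2 * cores, 1.8 * disks, available / min_container_gb)
    containers = max(1, round(containers))
    ram_per_container = max(min_container_gb, available / containers)
    return containers, ram_per_container

# The cluster from this comment: 32 cores, 2 disks, 244 GB RAM, 38 GB reserved,
# 2 GB minimum container size.
print(size_containers(32, 2, 244, 38, 2))
```

For the example cluster this yields 4 containers at 51.5 GB each, matching the 
hand calculation above.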



> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the Tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as data size 
> increases. For example, when we run a Hive query with Tez on about 2.3 TB of 
> data, it performs worse than Hive alone (~20% less performance). Details are 
> in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring something 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test it to see if it does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12534) Date functions with vectorization is returning wrong results

2015-12-15 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059090#comment-15059090
 ] 

Matt McCline commented on HIVE-12534:
-

There are a bunch of tests covering vectorization of the year and month 
functions in vectorized_date_funcs.q, and they pass with 
hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=false as well 
as with it set to true.

(Tests run on master.)

I need a repro.

> Date functions with vectorization is returning wrong results
> 
>
> Key: HIVE-12534
> URL: https://issues.apache.org/jira/browse/HIVE-12534
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
>Priority: Critical
> Attachments: p26_explain.txt, plan.txt
>
>
> {noformat}
> select c.effective_date, year(c.effective_date), month(c.effective_date) from 
> customers c where c.customer_id = 146028;
> hive> set hive.vectorized.execution.enabled=true;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  0   0
> hive> set hive.vectorized.execution.enabled=false;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  201511
> {noformat}
> \cc [~gopalv], [~sseth], [~sershe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12658) Task rejection by an llap daemon spams the log with RejectedExecutionExceptions

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059143#comment-15059143
 ] 

Hive QA commented on HIVE-12658:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777816/HIVE-12658.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 9886 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.llap.daemon.impl.TestLlapDaemonProtocolServerImpl.test
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6360/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6360/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6360/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777816 - PreCommit-HIVE-TRUNK-Build

> Task rejection by an llap daemon spams the log with 
> RejectedExecutionExceptions
> ---
>
> Key: HIVE-12658
> URL: https://issues.apache.org/jira/browse/HIVE-12658
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12658.1.patch, HIVE-12658.2.patch
>
>
> The execution queue throws a RejectedExecutionException - which is logged by 
> the hadoop IPC layer.
> Instead of relying on an Exception in the protocol - move to sending back an 
> explicit response to indicate a rejected fragment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12577) NPE in LlapTaskCommunicator when unregistering containers

2015-12-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12577:
--
Attachment: HIVE-12577.2.txt

> NPE in LlapTaskCommunicator when unregistering containers
> -
>
> Key: HIVE-12577
> URL: https://issues.apache.org/jira/browse/HIVE-12577
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: HIVE-12577.1.review.txt, HIVE-12577.1.txt, 
> HIVE-12577.1.wip.txt, HIVE-12577.2.review.txt, HIVE-12577.2.txt
>
>
> {code}
> 2015-12-02 13:29:00,160 [ERROR] [Dispatcher thread {Central}] 
> |common.AsyncDispatcher|: Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$EntityTracker.unregisterContainer(LlapTaskCommunicator.java:586)
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.registerContainerEnd(LlapTaskCommunicator.java:188)
> at 
> org.apache.tez.dag.app.TaskCommunicatorManager.unregisterRunningContainer(TaskCommunicatorManager.java:389)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.unregisterFromTAListener(AMContainerImpl.java:1121)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtLaunchingTransition.transition(AMContainerImpl.java:699)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtIdleTransition.transition(AMContainerImpl.java:805)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:892)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:887)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:415)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:72)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:60)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:36)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> at java.lang.Thread.run(Thread.java:745)
> 2015-12-02 13:29:00,167 [ERROR] [Dispatcher thread {Central}] 
> |common.AsyncDispatcher|: Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.tez.dag.app.TaskCommunicatorManager.unregisterRunningContainer(TaskCommunicatorManager.java:386)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.unregisterFromTAListener(AMContainerImpl.java:1121)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtLaunchingTransition.transition(AMContainerImpl.java:699)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtIdleTransition.transition(AMContainerImpl.java:805)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:892)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:887)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:415)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:72)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:60)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:36)
> at 
> 

[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059037#comment-15059037
 ] 

Hitesh Shah commented on HIVE-12683:


For SSDs, you should be able to run a few more containers per node. Maybe try 
with, say, 8 containers sized to 20 GB each (Xmx 16G) as a start.

Also, you may want to try the large group-by query with "hive.map.aggr" set to 
false to help with the OOMs. 


 

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started to look into testing the Tez query engine. From initial 
> results, we are getting a 30% performance boost over Hive on smaller data 
> sets (1-10 GB), but Hive starts to perform better than Tez as data size 
> increases. For example, when we run a Hive query with Tez on about 2.3 TB of 
> data, it performs worse than Hive alone (~20% less performance). Details are 
> in the post below.
> On a cluster with 1.3 TB RAM, I set the following property :
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts =-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property / not configuring something 
> properly? Also, I am using an older version of Tez as of now. Could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test it to see if it does any better.
> Thought of asking here too
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12516) HS2 log4j printing query about to be compiled

2015-12-15 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12516:

Assignee: (was: Vaibhav Gumashta)

> HS2 log4j printing query about to be compiled 
> --
>
> Key: HIVE-12516
> URL: https://issues.apache.org/jira/browse/HIVE-12516
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Richard Walshe
>
> Requesting that HiveServer2 use log4j to print out the query about to be 
> compiled, to help identify badly written queries that take a long time to 
> compile and cause good queries to be queued.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12516) HS2 log4j printing query about to be compiled

2015-12-15 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059095#comment-15059095
 ] 

Vaibhav Gumashta commented on HIVE-12516:
-

[~ctang.ma] Not actively working on it right now. Let me un-assign it so that 
if someone else wants to take it up, they can.

> HS2 log4j printing query about to be compiled 
> --
>
> Key: HIVE-12516
> URL: https://issues.apache.org/jira/browse/HIVE-12516
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Richard Walshe
>Assignee: Vaibhav Gumashta
>
> Requesting that HiveServer2 use log4j to print out the query about to be 
> compiled, to help identify badly written queries that take a long time to 
> compile and cause good queries to be queued.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12577) NPE in LlapTaskCommunicator when unregistering containers

2015-12-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12577:
--
Attachment: HIVE-12577.2.review.txt

Patch for review.

Addresses the comments, except for getContext() being used twice; that is how 
it is used throughout the file, and the call is cheap.

Also adds some additional tracking for future debugging of timeouts that have 
been seen at times, which are related to this tracking and to nodes sending 
in heartbeats.

> NPE in LlapTaskCommunicator when unregistering containers
> -
>
> Key: HIVE-12577
> URL: https://issues.apache.org/jira/browse/HIVE-12577
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: HIVE-12577.1.review.txt, HIVE-12577.1.txt, 
> HIVE-12577.1.wip.txt, HIVE-12577.2.review.txt
>
>
> {code}
> 2015-12-02 13:29:00,160 [ERROR] [Dispatcher thread {Central}] 
> |common.AsyncDispatcher|: Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$EntityTracker.unregisterContainer(LlapTaskCommunicator.java:586)
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.registerContainerEnd(LlapTaskCommunicator.java:188)
> at 
> org.apache.tez.dag.app.TaskCommunicatorManager.unregisterRunningContainer(TaskCommunicatorManager.java:389)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.unregisterFromTAListener(AMContainerImpl.java:1121)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtLaunchingTransition.transition(AMContainerImpl.java:699)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtIdleTransition.transition(AMContainerImpl.java:805)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:892)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:887)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:415)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:72)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:60)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:36)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> at java.lang.Thread.run(Thread.java:745)
> 2015-12-02 13:29:00,167 [ERROR] [Dispatcher thread {Central}] 
> |common.AsyncDispatcher|: Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.tez.dag.app.TaskCommunicatorManager.unregisterRunningContainer(TaskCommunicatorManager.java:386)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.unregisterFromTAListener(AMContainerImpl.java:1121)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtLaunchingTransition.transition(AMContainerImpl.java:699)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtIdleTransition.transition(AMContainerImpl.java:805)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:892)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$StopRequestAtRunningTransition.transition(AMContainerImpl.java:887)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:415)
> at 
> 

[jira] [Commented] (HIVE-12516) HS2 log4j printing query about to be compiled

2015-12-15 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059085#comment-15059085
 ] 

Chaoyu Tang commented on HIVE-12516:


[~vgumashta] Would you like to provide a patch to this? Thanks

> HS2 log4j printing query about to be compiled 
> --
>
> Key: HIVE-12516
> URL: https://issues.apache.org/jira/browse/HIVE-12516
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Richard Walshe
>Assignee: Vaibhav Gumashta
>
> Requesting that HiveServer2 use log4j to print out the query about to be 
> compiled, to help identify badly written queries that take a long time to 
> compile and cause good queries to be queued.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12684) NPE in stats annotation when all values in decimal column are NULLs

2015-12-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12684:
-
Attachment: HIVE-12684.1.patch

> NPE in stats annotation when all values in decimal column are NULLs
> ---
>
> Key: HIVE-12684
> URL: https://issues.apache.org/jira/browse/HIVE-12684
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12684.1.patch
>
>
> When all column values are null for a decimal column and column stats 
> exist, the AnnotateWithStatistics optimization can throw an NPE. Following 
> is the exception trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:712)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:764)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:750)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:197)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:143)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:131)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:114)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:228)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10156)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:225)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12684) NPE in stats annotation when all values in decimal column are NULLs

2015-12-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059161#comment-15059161
 ] 

Prasanth Jayachandran commented on HIVE-12684:
--

[~ashutoshc]/[~pxiong] Can someone please review this patch?

> NPE in stats annotation when all values in decimal column are NULLs
> ---
>
> Key: HIVE-12684
> URL: https://issues.apache.org/jira/browse/HIVE-12684
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12684.1.patch
>
>
> When all column values are null for a decimal column and column stats 
> exist, the AnnotateWithStatistics optimization can throw an NPE. Following 
> is the exception trace:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:712)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:764)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:750)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:197)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:143)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:131)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:114)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:228)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10156)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:225)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml

2015-12-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12685:
-
Attachment: HIVE-12685.1.patch

[~daijy] Can you take a look?

> Remove invalid property in common/src/test/resources/hive-site.xml
> --
>
> Key: HIVE-12685
> URL: https://issues.apache.org/jira/browse/HIVE-12685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12685.1.patch
>
>
> Currently there's such a property as below, which is obviously wrong
> {code}
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>hive-site.xml</value>
>   <description>Override ConfVar defined in HiveConf</description>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12534) Date functions with vectorization is returning wrong results

2015-12-15 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059092#comment-15059092
 ] 

Matt McCline commented on HIVE-12534:
-

Assigning to [~rajesh.balamohan]

> Date functions with vectorization is returning wrong results
> 
>
> Key: HIVE-12534
> URL: https://issues.apache.org/jira/browse/HIVE-12534
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
>Priority: Critical
> Attachments: p26_explain.txt, plan.txt
>
>
> {noformat}
> select c.effective_date, year(c.effective_date), month(c.effective_date) from 
> customers c where c.customer_id = 146028;
> hive> set hive.vectorized.execution.enabled=true;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  0   0
> hive> set hive.vectorized.execution.enabled=false;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  201511
> {noformat}
> \cc [~gopalv], [~sseth], [~sershe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12534) Date functions with vectorization is returning wrong results

2015-12-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12534:

Assignee: Rajesh Balamohan  (was: Matt McCline)

> Date functions with vectorization is returning wrong results
> 
>
> Key: HIVE-12534
> URL: https://issues.apache.org/jira/browse/HIVE-12534
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Critical
> Attachments: p26_explain.txt, plan.txt
>
>
> {noformat}
> select c.effective_date, year(c.effective_date), month(c.effective_date) from 
> customers c where c.customer_id = 146028;
> hive> set hive.vectorized.execution.enabled=true;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  0   0
> hive> set hive.vectorized.execution.enabled=false;
> hive> select c.effective_date, year(c.effective_date), 
> month(c.effective_date) from customers c where c.customer_id = 146028;
> 2015-11-19  201511
> {noformat}
> \cc [~gopalv], [~sseth], [~sershe]





[jira] [Commented] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml

2015-12-15 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059226#comment-15059226
 ] 

Wei Zheng commented on HIVE-12685:
--

This issue is exposed by HIVE-12628, which included hive-common as a ql 
dependency.

> Remove invalid property in common/src/test/resources/hive-site.xml
> --
>
> Key: HIVE-12685
> URL: https://issues.apache.org/jira/browse/HIVE-12685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>
> Currently there is a property as below, which is obviously wrong:
> {code}
> 
>   javax.jdo.option.ConnectionDriverName
>   hive-site.xml
>   Override ConfVar defined in HiveConf
> 
> {code}





[jira] [Updated] (HIVE-12075) add analyze command to explictly cache file metadata in HBase metastore

2015-12-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12075:

Attachment: (was: HIVE-12075.04.patch)

> add analyze command to explictly cache file metadata in HBase metastore
> ---
>
> Key: HIVE-12075
> URL: https://issues.apache.org/jira/browse/HIVE-12075
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12075.01.nogen.patch, HIVE-12075.01.patch, 
> HIVE-12075.02.patch, HIVE-12075.03.patch, HIVE-12075.04.patch, 
> HIVE-12075.nogen.patch, HIVE-12075.patch
>
>
> ANALYZE TABLE (spec as usual) CACHE METADATA





[jira] [Commented] (HIVE-12669) Need a way to analyze tables in the background

2015-12-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059464#comment-15059464
 ] 

Sergey Shelukhin commented on HIVE-12669:
-

Sure, that makes sense. I guess both analyzers can use that. 12075 adds the 
thread pool IIRC; there's no code yet to populate it in the background.

> Need a way to analyze tables in the background
> --
>
> Key: HIVE-12669
> URL: https://issues.apache.org/jira/browse/HIVE-12669
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> Currently analyze must be run by users manually.  It would be useful to have 
> an option for certain or all tables to be automatically analyzed on a regular 
> basis.  The system can do this in the background as a metastore thread 
> (similar to the compactor threads).
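A background metastore thread along these lines could be sketched with a scheduled executor. This is a minimal illustration only; `BackgroundAnalyzer` and `analyzeTable` are hypothetical stand-ins, not Hive's actual classes or the eventual implementation:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a background metastore thread that periodically analyzes
// tables (analogous to the compactor threads). analyzeTable is a
// hypothetical stand-in for issuing ANALYZE against the metastore.
public class BackgroundAnalyzer {
    static final AtomicInteger runs = new AtomicInteger();

    static void analyzeTable(String db, String table) {
        runs.incrementAndGet();
        System.out.println("analyzing " + db + "." + table);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // Fire immediately, then re-run 50 ms after each completion.
        pool.scheduleWithFixedDelay(
                () -> analyzeTable("default", "sales"), 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(300);   // let it fire a few times
        pool.shutdownNow();
    }
}
```

A real implementation would pick candidate tables from metastore state (e.g. those with stale or missing stats) instead of a hard-coded name.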





[jira] [Updated] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-12688:
-
Priority: Blocker  (was: Major)

> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a Hadoop property, which controls which 
> users the hive user can impersonate. What this change does is also use it to 
> restrict who can connect to the metastore server. This is new functionality, 
> not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.





[jira] [Updated] (HIVE-11775) Implement limit push down through union all in CBO

2015-12-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11775:
---
Attachment: (was: HIVE-11775.09.patch)

> Implement limit push down through union all in CBO
> --
>
> Key: HIVE-11775
> URL: https://issues.apache.org/jira/browse/HIVE-11775
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11775.01.patch, HIVE-11775.02.patch, 
> HIVE-11775.03.patch, HIVE-11775.04.patch, HIVE-11775.05.patch, 
> HIVE-11775.06.patch, HIVE-11775.07.patch, HIVE-11775.08.patch, 
> HIVE-11775.09.patch
>
>
> Enlightened by HIVE-11684 (Kudos to [~jcamachorodriguez]), we can actually 
> push limit down through union all, which reduces the intermediate number of 
> rows in union branches. 





[jira] [Updated] (HIVE-11775) Implement limit push down through union all in CBO

2015-12-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11775:
---
Attachment: HIVE-11775.09.patch

> Implement limit push down through union all in CBO
> --
>
> Key: HIVE-11775
> URL: https://issues.apache.org/jira/browse/HIVE-11775
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11775.01.patch, HIVE-11775.02.patch, 
> HIVE-11775.03.patch, HIVE-11775.04.patch, HIVE-11775.05.patch, 
> HIVE-11775.06.patch, HIVE-11775.07.patch, HIVE-11775.08.patch, 
> HIVE-11775.09.patch
>
>
> Enlightened by HIVE-11684 (Kudos to [~jcamachorodriguez]), we can actually 
> push limit down through union all, which reduces the intermediate number of 
> rows in union branches. 





[jira] [Commented] (HIVE-11687) TaskExecutorService can reject work even if capacity is available

2015-12-15 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059274#comment-15059274
 ] 

Siddharth Seth commented on HIVE-11687:
---

[~prasanth_j] - I don't think I ever saw this. However, this scenario is 
possible - typically in case of high contention on available resources. 
Notifying the executor does not mean it will start running (and acquire the 
lock) immediately. Instead, another submission could come in, take the lock, 
find the wait queue to be full, and be rejected.
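The race can be reduced to a toy example: a bounded wait queue rejects new work even though execution capacity is free, simply because the executor thread has not yet drained the queue. All names here are illustrative and not Hive's actual TaskExecutorService:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Illustration of rejection despite free capacity: the wait queue (size 1)
// is full only because the idle executor has not yet dequeued the first
// fragment when the second submission arrives.
public class WaitQueueRejection {
    public static void main(String[] args) {
        ArrayBlockingQueue<String> waitQueue = new ArrayBlockingQueue<>(1);
        int freeExecutorSlots = 1;                     // capacity to run work exists
        waitQueue.offer("fragment-1");                 // fills the wait queue
        // Before the executor dequeues fragment-1, a second submission arrives:
        boolean accepted = waitQueue.offer("fragment-2");
        System.out.println("fragment-2 accepted: " + accepted
                + " (free executor slots: " + freeExecutorSlots + ")");
    }
}
```

Here `offer` returns false for fragment-2 even though an executor slot is idle; factoring execution capacity into the admission check (or briefly blocking) would avoid the spurious rejection.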

> TaskExecutorService can reject work even if capacity is available
> -
>
> Key: HIVE-11687
> URL: https://issues.apache.org/jira/browse/HIVE-11687
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: llap
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Fix For: llap
>
>
> The waitQueue has a fixed capacity - the wait queue size. Addition of new 
> work does not factor in the capacity available to execute work. This ends up 
> being left to the race between work getting scheduled for execution and work 
> being added to the waitQueue.
> cc [~prasanth_j]





[jira] [Updated] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-12-15 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-12367:
--
Attachment: HIVE-12367.004.patch

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch, 
> HIVE-12367.003.patch, HIVE-12367.004.patch, HIVE-12367.004.patch
>
>






[jira] [Updated] (HIVE-12075) add analyze command to explictly cache file metadata in HBase metastore

2015-12-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12075:

Attachment: HIVE-12075.04.patch

> add analyze command to explictly cache file metadata in HBase metastore
> ---
>
> Key: HIVE-12075
> URL: https://issues.apache.org/jira/browse/HIVE-12075
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12075.01.nogen.patch, HIVE-12075.01.patch, 
> HIVE-12075.02.patch, HIVE-12075.03.patch, HIVE-12075.04.patch, 
> HIVE-12075.nogen.patch, HIVE-12075.patch
>
>
> ANALYZE TABLE (spec as usual) CACHE METADATA





[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-15 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: HIVE-12541.3.patch

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch, 
> HIVE-12541.3.patch
>
>
> 1. In fact, SymbolicTextInputFormat supports paths with a regex; I add some 
> test SQL.
> 2. But when using CombineHiveInputFormat to combine input files, it cannot 
> resolve a path with a regex, so it gets a wrong result. I give an example 
> and fix the problem.
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format'; the content of the link 
> file is
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and that path contains a regex.
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get a wrong result: 0





[jira] [Updated] (HIVE-11826) 'hadoop.proxyuser.hive.groups' configuration doesn't prevent unauthorized user to access metastore

2015-12-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11826:
-
Description: 
With 'hadoop.proxyuser.hive.groups' configured in core-site.xml to certain 
groups, currently if you run the job with a user not belonging to those groups, 
it won't fail to access metastore. -With old version hive 0.13, actually it 
fails properly.-

Seems HadoopThriftAuthBridge20S.java correctly calls ProxyUsers.authorize() 
while HadoopThriftAuthBridge23 doesn't. 

  was:
With 'hadoop.proxyuser.hive.groups' configured in core-site.xml to certain 
groups, currently if you run the job with a user not belonging to those groups, 
it won't fail to access metastore. With old version hive 0.13, actually it 
fails properly. 

Seems HadoopThriftAuthBridge20S.java correctly calls ProxyUsers.authorize() 
while HadoopThriftAuthBridge23 doesn't. 


> 'hadoop.proxyuser.hive.groups' configuration doesn't prevent unauthorized 
> user to access metastore
> --
>
> Key: HIVE-11826
> URL: https://issues.apache.org/jira/browse/HIVE-11826
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11826.2.patch, HIVE-11826.patch
>
>
> With 'hadoop.proxyuser.hive.groups' configured in core-site.xml to certain 
> groups, currently if you run the job with a user not belonging to those 
> groups, it won't fail to access metastore. -With old version hive 0.13, 
> actually it fails properly.-
> Seems HadoopThriftAuthBridge20S.java correctly calls ProxyUsers.authorize() 
> while HadoopThriftAuthBridge23 doesn't. 





[jira] [Commented] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-15 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059527#comment-15059527
 ] 

Thejas M Nair commented on HIVE-12688:
--

I think this is a blocker for the 2.0.0 release.

I am attaching a patch to roll back that change to unblock the 2.0.0 release. 
A fixed version of HIVE-11826 can be added in a follow-up jira.

cc [~sershe] [~aihuaxu] [~csun] [~ashutoshc]




> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a Hadoop property, which controls which 
> users the hive user can impersonate. What this change does is also use it to 
> restrict who can connect to the metastore server. This is new functionality, 
> not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.





[jira] [Updated] (HIVE-11927) Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants

2015-12-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11927:
---
Attachment: HIVE-11927.13.patch

> Implement/Enable constant related optimization rules in Calcite: enable 
> HiveReduceExpressionsRule to fold constants
> ---
>
> Key: HIVE-11927
> URL: https://issues.apache.org/jira/browse/HIVE-11927
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11927.01.patch, HIVE-11927.02.patch, 
> HIVE-11927.03.patch, HIVE-11927.04.patch, HIVE-11927.05.patch, 
> HIVE-11927.06.patch, HIVE-11927.07.patch, HIVE-11927.08.patch, 
> HIVE-11927.09.patch, HIVE-11927.10.patch, HIVE-11927.11.patch, 
> HIVE-11927.12.patch, HIVE-11927.13.patch
>
>






[jira] [Commented] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059549#comment-15059549
 ] 

Hive QA commented on HIVE-12367:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777915/HIVE-12367.004.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 9948 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver-vector_decimal_round.q-metadata_export_drop.q-stats13.q-and-12-more
 - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6364/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6364/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6364/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777915 - PreCommit-HIVE-TRUNK-Build

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch, 
> HIVE-12367.003.patch, HIVE-12367.004.patch, HIVE-12367.004.patch
>
>






[jira] [Updated] (HIVE-12688) HIVE-11826 makes hive unusable in properly secured cluster

2015-12-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-12688:
-
Attachment: HIVE-12688.1.patch

> HIVE-11826 makes hive unusable in properly secured cluster
> --
>
> Key: HIVE-12688
> URL: https://issues.apache.org/jira/browse/HIVE-12688
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Blocker
> Attachments: HIVE-12688.1.patch
>
>
> HIVE-11826 makes a change to restrict connections to metastore to users who 
> belong to groups under 'hadoop.proxyuser.hive.groups'.
> That property was only meant to be a Hadoop property, which controls which 
> users the hive user can impersonate. What this change does is also use it to 
> restrict who can connect to the metastore server. This is new functionality, 
> not a bug fix. There is value to this functionality.
> However, this change makes hive unusable in a properly secured cluster. If 
> 'hadoop.proxyuser.hive.hosts' is set to the proper set of hosts that run 
> Metastore and Hiveserver2 (instead of a very open "*"), then users will be 
> able to connect to metastore only from those hosts.





[jira] [Commented] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml

2015-12-15 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059554#comment-15059554
 ] 

Mohit Sabharwal commented on HIVE-12685:


[~wzheng], FYI I think this invalid property was also causing the following 
tests to fail:
TestSessionHooks
TestPlainSaslHelper 
TestSessionGlobalInitFile

I filed HIVE-12670 for this. But after removing the property, these tests still 
fail with other errors. I haven't investigated yet, but I'd be curious to see 
if these fail after your patch.  

> Remove invalid property in common/src/test/resources/hive-site.xml
> --
>
> Key: HIVE-12685
> URL: https://issues.apache.org/jira/browse/HIVE-12685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12685.1.patch
>
>
> Currently there is a property as below, which is obviously wrong:
> {code}
> 
>   javax.jdo.option.ConnectionDriverName
>   hive-site.xml
>   Override ConfVar defined in HiveConf
> 
> {code}





[jira] [Commented] (HIVE-6425) Unable to create external table with 3000+ columns

2015-12-15 Thread Alina GHERMAN (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058043#comment-15058043
 ] 

Alina GHERMAN commented on HIVE-6425:
-

I got the same error for an external table with 185 columns.

> Unable to create external table with 3000+ columns
> --
>
> Key: HIVE-6425
> URL: https://issues.apache.org/jira/browse/HIVE-6425
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
> Environment: Linux, CDH 4.2.0
>Reporter: Anurag
>  Labels: patch
> Attachments: Hive_Script.txt
>
>
> While creating an external table in Hive over a table in HBase with 3000+ 
> columns, Hive shows an error:
> FAILED: Error in metadata: 
> MetaException(message:javax.jdo.JDODataStoreException: Put request failed : 
> INSERT INTO "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES 
> (?,?,?)
> NestedThrowables:
> org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO 
> "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES (?,?,?) )
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask





[jira] [Commented] (HIVE-12625) Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058049#comment-15058049
 ] 

Hive QA commented on HIVE-12625:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/1204/HIVE-12625.4-branch1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 9233 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-groupby10.q-timestamp_comparison.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_cond_pushdown_unqual4.q-vectorization_16.q-union_remove_1.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-table_access_keys_stats.q-groupby_complex_types.q-vectorization_10.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-vector_distinct_2.q-load_dyn_part2.q-join35.q-and-12-more - 
did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_database_removes_partition_dirs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_table_removes_partition_dirs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_gb1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_fast_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join_filters
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_drop_partition
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_with_trash
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_smb_empty
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_count_distinct
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BRANCH_1-Build/10/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BRANCH_1-Build/10/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-BRANCH_1-Build-10/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 1204 - PreCommit-HIVE-BRANCH_1-Build

> Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, 
> ACID, and Non-Vectorized)
> --
>
> Key: HIVE-12625
> URL: https://issues.apache.org/jira/browse/HIVE-12625
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12625.1-branch1.patch, HIVE-12625.2-branch1.patch, 
> HIVE-12625.3-branch1.patch, HIVE-12625.4-branch1.patch
>
>






[jira] [Resolved] (HIVE-12172) LLAP: Cache metrics are incorrect

2015-12-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-12172.
-
Resolution: Duplicate

> LLAP: Cache metrics are incorrect
> -
>
> Key: HIVE-12172
> URL: https://issues.apache.org/jira/browse/HIVE-12172
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
>
> CacheCapacityUsed goes negative, CacheCapacityRemaining is a lot higher than 
> the original cache size.
> This was while using the LRFU cache policy.
> Reproduces when running a query which reads far too much data to store in 
> cache.
> {code}
> name: "Hadoop:service=LlapDaemon,name=LlapDaemonCacheMetrics-",
> tag.ProcessName: "LlapDaemon",
> tag.SessionId: "37c590b9-cccd-4b81-8b5f-36b3659c3454",
> CacheCapacityRemaining: 15958381947264,
> CacheCapacityTotal: 10737418240,
> CacheCapacityUsed: -15947644529024,
> CacheReadRequests: 5837,
> CacheRequestedBytes: 28996627387,
> CacheHitBytes: 8266607002,
> CacheAllocatedArena: 79,
> CacheNumLockedBuffers: 315,
> CacheHitRatio: 0.28508857
> {code}





[jira] [Updated] (HIVE-12682) Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay

2015-12-15 Thread Carter Shanklin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter Shanklin updated HIVE-12682:
---
Attachment: reducer.png

> Reducers in dynamic partitioning job spend a lot of time running 
> hadoop.conf.Configuration.getOverlay
> -
>
> Key: HIVE-12682
> URL: https://issues.apache.org/jira/browse/HIVE-12682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Carter Shanklin
> Attachments: reducer.png
>
>
> I tested this on Hive 1.2.1 but looks like it's still applicable to 2.0.
> I ran this query:
> {code}
> create table flights (
> …
> )
> PARTITIONED BY (Year int)
> CLUSTERED BY (Month)
> SORTED BY (DayofMonth) into 12 buckets
> STORED AS ORC
> TBLPROPERTIES("orc.bloom.filter.columns"="*")
> ;
> {code}
> (Taken from here: 
> https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)
> I profiled just the reduce phase and noticed something odd; the attached 
> graph shows where time was spent during the reducer phase.
> Problem seems to relate to 
> https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903
> /cc [~gopalv]





[jira] [Updated] (HIVE-12421) Streaming API add TransactionBatch.beginNextTransaction(long timeout)

2015-12-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12421:
--
Priority: Critical  (was: Major)

> Streaming API add TransactionBatch.beginNextTransaction(long timeout)
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will 
> throw an Exception to the client. This doesn't seem like the right behavior. 
> It should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to  
> give the client more control.
> cc [~alangates]  [~sriharsha]
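A timeout-based variant could poll checkLock() until the lock is granted or the deadline passes, instead of failing on the first WAITING response. The sketch below is a minimal stand-alone illustration, not Hive's actual API: the LockState enum and the Supplier-based checkLock are simplified stand-ins for the metastore client calls.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class LockRetrySketch {
    enum LockState { ACQUIRED, WAITING }

    // Polls checkLock until the lock is acquired or the timeout elapses,
    // returning the last observed state so the caller can decide what to do.
    static LockState acquireWithTimeout(Supplier<LockState> checkLock,
                                        long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        LockState state = checkLock.get();
        while (state != LockState.ACQUIRED
                && System.currentTimeMillis() < deadline) {
            TimeUnit.MILLISECONDS.sleep(pollMs);
            state = checkLock.get();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated lock that is granted on the third poll.
        int[] calls = {0};
        Supplier<LockState> lock = () ->
            ++calls[0] >= 3 ? LockState.ACQUIRED : LockState.WAITING;
        System.out.println(acquireWithTimeout(lock, 1000, 10));  // ACQUIRED
    }
}
```

A caller that still wants fail-fast behavior can simply pass a zero timeout.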





[jira] [Updated] (HIVE-12658) Task rejection by an llap daemon spams the log with RejectedExecutionExceptions

2015-12-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12658:
-
Attachment: HIVE-12658.3.patch

> Task rejection by an llap daemon spams the log with 
> RejectedExecutionExceptions
> ---
>
> Key: HIVE-12658
> URL: https://issues.apache.org/jira/browse/HIVE-12658
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12658.1.patch, HIVE-12658.2.patch, 
> HIVE-12658.3.patch
>
>
> The execution queue throws a RejectedExecutionException - which is logged by 
> the hadoop IPC layer.
> Instead of relying on an Exception in the protocol - move to sending back an 
> explicit response to indicate a rejected fragment.
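The proposed change amounts to catching the rejection at the submission boundary and turning it into a normal response value, so the RPC layer never sees an exception to log. A minimal sketch, assuming a hypothetical SubmissionState enum in place of the actual protocol types:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionResponseSketch {
    enum SubmissionState { ACCEPTED, REJECTED }

    // One worker and a one-slot queue, so rejection is easy to demonstrate.
    final ExecutorService executor = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));

    // Translate the queue's RejectedExecutionException into an explicit
    // response instead of letting it propagate up through the RPC layer.
    SubmissionState submit(Runnable fragment) {
        try {
            executor.execute(fragment);
            return SubmissionState.ACCEPTED;
        } catch (RejectedExecutionException e) {
            return SubmissionState.REJECTED;  // caller can schedule elsewhere
        }
    }

    public static void main(String[] args) {
        RejectionResponseSketch daemon = new RejectionResponseSketch();
        CountDownLatch release = new CountDownLatch(1);
        Runnable slow = () -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        };
        System.out.println(daemon.submit(slow));  // ACCEPTED (runs on the worker)
        System.out.println(daemon.submit(slow));  // ACCEPTED (sits in the queue)
        System.out.println(daemon.submit(slow));  // REJECTED (queue is full)
        release.countDown();
        daemon.executor.shutdown();
    }
}
```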





[jira] [Updated] (HIVE-12674) HS2 Tez sessions should have maximum age

2015-12-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12674:

Attachment: HIVE-12674.patch

The first patch. Still needs testing on a cluster. I created a subclass for 
sessions that are part of the pool; most of the expiration logic, as well as 
some pool-related logic, lives in it. This could be further improved, so that 
code elsewhere could avoid e.g. calling pool.close on all the sessions when 
it's not clear whether the session came from the pool.

[~vikram.dixit] can you check if this makes sense?

cc [~sseth]


RB https://reviews.apache.org/r/41431/

> HS2 Tez sessions should have maximum age
> 
>
> Key: HIVE-12674
> URL: https://issues.apache.org/jira/browse/HIVE-12674
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12674.patch
>
>
> Certain tokens passed to AM by clients (e.g. an HDFS token) have maximum 
> lifetime beyond which they cannot be renewed. We should cycle long-lived 
> session AMs after a configurable period to avoid problems with these.
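One way to implement a maximum age is a periodic sweep that closes and replaces any pooled session past the limit, so a fresh AM (with fresh tokens) takes its place. The sketch below is a simplified stand-in: the Session class and the sessions list are hypothetical, whereas the real patch works against the HS2 Tez session pool.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionAgeSketch {
    static final class Session {
        final long startMs = System.currentTimeMillis();
        boolean closed = false;
    }

    final List<Session> sessions = new ArrayList<>();
    final long maxAgeMs;

    SessionAgeSketch(long maxAgeMs) { this.maxAgeMs = maxAgeMs; }

    // Replace any session older than maxAgeMs with a fresh one, so its
    // delegation tokens are renewed before hitting their maximum lifetime.
    void expireOldSessions(long nowMs) {
        for (int i = 0; i < sessions.size(); i++) {
            Session s = sessions.get(i);
            if (nowMs - s.startMs > maxAgeMs) {
                s.closed = true;               // would tear down the Tez AM here
                sessions.set(i, new Session()); // and open a replacement
            }
        }
    }

    public static void main(String[] args) {
        SessionAgeSketch pool = new SessionAgeSketch(1000);
        pool.sessions.add(new Session());
        // Pretend two seconds have passed: the session gets cycled.
        pool.expireOldSessions(System.currentTimeMillis() + 2000);
        System.out.println(pool.sessions.get(0).closed);  // false: replaced
    }
}
```

In a real implementation the sweep would run on a scheduled executor and add jitter so all sessions do not expire at once.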





[jira] [Updated] (HIVE-12658) Task rejection by an llap daemon spams the log with RejectedExecutionExceptions

2015-12-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12658:
-
Attachment: (was: HIVE-12658.3.patch)

> Task rejection by an llap daemon spams the log with 
> RejectedExecutionExceptions
> ---
>
> Key: HIVE-12658
> URL: https://issues.apache.org/jira/browse/HIVE-12658
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12658.1.patch, HIVE-12658.2.patch, 
> HIVE-12658.3.patch
>
>
> The execution queue throws a RejectedExecutionException - which is logged by 
> the hadoop IPC layer.
> Instead of relying on an Exception in the protocol - move to sending back an 
> explicit response to indicate a rejected fragment.





[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-15 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059381#comment-15059381
 ] 

Xiaowei Wang commented on HIVE-12541:
-

Ok, I will check.

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch
>
>
> 1. In fact, SymbolicTextInputFormat supports paths with a regex; I add some 
> test SQL. 
> 2. But when using CombineHiveInputFormat to combine input files, it cannot 
> resolve a path with a regex, so it will get a wrong result. I give an example 
> and fix the problem.
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format'; the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get a wrong result: 0
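In Hadoop, the fix would resolve such a pattern (e.g. via FileSystem.globStatus()) before the paths reach CombineHiveInputFormat, rather than treating the line as a literal path. The idea can be illustrated stand-alone with java.nio glob matching; the file names below are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SymlinkGlobSketch {
    // Expand one line of a symlink file: if it contains glob characters,
    // enumerate the matching paths instead of using the line verbatim.
    static List<Path> expand(Path dir, String pattern) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> matches =
                 Files.newDirectoryStream(dir, pattern)) {  // glob syntax
            for (Path p : matches) result.add(p);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-demo");
        Files.createFile(dir.resolve("symlink1.txt"));
        Files.createFile(dir.resolve("symlink2.txt"));
        Files.createFile(dir.resolve("other.txt"));
        System.out.println(expand(dir, "symlink*").size());  // 2
    }
}
```

With the pattern expanded to concrete files first, split combination sees real inputs and the count(*) result is no longer 0.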





[jira] [Updated] (HIVE-12075) add analyze command to explictly cache file metadata in HBase metastore

2015-12-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12075:

Attachment: HIVE-12075.04.patch

Moved the methods into the file-format-specific proxy.

> add analyze command to explictly cache file metadata in HBase metastore
> ---
>
> Key: HIVE-12075
> URL: https://issues.apache.org/jira/browse/HIVE-12075
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12075.01.nogen.patch, HIVE-12075.01.patch, 
> HIVE-12075.02.patch, HIVE-12075.03.patch, HIVE-12075.04.patch, 
> HIVE-12075.nogen.patch, HIVE-12075.patch
>
>
> ANALYZE TABLE (spec as usual) CACHE METADATA





[jira] [Commented] (HIVE-12421) Streaming API add TransactionBatch.beginNextTransaction(long timeout)

2015-12-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059269#comment-15059269
 ] 

Eugene Koifman commented on HIVE-12421:
---

Note to self: lock() is not a blocking call.  It can return state = WAITING 
indicating that the lock could not be acquired.  This is why there is a separate 
checkLock() API.

DbLockManager.lock() has a "template"

> Streaming API add TransactionBatch.beginNextTransaction(long timeout)
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will 
> throw an Exception to the client.  This doesn't seem like the right behavior.  It 
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to  
> give the client more control.
> cc [~alangates]  [~sriharsha]





[jira] [Commented] (HIVE-12664) Bug in reduce deduplication optimization causing ArrayOutOfBoundException

2015-12-15 Thread Johan Gustavsson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059401#comment-15059401
 ] 

Johan Gustavsson commented on HIVE-12664:
-

[~gopalv] could you please re-run Jenkins on the new patch? No tests should be 
needed for this patch.

> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -
>
> Key: HIVE-12664
> URL: https://issues.apache.org/jira/browse/HIVE-12664
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1, 1.2.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Attachments: HIVE-12664-1.patch, HIVE-12664.patch
>
>
> The optimisation check for reduce deduplication only checks the first child 
> node for a join, and the check itself also contains a major bug, causing an 
> ArrayOutOfBoundException no matter what.





[jira] [Commented] (HIVE-12675) PerfLogger should log performance metrics at debug level

2015-12-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059400#comment-15059400
 ] 

Hive QA commented on HIVE-12675:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777650/HIVE-12675.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 9948 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6361/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6361/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6361/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777650 - PreCommit-HIVE-TRUNK-Build

> PerfLogger should log performance metrics at debug level
> 
>
> Key: HIVE-12675
> URL: https://issues.apache.org/jira/browse/HIVE-12675
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12675.1.patch
>
>
> As more and more subcomponents of Hive (Tez, Optimizer) etc are using 
> PerfLogger to track the performance metrics, it will be more meaningful to 
> set the PerfLogger logging level to DEBUG. Otherwise, we will print the 
> performance metrics unnecessarily for each and every query if the underlying 
> subcomponent does not control the PerfLogging via a parameter on its own.
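The usual pattern is to guard metric emission behind a debug-level check, so routine queries stay quiet unless debug logging is explicitly enabled. A minimal sketch using java.util.logging follows; Hive's PerfLogger uses its own logging wrapper, so the method name and message format here are illustrative only (the method returns the message so the behavior is easy to observe).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class PerfLogSketch {
    static final Logger LOG = Logger.getLogger("PerfLogger");

    // Emit per-phase timing at FINE (debug). Returns the emitted message,
    // or null when debug logging is disabled and nothing was logged.
    static String logDuration(String phase, long elapsedMs) {
        if (!LOG.isLoggable(Level.FINE)) {
            return null;  // debug disabled: skip formatting and emission
        }
        String msg = "</PERFLOG method=" + phase
                + " duration=" + elapsedMs + ">";
        LOG.fine(msg);
        return msg;
    }

    public static void main(String[] args) {
        // At the default INFO level the metric is suppressed entirely.
        System.out.println(logDuration("compile", 42) == null);
        LOG.setLevel(Level.FINE);
        System.out.println(logDuration("compile", 42) != null);
    }
}
```

The early isLoggable check also avoids paying the string-formatting cost for every query when debug logging is off.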





[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-12-15 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058109#comment-15058109
 ] 

Yongzhi Chen commented on HIVE-11609:
-

[~swarnim], could you check the precommit test failures? For example, 
TestHBaseCliDriver.testCliDriver_hbase_custom_key3
may be related. Thanks

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt, 
> HIVE-11609.3.patch.txt
>
>
> It seems like the capability to add filter to an hbase scan which was added 
> as part of HIVE-6411 doesn't work. This is primarily because in the 
> HiveHBaseInputFormat, the filter is added in the getsplits instead of 
> getrecordreader. This works fine for start and stop keys but not for filter 
> because a filter is respected only when an actual scan is performed. This is 
> also related to the initial refactoring that was done as part of HIVE-3420.





[jira] [Commented] (HIVE-6425) Unable to create external table with 3000+ columns

2015-12-15 Thread Alina GHERMAN (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058173#comment-15058173
 ] 

Alina GHERMAN commented on HIVE-6425:
-

In fact the limit was not on the number of columns but on the number of 
characters in 
SERDEPROPERTIES("hbase.columns.mapping" = "a field that has a maximum of 4000 
characters").

So this bug is due to the error: ERROR: value too long for type character 
varying(4000).
Workaround: 
https://support.pivotal.io/hc/en-us/articles/203422043-ERROR-value-too-long-for-type-character-varying-4000-

> Unable to create external table with 3000+ columns
> --
>
> Key: HIVE-6425
> URL: https://issues.apache.org/jira/browse/HIVE-6425
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
> Environment: Linux, CDH 4.2.0
>Reporter: Anurag
>  Labels: patch
> Attachments: Hive_Script.txt
>
>
> While creating an external table in Hive to a table in HBase with 3000+ 
> columns, Hive shows an error:
> FAILED: Error in metadata: 
> MetaException(message:javax.jdo.JDODataStoreException: Put request failed : 
> INSERT INTO "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES 
> (?,?,?)
> NestedThrowables:
> org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO 
> "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES (?,?,?) )
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask





[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059300#comment-15059300
 ] 

Aihua Xu commented on HIVE-12541:
-

[~wisgood] Can you check the failed unit tests above? It seems at least the 
symlink_text_input_format test case needs its baseline updated.

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch
>
>
> 1. In fact, SymbolicTextInputFormat supports paths with a regex; I add some 
> test SQL. 
> 2. But when using CombineHiveInputFormat to combine input files, it cannot 
> resolve a path with a regex, so it will get a wrong result. I give an example 
> and fix the problem.
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format'; the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get a wrong result: 0





[jira] [Updated] (HIVE-12658) Task rejection by an llap daemon spams the log with RejectedExecutionExceptions

2015-12-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12658:
-
Attachment: HIVE-12658.3.patch

Addressed review comments.

> Task rejection by an llap daemon spams the log with 
> RejectedExecutionExceptions
> ---
>
> Key: HIVE-12658
> URL: https://issues.apache.org/jira/browse/HIVE-12658
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12658.1.patch, HIVE-12658.2.patch, 
> HIVE-12658.3.patch
>
>
> The execution queue throws a RejectedExecutionException - which is logged by 
> the hadoop IPC layer.
> Instead of relying on an Exception in the protocol - move to sending back an 
> explicit response to indicate a rejected fragment.





[jira] [Updated] (HIVE-12684) NPE in stats annotation when all values in decimal column are NULLs

2015-12-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12684:
-
Attachment: HIVE-12684.2.patch

Without the import reordering. 

> NPE in stats annotation when all values in decimal column are NULLs
> ---
>
> Key: HIVE-12684
> URL: https://issues.apache.org/jira/browse/HIVE-12684
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12684.1.patch, HIVE-12684.2.patch
>
>
> When all column values are null for a decimal column and when column stats 
> exists. AnnotateWithStatistics optimization can throw NPE. Following is the 
> exception trace
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:712)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:764)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:750)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:197)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:143)
> at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:131)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:114)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
> at 
> org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
> at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:228)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10156)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:225)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> {code}
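The defensive fix is a null guard on the decimal bounds before converting them, since an all-NULL column yields null low/high values. The sketch below uses a simplified stand-in for the stats object: DecimalColumnStats and the toDouble helper are hypothetical, while the real code reads decimal bounds in StatsUtils.getColStatistics.

```java
import java.math.BigDecimal;

public class DecimalStatsSketch {
    // Simplified stand-in for the decimal column statistics object:
    // when every value in the column is NULL, both bounds come back null.
    static final class DecimalColumnStats {
        final BigDecimal lowValue;
        final BigDecimal highValue;
        DecimalColumnStats(BigDecimal low, BigDecimal high) {
            lowValue = low;
            highValue = high;
        }
    }

    // Convert a bound to a range endpoint, tolerating the all-NULL case
    // instead of dereferencing a null bound (the source of the NPE).
    static double toDouble(BigDecimal bound, double fallback) {
        return bound == null ? fallback : bound.doubleValue();
    }

    public static void main(String[] args) {
        DecimalColumnStats allNulls = new DecimalColumnStats(null, null);
        System.out.println(toDouble(allNulls.lowValue, 0.0));   // 0.0
        System.out.println(toDouble(allNulls.highValue, 0.0));  // 0.0
    }
}
```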





[jira] [Updated] (HIVE-12628) Eliminate flakiness in TestMetrics

2015-12-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-12628:
--
Labels: TODOC2.1  (was: )

> Eliminate flakiness in TestMetrics
> --
>
> Key: HIVE-12628
> URL: https://issues.apache.org/jira/browse/HIVE-12628
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.1.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-12628.patch
>
>
> TestMetrics relies on timing of json file dumps.  Rewrite these tests to 
> eliminate flakiness.





  1   2   >