[jira] [Commented] (HIVE-10744) LLAP: dags get stuck in yet another way

2015-05-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549863#comment-14549863
 ] 

Prasanth Jayachandran commented on HIVE-10744:
--

[~sseth] Can you take a look at the patch? The task scheduler is much simpler 
now; all the book-keeping data structures have been removed.

 LLAP: dags get stuck in yet another way
 ---

 Key: HIVE-10744
 URL: https://issues.apache.org/jira/browse/HIVE-10744
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10744.patch


 DAG gets stuck when a number of tasks that is a multiple of the number of 
 containers on the machine (6, 12, ... in my case) fails to finish at the end of 
 the stage (I am running a job with 500-1000 maps). The status just hangs forever 
 (beyond the 5-minute timeout) with some tasks shown as running. This happened 
 twice on the 3rd DAG with a 1000-map job (TPCH Q1); when I reduced it to 500 
 maps, it happened on the 7th DAG so far. [~sseth] has the details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10744) LLAP: dags get stuck in yet another way

2015-05-19 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10744:
-
Attachment: (was: HIVE-10744.patch)

 LLAP: dags get stuck in yet another way
 ---

 Key: HIVE-10744
 URL: https://issues.apache.org/jira/browse/HIVE-10744
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10744.patch


 DAG gets stuck when a number of tasks that is a multiple of the number of 
 containers on the machine (6, 12, ... in my case) fails to finish at the end of 
 the stage (I am running a job with 500-1000 maps). The status just hangs forever 
 (beyond the 5-minute timeout) with some tasks shown as running. This happened 
 twice on the 3rd DAG with a 1000-map job (TPCH Q1); when I reduced it to 500 
 maps, it happened on the 7th DAG so far. [~sseth] has the details.





[jira] [Updated] (HIVE-10744) LLAP: dags get stuck in yet another way

2015-05-19 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10744:
-
Attachment: HIVE-10744.patch

 LLAP: dags get stuck in yet another way
 ---

 Key: HIVE-10744
 URL: https://issues.apache.org/jira/browse/HIVE-10744
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10744.patch


 DAG gets stuck when a number of tasks that is a multiple of the number of 
 containers on the machine (6, 12, ... in my case) fails to finish at the end of 
 the stage (I am running a job with 500-1000 maps). The status just hangs forever 
 (beyond the 5-minute timeout) with some tasks shown as running. This happened 
 twice on the 3rd DAG with a 1000-map job (TPCH Q1); when I reduced it to 500 
 maps, it happened on the 7th DAG so far. [~sseth] has the details.





[jira] [Commented] (HIVE-10732) Hive JDBC driver does not close operation for metadata queries

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549958#comment-14549958
 ] 

Hive QA commented on HIVE-10732:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733702/HIVE-10732.patch

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 8945 tests 
executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
org.apache.hive.beeline.TestBeeLineWithArgs.testLastLineCmdInScriptFile
org.apache.hive.jdbc.TestJdbcDriver2.testBuiltInUDFCol
org.apache.hive.jdbc.TestJdbcDriver2.testCloseResultSet
org.apache.hive.jdbc.TestJdbcDriver2.testDuplicateColumnNameOrder
org.apache.hive.jdbc.TestJdbcDriver2.testExprCol
org.apache.hive.jdbc.TestJdbcDriver2.testParentReferences
org.apache.hive.jdbc.TestJdbcDriver2.testPostClose
org.apache.hive.jdbc.TestJdbcDriver2.testPrepareStatement
org.apache.hive.jdbc.TestJdbcDriver2.testSetCommand
org.apache.hive.jdbc.TestJdbcDriver2.testShowGrant
org.apache.hive.jdbc.TestJdbcDriver2.testShowRoleGrant
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testNonSparkQuery
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testSparkQuery
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConnection
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConnectionSchemaAPIs
org.apache.hive.jdbc.TestJdbcWithMiniMr.testMrQuery
org.apache.hive.jdbc.TestJdbcWithMiniMr.testNonMrQuery
org.apache.hive.jdbc.TestJdbcWithMiniMr.testTempTable
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testNonSparkQuery
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
org.apache.hive.jdbc.miniHS2.TestMiniHS2.testConfInSession
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3940/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3940/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3940/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733702 - PreCommit-HIVE-TRUNK-Build

 Hive JDBC driver does not close operation for metadata queries
 --

 Key: HIVE-10732
 URL: https://issues.apache.org/jira/browse/HIVE-10732
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Mala Chikka Kempanna
Assignee: Chaoyu Tang
 Attachments: HIVE-10732.patch


 In the following file:
 http://github.mtv.cloudera.com/CDH/hive/blob/cdh5-0.14.1/jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java
 line 315 implements the ResultSet.close() method. Because a DatabaseMetaData 
 operation doesn't have a statement, close() does not close the operation. 
 However, regardless of whether it has a statement, it should close the 
 operation through the stmtHandle.
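The proposed behavior can be sketched as follows. The class names below are illustrative stand-ins, not the actual Hive JDBC classes: close() releases the server-side operation through the operation handle even when no Statement owns the result set (the metadata-query case).

```java
// Simplified model of the proposed fix: always close the server-side
// operation via the operation handle, even when no Statement owns the
// ResultSet. Class names are illustrative, not the actual Hive classes.
class OperationHandle {
    boolean closed = false;
    void closeOperation() { closed = true; }
}

class StatementSim {
    // In the real driver, Statement.close() also closes the operation.
}

class ResultSetSim {
    private final StatementSim statement;     // null for metadata queries
    private final OperationHandle stmtHandle;

    ResultSetSim(StatementSim statement, OperationHandle stmtHandle) {
        this.statement = statement;
        this.stmtHandle = stmtHandle;
    }

    // Before the fix, the operation was closed only when statement != null.
    // After the fix, it is always closed through stmtHandle.
    void close() {
        if (stmtHandle != null) {
            stmtHandle.closeOperation();
        }
    }

    boolean operationClosed() {
        return stmtHandle == null || stmtHandle.closed;
    }
}
```

With this shape, a metadata result set (constructed with a null statement) still releases its operation on close.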





[jira] [Updated] (HIVE-10256) Filter row groups based on the block statistics in Parquet

2015-05-19 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-10256:
-
Attachment: HIVE-10256-parquet.2.patch

Patch rebased

 Filter row groups based on the block statistics in Parquet
 --

 Key: HIVE-10256
 URL: https://issues.apache.org/jira/browse/HIVE-10256
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-10256-parquet.1.patch, HIVE-10256-parquet.2.patch, 
 HIVE-10256-parquet.patch


 In Parquet PPD, row groups that do not match the predicate should be 
 eliminated. See {{TestOrcSplitElimination}}





[jira] [Commented] (HIVE-10741) count distinct rewrite is not firing

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550057#comment-14550057
 ] 

Hive QA commented on HIVE-10741:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733706/HIVE-10741.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8946 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet_types
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3941/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3941/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3941/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733706 - PreCommit-HIVE-TRUNK-Build

 count distinct rewrite is not firing
 

 Key: HIVE-10741
 URL: https://issues.apache.org/jira/browse/HIVE-10741
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10741.1.patch, HIVE-10741.patch


 The rewrite introduced in HIVE-10568 is not effective outside of the test environment





[jira] [Updated] (HIVE-10257) Ensure Parquet Hive has null optimization

2015-05-19 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-10257:
-
Attachment: HIVE-10257-parquet.2.patch

Sure, thanks!

Patch updated. 
Since the renaming has been fixed in HIVE-10256, I updated this patch based on 
the code there, so this patch has to be merged after that one.

Sorry for the inconvenience.

 Ensure Parquet Hive has null optimization
 -

 Key: HIVE-10257
 URL: https://issues.apache.org/jira/browse/HIVE-10257
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-10257-parquet.1.patch, HIVE-10257-parquet.2.patch, 
 HIVE-10257-parquet.patch


 In Parquet statistics, a boolean value {{hasNonNullValue}} is kept for each 
 column chunk. Hive could use this value to skip a column, avoid null-checking 
 logic, and speed up vectorization as in HIVE-4478 (in the future; Parquet 
 vectorization is not complete yet).
 In this JIRA we should check whether this null optimization works, and make 
 changes if needed.





[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550189#comment-14550189
 ] 

Hive QA commented on HIVE-7193:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733705/HIVE-7193.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8946 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3942/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3942/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3942/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733705 - PreCommit-HIVE-TRUNK-Build

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx


 Currently Hive has only the following authentication parameters for LDAP 
 authentication for HiveServer2:
 {code}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of Hive LDAP authentication, 
 such as:
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAccountName={0} 
 a list of valid user groups - group1,group2,group3 
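Expressed as hive-site.xml entries, the requested additions might look like the sketch below. The property names here are hypothetical placeholders chosen for illustration; the final names depend on the patch.

```xml
<!-- Hypothetical property names illustrating the requested LDAP additions -->
<property>
  <name>hive.server2.authentication.ldap.userSearchBase</name>
  <value>dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userSearchFilter</name>
  <value>sAMAccountName={0}</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupSearchBase</name>
  <value>dc=domain,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupSearchFilter</name>
  <value>member={0}</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.validGroups</name>
  <value>group1,group2,group3</value>
</property>
```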





[jira] [Commented] (HIVE-10550) Dynamic RDD caching optimization for HoS.[Spark Branch]

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550208#comment-14550208
 ] 

Hive QA commented on HIVE-10550:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733759/HIVE-10550.3-spark.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8721 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did 
not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not 
produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more
 - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/861/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/861/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-861/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733759 - PreCommit-HIVE-SPARK-Build

 Dynamic RDD caching optimization for HoS.[Spark Branch]
 ---

 Key: HIVE-10550
 URL: https://issues.apache.org/jira/browse/HIVE-10550
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-10550.1-spark.patch, HIVE-10550.1.patch, 
 HIVE-10550.2-spark.patch, HIVE-10550.3-spark.patch


 A Hive query may scan the same table multiple times (e.g. a self-join or 
 self-union), or several parts of it may share the same subquery; [TPC-DS 
 Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]
  is an example. Spark supports caching RDD data: it keeps a computed RDD in 
 memory and serves it from memory the next time it is needed, which avoids the 
 recomputation cost of that RDD (and of all its dependencies) at the cost of 
 more memory usage. By analyzing the query context, we should be able to 
 determine which parts of the query can be shared, so that the cached RDD can 
 be reused in the generated Spark job.
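The trade-off described above can be illustrated with a tiny plain-Java memoization sketch. This is a stand-in for Spark's RDD caching, not Hive or Spark code; names like {{RddCacheSketch}} and the dataset id are invented:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java stand-in for RDD caching: the first request computes and
// stores the result; later requests for the same logical dataset reuse
// it, trading memory for recomputation cost.
class RddCacheSketch {
    private final Map<String, Object> cache = new HashMap<>();
    int computations = 0;   // how many times we actually computed

    @SuppressWarnings("unchecked")
    <T> T getOrCompute(String datasetId, Supplier<T> compute) {
        return (T) cache.computeIfAbsent(datasetId, id -> {
            computations++;
            return compute.get();
        });
    }
}
```

A query plan that scans the same table twice would call getOrCompute with the same dataset id both times, paying the computation cost only once.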





[jira] [Commented] (HIVE-10458) Enable parallel order by for spark [Spark Branch]

2015-05-19 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550283#comment-14550283
 ] 

Rui Li commented on HIVE-10458:
---

Hi [~xuefuz], we won't do a double sample for approach a1, because 
{{TotalOrderPartitioner}} is MR-specific.
One interesting thing I found is the qtest {{parallel_orderby.q}}. As I 
mentioned above, when sorted data is stored in multiple files, we have to read 
these files in the proper order to maintain the global sort. It seems that when 
retrieving the results in FetchOperator, we rely on InputFormat::getSplits, 
which depends on the underlying FileSystem and doesn't guarantee an order. 
So if I run {{parallel_orderby}} in local-cluster mode (TestSparkCliDriver), 
the FS used is LocalFileSystem and the test doesn't produce a correct result 
(in fact we do produce the correct results, but we don't read them in the 
proper order). However, if I run {{parallel_orderby}} in yarn mode 
(TestMiniSparkOnYarnCliDriver), the FS used is DistributedFileSystem and the 
result is correct. I also tried sorting the splits in FetchOperator, and then 
both modes work fine.
Maybe we should verify and fix this in a separate JIRA. What do you think?
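The split-sorting workaround mentioned above could look roughly like this. It is a sketch with invented names; the real FetchOperator deals with Hadoop InputSplit objects, simulated here by file-path strings:

```java
import java.util.Arrays;

// Sketch of the workaround: sort splits by their file path before reading,
// so files produced by a parallel order-by are consumed in a deterministic
// order regardless of what getSplits() returns on the local FS.
class SplitOrdering {
    static String[] sortByPath(String[] splitPaths) {
        String[] sorted = splitPaths.clone();
        Arrays.sort(sorted);   // lexicographic: part-00000, part-00001, ...
        return sorted;
    }
}
```

Since Hive's parallel order-by writes output files named in partition order, sorting by path restores the global order of the concatenated result.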

 Enable parallel order by for spark [Spark Branch]
 -

 Key: HIVE-10458
 URL: https://issues.apache.org/jira/browse/HIVE-10458
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10458.1-spark.patch, HIVE-10458.2-spark.patch, 
 HIVE-10458.3-spark.patch


 We don't have to force the reducer count to 1, as Spark supports parallel sorting.





[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-19 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551864#comment-14551864
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

[~jpullokkaran], could you please take a look? The failed test is not related.

 Bucketized Table feature fails in some cases
 

 Key: HIVE-6867
 URL: https://issues.apache.org/jira/browse/HIVE-6867
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Pengcheng Xiong
 Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch


 Bucketized Table feature fails in some cases: if the source and destination 
 are bucketed on the same key, and the actual data in the source is not 
 bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the 
 data won't be bucketed while writing to the destination.
 Example
 --
 CREATE TABLE P1(key STRING, val STRING)
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
 P1;
 -- perform an insert to make sure there are 2 files
 INSERT OVERWRITE TABLE P1 select key, val from P1;
 --
 This is not a regression; this has never worked.
 It was only discovered due to Hadoop2 changes.
 In Hadoop1, in local mode, the number of reducers is always 1, regardless of 
 what is requested by the app. Hadoop2 now honors the number-of-reducers 
 setting in local mode (by spawning threads).
 The long-term solution seems to be to prevent LOAD DATA for bucketed tables.





[jira] [Resolved] (HIVE-10767) LLAP: Improve the way task finishable information is processed

2015-05-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-10767.
---
   Resolution: Fixed
Fix Version/s: llap

 LLAP: Improve the way task finishable information is processed
 --

 Key: HIVE-10767
 URL: https://issues.apache.org/jira/browse/HIVE-10767
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10767.1.txt








[jira] [Reopened] (HIVE-10764) LLAP: Wait queue scheduler goes into tight loop

2015-05-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reopened HIVE-10764:
---

 LLAP: Wait queue scheduler goes into tight loop
 ---

 Key: HIVE-10764
 URL: https://issues.apache.org/jira/browse/HIVE-10764
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: llap

 Attachments: HIVE-10764.patch


 {code}
 if (!task.canFinish() || numSlotsAvailable.get() == 0) {
 {code}
 This condition makes the scheduler run in a tight loop when no slots are 
 available and the task is finishable.
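One way to avoid such a busy loop is to block until the state can actually change instead of re-evaluating the condition in a loop. The sketch below is illustrative only; {{WaitQueueSketch}} is a hypothetical class, not LLAP's actual scheduler, which was reworked in HIVE-10767:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: replace spinning on (numSlotsAvailable.get() == 0) with a
// blocking wait on a monitor that is notified whenever a slot frees up.
class WaitQueueSketch {
    private final AtomicInteger numSlotsAvailable;
    private final Object lock = new Object();

    WaitQueueSketch(int slots) {
        numSlotsAvailable = new AtomicInteger(slots);
    }

    void releaseSlot() {
        synchronized (lock) {
            numSlotsAvailable.incrementAndGet();
            lock.notifyAll();   // wake the scheduler instead of letting it spin
        }
    }

    // Blocks until a slot is available, then claims it and returns the
    // number of slots remaining.
    int acquireSlot() {
        synchronized (lock) {
            while (numSlotsAvailable.get() == 0) {
                try {
                    lock.wait();   // no CPU burned while nothing can change
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // preserve the flag; sketch keeps waiting
                }
            }
            return numSlotsAvailable.decrementAndGet();
        }
    }
}
```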





[jira] [Issue Comment Deleted] (HIVE-10404) hive.exec.parallel=true causes out of sequence response and SocketTimeoutException: Read timed out

2015-05-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10404:

Comment: was deleted

(was: After discussing with [~ashutoshc], we would like to estimate the efforts 
needed if we would like to set hive.exec.parallel=true as default.)

 hive.exec.parallel=true causes out of sequence response and 
 SocketTimeoutException: Read timed out
 

 Key: HIVE-10404
 URL: https://issues.apache.org/jira/browse/HIVE-10404
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Eugene Koifman

 With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() from 
 one thread on several Tasks. It then starts new threads to run those tasks.
 Task.initialize() gets an instance of Hive and holds on to it. Hive.java 
 internally uses a ThreadLocal to hand out instances, but since 
 Task.initialize() is called by a single thread from the Driver, multiple tasks 
 share an instance of Hive.
 Each Hive instance has a single instance of MetaStoreClient; the latter is 
 not thread-safe.
 With hive.exec.parallel=true, different threads actually execute the tasks, so 
 different threads end up sharing the same MetaStoreClient.
 If you make 2 concurrent calls, for example Hive.getTable(String), the Thrift 
 responses may return to the wrong caller.
 Thus the first caller gets an out-of-sequence response, drops the message, and 
 reconnects. If the timing is right, it will consume the other caller's 
 response, but the other caller will then block for 
 hive.metastore.client.socket.timeout since its response message has been lost.
 This is just one concrete example.
 One possible fix is to make Task.db use a ThreadLocal.
 This could be related to HIVE-6893
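The ThreadLocal fix suggested above can be sketched as follows. The class names {{TaskSim}} and {{MetaStoreClientSim}} are illustrative stand-ins, not the actual Hive classes:

```java
// Sketch of the suggested fix: each executing thread lazily gets its own
// client instead of sharing the one captured by the single initializing
// thread. Names are illustrative, not the actual Hive classes.
class MetaStoreClientSim {
    // Not thread-safe in the real code; one instance per thread avoids
    // interleaved Thrift requests/responses on a shared connection.
}

class TaskSim {
    private static final ThreadLocal<MetaStoreClientSim> db =
        ThreadLocal.withInitial(MetaStoreClientSim::new);

    static MetaStoreClientSim getDb() {
        return db.get();   // created once per thread, then reused
    }
}
```

Each task thread then talks to the metastore over its own client, so concurrent Hive.getTable calls can no longer interleave on one connection.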





[jira] [Resolved] (HIVE-10764) LLAP: Wait queue scheduler goes into tight loop

2015-05-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-10764.
---
Resolution: Implemented

Done as part of HIVE-10767. The patch here was reverted.

 LLAP: Wait queue scheduler goes into tight loop
 ---

 Key: HIVE-10764
 URL: https://issues.apache.org/jira/browse/HIVE-10764
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: llap

 Attachments: HIVE-10764.patch


 {code}
 if (!task.canFinish() || numSlotsAvailable.get() == 0) {
 {code}
 This condition makes the scheduler run in a tight loop when no slots are 
 available and the task is finishable.





[jira] [Updated] (HIVE-10767) LLAP: Improve the way task finishable information is processed

2015-05-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-10767:
--
Attachment: HIVE-10767.1.txt

 LLAP: Improve the way task finishable information is processed
 --

 Key: HIVE-10767
 URL: https://issues.apache.org/jira/browse/HIVE-10767
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: HIVE-10767.1.txt








[jira] [Commented] (HIVE-8529) HiveSessionImpl#fetchResults should not try to fetch operation log when hive.server2.logging.operation.enabled is false.

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551497#comment-14551497
 ] 

Hive QA commented on HIVE-8529:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733936/HIVE-8529.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8945 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3949/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3949/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3949/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733936 - PreCommit-HIVE-TRUNK-Build

 HiveSessionImpl#fetchResults should not try to fetch operation log when 
 hive.server2.logging.operation.enabled is false.
 

 Key: HIVE-8529
 URL: https://issues.apache.org/jira/browse/HIVE-8529
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vaibhav Gumashta
Assignee: Yongzhi Chen
 Attachments: HIVE-8529.1.patch, HIVE-8529.2.patch


 Throws this even when it is disabled:
 {code}
 14/10/20 15:53:14 [HiveServer2-Handler-Pool: Thread-53]: DEBUG 
 security.UserGroupInformation: PrivilegedActionException as:vgumashta 
 (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: Couldn't 
 find log associated with operation handle: OperationHandle 
 [opType=EXECUTE_STATEMENT, 
 getHandleIdentifier()=b3d05ca6-e3e8-4bef-b869-0ea0732c3ac5]
 14/10/20 15:53:14 [HiveServer2-Handler-Pool: Thread-53]: WARN 
 thrift.ThriftCLIService: Error fetching results: 
 org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated 
 with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, 
 getHandleIdentifier()=b3d05ca6-e3e8-4bef-b869-0ea0732c3ac5]
   at 
 org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:240)
   at 
 org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:665)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:394)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:508)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
   at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
   at 
 org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:427)
   at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:582)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at 
 org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
   at ...
 {code}

[jira] [Commented] (HIVE-10732) Hive JDBC driver does not close operation for metadata queries

2015-05-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551499#comment-14551499
 ] 

Xuefu Zhang commented on HIVE-10732:


[~ctang.ma], could you explain (as I don't quite understand) why you made the 
change from patch #0 to #1? I understand it's related to test failures.

 Hive JDBC driver does not close operation for metadata queries
 --

 Key: HIVE-10732
 URL: https://issues.apache.org/jira/browse/HIVE-10732
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Mala Chikka Kempanna
Assignee: Chaoyu Tang
 Attachments: HIVE-10732.1.patch, HIVE-10732.patch


 In the following file
 http://github.mtv.cloudera.com/CDH/hive/blob/cdh5-0.14.1/jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java
 line 315 implements the ResultSet.close() method. Because a DatabaseMetaData 
 operation doesn't have a statement, close() doesn't close the operation. 
 However, regardless of whether it has a statement or not, it should close the 
 operation through the stmtHandle.
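The requested fix can be sketched as follows. This is an illustrative mock under stated assumptions, not the actual Hive JDBC classes: OperationHandle, QueryResultSetSketch, and operationClosed are hypothetical names standing in for the real stmtHandle plumbing.

```java
// Illustrative sketch of the requested fix: close the server-side operation
// through the stored handle even when no parent Statement exists.
// Hypothetical names, not the actual Hive JDBC classes.
class OperationHandle {
    boolean closed = false;

    void close() {
        closed = true; // stands in for the Thrift CloseOperation call
    }
}

class QueryResultSetSketch {
    private final OperationHandle stmtHandle; // present even for metadata queries
    private final Object statement;           // null for DatabaseMetaData result sets

    QueryResultSetSketch(OperationHandle stmtHandle, Object statement) {
        this.stmtHandle = stmtHandle;
        this.statement = statement;
    }

    void close() {
        // Before the fix, the operation was only closed when statement != null;
        // the fix is to always go through stmtHandle when it is set.
        if (stmtHandle != null) {
            stmtHandle.close();
        }
    }

    boolean operationClosed() {
        return stmtHandle != null && stmtHandle.closed;
    }
}
```

With this shape, a metadata result set (constructed with a null statement) still releases its operation on close().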



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10761) Create codahale-based metrics system for Hive

2015-05-19 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-10761:
-
Attachment: HIVE-10761.patch

Review board: 
[https://reviews.apache.org/r/34447/|https://reviews.apache.org/r/34447/]

This adds a codahale-based metrics system to HiveServer2 and HiveMetastore.  
The metrics implementation is now internally pluggable, and the existing Metrics 
system can be re-enabled via configuration if backward compatibility is desired.

The following metrics are supported by the Metrics system:
1.  JVMPauseMonitor (used to call Hadoop's internal implementation, now forked 
off to integrate with the Metrics system)
2.  HMS API calls
3.  Standard JVM metrics (only for the new implementation, as it's free with 
codahale).

The following metrics reporters are supported by the new system (configuration 
exposed):
1.  JMX
2.  CONSOLE
3.  JSON_FILE (a periodic file of metrics that gets overwritten).

An eventual goal is to add a web server that exposes the JSON metrics, but this 
will be deferred to a later JIRA.
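To illustrate the "well-defined measurements" the new system gets for free, here is a tiny self-contained sketch of the per-metric measurement model (max, mean, stddev). The actual patch delegates this to the Codahale (Dropwizard) Histogram/Timer classes; HistogramSketch is a hypothetical stand-in for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Tiny illustration of the per-metric measurement model described above
// (max, mean, stddev); the real patch uses Codahale's Histogram/Timer
// rather than hand-rolling this.
class HistogramSketch {
    private final List<Long> values = new ArrayList<>();

    void update(long v) { values.add(v); }

    long max() {
        long m = Long.MIN_VALUE;
        for (long v : values) m = Math.max(m, v);
        return m;
    }

    double mean() {
        double sum = 0;
        for (long v : values) sum += v;
        return sum / values.size();
    }

    double stddev() {
        double mu = mean(), ss = 0;
        for (long v : values) ss += (v - mu) * (v - mu);
        return Math.sqrt(ss / values.size());
    }
}
```

Each metric carrying a uniform set of measurements like these is what lets monitoring tools consume all metrics the same way, regardless of what is being measured.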

 Create codahale-based metrics system for Hive
 -

 Key: HIVE-10761
 URL: https://issues.apache.org/jira/browse/HIVE-10761
 Project: Hive
  Issue Type: New Feature
  Components: Diagnosability
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-10761.patch, hms-metrics.json


 There is a current Hive metrics system that hooks up to JMX reporting, but 
 all its measurements and models are custom.
 This is to make another metrics system based on Codahale (i.e. 
 yammer, dropwizard), which has the following advantages:
 * Well-defined metric model for frequently-needed metrics (i.e. JVM metrics)
 * Well-defined measurements for all metrics (i.e. max, mean, stddev, mean_rate, 
 etc.)
 * Built-in reporting frameworks like JMX, Console, Log, and a JSON webserver
 It is used by many projects, including several Apache projects like Oozie.  
 Overall, monitoring tools should find it easier to understand these common 
 metric, measurement, and reporting models.
 The existing metric subsystem will be kept and can be enabled if backward 
 compatibility is desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10761) Create codahale-based metrics system for Hive

2015-05-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551541#comment-14551541
 ] 

Thejas M Nair commented on HIVE-10761:
--

This looks very useful! Thanks for working on it. Better monitoring 
capabilities will really help to improve the server uptimes!

 Create codahale-based metrics system for Hive
 -

 Key: HIVE-10761
 URL: https://issues.apache.org/jira/browse/HIVE-10761
 Project: Hive
  Issue Type: New Feature
  Components: Diagnosability
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-10761.patch, hms-metrics.json


 There is a current Hive metrics system that hooks up to JMX reporting, but 
 all its measurements and models are custom.
 This is to make another metrics system based on Codahale (i.e. 
 yammer, dropwizard), which has the following advantages:
 * Well-defined metric model for frequently-needed metrics (i.e. JVM metrics)
 * Well-defined measurements for all metrics (i.e. max, mean, stddev, mean_rate, 
 etc.)
 * Built-in reporting frameworks like JMX, Console, Log, and a JSON webserver
 It is used by many projects, including several Apache projects like Oozie.  
 Overall, monitoring tools should find it easier to understand these common 
 metric, measurement, and reporting models.
 The existing metric subsystem will be kept and can be enabled if backward 
 compatibility is desired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-19 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10244:
--
Assignee: Matt McCline  (was: Jesus Camacho Rodriguez)

 Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when 
 hive.vectorized.execution.reduce.enabled is enabled
 ---

 Key: HIVE-10244
 URL: https://issues.apache.org/jira/browse/HIVE-10244
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: explain_q80_vectorized_reduce_on.txt


 Query 
 {code}
 set hive.vectorized.execution.reduce.enabled=true;
 with ssr as
  (select  s_store_id as store_id,
   sum(ss_ext_sales_price) as sales,
   sum(coalesce(sr_return_amt, 0)) as returns,
   sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
   from store_sales left outer join store_returns on
  (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
  date_dim,
  store,
  item,
  promotion
  where ss_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date) 
   and (cast('1998-09-04' as date))
and ss_store_sk = s_store_sk
and ss_item_sk = i_item_sk
and i_current_price > 50
and ss_promo_sk = p_promo_sk
and p_channel_tv = 'N'
  group by s_store_id)
  ,
  csr as
  (select  cp_catalog_page_id as catalog_page_id,
   sum(cs_ext_sales_price) as sales,
   sum(coalesce(cr_return_amount, 0)) as returns,
   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
   from catalog_sales left outer join catalog_returns on
  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
  date_dim,
  catalog_page,
  item,
  promotion
  where cs_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and cs_catalog_page_sk = cp_catalog_page_sk
and cs_item_sk = i_item_sk
and i_current_price > 50
and cs_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by cp_catalog_page_id)
  ,
  wsr as
  (select  web_site_id,
   sum(ws_ext_sales_price) as sales,
   sum(coalesce(wr_return_amt, 0)) as returns,
   sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
   from web_sales left outer join web_returns on
  (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
  date_dim,
  web_site,
  item,
  promotion
  where ws_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and ws_web_site_sk = web_site_sk
and ws_item_sk = i_item_sk
and i_current_price > 50
and ws_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by web_site_id)
   select  channel
 , id
 , sum(sales) as sales
 , sum(returns) as returns
 , sum(profit) as profit
  from 
  (select 'store channel' as channel
 , concat('store', store_id) as id
 , sales
 , returns
 , profit
  from   ssr
  union all
  select 'catalog channel' as channel
 , concat('catalog_page', catalog_page_id) as id
 , sales
 , returns
 , profit
  from  csr
  union all
  select 'web channel' as channel
 , concat('web_site', web_site_id) as id
 , sales
 , returns
 , profit
  from   wsr
  ) x
  group by channel, id with rollup
  order by channel
  ,id
  limit 100
 {code}
 Exception 
 {code}
 Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
 diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing vector batch (tag=0) 
 \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8
 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7
 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at 
 

[jira] [Updated] (HIVE-10764) LLAP: Wait queue scheduler goes into tight loop

2015-05-19 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10764:
-
Attachment: HIVE-10764.patch

 LLAP: Wait queue scheduler goes into tight loop
 ---

 Key: HIVE-10764
 URL: https://issues.apache.org/jira/browse/HIVE-10764
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10764.patch


 {code}
 if (!task.canFinish() || numSlotsAvailable.get() == 0) {
 {code}
 this condition makes it run in a tight loop if no slots are available and 
 the task is finishable.
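One way out of the busy-wait, sketched here as a standalone illustration (SlotGate is a hypothetical name, not the actual patch), is to block on a monitor until a slot is released instead of re-checking the condition in a loop:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: replace the tight loop with a blocking wait.
// The scheduler thread sleeps on a monitor until a slot is freed,
// rather than spinning on numSlotsAvailable.get() == 0.
class SlotGate {
    private final AtomicInteger numSlotsAvailable;
    private final Object lock = new Object();

    SlotGate(int slots) { this.numSlotsAvailable = new AtomicInteger(slots); }

    // Called by the wait-queue scheduler before dispatching a finishable task.
    void acquireSlot() {
        synchronized (lock) {
            while (numSlotsAvailable.get() == 0) {
                try {
                    lock.wait(); // block instead of spinning
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            numSlotsAvailable.decrementAndGet();
        }
    }

    // Called when a running task completes and frees its slot.
    void releaseSlot() {
        synchronized (lock) {
            numSlotsAvailable.incrementAndGet();
            lock.notifyAll(); // wake the scheduler thread
        }
    }

    int available() { return numSlotsAvailable.get(); }
}
```

The wait() inside the while loop follows the standard guarded-block pattern, so spurious wakeups simply re-check the slot count.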



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10753) hs2 jdbc url - wrong connection string cause error on beeline/jdbc/odbc client, misleading message

2015-05-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551425#comment-14551425
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-10753:
--

[~thejas] Thanks for the review. I noticed that the OOM does not happen with the 
master branch; it happens only with 0.14.0. Most likely the OOM error was resolved 
by HIVE-6468. However, I still get a connection error message like this:

{code}
localhost:bin hsubramaniyan$ ./beeline --verbose=true
Beeline version 1.3.0-SNAPSHOT by Apache Hive
beeline> !connect 
jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http
Connecting to jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http
Enter username for 
jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http: scott
Enter password for 
jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http: *
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http: Invalid 
status 72 (state=08S01,code=0)
java.sql.SQLException: Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http: Invalid 
status 72
at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:228)
at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:175)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at 
org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:142)
at 
org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:207)
at org.apache.hive.beeline.Commands.connect(Commands.java:1139)
at org.apache.hive.beeline.Commands.connect(Commands.java:1060)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:976)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:815)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:772)
at 
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:485)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:468)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status 72
at 
org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
at 
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:307)
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:203)
... 24 more
{code}

I will upload a new patch which improves the error message, i.e. point 1 you 
mentioned above.

Thanks
Hari

 hs2 jdbc url - wrong connection string cause  error on beeline/jdbc/odbc 
 client, misleading message
 ---

 Key: HIVE-10753
 URL: https://issues.apache.org/jira/browse/HIVE-10753
 Project: Hive
  Issue Type: Bug
  Components: Beeline, JDBC
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10753.1.patch


 {noformat}
 beeline -u 
 'jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http' -n 
 hdiuser
 scan complete in 15ms
 Connecting to 
 jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http
 Java heap space
 Beeline version 0.14.0.2.2.4.1-1 by Apache Hive
 0: jdbc:hive2://localhost:10001/default (closed)> ^Chdiuser@headnode0:~$ 
 But it works if I use the deprecated param - 
 hdiuser@headnode0:~$ beeline -u 
 

[jira] [Commented] (HIVE-10725) Better resource management in HiveServer2

2015-05-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551539#comment-14551539
 ] 

Thejas M Nair commented on HIVE-10725:
--

The work in HIVE-10761 will also help to improve HS2 uptime, by making it 
easier to monitor


 Better resource management in HiveServer2
 -

 Key: HIVE-10725
 URL: https://issues.apache.org/jira/browse/HIVE-10725
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, JDBC
Affects Versions: 1.3.0
Reporter: Vaibhav Gumashta

 We have various ways to control the number of queries that can be run on one 
 HS2 instance (max threads, thread pool queuing etc). We also have ways to run 
 multiple HS2 instances using dynamic service discovery. We should do a better 
 job at:
 1. Monitoring resource utilization (sessions, ophandles, memory, threads, etc.).
 2. Being upfront with the client when we cannot accept new queries.
 3. Throttling among different server instances in case dynamic service 
 discovery is used.
 4. Consolidating existing ways to control #queries into a simpler model.
 5. Seeing if we can recommend reasonable values for OS resources or providing 
 alerts if we run out of them.
 6. Health reports and a server status API (to get the number of queries, sessions, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10709) Update Avro version to 1.7.7

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550423#comment-14550423
 ] 

Hive QA commented on HIVE-10709:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733725/HIVE-10790.3.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8946 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3944/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3944/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3944/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733725 - PreCommit-HIVE-TRUNK-Build

 Update Avro version to 1.7.7
 

 Key: HIVE-10709
 URL: https://issues.apache.org/jira/browse/HIVE-10709
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-10709.1.patch, HIVE-10709.2.patch, 
 HIVE-10709.2.patch, HIVE-10790.3.patch


 We should update the Avro version to 1.7.7 to consume some of the nicer 
 compatibility features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10665) Continue to make udaf_percentile_approx_23.q test more stable

2015-05-19 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550487#comment-14550487
 ] 

Swarnim Kulkarni commented on HIVE-10665:
-

+1. Just ran into this failure on HIVE-10709

 Continue to make udaf_percentile_approx_23.q test more stable
 -

 Key: HIVE-10665
 URL: https://issues.apache.org/jira/browse/HIVE-10665
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10665.1.patch


 HIVE-10059 fixed line 628 in q.out.
 A similar issue exists on line 567 and should be fixed as well.
 {code}
 Running: diff -a 
 /home/hiveptest/54.159.254.207-hiveptest-2/apache-github-source-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/udaf_percentile_approx_23.q.out
  
 /home/hiveptest/54.159.254.207-hiveptest-2/apache-github-source-source/itests/qtest/../../ql/src/test/results/clientpositive/udaf_percentile_approx_23.q.out
 567c567
 < 342.0
 ---
 > 341.5
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9880) Support configurable username attribute for HiveServer2 LDAP authentication

2015-05-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-9880:
---

Assignee: Naveen Gangam

 Support configurable username attribute for HiveServer2 LDAP authentication
 ---

 Key: HIVE-9880
 URL: https://issues.apache.org/jira/browse/HIVE-9880
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Jaime Murillo
Assignee: Naveen Gangam
 Attachments: HIVE-9880-1.patch


 OpenLDAP requires that, when bind authenticating, the DN being supplied must 
 be the creation DN of the account. Since OpenLDAP allows any attribute 
 to be used when creating a DN for an account, organizations that don't use 
 the hardcoded *uid* attribute won't be able to use HiveServer2 LDAP 
 authentication.
 HiveServer2 should support a configurable username attribute when 
 constructing the bindDN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10752) Revert HIVE-5193

2015-05-19 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10752:

Attachment: HIVE-10752.patch

Reverting HIVE-5193. Please note that we have an additional problem even after 
reverting; I will address it later. I didn't include it in this patch, to keep 
the work separate. 

 Revert HIVE-5193
 

 Key: HIVE-10752
 URL: https://issues.apache.org/jira/browse/HIVE-10752
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10752.patch


 Revert HIVE-5193, since it causes pig+hcatalog to stop working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10735) LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no closeOp()

2015-05-19 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551135#comment-14551135
 ] 

Mostafa Mokhtar commented on HIVE-10735:


[~gopalv] [~hagleitn]

Can you add the query and the plan?

 LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no 
 closeOp()
 ---

 Key: HIVE-10735
 URL: https://issues.apache.org/jira/browse/HIVE-10735
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Reporter: Gopal V
Assignee: Matt McCline
Priority: Critical

 Looks like some state is mutated during execution across threads in LLAP. 
 Either we can't share the operator objects across threads because they are 
 tied to per-invocation data objects, or this is missing a closeOp() which 
 resets the common setup between reuses.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:380)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:164)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
   ... 18 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:379)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:599)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.generateHashMultiSetResultRepeatedAll(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:304)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.finishInnerBigOnlyRepeated(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:328)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:201)
   ... 24 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:152)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow$StringReaderByValue.apply(VectorDeserializeRow.java:349)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserializeByValue(VectorDeserializeRow.java:688)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultSingleValue(VectorMapJoinGenerateResultOperator.java:177)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.finishInner(VectorMapJoinInnerGenerateResultOperator.java:201)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:359)
   ... 29 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8529) HiveSessionImpl#fetchResults should not try to fetch operation log when hive.server2.logging.operation.enabled is false.

2015-05-19 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-8529:
---
Attachment: HIVE-8529.2.patch

 HiveSessionImpl#fetchResults should not try to fetch operation log when 
 hive.server2.logging.operation.enabled is false.
 

 Key: HIVE-8529
 URL: https://issues.apache.org/jira/browse/HIVE-8529
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vaibhav Gumashta
Assignee: Yongzhi Chen
 Attachments: HIVE-8529.1.patch, HIVE-8529.2.patch


 Throws this even when it is disabled:
 {code}
 14/10/20 15:53:14 [HiveServer2-Handler-Pool: Thread-53]: DEBUG 
 security.UserGroupInformation: PrivilegedActionException as:vgumashta 
 (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: Couldn't 
 find log associated with operation handle: OperationHandle 
 [opType=EXECUTE_STATEMENT, 
 getHandleIdentifier()=b3d05ca6-e3e8-4bef-b869-0ea0732c3ac5]
 14/10/20 15:53:14 [HiveServer2-Handler-Pool: Thread-53]: WARN 
 thrift.ThriftCLIService: Error fetching results: 
 org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated 
 with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, 
 getHandleIdentifier()=b3d05ca6-e3e8-4bef-b869-0ea0732c3ac5]
   at 
 org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:240)
   at 
 org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:665)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:394)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:508)
   at 
 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
   at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
   at 
 org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:427)
   at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:582)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at 
 org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   at java.lang.Thread.run(Thread.java:695)
 14/10/20 15:53:14 [HiveServer2-Handler-Pool: Thread-53]: DEBUG 
 transport.TSaslTransport: writing data length: 2525
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8190) LDAP user match for authentication on hiveserver2

2015-05-19 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551272#comment-14551272
 ] 

Naveen Gangam commented on HIVE-8190:
-

I just uploaded a patch for HIVE-7193; there is also a design doc attached. The 
new enhancements should make it more flexible for users to configure LDAP for 
authentication. Filter support (user and group) has been added. Let me know if 
you have any questions or feedback. Thanks

 LDAP user match for authentication on hiveserver2
 -

 Key: HIVE-8190
 URL: https://issues.apache.org/jira/browse/HIVE-8190
 Project: Hive
  Issue Type: Improvement
  Components: Authorization, Clients
Affects Versions: 0.13.1
 Environment: Centos 6.5
Reporter: LINTE
Assignee: Naveen Gangam

 Some LDAP directories have the user component as CN and not UID.
 So when you try to authenticate, the LDAP authentication module of Hive tries 
 to authenticate with the following string:
 uid=$login,basedn
 Some AD installations have user objects keyed not by uid but by cn, so it is 
 important to be able to personalize the kind of objects that the 
 authentication module looks for in LDAP.
 We can see an example in the Knox LDAP module configuration, where the 
 parameter main.ldapRealm.userDnTemplate can be configured to look for:
 uid : 'uid={0}, basedn'
 or cn : 'cn={0}, basedn'
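Following the Knox example just cited, the two template variants look like this in Shiro-style properties; the ou/dc values below are placeholders, not values from the report:

```properties
# Bind-DN template keyed on uid; {0} is replaced with the login name.
main.ldapRealm.userDnTemplate = uid={0},ou=people,dc=example,dc=com

# The same template for directories that key user entries on cn.
main.ldapRealm.userDnTemplate = cn={0},ou=people,dc=example,dc=com
```

Making the attribute in front of `{0}` configurable is exactly the flexibility being requested for HiveServer2.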



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10753) hs2 jdbc url - wrong connection string cause OOM error on beeline/jdbc/odbc client, misleading message

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551191#comment-14551191
 ] 

Hive QA commented on HIVE-10753:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733904/HIVE-10753.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8946 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.jdbc.TestJdbcDriver2.testSetOnConnection
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3947/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3947/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3947/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733904 - PreCommit-HIVE-TRUNK-Build

 hs2 jdbc url - wrong connection string cause OOM error on beeline/jdbc/odbc 
 client, misleading message
 --

 Key: HIVE-10753
 URL: https://issues.apache.org/jira/browse/HIVE-10753
 Project: Hive
  Issue Type: Bug
  Components: Beeline, JDBC
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10753.1.patch


 {noformat}
 beeline -u 
 'jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http' -n 
 hdiuser
 scan complete in 15ms
 Connecting to 
 jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http
 Java heap space
 Beeline version 0.14.0.2.2.4.1-1 by Apache Hive
 0: jdbc:hive2://localhost:10001/default (closed) ^Chdiuser@headnode0:~$ 
 But it works if I use the deprecated param - 
 hdiuser@headnode0:~$ beeline -u 
 'jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/'
  -n hdiuser
 scan complete in 12ms
 Connecting to 
 jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/
 15/04/28 23:16:46 [main]: WARN jdbc.Utils: * JDBC param deprecation *
 15/04/28 23:16:46 [main]: WARN jdbc.Utils: The use of 
 hive.server2.transport.mode is deprecated.
 15/04/28 23:16:46 [main]: WARN jdbc.Utils: Please use transportMode like so: 
 jdbc:hive2://host:port/dbName;transportMode=transport_mode_value
 Connected to: Apache Hive (version 0.14.0.2.2.4.1-1)
 Driver: Hive JDBC (version 0.14.0.2.2.4.1-1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 Beeline version 0.14.0.2.2.4.1-1 by Apache Hive
 0: jdbc:hive2://localhost:10001/default> show tables;
 +------------------+--+
 |     tab_name     |
 +------------------+--+
 | hivesampletable  |
 +------------------+--+
 1 row selected (18.181 seconds)
 0: jdbc:hive2://localhost:10001/default> ^Chdiuser@headnode0:~$ ^C
 {noformat}
 The reason for the above message is :
 The url is wrong. Correct one:
 {code}
 beeline -u 
 'jdbc:hive2://localhost:10001/default;httpPath=/;transportMode=http' -n 
 hdiuser
 {code}
 Note the ; instead of ?. The deprecation msg prints the format as well: 
 {code}
 Please use transportMode like so: 
 jdbc:hive2://host:port/dbName;transportMode=transport_mode_value
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.02.patch

with q files updated

 Bucketized Table feature fails in some cases
 

 Key: HIVE-6867
 URL: https://issues.apache.org/jira/browse/HIVE-6867
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Pengcheng Xiong
 Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch


 Bucketized Table feature fails in some cases: if src & destination are 
 bucketed on the same key, and if the actual data in the src is not bucketed 
 (because the data got loaded using LOAD DATA LOCAL INPATH), then the data 
 won't be bucketed while writing to the destination.
 Example
 --
 CREATE TABLE P1(key STRING, val STRING)
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
 P1;
 -- perform an insert to make sure there are 2 files
 INSERT OVERWRITE TABLE P1 select key, val from P1;
 --
 This is not a regression. This has never worked.
 This got only discovered due to Hadoop2 changes.
 In Hadoop1, in local mode, the number of reducers will always be 1, regardless 
 of what is requested by the app. Hadoop2 now honors the number-of-reducers 
 setting in local mode (by spawning threads).
 Long term solution seems to be to prevent load data for bucketed table.
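The contract that LOAD DATA silently breaks can be modeled in a few lines (a toy Python sketch under the assumption that each row must land in bucket hash(key) % numBuckets; the hash below is a stand-in, not Hive's ObjectInspector-based hashing):

```python
# Toy model of Hive bucketing (assumption: simplified stand-in, not Hive code).
NUM_BUCKETS = 2

def bucket_of(key):
    # Stand-in hash; Hive uses its own hash of the bucketing column.
    return sum(key.encode()) % NUM_BUCKETS

def insert_overwrite(rows):
    """Model INSERT: rows are shuffled to the bucket the contract demands."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, val in rows:
        buckets[bucket_of(key)].append((key, val))
    return buckets

def load_data(rows):
    """Model LOAD DATA: the file is copied verbatim into one bucket file."""
    return [list(rows)] + [[] for _ in range(NUM_BUCKETS - 1)]

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
good = insert_overwrite(rows)
bad = load_data(rows)

# Every row in a properly inserted bucket i satisfies bucket_of(key) == i ...
assert all(bucket_of(k) == i for i, b in enumerate(good) for k, _ in b)
# ... which the loaded layout violates, so anything relying on the bucketing
# contract (e.g. bucket map join) reads wrong data.
assert not all(bucket_of(k) == i for i, b in enumerate(bad) for k, _ in b)
```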



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10756) LLAP: Misc changes to daemon scheduling

2015-05-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551224#comment-14551224
 ] 

Prasanth Jayachandran commented on HIVE-10756:
--

LGTM, +1

 LLAP: Misc changes to daemon scheduling
 ---

 Key: HIVE-10756
 URL: https://issues.apache.org/jira/browse/HIVE-10756
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10756.1.txt


 Running the completion callback in a separate thread to avoid potentially 
 unnecessary preemptions.
 Sending out a kill to the AM only if the task was actually killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-10677:
--

Assignee: Pengcheng Xiong

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

 To reproduce it, in q tests.
 {code}
 hive> set hive.exec.parallel;
 hive.exec.parallel=true
 hive> analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10760) Templeton: HCatalog Get Column for Non-existent column returns Server Error (500) rather than Not Found(404)

2015-05-19 Thread Lekha Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lekha Thota updated HIVE-10760:
---
Attachment: 0001-Change-HCatalog-Get-Column-error-for-Non-existent-co.patch

 Templeton: HCatalog Get Column for Non-existent column returns Server Error 
 (500) rather than Not Found(404)
 

 Key: HIVE-10760
 URL: https://issues.apache.org/jira/browse/HIVE-10760
 Project: Hive
  Issue Type: Bug
  Components: HCatalog, Hive, WebHCat
Reporter: Lekha Thota
Assignee: Lekha Thota
Priority: Minor
 Attachments: 
 0001-Change-HCatalog-Get-Column-error-for-Non-existent-co.patch


 Apache Jira for https://hwxmonarch.atlassian.net/browse/HIVE-578



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9069) Simplify filter predicates for CBO

2015-05-19 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-9069:
--
Attachment: HIVE-9069.08.patch

 Simplify filter predicates for CBO
 --

 Key: HIVE-9069
 URL: https://issues.apache.org/jira/browse/HIVE-9069
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.14.1

 Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, 
 HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, 
 HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, HIVE-9069.patch


 Simplify disjunctive predicates so that they can get pushed down to 
 the scan.
 Looks like this is still an issue; some of the filters could be pushed down 
 to the scan but currently are not.
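The rewrite needed here is essentially distributivity: factoring a conjunct shared by every disjunct (e.g. the repeated ca_country = 'United States') so it becomes a top-level filter the scan can apply. A brute-force equivalence check (illustrative Python, not Hive's optimizer code):

```python
# Illustrative check (not Hive code) of the rewrite that makes such
# filters pushable:
#   (A and X) or (A and Y) or (A and Z)  ==  A and (X or Y or Z)
from itertools import product

def original(a, x, y, z):
    return (a and x) or (a and y) or (a and z)

def factored(a, x, y, z):
    # 'a' alone is now a top-level conjunct the scan can evaluate early.
    return a and (x or y or z)

# Brute-force over all truth assignments: the two forms are equivalent.
assert all(bool(original(*v)) == bool(factored(*v))
           for v in product([False, True], repeat=4))
```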
 {code}
 set hive.cbo.enable=true
 set hive.stats.fetch.column.stats=true
 set hive.exec.dynamic.partition.mode=nonstrict
 set hive.tez.auto.reducer.parallelism=true
 set hive.auto.convert.join.noconditionaltask.size=32000
 set hive.exec.reducers.bytes.per.reducer=1
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
 set hive.support.concurrency=false
 set hive.tez.exec.print.summary=true
 explain  
 select  substr(r_reason_desc,1,20) as r
,avg(ws_quantity) wq
,avg(wr_refunded_cash) ref
,avg(wr_fee) fee
  from web_sales, web_returns, web_page, customer_demographics cd1,
   customer_demographics cd2, customer_address, date_dim, reason 
  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
and web_sales.ws_item_sk = web_returns.wr_item_sk
and web_sales.ws_order_number = web_returns.wr_order_number
and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
and reason.r_reason_sk = web_returns.wr_reason_sk
and
(
 (
  cd1.cd_marital_status = 'M'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = '4 yr Degree'
  and 
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 100.00 and 150.00
 )
or
 (
  cd1.cd_marital_status = 'D'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = 'Primary' 
  and
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 50.00 and 100.00
 )
or
 (
  cd1.cd_marital_status = 'U'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = 'Advanced Degree'
  and
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 150.00 and 200.00
 )
)
and
(
 (
  ca_country = 'United States'
  and
  ca_state in ('KY', 'GA', 'NM')
  and ws_net_profit between 100 and 200  
 )
 or
 (
  ca_country = 'United States'
  and
  ca_state in ('MT', 'OR', 'IN')
  and ws_net_profit between 150 and 300  
 )
 or
 (
  ca_country = 'United States'
  and
  ca_state in ('WI', 'MO', 'WV')
  and ws_net_profit between 50 and 250  
 )
)
 group by r_reason_desc
 order by r, wq, ref, fee
 limit 100
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 9 - Map 1 (BROADCAST_EDGE)
 Reducer 3 - Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
 Reducer 4 - Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
 Reducer 5 - Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
 Reducer 6 - Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
 Reducer 7 - Reducer 6 (SIMPLE_EDGE)
 Reducer 8 - Reducer 7 (SIMPLE_EDGE)
   DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: web_page
   filterExpr: wp_web_page_sk is not null (type: boolean)
   Statistics: Num rows: 4602 Data size: 2696178 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: wp_web_page_sk is not null (type: boolean)
 Statistics: Num rows: 4602 Data size: 18408 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: wp_web_page_sk (type: int)
   

[jira] [Updated] (HIVE-10404) hive.exec.parallel=true causes out of sequence response and SocketTimeoutException: Read timed out

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10404:
---
Attachment: HIVE-10404.01.patch

After discussing with [~ashutoshc], we would like to estimate the effort 
needed to set hive.exec.parallel=true as the default.

 hive.exec.parallel=true causes out of sequence response and 
 SocketTimeoutException: Read timed out
 

 Key: HIVE-10404
 URL: https://issues.apache.org/jira/browse/HIVE-10404
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Eugene Koifman
 Attachments: HIVE-10404.01.patch


 With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() from 
 one thread on several Tasks. It then starts new threads to run those tasks.
 Task.initialize() gets an instance of Hive and holds on to it. Hive.java 
 internally uses a ThreadLocal to hand out instances, but since 
 Task.initialize() is called by a single thread from the Driver, multiple tasks 
 share an instance of Hive.
 Each Hive instance has a single instance of MetaStoreClient; the latter is 
 not thread-safe.
 With hive.exec.parallel=true, different threads actually execute the tasks, so 
 different threads end up sharing the same MetaStoreClient.
 If you make 2 concurrent calls, for example Hive.getTable(String), the Thrift 
 responses may return to the wrong caller.
 Thus the first caller gets an out-of-sequence response, drops this message and 
 reconnects. If the timing is right, it will consume the other's response, 
 but the other caller will block for hive.metastore.client.socket.timeout 
 since its response message has now been lost.
 This is just one concrete example.
 One possible fix is to make Task.db use a ThreadLocal.
 This could be related to HIVE-6893
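The proposed ThreadLocal fix can be sketched with Python's threading.local as a stand-in (a toy model, not Hive's classes): each executing thread lazily creates its own client, so no two threads ever share a Thrift connection.

```python
# Toy model (assumption: not Hive's code) of per-executing-thread clients,
# mirroring a ThreadLocal<Hive> that is read on the task thread itself
# rather than captured once on the Driver thread.
import threading

_local = threading.local()

class FakeMetaStoreClient:
    """Stand-in for the (non-thread-safe) Thrift metastore client."""
    def __init__(self):
        self.owner = threading.get_ident()

def get_client():
    # Lazily create a per-thread client on first use.
    if not hasattr(_local, "client"):
        _local.client = FakeMetaStoreClient()
    return _local.client

seen = []
def task():
    c = get_client()
    seen.append((threading.get_ident(), c))

threads = [threading.Thread(target=task) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# Each thread got its own client, created on that thread - no shared
# connection for Thrift responses to get crossed on.
assert len({id(c) for _, c in seen}) == 4
assert all(c.owner == tid for tid, c in seen)
```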



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551398#comment-14551398
 ] 

Ashutosh Chauhan commented on HIVE-10677:
-

+1

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10677.01.patch


 To reproduce it, in q tests.
 {code}
 hive> set hive.exec.parallel;
 hive.exec.parallel=true
 hive> analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10735) LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no closeOp()

2015-05-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551330#comment-14551330
 ] 

Sergey Shelukhin commented on HIVE-10735:
-

Yeah column vectors from VRBs are pooled and reused. We can remove that if 
needed... see swapColumnVector in LlapRecordReader.

 LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no 
 closeOp()
 ---

 Key: HIVE-10735
 URL: https://issues.apache.org/jira/browse/HIVE-10735
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Reporter: Gopal V
Assignee: Matt McCline
Priority: Critical

 Looks like some state is mutated during execution across threads in LLAP. 
 Either we can't share the operator objects across threads because they are 
 tied to the data objects per invocation, or this is missing a closeOp() which 
 resets the common setup between reuses.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:380)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:164)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
   ... 18 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:379)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:599)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.generateHashMultiSetResultRepeatedAll(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:304)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.finishInnerBigOnlyRepeated(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:328)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:201)
   ... 24 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:152)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow$StringReaderByValue.apply(VectorDeserializeRow.java:349)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserializeByValue(VectorDeserializeRow.java:688)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultSingleValue(VectorMapJoinGenerateResultOperator.java:177)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.finishInner(VectorMapJoinInnerGenerateResultOperator.java:201)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:359)
   ... 29 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10756) LLAP: Misc changes to daemon scheduling

2015-05-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-10756:
--
Attachment: HIVE-10756.1.txt

[~prasanth_j] - could you take a quick look, please?

 LLAP: Misc changes to daemon scheduling
 ---

 Key: HIVE-10756
 URL: https://issues.apache.org/jira/browse/HIVE-10756
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10756.1.txt


 Running the completion callback in a separate thread to avoid potentially 
 unnecessary preemptions.
 Sending out a kill to the AM only if the task was actually killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10735) LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no closeOp()

2015-05-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551263#comment-14551263
 ] 

Matt McCline commented on HIVE-10735:
-

The closeOp is in the VectorMapJoinGenerateResultOperator class, which 
overrides MapJoinOperator's closeOp.

I knew we cached and shared hash tables, but I was not aware we shared 
operators across threads.

 LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no 
 closeOp()
 ---

 Key: HIVE-10735
 URL: https://issues.apache.org/jira/browse/HIVE-10735
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Reporter: Gopal V
Assignee: Matt McCline
Priority: Critical

 Looks like some state is mutated during execution across threads in LLAP. 
 Either we can't share the operator objects across threads because they are 
 tied to the data objects per invocation, or this is missing a closeOp() which 
 resets the common setup between reuses.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:380)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:164)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
   ... 18 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:379)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:599)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.generateHashMultiSetResultRepeatedAll(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:304)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.finishInnerBigOnlyRepeated(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:328)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:201)
   ... 24 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:152)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow$StringReaderByValue.apply(VectorDeserializeRow.java:349)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserializeByValue(VectorDeserializeRow.java:688)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultSingleValue(VectorMapJoinGenerateResultOperator.java:177)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.finishInner(VectorMapJoinInnerGenerateResultOperator.java:201)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:359)
   ... 29 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10722) external table creation with msck in Hive can create unusable partition

2015-05-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10722:

Attachment: HIVE-10722.patch

This fixes the standard formatter to provide proper output, and adds validation 
to msck to make sure the metastore will actually create matching partitions.

 external table creation with msck in Hive can create unusable partition
 ---

 Key: HIVE-10722
 URL: https://issues.apache.org/jira/browse/HIVE-10722
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.1, 1.0.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-10722.patch


 There can be directories in HDFS containing unprintable characters; when 
 doing hadoop fs -ls, these characters are not even visible, and can only be 
 seen, for example, if the output is piped through od.
 When these are loaded via msck, they are stored in e.g. mysql as ? (literal 
 question mark, findable via LIKE '%?%' in db) and show accordingly in Hive.
 However, datanucleus appears to encode it as %3F; this causes the partition 
 to be unusable - it cannot be dropped, and other operations like drop table 
 get stuck (didn't investigate in detail why; drop table got unstuck as soon 
 as the partition was removed from metastore).
 We should probably have a 2-way option for such cases - error out on load 
 (default), or convert to '?'/drop such characters (and have partition that 
 actually works, too).
 We should also check if partitions with '?' inserted explicitly work at all 
 with datanucleus.
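The %3F in the description is the standard percent-escape for '?'. A round-trip sketch (generic URL-style escaping as a stand-in for whatever encoding datanucleus actually applies to partition names):

```python
# Round-trip sketch of percent-encoding (assumption: generic URL escaping,
# standing in for datanucleus's internal encoding of partition names).
from urllib.parse import quote, unquote

part_value = "2015-05-19?"        # value containing an unprintable-turned-'?'
stored = quote(part_value)        # what an escaping layer would persist

assert stored == "2015-05-19%3F"  # '?' is encoded as %3F
assert unquote(stored) == part_value

# A lookup by the raw value misses the escaped name - which is why the
# partition becomes undroppable until the mismatch is resolved.
assert stored != part_value
```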



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8222) CBO Trunk Merge: Fix Check Style issues

2015-05-19 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-8222.

Resolution: Won't Fix

 CBO Trunk Merge: Fix Check Style issues
 ---

 Key: HIVE-8222
 URL: https://issues.apache.org/jira/browse/HIVE-8222
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Lars Francke
 Attachments: HIVE-8222.1.patch, HIVE-8222.2.patch, HIVE-8222.3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10747) Enable the cleanup of side effect for the Encryption related qfile test

2015-05-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551188#comment-14551188
 ] 

Eugene Koifman commented on HIVE-10747:
---

I tried the same change earlier. It fixes the leak problem.

 Enable the cleanup of side effect for the Encryption related qfile test
 ---

 Key: HIVE-10747
 URL: https://issues.apache.org/jira/browse/HIVE-10747
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10747.patch


 The hive conf is not reset in the clearTestSideEffects method, which was 
 introduced in HIVE-8900. This pollutes other qfiles' settings when they are 
 run by TestEncryptedHDFSCliDriver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10756) LLAP: Misc changes to daemon scheduling

2015-05-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-10756.
---
Resolution: Fixed

Thanks. Committed.

 LLAP: Misc changes to daemon scheduling
 ---

 Key: HIVE-10756
 URL: https://issues.apache.org/jira/browse/HIVE-10756
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10756.1.txt


 Running the completion callback in a separate thread to avoid potentially 
 unnecessary preemptions.
 Sending out a kill to the AM only if the task was actually killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10735) LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no closeOp()

2015-05-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551349#comment-14551349
 ] 

Gopal V commented on HIVE-10735:


No, the pooling is not the bug - the vector row-batch has a definite lifetime 
until the end of processOp().

The setVal() up there is the right solution for the issue (it is hard to verify, 
though) - the issue is that the unit tests we run do not trigger the switch 
between files and the re-creation of new vector row-batches between invocations.
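The copy-vs-reference distinction behind that setVal() point can be sketched abstractly (a toy Python model, not the actual BytesColumnVector API): a column that only retains a reference into a pooled buffer observes whatever the next batch writes there, while a copied value survives the recycle.

```python
# Toy model (assumption: not the Hive classes) of reference vs copy
# semantics when the backing byte buffer is pooled and recycled.

class Column:
    def __init__(self):
        self.values = []

    def set_ref(self, buf, start, length):
        # Keeps a view into the shared buffer - breaks if buf is recycled.
        self.values.append(memoryview(buf)[start:start + length])

    def set_val(self, buf, start, length):
        # Copies the bytes out, so later reuse of buf cannot corrupt them.
        self.values.append(bytes(buf[start:start + length]))

pool_buf = bytearray(b"hello world")
by_ref, by_val = Column(), Column()
by_ref.set_ref(pool_buf, 0, 5)
by_val.set_val(pool_buf, 0, 5)

pool_buf[0:5] = b"XXXXX"  # the pool recycles the buffer for the next batch

assert bytes(by_ref.values[0]) == b"XXXXX"  # stale reference sees new data
assert by_val.values[0] == b"hello"         # copied value is unaffected
```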

 LLAP: Cached plan race condition - VectorMapJoinCommonOperator has no 
 closeOp()
 ---

 Key: HIVE-10735
 URL: https://issues.apache.org/jira/browse/HIVE-10735
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Reporter: Gopal V
Assignee: Matt McCline
Priority: Critical

 Looks like some state is mutated during execution across threads in LLAP. 
 Either we can't share the operator objects across threads because they are 
 tied to the data objects per invocation, or this is missing a closeOp() which 
 resets the common setup between reuses.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:380)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:114)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:164)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
   ... 18 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:379)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:850)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:599)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.generateHashMultiSetResultRepeatedAll(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:304)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyGenerateResultOperator.finishInnerBigOnlyRepeated(VectorMapJoinInnerBigOnlyGenerateResultOperator.java:328)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyLongOperator.process(VectorMapJoinInnerBigOnlyLongOperator.java:201)
   ... 24 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException
   at 
 org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:152)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow$StringReaderByValue.apply(VectorDeserializeRow.java:349)
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserializeByValue(VectorDeserializeRow.java:688)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.generateHashMapResultSingleValue(VectorMapJoinGenerateResultOperator.java:177)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.finishInner(VectorMapJoinInnerGenerateResultOperator.java:201)
   at 
 org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:359)
   ... 29 more
 {code}





[jira] [Commented] (HIVE-10745) Better null handling by Vectorizer

2015-05-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551350#comment-14551350
 ] 

Hive QA commented on HIVE-10745:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12733919/HIVE-10745.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8946 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3948/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3948/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3948/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12733919 - PreCommit-HIVE-TRUNK-Build

 Better null handling by Vectorizer
 --

 Key: HIVE-10745
 URL: https://issues.apache.org/jira/browse/HIVE-10745
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10745.1.patch, HIVE-10745.2.patch, HIVE-10745.patch


 Minor refactoring around null handling in Vectorization.





[jira] [Commented] (HIVE-10722) external table creation with msck in Hive can create unusable partition

2015-05-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551170#comment-14551170
 ] 

Sergey Shelukhin commented on HIVE-10722:
-

Cannot post to RB; it fails to validate the diff... probably because of the 
unprintable characters.

 external table creation with msck in Hive can create unusable partition
 ---

 Key: HIVE-10722
 URL: https://issues.apache.org/jira/browse/HIVE-10722
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.1, 1.0.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-10722.patch


 There can be directories in HDFS containing unprintable characters; when 
 doing hadoop fs -ls, these characters are not even visible, and can only be 
 seen, for example, if the output is piped through od.
 When these are loaded via msck, they are stored in e.g. mysql as ? (a literal 
 question mark, findable via LIKE '%?%' in the db) and show accordingly in Hive.
 However, datanucleus appears to encode it as %3F; this causes the partition 
 to be unusable - it cannot be dropped, and other operations like drop table 
 get stuck (didn't investigate in detail why; drop table got unstuck as soon 
 as the partition was removed from the metastore).
 We should probably have a 2-way option for such cases - error out on load 
 (default), or convert to '?'/drop such characters (and have a partition that 
 actually works, too).
 We should also check whether partitions with '?' inserted explicitly work at 
 all with datanucleus.
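
The "piped thru od" point can be demonstrated locally; the partition name 
below is made up purely for illustration:

```shell
# An unprintable byte (here \001) is invisible in plain output but shows
# up as "001" under od -c.
printf 'datepart=2015\001\n' | od -c | head -n 1
```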





[jira] [Updated] (HIVE-8007) Use proper Thrift comments

2015-05-19 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke updated HIVE-8007:
---
Attachment: HIVE-8007.2.patch

 Use proper Thrift comments
 --

 Key: HIVE-8007
 URL: https://issues.apache.org/jira/browse/HIVE-8007
 Project: Hive
  Issue Type: Improvement
Reporter: Lars Francke
Assignee: Lars Francke
Priority: Minor
 Attachments: HIVE-8007.1.patch, HIVE-8007.2.patch


 Currently the thrift file uses {{//}} to denote comments. Thrift understands 
 the {{/** ... */}} syntax and converts that into documentation in the 
 generated code. This patch changes the syntax.
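
For illustration, the change amounts to the following (the struct and field 
names are hypothetical):

```thrift
// before: a plain comment, dropped by the generator

/** after: a doc comment, emitted as documentation in the generated code */
struct Example {
  1: required string name
}
```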





[jira] [Updated] (HIVE-10327) Remove ExprNodeNullDesc

2015-05-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10327:

Fix Version/s: (was: 1.2.0)
   1.2.1

 Remove ExprNodeNullDesc
 ---

 Key: HIVE-10327
 URL: https://issues.apache.org/jira/browse/HIVE-10327
 Project: Hive
  Issue Type: Task
  Components: Query Planning
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 1.2.1

 Attachments: HIVE-10327.1.patch, HIVE-10327.2.patch, HIVE-10327.patch


 Its purpose can be served by ExprNodeConstantDesc.





[jira] [Updated] (HIVE-10760) Templeton: HCatalog Get Column for Non-existent column returns Server Error (500) rather than Not Found(404)

2015-05-19 Thread Lekha Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lekha Thota updated HIVE-10760:
---
Description: Apache Jira for 
https://hwxmonarch.atlassian.net/browse/HIVE-578  (was: Apache Jira for 
HIVE-578)

 Templeton: HCatalog Get Column for Non-existent column returns Server Error 
 (500) rather than Not Found(404)
 

 Key: HIVE-10760
 URL: https://issues.apache.org/jira/browse/HIVE-10760
 Project: Hive
  Issue Type: Bug
  Components: HCatalog, Hive, WebHCat
Reporter: Lekha Thota
Assignee: Lekha Thota
Priority: Minor

 Apache Jira for https://hwxmonarch.atlassian.net/browse/HIVE-578





[jira] [Commented] (HIVE-10745) Better null handling by Vectorizer

2015-05-19 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551087#comment-14551087
 ] 

Swarnim Kulkarni commented on HIVE-10745:
-

[~ashutoshc] Want to update the RB real quick? I can then help review this.

 Better null handling by Vectorizer
 --

 Key: HIVE-10745
 URL: https://issues.apache.org/jira/browse/HIVE-10745
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10745.1.patch, HIVE-10745.2.patch, HIVE-10745.patch


 Minor refactoring around null handling in Vectorization.





[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: (was: HIVE-6867.02.patch)

 Bucketized Table feature fails in some cases
 

 Key: HIVE-6867
 URL: https://issues.apache.org/jira/browse/HIVE-6867
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Pengcheng Xiong
 Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch


 Bucketized Table feature fails in some cases: if src & destination are 
 bucketed on the same key, and the actual data in src is not bucketed (because 
 the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be 
 bucketed while writing to the destination.
 Example
 --
 CREATE TABLE P1(key STRING, val STRING)
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE 
 P1;
 -- perform an insert to make sure there are 2 files
 INSERT OVERWRITE TABLE P1 select key, val from P1;
 --
 This is not a regression. This has never worked.
 This got only discovered due to Hadoop2 changes.
 In Hadoop1, in local mode, the number of reducers will always be 1, 
 regardless of what is requested by the app. Hadoop2 now honors the reducer 
 setting in local mode (by spawning threads).
 The long-term solution seems to be to prevent LOAD DATA for bucketed tables.





[jira] [Updated] (HIVE-10636) CASE comparison operator rotation optimization

2015-05-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10636:

Fix Version/s: (was: 1.2.0)
   1.2.1

 CASE comparison operator rotation optimization
 --

 Key: HIVE-10636
 URL: https://issues.apache.org/jira/browse/HIVE-10636
 Project: Hive
  Issue Type: New Feature
  Components: Logical Optimizer
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 1.2.1

 Attachments: HIVE-10636.1.patch, HIVE-10636.2.patch, 
 HIVE-10636.3.patch, HIVE-10636.patch


 Step 1 as outlined in description of HIVE-9644





[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10677:
---
Attachment: HIVE-10677.01.patch

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10677.01.patch


 To reproduce it, in q tests.
 {code}
 hive> set hive.exec.parallel;
 hive.exec.parallel=true
 hive> analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}





[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-19 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551310#comment-14551310
 ] 

Pengcheng Xiong commented on HIVE-10677:


[~ashutoshc] and [~jpullokkaran], could you please review the patch? Thanks.

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10677.01.patch


 To reproduce it, in q tests.
 {code}
 hive> set hive.exec.parallel;
 hive.exec.parallel=true
 hive> analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive> Job Submission failed with exception 'java.lang.RuntimeException(Error 
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}





[jira] [Commented] (HIVE-10593) Support creating table from a file schema: CREATE TABLE ... LIKE file_format '/path/to/file'

2015-05-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550725#comment-14550725
 ] 

Ryan Blue commented on HIVE-10593:
--

+1 to file only. Merging file schemas is way out of scope.

I like the syntax, though it would be nice if we could detect the file type in 
most cases via magic bytes. Avro and Parquet work that way, so we could keep a 
map of magic -> format. Anything else, like delimited text, doesn't work with 
this feature anyway.
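
The magic-bytes idea above could be sketched as follows; the class and method 
names are hypothetical, while the signatures themselves come from the public 
Parquet, Avro, and ORC file-format specs:

```java
// Sketch: map a file's leading bytes to a format name. Parquet files start
// with "PAR1", Avro object container files with "Obj" followed by 0x01,
// and ORC files with "ORC".
public class FormatSniffer {
    static String detect(byte[] head) {
        if (startsWith(head, new byte[]{'P', 'A', 'R', '1'})) return "PARQUET";
        if (startsWith(head, new byte[]{'O', 'b', 'j', 1}))  return "AVRO";
        if (startsWith(head, new byte[]{'O', 'R', 'C'}))     return "ORC";
        return "UNKNOWN"; // e.g. delimited text has no reliable signature
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[]{'P', 'A', 'R', '1', 0})); // prints PARQUET
    }
}
```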

 Support creating table from a file schema: CREATE TABLE ... LIKE 
 file_format '/path/to/file'
 --

 Key: HIVE-10593
 URL: https://issues.apache.org/jira/browse/HIVE-10593
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Lenni Kuff

 It would be useful if Hive could infer the column definitions in a create 
 table statement from the underlying data file. For example:
 CREATE TABLE new_tbl LIKE PARQUET '/path/to/file.parquet';
 If the targeted file is not the specified file format, the statement should 
 fail analysis. In addition to PARQUET, it would be useful to support other 
 formats such as AVRO, JSON, and ORC.





[jira] [Commented] (HIVE-10731) NullPointerException in HiveParser.g

2015-05-19 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550917#comment-14550917
 ] 

Laljo John Pullokkaran commented on HIVE-10731:
---

+1

 NullPointerException in HiveParser.g
 

 Key: HIVE-10731
 URL: https://issues.apache.org/jira/browse/HIVE-10731
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 1.2.0
Reporter: Xiu
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-10731.01.patch


 In HiveParser.g:
 {code:Java}
 protected boolean useSQL11ReservedKeywordsForIdentifier() {
 return !HiveConf.getBoolVar(hiveConf, 
 HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS);
 }
 {code}
 NullPointerException is thrown when hiveConf is not set.
 Stack trace:
 {code:Java}
 java.lang.NullPointerException
 at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2583)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.useSQL11ReservedKeywordsForIdentifier(HiveParser.java:1000)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.useSQL11ReservedKeywordsForIdentifier(HiveParser_IdentifiersParser.java:726)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10922)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45808)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38008)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36167)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5214)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
 at 
 org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
 at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
 at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
 at 
 org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
 {code}





[jira] [Commented] (HIVE-10752) Revert HIVE-5193

2015-05-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550921#comment-14550921
 ] 

Gopal V commented on HIVE-10752:


Since this is a significant performance hit, is there a pig+hcatalog bug report 
with a backtrace or logs?

 Revert HIVE-5193
 

 Key: HIVE-10752
 URL: https://issues.apache.org/jira/browse/HIVE-10752
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu

 Revert HIVE-5193 since it causes pig+hcatalog not working.





[jira] [Updated] (HIVE-10753) hs2 jdbc url - wrong connection string cause OOM error on beeline/jdbc/odbc client, misleading message

2015-05-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10753:
-
Attachment: HIVE-10753.1.patch

[~vgumashta] / [~thejas] Can you please review patch 1? Please let me know if 
you think there is a better way of telling the user that they entered a bad 
URL.

Thanks
Hari

 hs2 jdbc url - wrong connection string cause OOM error on beeline/jdbc/odbc 
 client, misleading message
 --

 Key: HIVE-10753
 URL: https://issues.apache.org/jira/browse/HIVE-10753
 Project: Hive
  Issue Type: Bug
  Components: Beeline, JDBC
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10753.1.patch


 {noformat}
 beeline -u 
 'jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http' -n 
 hdiuser
 scan complete in 15ms
 Connecting to 
 jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http
 Java heap space
 Beeline version 0.14.0.2.2.4.1-1 by Apache Hive
 0: jdbc:hive2://localhost:10001/default (closed)> ^Chdiuser@headnode0:~$ 
 But it works if I use the deprecated param - 
 hdiuser@headnode0:~$ beeline -u 
 'jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/'
  -n hdiuser
 scan complete in 12ms
 Connecting to 
 jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/
 15/04/28 23:16:46 [main]: WARN jdbc.Utils: * JDBC param deprecation *
 15/04/28 23:16:46 [main]: WARN jdbc.Utils: The use of 
 hive.server2.transport.mode is deprecated.
 15/04/28 23:16:46 [main]: WARN jdbc.Utils: Please use transportMode like so: 
 jdbc:hive2://host:port/dbName;transportMode=transport_mode_value
 Connected to: Apache Hive (version 0.14.0.2.2.4.1-1)
 Driver: Hive JDBC (version 0.14.0.2.2.4.1-1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 Beeline version 0.14.0.2.2.4.1-1 by Apache Hive
 0: jdbc:hive2://localhost:10001/default> show tables;
 +--+--+
 | tab_name |
 +--+--+
 | hivesampletable  |
 +--+--+
 1 row selected (18.181 seconds)
 0: jdbc:hive2://localhost:10001/default> ^Chdiuser@headnode0:~$ ^C
 {noformat}
 The reason for the above message is that the URL is wrong. The correct one is:
 {code}
 beeline -u 
 'jdbc:hive2://localhost:10001/default;httpPath=/;transportMode=http' -n 
 hdiuser
 {code}
 Note the ; instead of ?. The deprecation message prints the correct format as well: 
 {code}
 Please use transportMode like so: 
 jdbc:hive2://host:port/dbName;transportMode=transport_mode_value
 {code}


