[jira] [Commented] (HIVE-10522) CBO (Calcite Return Path): fix the wrong needed column names when TS is created

2015-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519120#comment-14519120
 ] 

Hive QA commented on HIVE-10522:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728961/HIVE-10522.02.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8825 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithURL
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3641/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3641/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3641/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728961 - PreCommit-HIVE-TRUNK-Build

 CBO (Calcite Return Path): fix the wrong needed column names when TS is 
 created
 ---

 Key: HIVE-10522
 URL: https://issues.apache.org/jira/browse/HIVE-10522
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Critical
 Fix For: 1.2.0

 Attachments: HIVE-10522.01.patch, HIVE-10522.02.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10529) Remove references to tez task context before storing operator plan in object cache

2015-04-29 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-10529:

Attachment: HIVE-10529.2.patch

 Remove references to tez task context before storing operator plan in object 
 cache
 --

 Key: HIVE-10529
 URL: https://issues.apache.org/jira/browse/HIVE-10529
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: HIVE-10529.1.patch, HIVE-10529.2.patch, 
 hive_hashtable_loader.png








[jira] [Commented] (HIVE-10519) Move TestGenericUDF classes to udf.generic package

2015-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519206#comment-14519206
 ] 

Hive QA commented on HIVE-10519:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728877/HIVE-10519.2.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8824 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3642/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3642/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3642/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728877 - PreCommit-HIVE-TRUNK-Build

 Move TestGenericUDF classes to udf.generic package
 --

 Key: HIVE-10519
 URL: https://issues.apache.org/jira/browse/HIVE-10519
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial
 Attachments: HIVE-10519.1.patch, HIVE-10519.2.patch


 The following TestGenericUDF classes are located in udf package instead of 
 udf.generic.
 {code}
 TestGenericUDFDate.java
 TestGenericUDFDateAdd.java
 TestGenericUDFDateDiff.java
 TestGenericUDFDateSub.java
 TestGenericUDFUtils.java
 {code}





[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline

2015-04-29 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519333#comment-14519333
 ] 

Ferdinand Xu commented on HIVE-10511:
-

Yes, that's one option. I am wondering whether it would break compatibility.

 Replacing the implementation of Hive CLI using Beeline
 --

 Key: HIVE-10511
 URL: https://issues.apache.org/jira/browse/HIVE-10511
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Ferdinand Xu

 Hive CLI is a legacy tool which had two main use cases:
 1. a thick client for SQL on Hadoop
 2. a command line tool for HiveServer1.
 HiveServer1 is already deprecated and removed from the Hive code base, so use
 case #2 is out of the question. For #1, Beeline provides or is supposed to
 provide equal functionality, yet is implemented differently from Hive CLI.
 As the Hive community has been recommending the Beeline + HS2 configuration
 for a while, ideally we should deprecate Hive CLI. Because of the wide
 use of Hive CLI, we instead propose replacing Hive CLI's implementation with
 Beeline plus embedded HS2 so that the Hive community only needs to maintain a
 single code path. In this way, Hive CLI is just an alias to Beeline at either
 the shell script level or at a high code level. The goal is that no changes or
 minimal changes are expected from existing user scripts using Hive CLI.
 This is an umbrella JIRA covering all tasks related to this initiative. Over
 the last year or two, Beeline has been improved significantly to match what
 Hive CLI offers. Still, there may be some gaps or deficiencies to be
 discovered and fixed. In the meantime, we also want to make sure that enough
 tests are included and that the performance impact is identified and addressed.





[jira] [Commented] (HIVE-9674) *DropPartitionEvent should handle partition-sets.

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518827#comment-14518827
 ] 

Sushanth Sowmyan commented on HIVE-9674:


All the failures listed here have nothing to do with the patch itself. I 
wondered about the 
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection 
failure, since it's in HCat's package. However, that is not related to this 
patch, and on retesting locally, it seems to pass. Also, that's likely to be 
related to ACID rather than this, since its error log looks like the following:

{noformat}
java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.duplicateDescriptorException(Unknown Source)
    at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.addDescriptor(Unknown Source)
    at org.apache.derby.impl.sql.execute.CreateTableConstantAction.executeConstantAction(Unknown Source)
    at org.apache.derby.impl.sql.execute.MiscResultSet.open(Unknown Source)
    at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
    at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
    at org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:72)
    at org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:131)
    at org.apache.hive.hcatalog.streaming.TestStreaming.init(TestStreaming.java:157)
{noformat}

 *DropPartitionEvent should handle partition-sets.
 -

 Key: HIVE-9674
 URL: https://issues.apache.org/jira/browse/HIVE-9674
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9674.2.patch, HIVE-9674.3.patch, HIVE-9674.4.patch, 
 HIVE-9674.5.patch


 Dropping a set of N partitions from a table currently results in N 
 DropPartitionEvents (and N PreDropPartitionEvents) being fired serially. This 
 is wasteful, especially so for large N. It also makes it impossible to even 
 try to run authorization-checks on all partitions in a batch.
 Taking the cue from HIVE-9609, we should compose an {{IterablePartition}} 
 in the event, and expose them via an {{Iterator}}.
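
 The idea above (one event for the whole batch, with the partitions exposed
 through an Iterator) could look roughly like the sketch below. All names here
 (BatchedDropEventSketch, DropPartitionsEvent, dropPartitions) are hypothetical
 and partitions are reduced to plain strings; this is not the metastore's
 actual event API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class BatchedDropEventSketch {
    /** Hypothetical event that carries every partition dropped in one call. */
    static final class DropPartitionsEvent {
        private final String table;
        private final List<String> partitionNames;

        DropPartitionsEvent(String table, List<String> partitionNames) {
            this.table = table;
            // Defensive copy so listeners cannot mutate the event's contents.
            this.partitionNames =
                Collections.unmodifiableList(new ArrayList<>(partitionNames));
        }

        String getTable() {
            return table;
        }

        /** Listeners walk the whole partition set from a single event. */
        Iterator<String> getPartitionIterator() {
            return partitionNames.iterator();
        }
    }

    /** Fires one event for the batch instead of one event per partition. */
    static List<DropPartitionsEvent> dropPartitions(String table, List<String> names) {
        return Collections.singletonList(new DropPartitionsEvent(table, names));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("dt=20150101/region=us");
        names.add("dt=20150101/region=eu");
        for (DropPartitionsEvent e : dropPartitions("my_table", names)) {
            Iterator<String> it = e.getPartitionIterator();
            while (it.hasNext()) {
                System.out.println(e.getTable() + " dropped " + it.next());
            }
        }
    }
}
```

 A listener that wants to authorize the drop can then inspect all partitions
 in one pass rather than reacting to N separate events.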





[jira] [Commented] (HIVE-10520) LLAP: Must reset small table result columns for Native Vectorization of Map Join

2015-04-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518837#comment-14518837
 ] 

Jason Dere commented on HIVE-10520:
---

+1 if tests look good

 LLAP: Must reset small table result columns for Native Vectorization of Map 
 Join
 

 Key: HIVE-10520
 URL: https://issues.apache.org/jira/browse/HIVE-10520
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Blocker
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10520.01.patch, HIVE-10520.02.patch


 Scratch columns not getting reset by input source, so native vector map join 
 operators must manually reset small table result columns.





[jira] [Updated] (HIVE-10071) CBO (Calcite Return Path): Join to MultiJoin rule

2015-04-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10071:
---
Attachment: HIVE-10071.patch

[~ashutoshc], [~jpullokkaran], could you take a look? This patch will merge 
Join operators on the same key into MultiJoin operators; currently it only 
supports inner joins. The patch is already rather large, so I have split the
work and created HIVE-10533 to follow up on outer join
support. Thanks.

 CBO (Calcite Return Path): Join to MultiJoin rule
 -

 Key: HIVE-10071
 URL: https://issues.apache.org/jira/browse/HIVE-10071
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10071.patch








[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-29 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518834#comment-14518834
 ] 

Mithun Radhakrishnan commented on HIVE-9736:


Hello, Chris. 

bq. ... we can combine the multiple actions by using FsAction#or, and then call 
accessMethod.invoke just once...

Yikes! I might've missed incorporating that suggestion by accident. Thank you 
for following up. I'll update the patch shortly.
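
The suggestion quoted above (combine the actions with FsAction#or, then check
once) can be sketched as follows. The Action enum and the combine helper are
hypothetical stand-ins written for illustration, modelled on rwx permission
bits; they are not Hadoop's or Hive's classes, and the real patch may differ.

```java
import java.util.Arrays;
import java.util.List;

final class CombinedActionSketch {
    /** Hypothetical stand-in: the ordinal doubles as the rwx bit pattern. */
    enum Action {
        NONE, EXECUTE, WRITE, WRITE_EXECUTE, READ, READ_EXECUTE, READ_WRITE, ALL;

        /** Union of two actions, computed by OR-ing their bit patterns. */
        Action or(Action that) {
            return values()[ordinal() | that.ordinal()];
        }

        /** True if this action grants every bit that the other one requests. */
        boolean implies(Action that) {
            return (ordinal() & that.ordinal()) == that.ordinal();
        }
    }

    /** Folds all requested actions into one, so a single check is enough. */
    static Action combine(List<Action> requested) {
        Action combined = Action.NONE;
        for (Action a : requested) {
            combined = combined.or(a);
        }
        return combined;
    }

    public static void main(String[] args) {
        Action needed = combine(Arrays.asList(Action.READ, Action.WRITE));
        // One check against the combined action instead of one per action.
        System.out.println(Action.ALL.implies(needed));
        System.out.println(Action.READ.implies(needed));
    }
}
```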

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.





[jira] [Commented] (HIVE-10437) NullPointerException on queries where map/reduce is not involved on tables with partitions

2015-04-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518859#comment-14518859
 ] 

Ashutosh Chauhan commented on HIVE-10437:
-

Failures are unrelated. [~hagleitn] can you take a look?

 NullPointerException on queries where map/reduce is not involved on tables 
 with partitions
 --

 Key: HIVE-10437
 URL: https://issues.apache.org/jira/browse/HIVE-10437
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Demeter Sztanko
Assignee: Ashutosh Chauhan
Priority: Critical
 Attachments: HIVE-10437.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 On a table with partitions, whenever I try to do a simple query which tells 
 hive not to execute mapreduce but just read data straight from hdfs, it 
 raises an exception:
 {code}
 create external table jsonbug(
 a int,
 b string
 )
 PARTITIONED BY (
 `c` string)
 ROW FORMAT SERDE
   'org.openx.data.jsonserde.JsonSerDe'
 WITH SERDEPROPERTIES (
   'ignore.malformed.json'='true')
 STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
 LOCATION
   '/tmp/jsonbug';
 ALTER TABLE jsonbug ADD PARTITION(c='1');
 {code}
 Running a simple
 {code}
 select * from jsonbug;
 {code}
 Raises the following exception:
 {code}
 FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException
     at org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607)
     at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:578)
     at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
     at org.apache.hadoop.hive.ql.exec.FetchOperator.init(FetchOperator.java:140)
     at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455)
     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 {code}
 It works fine if I execute a query involving map/reduce job though.
 This problem occurs only when using SerDes created for Hive versions before
 1.1.0, that is, those that do not have the @SerDeSpec annotation specified. Most
 third-party SerDes, including hcat's JsonSerde, have this problem as well.
 It seems that the changes made in HIVE-7977 introduced this bug. See
 org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607)
 {code}
 Class<?> tableSerDe = tableDesc.getDeserializerClass();
 String[] schemaProps = AnnotationUtils.getAnnotation(tableSerDe, SerDeSpec.class).schemaProps();
 {code}
 And it also seems like a relatively easy fix.
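
 To illustrate why the snippet above fails for un-annotated classes, and what
 a null guard could look like, here is a minimal self-contained sketch. The
 SpecLike annotation and both SerDe classes are hypothetical stand-ins, and
 treating a missing annotation as "no schema properties" is an assumption made
 for the example, not necessarily the behavior the actual fix chose.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

final class AnnotationGuardSketch {
    /** Hypothetical stand-in for an annotation such as SerDeSpec. */
    @Retention(RetentionPolicy.RUNTIME)
    @interface SpecLike {
        String[] schemaProps();
    }

    /** A class that declares its schema properties. */
    @SpecLike(schemaProps = {"columns", "columns.types"})
    static class AnnotatedSerDe {}

    /** A class written before the annotation existed. */
    static class LegacySerDe {}

    /**
     * getAnnotation returns null when the class is not annotated, so calling
     * schemaProps() on the result without a check throws a NullPointerException.
     */
    static String[] schemaPropsOf(Class<?> serDeClass) {
        SpecLike spec = serDeClass.getAnnotation(SpecLike.class);
        return spec == null ? new String[0] : spec.schemaProps();
    }

    public static void main(String[] args) {
        System.out.println(schemaPropsOf(AnnotatedSerDe.class).length);
        System.out.println(schemaPropsOf(LegacySerDe.class).length);
    }
}
```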





[jira] [Commented] (HIVE-10485) Create md5 UDF

2015-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518945#comment-14518945
 ] 

Hive QA commented on HIVE-10485:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728862/HIVE-10485.3.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8829 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3639/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3639/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3639/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728862 - PreCommit-HIVE-TRUNK-Build

 Create md5 UDF
 --

 Key: HIVE-10485
 URL: https://issues.apache.org/jira/browse/HIVE-10485
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10485.1.patch, HIVE-10485.2.patch, 
 HIVE-10485.3.patch


 MD5(str)
 Calculates an MD5 128-bit checksum for the string. The value is returned as a 
 string of 32 hex digits, or NULL if the argument was NULL. The return value 
 can, for example, be used as a hash key.
 Example:
 {code}
 SELECT MD5('udf_md5');
 'ce62ef0d2d27dc37b6d488b92f4b24fd'
 {code}
 online md5 generator: http://www.md5.cz/
 MySQL has md5 function: 
 https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
 PostgreSQL also has md5 function: 
 http://www.postgresql.org/docs/9.1/static/functions-string.html
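
 For reference, the behavior described above (32 hex digits, NULL for a NULL
 argument) can be reproduced in plain Java with java.security.MessageDigest.
 This is only a sketch of the semantics: the class and method names are
 hypothetical, UTF-8 encoding of the input is an assumption, and it is not the
 UDF implementation from the attached patches.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Sketch {
    /** Returns the MD5 digest of str as 32 lowercase hex digits, or null for null input. */
    static String md5Hex(String str) {
        if (str == null) {
            return null; // NULL in, NULL out, as the description specifies
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the string is hashed as UTF-8 bytes.
            byte[] digest = md.digest(str.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // The issue description lists the expected digest for this input.
        System.out.println(md5Hex("udf_md5"));
    }
}
```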





[jira] [Commented] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-04-29 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518904#comment-14518904
 ] 

Bing Li commented on HIVE-10495:


The failure should not be related to this patch.

 Hive index creation code throws NPE if index table is null
 --

 Key: HIVE-10495
 URL: https://issues.apache.org/jira/browse/HIVE-10495
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Bing Li
Assignee: Bing Li
 Attachments: HIVE-10495.1.patch


 The stack trace would be:
 Caused by: java.lang.NullPointerException
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
     at java.lang.reflect.Method.invoke(Method.java:611)
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
     at $Proxy9.add_index(Unknown Source)
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)





[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518943#comment-14518943
 ] 

Sushanth Sowmyan commented on HIVE-9736:


Hi, just so this gets into the precommit queue, could you upload a 
HIVE-9736.5.patch which is really the combination of HIVE-9681 and 
HIVE-9736.4.patch and set this jira to patch-available?

When committing it, I'll be sure to use the .4.patch, even uploading a new
.6.patch which is its equivalent to make it clear for future jira visitors, but
this would make the precommit queue pick it up.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.





[jira] [Updated] (HIVE-10530) Aggregate stats cache: bug fixes for RDBMS path

2015-04-29 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-10530:

Fix Version/s: 1.2.0

 Aggregate stats cache: bug fixes for RDBMS path
 ---

 Key: HIVE-10530
 URL: https://issues.apache.org/jira/browse/HIVE-10530
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 1.2.0








[jira] [Updated] (HIVE-10531) Implement isClosed() to HiveQueryResultSet

2015-04-29 Thread Yun-young LEE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun-young LEE updated HIVE-10531:
-
Attachment: HiveQueryResultSet_isClosed.patch

implement isClosed() on HiveQueryResultSet

 Implement isClosed() to HiveQueryResultSet
 --

 Key: HIVE-10531
 URL: https://issues.apache.org/jira/browse/HIVE-10531
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 1.1.0
Reporter: Yun-young LEE
Priority: Minor
 Attachments: HiveQueryResultSet_isClosed.patch


 HiveQueryResultSet could implement the isClosed() method using its isClosed
 field, but it currently inherits the HiveBaseResultSet behavior, which throws
 SQLException(Method not supported).
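
 The change being asked for amounts to overriding the unsupported base method
 with one that returns the existing flag. A minimal sketch, using hypothetical
 stand-in classes (BaseResultSet, QueryResultSet) rather than the real Hive
 JDBC classes or the attached patch:

```java
import java.sql.SQLException;

final class IsClosedSketch {
    /** Hypothetical base class: isClosed() is unsupported by default. */
    static class BaseResultSet {
        public boolean isClosed() throws SQLException {
            throw new SQLException("Method not supported");
        }
    }

    /** Hypothetical subclass that already tracks its own closed state. */
    static class QueryResultSet extends BaseResultSet {
        private boolean isClosed = false;

        public void close() {
            isClosed = true;
        }

        /** Overrides the unsupported default by exposing the flag. */
        @Override
        public boolean isClosed() {
            return isClosed;
        }
    }

    public static void main(String[] args) {
        QueryResultSet rs = new QueryResultSet();
        System.out.println(rs.isClosed());
        rs.close();
        System.out.println(rs.isClosed());
    }
}
```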





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-04-29 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518930#comment-14518930
 ] 

Bing Li commented on HIVE-4577:
---

The failure should not be related to this patch.

 hive CLI can't handle hadoop dfs command  with space and quotes.
 

 Key: HIVE-4577
 URL: https://issues.apache.org/jira/browse/HIVE-4577
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.1.0
Reporter: Bing Li
Assignee: Bing Li
 Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
 HIVE-4577.3.patch.txt, HIVE-4577.4.patch


 By design, Hive supports hadoop dfs commands in the Hive shell, for example:
 hive> dfs -mkdir /user/biadmin/mydir;
 However, it behaves differently from hadoop if the path contains spaces or quotes:
 hive> dfs -mkdir hello;
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 /user/biadmin/hello
 hive> dfs -mkdir 'world';
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 /user/biadmin/'world'
 hive> dfs -mkdir bei jing;
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 /user/biadmin/bei
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 /user/biadmin/jing
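
 The difference reported above comes down to how the argument string is split
 into tokens. The sketch below shows one shell-like way to do it, where quotes
 group words and are then dropped. The class and method names are
 hypothetical, escape sequences are not handled, and this is an assumption
 about the desired behavior rather than the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

final class DfsArgTokenizerSketch {
    /** Splits on whitespace, except inside single or double quotes. */
    static List<String> tokenize(String command) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 means we are not inside a quoted section
        boolean inToken = false; // true once the current token has started
        for (char c : command.toCharArray()) {
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;         // closing quote: drop it
                } else {
                    current.append(c); // spaces inside quotes are kept
                }
            } else if (c == '\'' || c == '"') {
                quote = c;             // opening quote: drop it, start a token
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-mkdir 'bei jing'"));
        System.out.println(tokenize("-mkdir bei jing"));
    }
}
```

 With this splitting, a quoted path yields one directory argument, while the
 unquoted form still yields two, which matches how a Unix shell would pass the
 arguments to hadoop.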





[jira] [Updated] (HIVE-10382) Aggregate stats cache for RDBMS based metastore codepath

2015-04-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-10382:
--
Labels: TODOC1.2  (was: )

 Aggregate stats cache for RDBMS based metastore codepath
 

 Key: HIVE-10382
 URL: https://issues.apache.org/jira/browse/HIVE-10382
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10382.1.patch, HIVE-10382.1.patch, 
 HIVE-10382.2.patch, HIVE-10382.2.patch, HIVE-10382.3.patch


 Similar to the work done on the HBase branch (HIVE-9693), the stats cache can 
 potentially have performance gains.





[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline

2015-04-29 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518851#comment-14518851
 ] 

Ferdinand Xu commented on HIVE-10511:
-

I have an investigation for the hive cli. It can help us define the scope of 
this jira.
*Options*
HiveCli is supporting the following options(Detailed information is available 
in the class OptionsProcessor):
* database:
* execute quoted query string
* execute query file
* specify initial query file
* set hiveconf
* define variable
* set hivevar
* silent mode
* verbose
* help

To replace the Hive CLI, we need to implement all of these options one by one 
using Beeline functionality.

*Interactive commands*
Below are the commands used by the Hive CLI:
* quit, exit
* source
* +commands beginning with ! (execute a shell command directly)+
* processLocalCommand (executed by a CommandProcessor: Driver, 
AddResourceProcessor, etc.)

In Beeline, commands beginning with '!' are treated as SQLLine commands rather 
than shell commands. To address this, we need to add a configuration option 
that lets the user choose between the Beeline and the Hive CLI command style.
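
A minimal sketch of how such a command-style switch could behave, assuming a two-valued setting. The names CommandStyle, Kind and classify are hypothetical and do not come from Beeline or the Hive CLI:

```java
// Hypothetical illustration only; not Beeline or Hive CLI code.
final class BangCommandSketch {
    /** Assumed configuration values for the proposed command-style option. */
    enum CommandStyle { BEELINE, HIVE_CLI }

    /** How an input line would be handled. */
    enum Kind { SHELL_COMMAND, BEELINE_COMMAND, QUERY }

    static Kind classify(String line, CommandStyle style) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("!")) {
            // Anything without a leading '!' goes down the normal query path.
            return Kind.QUERY;
        }
        // The same '!' prefix means different things depending on the style.
        return style == CommandStyle.HIVE_CLI ? Kind.SHELL_COMMAND : Kind.BEELINE_COMMAND;
    }
}
```

With this shape, existing Hive CLI scripts that contain lines such as `!ls` would keep their meaning when the style is set to the Hive CLI value.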

Other points, including RCFileCat and completers, should be supported 
using Beeline functionality.
Any thoughts about it, [~xuefuz]?

 Replacing the implementation of Hive CLI using Beeline
 --

 Key: HIVE-10511
 URL: https://issues.apache.org/jira/browse/HIVE-10511
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Ferdinand Xu

 Hive CLI is a legacy tool which had two main use cases: 
 1. a thick client for SQL on hadoop
 2. a command line tool for HiveServer1.
 HiveServer1 is already deprecated and removed from the Hive code base, so use 
 case #2 is out of the question. For #1, Beeline provides or is supposed to 
 provide equal functionality, yet is implemented differently from Hive CLI.
 As the Hive community has been recommending the Beeline + HS2 configuration 
 for a while, ideally we should deprecate Hive CLI. Because of the wide 
 use of Hive CLI, we instead propose replacing Hive CLI's implementation with 
 Beeline plus embedded HS2 so that the Hive community only needs to maintain a 
 single code path. In this way, Hive CLI is just an alias to Beeline at either 
 the shell script level or at a high code level. The goal is that no changes or 
 minimal changes are expected from existing user scripts using Hive CLI.
 This is an umbrella JIRA covering all tasks related to this initiative. Over 
 the last year or two, Beeline has been improved significantly to match what 
 Hive CLI offers. Still, there may be some gaps or deficiencies to be 
 discovered and fixed. In the meantime, we also want to make sure that enough 
 tests are included and that the performance impact is identified and addressed.





[jira] [Commented] (HIVE-10513) [CBO] return path : Fix create_func1.q for return path

2015-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518759#comment-14518759
 ] 

Hive QA commented on HIVE-10513:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728846/HIVE-10513.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8824 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3637/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3637/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3637/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728846 - PreCommit-HIVE-TRUNK-Build

 [CBO] return path : Fix create_func1.q for return path
 --

 Key: HIVE-10513
 URL: https://issues.apache.org/jira/browse/HIVE-10513
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10513.patch


 throws class cast exception.





[jira] [Commented] (HIVE-10500) Repeated deadlocks in underlying RDBMS cause transaction or lock failure

2015-04-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518978#comment-14518978
 ] 

Lefty Leverenz commented on HIVE-10500:
---

If the trunk commit is 8981f365bf0cf921bc0ac2ff8914df44ca2f7de7 then it has a 
typo in the JIRA number:  HIVE-10050 Added backoff for deadlock retry. Also 
make sure to reset the deadlock counter at appropriate points.

 Repeated deadlocks in underlying RDBMS cause transaction or lock failure
 

 Key: HIVE-10500
 URL: https://issues.apache.org/jira/browse/HIVE-10500
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 1.2.0

 Attachments: HIVE-10050.patch


 In some cases in a busy system, deadlocks in the metastore RDBMS can cause 
 failures in Hive locks and transactions when using DbTxnManager
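
A minimal sketch of a deadlock retry loop with backoff, to illustrate the idea named in the commit message. It is not the code from the patch: the class and method names are hypothetical, and treating SQLState "40001" as the deadlock signal is an assumption that does not hold for every RDBMS.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper; the actual handling lives in the metastore's transaction code.
final class DeadlockRetrySketch {
    /** Assumption: the database reports deadlocks with SQLState 40001. */
    static boolean isDeadlock(SQLException e) {
        return "40001".equals(e.getSQLState());
    }

    /**
     * Runs op, retrying on deadlock with exponentially growing sleeps.
     * The counter is local, so every new operation starts again from zero.
     */
    static <T> T runWithRetry(Callable<T> op, int maxRetries, long baseSleepMs) throws Exception {
        int deadlockCount = 0;
        while (true) {
            try {
                return op.call();
            } catch (SQLException e) {
                if (!isDeadlock(e) || deadlockCount >= maxRetries) {
                    throw e; // not a deadlock, or retries exhausted
                }
                deadlockCount++;
                // Back off: baseSleepMs, then 2x, 4x, ... before the next attempt.
                Thread.sleep(baseSleepMs << (deadlockCount - 1));
            }
        }
    }
}
```

Keeping the counter scoped to a single operation is one simple way to make sure it is reset between operations; a shared counter would need explicit resets on success.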





[jira] [Resolved] (HIVE-10532) SpecificMutableRow doesn't handle Date Type correctly

2015-04-29 Thread Cheng Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao resolved HIVE-10532.
--
Resolution: Invalid

Oops.

 SpecificMutableRow doesn't handle Date Type correctly
 -

 Key: HIVE-10532
 URL: https://issues.apache.org/jira/browse/HIVE-10532
 Project: Hive
  Issue Type: Bug
Reporter: Cheng Hao

 {code}
   test("test DATE types in cache") {
     val rows = TestSQLContext.jdbc(urlWithUserAndPass, "TEST.TIMETYPES").collect()
     TestSQLContext.jdbc(urlWithUserAndPass, "TEST.TIMETYPES").cache().registerTempTable("mycached_date")
     val cachedRows = sql("select * from mycached_date").collect()
     assert(rows(0).getAs[java.sql.Date](1) === java.sql.Date.valueOf("1996-01-01"))
     assert(cachedRows(0).getAs[java.sql.Date](1) === java.sql.Date.valueOf("1996-01-01"))
   }
 {code}
 {panel}
 java.lang.ClassCastException: 
 org.apache.spark.sql.catalyst.expressions.MutableAny cannot be cast to 
 org.apache.spark.sql.catalyst.expressions.MutableInt
   at 
 org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.getInt(SpecificMutableRow.scala:252)
   at 
 org.apache.spark.sql.columnar.IntColumnStats.gatherStats(ColumnStats.scala:208)
   at 
 org.apache.spark.sql.columnar.NullableColumnBuilder$class.appendFrom(NullableColumnBuilder.scala:56)
   at 
 org.apache.spark.sql.columnar.NativeColumnBuilder.org$apache$spark$sql$columnar$compression$CompressibleColumnBuilder$$super$appendFrom(ColumnBuilder.scala:87)
   at 
 org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.appendFrom(CompressibleColumnBuilder.scala:78)
   at 
 org.apache.spark.sql.columnar.NativeColumnBuilder.appendFrom(ColumnBuilder.scala:87)
   at 
 org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:148)
   at 
 org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:124)
   at 
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
   at 
 org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
   at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 {panel}





[jira] [Commented] (HIVE-10530) Aggregate stats cache: bug fixes for RDBMS path

2015-04-29 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518879#comment-14518879
 ] 

Vaibhav Gumashta commented on HIVE-10530:
-

cc [~mmokhtar] [~thejas]

 Aggregate stats cache: bug fixes for RDBMS path
 ---

 Key: HIVE-10530
 URL: https://issues.apache.org/jira/browse/HIVE-10530
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 1.2.0

 Attachments: HIVE-10530.1.patch








[jira] [Commented] (HIVE-9681) Extend HiveAuthorizationProvider to support partition-sets.

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518861#comment-14518861
 ] 

Sushanth Sowmyan commented on HIVE-9681:


+1, marking as patch-available for tests to run, now that HIVE-9674 has been 
committed.

 Extend HiveAuthorizationProvider to support partition-sets.
 ---

 Key: HIVE-9681
 URL: https://issues.apache.org/jira/browse/HIVE-9681
 Project: Hive
  Issue Type: Bug
  Components: Security
Affects Versions: 0.14.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9681.1.patch, HIVE-9681.2.patch


 {{HiveAuthorizationProvider}} allows only for the authorization of a single 
 partition at a time. For instance, when the {{StorageBasedAuthProvider}} must 
 authorize an operation on a set of partitions (say from a 
 PreDropPartitionEvent), each partition's data-directory needs to be checked 
 individually. For N partitions, this results in N namenode calls.
 I'd like to add {{authorize()}} overloads that accept multiple partitions. 
 This will allow StorageBasedAuthProvider to make batched namenode calls. 
 P.S. There are 2 further optimizations that are possible:
 1. In the ideal case, we'd have a single call in 
 {{org.apache.hadoop.fs.FileSystem}} to check access for an array of Paths, 
 something like:
 {code:title=FileSystem.java|borderStyle=solid}
 @InterfaceAudience.LimitedPrivate({"HDFS", "Hive"})
 public void access(Path[] paths, FsAction mode)
     throws AccessControlException, FileNotFoundException, IOException
 {...}
 {code}
 2. We can go one better if we could retrieve partition-locations in DirectSQL 
 and use those for authorization. The EventListener-abstraction behind which 
 the AuthProviders operate makes this difficult. I can attempt to solve this 
 using a PartitionSpec and a call-back into the ObjectStore from 
 StorageBasedAuthProvider. I'll save this rigmarole for later.
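
To make the cost difference concrete, here is a toy sketch that counts round trips for per-partition versus batched checks. CountingNameNode and BatchedAuthSketch are hypothetical stand-ins using plain strings for paths; they are not Hive or Hadoop classes, and the access rule inside them is a placeholder.

```java
import java.util.List;

// Hypothetical stand-in for a namenode client; it only counts round trips.
final class CountingNameNode {
    int calls = 0;

    /** One round trip per path. */
    boolean checkAccess(String path) {
        calls++;
        return !path.isEmpty(); // placeholder rule, not a real permission check
    }

    /** One round trip for the whole batch. */
    boolean checkAccess(List<String> paths) {
        calls++;
        for (String p : paths) {
            if (p.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}

final class BatchedAuthSketch {
    /** Current behaviour as described: N partitions cost N calls. */
    static boolean authorizeOneByOne(CountingNameNode nn, List<String> partitionDirs) {
        for (String dir : partitionDirs) {
            if (!nn.checkAccess(dir)) {
                return false;
            }
        }
        return true;
    }

    /** Proposed shape: an overload that takes the whole set costs one call. */
    static boolean authorizeBatched(CountingNameNode nn, List<String> partitionDirs) {
        return nn.checkAccess(partitionDirs);
    }
}
```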





[jira] [Comment Edited] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518943#comment-14518943
 ] 

Sushanth Sowmyan edited comment on HIVE-9736 at 4/29/15 8:13 AM:
-

Hi, just so this gets into the precommit queue, could you upload a 
HIVE-9736.5.patch which is really the combination of HIVE-9681 and 
HIVE-9736.4.patch and set this jira to patch-available?

When committing it, I'll be sure to use the .4.patch, even uploading a new 
.6.patch which is its equivalent to make it clear for future jira visitors, but 
this would make the precommit queue pick it up.


was (Author: sushanth):
Hi, just so this gets into the precommit queue, could you upload a 
HIVE-9736.5.patch which is really the combination of HIVE-9681 and 
HIVE-9736.4.patch and set this jira to patch-available?

When committing it, I'll be sure to use the .4.patch, even uploading a new 
.6.patch which is its equivalent to make it clear for future java visitors, but 
this would make the precommit queue pick it up.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.





[jira] [Commented] (HIVE-10382) Aggregate stats cache for RDBMS based metastore codepath

2015-04-29 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518994#comment-14518994
 ] 

Lefty Leverenz commented on HIVE-10382:
---

Doc note:  This adds 10 configuration parameters to HiveConf.java, so they need 
to be documented in the Metastore section of Configuration Properties -- should 
they also be discussed (or just listed) in the Metastore Admin doc?

#  hive.metastore.aggregate.stats.cache.enabled
#  hive.metastore.aggregate.stats.cache.size
#  hive.metastore.aggregate.stats.cache.max.partitions
#  hive.metastore.aggregate.stats.cache.fpp
#  hive.metastore.aggregate.stats.cache.max.variance
#  hive.metastore.aggregate.stats.cache.ttl
#  hive.metastore.aggregate.stats.cache.max.writer.wait
#  hive.metastore.aggregate.stats.cache.max.reader.wait
#  hive.metastore.aggregate.stats.cache.max.full
#  hive.metastore.aggregate.stats.cache.clean.until

* [Configuration Properties -- Metastore | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore]
* [Metastore Administration | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin]
** [Metastore Admin -- Additional Configuration Parameters | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-AdditionalConfigurationParameters]

 Aggregate stats cache for RDBMS based metastore codepath
 

 Key: HIVE-10382
 URL: https://issues.apache.org/jira/browse/HIVE-10382
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10382.1.patch, HIVE-10382.1.patch, 
 HIVE-10382.2.patch, HIVE-10382.2.patch, HIVE-10382.3.patch


 Similar to the work done on the HBase branch (HIVE-9693), the stats cache can 
 potentially have performance gains.





[jira] [Commented] (HIVE-3404) Create quarter UDF

2015-04-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518847#comment-14518847
 ] 

Jason Dere commented on HIVE-3404:
--

+1

 Create quarter UDF
 --

 Key: HIVE-3404
 URL: https://issues.apache.org/jira/browse/HIVE-3404
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Sanam Naz
Assignee: Alexander Pivovarov
 Attachments: HIVE-3404.1.patch.txt, HIVE-3404.2.patch, 
 HIVE-3404.2.patch, HIVE-3404.3.patch


 The function QUARTER(date) would return the quarter from a string / date / 
 timestamp. This will be useful for different domains like retail, finance, etc.
 MySQL has a QUARTER function:
 https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_quarter
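
A minimal sketch of the quarter calculation, assuming ISO-formatted date strings (yyyy-MM-dd). QuarterSketch is a hypothetical helper, not the UDF from the attached patches, which may accept more input types:

```java
import java.time.LocalDate;

// Hypothetical helper illustrating the arithmetic only.
final class QuarterSketch {
    /** Maps a month number 1..12 to a quarter 1..4. */
    static int quarterOfMonth(int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        return (month - 1) / 3 + 1;
    }

    /** Returns the quarter for an ISO date string, or null for null input. */
    static Integer quarter(String isoDate) {
        if (isoDate == null) {
            return null;
        }
        return quarterOfMonth(LocalDate.parse(isoDate).getMonthValue());
    }
}
```

For example, `QuarterSketch.quarter("2015-04-29")` evaluates to 2, since April falls in the second quarter.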





[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata

2015-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518850#comment-14518850
 ] 

Hive QA commented on HIVE-9451:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728854/HIVE-9451.patch

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 8827 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_multi_insert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3638/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3638/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3638/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 38 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728854 - PreCommit-HIVE-TRUNK-Build

 Add max size of 

[jira] [Updated] (HIVE-10071) CBO (Calcite Return Path): Join to MultiJoin rule

2015-04-29 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10071:
---
Description: CBO return path: auto_join3.q can be used to reproduce the 
problem.

 CBO (Calcite Return Path): Join to MultiJoin rule
 -

 Key: HIVE-10071
 URL: https://issues.apache.org/jira/browse/HIVE-10071
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10071.patch


 CBO return path: auto_join3.q can be used to reproduce the problem.





[jira] [Commented] (HIVE-9674) *DropPartitionEvent should handle partition-sets.

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518829#comment-14518829
 ] 

Sushanth Sowmyan commented on HIVE-9674:


Will go ahead and commit this to master and to 1.2.

 *DropPartitionEvent should handle partition-sets.
 -

 Key: HIVE-9674
 URL: https://issues.apache.org/jira/browse/HIVE-9674
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9674.2.patch, HIVE-9674.3.patch, HIVE-9674.4.patch, 
 HIVE-9674.5.patch


 Dropping a set of N partitions from a table currently results in N 
 DropPartitionEvents (and N PreDropPartitionEvents) being fired serially. This 
 is wasteful, especially so for large N. It also makes it impossible to even 
 try to run authorization-checks on all partitions in a batch.
 Taking the cue from HIVE-9609, we should compose an {{Iterable<Partition>}} 
 in the event, and expose them via an {{Iterator}}.
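
The idea can be sketched as a single event that carries the whole partition set. The class names below are hypothetical, and plain partition names stand in for the metastore's Partition objects:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical event type; the real events live in the Hive metastore.
final class DropPartitionsEventSketch {
    private final String tableName;
    private final List<String> partitionNames;

    DropPartitionsEventSketch(String tableName, List<String> partitionNames) {
        this.tableName = tableName;
        // Defensive copy so listeners cannot mutate the event's contents.
        this.partitionNames = Collections.unmodifiableList(new ArrayList<>(partitionNames));
    }

    String getTableName() {
        return tableName;
    }

    /** One event exposes all dropped partitions through a single iterator. */
    Iterator<String> getPartitionIterator() {
        return partitionNames.iterator();
    }
}

// Hypothetical listener that records how often it is invoked.
final class CountingListenerSketch {
    int eventsSeen = 0;
    int partitionsSeen = 0;

    void onDropPartitions(DropPartitionsEventSketch event) {
        eventsSeen++;
        Iterator<String> it = event.getPartitionIterator();
        while (it.hasNext()) {
            it.next();
            partitionsSeen++;
        }
    }
}
```

A listener receiving one such event for N partitions can run its checks over the whole set at once instead of being called N times.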





[jira] [Commented] (HIVE-10485) Create md5 UDF

2015-04-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518838#comment-14518838
 ] 

Jason Dere commented on HIVE-10485:
---

+1 if tests look good

 Create md5 UDF
 --

 Key: HIVE-10485
 URL: https://issues.apache.org/jira/browse/HIVE-10485
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10485.1.patch, HIVE-10485.2.patch, 
 HIVE-10485.3.patch


 MD5(str)
 Calculates an MD5 128-bit checksum for the string. The value is returned as a 
 string of 32 hex digits, or NULL if the argument was NULL. The return value 
 can, for example, be used as a hash key.
 Example:
 {code}
 SELECT MD5('udf_md5');
 'ce62ef0d2d27dc37b6d488b92f4b24fd'
 {code}
 online md5 generator: http://www.md5.cz/
 MySQL has md5 function: 
 https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
 PostgreSQL also has md5 function: 
 http://www.postgresql.org/docs/9.1/static/functions-string.html
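
A minimal sketch of the described behaviour using the JDK's MessageDigest, assuming the input string is encoded as UTF-8. Md5Sketch is a hypothetical helper, not the UDF class from the attached patches:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper illustrating the contract described above.
final class Md5Sketch {
    /** Returns the MD5 digest of s as 32 lowercase hex digits, or null for null input. */
    static String md5Hex(String s) {
        if (s == null) {
            return null;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte prints as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```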





[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-29 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-9736:
---
Attachment: HIVE-9736.5.patch

As per [~sushanth]'s suggestion, I've squashed the patches for HIVE-9681 and 
HIVE-9736 into a single one. This should allow the patch to apply to trunk, for 
tests.

(Good idea, Sush.)

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.





[jira] [Updated] (HIVE-10530) Aggregate stats cache: bug fixes for RDBMS path

2015-04-29 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-10530:

Attachment: HIVE-10530.1.patch

 Aggregate stats cache: bug fixes for RDBMS path
 ---

 Key: HIVE-10530
 URL: https://issues.apache.org/jira/browse/HIVE-10530
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 1.2.0

 Attachments: HIVE-10530.1.patch








[jira] [Updated] (HIVE-9451) Add max size of column dictionaries to ORC metadata

2015-04-29 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-9451:
---
Labels: ORC  (was: )

 Add max size of column dictionaries to ORC metadata
 ---

 Key: HIVE-9451
 URL: https://issues.apache.org/jira/browse/HIVE-9451
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
  Labels: ORC
 Fix For: 1.2.0

 Attachments: HIVE-9451.patch, HIVE-9451.patch


 To predict the amount of memory required to read an ORC file we need to know 
 the size of the dictionaries for the columns that we are reading. I propose 
 adding the number of bytes for each column's dictionary to the stripe's 
 column statistics. The file's column statistics would have the maximum 
 dictionary size for each column.
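
A minimal sketch of the proposed roll-up, assuming per-stripe dictionary sizes are available as a plain array. DictionarySizeSketch and its array layout are hypothetical and are not the ORC metadata format:

```java
// Hypothetical helper; stripeDictBytes[s][c] is the dictionary size in bytes
// of column c in stripe s.
final class DictionarySizeSketch {
    /** File-level statistic: the maximum dictionary size per column across stripes. */
    static long[] maxPerColumn(long[][] stripeDictBytes, int columnCount) {
        long[] max = new long[columnCount];
        for (long[] stripe : stripeDictBytes) {
            for (int c = 0; c < columnCount; c++) {
                max[c] = Math.max(max[c], stripe[c]);
            }
        }
        return max;
    }

    /** Upper bound on dictionary memory when reading only the given columns. */
    static long estimateBytes(long[] fileMax, int[] columnsToRead) {
        long total = 0;
        for (int c : columnsToRead) {
            total += fileMax[c];
        }
        return total;
    }
}
```

Because a reader holds the dictionaries of one stripe at a time, summing the per-column maxima gives a bound that holds for every stripe in the file.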



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-04-29 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-9736:
---
Attachment: HIVE-9736.4.patch

The reason this patch didn't apply to trunk/ is that it depends on HIVE-9681. :/

Here's the patch that incorporates [~cnauroth]'s suggestion to combine 
{{FsActions}} into a single instance, to reduce RPCs. I'm afraid that still 
doesn't obviate the overload we added to {{Hadoop*Shims*}} since we needed a 
new overload anyway to pluralize the {{FileStatus}} argument.

The compromise in the patch is to reduce the RPC calls, but keep the overload.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8890) HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe

2015-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519024#comment-14519024
 ] 

Hive QA commented on HIVE-8890:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728868/HIVE-8890.4.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8826 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3640/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3640/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3640/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728868 - PreCommit-HIVE-TRUNK-Build

 HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator 
 recipe
 

 Key: HIVE-8890
 URL: https://issues.apache.org/jira/browse/HIVE-8890
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
Priority: Critical
 Fix For: 1.2.0

 Attachments: HIVE-8890.1.patch, HIVE-8890.2.patch, HIVE-8890.3.patch, 
 HIVE-8890.4.patch


 Using this recipe gives better reliability.





[jira] [Updated] (HIVE-10529) Remove references to tez task context before storing operator plan in object cache

2015-04-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10529:
---
Summary: Remove references to tez task context before storing operator plan 
in object cache  (was: Cleanup tezcontext reference in 
org.apache.hadoop.hive.ql.exec.tez.HashTableLoader)

 Remove references to tez task context before storing operator plan in object 
 cache
 --

 Key: HIVE-10529
 URL: https://issues.apache.org/jira/browse/HIVE-10529
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: HIVE-10529.1.patch, hive_hashtable_loader.png








[jira] [Updated] (HIVE-10286) SARGs: Type Safety via PredicateLeaf.type

2015-04-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10286:
-
Attachment: HIVE-10286.4.patch

Added more tests to cover all branches in predicate evaluation. 

 SARGs: Type Safety via PredicateLeaf.type
 -

 Key: HIVE-10286
 URL: https://issues.apache.org/jira/browse/HIVE-10286
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Serializers/Deserializers
Reporter: Gopal V
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10286.1.patch, HIVE-10286.2.patch, 
 HIVE-10286.4.patch


 The Sargs impl today converts the statsObj to the type of the predicate 
 object before doing any comparisons.
 To satisfy the PPD requirements, the conversion has to be coerced to the type 
 specified in PredicateLeaf.type.
 The type conversions in Hive are standard and have a fixed promotion order.
 Therefore the PredicateLeaf has to do type changes which match the exact 
 order of type coercions offered by the FilterOperator.
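
A small sketch of why the comparison type matters, assuming a simplified model: `LeafType` and `compare` below are hypothetical and are not Hive's `PredicateLeaf` API. It compares a column statistic with a predicate literal after coercing both to the type declared on the leaf, rather than to the literal's own Java type.

```java
final class SargTypeSketch {
    // Hypothetical subset of leaf types; not Hive's PredicateLeaf.Type.
    enum LeafType { INTEGER, FLOAT, STRING }

    static long toLong(Object o) {
        return (o instanceof Number) ? ((Number) o).longValue() : Long.parseLong(o.toString().trim());
    }

    static double toDouble(Object o) {
        return (o instanceof Number) ? ((Number) o).doubleValue() : Double.parseDouble(o.toString().trim());
    }

    // Coerce both sides to the declared leaf type, then compare.
    static int compare(Object statsValue, Object literal, LeafType type) {
        switch (type) {
            case INTEGER:
                return Long.compare(toLong(statsValue), toLong(literal));
            case FLOAT:
                return Double.compare(toDouble(statsValue), toDouble(literal));
            default:
                return statsValue.toString().compareTo(literal.toString());
        }
    }

    public static void main(String[] args) {
        Object statsMax = 10L;   // statistic from an integer column
        Object literal = "9";    // literal that arrived as a string
        // Compared as strings, "10" sorts before "9"; compared as integers, 10 > 9.
        System.out.println(compare(statsMax, literal, LeafType.STRING) < 0);
        System.out.println(compare(statsMax, literal, LeafType.INTEGER) > 0);
    }
}
```

With the string comparison, a filter such as `col > '9'` could wrongly conclude that no row in the stripe qualifies; coercing to the declared type avoids that in this toy model.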





[jira] [Commented] (HIVE-10513) [CBO] return path : Fix create_func1.q for return path

2015-04-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519584#comment-14519584
 ] 

Ashutosh Chauhan commented on HIVE-10513:
-

[~jpullokkaran] can you take a look?

 [CBO] return path : Fix create_func1.q for return path
 --

 Key: HIVE-10513
 URL: https://issues.apache.org/jira/browse/HIVE-10513
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10513.patch


 throws class cast exception.





[jira] [Assigned] (HIVE-10480) LLAP: Tez task is interrupted for unknown reason after an IPC exception and then fails to report completion

2015-04-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-10480:
-

Assignee: Siddharth Seth

 LLAP: Tez task is interrupted for unknown reason after an IPC exception and 
 then fails to report completion
 ---

 Key: HIVE-10480
 URL: https://issues.apache.org/jira/browse/HIVE-10480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
 Attachments: HIVE-10480.1.txt


 No idea if this is an LLAP bug, a Tez bug, a Hadoop IPC bug (due to a patch on 
 the cluster), or all 3.
 So for now I will just dump all I have here.
 TPCH Q1 started taking a long time for me on a large number of runs today 
 (didn't happen yesterday). It is always one Map task timing out.
 Example attempt (logs from the AM):
 {noformat}
 2015-04-24 11:11:01,073 INFO [TaskCommunicator # 0] 
 tezplugins.LlapTaskCommunicator: Successfully launched task: 
 attempt_1429683757595_0321_9_00_000928_0
 2015-04-24 11:16:25,498 INFO [Dispatcher thread: Central] 
 history.HistoryEventHandler: 
 [HISTORY][DAG:dag_1429683757595_0321_9][Event:TASK_ATTEMPT_FINISHED]: 
 vertexName=Map 1, taskAttemptId=attempt_1429683757595_0321_9_00_000928_0, 
 startTime=1429899061071, finishTime=1429899385498, timeTaken=324427, 
 status=FAILED, errorEnum=TASK_HEARTBEAT_ERROR, 
 diagnostics=AttemptID:attempt_1429683757595_0321_9_00_000928_0 Timed out 
 after 300 secs, counters=Counters: 1, 
 org.apache.tez.common.counters.DAGCounter, RACK_LOCAL_TASKS=1
 {noformat}
 No other lines for this attempt in between.
 However there's this:
 {noformat}
 2015-04-24 11:11:01,074 WARN [Socket Reader #1 for port 59446] ipc.Server: 
 Unable to read call parameters for client 172.19.128.56on connection protocol 
 org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol for rpcKind 
 RPC_WRITABLE
 java.lang.ArrayIndexOutOfBoundsException
 2015-04-24 11:11:01,075 INFO [Socket Reader #1 for port 59446] ipc.Server: 
 Socket Reader #1 for port 59446: readAndProcess from client 172.19.128.56 
 threw exception [org.apache.hadoop.ipc.RpcServerException: IPC server unable 
 to read call parameters: null]
 {noformat}
 On LLAP, the following is logged 
 {noformat}
 2015-04-24 11:11:01,142 [TaskHeartbeatThread()] ERROR 
 org.apache.tez.runtime.task.TezTaskRunner: TaskReporter reported error
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException):
  IPC server unable to read call parameters: null
 at org.apache.hadoop.ipc.Client.call(Client.java:1492)
 at org.apache.hadoop.ipc.Client.call(Client.java:1423)
 at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
 at com.sun.proxy.$Proxy19.heartbeat(Unknown Source)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:258)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:186)
 at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:128)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The attempt starts but is then interrupted (not clear by whom)
 {noformat}
 2015-04-24 11:11:01,144 [Initializer 
 0(container_1_0321_01_008943_sershe_20150424110948_86ce1f6f-7cd2-4a40-b9a6-4a6854f010f6:9_Map
  1_928_0)] INFO org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: 
 Initialized Input with src edge: lineitem
 2015-04-24 11:11:01,145 
 [TezTaskRunner_attempt_1429683757595_0321_9_00_000928_0(container_1_0321_01_008943_sershe_20150424110948_86ce1f6f-7cd2-4a40-b9a6-4a6854f010f6:9_Map
  1_928_0)] INFO org.apache.tez.runtime.task.TezTaskRunner: Encounted an error 
 while executing task: attempt_1429683757595_0321_9_00_000928_0
 java.lang.InterruptedException
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
 at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:439)
 at 
 java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218)
 at 
 

[jira] [Commented] (HIVE-9365) The Metastore should take port configuration from hive-site.xml

2015-04-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519625#comment-14519625
 ] 

Sergio Peña commented on HIVE-9365:
---

[~sircodesalot] could you upload the patch to the review board?

 The Metastore should take port configuration from hive-site.xml
 ---

 Key: HIVE-9365
 URL: https://issues.apache.org/jira/browse/HIVE-9365
 Project: Hive
  Issue Type: Improvement
Reporter: Nicolas Thiébaud
Assignee: Reuben Kuhnert
Priority: Minor
  Labels: metastore
 Attachments: HIVE-9365.01.patch, HIVE-9365.02.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 As opposed to the CLI. Having this configuration in the launcher script 
 creates fragmentation and is not consistent with the way the Hive stack 
 is configured.





[jira] [Commented] (HIVE-10524) Add utility method ExprNodeDescUtils.forwardTrack()

2015-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519754#comment-14519754
 ] 

Hive QA commented on HIVE-10524:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728910/HIVE-10524.1.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8826 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3646/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3646/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3646/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728910 - PreCommit-HIVE-TRUNK-Build

 Add utility method ExprNodeDescUtils.forwardTrack()
 ---

 Key: HIVE-10524
 URL: https://issues.apache.org/jira/browse/HIVE-10524
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10524.1.patch


 ExprNodeDescUtils has a method backtrack(), which is able to take an 
 ExprNodeDesc from an operator and convert it to an equivalent expression 
 based on the columns of a parent operator. Adding a forwardTrack() method to 
 do something similar, but for a child operator.
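
A rough sketch of the two directions, assuming a heavily simplified model in which an operator's column mapping is just "output column name to upstream column name". The real methods work on `ExprNodeDesc` trees; `TrackSketch` and its signatures are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TrackSketch {
    // colExprMap: an operator's output column -> the upstream column it is derived from.

    // Backtrack: rewrite an output column in terms of the upstream operator's columns.
    static String backtrack(String column, Map<String, String> colExprMap) {
        return colExprMap.get(column);
    }

    // Forward track: find the first output column derived from the given upstream
    // column, or null if the upstream column is not carried forward.
    static String forwardTrack(String upstreamColumn, Map<String, String> colExprMap) {
        for (Map.Entry<String, String> e : colExprMap.entrySet()) {
            if (e.getValue().equals(upstreamColumn)) return e.getKey();
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> colExprMap = new LinkedHashMap<>();
        colExprMap.put("_col0", "key");
        colExprMap.put("_col1", "value");
        System.out.println(backtrack("_col1", colExprMap));
        System.out.println(forwardTrack("key", colExprMap));
    }
}
```

Forward tracking is the inverse lookup, so it can fail (here it returns null) when the downstream operator drops the column.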





[jira] [Commented] (HIVE-10513) [CBO] return path : Fix create_func1.q for return path

2015-04-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520044#comment-14520044
 ] 

Laljo John Pullokkaran commented on HIVE-10513:
---

+1

 [CBO] return path : Fix create_func1.q for return path
 --

 Key: HIVE-10513
 URL: https://issues.apache.org/jira/browse/HIVE-10513
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10513.patch


 throws class cast exception.





[jira] [Commented] (HIVE-10535) LLAP: Cleanup map join cache when a query completes

2015-04-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520051#comment-14520051
 ] 

Prasanth Jayachandran commented on HIVE-10535:
--

Duplicate of HIVE-10456? I added cleanup code for map join case as well.

 LLAP: Cleanup map join cache when a query completes
 ---

 Key: HIVE-10535
 URL: https://issues.apache.org/jira/browse/HIVE-10535
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
 Fix For: llap








[jira] [Commented] (HIVE-10522) CBO (Calcite Return Path): fix the wrong needed column names when TS is created

2015-04-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520097#comment-14520097
 ] 

Laljo John Pullokkaran commented on HIVE-10522:
---

+1

 CBO (Calcite Return Path): fix the wrong needed column names when TS is 
 created
 ---

 Key: HIVE-10522
 URL: https://issues.apache.org/jira/browse/HIVE-10522
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Critical
 Fix For: 1.2.0

 Attachments: HIVE-10522.01.patch, HIVE-10522.02.patch








[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-04-29 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520122#comment-14520122
 ] 

Peter Slawski commented on HIVE-10538:
--

Currently working on testing the fix for this issue.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski

 A NullPointerException occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}
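
One way a hash mismatch can surface as a NullPointerException, sketched under stated assumptions and not as a diagnosis of this bug: suppose each reducer keeps a map from "bucket number it expects to receive" to "writer offset", and looks rows up by a bucket number computed from the row's hash. `BucketLookupSketch`, its `bucketMapFor` assignment rule and its `findWriterOffset` are illustrative only and are not copied from Hive's source.

```java
import java.util.HashMap;
import java.util.Map;

final class BucketLookupSketch {
    static final int NUM_BUCKETS = 256;
    static final int NUM_REDUCERS = 20;

    static int bucketOf(int hash) {
        return (hash & Integer.MAX_VALUE) % NUM_BUCKETS;
    }

    // Writer offsets for the buckets a given reducer expects to receive
    // (assumed rule: bucket b goes to reducer b % NUM_REDUCERS).
    static Map<Integer, Integer> bucketMapFor(int reducer) {
        Map<Integer, Integer> m = new HashMap<>();
        int offset = 0;
        for (int b = 0; b < NUM_BUCKETS; b++) {
            if (b % NUM_REDUCERS == reducer) m.put(b, offset++);
        }
        return m;
    }

    // Unboxing the looked-up Integer throws NullPointerException when the bucket is absent.
    static int findWriterOffset(Map<Integer, Integer> bucketMap, int hash) {
        return bucketMap.get(bucketOf(hash));
    }

    public static void main(String[] args) {
        Map<Integer, Integer> bucketMap = bucketMapFor(13);
        // Hash computed the same way on both sides: the bucket is present.
        System.out.println(findWriterOffset(bucketMap, 113));
        try {
            // A different hash for the same row points at a bucket this reducer does not own.
            findWriterOffset(bucketMap, 114);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on missing bucket");
        }
    }
}
```

If the partitioning side and the writing side disagree on the hash of a row, the row can arrive at a reducer whose map has no entry for the bucket the writer computes, which is the failure shape shown in this sketch.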





[jira] [Commented] (HIVE-10529) Remove references to tez task context before storing operator plan in object cache

2015-04-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519890#comment-14519890
 ] 

Gunther Hagleitner commented on HIVE-10529:
---

+1

 Remove references to tez task context before storing operator plan in object 
 cache
 --

 Key: HIVE-10529
 URL: https://issues.apache.org/jira/browse/HIVE-10529
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: HIVE-10529.1.patch, HIVE-10529.2.patch, 
 hive_hashtable_loader.png








[jira] [Commented] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh

2015-04-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520062#comment-14520062
 ] 

Eugene Koifman commented on HIVE-10423:
---

Is there a reason the downloaded artifacts need to be under the source tree?
Could they be placed in testdist/?

 HIVE-7948 breaks deploy_e2e_artifacts.sh
 

 Key: HIVE-10423
 URL: https://issues.apache.org/jira/browse/HIVE-10423
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Aswathy Chellammal Sreekumar
 Attachments: HIVE-10423.patch


 HIVE-7948 added a step to download a ml-1m.zip file and unzip it.
 This only works if you call deploy_e2e_artifacts.sh once.  If you call it 
 again (which is very common in dev) it blocks and asks for additional input 
 from the user because the target files already exist.
 This needs to be changed similarly to what we discussed for HIVE-9272, i.e. 
 place artifacts not under source control in testdist/.





[jira] [Updated] (HIVE-10428) NPE in RegexSerDe using HCat

2015-04-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10428:

Affects Version/s: 1.1.0
   0.13.0
   0.14.0
   1.0.0

 NPE in RegexSerDe using HCat
 

 Key: HIVE-10428
 URL: https://issues.apache.org/jira/browse/HIVE-10428
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 1.2.0

 Attachments: HIVE-10428.1.patch, HIVE-10428.2.patch


 When HCatalog reads a table that uses org.apache.hadoop.hive.serde2.RegexSerDe, 
 it throws an exception:
 {noformat}
 15/04/21 14:07:31 INFO security.TokenCache: Got dt for hdfs://hdpsecahdfs; 
 Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hdpsecahdfs, Ident: 
 (HDFS_DELEGATION_TOKEN token 1478 for haha)
 15/04/21 14:07:31 INFO mapred.FileInputFormat: Total input paths to process : 
 1
 Splits len : 1
 SplitInfo : [hdpseca03.seca.hwxsup.com, hdpseca04.seca.hwxsup.com, 
 hdpseca05.seca.hwxsup.com]
 15/04/21 14:07:31 INFO mapreduce.InternalUtil: Initializing 
 org.apache.hadoop.hive.serde2.RegexSerDe with properties 
 {name=casetest.regex_table, numFiles=1, columns.types=string,string, 
 serialization.format=1, columns=id,name, rawDataSize=0, numRows=0, 
 output.format.string=%1$s %2$s, 
 serialization.lib=org.apache.hadoop.hive.serde2.RegexSerDe, 
 COLUMN_STATS_ACCURATE=true, totalSize=25, serialization.null.format=\N, 
 input.regex=([^ ]*) ([^ ]*), transient_lastDdlTime=1429590172}
 15/04/21 14:07:31 WARN serde2.RegexSerDe: output.format.string has been 
 deprecated
 Exception in thread main java.lang.NullPointerException
   at 
 com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
   at com.google.common.base.Splitter.split(Splitter.java:371)
   at 
 org.apache.hadoop.hive.serde2.RegexSerDe.initialize(RegexSerDe.java:155)
   at 
 org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:49)
   at 
 org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:518)
   at 
 org.apache.hive.hcatalog.mapreduce.InternalUtil.initializeDeserializer(InternalUtil.java:156)
   at 
 org.apache.hive.hcatalog.mapreduce.HCatRecordReader.createDeserializer(HCatRecordReader.java:127)
   at 
 org.apache.hive.hcatalog.mapreduce.HCatRecordReader.initialize(HCatRecordReader.java:92)
   at HCatalogSQLMR.main(HCatalogSQLMR.java:81)
 {noformat}





[jira] [Commented] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520036#comment-14520036
 ] 

Sushanth Sowmyan commented on HIVE-10444:
-

Added to 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status as a 
blocker.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Gunther Hagleitner

 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.





[jira] [Commented] (HIVE-10453) HS2 leaking open file descriptors when using UDFs

2015-04-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520087#comment-14520087
 ] 

Yongzhi Chen commented on HIVE-10453:
-

A session may use several classloaders (for example, when different threads 
run its queries). At session close time, it may not find all of those 
classloaders to close them, which caused the leak. Fixed by adding every 
classloader used in the registerToSessionRegistry method to a set and 
closing them all at session close time.
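
A minimal sketch of that approach, assuming a simplified session object: `SessionLoaderRegistry` is a hypothetical class, not Hive's session code or the attached patch. It records every classloader handed to it and closes them all when the session closes, using the standard `URLClassLoader.close()` to release open jar handles.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical session-scoped registry of classloaders.
final class SessionLoaderRegistry implements Closeable {
    // Identity-based set: the same loader registered twice is tracked once.
    private final Set<URLClassLoader> loaders = Collections.synchronizedSet(
        Collections.newSetFromMap(new IdentityHashMap<URLClassLoader, Boolean>()));

    // Called wherever a thread installs a classloader for this session.
    void register(URLClassLoader loader) { loaders.add(loader); }

    int size() { return loaders.size(); }

    // Close every tracked loader; remember the first failure and keep going.
    @Override
    public void close() throws IOException {
        IOException first = null;
        synchronized (loaders) {
            for (URLClassLoader loader : loaders) {
                try {
                    loader.close();
                } catch (IOException e) {
                    if (first == null) first = e;
                }
            }
            loaders.clear();
        }
        if (first != null) throw first;
    }

    public static void main(String[] args) throws IOException {
        try (SessionLoaderRegistry session = new SessionLoaderRegistry()) {
            session.register(new URLClassLoader(new URL[0]));
            session.register(new URLClassLoader(new URL[0]));
            System.out.println("tracked loaders: " + session.size());
        }
    }
}
```

Tracking loaders in a set rather than in a single field is what lets the close step reach loaders created on other threads.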

 HS2 leaking open file descriptors when using UDFs
 -

 Key: HIVE-10453
 URL: https://issues.apache.org/jira/browse/HIVE-10453
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10453.1.patch


 1. Create a custom function:
 CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar';
 2. Create a simple JDBC client that connects and runs a simple query that 
 uses the function, such as:
 select myfunc(col1) from sometable
 3. Disconnect.
 Check the open files for HiveServer2 with:
 lsof -p HSProcID | grep myudf.jar
 You will see the leak as:
 {noformat}
 java  28718 ychen  txt  REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 java  28718 ychen  330r REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 {noformat}





[jira] [Updated] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join

2015-04-29 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-10403:
-
Attachment: HIVE-10403.07.patch

Upload the same patch (renamed to 07) just in case the current Jenkins job fails

 Add n-way join support for Hybrid Grace Hash Join
 -

 Key: HIVE-10403
 URL: https://issues.apache.org/jira/browse/HIVE-10403
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Attachments: HIVE-10403.01.patch, HIVE-10403.02.patch, 
 HIVE-10403.03.patch, HIVE-10403.04.patch, HIVE-10403.06.patch, 
 HIVE-10403.07.patch


 Currently Hybrid Grace Hash Join only supports 2-way join (one big table and 
 one small table). This task will enable n-way join (one big table and 
 multiple small tables).





[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values

2015-04-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519997#comment-14519997
 ] 

Alexander Pivovarov commented on HIVE-10488:


Got it! Thank you.

BTW, I remember there was an issue with date to timestamp conversion for 
negative unix time, HIVE-10178. But it is already fixed and has nothing to do 
with ORC.
{code}
select cast(cast('1966-01-01 00:00:01' as timestamp) as date);
1966-02-02
{code}
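
For illustration only: one common way a negative Unix time ends up on the wrong calendar day is integer division that truncates toward zero. Whether that was the exact cause in HIVE-10178 is an assumption here, and `DaysSinceEpochSketch` is a hypothetical class, not Hive code.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class DaysSinceEpochSketch {
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Truncates toward zero, so a pre-1970 instant that is not exactly at
    // midnight UTC lands one day too late.
    static long truncatedDays(long millis) {
        return millis / MILLIS_PER_DAY;
    }

    // Rounds toward negative infinity, which matches calendar days.
    static long flooredDays(long millis) {
        return Math.floorDiv(millis, MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        long millis = LocalDateTime.of(1966, 1, 1, 0, 0, 1)
            .toInstant(ZoneOffset.UTC)
            .toEpochMilli();
        System.out.println(LocalDate.ofEpochDay(truncatedDays(millis))); // 1966-01-02
        System.out.println(LocalDate.ofEpochDay(flooredDays(millis)));   // 1966-01-01
    }
}
```

The sketch works in UTC to keep time zones out of the picture; the two division styles agree for every instant at or after the epoch.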

 cast DATE as TIMESTAMP returns incorrect values
 ---

 Key: HIVE-10488
 URL: https://issues.apache.org/jira/browse/HIVE-10488
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.13.1
Reporter: N Campbell
Assignee: Chaoyu Tang

 same data in textfile works
 same data loaded into an ORC table does not
 connection property of tez/mr makes no difference.
 select rnum, cdt, cast (cdt as timestamp) from tdt
 0 null  null
 1 1996-01-01  1969-12-31 19:00:09.496
 2 2000-01-01  1969-12-31 19:00:10.957
 3 2000-12-31  1969-12-31 19:00:11.322
 vs
 0 null  null
 1 1996-01-01  1996-01-01 00:00:00.0
 2 2000-01-01  2000-01-01 00:00:00.0
 3 2000-12-31  2000-12-31 00:00:00.0
 create table  if not exists TDT ( RNUM int , CDT date   )
  STORED AS orc  ;
 insert overwrite table TDT select * from  text.TDT;
 0|\N
 1|1996-01-01
 2|2000-01-01
 3|2000-12-31



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9272) Tests for utf-8 support

2015-04-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520047#comment-14520047
 ] 

Eugene Koifman commented on HIVE-9272:
--

Hive_UTF8 is failing.  I think it's because row is a reserved keyword.
Also I'm seeing messages like “Wide character in print at TestDriverCurl.pm 
line 563”  (and line 901) - not sure what the significance of this is.

 Tests for utf-8 support
 ---

 Key: HIVE-9272
 URL: https://issues.apache.org/jira/browse/HIVE-9272
 Project: Hive
  Issue Type: Test
  Components: Tests, WebHCat
Affects Versions: 0.14.0
Reporter: Aswathy Chellammal Sreekumar
Assignee: Aswathy Chellammal Sreekumar
Priority: Minor
 Attachments: HIVE-9272.1.patch, HIVE-9272.2.patch, HIVE-9272.3.patch, 
 HIVE-9272.4.patch, HIVE-9272.5.patch, HIVE-9272.6.patch, HIVE-9272.7.patch, 
 HIVE-9272.patch


 Including some test cases for utf8 support in webhcat. The first four tests 
 invoke hive, pig, mapred and streaming apis for testing the utf8 support for 
 data processed, file names and job name. The last test case tests the 
 filtering of job name with utf8 character





[jira] [Updated] (HIVE-10453) HS2 leaking open file descriptors when using UDFs

2015-04-29 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-10453:

Attachment: HIVE-10453.1.patch

 HS2 leaking open file descriptors when using UDFs
 -

 Key: HIVE-10453
 URL: https://issues.apache.org/jira/browse/HIVE-10453
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10453.1.patch


 1. Create a custom function:
 CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar';
 2. Create a simple JDBC client that connects and runs a simple query that 
 uses the function, such as:
 select myfunc(col1) from sometable
 3. Disconnect.
 Check the open files for HiveServer2 with:
 lsof -p HSProcID | grep myudf.jar
 You will see the leak as:
 {noformat}
 java  28718 ychen  txt  REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 java  28718 ychen  330r REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 {noformat}
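
 The lsof output above shows the UDF jar still open after the client has disconnected. As a minimal sketch of the general technique (not the HIVE-10453 patch itself), the Java example below shows how closing a URLClassLoader releases the jar it opened. The class name, the helper methods and the marker.txt entry are hypothetical and exist only for illustration; the assumption is that a session-scoped class loader is what holds the jar open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class UdfJarHandleSketch {

    // Builds a tiny jar with a single text entry so the example needs no external files.
    static Path writeSampleJar() {
        try {
            Path jar = Files.createTempFile("myudf", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out)) {
                jos.putNextEntry(new JarEntry("marker.txt"));
                jos.write("hello".getBytes(StandardCharsets.UTF_8));
                jos.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens a loader over the jar for one "session", reads a resource, and
    // closes the loader (via try-with-resources) so the jar is released.
    static int readThenClose(Path jar) {
        try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null);
             InputStream in = loader.getResourceAsStream("marker.txt")) {
            int bytes = 0;
            while (in.read() != -1) {
                bytes++;
            }
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path jar = writeSampleJar();
        System.out.println("bytes read: " + readThenClose(jar));
        System.out.println("deleted: " + jar.toFile().delete());
    }
}
```

 Running lsof against the JVM after readThenClose returns should no longer list the temporary jar, which is the behaviour one would want from HiveServer2 once a session ends.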





[jira] [Resolved] (HIVE-10536) HIVE-10223 breaks -Phadoop-1

2015-04-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-10536.
--
Resolution: Duplicate

Duplicate of HIVE-10444

 HIVE-10223 breaks -Phadoop-1
 

 Key: HIVE-10536
 URL: https://issues.apache.org/jira/browse/HIVE-10536
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan

 Looks like HIVE-10223 broke compilation with -Phadoop-1. We need to fix it. 
 Even if we decide to drop support for -Phadoop-1 in master, we should fix it 
 for branch-1.2.
 {noformat}
 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 /Users/sush/dev/hive.git/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java:[515,19]
  cannot find symbol
   symbol:   method isFile()
   location: variable fileStatus of type org.apache.hadoop.fs.FileStatus
 [ERROR] 
 /Users/sush/dev/hive.git/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java:[545,26]
  cannot find symbol
   symbol:   method isDirectory()
   location: variable fileStatus of type org.apache.hadoop.fs.FileStatus
 [INFO] 2 errors 
 [INFO] -
 {noformat}





[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-10444:
-
Assignee: Chris Nauroth  (was: Gunther Hagleitner)

Sorry for the HIVE-10223 breakage.  I'll pick this up.  [~prasanth_j], thank 
you for filing the bug report.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth

 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.
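
 One portable option is to rely only on FileStatus.isDir(), which, to the best of my knowledge, exists in Hadoop 1 and is still present (though deprecated) in Hadoop 2. The sketch below uses a hypothetical LegacyStatus interface as a stand-in for FileStatus so that it compiles without Hadoop on the classpath; it is not the fix that was committed for this issue. Note that treating "not a directory" as "file" is an approximation on Hadoop 2, where symlinks are a third case.

```java
class FileStatusCompatSketch {

    // Hypothetical stand-in for the part of org.apache.hadoop.fs.FileStatus
    // that is assumed to be available on both Hadoop 1 and Hadoop 2.
    interface LegacyStatus {
        boolean isDir();
    }

    // Replacement for isDirectory() that only needs isDir().
    static boolean isDirectory(LegacyStatus status) {
        return status.isDir();
    }

    // Replacement for isFile() that only needs isDir().
    // Approximation: anything that is not a directory is treated as a file.
    static boolean isFile(LegacyStatus status) {
        return !status.isDir();
    }

    public static void main(String[] args) {
        LegacyStatus dir = () -> true;
        LegacyStatus file = () -> false;
        System.out.println("dir is directory: " + isDirectory(dir));
        System.out.println("file is file: " + isFile(file));
    }
}
```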





[jira] [Commented] (HIVE-10519) Move TestGenericUDF classes to udf.generic package

2015-04-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519766#comment-14519766
 ] 

Alexander Pivovarov commented on HIVE-10519:


3 failures are not related to the fix

 Move TestGenericUDF classes to udf.generic package
 --

 Key: HIVE-10519
 URL: https://issues.apache.org/jira/browse/HIVE-10519
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial
 Attachments: HIVE-10519.1.patch, HIVE-10519.2.patch


 The following TestGenericUDF classes are located in udf package instead of 
 udf.generic.
 {code}
 TestGenericUDFDate.java
 TestGenericUDFDateAdd.java
 TestGenericUDFDateDiff.java
 TestGenericUDFDateSub.java
 TestGenericUDFUtils.java
 {code}





[jira] [Commented] (HIVE-10499) Ensure Session/ZooKeeperClient instances are closed

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519796#comment-14519796
 ] 

Sushanth Sowmyan commented on HIVE-10499:
-

Added to 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status to 
reflect commit. If you have any further known bugfix jiras to include in 1.2, 
please add to that list.

 Ensure Session/ZooKeeperClient instances are closed
 ---

 Key: HIVE-10499
 URL: https://issues.apache.org/jira/browse/HIVE-10499
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-10499.patch


 Some Session/ZooKeeperClient instances are not closed in some scenarios. We 
 need to make sure they are always closed.





[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values

2015-04-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519846#comment-14519846
 ] 

Chaoyu Tang commented on HIVE-10488:


[~the6campbells] I was also not able to reproduce the issue in Hive 1.2. Here 
are my steps:
{code}
create table testcastts (key int, datevalue date);
insert into testcastts select 0, null from src limit 1;
insert into testcastts select 1, date '1996-01-01' from src limit 1;
insert into testcastts select 2, date '2000-01-01' from src limit 1;
insert into testcastts select 3, date '2000-12-31' from src limit 1;
---
select key, datevalue, cast(datevalue as timestamp) from testcastts;
0   NULL        NULL
1   1996-01-01  1996-01-01 00:00:00
2   2000-01-01  2000-01-01 00:00:00
3   2000-12-31  2000-12-31 00:00:00

---
create table if not exists testcastorcts (key int, datevalue date) stored as 
orc;
insert overwrite table testcastorcts select * from testcastts;
select key, datevalue, cast(datevalue as timestamp) from testcastorcts;
0   NULL        NULL
1   1996-01-01  1996-01-01 00:00:00
2   2000-01-01  2000-01-01 00:00:00
3   2000-12-31  2000-12-31 00:00:00
{code}
Do you see any difference between my test case above and yours? Otherwise, I 
will resolve this JIRA as Not Reproducible. Thanks


 cast DATE as TIMESTAMP returns incorrect values
 ---

 Key: HIVE-10488
 URL: https://issues.apache.org/jira/browse/HIVE-10488
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.13.1
Reporter: N Campbell
Assignee: Chaoyu Tang

 same data in textfile works
 same data loaded into an ORC table does not
 connection property of tez/mr makes no difference.
 select rnum, cdt, cast (cdt as timestamp) from tdt
 0 null  null
 1 1996-01-01  1969-12-31 19:00:09.496
 2 2000-01-01  1969-12-31 19:00:10.957
 3 2000-12-31  1969-12-31 19:00:11.322
 vs
 0 null  null
 1 1996-01-01  1996-01-01 00:00:00.0
 2 2000-01-01  2000-01-01 00:00:00.0
 3 2000-12-31  2000-12-31 00:00:00.0
 create table  if not exists TDT ( RNUM int , CDT date   )
  STORED AS orc  ;
 insert overwrite table TDT select * from  text.TDT;
 0|\N
 1|1996-01-01
 2|2000-01-01
 3|2000-12-31
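
 The wrong values in the report follow a pattern that can be checked with plain date arithmetic: 1996-01-01 is 9496 days after 1970-01-01, and the bad timestamp is 9.496 seconds after the epoch. The Java sketch below reproduces all three rows by treating the day count as a millisecond count. This is only an observation about the numbers, not a confirmed root cause; the UTC-5 offset is an assumption chosen because the reported values display as 19:00 on 1969-12-31, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

class DateCastSketch {

    // Assumption: the reporter's session rendered timestamps at UTC-5.
    static final ZoneOffset ASSUMED_OFFSET = ZoneOffset.ofHours(-5);

    // Number of days between 1970-01-01 and the given ISO date.
    static long epochDays(String isoDate) {
        return LocalDate.parse(isoDate).toEpochDay();
    }

    // Misreads the day count as milliseconds since the epoch.
    static LocalDateTime daysMisreadAsMillis(String isoDate) {
        return Instant.ofEpochMilli(epochDays(isoDate))
                .atOffset(ASSUMED_OFFSET)
                .toLocalDateTime();
    }

    // The value a correct cast would produce: midnight on that date.
    static LocalDateTime expected(String isoDate) {
        return LocalDate.parse(isoDate).atStartOfDay();
    }

    public static void main(String[] args) {
        for (String d : new String[] {"1996-01-01", "2000-01-01", "2000-12-31"}) {
            System.out.println(d + "  days=" + epochDays(d)
                    + "  misread=" + daysMisreadAsMillis(d)
                    + "  expected=" + expected(d));
        }
    }
}
```

 If the pattern holds, it would suggest looking at how the ORC read path hands a date's day count to the timestamp conversion in the affected version.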





[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values

2015-04-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519865#comment-14519865
 ] 

Alexander Pivovarov commented on HIVE-10488:


You do not have to put "from src limit 1" in the select statement in hive-1.2.

 cast DATE as TIMESTAMP returns incorrect values
 ---

 Key: HIVE-10488
 URL: https://issues.apache.org/jira/browse/HIVE-10488
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.13.1
Reporter: N Campbell
Assignee: Chaoyu Tang

 same data in textfile works
 same data loaded into an ORC table does not
 connection property of tez/mr makes no difference.
 select rnum, cdt, cast (cdt as timestamp) from tdt
 0 null  null
 1 1996-01-01  1969-12-31 19:00:09.496
 2 2000-01-01  1969-12-31 19:00:10.957
 3 2000-12-31  1969-12-31 19:00:11.322
 vs
 0 null  null
 1 1996-01-01  1996-01-01 00:00:00.0
 2 2000-01-01  2000-01-01 00:00:00.0
 3 2000-12-31  2000-12-31 00:00:00.0
 create table  if not exists TDT ( RNUM int , CDT date   )
  STORED AS orc  ;
 insert overwrite table TDT select * from  text.TDT;
 0|\N
 1|1996-01-01
 2|2000-01-01
 3|2000-12-31





[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values

2015-04-29 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519863#comment-14519863
 ] 

Alexander Pivovarov commented on HIVE-10488:


Chaoyu, testcastts should be an ORC table. Is ORC the default table format in 
your hive config?
The cast works fine on a textfile table for N.

 cast DATE as TIMESTAMP returns incorrect values
 ---

 Key: HIVE-10488
 URL: https://issues.apache.org/jira/browse/HIVE-10488
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.13.1
Reporter: N Campbell
Assignee: Chaoyu Tang

 same data in textfile works
 same data loaded into an ORC table does not
 connection property of tez/mr makes no difference.
 select rnum, cdt, cast (cdt as timestamp) from tdt
 0 null  null
 1 1996-01-01  1969-12-31 19:00:09.496
 2 2000-01-01  1969-12-31 19:00:10.957
 3 2000-12-31  1969-12-31 19:00:11.322
 vs
 0 null  null
 1 1996-01-01  1996-01-01 00:00:00.0
 2 2000-01-01  2000-01-01 00:00:00.0
 3 2000-12-31  2000-12-31 00:00:00.0
 create table  if not exists TDT ( RNUM int , CDT date   )
  STORED AS orc  ;
 insert overwrite table TDT select * from  text.TDT;
 0|\N
 1|1996-01-01
 2|2000-01-01
 3|2000-12-31





[jira] [Commented] (HIVE-10521) TxnHandler.timeOutTxns only times out some of the expired transactions

2015-04-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519873#comment-14519873
 ] 

Eugene Koifman commented on HIVE-10521:
---

All connections in TxnHandler are autoCommit=false
timeOutTxns() opens a cursor and calls abortTxns() for each batch.
abortTxns() calls commit().  So if the cursor has more than 
TIMED_OUT_TXN_ABORT_BATCH_SIZE rows, it will end up calling commit while the 
cursor is open... I don't think you can do that (and continue to read it after 
that)

TestTxnHandler has an unused import: import junit.framework.Assert;


 TxnHandler.timeOutTxns only times out some of the expired transactions
 --

 Key: HIVE-10521
 URL: https://issues.apache.org/jira/browse/HIVE-10521
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-10521.patch


 {code}
   for (int i = 0; i < 20 && rs.next(); i++) deadTxns.add(rs.getLong(1));
   // We don't care whether all of the transactions get deleted or not,
   // if some didn't it most likely means someone else deleted them in the 
 interim
   if (deadTxns.size() > 0) abortTxns(dbConn, deadTxns);
 {code}
 While it makes sense to limit the number of transactions aborted in one pass 
 (since this gets translated to an IN clause) we should still make sure all 
 are timed out.  Also, 20 seems pretty small as a batch size.
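
 A minimal sketch of one way to keep each IN clause bounded while still covering every expired transaction: read all expired ids first, then abort them batch by batch. This is not the attached patch. The class, the method names and the Consumer that stands in for abortTxns() are hypothetical, and the batch size is left as a parameter rather than a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class TimeoutBatchSketch {

    // Splits ids into consecutive batches of at most batchSize elements.
    static List<List<Long>> toBatches(List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    // Hands every batch to the aborter (a stand-in for abortTxns) and
    // returns how many ids were handed over in total.
    static int abortAll(List<Long> expiredIds, int batchSize, Consumer<List<Long>> aborter) {
        int handled = 0;
        for (List<Long> batch : toBatches(expiredIds, batchSize)) {
            aborter.accept(batch);
            handled += batch.size();
        }
        return handled;
    }

    public static void main(String[] args) {
        List<Long> expired = new ArrayList<>();
        for (long id = 1; id <= 45; id++) {
            expired.add(id);
        }
        int handled = abortAll(expired, 20,
                batch -> System.out.println("abort " + batch.size() + " txns"));
        System.out.println("handled " + handled);
    }
}
```

 Because the ids are collected before any abort runs, a commit inside the aborter no longer happens while the result set is still being read.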





[jira] [Commented] (HIVE-10500) Repeated deadlocks in underlying RDBMS cause transaction or lock failure

2015-04-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519870#comment-14519870
 ] 

Alan Gates commented on HIVE-10500:
---

Yes, I transposed two numbers in the commit message.  I looked at how to fix it 
but according to the documentation I read doing a 'git commit --amend' would 
actually change the hash of the commit, which looked likely to do more damage 
than it fixed.

 Repeated deadlocks in underlying RDBMS cause transaction or lock failure
 

 Key: HIVE-10500
 URL: https://issues.apache.org/jira/browse/HIVE-10500
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 1.2.0

 Attachments: HIVE-10050.patch


 In some cases in a busy system, deadlocks in the metastore RDBMS can cause 
 failures in Hive locks and transactions when using DbTxnManager





[jira] [Updated] (HIVE-9365) The Metastore should take port configuration from hive-site.xml

2015-04-29 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-9365:
-
Attachment: HIVE-9365.03.patch

 The Metastore should take port configuration from hive-site.xml
 ---

 Key: HIVE-9365
 URL: https://issues.apache.org/jira/browse/HIVE-9365
 Project: Hive
  Issue Type: Improvement
Reporter: Nicolas Thiébaud
Assignee: Reuben Kuhnert
Priority: Minor
  Labels: metastore
 Attachments: HIVE-9365.01.patch, HIVE-9365.02.patch, 
 HIVE-9365.03.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 As opposed to the cli. Having this configuration in the launcher script 
 creates fragmentation and is not consistent with the way the hive stack 
 is configured.





[jira] [Updated] (HIVE-9272) Tests for utf-8 support

2015-04-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-9272:
-
Fix Version/s: (was: 1.2.0)

 Tests for utf-8 support
 ---

 Key: HIVE-9272
 URL: https://issues.apache.org/jira/browse/HIVE-9272
 Project: Hive
  Issue Type: Test
  Components: Tests, WebHCat
Affects Versions: 0.14.0
Reporter: Aswathy Chellammal Sreekumar
Assignee: Aswathy Chellammal Sreekumar
Priority: Minor
 Attachments: HIVE-9272.1.patch, HIVE-9272.2.patch, HIVE-9272.3.patch, 
 HIVE-9272.4.patch, HIVE-9272.5.patch, HIVE-9272.6.patch, HIVE-9272.7.patch, 
 HIVE-9272.patch


 Including some test cases for utf8 support in webhcat. The first four tests 
 invoke hive, pig, mapred and streaming apis for testing the utf8 support for 
 data processed, file names and job name. The last test case tests the 
 filtering of job name with utf8 character





[jira] [Reopened] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh

2015-04-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reopened HIVE-10423:
---

 HIVE-7948 breaks deploy_e2e_artifacts.sh
 

 Key: HIVE-10423
 URL: https://issues.apache.org/jira/browse/HIVE-10423
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Aswathy Chellammal Sreekumar
 Attachments: HIVE-10423.patch


 HIVE-7948 added a step to download a ml-1m.zip file and unzip it.
 This only works if you call deploy_e2e_artifacts.sh once.  If you call it 
 again (which is very common in dev) it blocks and asks for additional input 
 from the user because the target files already exist.
 This needs to be changed similarly to what we discussed for HIVE-9272, i.e. 
 place artifacts not under source control in testdist/.





[jira] [Commented] (HIVE-10223) Consolidate several redundant FileSystem API calls.

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519944#comment-14519944
 ] 

Sushanth Sowmyan commented on HIVE-10223:
-

Hi,

FYI, looks like this patch broke -Phadoop-1 compilation compatibility. I've 
opened HIVE-10536 for this.

 Consolidate several redundant FileSystem API calls.
 ---

 Key: HIVE-10223
 URL: https://issues.apache.org/jira/browse/HIVE-10223
 Project: Hive
  Issue Type: Improvement
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 1.2.0

 Attachments: HIVE-10223.1.patch


 This issue proposes to consolidate several Hive calls to the Hadoop Common 
 {{FileSystem}} API into a fewer number of calls that still accomplish the 
 equivalent work.  {{FileSystem}} API calls typically translate into RPCs to 
 other services like the HDFS NameNode or alternative file system 
 implementations.  Consolidating RPCs will lower latency a bit for Hive code 
 and reduce some load on these external services.





[jira] [Commented] (HIVE-10307) Support to use number literals in partition column

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519810#comment-14519810
 ] 

Sushanth Sowmyan commented on HIVE-10307:
-

Added to 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status to 
reflect commit. 

At this time, requests for inclusion for any new feature patches are closed - 
only bugfixes are to be added on. However, since this seems to have been 
in-queue for a while, I'll make an exception for this. For any other commits to 
branch-1.2, please make sure that (a) They are added to that list before 
committing, and (b) are bugfixes.

 Support to use number literals in partition column
 --

 Key: HIVE-10307
 URL: https://issues.apache.org/jira/browse/HIVE-10307
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10307.1.patch, HIVE-10307.2.patch, 
 HIVE-10307.3.patch, HIVE-10307.4.patch, HIVE-10307.5.patch, 
 HIVE-10307.6.patch, HIVE-10307.patch


 Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
 literals with postfix like Y, S, L, or BD appended to the number. These 
 literals work in most Hive queries, but not when they are used as 
 partition column values. For a partitioned table like:
 create table partcoltypenum (key int, value string) partitioned by (tint 
 tinyint, sint smallint, bint bigint);
 insert into partcoltypenum partition (tint=100Y, sint=1S, 
 bint=1000L) select key, value from src limit 30;
 Queries like select, describe and drop partition do not work. For example,
 select * from partcoltypenum where tint=100Y and sint=1S and 
 bint=1000L;
 does not return any rows.





[jira] [Commented] (HIVE-10307) Support to use number literals in partition column

2015-04-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519866#comment-14519866
 ] 

Chaoyu Tang commented on HIVE-10307:


Thanks [~jxiang] for committing the patch and [~sushanth] for including it in 
Hive 1.2.

 Support to use number literals in partition column
 --

 Key: HIVE-10307
 URL: https://issues.apache.org/jira/browse/HIVE-10307
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10307.1.patch, HIVE-10307.2.patch, 
 HIVE-10307.3.patch, HIVE-10307.4.patch, HIVE-10307.5.patch, 
 HIVE-10307.6.patch, HIVE-10307.patch


 Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
 literals with postfix like Y, S, L, or BD appended to the number. These 
 literals work in most Hive queries, but not when they are used as 
 partition column values. For a partitioned table like:
 create table partcoltypenum (key int, value string) partitioned by (tint 
 tinyint, sint smallint, bint bigint);
 insert into partcoltypenum partition (tint=100Y, sint=1S, 
 bint=1000L) select key, value from src limit 30;
 Queries like select, describe and drop partition do not work. For example,
 select * from partcoltypenum where tint=100Y and sint=1S and 
 bint=1000L;
 does not return any rows.





[jira] [Updated] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect

2015-04-29 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-10415:

Attachment: HIVE-10415.patch

 hive.start.cleanup.scratchdir configuration is not taking effect
 

 Key: HIVE-10415
 URL: https://issues.apache.org/jira/browse/HIVE-10415
 Project: Hive
  Issue Type: Bug
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 1.2.0

 Attachments: HIVE-10415.patch


 This configuration hive.start.cleanup.scratchdir is not taking effect





[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values

2015-04-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519921#comment-14519921
 ] 

Chaoyu Tang commented on HIVE-10488:


Here are the desc formatted from two tables (testcastts and testcastorcts) I 
tested:
{code}
# Detailed Table Information 
Database:   jira 
Owner:  ctang
CreateTime: Wed Apr 29 12:03:08 EDT 2015 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   file:/user/hive/warehouse/apache/jira.db/testcastts 
 
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles    4   
numRows 4   
rawDataSize 40  
totalSize   44  
transient_lastDdlTime   1430323769  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:    org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
serialization.format    1  

===
key int 
datevalue   date
 
# Detailed Table Information 
Database:   jira 
Owner:  ctang
CreateTime: Wed Apr 29 12:12:42 EDT 2015 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   file:/user/hive/warehouse/apache/jira.db/testcastorcts  
 
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles    1   
numRows 4   
rawDataSize 184 
totalSize   304 
transient_lastDdlTime   1430324019  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:    org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
 
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
serialization.format    1  
{code}
BTW, I also queried with vectorized execution (set 
hive.vectorized.execution.enabled=true) for the ORC table testcastorcts, it 
also worked fine.

 cast DATE as TIMESTAMP returns incorrect values
 ---

 Key: HIVE-10488
 URL: https://issues.apache.org/jira/browse/HIVE-10488
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.13.1
Reporter: N Campbell
Assignee: Chaoyu Tang

 same data in textfile works
 same data loaded into an ORC table does not
 connection property of tez/mr makes no difference.
 select rnum, cdt, cast (cdt as timestamp) from tdt
 0 null  null
 1 1996-01-01  1969-12-31 19:00:09.496
 2 2000-01-01  1969-12-31 19:00:10.957
 3 2000-12-31  1969-12-31 19:00:11.322
 vs
 0 null  null
 1 1996-01-01  1996-01-01 00:00:00.0
 2 2000-01-01  2000-01-01 00:00:00.0
 3 2000-12-31  2000-12-31 00:00:00.0
 create table  if not exists TDT ( RNUM int , CDT date   )
  STORED AS orc  ;
 insert overwrite table TDT select * from  text.TDT;
 0|\N
 1|1996-01-01
 2|2000-01-01
 3|2000-12-31





[jira] [Updated] (HIVE-10513) [CBO] return path : Fix create_func1.q for return path

2015-04-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10513:

Component/s: Tests

 [CBO] return path : Fix create_func1.q for return path
 --

 Key: HIVE-10513
 URL: https://issues.apache.org/jira/browse/HIVE-10513
 Project: Hive
  Issue Type: Bug
  Components: CBO, Tests
Affects Versions: 1.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 1.2.0

 Attachments: HIVE-10513.patch


 throws class cast exception.





[jira] [Commented] (HIVE-9508) MetaStore client socket connection should have a lifetime

2015-04-29 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520156#comment-14520156
 ] 

Thiruvel Thirumoolan commented on HIVE-9508:


[~vgumashta] - Can I leave the default connection lifetime at 5 mins, or should 
I set it back to 0 (which disables this functionality)?

 MetaStore client socket connection should have a lifetime
 -

 Key: HIVE-9508
 URL: https://issues.apache.org/jira/browse/HIVE-9508
 Project: Hive
  Issue Type: Sub-task
  Components: CLI, Metastore
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
  Labels: metastore, rolling_upgrade
 Fix For: 1.2.0

 Attachments: HIVE-9508.1.patch, HIVE-9508.2.patch, HIVE-9508.3.patch, 
 HIVE-9508.4.patch


 Currently HiveMetaStoreClient (or SessionHMSC) is connected to one Metastore 
 server until the connection is closed or there is a problem. I would like to 
 introduce the concept of a MetaStore client socket life time. The MS client 
 will reconnect if the socket lifetime is reached. This will help during 
 rolling upgrade of Metastore.
 When there are multiple Metastore servers behind a VIP (load balancer), it is 
 easy to take one server out of rotation and wait 10+ mins for all 
 existing connections to die down (if the lifetime is, say, 5 mins), after 
 which the server can be updated.
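
 The lifetime idea described above can be sketched as a check made before each request: if the socket has been open for at least the configured lifetime, reconnect. The Java example below is only an illustration, not the HIVE-9508 patch. The class name, the injected clock and the connect counter are hypothetical, and a lifetime of 0 is taken to mean "disabled", following the comment on this issue.

```java
import java.util.function.LongSupplier;

class LifetimeBoundConnectionSketch {

    private final long lifetimeMillis;   // 0 disables the lifetime check
    private final LongSupplier clock;    // injected so the behaviour can be tested
    private long connectedAt;
    private int connectCount;

    LifetimeBoundConnectionSketch(long lifetimeMillis, LongSupplier clock) {
        this.lifetimeMillis = lifetimeMillis;
        this.clock = clock;
        connect();
    }

    // Stand-in for opening the socket; only records when it happened.
    private void connect() {
        connectedAt = clock.getAsLong();
        connectCount++;
    }

    // Call before each request: reconnects once the lifetime has been reached.
    void reconnectIfExpired() {
        if (lifetimeMillis > 0 && clock.getAsLong() - connectedAt >= lifetimeMillis) {
            connect();
        }
    }

    int connectCount() {
        return connectCount;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        LifetimeBoundConnectionSketch conn =
                new LifetimeBoundConnectionSketch(5 * 60 * 1000L, () -> now[0]);
        now[0] = 6 * 60 * 1000L;
        conn.reconnectIfExpired();
        System.out.println("connects so far: " + conn.connectCount());
    }
}
```

 With a VIP in front of several servers, each reconnect gives the load balancer a chance to route the client away from a server that has been taken out of rotation.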





[jira] [Updated] (HIVE-10539) set default value of hive.repl.task.factory

2015-04-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10539:
-
Attachment: HIVE-10539.1.patch

 set default value of hive.repl.task.factory
 ---

 Key: HIVE-10539
 URL: https://issues.apache.org/jira/browse/HIVE-10539
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10539.1.patch


 hive.repl.task.factory does not have a default value set. It should be set to 
 org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory.





[jira] [Commented] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts

2015-04-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520181#comment-14520181
 ] 

Eugene Koifman commented on HIVE-10066:
---

Thanks [~cnauroth]

 Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
 -

 Key: HIVE-10066
 URL: https://issues.apache.org/jira/browse/HIVE-10066
 Project: Hive
  Issue Type: Bug
  Components: Tez, WebHCat
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10066.2.patch, HIVE-10066.3.patch, HIVE-10066.patch


 From [~hitesh]:
 Tez is a client-side only component ( no daemons, etc ) and therefore it is 
 meant to be installed on the gateway box ( or where its client libraries are 
 needed by any other services’ daemons). It does not have any cluster 
 dependencies both in terms of libraries/jars as well as configs. When it runs 
 on a worker node, everything was pre-packaged and made available to the 
 worker node via the distributed cache via the client code. Hence, its 
 client-side configs are also only needed on the same (client) node as where 
 it is installed. The only other install step needed is to have the tez 
 tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which 
 points to the HDFS path. 
 We need a way to pass client jars and tez-site.xml to the LaunchMapper.
 We should create a general purpose mechanism here which can supply additional 
 artifacts per job type.





[jira] [Updated] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.

2015-04-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10454:

Attachment: (was: HIVE-10454.patch)

 Query against partitioned table in strict mode failed with No partition 
 predicate found even if partition predicate is specified.
 ---

 Key: HIVE-10454
 URL: https://issues.apache.org/jira/browse/HIVE-10454
 Project: Hive
  Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10454.patch


 The following queries fail:
 {noformat}
 create table t1 (c1 int) PARTITIONED BY (c2 string);
 set hive.mapred.mode=strict;
 select * from t1 where t1.c2  to_date(date_add(from_unixtime( 
 unix_timestamp() ),1));
 {noformat}
 The query failed with No partition predicate found for alias t1.





[jira] [Updated] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.

2015-04-29 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10454:

Attachment: HIVE-10454.patch

 Query against partitioned table in strict mode failed with No partition 
 predicate found even if partition predicate is specified.
 ---

 Key: HIVE-10454
 URL: https://issues.apache.org/jira/browse/HIVE-10454
 Project: Hive
  Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10454.patch


 The following queries fail:
 {noformat}
 create table t1 (c1 int) PARTITIONED BY (c2 string);
 set hive.mapred.mode=strict;
 select * from t1 where t1.c2  to_date(date_add(from_unixtime( 
 unix_timestamp() ),1));
 {noformat}
 The query failed with No partition predicate found for alias t1.





[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.

2015-04-29 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520166#comment-14520166
 ] 

Aihua Xu commented on HIVE-10454:
-

Two unit test failures are related. Fixed in the new patch.

 Query against partitioned table in strict mode failed with No partition 
 predicate found even if partition predicate is specified.
 ---

 Key: HIVE-10454
 URL: https://issues.apache.org/jira/browse/HIVE-10454
 Project: Hive
  Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10454.patch


 The following queries fail:
 {noformat}
 create table t1 (c1 int) PARTITIONED BY (c2 string);
 set hive.mapred.mode=strict;
 select * from t1 where t1.c2  to_date(date_add(from_unixtime( 
 unix_timestamp() ),1));
 {noformat}
 The query failed with No partition predicate found for alias t1.





[jira] [Commented] (HIVE-10507) Expose RetryingMetastoreClient to other external users of metastore client like Flume and Storm.

2015-04-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520169#comment-14520169
 ] 

Thejas M Nair commented on HIVE-10507:
--

[~sushanth] This is a simple javadoc change, but an important change to clarify 
the api usage. It would be good to have this in 1.2.0.


 Expose  RetryingMetastoreClient to other external users of metastore client 
 like Flume and Storm.
 -

 Key: HIVE-10507
 URL: https://issues.apache.org/jira/browse/HIVE-10507
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10507.1.patch


 HiveMetastoreClient is now being relied upon by external clients like Flume 
 and Storm for streaming.
 When the thrift connection between MetaStoreClient and the meta store is 
 broken (due to intermittent network issues or restarting of metastore) the 
 Metastore does not handle the connection error and automatically re-establish 
 the connection. Currently the client process needs to be restarted to 
 re-establish the connection.
 The request here is to consider supporting the following behavior: For each API 
 invocation on the MetastoreClient, it should try to re-establish the connection 
 (if needed) once. If that does not work, it should throw a specific 
 exception indicating the failure. The client could then handle the issue by 
 retrying the same API after some delay. By catching the specific connection 
 exception, the client could decide how many times to retry before aborting.
 Hive does this internally using RetryingMetastoreClient. This jira is supposed 
 to expose this mechanism to other users of that interface. This is useful for 
 users of this interface, and from a metastore HA point of view.
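
 The caller-side handling described above (catch a specific connection 
 exception, wait, retry a bounded number of times) could look roughly like the 
 following. This is a minimal sketch and not Hive's RetryingMetastoreClient: 
 ConnectionLostException, RetryingCaller and callWithRetry are hypothetical 
 names invented for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for the "specific exception" that signals a lost connection. */
class ConnectionLostException extends IOException {
    ConnectionLostException(String message) {
        super(message);
    }
}

/** Illustrative caller-side retry loop; all names here are assumptions. */
final class RetryingCaller {
    private RetryingCaller() {}

    /**
     * Runs the call, retrying only when ConnectionLostException is thrown.
     * Any other exception propagates immediately. After maxAttempts failed
     * attempts the last ConnectionLostException is rethrown.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (ConnectionLostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // wait before the next attempt
                }
            }
        }
        throw last;
    }
}
```

 A client would wrap each metastore call in callWithRetry and choose 
 maxAttempts and delayMillis to suit its own tolerance for outages.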





[jira] [Commented] (HIVE-10517) HCatPartition should not be created with "" as location in tests

2015-04-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520231#comment-14520231
 ] 

Thejas M Nair commented on HIVE-10517:
--

+1

 HCatPartition should not be created with "" as location in tests
 

 Key: HIVE-10517
 URL: https://issues.apache.org/jira/browse/HIVE-10517
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-10517.patch


 Tests in TestHCatClient and TestCommands wind up instantiating HCatPartition 
 with a dummy empty String as location. This causes test failures when run 
 against an existing metastore, as introduced by HIVE-10074.
 We need to instantiate actual values instead of dummy "" strings.





[jira] [Commented] (HIVE-10508) Strip out password information from config passed to Tez/MR in cases where password encryption is not used

2015-04-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520243#comment-14520243
 ] 

Thejas M Nair commented on HIVE-10508:
--

+1

 Strip out password information from config passed to Tez/MR in cases where 
 password encryption is not used
 --

 Key: HIVE-10508
 URL: https://issues.apache.org/jira/browse/HIVE-10508
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10508.1.patch, HIVE-10508.2.patch, 
 HIVE-10508.3.patch, HIVE-10508.4.patch


 Remove password information from configuration copy that is sent to Yarn/Tez. 
 We don't need it there. The config entries can potentially be visible to 
 other users.
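
 As a rough illustration of the idea in this description (copy the 
 configuration, drop the password entries from the copy, and ship only the 
 copy), here is a minimal sketch over a plain Map. It does not use Hive's or 
 Hadoop's configuration classes; ConfigSanitizer, withoutSecrets and the key 
 names in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative only: removes sensitive keys from a copy of a configuration map. */
final class ConfigSanitizer {
    private ConfigSanitizer() {}

    /**
     * Returns a copy of conf without the entries named in secretKeys.
     * The map passed in is left untouched, so the local process keeps its passwords.
     */
    static Map<String, String> withoutSecrets(Map<String, String> conf, Set<String> secretKeys) {
        Map<String, String> copy = new LinkedHashMap<>(conf);
        copy.keySet().removeAll(secretKeys); // removing from the key view removes the entries
        return copy;
    }
}
```

 For example, calling withoutSecrets with a secret key set containing 
 "example.connection.password" (an invented key) yields a copy that is safe 
 to hand to the job submission path, while the original map still holds the 
 password for local use.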





[jira] [Commented] (HIVE-10530) Aggregate stats cache: bug fixes for RDBMS path

2015-04-29 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520348#comment-14520348
 ] 

Mostafa Mokhtar commented on HIVE-10530:


[~vgumashta]

Now the query is hitting the cache but I still see queries going to MySQL.

And I see double the number of requests going to PartitionPruner.

 Aggregate stats cache: bug fixes for RDBMS path
 ---

 Key: HIVE-10530
 URL: https://issues.apache.org/jira/browse/HIVE-10530
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 1.2.0

 Attachments: HIVE-10530.1.patch








[jira] [Updated] (HIVE-9508) MetaStore client socket connection should have a lifetime

2015-04-29 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-9508:
---
Attachment: HIVE-9508.4.patch

Thanks [~vgumashta] for taking a look at this.

Uploading a rebased patch for tests to run. I changed the LOG.info call to debug 
level because the number of log entries is high in our usage.

 MetaStore client socket connection should have a lifetime
 -

 Key: HIVE-9508
 URL: https://issues.apache.org/jira/browse/HIVE-9508
 Project: Hive
  Issue Type: Sub-task
  Components: CLI, Metastore
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
  Labels: metastore, rolling_upgrade
 Fix For: 1.2.0

 Attachments: HIVE-9508.1.patch, HIVE-9508.2.patch, HIVE-9508.3.patch, 
 HIVE-9508.4.patch


 Currently HiveMetaStoreClient (or SessionHMSC) is connected to one Metastore 
 server until the connection is closed or there is a problem. I would like to 
 introduce the concept of a MetaStore client socket lifetime. The MS client 
 will reconnect if the socket lifetime is reached. This will help during 
 rolling upgrade of Metastore.
 When there are multiple Metastore servers behind a VIP (load balancer), it is 
 easy to take one server out of rotation, wait 10+ minutes for all 
 existing connections to die down (if the lifetime is, say, 5 minutes), and then 
 update the server.
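
 The lifetime check described above can be sketched as follows. This is an 
 assumption-laden illustration, not the code in the attached patches: 
 LifetimeBoundConnection, beforeCall and connectCount are hypothetical names, 
 and the real socket handling is replaced by a counter so the logic can be 
 exercised without a metastore.

```java
import java.util.function.LongSupplier;

/** Illustrative only: reconnects once the connection is older than a configured lifetime. */
final class LifetimeBoundConnection {
    private final long lifetimeMillis;
    private final LongSupplier clock; // injected so the sketch can be tested without sleeping
    private long connectedAtMillis;
    private int connectCount;

    LifetimeBoundConnection(long lifetimeMillis, LongSupplier clock) {
        this.lifetimeMillis = lifetimeMillis;
        this.clock = clock;
        connect();
    }

    private void connect() {
        // A real client would close the old socket and open a new one here,
        // letting the load balancer choose a server that is still in rotation.
        connectedAtMillis = clock.getAsLong();
        connectCount++;
    }

    /** Intended to run before each request; a lifetime of 0 or less disables the check. */
    void beforeCall() {
        if (lifetimeMillis > 0 && clock.getAsLong() - connectedAtMillis >= lifetimeMillis) {
            connect();
        }
    }

    int connectCount() {
        return connectCount;
    }
}
```

 With a 5 minute lifetime, a client that keeps issuing requests would move off 
 a server taken out of rotation on its first request after the lifetime 
 expires, which is what makes the rolling upgrade described above practical.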





[jira] [Commented] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520153#comment-14520153
 ] 

Prasanth Jayachandran commented on HIVE-10444:
--

LGTM, +1.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth
 Attachments: HIVE-10444.1.patch


 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.





[jira] [Commented] (HIVE-10066) Hive on Tez job submission through WebHCat doesn't ship Tez artifacts

2015-04-29 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520152#comment-14520152
 ] 

Chris Nauroth commented on HIVE-10066:
--

FYI, this patch's call to {{FileStatus#isDirectory}} does not work when linking 
against Hadoop 1 using {{-Phadoop-1}}.  I included a fix in my patch for 
HIVE-10444, which reported a similar problem elsewhere in the code.

 Hive on Tez job submission through WebHCat doesn't ship Tez artifacts
 -

 Key: HIVE-10066
 URL: https://issues.apache.org/jira/browse/HIVE-10066
 Project: Hive
  Issue Type: Bug
  Components: Tez, WebHCat
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10066.2.patch, HIVE-10066.3.patch, HIVE-10066.patch


 From [~hitesh]:
 Tez is a client-side only component ( no daemons, etc ) and therefore it is 
 meant to be installed on the gateway box ( or where its client libraries are 
 needed by any other services’ daemons). It does not have any cluster 
 dependencies both in terms of libraries/jars as well as configs. When it runs 
 on a worker node, everything was pre-packaged and made available to the 
 worker node via the distributed cache via the client code. Hence, its 
 client-side configs are also only needed on the same (client) node as where 
 it is installed. The only other install step needed is to have the tez 
 tarball be uploaded to HDFS and the config has an entry “tez.lib.uris” which 
 points to the HDFS path. 
 We need a way to pass client jars and tez-site.xml to the LaunchMapper.
 We should create a general purpose mechanism here which can supply additional 
 artifacts per job type.





[jira] [Updated] (HIVE-10151) insert into A select from B is broken when both A and B are Acid tables and bucketed the same way

2015-04-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10151:
--
Description: 
BucketingSortingReduceSinkOptimizer makes 
insert into AcidTable select * from otherAcidTable
use BucketizedHiveInputFormat which bypasses ORC merge logic on read and tries 
to send bucket files (rather than table dir) down to OrcInputFormat.
(this is true only if both AcidTable and otherAcidTable are bucketed the same 
way).  Then ORC dies.

More specifically:
{noformat}
create table acidTbl(a int, b int) clustered by (a) into 2 buckets stored as 
orc TBLPROPERTIES ('transactional'='true')
create table acidTblPart(a int, b int) partitioned by (p string) clustered by 
(a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true')
insert into acidTblPart partition(p=1) (a,b) values(1,2)
insert into acidTbl(a,b) select a,b from acidTblPart where p = 1
{noformat}
results in 
{noformat}
2015-04-29 13:57:35,807 ERROR [main]: exec.Task 
(SessionState.java:printError(956)) - Job Submission failed with exception 
'java.lang.RuntimeException(serious problem)'
java.lang.RuntimeException: serious problem
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
at 
org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat.getSplits(BucketizedHiveInputFormat.java:141)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:430)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1650)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1409)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:225)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn2(TestTxnCommands2.java:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at 

[jira] [Commented] (HIVE-10502) Cannot specify log4j.properties file location in Beeline

2015-04-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520250#comment-14520250
 ] 

Chaoyu Tang commented on HIVE-10502:


Beeline does not seem to use log4j at all, and jline2 has its own logging 
implementation.

 Cannot specify log4j.properties file location in Beeline
 

 Key: HIVE-10502
 URL: https://issues.apache.org/jira/browse/HIVE-10502
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Chaoyu Tang

 In HiveCLI, HiveServer2, HMS, etc, the following is called early in the 
 startup to initialize log4j logging: LogUtils.initHiveLog4j().
 However, this does not seem to be the case in Beeline, which also needs log4j, 
 as the following stack trace shows:
 {noformat}
   at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
   at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:270)
   at 
 org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:156)
   at 
 org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
   at org.apache.hadoop.util.VersionInfo.<clinit>(VersionInfo.java:37)
 {noformat}
 It would be good to be able to specify the log4j.properties location, so that 
 Beeline doesn't pick the first one in the classpath.





[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build

2015-04-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HIVE-10444:
-
Attachment: HIVE-10444.1.patch

I also found one occurrence of the same problem that was not introduced by 
HIVE-10223.  Instead, it was introduced by HIVE-10066.

I think the simplest thing to do is to revert to using {{FileStatus#isDir}}, 
which is present in Hadoop 1.2.1.  It's deprecated in 2.x in favor of 
{{FileStatus#isDirectory}}, but it's still usable.  I'm attaching a patch.

I verified a build locally for both {{-Phadoop-1}} and {{-Phadoop-2}}.

 HIVE-10223 breaks hadoop-1 build
 

 Key: HIVE-10444
 URL: https://issues.apache.org/jira/browse/HIVE-10444
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Chris Nauroth
 Attachments: HIVE-10444.1.patch


 FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 
 are not present in hadoop 1.





[jira] [Updated] (HIVE-10508) Strip out password information from config passed to Tez/MR in cases where password encryption is not used

2015-04-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10508:
-
Attachment: HIVE-10508.4.patch

I am able to reproduce the SparkCliDriver issue locally, and it goes 
away once I undo the change in SparkTask. It may be that Hive 
on Spark requires this information, or that I should be making the change elsewhere 
in the Spark code path. I am reverting the change in SparkTask.
cc-ing [~thejas].

Thanks
Hari

 Strip out password information from config passed to Tez/MR in cases where 
 password encryption is not used
 --

 Key: HIVE-10508
 URL: https://issues.apache.org/jira/browse/HIVE-10508
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10508.1.patch, HIVE-10508.2.patch, 
 HIVE-10508.3.patch, HIVE-10508.4.patch


 Remove password information from configuration copy that is sent to Yarn/Tez. 
 We don't need it there. The config entries can potentially be visible to 
 other users.





[jira] [Updated] (HIVE-10522) CBO (Calcite Return Path): fix the wrong needed column names when TS is created

2015-04-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10522:

Affects Version/s: 1.2.0

 CBO (Calcite Return Path): fix the wrong needed column names when TS is 
 created
 ---

 Key: HIVE-10522
 URL: https://issues.apache.org/jira/browse/HIVE-10522
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 1.2.0
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Critical
 Fix For: 1.2.0

 Attachments: HIVE-10522.01.patch, HIVE-10522.02.patch








[jira] [Commented] (HIVE-10507) Expose RetryingMetastoreClient to other external users of metastore client like Flume and Storm.

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520361#comment-14520361
 ] 

Sushanth Sowmyan commented on HIVE-10507:
-

Although we've passed the deadline for feature inclusions, this is more of a 
clarification, and a trivial one at that, so I can make an exception for this. 
If you have any further jiras like this, please do check with 
me asap, because after tomorrow 15:01 PDT I will be stricter in accepting 
patches of all sorts as we raise the severity bar for inclusion.

I've added this to 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status

 Expose  RetryingMetastoreClient to other external users of metastore client 
 like Flume and Storm.
 -

 Key: HIVE-10507
 URL: https://issues.apache.org/jira/browse/HIVE-10507
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10507.1.patch


 HiveMetastoreClient is now being relied upon by external clients like Flume 
 and Storm for streaming.
 When the thrift connection between MetaStoreClient and the meta store is 
 broken (due to intermittent network issues or restarting of metastore) the 
 Metastore does not handle the connection error and automatically re-establish 
 the connection. Currently the client process needs to be restarted to 
 re-establish the connection.
 The request here is to consider supporting the following behavior: For each API 
 invocation on the MetastoreClient, it should try to re-establish the connection 
 (if needed) once. If that does not work, it should throw a specific 
 exception indicating the failure. The client could then handle the issue by 
 retrying the same API after some delay. By catching the specific connection 
 exception, the client could decide how many times to retry before aborting.
 Hive does this internally using RetryingMetastoreClient. This jira is supposed 
 to expose this mechanism to other users of that interface. This is useful for 
 users of this interface, and from a metastore HA point of view.





[jira] [Commented] (HIVE-10539) set default value of hive.repl.task.factory

2015-04-29 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520190#comment-14520190
 ] 

Thejas M Nair commented on HIVE-10539:
--

[~sushanth] Can you please review this change?
Having this default set will make it easier and less error-prone for users to 
use Hive replication. It's a small change that would be very good to have in 
1.2.0.


 set default value of hive.repl.task.factory
 ---

 Key: HIVE-10539
 URL: https://issues.apache.org/jira/browse/HIVE-10539
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10539.1.patch


 hive.repl.task.factory does not have a default value set. It should be set to 
 org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory.





[jira] [Commented] (HIVE-10437) NullPointerException on queries where map/reduce is not involved on tables with partitions

2015-04-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520374#comment-14520374
 ] 

Gunther Hagleitner commented on HIVE-10437:
---

+1 nice and simple.

 NullPointerException on queries where map/reduce is not involved on tables 
 with partitions
 --

 Key: HIVE-10437
 URL: https://issues.apache.org/jira/browse/HIVE-10437
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Demeter Sztanko
Assignee: Ashutosh Chauhan
Priority: Critical
 Attachments: HIVE-10437.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 On a table with partitions, whenever I try to do a simple query which tells 
 hive not to execute mapreduce but just read data straight from hdfs, it 
 raises an exception:
 {code}
 create external table jsonbug(
 a int,
 b string
 )
 PARTITIONED BY (
 `c` string)
 ROW FORMAT SERDE
   'org.openx.data.jsonserde.JsonSerDe'
 WITH SERDEPROPERTIES (
   'ignore.malformed.json'='true')
 STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
 LOCATION
   '/tmp/jsonbug';
 ALTER TABLE jsonbug ADD PARTITION(c='1');
 {code}
 Running a simple 
 {code}
 select * from jsonbug;
 {code}
 Raises the following exception:
 {code}
 FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
 Failed with exception nulljava.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:578)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
 at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 {code}
 It works fine if I execute a query involving a map/reduce job though.
 This problem occurs only when using SerDes created for Hive versions before 
 1.1.0, those which do not have the @SerDeSpec annotation specified. Most 
 third-party SerDes, including hcat's JsonSerde, have this problem as well. 
 It seems that the changes made in HIVE-7977 introduced this bug. See 
 org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607)
 {code}
 Class<?> tableSerDe = tableDesc.getDeserializerClass();
 String[] schemaProps = AnnotationUtils.getAnnotation(tableSerDe, 
 SerDeSpec.class).schemaProps();
 {code}
 And it also seems like a relatively easy fix.
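
 One plausible shape for such a fix is a null check on the annotation lookup 
 before calling schemaProps(). The sketch below is self-contained and only 
 illustrates the pattern; the attached patch may differ. SerDeSpecStandIn, 
 AnnotatedSerDe, LegacySerDe and SchemaProps are hypothetical stand-ins, and 
 the standard Class.getAnnotation call is used in place of Hive's 
 AnnotationUtils.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Stand-in for Hive's SerDeSpec annotation, declared here so the sketch compiles alone. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SerDeSpecStandIn {
    String[] schemaProps();
}

/** Mimics a SerDe that declares its schema properties. */
@SerDeSpecStandIn(schemaProps = {"columns", "columns.types"})
class AnnotatedSerDe {}

/** Mimics a third-party SerDe written before the annotation existed. */
class LegacySerDe {}

final class SchemaProps {
    private SchemaProps() {}

    /** Returns the declared schema properties, or null when the class is not annotated. */
    static String[] of(Class<?> serDeClass) {
        SerDeSpecStandIn spec = serDeClass.getAnnotation(SerDeSpecStandIn.class);
        // Without this check, spec.schemaProps() throws NullPointerException for LegacySerDe.
        return spec == null ? null : spec.schemaProps();
    }
}
```

 The caller then has to decide what a null result means, for example treating 
 an unannotated SerDe conservatively rather than failing the query.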





[jira] [Updated] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases

2015-04-29 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10542:
--
Attachment: HIVE-10542.1.patch

 Full outer joins in tez produce incorrect results in certain cases
 --

 Key: HIVE-10542
 URL: https://issues.apache.org/jira/browse/HIVE-10542
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Blocker
 Attachments: HIVE-10542.1.patch


 If there are no records for one of the tables in the full outer join, we do 
 not read the other input and end up not producing rows that we should.
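
 To make the expected semantics concrete, here is a small in-memory sketch of 
 a full outer join on a single key. It has nothing to do with the Tez operator 
 code; FullOuterJoinSketch and join are hypothetical names, and each side is 
 assumed to hold at most one row per key. The point is that an empty input on 
 one side must still yield every row of the other side, padded with nulls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative only: full outer join of two keyed inputs, one row per key per side. */
final class FullOuterJoinSketch {
    private FullOuterJoinSketch() {}

    /** Emits one "key,leftValue,rightValue" row per key present on either side. */
    static List<String> join(Map<Integer, String> left, Map<Integer, String> right) {
        // Take the union of keys so that neither side can suppress the other.
        TreeSet<Integer> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> rows = new ArrayList<>();
        for (Integer key : keys) {
            // Map.get returns null for a missing key, which prints as "null" here.
            rows.add(key + "," + left.get(key) + "," + right.get(key));
        }
        return rows;
    }
}
```

 Joining a left input holding keys 1 and 2 against an empty right input should 
 therefore produce two rows rather than none.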





[jira] [Commented] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases

2015-04-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520414#comment-14520414
 ] 

Gunther Hagleitner commented on HIVE-10542:
---

Might make sense to add a test case for this too (a outer b inner c outer d - 
or something like that).

 Full outer joins in tez produce incorrect results in certain cases
 --

 Key: HIVE-10542
 URL: https://issues.apache.org/jira/browse/HIVE-10542
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Blocker
 Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch


 If there are no records for one of the tables in the full outer join, we do 
 not read the other input and end up not producing rows that we should.





[jira] [Commented] (HIVE-10530) Aggregate stats cache: bug fixes for RDBMS path

2015-04-29 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520411#comment-14520411
 ] 

Mostafa Mokhtar commented on HIVE-10530:


[~vgumashta] [~thejas]

Correction: we still end up sending queries to MySQL, but the query that fetches 
the column statistics goes against the cache and not MySQL, as expected.

{code}
select COLUMN_NAME, COLUMN_TYPE, min(LONG_LOW_VALUE), 
max(LONG_HIGH_VALUE), min(DOUBLE_LOW_VALUE), max(DOUBLE_HIGH_VALUE), 
min(cast(BIG_DECIMAL_LOW_VALUE as decimal)), 
max(cast(BIG_DECIMAL_HIGH_VALUE as decimal)), sum(NUM_NULLS), 
max(NUM_DISTINCTS), max(AVG_COL_LEN), max(MAX_COL_LEN), sum(NUM_TRUES), 
sum(NUM_FALSES), 
avg((LONG_HIGH_VALUE-LONG_LOW_VALUE)/cast(NUM_DISTINCTS as 
decimal)),avg((DOUBLE_HIGH_VALUE-DOUBLE_LOW_VALUE)/NUM_DISTINCTS),avg((cast(BIG_DECIMAL_HIGH_VALUE
 as decimal)-cast(BIG_DECIMAL_LOW_VALUE as 
decimal))/NUM_DISTINCTS),sum(NUM_DISTINCTS) from PART_COL_STATS
{code}

 Aggregate stats cache: bug fixes for RDBMS path
 ---

 Key: HIVE-10530
 URL: https://issues.apache.org/jira/browse/HIVE-10530
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 1.2.0

 Attachments: HIVE-10530.1.patch








[jira] [Commented] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases

2015-04-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520413#comment-14520413
 ] 

Gunther Hagleitner commented on HIVE-10542:
---

If you do it afterwards you also don't need the nothingTodDo flag.

 Full outer joins in tez produce incorrect results in certain cases
 --

 Key: HIVE-10542
 URL: https://issues.apache.org/jira/browse/HIVE-10542
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Blocker
 Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch


 If there are no records for one of the tables in the full outer join, we do 
 not read the other input and end up not producing rows that we should.





[jira] [Commented] (HIVE-10544) Beeline/Hive JDBC Driver fails in HTTP mode on Windows with java.lang.NoSuchFieldError: INSTANCE

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520553#comment-14520553
 ] 

Sushanth Sowmyan commented on HIVE-10544:
-

Approved for 1.2.0 . Added to 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status

 Beeline/Hive JDBC Driver fails in HTTP mode on Windows with 
 java.lang.NoSuchFieldError: INSTANCE
 

 Key: HIVE-10544
 URL: https://issues.apache.org/jira/browse/HIVE-10544
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10544.1.patch


 NO PRECOMMIT TESTS
 This appears to be caused by a dependency version mismatch with httpcore on 
 Beeline's classpath.
 We probably also need to change beeline.cmd to include the equivalent of 
 export HADOOP_USER_CLASSPATH_FIRST=true.
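
One way to confirm a classpath mismatch of this kind is to print which jar a class was actually loaded from. The sketch below is a generic diagnostic that uses only JDK calls. The class name ClasspathProbe is hypothetical, and the class to inspect is whichever one appears in the NoSuchFieldError stack trace; it is passed as an argument rather than assumed here.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Beeline or Hive.
final class ClasspathProbe {
    // Returns the location (jar or directory) a class was loaded from.
    static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // JDK bootstrap classes typically report no code source.
        return src == null ? "(bootstrap or unknown source)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the fully qualified class name from the stack trace as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locate(Class.forName(name)));
    }
}
```

Running it with the same classpath that Beeline uses shows whether an older jar is shadowing the expected one.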





[jira] [Commented] (HIVE-10539) set default value of hive.repl.task.factory

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520367#comment-14520367
 ] 

Sushanth Sowmyan commented on HIVE-10539:
-

+1 on the change.

As to the inclusion in 1.2, although we've passed the deadline for feature 
inclusions, this is a trivial conf change, so I can make an exception for it. 
If you have any further jiras like this, please check with me asap, because 
after tomorrow 15:01 PDT I will be stricter in accepting patches of all sorts 
as we raise the severity bar for inclusion.

I've added this to 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status

 set default value of hive.repl.task.factory
 ---

 Key: HIVE-10539
 URL: https://issues.apache.org/jira/browse/HIVE-10539
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10539.1.patch


 hive.repl.task.factory does not have a default value set. It should be set to 
 org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory.
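
The effect of a default value can be sketched as follows. This is not the HiveConf change from the patch: java.util.Properties stands in for Hive's configuration object, and the class and method names are hypothetical. Only the key and the factory class name come from the description above.

```java
import java.util.Properties;

// Hypothetical sketch; java.util.Properties stands in for Hive's configuration.
final class ReplConfigDefaultSketch {
    static final String KEY = "hive.repl.task.factory";
    static final String DEFAULT_FACTORY =
        "org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory";

    // Falls back to the default when the key is not set explicitly.
    static String taskFactory(Properties conf) {
        return conf.getProperty(KEY, DEFAULT_FACTORY);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(taskFactory(conf)); // key unset: the default applies
        conf.setProperty(KEY, "com.example.CustomFactory"); // made-up override
        System.out.println(taskFactory(conf));
    }
}
```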





[jira] [Commented] (HIVE-8165) Annotation changes for replication

2015-04-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520444#comment-14520444
 ] 

Sushanth Sowmyan commented on HIVE-8165:


Thanks for the review, Alan!

Committed to branch-1.2 and master.

 Annotation changes for replication
 --

 Key: HIVE-8165
 URL: https://issues.apache.org/jira/browse/HIVE-8165
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Fix For: 1.2.0

 Attachments: HIVE-8165.2.patch, HIVE-8165.patch


 We need to make a couple of changes to annotate the recent changes:
 a) Mark the old notification listener in HCatalog as @Deprecated, linking 
 instead to the new repl/ module.
 b) Mark the new interfaces as @Evolving @Unstable
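
A rough sketch of what the two kinds of annotation look like in Java follows. To stay self-contained it declares its own Evolving and Unstable annotations; these, and the listener and interface names, are hypothetical placeholders rather than the classes touched by the patch.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-ins for the project's stability annotations.
@Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}
@Documented @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}

/** @deprecated Placeholder for the old listener; the Javadoc would point to the repl/ module. */
@Deprecated
class OldNotificationListener {}

// Placeholder for a new interface whose contract may still change.
@Evolving
@Unstable
interface NewReplicationApi {}

final class AnnotationSketch {
    public static void main(String[] args) {
        System.out.println(OldNotificationListener.class.isAnnotationPresent(Deprecated.class));
        System.out.println(NewReplicationApi.class.isAnnotationPresent(Evolving.class));
        System.out.println(NewReplicationApi.class.isAnnotationPresent(Unstable.class));
    }
}
```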





[jira] [Updated] (HIVE-10061) HiveConf Should not be used as part of the HS2 client side code

2015-04-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10061:
-
Attachment: HIVE-10061.1.patch

[~vgumashta] Can you review this please?

Thanks
Hari

 HiveConf Should not be used as part of the HS2 client side code
 ---

 Key: HIVE-10061
 URL: https://issues.apache.org/jira/browse/HIVE-10061
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10061.1.patch


 HiveConf crept into the JDBC driver via the embedded mode check:
 if (isEmbeddedMode) {
   EmbeddedThriftBinaryCLIService embeddedClient = new EmbeddedThriftBinaryCLIService();
   embeddedClient.init(new HiveConf());
   client = embeddedClient;
 } else {
 
 Ideally we'd like to keep driver code free of these dependencies. 
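
One general technique for keeping a client free of compile-time dependencies on server-side classes is to load them reflectively, and only when embedded mode is requested. The sketch below illustrates that idea only; it is not the approach taken in the attached patch. The class LazyEmbeddedLoader is hypothetical, and java.util.ArrayList is a placeholder for the real embedded service class.

```java
// Hypothetical sketch of reflective loading; not taken from the Hive JDBC driver.
final class LazyEmbeddedLoader {
    // Instantiates a class by name so the caller needs no compile-time reference to it.
    static Object newInstance(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is only a placeholder for an embedded service class.
        Object service = newInstance("java.util.ArrayList");
        System.out.println(service.getClass().getName());
    }
}
```

The trade-off is that a missing or incompatible class is reported at run time rather than at compile time.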




