[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishant Kelkar updated HIVE-9557:
    Attachment: HIVE-9557.2.patch

create UDF to measure strings similarity using Cosine Similarity algo

    Key: HIVE-9557
    URL: https://issues.apache.org/jira/browse/HIVE-9557
    Project: Hive
    Issue Type: Improvement
    Components: UDF
    Reporter: Alexander Pivovarov
    Assignee: Nishant Kelkar
    Labels: CosineSimilarity, SimilarityMetric, UDF
    Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, udf_cosine_similarity-v01.patch

Algo description: http://en.wikipedia.org/wiki/Cosine_similarity

{code}
-- one word different, total 2 words
str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
{code}

Reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
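The worked example above can be checked with a short sketch (Python for illustration only; the patch itself is a Java UDF, and tokenizing on whitespace is an assumption about its behavior):

```python
import math
from collections import Counter

def str_sim_cosine(a, b):
    """Cosine similarity between two strings over whitespace-separated tokens."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)          # shared-token overlap
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# One token of two differs, so similarity is approximately 0.5:
print(str_sim_cosine('Test String1', 'Test String2'))
```

This matches the (2 - 1) / 2 = 0.5 figure in the description: the strings share one of two tokens, and each token vector has norm sqrt(2).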
[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604567#comment-14604567 ]

Hive QA commented on HIVE-11103:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742349/HIVE-11103.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9038 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4421/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4421/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4421/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742349 - PreCommit-HIVE-TRUNK-Build

Add banker's rounding BROUND UDF

    Key: HIVE-11103
    URL: https://issues.apache.org/jira/browse/HIVE-11103
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Alexander Pivovarov
    Assignee: Alexander Pivovarov
    Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch

Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding and, in German, mathematische Rundung.

Example (rounding to 2 digits):
{code}
                 2 digits            2 digits
Unrounded        Standard rounding   Gaussian rounding
  54.1754         54.18               54.18
 343.2050        343.21              343.20
+106.2038       +106.20             +106.20
=========       =======             =======
 503.5842        503.59              503.58
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
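The half-to-even rule shown in the table can be sketched with Python's decimal module (illustrative only; the patch implements BROUND as a Java UDF, and the function signature here is an assumption):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def bround(x, scale=0):
    """Banker's rounding: round half to even at the given decimal scale."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
    # Go through str() so the decimal value, not the binary float, is rounded.
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_EVEN))

print(bround(2.5))         # exact half rounds down to the even neighbor, 2.0
print(bround(3.5))         # exact half rounds up to the even neighbor, 4.0
print(bround(343.205, 2))  # table row: Gaussian rounding gives 343.20
```

Note the table's middle row: 343.2050 is an exact half at 2 digits, so standard rounding gives 343.21 while banker's rounding picks the even digit, 343.20.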
[jira] [Commented] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604704#comment-14604704 ]

Xuefu Zhang commented on HIVE-10438:

Here are some of my high-level thoughts:

1. I don't think Hive needs to support multiple compressors at the same time. This is very unlikely in a real production scenario, though different users might choose different compression technologies (i.e. snappy vs lzo). For simplicity, we should start with just one. Thus, we need two flags on the server side: #1, enable/disable compression; #2, the class name (some sort of identifier) of the compressor.

2. The JDBC client should be able to specify whether to use result set compression. This can be done via a hiveconf variable specified in the hiveConfs section of the JDBC connection string below:
{code}
jdbc:hive2://host:port/dbName;sessionConfs?hiveConfs#hiveVars
{code}
An example of this variable could be hive.client.use.resultset.compression.

3. When updating the patch, please choose "update patch" instead of "add file" so as to make it easy to see diffs between the patches.

Architecture for ResultSet Compression via external plugin

    Key: HIVE-10438
    URL: https://issues.apache.org/jira/browse/HIVE-10438
    Project: Hive
    Issue Type: New Feature
    Components: Hive, Thrift API
    Affects Versions: 1.2.0
    Reporter: Rohit Dholakia
    Assignee: Rohit Dholakia
    Labels: patch
    Attachments: HIVE-10438-1.patch, HIVE-10438.patch, Proposal-rscompressor.pdf, README.txt, Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2ResultSetCompressor.zip, hs2driver-master.zip

This JIRA proposes an architecture for enabling ResultSet compression using an external plugin. The patch has three aspects to it:
0. An architecture for enabling ResultSet compression with external plugins
1. An example plugin to demonstrate end-to-end functionality
2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver)

Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression.

Review board link: https://reviews.apache.org/r/35792/

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
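Following the URL scheme Xuefu quotes in point 2, a client opting in to compression might connect with a string like the one below (host, port, and database are placeholders; the variable name is the example from the comment, not a committed config key):

{code}
jdbc:hive2://host:10000/default?hive.client.use.resultset.compression=true
{code}

Here the hiveConfs section after "?" carries server-side hiveconf overrides for the session, which is where a per-connection compression switch would naturally live.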
[jira] [Updated] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9625: -- Attachment: HIVE-9625.2.patch Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625-branch-1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604682#comment-14604682 ] Hive QA commented on HIVE-9557: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742383/HIVE-9557.2.patch {color:green}SUCCESS:{color} +1 9039 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4425/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4425/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4425/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742383 - PreCommit-HIVE-TRUNK-Build create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604719#comment-14604719 ] Hive QA commented on HIVE-9625: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742392/HIVE-9625.2.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9020 tests executed *Failed tests:* {noformat} TestCliDriver-protectmode2.q-authorization_create_temp_table.q-tez_self_join.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4426/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4426/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4426/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12742392 - PreCommit-HIVE-TRUNK-Build Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625-branch-1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11117) Hive external table - skip header and trailer property issue
[ https://issues.apache.org/jira/browse/HIVE-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janarthanan updated HIVE-11117:
    Environment: Production
    Priority: Critical (was: Major)

Hive external table - skip header and trailer property issue

    Key: HIVE-11117
    URL: https://issues.apache.org/jira/browse/HIVE-11117
    Project: Hive
    Issue Type: Bug
    Environment: Production
    Reporter: Janarthanan
    Priority: Critical

I am using an external Hive table pointing to an HDFS location. The external table is partitioned on year/mm/dd folders. When there is more than one partition folder (e.g. /2015/01/02/file.txt and /2015/01/03/file2.txt), a select on the external table skips a DATA RECORD instead of skipping the header/trailer record from one of the files. tblproperties (skip.header.line.count=1);

Resolution: Enabling hive.input.format instead of the text input format and executing with the TEZ engine instead of MapReduce resolved the issue. How can the problem be resolved without setting these parameters? I don't want to run the hive query using TEZ.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
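The workaround mentioned in the resolution could be expressed as session settings (a sketch; the HiveInputFormat class name is the stock non-combining input format, assumed here since the report only says "hive.input format"):

{code}
-- Workaround from the report: non-combining input format plus the Tez engine.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET hive.execution.engine=tez;
{code}

This is consistent with the HIVE-5795 comment below, which observes that header skipping misbehaves under CombineHiveInputFormat but not under HiveInputFormat.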
[jira] [Commented] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604901#comment-14604901 ] Hive QA commented on HIVE-11122: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742419/HIVE-11122.2.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4428/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4428/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4428/ Messages: {noformat} This message was trimmed, see log for full details [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-client --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/pom.xml to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Query Language 2.0.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- 
maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-exec --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen Generating vector expression code Generating vector expression test code [INFO] Executed tasks [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java added. 
[INFO] [INFO] --- antlr3-maven-plugin:3.4:antlr (default) @ hive-exec --- [INFO] ANTLR: Processing source directory /data/hive-ptest/working/apache-github-source-source/ql/src/java ANTLR Parser Generator Version 3.4 org/apache/hadoop/hive/ql/parse/HiveLexer.g org/apache/hadoop/hive/ql/parse/HiveParser.g warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_MAP using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_SELECT using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_SORT KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_MAP LPAREN using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_ALL using multiple
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604902#comment-14604902 ] Gopal V commented on HIVE-11043: [~leftylev]: yes, it needs doc - I will write up a decision tree of the hybrid strategy for the docs. ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch, HIVE-11043.3.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11122: - Attachment: HIVE-11122.2.patch Previous patch had some stray characters. ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11122: - Attachment: HIVE-11122.2.patch ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604883#comment-14604883 ] Lefty Leverenz commented on HIVE-11051: --- Nudge: This was committed to branch-1 (1.3.0) and master (2.0.0) so the Status, Resolution, and Fix Version need to be updated. Commits 5351c35bffa251ba17de22bcd5ef0b9b06d134c9 2a77e87e347d368a806c53df5f5ac709339a47bc. Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object; - Key: HIVE-11051 URL: https://issues.apache.org/jira/browse/HIVE-11051 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Tez Affects Versions: 1.2.0, 2.0.0 Reporter: Greg Senia Assignee: Matt McCline Priority: Critical Attachments: HIVE-11051.01.patch, HIVE-11051.02.patch, problem_table_joins.tar.gz I tried to apply: HIVE-10729 which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3 {code} Status: Running (Executing on YARN cluster with App id application_1434641270368_1038) VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED Map 1 .. SUCCEEDED 3 300 0 0 Map 2 ... 
FAILED 3 102 7 0 VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s Status: Failed Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at
[jira] [Commented] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604882#comment-14604882 ] Prasanth Jayachandran commented on HIVE-11122: -- Addressed Gopal's comments and regenerated golden files for failing tests. ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11090) ordering issues with windows unit test runs
[ https://issues.apache.org/jira/browse/HIVE-11090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604894#comment-14604894 ] Lefty Leverenz commented on HIVE-11090: --- Nudge: This was committed to branch-1 (1.3.0) and master (2.0.0) so the Status, Resolution, and Fix Version need to be updated. Commits 440c91c979226ddc970536f70ff0769c651483c1 63deec40731c709f84b23525dc68a7cec3307052. ordering issues with windows unit test runs --- Key: HIVE-11090 URL: https://issues.apache.org/jira/browse/HIVE-11090 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-11090.01.patch, HIVE-11090.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604791#comment-14604791 ] Xuefu Zhang commented on HIVE-9625: --- [~nemon], could you please describe what problem your proposal is addressing? I'm not sure if that's for the same problem here or an enhancement to the current solution. Please feel free to create a follow-up JIRA if necessary. Thanks. Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.3.0, 2.0.0 Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604797#comment-14604797 ]

Sivanesan commented on HIVE-5795:

I agree with Prashant Kumar - I face this exact issue. I see this issue only when I use CombineHiveInputFormat and not while using HiveInputFormat. Does this have something to do with InputSplit? Please help.

Hive should be able to skip header and footer rows when reading data file for a table

    Key: HIVE-5795
    URL: https://issues.apache.org/jira/browse/HIVE-5795
    Project: Hive
    Issue Type: New Feature
    Reporter: Shuaishuai Nie
    Assignee: Shuaishuai Nie
    Labels: TODOC13
    Fix For: 0.13.0
    Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch

Hive should be able to skip header and footer lines when reading a data file from a table. This way, users don't need to preprocess data generated by another application with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties in the table description to define the number of lines in the header and footer, and skip them when reading records from the record reader. A DDL example for creating a table with header and footer should look like this:

{code}
Create external table testtable (name string, message string)
row format delimited fields terminated by '\t' lines terminated by '\n'
location '/testtable'
tblproperties (skip.header.line.count=1, skip.footer.line.count=2);
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604843#comment-14604843 ]

Lefty Leverenz commented on HIVE-11118:

Nudge: This needs to show Fix Versions 1.3.0 and 2.0.0. (Commits 49da35903f8334d6dd0c597563c34388772914cc and d373962de475ea9f3ef7b2594fbc5d8488636af0.)

Load data query should validate file formats with destination tables

    Key: HIVE-11118
    URL: https://issues.apache.org/jira/browse/HIVE-11118
    Project: Hive
    Issue Type: Bug
    Affects Versions: 2.0.0
    Reporter: Prasanth Jayachandran
    Assignee: Prasanth Jayachandran
    Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch

Load data local inpath queries do not do any validation wrt file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading files that do not match the destination table's file format.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9625: -- Attachment: (was: HIVE-9625-branch-1.patch) Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604786#comment-14604786 ] Xuefu Zhang commented on HIVE-9625: --- The above test failures seem rather infrastructural. Patch #2 is committed to both master and branch-1. Thanks to Brock and Prasad. Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604841#comment-14604841 ] Lefty Leverenz commented on HIVE-11043: --- Does this need documentation? Also, shouldn't Fix Version include 1.3.0 (commit 64f8e0f069f71f82518a9280d199f790174bee33 to branch-1)? ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch, HIVE-11043.3.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604858#comment-14604858 ] Lefty Leverenz commented on HIVE-11083: --- Not branch-1 (for 1.3.0)? Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0 Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-10233: -- Labels: TODOC1.3 (was: ) Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Labels: TODOC1.3 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch, HIVE-10233.23.patch, HIVE-10233.24.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604833#comment-14604833 ] Lefty Leverenz commented on HIVE-10233: --- Doc note: This adds two configuration parameters (*hive.tez.enable.memory.manager* and *hive.hash.table.inflation.factor*) which need to be documented in the wiki in Configuration Properties for release 1.3.0. * *hive.tez.enable.memory.manager* belongs in [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez] * *hive.hash.table.inflation.factor* belongs in [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Is any general documentation needed for the memory manager? Perhaps in the design docs? * [Hive on Tez | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez] * [Hybrid Hybrid Grace Hash Join, v1.0 | https://cwiki.apache.org/confluence/display/Hive/Hybrid+Hybrid+Grace+Hash+Join%2C+v1.0] Also, this jira needs updates for Status, Resolution, and Fix Version. 
Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Labels: TODOC1.3 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch, HIVE-10233.23.patch, HIVE-10233.24.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
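For the wiki write-up, a session-level example of the two parameters named in the doc note above may help readers. This is a sketch only: the parameter names come from the comment, but the values shown are illustrative, not defaults taken from the patch.

```sql
-- Illustrative values only; consult the committed patch for the real defaults.
set hive.tez.enable.memory.manager=true;
set hive.hash.table.inflation.factor=2.0;
```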
[jira] [Commented] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604953#comment-14604953 ] Hive QA commented on HIVE-11122: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742421/HIVE-11122.2.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8990 tests executed *Failed tests:* {noformat} TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4429/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4429/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4429/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742421 - PreCommit-HIVE-TRUNK-Build ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. 
This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11066) Ensure tests don't share directories on FS
[ https://issues.apache.org/jira/browse/HIVE-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11066: Fix Version/s: (was: 1.2.1) 1.2.2 Ensure tests don't share directories on FS -- Key: HIVE-11066 URL: https://issues.apache.org/jira/browse/HIVE-11066 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 1.2.2 Attachments: HIVE-11066.patch Tests often fail with errors like Could not fully delete D:\w\hv\hcatalog\hcatalog-pig-adapter\target\tmp\dfs\name1 on Windows platforms. Attached is a prototype on avoiding these false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11059) hcatalog-server-extensions tests scope should depend on hive-exec
[ https://issues.apache.org/jira/browse/HIVE-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11059: Fix Version/s: (was: 1.2.1) 1.2.2 hcatalog-server-extensions tests scope should depend on hive-exec - Key: HIVE-11059 URL: https://issues.apache.org/jira/browse/HIVE-11059 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Fix For: 1.2.2 Attachments: HIVE-11059.patch (causes test failures in Windows due to the lack of WindowsPathUtil being available otherwise) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11060) Make test windowing.q robust
[ https://issues.apache.org/jira/browse/HIVE-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11060: Fix Version/s: 1.2.2 Make test windowing.q robust Key: HIVE-11060 URL: https://issues.apache.org/jira/browse/HIVE-11060 Project: Hive Issue Type: Bug Components: Tests Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11060.01.patch, HIVE-11060.patch Add partition / order by in over clause to make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11083: Fix Version/s: 1.2.2 Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11076: Fix Version/s: 1.2.2 Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11048) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11048: Fix Version/s: 1.2.2 Make test cbo_windowing robust -- Key: HIVE-11048 URL: https://issues.apache.org/jira/browse/HIVE-11048 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 1.2.2 Attachments: HIVE-11048.patch Add partition / order by in over clause to make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11050) testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries
[ https://issues.apache.org/jira/browse/HIVE-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11050: Fix Version/s: (was: 1.2.1) 1.2.2 testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries -- Key: HIVE-11050 URL: https://issues.apache.org/jira/browse/HIVE-11050 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Blocker Fix For: 1.2.2 Attachments: HIVE-11050.01.branch-1.patch, HIVE-11050.01.patch In some environments the Q file tests vector_outer_join\{1-4\}.q fail because the data creation queries produce different input files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11095: Fix Version/s: (was: 1.2.0) SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes Text's getBytes() method, which returns the raw byte array; however, only data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after Hadoop 1. {noformat} How I found this bug: when I queried data from an LZO table, I found in the results that the length of the current row was always larger than that of the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row contains the content of the first row. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is shown below; it has just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. 
Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
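The getBytes()/copyBytes() contract described in the issue above is easiest to see with a small, self-contained sketch. ReusableBuffer and TextReuseDemo below are illustrative stand-ins, not Hive or Hadoop classes; the real org.apache.hadoop.io.Text behaves analogously in that its backing array only grows when the object is reused.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal stand-in for org.apache.hadoop.io.Text's buffer-reuse behavior:
// the backing array only grows, so bytes past `length` can hold stale data
// from a previously stored record. (Illustrative class, not the real Text.)
class ReusableBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        if (data.length > bytes.length) {
            bytes = new byte[data.length]; // grow, never shrink
        }
        System.arraycopy(data, 0, bytes, 0, data.length);
        length = data.length;
    }

    byte[] getBytes() { return bytes; }       // raw buffer: may be longer than length
    int getLength() { return length; }
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); } // exactly length bytes
}

public class TextReuseDemo {
    public static void main(String[] args) {
        ReusableBuffer t = new ReusableBuffer();
        t.set("a long first record");
        t.set("short");

        // Buggy pattern (what the issue describes): decoding the whole raw
        // buffer drags the tail of the previous record into the result.
        String buggy = new String(t.getBytes(), StandardCharsets.UTF_8);

        // Safe pattern: use copyBytes(), or decode only getLength() bytes.
        String fixed = new String(t.copyBytes(), StandardCharsets.UTF_8);

        System.out.println("buggy: " + buggy); // still ends with leftovers of the first record
        System.out.println("fixed: " + fixed);
    }
}
```

Decoding the raw array yields a string that still ends with leftover bytes of the longer earlier record, which matches the symptom reported above (the current row containing the tail of the previous one); using copyBytes(), or respecting getLength(), avoids it.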
[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail
[ https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11010: Fix Version/s: (was: 1.2.1) Accumulo storage handler queries via HS2 fail - Key: HIVE-11010 URL: https://issues.apache.org/jira/browse/HIVE-11010 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 1.2.1 Environment: Secure Reporter: Takahiko Saito Assignee: Josh Elser On Kerberized cluster, accumulo storage handler throws an error, [usrname]@[principlaname] is not allowed to impersonate [username] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605018#comment-14605018 ] Sushanth Sowmyan edited comment on HIVE-4577 at 6/29/15 1:15 AM: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. was (Author: sushanth): Removing fix version of 1.2.1 since this is not part of the already-released 1.2.` release. Please set appropriate commit version when this fix is committed. hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 Reporter: Bing Li Assignee: Bing Li Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch As designed, Hive supports the hadoop dfs command in the Hive shell, e.g. hive dfs -mkdir /user/biadmin/mydir; but it behaves differently from Hadoop if the path contains spaces or quotes: hive dfs -mkdir hello; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive dfs -mkdir bei jing; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11010) Accumulo storage handler queries via HS2 fail
[ https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605019#comment-14605019 ] Sushanth Sowmyan commented on HIVE-11010: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. Accumulo storage handler queries via HS2 fail - Key: HIVE-11010 URL: https://issues.apache.org/jira/browse/HIVE-11010 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 1.2.1 Environment: Secure Reporter: Takahiko Saito Assignee: Josh Elser On Kerberized cluster, accumulo storage handler throws an error, [usrname]@[principlaname] is not allowed to impersonate [username] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605017#comment-14605017 ] Sushanth Sowmyan edited comment on HIVE-10792 at 6/29/15 1:15 AM: -- Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. was (Author: sushanth): Removing fix version of 1.2.1 since this is not part of the already-released 1.2.` release. Please set appropriate commit version when this fix is committed. PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, HIVE-10792.test.sql Here are the steps to reproduce the bug. First of all, prepare a simple ORC table with one row: {code} create table test_orc (c0 int, c1 int) stored as ORC; {code} Table: test_orc ||c0||c1|| |0|1| The following SQL gets an empty result, which is not expected: {code} select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1 {code} Self join is also broken: {code} set hive.auto.convert.join=false; -- force common join select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); {code} It gets an empty result while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604950#comment-14604950 ] Xuefu Zhang commented on HIVE-6791: --- +1 Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch, HIVE-6791.5-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604994#comment-14604994 ] Chaoyu Tang commented on HIVE-10754: [~aihuaxu] Could you elaborate on what exactly this patch is going to fix? Both Hive 2.0.0 and 1.3.0 seem to use Hadoop 2.6. Thanks. new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch Replace all the deprecated new Job() with Job.getInstance() in HCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11074) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11074: Fix Version/s: (was: 1.2.1) 1.2.2 Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11074 URL: https://issues.apache.org/jira/browse/HIVE-11074 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.2 Attachments: HIVE-11074.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605018#comment-14605018 ] Sushanth Sowmyan commented on HIVE-4577: Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set the appropriate commit version when this fix is committed. hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 Reporter: Bing Li Assignee: Bing Li Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch As designed, Hive supports the hadoop dfs command in the Hive shell, e.g. hive dfs -mkdir /user/biadmin/mydir; but it behaves differently from Hadoop if the path contains spaces or quotes: hive dfs -mkdir hello; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive dfs -mkdir bei jing; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605017#comment-14605017 ] Sushanth Sowmyan commented on HIVE-10792: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set the appropriate commit version when this fix is committed. PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, HIVE-10792.test.sql Here are the steps to reproduce the bug. First of all, prepare a simple ORC table with one row: {code} create table test_orc (c0 int, c1 int) stored as ORC; {code} Table: test_orc ||c0||c1|| |0|1| The following SQL gets an empty result, which is not expected: {code} select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1 {code} Self join is also broken: {code} set hive.auto.convert.join=false; -- force common join select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); {code} It gets an empty result while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605015#comment-14605015 ] Sushanth Sowmyan commented on HIVE-10983: - Removing fix versions 1.2.0 and 0.14.1 since this is not part of the already-released 1.2.0 and 0.14.1 releases. Please set the appropriate commit version (the version this fix goes into) when this fix is committed. SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt {noformat} The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they invoke Text's getBytes() method, which returns the raw byte array; however, only data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after Hadoop 1. {noformat} When I queried data from an LZO table, I found in the results that the length of the current row was always larger than that of the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row contains the content of the first row. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is shown below; it has just 2 rows. 
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605013#comment-14605013 ] Sushanth Sowmyan commented on HIVE-11095: - Removing fix version of 1.2.0 since this is not part of the already-released 1.2.0 release. Please set the appropriate commit version when this fix is committed. SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes Text's getBytes() method, which returns the raw byte array; however, only data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after Hadoop 1. {noformat} How I found this bug: when I queried data from an LZO table, I found in the results that the length of the current row was always larger than that of the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row contains the content of the first row. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is shown below; it has just 2 rows. 
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4577: --- Fix Version/s: (was: 1.2.1) hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 Reporter: Bing Li Assignee: Bing Li Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch As designed, Hive supports the hadoop dfs command in the Hive shell, e.g. hive dfs -mkdir /user/biadmin/mydir; but it behaves differently from Hadoop if the path contains spaces or quotes: hive dfs -mkdir hello; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive dfs -mkdir bei jing; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10983: Fix Version/s: (was: 1.2.0) (was: 0.14.1) SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt {noformat} The mothod transformTextToUTF8 and transformTextFromUTF8 have a error bug,It invoke a bad method of Text,getBytes()! The method getBytes of Text returns the raw bytes; however, only data up to Text.length is valid.A better way is use copyBytes() if you need the returned array to be precisely the length of the data. But the copyBytes is added behind hadoop1. {noformat} When i query data from a lzo table , I found in results : the length of the current row is always largr than the previous row, and sometimes,the current row contains the contents of the previous row。 For example ,i execute a sql , {code:sql} select * from web_searchhub where logdate=2015061003 {code} the result of sql see blow.Notice that ,the second row content contains the first row content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of origin lzo file content see below ,just 2 rows. 
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
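The getBytes()/length pitfall described above can be reproduced with a minimal stand-in for Hadoop's Text class (a sketch of its buffer-reuse semantics, not the real class): the backing array grows but never shrinks, so after reuse the raw array still carries trailing bytes from a previous, longer value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal stand-in for org.apache.hadoop.io.Text's buffer reuse (a sketch,
// not the real class): set() reuses the backing array when it is big enough,
// so getBytes() can return stale trailing bytes past getLength().
public class ReusedText {
    private byte[] bytes = new byte[0];
    private int length;

    public void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < utf8.length) {
            bytes = new byte[utf8.length];   // grow, never shrink
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    public byte[] getBytes() { return bytes; }   // raw buffer: may be longer than length
    public int getLength() { return length; }
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); } // exactly length bytes

    public static void main(String[] args) {
        ReusedText t = new ReusedText();
        t.set("a long previous row");
        t.set("short");
        // BUG: decoding the whole raw array resurrects the previous row's tail.
        System.out.println(new String(t.getBytes(), StandardCharsets.UTF_8));          // shortg previous row
        // FIX: decode only getLength() bytes, which is what copyBytes() returns.
        System.out.println(new String(t.copyBytes(), StandardCharsets.UTF_8));         // short
    }
}
```

This is exactly the symptom in the log output above: each decoded row ends with leftover content from the longer row that previously occupied the buffer.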
[jira] [Commented] (HIVE-9823) Load spark-defaults.conf from classpath [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605031#comment-14605031 ] JoneZhang commented on HIVE-9823: - Hi, Xuefu Zhang. There is a sentence in the wiki (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started): "Configure Spark-application configs for Hive. See: http://spark.apache.org/docs/latest/configuration.html. This can be done either by adding a file spark-defaults.conf with these properties to the Hive classpath." According to this issue, it is no longer necessary to do that manually. Is that so? Load spark-defaults.conf from classpath [Spark Branch] -- Key: HIVE-9823 URL: https://issues.apache.org/jira/browse/HIVE-9823 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.2.0 Attachments: HIVE-9823.1-spark.patch, HIVE-9823.2-spark.patch, HIVE-9823.3-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
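The mechanism the issue title describes, loading spark-defaults.conf from the classpath rather than requiring users to wire it up by hand, can be sketched like this (illustrative only; `SparkDefaults` and its methods are hypothetical, not Hive's actual code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch: locate spark-defaults.conf on the classpath and load
// it as java.util.Properties. Properties.load accepts both "key=value" and
// whitespace-separated "key value" lines, which covers the Spark conf format.
public class SparkDefaults {
    static Properties load(InputStream in) throws IOException {
        Properties props = new Properties();
        if (in != null) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Classpath lookup; returns null (and thus empty props) if absent.
        InputStream in = SparkDefaults.class.getClassLoader()
                .getResourceAsStream("spark-defaults.conf");
        Properties props = load(in);
        System.out.println(props.getProperty("spark.master", "<unset>"));
    }
}
```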
[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-11095: Fix Version/s: 2.0.0 SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 2.0.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes a problematic method of Text, getBytes(). getBytes() returns the raw byte array; only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data, but copyBytes() was only added after hadoop1. {noformat} How I found this bug: when I query data from an LZO table, I find in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} and the result is below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Fix Version/s: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605039#comment-14605039 ] xiaowei wang commented on HIVE-11095: - Thank you for the suggestion, [~sushant.patil]! This bug affects 0.14, 1.0, 1.1, and 1.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605038#comment-14605038 ] xiaowei wang commented on HIVE-10983: - Thank you for the suggestion, [~sushant.patil]! This bug affects 0.14, 1.0, 1.1, and 1.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605043#comment-14605043 ] xiaowei wang commented on HIVE-10983: - [~brocknoland] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605045#comment-14605045 ] xiaowei wang commented on HIVE-11095: - [~brocknoland] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605057#comment-14605057 ] Nemon Lou commented on HIVE-9625: - [~xuefuz], thanks for your attention. What I propose is a workaround for the lack of renewal of HMS tokens (not only for HiveServer2). The solution has been used in our production environment and closely follows Thejas M Nair's advice: {quote} I think it would be better if we can renew it from a HMS client implementation on a failure-retry, similar to how reloginFromKeyTab was added to the client in HIVE-4233. This way any client of HMS could potentially benefit from this change. {quote} Here, "any client of HMS" can be HiveServer2, WebHCat, Impala, SparkSQL, etc., in my opinion. Since HIVE-9625 already has a solution accepted by the Hive community, I think it's OK to fix this problem without the solution I provided. And thanks to Brock Noland and Xuefu Zhang for working on this. Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.3.0, 2.0.0 Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
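The failure-retry approach quoted above can be sketched as a generic wrapper (hypothetical names; `CredentialRefresher` stands in for whatever re-login or token re-fetch mechanism the client uses, e.g. reloginFromKeytab): on failure, refresh credentials and retry the call, so any HMS client can benefit without a dedicated renewal thread.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of retry-with-credential-refresh, not Hive's actual
// HMS client code: each failed attempt triggers a credential refresh before
// the next try, up to maxRetries extra attempts.
public class RetryingClient {
    interface CredentialRefresher { void refresh() throws Exception; }

    static <T> T callWithRetry(Callable<T> call, CredentialRefresher refresher,
                               int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxRetries) break;
                refresher.refresh();   // e.g. re-login from keytab / re-fetch token
            }
        }
        throw last;
    }
}
```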