[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishant Kelkar updated HIVE-9557:
    Attachment: HIVE-9557.2.patch

create UDF to measure strings similarity using Cosine Similarity algo

    Key: HIVE-9557
    URL: https://issues.apache.org/jira/browse/HIVE-9557
    Project: Hive
    Issue Type: Improvement
    Components: UDF
    Reporter: Alexander Pivovarov
    Assignee: Nishant Kelkar
    Labels: CosineSimilarity, SimilarityMetric, UDF
    Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, udf_cosine_similarity-v01.patch

Algo description: http://en.wikipedia.org/wiki/Cosine_similarity

{code}
-- one word different, total 2 words
str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
{code}

Reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
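The worked example above can be checked with a short sketch (Python for illustration only; the patch itself is a Java UDF, and tokenizing on whitespace is an assumption about its behavior):

```python
import math
from collections import Counter

def str_sim_cosine(a, b):
    """Cosine similarity between two strings over whitespace-separated tokens."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)          # shared-token overlap
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# One token of two differs, so similarity is approximately 0.5:
print(str_sim_cosine('Test String1', 'Test String2'))
```

This matches the (2 - 1) / 2 = 0.5 figure in the description: the strings share one of two tokens, and each token vector has norm sqrt(2).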
[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF
[ https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604567#comment-14604567 ]

Hive QA commented on HIVE-11103:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742349/HIVE-11103.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9038 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4421/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4421/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4421/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742349 - PreCommit-HIVE-TRUNK-Build

Add banker's rounding BROUND UDF

    Key: HIVE-11103
    URL: https://issues.apache.org/jira/browse/HIVE-11103
    Project: Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Alexander Pivovarov
    Assignee: Alexander Pivovarov
    Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch

Banker's rounding: the value is rounded to the nearest even number. Also known as Gaussian rounding and, in German, mathematische Rundung.

Example (rounding to 2 digits):
{code}
                 2 digits            2 digits
Unrounded        Standard rounding   Gaussian rounding
  54.1754         54.18               54.18
 343.2050        343.21              343.20
+106.2038       +106.20             +106.20
=========       =======             =======
 503.5842        503.59              503.58
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
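The half-to-even rule shown in the table can be sketched with Python's decimal module (illustrative only; the patch implements BROUND as a Java UDF, and the function signature here is an assumption):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def bround(x, scale=0):
    """Banker's rounding: round half to even at the given decimal scale."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
    # Go through str() so the decimal value, not the binary float, is rounded.
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_EVEN))

print(bround(2.5))         # exact half rounds down to the even neighbor, 2.0
print(bround(3.5))         # exact half rounds up to the even neighbor, 4.0
print(bround(343.205, 2))  # table row: Gaussian rounding gives 343.20
```

Note the table's middle row: 343.2050 is an exact half at 2 digits, so standard rounding gives 343.21 while banker's rounding picks the even digit, 343.20.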
[jira] [Commented] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604704#comment-14604704 ]

Xuefu Zhang commented on HIVE-10438:

Here are some of my high-level thoughts:

1. I don't think Hive needs to support multiple compressors at the same time. This is very unlikely in a real production scenario, though different users might choose different compression technologies (i.e. snappy vs lzo). For simplicity, we should start with just one. Thus, we need two flags on the server side: #1, enable/disable compression; #2, the class name (some sort of identifier) of the compressor.

2. The JDBC client should be able to specify whether to use result set compression. This can be done via a hiveconf variable specified in the hiveConfs section of the JDBC connection string below:
{code}
jdbc:hive2://host:port/dbName;sessionConfs?hiveConfs#hiveVars
{code}
An example of this variable could be hive.client.use.resultset.compression.

3. When updating the patch, please choose "update patch" instead of "add file" so as to make it easy to see diffs between the patches.

Architecture for ResultSet Compression via external plugin

    Key: HIVE-10438
    URL: https://issues.apache.org/jira/browse/HIVE-10438
    Project: Hive
    Issue Type: New Feature
    Components: Hive, Thrift API
    Affects Versions: 1.2.0
    Reporter: Rohit Dholakia
    Assignee: Rohit Dholakia
    Labels: patch
    Attachments: HIVE-10438-1.patch, HIVE-10438.patch, Proposal-rscompressor.pdf, README.txt, Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2ResultSetCompressor.zip, hs2driver-master.zip

This JIRA proposes an architecture for enabling ResultSet compression using an external plugin. The patch has three aspects to it:
0. An architecture for enabling ResultSet compression with external plugins
1. An example plugin to demonstrate end-to-end functionality
2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver)

Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression.

Review board link: https://reviews.apache.org/r/35792/

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
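Following the URL scheme Xuefu quotes in point 2, a client opting in to compression might connect with a string like the one below (host, port, and database are placeholders; the variable name is the example from the comment, not a committed config key):

{code}
jdbc:hive2://host:10000/default?hive.client.use.resultset.compression=true
{code}

Here the hiveConfs section after "?" carries server-side hiveconf overrides for the session, which is where a per-connection compression switch would naturally live.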
[jira] [Updated] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9625: -- Attachment: HIVE-9625.2.patch Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625-branch-1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604682#comment-14604682 ] Hive QA commented on HIVE-9557: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742383/HIVE-9557.2.patch {color:green}SUCCESS:{color} +1 9039 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4425/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4425/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4425/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742383 - PreCommit-HIVE-TRUNK-Build create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, udf_cosine_similarity-v01.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604719#comment-14604719 ] Hive QA commented on HIVE-9625: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742392/HIVE-9625.2.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9020 tests executed *Failed tests:* {noformat} TestCliDriver-protectmode2.q-authorization_create_temp_table.q-tez_self_join.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4426/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4426/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4426/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12742392 - PreCommit-HIVE-TRUNK-Build Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625-branch-1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11117) Hive external table - skip header and trailer property issue
[ https://issues.apache.org/jira/browse/HIVE-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janarthanan updated HIVE-11117:
    Environment: Production
    Priority: Critical (was: Major)

Hive external table - skip header and trailer property issue

    Key: HIVE-11117
    URL: https://issues.apache.org/jira/browse/HIVE-11117
    Project: Hive
    Issue Type: Bug
    Environment: Production
    Reporter: Janarthanan
    Priority: Critical

I am using an external Hive table pointing to an HDFS location. The external table is partitioned on year/mm/dd folders. When there is more than one partition folder (e.g. /2015/01/02/file.txt and /2015/01/03/file2.txt), a select on the external table skips a DATA RECORD instead of skipping the header/trailer record from one of the files. tblproperties (skip.header.line.count=1);

Resolution: Enabling hive.input.format instead of the text input format and executing with the TEZ engine instead of MapReduce resolved the issue. How can the problem be resolved without setting these parameters? I don't want to run the hive query using TEZ.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
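The workaround mentioned in the resolution could be expressed as session settings (a sketch; the HiveInputFormat class name is the stock non-combining input format, assumed here since the report only says "hive.input format"):

{code}
-- Workaround from the report: non-combining input format plus the Tez engine.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET hive.execution.engine=tez;
{code}

This is consistent with the HIVE-5795 comment below, which observes that header skipping misbehaves under CombineHiveInputFormat but not under HiveInputFormat.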
[jira] [Commented] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604901#comment-14604901 ] Hive QA commented on HIVE-11122: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742419/HIVE-11122.2.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4428/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4428/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4428/ Messages: {noformat} This message was trimmed, see log for full details [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-client --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.0.0-SNAPSHOT.jar to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/pom.xml to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/2.0.0-SNAPSHOT/spark-client-2.0.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Query Language 2.0.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- 
maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-exec --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen Generating vector expression code Generating vector expression test code [INFO] Executed tasks [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java added. 
[INFO] [INFO] --- antlr3-maven-plugin:3.4:antlr (default) @ hive-exec --- [INFO] ANTLR: Processing source directory /data/hive-ptest/working/apache-github-source-source/ql/src/java ANTLR Parser Generator Version 3.4 org/apache/hadoop/hive/ql/parse/HiveLexer.g org/apache/hadoop/hive/ql/parse/HiveParser.g warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_MAP using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_SELECT using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_SORT KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_MAP LPAREN using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:455:5: Decision can match input such as {KW_REGEXP, KW_RLIKE} KW_UNION KW_ALL using multiple
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604902#comment-14604902 ] Gopal V commented on HIVE-11043: [~leftylev]: yes, it needs doc - I will write up a decision tree of the hybrid strategy for the docs. ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch, HIVE-11043.3.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11122: - Attachment: HIVE-11122.2.patch Previous patch had some stray characters. ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11122: - Attachment: HIVE-11122.2.patch ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604883#comment-14604883 ] Lefty Leverenz commented on HIVE-11051: --- Nudge: This was committed to branch-1 (1.3.0) and master (2.0.0) so the Status, Resolution, and Fix Version need to be updated. Commits 5351c35bffa251ba17de22bcd5ef0b9b06d134c9 2a77e87e347d368a806c53df5f5ac709339a47bc. Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object; - Key: HIVE-11051 URL: https://issues.apache.org/jira/browse/HIVE-11051 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Tez Affects Versions: 1.2.0, 2.0.0 Reporter: Greg Senia Assignee: Matt McCline Priority: Critical Attachments: HIVE-11051.01.patch, HIVE-11051.02.patch, problem_table_joins.tar.gz I tried to apply: HIVE-10729 which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3 {code} Status: Running (Executing on YARN cluster with App id application_1434641270368_1038) VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED Map 1 .. SUCCEEDED 3 300 0 0 Map 2 ... 
FAILED 3 102 7 0 VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s Status: Failed Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at
[jira] [Commented] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604882#comment-14604882 ] Prasanth Jayachandran commented on HIVE-11122: -- Addressed Gopal's comments and regenerated golden files for failing tests. ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11090) ordering issues with windows unit test runs
[ https://issues.apache.org/jira/browse/HIVE-11090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604894#comment-14604894 ] Lefty Leverenz commented on HIVE-11090: --- Nudge: This was committed to branch-1 (1.3.0) and master (2.0.0) so the Status, Resolution, and Fix Version need to be updated. Commits 440c91c979226ddc970536f70ff0769c651483c1 63deec40731c709f84b23525dc68a7cec3307052. ordering issues with windows unit test runs --- Key: HIVE-11090 URL: https://issues.apache.org/jira/browse/HIVE-11090 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-11090.01.patch, HIVE-11090.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604791#comment-14604791 ] Xuefu Zhang commented on HIVE-9625: --- [~nemon], could you please describe what problem your proposal is addressing? I'm not sure if that's for the same problem here or an enhancement to the current solution. Please feel free to create a follow-up JIRA if necessary. Thanks. Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.3.0, 2.0.0 Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604797#comment-14604797 ]

Sivanesan commented on HIVE-5795:

I agree with Prashant Kumar - I face this exact issue. I see this issue only when I use CombineHiveInputFormat and not while using HiveInputFormat. Does this have something to do with InputSplit? Please help.

Hive should be able to skip header and footer rows when reading data file for a table

    Key: HIVE-5795
    URL: https://issues.apache.org/jira/browse/HIVE-5795
    Project: Hive
    Issue Type: New Feature
    Reporter: Shuaishuai Nie
    Assignee: Shuaishuai Nie
    Labels: TODOC13
    Fix For: 0.13.0
    Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch

Hive should be able to skip header and footer lines when reading a data file from a table. This way, users don't need to preprocess data generated by another application with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties in the table description to define the number of lines in the header and footer, and skip them when reading records from the record reader. A DDL example for creating a table with header and footer should look like this:

{code}
Create external table testtable (name string, message string)
row format delimited fields terminated by '\t' lines terminated by '\n'
location '/testtable'
tblproperties (skip.header.line.count=1, skip.footer.line.count=2);
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604843#comment-14604843 ]

Lefty Leverenz commented on HIVE-11118:

Nudge: This needs to show Fix Versions 1.3.0 and 2.0.0. (Commits 49da35903f8334d6dd0c597563c34388772914cc and d373962de475ea9f3ef7b2594fbc5d8488636af0.)

Load data query should validate file formats with destination tables

    Key: HIVE-11118
    URL: https://issues.apache.org/jira/browse/HIVE-11118
    Project: Hive
    Issue Type: Bug
    Affects Versions: 2.0.0
    Reporter: Prasanth Jayachandran
    Assignee: Prasanth Jayachandran
    Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch

Load data local inpath queries do not do any validation wrt file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading files that do not match the destination table's file format.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9625: -- Attachment: (was: HIVE-9625-branch-1.patch) Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604786#comment-14604786 ] Xuefu Zhang commented on HIVE-9625: --- The above test failures seem rather infrastructural. Patch #2 is committed to both master and branch-1. Thanks to Brock and Prasad. Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604841#comment-14604841 ] Lefty Leverenz commented on HIVE-11043: --- Does this need documentation? Also, shouldn't Fix Version include 1.3.0 (commit 64f8e0f069f71f82518a9280d199f790174bee33 to branch-1)? ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch, HIVE-11043.3.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604858#comment-14604858 ] Lefty Leverenz commented on HIVE-11083: --- Not branch-1 (for 1.3.0)? Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0 Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-10233: -- Labels: TODOC1.3 (was: ) Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Labels: TODOC1.3 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch, HIVE-10233.23.patch, HIVE-10233.24.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604833#comment-14604833 ] Lefty Leverenz commented on HIVE-10233: --- Doc note: This adds two configuration parameters (*hive.tez.enable.memory.manager* and *hive.hash.table.inflation.factor*) which need to be documented in the wiki in Configuration Properties for release 1.3.0. * *hive.tez.enable.memory.manager* belongs in [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez] * *hive.hash.table.inflation.factor* belongs in [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Is any general documentation needed for the memory manager? Perhaps in the design docs? * [Hive on Tez | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez] * [Hybrid Hybrid Grace Hash Join, v1.0 | https://cwiki.apache.org/confluence/display/Hive/Hybrid+Hybrid+Grace+Hash+Join%2C+v1.0] Also, this jira needs updates for Status, Resolution, and Fix Version. 
Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Labels: TODOC1.3 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch, HIVE-10233.23.patch, HIVE-10233.24.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
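For the wiki write-up, a session-level example of the two parameters named in the doc note above may help readers. This is a sketch only: the parameter names come from the comment, but the values shown are illustrative, not defaults taken from the patch.

```sql
-- Illustrative values only; consult the committed patch for the real defaults.
set hive.tez.enable.memory.manager=true;
set hive.hash.table.inflation.factor=2.0;
```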
[jira] [Commented] (HIVE-11122) ORC should not record the timezone information when there are no timestamp columns
[ https://issues.apache.org/jira/browse/HIVE-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604953#comment-14604953 ] Hive QA commented on HIVE-11122: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742421/HIVE-11122.2.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8990 tests executed *Failed tests:* {noformat} TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4429/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4429/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4429/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742421 - PreCommit-HIVE-TRUNK-Build ORC should not record the timezone information when there are no timestamp columns -- Key: HIVE-11122 URL: https://issues.apache.org/jira/browse/HIVE-11122 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11122.1.patch, HIVE-11122.2.patch, HIVE-11122.patch Currently ORC records the time zone information in the stripe footer even when there are no timestamp columns. 
This will not only add to the size of the footer but also can cause inconsistencies (file size difference) in test cases when run under different time zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11066) Ensure tests don't share directories on FS
[ https://issues.apache.org/jira/browse/HIVE-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11066: Fix Version/s: (was: 1.2.1) 1.2.2 Ensure tests don't share directories on FS -- Key: HIVE-11066 URL: https://issues.apache.org/jira/browse/HIVE-11066 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 1.2.2 Attachments: HIVE-11066.patch Tests often fail with errors like Could not fully delete D:\w\hv\hcatalog\hcatalog-pig-adapter\target\tmp\dfs\name1 on Windows platforms. Attached is a prototype on avoiding these false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11059) hcatalog-server-extensions tests scope should depend on hive-exec
[ https://issues.apache.org/jira/browse/HIVE-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11059: Fix Version/s: (was: 1.2.1) 1.2.2 hcatalog-server-extensions tests scope should depend on hive-exec - Key: HIVE-11059 URL: https://issues.apache.org/jira/browse/HIVE-11059 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Fix For: 1.2.2 Attachments: HIVE-11059.patch (causes test failures in Windows due to the lack of WindowsPathUtil being available otherwise) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11060) Make test windowing.q robust
[ https://issues.apache.org/jira/browse/HIVE-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11060: Fix Version/s: 1.2.2 Make test windowing.q robust Key: HIVE-11060 URL: https://issues.apache.org/jira/browse/HIVE-11060 Project: Hive Issue Type: Bug Components: Tests Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11060.01.patch, HIVE-11060.patch Add partition / order by in over clause to make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11083: Fix Version/s: 1.2.2 Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11076: Fix Version/s: 1.2.2 Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11048) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11048: Fix Version/s: 1.2.2 Make test cbo_windowing robust -- Key: HIVE-11048 URL: https://issues.apache.org/jira/browse/HIVE-11048 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 1.2.2 Attachments: HIVE-11048.patch Add partition / order by in over clause to make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11050) testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries
[ https://issues.apache.org/jira/browse/HIVE-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11050: Fix Version/s: (was: 1.2.1) 1.2.2 testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries -- Key: HIVE-11050 URL: https://issues.apache.org/jira/browse/HIVE-11050 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Blocker Fix For: 1.2.2 Attachments: HIVE-11050.01.branch-1.patch, HIVE-11050.01.patch In some environments the Q file tests vector_outer_join\{1-4\}.q fail because the data creation queries produce different input files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11095: Fix Version/s: (was: 1.2.0) SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes Text's getBytes() method, which returns the raw byte array; however, only data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after Hadoop 1. {noformat} How I found this bug: when I queried data from an LZO table, I found in the results that the length of the current row was always larger than that of the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row contains the content of the first row. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is shown below; it has just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. 
Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
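The getBytes()/copyBytes() contract described in the issue above is easiest to see with a small, self-contained sketch. ReusableBuffer and TextReuseDemo below are illustrative stand-ins, not Hive or Hadoop classes; the real org.apache.hadoop.io.Text behaves analogously in that its backing array only grows when the object is reused.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal stand-in for org.apache.hadoop.io.Text's buffer-reuse behavior:
// the backing array only grows, so bytes past `length` can hold stale data
// from a previously stored record. (Illustrative class, not the real Text.)
class ReusableBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        if (data.length > bytes.length) {
            bytes = new byte[data.length]; // grow, never shrink
        }
        System.arraycopy(data, 0, bytes, 0, data.length);
        length = data.length;
    }

    byte[] getBytes() { return bytes; }       // raw buffer: may be longer than length
    int getLength() { return length; }
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); } // exactly length bytes
}

public class TextReuseDemo {
    public static void main(String[] args) {
        ReusableBuffer t = new ReusableBuffer();
        t.set("a long first record");
        t.set("short");

        // Buggy pattern (what the issue describes): decoding the whole raw
        // buffer drags the tail of the previous record into the result.
        String buggy = new String(t.getBytes(), StandardCharsets.UTF_8);

        // Safe pattern: use copyBytes(), or decode only getLength() bytes.
        String fixed = new String(t.copyBytes(), StandardCharsets.UTF_8);

        System.out.println("buggy: " + buggy); // still ends with leftovers of the first record
        System.out.println("fixed: " + fixed);
    }
}
```

Decoding the raw array yields a string that still ends with leftover bytes of the longer earlier record, which matches the symptom reported above (the current row containing the tail of the previous one); using copyBytes(), or respecting getLength(), avoids it.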
[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail
[ https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11010: Fix Version/s: (was: 1.2.1) Accumulo storage handler queries via HS2 fail - Key: HIVE-11010 URL: https://issues.apache.org/jira/browse/HIVE-11010 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 1.2.1 Environment: Secure Reporter: Takahiko Saito Assignee: Josh Elser On Kerberized cluster, accumulo storage handler throws an error, [usrname]@[principlaname] is not allowed to impersonate [username] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605018#comment-14605018 ] Sushanth Sowmyan edited comment on HIVE-4577 at 6/29/15 1:15 AM: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. was (Author: sushanth): Removing fix version of 1.2.1 since this is not part of the already-released 1.2.` release. Please set appropriate commit version when this fix is committed. hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 Reporter: Bing Li Assignee: Bing Li Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch As designed, Hive supports the hadoop dfs command in the Hive shell, e.g. hive dfs -mkdir /user/biadmin/mydir; but it behaves differently from Hadoop if the path contains spaces or quotes: hive dfs -mkdir hello; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive dfs -mkdir bei jing; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11010) Accumulo storage handler queries via HS2 fail
[ https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605019#comment-14605019 ] Sushanth Sowmyan commented on HIVE-11010: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. Accumulo storage handler queries via HS2 fail - Key: HIVE-11010 URL: https://issues.apache.org/jira/browse/HIVE-11010 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 1.2.1 Environment: Secure Reporter: Takahiko Saito Assignee: Josh Elser On Kerberized cluster, accumulo storage handler throws an error, [usrname]@[principlaname] is not allowed to impersonate [username] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605017#comment-14605017 ] Sushanth Sowmyan edited comment on HIVE-10792 at 6/29/15 1:15 AM: -- Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. was (Author: sushanth): Removing fix version of 1.2.1 since this is not part of the already-released 1.2.` release. Please set appropriate commit version when this fix is committed. PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, HIVE-10792.test.sql Here are the steps to reproduce the bug. First of all, prepare a simple ORC table with one row: {code} create table test_orc (c0 int, c1 int) stored as ORC; {code} Table: test_orc ||c0||c1|| |0|1| The following SQL gets an empty result, which is not expected: {code} select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1 {code} Self join is also broken: {code} set hive.auto.convert.join=false; -- force common join select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); {code} It gets an empty result while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604950#comment-14604950 ] Xuefu Zhang commented on HIVE-6791: --- +1 Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch, HIVE-6791.5-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604994#comment-14604994 ] Chaoyu Tang commented on HIVE-10754: [~aihuaxu] Could you elaborate on what exactly this patch is going to fix? Both Hive 2.0.0 and 1.3.0 seem to use Hadoop 2.6. Thanks. new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch Replace all the deprecated new Job() with Job.getInstance() in HCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11074) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11074: Fix Version/s: (was: 1.2.1) 1.2.2 Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11074 URL: https://issues.apache.org/jira/browse/HIVE-11074 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.2 Attachments: HIVE-11074.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605018#comment-14605018 ] Sushanth Sowmyan commented on HIVE-4577: Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set the appropriate commit version when this fix is committed. hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 Reporter: Bing Li Assignee: Bing Li Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch As designed, Hive supports the hadoop dfs command in the Hive shell, e.g. hive dfs -mkdir /user/biadmin/mydir; but it behaves differently from Hadoop if the path contains spaces or quotes: hive dfs -mkdir hello; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive dfs -mkdir bei jing; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605017#comment-14605017 ] Sushanth Sowmyan commented on HIVE-10792: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set the appropriate commit version when this fix is committed. PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, HIVE-10792.test.sql Here are the steps to reproduce the bug. First of all, prepare a simple ORC table with one row: {code} create table test_orc (c0 int, c1 int) stored as ORC; {code} Table: test_orc ||c0||c1|| |0|1| The following SQL gets an empty result, which is not expected: {code} select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1 {code} Self join is also broken: {code} set hive.auto.convert.join=false; -- force common join select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); {code} It gets an empty result while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605015#comment-14605015 ] Sushanth Sowmyan commented on HIVE-10983: - Removing fix versions 1.2.0 and 0.14.1 since this is not part of the already-released 1.2.0 and 0.14.1 releases. Please set the appropriate commit version (the version this fix goes into) when this fix is committed. SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt {noformat} The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they invoke Text's getBytes() method, which returns the raw byte array; however, only data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after Hadoop 1. {noformat} When I queried data from an LZO table, I found in the results that the length of the current row was always larger than that of the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row contains the content of the first row. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is shown below; it has just 2 rows. 
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605013#comment-14605013 ] Sushanth Sowmyan commented on HIVE-11095: - Removing fix version of 1.2.0 since this is not part of the already-released 1.2.0 release. Please set the appropriate commit version when this fix is committed. SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes Text's getBytes() method, which returns the raw byte array; however, only data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after Hadoop 1. {noformat} How I found this bug: when I queried data from an LZO table, I found in the results that the length of the current row was always larger than that of the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row contains the content of the first row. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is shown below; it has just 2 rows. 
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4577: --- Fix Version/s: (was: 1.2.1) hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 Reporter: Bing Li Assignee: Bing Li Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch As designed, Hive supports the hadoop dfs command in the Hive shell, e.g. hive dfs -mkdir /user/biadmin/mydir; but it behaves differently from Hadoop if the path contains spaces or quotes: hive dfs -mkdir hello; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive dfs -mkdir bei jing; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10983: Fix Version/s: (was: 1.2.0) (was: 0.14.1) SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt {noformat} The mothod transformTextToUTF8 and transformTextFromUTF8 have a error bug,It invoke a bad method of Text,getBytes()! The method getBytes of Text returns the raw bytes; however, only data up to Text.length is valid.A better way is use copyBytes() if you need the returned array to be precisely the length of the data. But the copyBytes is added behind hadoop1. {noformat} When i query data from a lzo table , I found in results : the length of the current row is always largr than the previous row, and sometimes,the current row contains the contents of the previous row。 For example ,i execute a sql , {code:sql} select * from web_searchhub where logdate=2015061003 {code} the result of sql see blow.Notice that ,the second row content contains the first row content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of origin lzo file content see below ,just 2 rows. 
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
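The getBytes()/length pitfall described above can be reproduced with a minimal stand-in for Hadoop's Text class (a sketch of its buffer-reuse semantics, not the real class): the backing array grows but never shrinks, so after reuse the raw array still carries trailing bytes from a previous, longer value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal stand-in for org.apache.hadoop.io.Text's buffer reuse (a sketch,
// not the real class): set() reuses the backing array when it is big enough,
// so getBytes() can return stale trailing bytes past getLength().
public class ReusedText {
    private byte[] bytes = new byte[0];
    private int length;

    public void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < utf8.length) {
            bytes = new byte[utf8.length];   // grow, never shrink
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    public byte[] getBytes() { return bytes; }   // raw buffer: may be longer than length
    public int getLength() { return length; }
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); } // exactly length bytes

    public static void main(String[] args) {
        ReusedText t = new ReusedText();
        t.set("a long previous row");
        t.set("short");
        // BUG: decoding the whole raw array resurrects the previous row's tail.
        System.out.println(new String(t.getBytes(), StandardCharsets.UTF_8));          // shortg previous row
        // FIX: decode only getLength() bytes, which is what copyBytes() returns.
        System.out.println(new String(t.copyBytes(), StandardCharsets.UTF_8));         // short
    }
}
```

This is exactly the symptom in the log output above: each decoded row ends with leftover content from the longer row that previously occupied the buffer.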
[jira] [Commented] (HIVE-9823) Load spark-defaults.conf from classpath [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605031#comment-14605031 ] JoneZhang commented on HIVE-9823: - Hi, Xuefu Zhang. There is a sentence in the wiki (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started): "Configure Spark-application configs for Hive. See: http://spark.apache.org/docs/latest/configuration.html. This can be done either by adding a file spark-defaults.conf with these properties to the Hive classpath." According to this issue, it is no longer necessary to do that manually. Is that so? Load spark-defaults.conf from classpath [Spark Branch] -- Key: HIVE-9823 URL: https://issues.apache.org/jira/browse/HIVE-9823 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.2.0 Attachments: HIVE-9823.1-spark.patch, HIVE-9823.2-spark.patch, HIVE-9823.3-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
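The mechanism the issue title describes, loading spark-defaults.conf from the classpath rather than requiring users to wire it up by hand, can be sketched like this (illustrative only; `SparkDefaults` and its methods are hypothetical, not Hive's actual code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch: locate spark-defaults.conf on the classpath and load
// it as java.util.Properties. Properties.load accepts both "key=value" and
// whitespace-separated "key value" lines, which covers the Spark conf format.
public class SparkDefaults {
    static Properties load(InputStream in) throws IOException {
        Properties props = new Properties();
        if (in != null) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Classpath lookup; returns null (and thus empty props) if absent.
        InputStream in = SparkDefaults.class.getClassLoader()
                .getResourceAsStream("spark-defaults.conf");
        Properties props = load(in);
        System.out.println(props.getProperty("spark.master", "<unset>"));
    }
}
```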
[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-11095: Fix Version/s: 2.0.0 SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 2.0.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes a problematic method of Text, getBytes(). getBytes() returns the raw byte array; only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data, but copyBytes() was only added after hadoop1. {noformat} How I found this bug: when I query data from an LZO table, I find in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute the SQL {code:sql} select * from web_searchhub where logdate=2015061003 {code} and the result is below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Fix Version/s: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605039#comment-14605039 ] xiaowei wang commented on HIVE-11095: - Thank you for the suggestion, [~sushant.patil]! This bug affects 0.14, 1.0, 1.1, and 1.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605038#comment-14605038 ] xiaowei wang commented on HIVE-10983: - Thank you for the suggestion, [~sushant.patil]! This bug affects 0.14, 1.0, 1.1, and 1.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605043#comment-14605043 ] xiaowei wang commented on HIVE-10983: - [~brocknoland] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605045#comment-14605045 ] xiaowei wang commented on HIVE-11095: - [~brocknoland] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605057#comment-14605057 ] Nemon Lou commented on HIVE-9625: - [~xuefuz], thanks for your attention. What I propose is a workaround for the lack of renewal of HMS tokens (not only for HiveServer2). The solution has been used in our production environment and closely follows Thejas M Nair's advice: {quote} I think it would be better if we can renew it from a HMS client implementation on a failure-retry, similar to how reloginFromKeyTab was added to the client in HIVE-4233. This way any client of HMS could potentially benefit from this change. {quote} Here, "any client of HMS" can be HiveServer2, WebHCat, Impala, SparkSQL, etc., in my opinion. Since HIVE-9625 already has a solution accepted by the Hive community, I think it's OK to fix this problem without the solution I provided. And thanks to Brock Noland and Xuefu Zhang for working on this. Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.3.0, 2.0.0 Attachments: HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.1.patch, HIVE-9625.2.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
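The failure-retry approach quoted above can be sketched as a generic wrapper (hypothetical names; `CredentialRefresher` stands in for whatever re-login or token re-fetch mechanism the client uses, e.g. reloginFromKeytab): on failure, refresh credentials and retry the call, so any HMS client can benefit without a dedicated renewal thread.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of retry-with-credential-refresh, not Hive's actual
// HMS client code: each failed attempt triggers a credential refresh before
// the next try, up to maxRetries extra attempts.
public class RetryingClient {
    interface CredentialRefresher { void refresh() throws Exception; }

    static <T> T callWithRetry(Callable<T> call, CredentialRefresher refresher,
                               int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxRetries) break;
                refresher.refresh();   // e.g. re-login from keytab / re-fetch token
            }
        }
        throw last;
    }
}
```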