[jira] [Updated] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8128: Attachment: HIVE-8128.1-parquet.patch Rebased to the parquet branch based on HIVE-10975. Build passes locally. Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch NO PRECOMMIT TESTS We'll want to finish the vectorization work (e.g. VectorizedOrcSerde), which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after the API is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10820) Hive Server2 should register itself again when it encounters failures in HA mode
[ https://issues.apache.org/jira/browse/HIVE-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581715#comment-14581715 ] Nemon Lou commented on HIVE-10820: -- This has been fixed in 1.2.0 by HIVE-8890. Hive Server2 should register itself again when it encounters failures in HA mode -- Key: HIVE-10820 URL: https://issues.apache.org/jira/browse/HIVE-10820 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Wang Hao Hive Server2 should register itself again when it encounters a failure in HA mode. For example, a network problem will cause the session to expire in ZK, and the HiveServer2 ephemeral sequential node will be deleted with it. So I think we can add a watch to handle it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
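The proposal above can be sketched without a live ZooKeeper. The following is a minimal, self-contained simulation with hypothetical names (the actual fix, HIVE-8890, uses the real ZooKeeper watcher API): an ephemeral registration vanishes when the session expires, and a watch callback re-creates it so clients can rediscover the server.

```java
import java.util.HashSet;
import java.util.Set;

public class ReRegisterSketch {
    // In-memory stand-in for a ZooKeeper ensemble's znodes.
    static final Set<String> znodes = new HashSet<>();

    static void createEphemeral(String path) { znodes.add(path); }

    // Session expiry deletes all ephemeral nodes owned by that session.
    static void expireSession() { znodes.clear(); }

    // Proposed watch callback: if our registration disappeared after an
    // "Expired" event, register again instead of staying invisible.
    static boolean handleExpired(String path) {
        if (!znodes.contains(path)) createEphemeral(path);
        return znodes.contains(path);
    }

    public static void main(String[] args) {
        String path = "/hiveserver2/serverUri=host:10000;seq=0000000001";
        createEphemeral(path);
        expireSession();                          // network blip kills the ZK session
        System.out.println(znodes.contains(path)); // false: clients cannot find HS2
        handleExpired(path);                       // the watch re-registers the node
        System.out.println(znodes.contains(path)); // true
    }
}
```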
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Summary: LazySimpleSerDe bug, when Text is reused (was: LazySimpleSerDe bug when Text is reused ) LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 1.2.0 Attachments: HIVE-10983.1.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL statement, select * from web_searchhub where logdate=2015061003; the result is shown below. Notice that the second row's content contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
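For context on the suspected cause: Hadoop's org.apache.hadoop.io.Text object is reused across records, its backing byte array only grows, and getBytes() returns the whole backing array rather than just the first getLength() bytes. A decoder that ignores getLength() therefore appends stale bytes from a longer previous row, which matches the symptom above. Below is a self-contained mimic of that reuse semantics (not the real Text class):

```java
import java.nio.charset.StandardCharsets;

public class TextReuseSketch {
    // Mimic of org.apache.hadoop.io.Text reuse: the backing array grows but is
    // never shrunk, so bytes past 'length' are stale data from earlier rows.
    static byte[] buf = new byte[0];
    static int length = 0;

    static void set(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length > buf.length) buf = new byte[b.length]; // grow only
        System.arraycopy(b, 0, buf, 0, b.length);
        length = b.length;
    }

    // Buggy decode: converts the whole backing array, ignoring 'length'.
    static String buggy() { return new String(buf, StandardCharsets.UTF_8); }

    // Correct decode: honors the valid length.
    static String correct() { return new String(buf, 0, length, StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        set("session=3151,thread=254"); // a long row is read first
        set("session=901");             // a shorter row reuses the same buffer
        System.out.println(buggy());    // session=9011,thread=254  <- previous row bleeds in
        System.out.println(correct());  // session=901
    }
}
```

The buggy output reproduces exactly the kind of concatenated tail seen in the reported query results.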
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Summary: LazySimpleSerDe bug when Text is reused (was: Lazysimpleserde bug when Text is reused ) LazySimpleSerDe bug when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 1.2.0 Attachments: HIVE-10983.1.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL statement, select * from web_searchhub where logdate=2015061003; the result is shown below. Notice that the second row's content contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) Lazysimpleserde bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Component/s: CLI Lazysimpleserde bug when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL statement, select * from web_searchhub where logdate=2015061003; the result is shown below. Notice that the second row's content contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581723#comment-14581723 ] xiaowei wang commented on HIVE-10790: - OK, I have put up a patch. ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, like insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
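The "Caused by" line is instructive: ViewFileSystem is only a client-side mount table, so server defaults such as replication exist per backing cluster and can only be answered for a path that resolves to a mount point. The no-arg getDefaultReplication() has no path to resolve, hence NotInMountpointException; the likely fix is for the ORC writer to ask about the file it is actually writing. An illustrative toy mount table (hypothetical values, not the real Hadoop API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ViewFsSketch {
    // Toy mount table: mount prefix -> default replication of the backing
    // cluster. Values are invented; the real ViewFileSystem delegates each
    // resolved path to the child FileSystem that owns the mount point.
    static final Map<String, Integer> mounts = new LinkedHashMap<>();
    static {
        mounts.put("/user", 3);
        mounts.put("/tmp", 2);
    }

    static int getDefaultReplication(String path) {
        if (path == null || path.isEmpty())
            throw new IllegalStateException("getDefaultReplication on empty path is invalid");
        for (Map.Entry<String, Integer> m : mounts.entrySet())
            if (path.startsWith(m.getKey())) return m.getValue();
        throw new IllegalStateException("not in mount point: " + path);
    }

    public static void main(String[] args) {
        // The writer must ask about a concrete file path, not "the" filesystem:
        System.out.println(getDefaultReplication("/user/hive/warehouse/t/part-0")); // 3
        try {
            getDefaultReplication(""); // what the no-arg overload amounts to under viewfs
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```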
[jira] [Updated] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10790: Attachment: HIVE-10790.0.patch.txt ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, like insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581724#comment-14581724 ] xiaowei wang commented on HIVE-10790: - OK, I have put up a patch. ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, like insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581728#comment-14581728 ] Hive QA commented on HIVE-10983: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739021/HIVE-10983.1.patch.txt Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4247/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4247/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4247/ Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp/conf [copy] Copying 11 files to /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-shims --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/hive-shims-2.0.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-shims --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/hive-shims-2.0.0-SNAPSHOT.jar to /home/hiveptest/.m2/repository/org/apache/hive/hive-shims/2.0.0-SNAPSHOT/hive-shims-2.0.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/shims/aggregator/pom.xml to /home/hiveptest/.m2/repository/org/apache/hive/hive-shims/2.0.0-SNAPSHOT/hive-shims-2.0.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Common 2.0.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-common --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/common/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/common (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-common --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (generate-version-annotation) @ hive-common --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-common --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/common/src/gen added. [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-common --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-common --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] Copying 1 resource [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-common --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-common --- [INFO] Compiling 74 source files to /data/hive-ptest/working/apache-github-source-source/common/target/classes [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java: /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java uses or overrides a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java: Recompile with -Xlint:deprecation for details. [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/ObjectPair.java: Some input files use unchecked or unsafe operations. [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/ObjectPair.java: Recompile with -Xlint:unchecked for details. [INFO] [INFO] ---
[jira] [Commented] (HIVE-10866) Give a warning when a client tries to insert into a bucketed table
[ https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581827#comment-14581827 ] Yongzhi Chen commented on HIVE-10866: - The 3 failures are not related to the patch; their age is 3 builds or more. Give a warning when a client tries to insert into a bucketed table Key: HIVE-10866 URL: https://issues.apache.org/jira/browse/HIVE-10866 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0, 1.3.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch, HIVE-10866.3.patch, HIVE-10866.4.patch Currently, Hive does not support appends (insert into) to bucketed tables; see open JIRA HIVE-3608. When inserting into such a table, the data will be corrupted and no longer fit for sort-merge bucket mapjoin. We need to find a way to prevent clients from inserting into such tables, or at least give a warning. Reproduce: {noformat} CREATE TABLE IF NOT EXISTS buckettestoutput1( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput2( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; set hive.enforce.bucketing = true; set hive.enforce.sorting=true; insert into table buckettestoutput1 select code from sample_07 where total_emp 134354250 limit 10; After this first insert, I did: set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.auto.convert.sortmerge.join.noconditionaltask=true; 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); +---+---+ | data | data | +---+---+ +---+---+ So select works fine. 
Second insert: 0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select code from sample_07 where total_emp = 134354250 limit 10; No rows affected (61.235 seconds) Then select: 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 (state=42000,code=10141) 0: jdbc:hive2://localhost:1 {noformat} Inserting into an empty table or partition is fine, but after inserting into a non-empty one (the second insert in the reproduction above), the bucketmapjoin throws an error. We should not let the second insert succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
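Why 4 files break the join: bucketed reads assume file i holds exactly the rows whose bucket id is i, and every INSERT writes one fresh file per bucket. A small sketch of that arithmetic, with Java's hashCode standing in for Hive's object-inspector hash:

```java
public class BucketSketch {
    // Bucket assignment: non-negative hash modulo bucket count.
    // (Java's String.hashCode is a stand-in for Hive's actual hash here.)
    static int bucketId(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Each INSERT writes one new file per bucket, so appends multiply files.
    static int filesAfter(int inserts, int numBuckets) {
        return inserts * numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 2;
        System.out.println(filesAfter(1, numBuckets)); // 2 files: file i == bucket i holds
        System.out.println(filesAfter(2, numBuckets)); // 4 files: the invariant is broken,
                                                       // matching "buckets is 2 ... files is 4"
        // Rows still hash into only 2 buckets no matter how many files exist:
        System.out.println(bucketId("some-row-key", numBuckets));
    }
}
```

This is why the sort-merge bucket mapjoin's metadata check rejects the table after the second insert: it can no longer pair file i with bucket i.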
[jira] [Commented] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-10975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582026#comment-14582026 ] Sergio Peña commented on HIVE-10975: I don't know when 1.8.0 will be released yet, but I think 1.7.0 has the new import changes you want. If you need the new org.apache.parquet imports, you can try bumping the version to 1.7.0. Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT - Key: HIVE-10975 URL: https://issues.apache.org/jira/browse/HIVE-10975 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor Attachments: HIVE-10975-parquet.patch, HIVE-10975.1-parquet.patch There are lots of changes since Parquet's graduation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582001#comment-14582001 ] Laljo John Pullokkaran commented on HIVE-10841: --- RB link posted. [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result of the following SELECT query is 3 rows, but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive: 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run the SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10 122 {code} 2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats:
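The intended semantics are easy to check outside Hive: joining the three acct rows on aid and then applying the WHERE predicate must leave exactly one row. A plain-Java rendering of the reproduction (the bug is that Hive's plan effectively drops the brn is not null filter):

```java
public class NotNullFilterSketch {
    // acct rows from the reproduction: {aid, acc_n, brn}; the two extra
    // inserted rows carry NULLs, modeled here as Java nulls.
    static final Integer[][] ACCT = {
        {4748, 10, 122},
        {4748, null, null},
        {4748, null, null},
    };

    // Every acct row joins through A.id = acct.aid, so the raw join has 3 rows.
    static long joinedCount() {
        long n = 0;
        for (Integer[] r : ACCT) if (r[0] == 4748) n++;
        return n;
    }

    // Correct semantics: "acct.brn is not null" filters the join result.
    static long filteredCount() {
        long n = 0;
        for (Integer[] r : ACCT) if (r[0] == 4748 && r[2] != null) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(joinedCount());   // 3: what the buggy Hive plan returns
        System.out.println(filteredCount()); // 1: the expected answer (matches MySQL)
    }
}
```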
[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582031#comment-14582031 ] Yongzhi Chen commented on HIVE-6867: [~pxiong], could you explain how to do step (2)? Will you or [~hsubramaniyan] fix HIVE-3608: Support appends (INSERT INTO) for bucketed tables? Thanks. Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch The bucketized table feature fails in some cases. If the destination is bucketed on the same key and the actual data in the src is not bucketed (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; -- perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression; it has never worked. It was only discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers is always 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582041#comment-14582041 ] Aihua Xu commented on HIVE-10972: - I think we have another issue in ZooKeeperHiveLockManager.java: when locking an object exclusively, we should also check whether its children are locked. The test passed before because we always locked the current database. If we do {{use default; lock table lockneg2.tstsrcpart shared; lock database lockneg2 exclusive;}}, it will be allowed, which is not correct. HIVE-10984 has been filed to get it fixed. I will leave the test failure as it is. DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is not correct, since the current database can be db1 while the query is select * from db2.tb1, which would lock db1 unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
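The hole described in the comment is a lock-compatibility question: an EXCLUSIVE request on a database must conflict with a SHARED lock already held on any child object, not just with a lock on the exact same path. A minimal sketch of the two checks (hypothetical path scheme, not ZooKeeperHiveLockManager's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class LockSketch {
    enum Mode { SHARED, EXCLUSIVE }
    // Currently held locks: object path -> mode.
    static final Map<String, Mode> held = new HashMap<>();

    // Naive check: only looks at the exact path (the behavior being fixed).
    static boolean canLockNaive(String path, Mode m) {
        Mode h = held.get(path);
        return h == null || (h == Mode.SHARED && m == Mode.SHARED);
    }

    // Hierarchical check: an EXCLUSIVE request also conflicts with any lock
    // held on a descendant of the path (e.g. a table inside the database).
    static boolean canLockHierarchical(String path, Mode m) {
        for (Map.Entry<String, Mode> e : held.entrySet()) {
            boolean related = e.getKey().equals(path) || e.getKey().startsWith(path + "/");
            if (related && (m == Mode.EXCLUSIVE || e.getValue() == Mode.EXCLUSIVE))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        held.put("/lockneg2/tstsrcpart", Mode.SHARED); // lock table ... shared
        // lock database lockneg2 exclusive:
        System.out.println(canLockNaive("/lockneg2", Mode.EXCLUSIVE));        // true (the bug)
        System.out.println(canLockHierarchical("/lockneg2", Mode.EXCLUSIVE)); // false (correct)
    }
}
```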
[jira] [Commented] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-10975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582045#comment-14582045 ] Ferdinand Xu commented on HIVE-10975: - For my part, there is no strong requirement, since the Parquet bloom filter is still under development. I am not sure about the case for vectorization. Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT - Key: HIVE-10975 URL: https://issues.apache.org/jira/browse/HIVE-10975 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor Attachments: HIVE-10975-parquet.patch, HIVE-10975.1-parquet.patch There are lots of changes since Parquet's graduation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582135#comment-14582135 ] Jimmy Xiang commented on HIVE-10976: [~ctang.ma], is it possible to remove the whole method from CLIService? Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10959) webhcat launcher job should reconnect to the running child job on task retry
[ https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582156#comment-14582156 ] Ivan Mitic commented on HIVE-10959: --- Thanks [~thejas] for the review and commit! webhcat launcher job should reconnect to the running child job on task retry Key: HIVE-10959 URL: https://issues.apache.org/jira/browse/HIVE-10959 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.15.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 1.2.1 Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.4.patch, HIVE-10959.patch Currently, the Templeton launcher kills all child jobs (jobs tagged with the parent job's id) upon task retry. Instead, upon launcher task retry, Templeton should reconnect to the running job and continue tracking its progress that way. This logic cannot be used for all job kinds (e.g. jobs that are driven by the client side, like regular Hive). However, for MapReduce v2, and possibly Tez and Hive on Tez, this should be the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582182#comment-14582182 ] Chaoyu Tang commented on HIVE-10976: [~jxiang] It is possible, since it only calls the superclass start(). I considered that but still left it there: 1) to be consistent with the stop() method; 2) in case CLIService needs to add something to start() in the future, since the method already exists. Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582188#comment-14582188 ] Chaoyu Tang commented on HIVE-7018: --- [~ychena], have you verified that it works with SchemaTool with [~hsubramaniyan]'s HIVE-10659 fix? Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch, HIVE-7018.3.patch It appears that at least the Postgres and Oracle scripts do not have the LINK_TARGET_ID column, while the MySQL ones do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582190#comment-14582190 ] Jimmy Xiang commented on HIVE-10976: I see. The other choice is to remove the stop() as well :). Either way is fine with me. Thanks. Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582207#comment-14582207 ] Chaoyu Tang commented on HIVE-10976: Thanks, [~jxiang]. Could you help to commit it? Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582261#comment-14582261 ] Hive QA commented on HIVE-10983: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739041/HIVE-10983.2.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9007 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 org.apache.hive.beeline.TestSchemaTool.testSchemaInit org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4250/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4250/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4250/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12739041 - PreCommit-HIVE-TRUNK-Build LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582330#comment-14582330 ] Pengcheng Xiong commented on HIVE-6867: --- [~ychena], yes, it would be best if insert into were also supported. That depends on how far [~hsubramaniyan] would like to go. Thanks. Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch Bucketized Table feature fails in some cases. If the destination is bucketed on the same key but the actual data in the source is not bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; -- perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. It was only discovered due to Hadoop 2 changes. In Hadoop 1, in local mode, the number of reducers is always 1, regardless of what the app requests. Hadoop 2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
[ https://issues.apache.org/jira/browse/HIVE-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582375#comment-14582375 ] Sergey Shelukhin commented on HIVE-10977: - +1 No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10977.patch When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data; therefore it is not necessary to instantiate an expensive MetaStoreDirectSql during ObjectStore initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
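The idea behind the patch can be sketched as follows. This is a hypothetical, simplified model of guarding an expensive helper's construction behind the config flag that would use it; it is not Hive's actual ObjectStore code, and all names here are illustrative:

```java
// Sketch: only construct the costly helper when the feature flag is on,
// and construct it at most once.
class ConditionalInit {
    static int constructions = 0; // visible for the demo only

    static final class ExpensiveDirectSql {
        ExpensiveDirectSql() { constructions++; } // stands in for MetaStoreDirectSql's costly setup
    }

    private final boolean tryDirectSql;      // e.g. hive.metastore.try.direct.sql
    private ExpensiveDirectSql directSql;    // created lazily, only if enabled

    ConditionalInit(boolean tryDirectSql) { this.tryDirectSql = tryDirectSql; }

    /** Returns the helper if direct SQL is enabled, else null (caller falls back to JDO). */
    ExpensiveDirectSql directSql() {
        if (!tryDirectSql) return null;          // disabled: never pay the construction cost
        if (directSql == null) directSql = new ExpensiveDirectSql();
        return directSql;
    }
}
```

With the flag off, the expensive object is never built; with it on, it is built exactly once on first use.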
[jira] [Commented] (HIVE-10979) Fix failed tests in TestSchemaTool after the version number change in HIVE-10921
[ https://issues.apache.org/jira/browse/HIVE-10979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582447#comment-14582447 ] Sergey Shelukhin commented on HIVE-10979: - +1, sorry I missed those Fix failed tests in TestSchemaTool after the version number change in HIVE-10921 Key: HIVE-10979 URL: https://issues.apache.org/jira/browse/HIVE-10979 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10979.patch Some version variables in the SQL scripts were not updated in HIVE-10921, which caused unit test failures. See http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4241/testReport/org.apache.hive.beeline/TestSchemaTool/testSchemaUpgrade/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Fix Version/s: (was: 1.2.0) 0.14.1 LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Attachment: HIVE-10983.2.patch.txt LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10790: Fix Version/s: 0.14.1 ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 0.14.1 Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, e.g. insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
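The failure pattern in the stack trace above can be illustrated with a toy model; all names here are hypothetical and this is not the real Hadoop ViewFileSystem API. A mount-table filesystem multiplexes several underlying filesystems, so it has no single "default" replication: the no-argument query is invalid by design, and callers such as the ORC writer must pass the actual output path:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why "getDefaultReplication on empty path is invalid" happens on viewfs.
class ViewFs {
    private final Map<String, Integer> mountReplication = new HashMap<>();

    ViewFs() {
        // each mount point can be backed by a different cluster with a different default
        mountReplication.put("/user", 3);
        mountReplication.put("/tmp", 1);
    }

    int getDefaultReplication() {
        // mirrors the exception text in the stack trace above
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    int getDefaultReplication(String path) {
        for (Map.Entry<String, Integer> m : mountReplication.entrySet()) {
            if (path.startsWith(m.getKey())) return m.getValue();
        }
        throw new IllegalArgumentException("no mount point for " + path);
    }
}
```

The fix direction is accordingly to have the writer call the per-path variant with its real output path instead of the no-argument one.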
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581895#comment-14581895 ] Hive QA commented on HIVE-10841: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739033/HIVE-10841.03.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hive.beeline.TestSchemaTool.testSchemaInit org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4248/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4248/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4248/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739033 - PreCommit-HIVE-TRUNK-Build [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. 
prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: 
Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean)
[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581906#comment-14581906 ] Xuefu Zhang commented on HIVE-10855: +1 Make HIVE-10568 work with Spark [Spark Branch] -- Key: HIVE-10855 URL: https://issues.apache.org/jira/browse/HIVE-10855 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10855.1-spark.patch, HIVE-10855.2-spark.patch, HIVE-10855.3-spark.patch HIVE-10568 only works with Tez. It's good to make it also work for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581882#comment-14581882 ] xiaowei wang commented on HIVE-10983: - In the first patch I invoked a Text method, copyBytes(), which was only added after Hadoop 1.0, so the compile failed. I then put up the second patch. LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
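The reuse pitfall the reporter describes can be reproduced with a minimal stand-in for Hadoop's Text (a hypothetical class sketched here, not the real org.apache.hadoop.io.Text): set() reuses a growing backing buffer that never shrinks, so a consumer that reads the whole backing array instead of honoring getLength() sees the tail of the previous, longer row appended to the current one:

```java
import java.nio.charset.StandardCharsets;

// Illustration of the Text-reuse bug pattern behind HIVE-10983.
class ReusableText {
    private byte[] buf = new byte[0];
    private int len = 0;

    void set(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length > buf.length) buf = new byte[b.length]; // grow, never shrink
        System.arraycopy(b, 0, buf, 0, b.length);
        len = b.length;
    }
    byte[] getBytes() { return buf; }  // backing array; may carry stale tail bytes
    int getLength()   { return len; }  // the only trustworthy length

    // Buggy consumer: decodes the whole backing array.
    static String decodeBuggy(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }
    // Fixed consumer: respects getLength() (equivalently, take a defensive copy,
    // which is what Text.copyBytes() does in newer Hadoop versions).
    static String decodeFixed(ReusableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }
}
```

After a long row followed by a short one, the buggy decode yields the short row with the long row's leftover bytes attached, matching the symptom in the issue description.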
[jira] [Commented] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true
[ https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581911#comment-14581911 ] Xuefu Zhang commented on HIVE-10971: Similar to TestCliDriver, TestSparkCliDriver is generated as part of the .q tests. count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true --- Key: HIVE-10971 URL: https://issues.apache.org/jira/browse/HIVE-10971 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.2.0 Reporter: WangMeng Assignee: WangMeng Fix For: 1.2.1 Attachments: HIVE-10971.01.patch, HIVE-10971.1.patch When hive.groupby.skewindata=true, the following query based on TPC-H gives wrong results: {code} set hive.groupby.skewindata=true; select l_returnflag, count(*), count(distinct l_linestatus) from lineitem group by l_returnflag limit 10; {code} The query plan shows that it generates only one MapReduce job instead of the two that hive.groupby.skewindata=true should dictate. The problem arises only when {noformat}count(*){noformat} and {noformat}count(distinct){noformat} appear together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
[ https://issues.apache.org/jira/browse/HIVE-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581887#comment-14581887 ] Chaoyu Tang commented on HIVE-10977: The two failed tests do not seem relevant to this patch. No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10977.patch When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data; therefore it is not necessary to instantiate an expensive MetaStoreDirectSql during ObjectStore initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
[ https://issues.apache.org/jira/browse/HIVE-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581904#comment-14581904 ] Xuefu Zhang commented on HIVE-10977: +1. Patch makes sense. No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10977.patch When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data; therefore it is not necessary to instantiate an expensive MetaStoreDirectSql during ObjectStore initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function
[ https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581936#comment-14581936 ] Aihua Xu commented on HIVE-10142: - {{x preceding and y preceding}} is supported now. I'm investigating what could still be missing. [~zhangyii], if you have something in mind, could you please post the queries here? I can give them a try to see whether they are already supported or still missing. Calculating formula based on difference between each row's value and current row's in Windowing function Key: HIVE-10142 URL: https://issues.apache.org/jira/browse/HIVE-10142 Project: Hive Issue Type: New Feature Components: PTF-Windowing Affects Versions: 1.0.0 Reporter: Yi Zhang Assignee: Aihua Xu For analytics with windowing functions, the calculation formula sometimes needs to be evaluated over each row's value against the current row's value. Decay values are a good example, such as sums of values weighted by a decay function based on the timestamp difference between each row and the current row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581930#comment-14581930 ] Aihua Xu commented on HIVE-10972: - I'm investigating lockneg_try_lock_db_in_use test case failure. Seems related. DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is not correct since the current database can be db1, and the query can be select * from db2.tb1, which will lock db1 unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582615#comment-14582615 ] Chaoyu Tang commented on HIVE-10972: [~aihuaxu] I left some comments and questions on the RB. DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is not correct since the current database can be db1, and the query can be select * from db2.tb1, which will lock db1 unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582700#comment-14582700 ] Mostafa Mokhtar commented on HIVE-10704: Replied to your comments. Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
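The two failure modes above (wrong largest-table selection and divide-by-zero when all estimated sizes are 0) can be sketched as follows. This is a hedged Python model of the guarding logic, not the actual Tez HashTableLoader code; `default_size` is an assumed placeholder constant.

```python
def pick_small_tables(sizes, default_size=1):
    """Given {table: estimated_size}, substitute a positive default for zero
    estimates so that (a) the biggest table is still excluded deterministically
    and (b) memory division over the small-table total never hits zero."""
    safe = {t: (s if s > 0 else default_size) for t, s in sizes.items()}
    # tie-break on name so all-zero stats still give a deterministic choice
    big = max(safe, key=lambda t: (safe[t], t))
    total_small = sum(s for t, s in safe.items() if t != big)
    return big, total_small
```

Because `total_small` is always positive, a memory-fraction computation such as `budget / total_small` can no longer divide by zero or allocate 0 bytes.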
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582596#comment-14582596 ] Laljo John Pullokkaran commented on HIVE-10841: --- Committed to 1.2.1 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436 and acct.brn is not null;
{code}
the result is 3 rows
{code}
10   122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10   122
{code}
2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats:
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10841: -- Fix Version/s: 1.2.1 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.2.1 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436 and acct.brn is not null;
{code}
the result is 3 rows
{code}
10   122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10   122
{code}
2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats:
[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581581#comment-14581581 ] Hive QA commented on HIVE-10855: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12738993/HIVE-10855.3-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7943 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/876/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/876/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-876/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12738993 - PreCommit-HIVE-SPARK-Build Make HIVE-10568 work with Spark [Spark Branch] -- Key: HIVE-10855 URL: https://issues.apache.org/jira/browse/HIVE-10855 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10855.1-spark.patch, HIVE-10855.2-spark.patch, HIVE-10855.3-spark.patch HIVE-10568 only works with Tez. It's good to make it also work for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581586#comment-14581586 ] Rui Li commented on HIVE-10855: --- Latest failures are not related. Make HIVE-10568 work with Spark [Spark Branch] -- Key: HIVE-10855 URL: https://issues.apache.org/jira/browse/HIVE-10855 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10855.1-spark.patch, HIVE-10855.2-spark.patch, HIVE-10855.3-spark.patch HIVE-10568 only works with Tez. It's good to make it also work for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10866) Give a warning when clients try to insert into bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581519#comment-14581519 ] Hive QA commented on HIVE-10866: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12738983/HIVE-10866.4.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9008 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4245/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4245/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4245/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12738983 - PreCommit-HIVE-TRUNK-Build Give a warning when clients try to insert into bucketed tables Key: HIVE-10866 URL: https://issues.apache.org/jira/browse/HIVE-10866 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0, 1.3.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch, HIVE-10866.3.patch, HIVE-10866.4.patch Currently, Hive does not support appending (insert into) to bucketed tables; see open JIRA HIVE-3608. Inserting into such a table corrupts the data and makes it unfit for sort-merge bucket map join. We need to find a way to prevent clients from inserting into such tables, or at least give a warning.
Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestoutput1(
data string
)CLUSTERED BY(data)
INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput2(
data string
)CLUSTERED BY(data)
INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
set hive.enforce.bucketing = true;
set hive.enforce.sorting=true;
insert into table buckettestoutput1 select code from sample_07 where total_emp 134354250 limit 10;
After this first insert, I did:
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
+---+---+
| data | data |
+---+---+
+---+---+
So select works fine. Second insert:
0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select code from sample_07 where total_emp = 134354250 limit 10;
No rows affected (61.235 seconds)
Then select:
0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 (state=42000,code=10141)
0: jdbc:hive2://localhost:1
{noformat}
Inserting into an empty table or partition is fine, but after inserting into a non-empty one (the second insert in the repro above), the bucket map join throws an error. We should not let the second insert succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
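The SemanticException above comes down to a simple consistency check: a bucketed table is only usable for bucket map join if its number of data files matches the declared bucket count, and each append-style insert adds a fresh set of files. A minimal Python sketch of that check (the function and file names are illustrative, not Hive internals):

```python
def check_bucket_files(num_buckets, data_files):
    """Reject a bucketed table whose file count no longer matches its bucket
    count. After one insert into a 2-bucket table there are 2 files; a second
    insert appends 2 more, so the check fails with 4 files vs 2 buckets."""
    if len(data_files) != num_buckets:
        raise ValueError(
            f"number of buckets is {num_buckets}, "
            f"whereas the number of files is {len(data_files)}")
    return True
```

This is why the ticket argues the second insert should be blocked (or at least warned about) up front, rather than letting the table silently become unusable for sort-merge bucket map join.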
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
[ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581540#comment-14581540 ] Gopal V commented on HIVE-10980: Are you using MapReduce or Tez? Merge of dynamic partitions loads all data to default partition --- Key: HIVE-10980 URL: https://issues.apache.org/jira/browse/HIVE-10980 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Environment: HDP 2.2.4 (also reproduced on Apache Hive built from trunk) Reporter: Illya Yalovyy Conditions that lead to the issue: 1. Partition columns have different types 2. Both static and dynamic partitions are used in the query 3. Dynamically generated partitions require merge Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__. Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create external table sdp ( dataint bigint, hour int, req string, cid string, caid string ) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string) partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req) select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId2 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdC cacheId3 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId4 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId5 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId8 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId9 20150316 16 __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
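As a rough model of how dynamic partition values can collapse to the default partition: when a dynamic partition value cannot be converted to the partition column's declared type (plausible here, where the merge path mixes partition columns of different types), Hive substitutes `__HIVE_DEFAULT_PARTITION__`. This Python sketch is a speculative simplification of that behavior, not actual Hive code.

```python
def partition_value(raw, col_type, default_partition="__HIVE_DEFAULT_PARTITION__"):
    """Return the directory value for a dynamic partition column: the raw
    value if it converts to the column's type, else the default partition.
    Simplified model of Hive's null/invalid-partition-value handling."""
    try:
        col_type(raw)  # e.g. int for a bigint partition column
        return str(raw)
    except (TypeError, ValueError):
        return default_partition
```

Under this model, a string like `reqA` routed through a `bigint` partition column lands in `__HIVE_DEFAULT_PARTITION__`, matching the observed result above.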
[jira] [Commented] (HIVE-10959) Templeton launcher job should reconnect to the running child job on task retry
[ https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581550#comment-14581550 ] Thejas M Nair commented on HIVE-10959: -- The test failures are unrelated. Templeton launcher job should reconnect to the running child job on task retry -- Key: HIVE-10959 URL: https://issues.apache.org/jira/browse/HIVE-10959 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.15.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.4.patch, HIVE-10959.patch Currently, Templeton launcher kills all child jobs (jobs tagged with the parent job's id) upon task retry. Upon templeton launcher task retry, templeton should reconnect to the running job and continue tracking its progress that way. This logic cannot be used for all job kinds (e.g. for jobs that are driven by the client side like regular hive). However, for MapReduceV2, and possibly Tez and HiveOnTez, this should be the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
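The proposed retry behavior can be sketched as follows. All the callables here (`list_jobs_by_tag`, `kill`, `reconnect`) are hypothetical stand-ins for YARN/Templeton operations, and `can_reconnect` models the "only for MapReduceV2/Tez-style jobs" condition from the ticket.

```python
def handle_retry(tag, list_jobs_by_tag, kill, reconnect, can_reconnect):
    """On launcher-task retry: look up child jobs tagged with the parent job
    id and, when the job kind supports it, reattach to them and keep tracking
    progress; otherwise fall back to the old behavior of killing them."""
    children = list_jobs_by_tag(tag)
    if can_reconnect and children:
        return [reconnect(j) for j in children]  # resume tracking running jobs
    for j in children:                           # old behavior: kill, then resubmit
        kill(j)
    return []
```

Client-driven job kinds (like plain Hive CLI jobs) would pass `can_reconnect=False` and keep the kill-and-resubmit path.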
[jira] [Updated] (HIVE-10959) webhcat launcher job should reconnect to the running child job on task retry
[ https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10959: - Summary: webhcat launcher job should reconnect to the running child job on task retry (was: Templeton launcher job should reconnect to the running child job on task retry) webhcat launcher job should reconnect to the running child job on task retry Key: HIVE-10959 URL: https://issues.apache.org/jira/browse/HIVE-10959 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.15.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.4.patch, HIVE-10959.patch Currently, Templeton launcher kills all child jobs (jobs tagged with the parent job's id) upon task retry. Upon templeton launcher task retry, templeton should reconnect to the running job and continue tracking its progress that way. This logic cannot be used for all job kinds (e.g. for jobs that are driven by the client side like regular hive). However, for MapReduceV2, and possibly Tez and HiveOnTez, this should be the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10790) orc file sql execute fail
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10790: Flags: Patch,Important Labels: patch (was: ) orc file sql execute fail - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1 Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, for example
insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
throws an error:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
... 8 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582813#comment-14582813 ] Alexander Pivovarov commented on HIVE-10704: +1 Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
[ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582092#comment-14582092 ] Illya Yalovyy commented on HIVE-10980: -- Good point. I observed this behavior on MapReduce. I'll update the ticket. Merge of dynamic partitions loads all data to default partition --- Key: HIVE-10980 URL: https://issues.apache.org/jira/browse/HIVE-10980 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Environment: HDP 2.2.4 (also reproduced on Apache Hive built from trunk) Reporter: Illya Yalovyy Conditions that lead to the issue: 1. Partition columns have different types 2. Both static and dynamic partitions are used in the query 3. Dynamically generated partitions require merge Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__. Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create external table sdp ( dataint bigint, hour int, req string, cid string, caid string ) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string) partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req) select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId2 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdC cacheId3 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId4 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId5 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId8 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId9 20150316 16 __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582055#comment-14582055 ] Ashutosh Chauhan commented on HIVE-10841: - +1 LGTM [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436 and acct.brn is not null;
{code}
the result is 3 rows
{code}
10   122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10   122
{code}
2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats:
[jira] [Updated] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
[ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-10980:
- Description:
Conditions that lead to the issue:
1. Execution engine set to MapReduce
2. Partition columns have different types
3. Both static and dynamic partitions are used in the query
4. Dynamically generated partitions require merge
Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__.
Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.execution.engine=mr;
create external table sdp (
  dataint bigint,
  hour int,
  req string,
  cid string,
  caid string
) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string)
partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req)
select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId2  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdC  cacheId3  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId4  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId5  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId8  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId9  20150316  16  __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
was:
Conditions that lead to the issue:
1. Partition columns have different types
2. Both static and dynamic partitions are used in the query
3. Dynamically generated partitions require merge
Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__.
Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create external table sdp (
  dataint bigint,
  hour int,
  req string,
  cid string,
  caid string
) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string)
partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req)
select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId2  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdC  cacheId3  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId4  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId5  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId8  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId9  20150316  16  __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
Merge of dynamic partitions loads all data to default partition
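Background on where __HIVE_DEFAULT_PARTITION__ comes from: when a dynamic partition column's value is null or empty, Hive routes the row to a default partition directory. A simplified Python sketch of that path-building rule (illustrative, not Hive's code); the symptom above is consistent with the merge task losing the dynamic `req` values, so every row falls through to the default branch:

```python
# Simplified sketch of how a partition directory path is derived from an
# ordered (column, value) spec. A null/empty dynamic value maps to the
# default partition -- which is how rows can land under
# __HIVE_DEFAULT_PARTITION__. Function name is hypothetical.
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"

def partition_path(spec):
    """Build a partition path like dataint=20150316/hour=16/req=reqA."""
    parts = []
    for col, val in spec:
        if val is None or val == "":
            # Missing dynamic value: fall back to the default partition.
            parts.append(f"{col}={DEFAULT_PARTITION}")
        else:
            parts.append(f"{col}={val}")
    return "/".join(parts)

# Normal case: all partition values are present.
ok = partition_path([("dataint", 20150316), ("hour", 16), ("req", "reqA")])
# Failure mode seen in this issue: the dynamic value is missing at merge time.
bad = partition_path([("dataint", 20150316), ("hour", 16), ("req", None)])
```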
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582053#comment-14582053 ] Aihua Xu commented on HIVE-10972: - [~damien.carol] Can you take another look at my comments below? DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is incorrect: the current database can be db1 while the query is select * from db2.tb1, so db1 gets locked unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
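The fix direction described above is to derive lock targets from the objects the query actually reads or writes, rather than unconditionally locking the session's current database. A hedged Python sketch of that selection logic (DummyTxnManager itself is Java; function and entity names here are hypothetical):

```python
# Sketch of lock-target selection: lock the databases/tables a query
# touches, not the session's current database. Names are hypothetical,
# illustrating the intent, not Hive's actual lock manager code.
def lock_targets(current_db, read_entities, write_entities):
    """Return the objects a shared lock should cover.

    Buggy behavior: always lock current_db, even when the query never
    touches it. Intended behavior (sketched here): walk the query's
    inputs/outputs, e.g. 'db2.tb1' -> lock db2 and db2.tb1.
    """
    targets = set()
    for entity in read_entities | write_entities:
        db, _, table = entity.partition(".")
        targets.add(db)            # shared lock on the owning database
        if table:
            targets.add(entity)    # plus the table itself
    return targets

# Query: select * from db2.tb1, issued while "use db1" is in effect.
locks = lock_targets("db1", {"db2.tb1"}, set())
# db1 is not locked at all -- only db2 and db2.tb1.
```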