[jira] [Commented] (HIVE-3709) Stop storing default ConfVars in temp file
[ https://issues.apache.org/jira/browse/HIVE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504437#comment-13504437 ] Ashutosh Chauhan commented on HIVE-3709: Kevin, Will HADOOP-8573 fix this? Stop storing default ConfVars in temp file -- Key: HIVE-3709 URL: https://issues.apache.org/jira/browse/HIVE-3709 Project: Hive Issue Type: Improvement Components: Configuration Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3709.1.patch.txt, HIVE-3709.2.patch.txt, HIVE-3709.3.patch.txt To work around issues with Hadoop's Configuration object, specifically its addResource(InputStream) method, default configurations are written to a temp file (I think HIVE-2362 introduced this). This, however, introduces the problem that once that file is deleted from /tmp the client crashes. This is particularly problematic for long-running services like the metastore server. Writing a custom InputStream to deal with the problems in the Configuration object should provide a workaround, which does not introduce a time bomb into Hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
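The direction the issue proposes -- stop depending on a file under /tmp -- can be sketched with plain java.io. This is a hypothetical illustration, not the actual HIVE-3709 patch; the class and method names are invented. The idea is to keep the generated default configuration bytes in memory and hand each caller a fresh stream, so nothing can disappear out from under a long-running process:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not the committed HIVE-3709 change): instead of writing
// the generated defaults to a temp file and re-opening it, keep the bytes in
// memory and hand out a fresh InputStream per caller.
public class DefaultConfSource {
    private final byte[] defaults;

    public DefaultConfSource(String generatedXml) {
        this.defaults = generatedXml.getBytes(StandardCharsets.UTF_8);
    }

    // Each call returns an independent stream over the same immutable bytes,
    // so repeated loads (or loads from multiple threads) never share state.
    public InputStream open() {
        return new ByteArrayInputStream(defaults);
    }
}
```

Since Configuration.addResource(InputStream) consumes a stream once, a factory like this would let every load (and every thread) obtain its own stream rather than re-reading a shared one.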
[jira] [Commented] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504479#comment-13504479 ] Harsh J commented on HIVE-2266: --- bq. Hadoop loads native compression libraries. I believe that they are platform dependent hence I do not assume that they always have same compression ratio. Please correct me if I am wrong here. Compression is based on standard algorithms, which are platform-independent. The native code is platform-dependent because of the library references it has. Fix compression parameters -- Key: HIVE-2266 URL: https://issues.apache.org/jira/browse/HIVE-2266 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2266-2.patch, HIVE-2266.patch There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files.
[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3633: - Status: Patch Available (was: Open) comments addressed -- all tests passed sort-merge join does not work with sub-queries -- Key: HIVE-3633 URL: https://issues.apache.org/jira/browse/HIVE-3633 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, hive.3633.5.patch, hive.3633.6.patch, hive.3633.7.patch Consider the following query: create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; -- load the above tables set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; explain select count(*) from ( select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, b.value as value2 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key) subq; The above query does not use sort-merge join. This would be very useful as we automatically convert the queries to use sorting and bucketing properties for join.
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #212
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/ -- [...truncated 9912 lines...] compile-test: [echo] Project: serde [javac] Compiling 26 source files to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/serde/test/classes [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. create-dirs: [echo] Project: service [copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/service/src/test/resources does not exist. init: [echo] Project: service ivy-init-settings: [echo] Project: service ivy-resolve: [echo] Project: service [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml [ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/report/org.apache.hive-hive-service-default.html ivy-retrieve: [echo] Project: service compile: [echo] Project: service ivy-resolve-test: [echo] Project: service ivy-retrieve-test: [echo] Project: service compile-test: [echo] Project: service [javac] Compiling 2 source files to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/service/test/classes test: [echo] Project: hive test-shims: [echo] Project: hive test-conditions: [echo] Project: shims gen-test: [echo] Project: shims create-dirs: [echo] Project: shims [copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/test/resources does not exist. 
init: [echo] Project: shims ivy-init-settings: [echo] Project: shims ivy-resolve: [echo] Project: shims [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml [ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/report/org.apache.hive-hive-shims-default.html ivy-retrieve: [echo] Project: shims compile: [echo] Project: shims [echo] Building shims 0.20 build_shims: [echo] Project: shims [echo] Compiling https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java against hadoop 0.20.2 (https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/hadoopcore/hadoop-0.20.2) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml ivy-retrieve-hadoop-shim: [echo] Project: shims [echo] Building shims 0.20S build_shims: [echo] Project: shims [echo] Compiling https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20S/java against hadoop 1.0.0 (https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/hadoopcore/hadoop-1.0.0) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims 
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml ivy-retrieve-hadoop-shim: [echo] Project: shims [echo] Building shims 0.23 build_shims: [echo] Project: shims [echo] Compiling https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.23/java against hadoop 0.23.3 (https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/hadoopcore/hadoop-0.23.3)
[jira] [Resolved] (HIVE-3234) getting the reporter in the recordwriter
[ https://issues.apache.org/jira/browse/HIVE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3234. Resolution: Fixed Fix Version/s: (was: 0.9.1) 0.10.0 Committed to trunk and 0.10. Thanks, Owen! getting the reporter in the recordwriter Key: HIVE-3234 URL: https://issues.apache.org/jira/browse/HIVE-3234 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.9.1 Environment: any Reporter: Jimmy Hu Assignee: Owen O'Malley Labels: newbie Fix For: 0.10.0 Attachments: HIVE-3234.D6699.1.patch, HIVE-3234.D6699.2.patch, HIVE-3234.D6987.1.patch Original Estimate: 48h Remaining Estimate: 48h We would like to generate some custom statistics and report back to map/reduce when we implement the FileSinkOperator.RecordWriter interface. However, the current interface design doesn't allow us to get the map/reduce reporter object. Please extend the current FileSinkOperator.RecordWriter interface so that its close() method passes in a map/reduce reporter object. For the same reason, please also extend the RecordReader interface to include a reporter object so that users can pass in custom map/reduce counters.
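The requested interface extension can be sketched in miniature. This is a hypothetical illustration, not the signature committed for HIVE-3234; all names here (StatsReporter, ReportingRecordWriter, CountingWriter) are invented. It shows the shape being asked for: close() receives a reporter so a custom writer can publish statistics back to map/reduce:

```java
// Hypothetical sketch (not the committed HIVE-3234 API): a RecordWriter-style
// interface whose close() receives a reporter, so custom writers can emit
// counters at the end of their lifetime.
interface StatsReporter {
    void incrCounter(String name, long amount);
}

interface ReportingRecordWriter {
    void write(String record);
    void close(StatsReporter reporter);
}

class CountingWriter implements ReportingRecordWriter {
    private long rows;

    public void write(String record) {
        rows++;  // accumulate a per-writer statistic
    }

    // On close, push the accumulated statistic into the job's counters.
    public void close(StatsReporter reporter) {
        reporter.incrCounter("rows_written", rows);
    }
}
```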
[jira] [Resolved] (HIVE-3723) Hive Driver leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HIVE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3723. Resolution: Fixed Fix Version/s: 0.10.0 Committed to trunk and 0.10. Thanks, Gunther! Hive Driver leaks ZooKeeper connections --- Key: HIVE-3723 URL: https://issues.apache.org/jira/browse/HIVE-3723 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.10.0 Attachments: HIVE-3723.1-r1411423.patch In certain error cases (e.g., a statement fails to compile, or there are semantic errors) the Hive driver leaks ZooKeeper connections. This can be seen in the TestNegativeCliDriver test, which accumulates a large number of open file handles and fails if the maximum allowed number of file handles isn't at least 2048.
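The leak pattern described here -- a connection opened, then an exception on the error path before cleanup runs -- is worth making concrete. The following is a self-contained stand-in, not the HIVE-3723 patch or Hive's actual Driver code; all names are invented. It demonstrates the property the leaking error paths were missing: the handle is released even when compilation fails.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a ZooKeeper handle: counts how many instances are open so a
// leak is observable.
class ZkLikeClient implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    ZkLikeClient() { OPEN.incrementAndGet(); }

    @Override
    public void close() { OPEN.decrementAndGet(); }
}

class DriverStandIn {
    // try-with-resources guarantees close() runs even when compile() throws,
    // which is exactly the guarantee the leaking error cases lacked.
    static void run(String stmt) {
        try (ZkLikeClient zk = new ZkLikeClient()) {
            compile(stmt);
        }
    }

    static void compile(String stmt) {
        if (stmt.contains("bad")) {
            throw new IllegalArgumentException("semantic error");
        }
    }
}
```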
[jira] [Reopened] (HIVE-3676) INSERT INTO regression caused by HIVE-3465
[ https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reopened HIVE-3676: Carl / Navis, After this commit ant test -Dtestcase=TestCliDriver -Dqfile=insert1.q is failing consistently on trunk. First failure was reported on https://builds.apache.org/job/Hive-trunk-h0.21/1805/ Can you take a look? INSERT INTO regression caused by HIVE-3465 -- Key: HIVE-3676 URL: https://issues.apache.org/jira/browse/HIVE-3676 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Carl Steinbach Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-3676.D6741.1.patch
[jira] [Updated] (HIVE-3645) RCFileWriter does not implement the right function to support Federation
[ https://issues.apache.org/jira/browse/HIVE-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3645: --- Resolution: Fixed Fix Version/s: 0.11 Assignee: Arup Malakar Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Arup! RCFileWriter does not implement the right function to support Federation Key: HIVE-3645 URL: https://issues.apache.org/jira/browse/HIVE-3645 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0, 0.10.0 Environment: Hadoop 0.23.3 federation, Hive 0.9 and Pig 0.10 Reporter: Viraj Bhat Assignee: Arup Malakar Fix For: 0.11 Attachments: HIVE_3645_branch_0.patch, HIVE_3645_trunk_0.patch Create a table using Hive DDL {code} CREATE TABLE tmp_hcat_federated_numbers_part_1 ( id int, intnum int, floatnum float ) partitioned by ( part1 string, part2 string ) STORED AS rcfile LOCATION 'viewfs:///database/tmp_hcat_federated_numbers_part_1'; {code} Populate it using Pig: {code} A = load 'default.numbers_pig' using org.apache.hcatalog.pig.HCatLoader(); B = filter A by id == 500; C = foreach B generate (int)id, (int)intnum, (float)floatnum; store C into 'default.tmp_hcat_federated_numbers_part_1' using org.apache.hcatalog.pig.HCatStorer ('part1=pig, part2=hcat_pig_insert', 'id: int,intnum: int,floatnum: float'); {code} Generates the following error when running on a Federated Cluster: {quote} 2012-10-29 20:40:25,011 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1348522594824_0846_m_00_3 Info:Error: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:479) at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:723) at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:705) at
org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86) at org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:100) at org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:228) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.init(MapTask.java:587) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:706) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) {quote}
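The NotInMountpointException in the stack trace above comes from calling the no-argument form of getDefaultReplication() on a federated (viewfs) client, where defaults such as replication depend on which mount point a path resolves to. The types below are self-contained stand-ins, not Hadoop's real classes, sketching why the path-qualified overload is the "right function" the issue title refers to:

```java
// Stand-in types (not Hadoop's real FileSystem/ViewFileSystem) illustrating
// the API contract behind the stack trace: on a viewfs client the
// no-argument default query has no meaningful answer, so callers must pass
// the path they are actually writing to.
interface Fs {
    short getDefaultReplication();            // undefined across viewfs mounts
    short getDefaultReplication(String path); // resolves per mount point
}

class ViewFsStandIn implements Fs {
    public short getDefaultReplication() {
        // Mirrors the error message seen in the Pig job above.
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    public short getDefaultReplication(String path) {
        // A real ViewFileSystem would route to the mount point owning 'path';
        // these return values are arbitrary illustrations.
        return path.startsWith("/database") ? (short) 3 : (short) 2;
    }
}
```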
[jira] [Updated] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs
[ https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3648: --- Assignee: Arup Malakar Status: Open (was: Patch Available) Arup, All the tests passed, but the patch now conflicts because of the HIVE-3645 commit. Can you refresh the patch on trunk? HiveMetaStoreFsImpl is not compatible with hadoop viewfs Key: HIVE-3648 URL: https://issues.apache.org/jira/browse/HIVE-3648 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0 Reporter: Kihwal Lee Assignee: Arup Malakar Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, HIVE_3648_trunk_1.patch HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() instead. Please note that this method is not available in hadoop versions earlier than 0.23.
[jira] [Resolved] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run
[ https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3742. Resolution: Fixed Fix Version/s: 0.10.0 Committed to trunk and 0.10. Thanks, Prasad! The derby metastore schema script for 0.10.0 doesn't run Key: HIVE-3742 URL: https://issues.apache.org/jira/browse/HIVE-3742 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.10.0 Attachments: HIVE-3742-2.patch, HIVE-3742.patch The hive-schema-0.10.0.derby.sql contains incorrect alter statement for SKEWED_STRING_LIST which causes the script execution to fail
[jira] [Commented] (HIVE-3709) Stop storing default ConfVars in temp file
[ https://issues.apache.org/jira/browse/HIVE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504814#comment-13504814 ] Kevin Wilfong commented on HIVE-3709: - It looks like that fixes the single-threaded issue, where the same InputStream ends up being read repeatedly, which is why I overrode the close method to reset the InputStream. It does not look like it will fix the multi-threaded issue. If two threads get Configuration objects built with the copy constructor before loadResources has been called, they share the same InputStream, since the resources themselves are not cloned; both threads could then call loadResources at about the same time, causing the issues Carl was seeing in TestHiveServerSessions. Stop storing default ConfVars in temp file -- Key: HIVE-3709 URL: https://issues.apache.org/jira/browse/HIVE-3709 Project: Hive Issue Type: Improvement Components: Configuration Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3709.1.patch.txt, HIVE-3709.2.patch.txt, HIVE-3709.3.patch.txt To work around issues with Hadoop's Configuration object, specifically its addResource(InputStream) method, default configurations are written to a temp file (I think HIVE-2362 introduced this). This, however, introduces the problem that once that file is deleted from /tmp the client crashes. This is particularly problematic for long-running services like the metastore server. Writing a custom InputStream to deal with the problems in the Configuration object should provide a workaround, which does not introduce a time bomb into Hive.
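The single-threaded workaround Kevin describes -- a stream whose close method resets it -- can be sketched in a few lines. This is a hypothetical illustration of that idea, not the HIVE-3709 patch; it also makes visible why the comment flags the multi-threaded case: the instance still holds one read position, so two Configuration copies sharing the same instance can interleave reads.

```java
import java.io.ByteArrayInputStream;

// Hypothetical illustration of the single-thread workaround described in the
// comment above: an in-memory stream whose close() rewinds it, so a
// Configuration that re-reads the same resource after close() sees the full
// contents again. The instance is still stateful (one read position), so
// sharing one instance across threads remains unsafe -- the multi-threaded
// problem the comment points out.
public class ReusableInputStream extends ByteArrayInputStream {
    public ReusableInputStream(byte[] buf) {
        super(buf);
    }

    @Override
    public void close() {
        // Rewind instead of closing; ByteArrayInputStream's mark defaults to
        // the start, so reset() moves the read position back to byte 0.
        reset();
    }
}
```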
[jira] [Updated] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs
[ https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated HIVE-3648: --- Attachment: HIVE-3648-trunk-1.patch Thanks Ashutosh for looking into the patch. I have updated the patch to reflect the last commit. HiveMetaStoreFsImpl is not compatible with hadoop viewfs Key: HIVE-3648 URL: https://issues.apache.org/jira/browse/HIVE-3648 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0 Reporter: Kihwal Lee Assignee: Arup Malakar Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, HIVE_3648_trunk_1.patch, HIVE-3648-trunk-1.patch HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() instead. Please note that this method is not available in hadoop versions earlier than 0.23.
[jira] [Commented] (HIVE-3645) RCFileWriter does not implement the right function to support Federation
[ https://issues.apache.org/jira/browse/HIVE-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504846#comment-13504846 ] Arup Malakar commented on HIVE-3645: Thanks Ashutosh for looking into the patch. If the branch patch looks fine can you please commit this to the 0.9 branch as well? RCFileWriter does not implement the right function to support Federation Key: HIVE-3645 URL: https://issues.apache.org/jira/browse/HIVE-3645 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0, 0.10.0 Environment: Hadoop 0.23.3 federation, Hive 0.9 and Pig 0.10 Reporter: Viraj Bhat Assignee: Arup Malakar Fix For: 0.11 Attachments: HIVE_3645_branch_0.patch, HIVE_3645_trunk_0.patch Create a table using Hive DDL {code} CREATE TABLE tmp_hcat_federated_numbers_part_1 ( id int, intnum int, floatnum float ) partitioned by ( part1 string, part2 string ) STORED AS rcfile LOCATION 'viewfs:///database/tmp_hcat_federated_numbers_part_1'; {code} Populate it using Pig: {code} A = load 'default.numbers_pig' using org.apache.hcatalog.pig.HCatLoader(); B = filter A by id == 500; C = foreach B generate (int)id, (int)intnum, (float)floatnum; store C into 'default.tmp_hcat_federated_numbers_part_1' using org.apache.hcatalog.pig.HCatStorer ('part1=pig, part2=hcat_pig_insert', 'id: int,intnum: int,floatnum: float'); {code} Generates the following error when running on a Federated Cluster: {quote} 2012-10-29 20:40:25,011 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1348522594824_0846_m_00_3 Info:Error: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:479) at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:723) at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:705) at
org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86) at org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:100) at org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:228) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.init(MapTask.java:587) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:706) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) {quote}
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3678: - Attachment: HIVE-3678.4.patch.txt Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504852#comment-13504852 ] Shreepadma Venugopalan commented on HIVE-3678: -- Uploaded patch rebased off tip of trunk. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby
[jira] [Created] (HIVE-3748) QTestUtil should correctly find data files when running in the build directory
Mikhail Bautin created HIVE-3748: Summary: QTestUtil should correctly find data files when running in the build directory Key: HIVE-3748 URL: https://issues.apache.org/jira/browse/HIVE-3748 Project: Hive Issue Type: Improvement Reporter: Mikhail Bautin Priority: Minor Some parts of the TestCliDriver test suite (e.g., some jar lookups) require that the current directory is set to the build directory. This change makes QTestUtil correctly find data files when running either in the Hive source root or in the build directory.
[jira] [Updated] (HIVE-3748) QTestUtil should correctly find data files when running in the build directory
[ https://issues.apache.org/jira/browse/HIVE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3748: -- Attachment: D7005.1.patch mbautin requested code review of [jira] [HIVE-3748] QTestUtil should correctly find data files when running in the build directory. Reviewers: ashutoshc, JIRA, njain Some parts of the TestCliDriver test suite (e.g., some jar lookups) require that the current directory is set to the build directory. This change makes QTestUtil correctly find data files when running either in the Hive source root or in the build directory. TEST PLAN Run TestCliDriver REVISION DETAIL https://reviews.facebook.net/D7005 AFFECTED FILES ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/16521/ To: ashutoshc, JIRA, njain, mbautin QTestUtil should correctly find data files when running in the build directory -- Key: HIVE-3748 URL: https://issues.apache.org/jira/browse/HIVE-3748 Project: Hive Issue Type: Improvement Reporter: Mikhail Bautin Priority: Minor Attachments: D7005.1.patch Some parts of the TestCliDriver test suite (e.g., some jar lookups) require that the current directory is set to the build directory. This change makes QTestUtil correctly find data files when running either in the Hive source root or in the build directory.
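The lookup the review describes -- the same relative data path resolving correctly from either the source root or the build directory -- amounts to probing candidate locations. This is a hypothetical sketch, not the D7005 patch; the class name and the parent-directory fallback are illustrative assumptions about how such a lookup could work:

```java
import java.io.File;

// Hypothetical sketch of a current-directory-agnostic data lookup (not the
// actual QTestUtil change): try the path relative to the working directory
// first, then fall back to the parent, so running from build/ still finds
// files that live beside it in the source root.
class DataDirLocator {
    static File find(File cwd, String relative) {
        File direct = new File(cwd, relative);  // e.g. running from the source root
        if (direct.exists()) {
            return direct;
        }
        File fromParent = new File(cwd.getParentFile(), relative);  // e.g. running from build/
        if (fromParent.exists()) {
            return fromParent;
        }
        throw new IllegalStateException("data dir not found: " + relative);
    }
}
```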
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504869#comment-13504869 ] Ashutosh Chauhan commented on HIVE-3678: Thanks, Shreepadma for updating patch. Running tests now. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby
[jira] [Resolved] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs
[ https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3648. Resolution: Fixed Fix Version/s: 0.11 Committed to trunk. Thanks, Arup! HiveMetaStoreFsImpl is not compatible with hadoop viewfs Key: HIVE-3648 URL: https://issues.apache.org/jira/browse/HIVE-3648 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0 Reporter: Kihwal Lee Assignee: Arup Malakar Fix For: 0.11 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, HIVE_3648_trunk_1.patch, HIVE-3648-trunk-1.patch HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() instead. Please note that this method is not available in hadoop versions earlier than 0.23.
[jira] [Commented] (HIVE-3665) Allow URIs without port to be specified in metatool
[ https://issues.apache.org/jira/browse/HIVE-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504890#comment-13504890 ] Ashutosh Chauhan commented on HIVE-3665: +1 Allow URIs without port to be specified in metatool --- Key: HIVE-3665 URL: https://issues.apache.org/jira/browse/HIVE-3665 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-3665.1.patch.txt Metatool should accept a pair of input URIs where one contains a port and the other doesn't. Today the tool accepts portless URIs only when both input URIs (oldLoc and newLoc) omit the port; it should be flexible enough to also allow one URI with a valid port and one without. This matters when transitioning to HA, where a user may specify the port as part of the oldLoc while the port doesn't mean much for the newLoc.
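The matching rule the issue asks for can be expressed with java.net.URI, where getPort() returns -1 when no port was given. This is a hypothetical sketch of that behavior, not the metatool's actual comparison logic; the class and method names are invented:

```java
import java.net.URI;

// Hypothetical sketch (not the real metatool code) of the flexibility the
// issue describes: two namenode URIs match when scheme and host agree and at
// most one side specifies a port.
class UriMatcher {
    static boolean sameNameNode(URI a, URI b) {
        if (!a.getScheme().equalsIgnoreCase(b.getScheme())) {
            return false;
        }
        if (!a.getHost().equalsIgnoreCase(b.getHost())) {
            return false;
        }
        // URI.getPort() is -1 when the URI omits the port; only reject when
        // both sides give a port and the ports differ.
        return a.getPort() == -1 || b.getPort() == -1 || a.getPort() == b.getPort();
    }
}
```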
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #212
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/212/ -- [...truncated 36470 lines...] [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-11-27_12-43-58_494_8381877270687109129/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1754915096.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] Copying file: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] 
POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2012-11-27_12-44-02_388_4658100672971353387/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-11-27_12-44-02_388_4658100672971353387/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1481136741.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: 
default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1247248829.txt [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1449552180.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] Copying file:
[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504959#comment-13504959 ] Phil Prudich commented on HIVE-3746: To make sure I'm reading the new thrift definitions correctly -- does this mean that all rows' column 1 values will come first on the wire, and then be followed by all rows' values for column 2, and so on? I clearly see how this would save bytes on the wire. However, any client trying to return rows one-at-a-time to an application would be required to read, process, and buffer almost an entire reply's worth of data before being able to return the first complete row. I'm unfamiliar with the server code, but similar buffering may be needed there as well. Is my understanding of the issue correct? TRowSet resultset structure should be column-oriented - Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
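Phil's reading of the proposal — all of column 1's values first, then all of column 2's, and so on — can be illustrated with a small transpose sketch. This is a conceptual model of a column-oriented result batch, not the actual TRowSet Thrift code; it also shows why a client must hold the whole batch before it can hand back even the first row:

```python
def to_columnar(rows):
    # Column-oriented layout: one list per column, so all rows' values
    # for column 1 travel together on the wire, then column 2, etc.
    return [list(col) for col in zip(*rows)]

def to_rows(columns):
    # A client must have every column list in hand before it can
    # reassemble row 0 -- hence the buffering concern in the comment.
    return [list(row) for row in zip(*columns)]

batch = [[1, "a", True], [2, "b", False], [3, "c", True]]
cols = to_columnar(batch)  # [[1, 2, 3], ['a', 'b', 'c'], [True, False, True]]
assert to_rows(cols) == batch
```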
[jira] [Commented] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504962#comment-13504962 ] Ashutosh Chauhan commented on HIVE-3734: Gang, I fail to see a bug here. You didn't show how you created srcpart, but I assume you did something similar to the following: {code} create table srcpart (key string, value string) partitioned by (ds string, hr string); load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-08', hr='11'); load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-08', hr='12'); load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-09', hr='11'); load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-09', hr='12'); {code} If so, your insert statement selects all the rows from srcpart corresponding to ds=2008-04-08, which includes rows for both hr=11 and hr=12, and inserts them into testtable in partition ds='2008-04-08', hr='11'. This means rows corresponding to hr=12 in srcpart will end up in hr=11 in testtable. Then if you run select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; you will get two rows, since hr='11' in testtable also contains the rows from hr='12' of srcpart. This is expected; it is how partitioning has always worked in Hive. To be doubly sure, I also checked on hive-0.9, and it has the same behavior. Though I agree it is a bit confusing. Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record.
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
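Ashutosh's explanation can be checked with a toy model of the static-partition INSERT OVERWRITE: the SELECT pulls rows from both hr partitions for the day, and every selected row lands in the single target partition (ds='2008-04-08', hr='11'), so key 484 appears twice. This Python sketch only illustrates the semantics; it is not Hive code, and the sample data is reduced to the one key under discussion:

```python
# Toy model: srcpart has the same row loaded into hr='11' and hr='12'.
srcpart = {
    ("2008-04-08", "11"): [("484", "val_484")],
    ("2008-04-08", "12"): [("484", "val_484")],
}

# INSERT OVERWRITE TABLE testtable PARTITION (ds='2008-04-08', hr='11')
# SELECT key, value FROM srcpart WHERE ds='2008-04-08'
selected = [row for (ds, hr), rows in srcpart.items()
            if ds == "2008-04-08" for row in rows]
testtable = {("2008-04-08", "11"): selected}

# Both copies of key 484 now live in the hr='11' partition: not a bug,
# just the static-partition semantics Ashutosh describes.
matches = [r for r in testtable[("2008-04-08", "11")] if r[0] == "484"]
assert len(matches) == 2
```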
[jira] [Resolved] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu resolved HIVE-3734. Resolution: Not A Problem Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record. Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3724) Metastore tests use hardcoded ports
[ https://issues.apache.org/jira/browse/HIVE-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3724: Status: Patch Available (was: Open) Metastore tests use hardcoded ports --- Key: HIVE-3724 URL: https://issues.apache.org/jira/browse/HIVE-3724 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Priority: Minor Attachments: HIVE-3724.1.patch.txt Several of the metastore tests use hardcoded ports for remote metastore Thrift servers. This is causing transient failures in Jenkins, e.g. https://builds.apache.org/job/Hive-trunk-h0.21/1804/ A few tests already dynamically determine free ports, and this logic can be shared. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
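The "dynamically determine free ports" approach the patch generalizes is typically just binding to port 0 and reading back what the OS assigned. A minimal sketch of that idea (not the actual test-utility code from the patch):

```python
import socket

def find_free_port():
    # Binding to port 0 asks the OS for any free ephemeral port; we read
    # the assigned number back and release the socket so the metastore
    # Thrift server under test can bind it. There is a small race window
    # between close() and reuse, which is usually acceptable in tests.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```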
[jira] [Commented] (HIVE-3726) History file closed in finalize method
[ https://issues.apache.org/jira/browse/HIVE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505047#comment-13505047 ] Gunther Hagleitner commented on HIVE-3726: -- Good points. I'll look into it. History file closed in finalize method -- Key: HIVE-3726 URL: https://issues.apache.org/jira/browse/HIVE-3726 Project: Hive Issue Type: Bug Affects Versions: 0.9.0, 0.10.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3726.2-r1411423.patch, HIVE-3736.1-r1411423.patch TestCliNegative fails intermittently because it's up to the garbage collector to close History files. This is only a problem if you deal with a lot of SessionState objects. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs
[ https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505061#comment-13505061 ] Arup Malakar commented on HIVE-3648: Ashutosh, thanks for committing in trunk. Can you commit it to branch-0.9 as well? I will provide the rebased patch once HIVE-3645 is committed for branch. HiveMetaStoreFsImpl is not compatible with hadoop viewfs Key: HIVE-3648 URL: https://issues.apache.org/jira/browse/HIVE-3648 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0 Reporter: Kihwal Lee Assignee: Arup Malakar Fix For: 0.11 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, HIVE_3648_trunk_1.patch, HIVE-3648-trunk-1.patch HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() instead. Please note that this method is not available in hadoop versions earlier than 0.23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3676) INSERT INTO regression caused by HIVE-3465
[ https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505069#comment-13505069 ] Navis commented on HIVE-3676: - @Ashutosh, The newly added test case seems to be non-deterministic. I'll fix this in another issue. Sorry. INSERT INTO regression caused by HIVE-3465 -- Key: HIVE-3676 URL: https://issues.apache.org/jira/browse/HIVE-3676 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Carl Steinbach Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-3676.D6741.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic
Navis created HIVE-3749: --- Summary: New test cases added by HIVE-3676 in insert1.q is not deterministic Key: HIVE-3749 URL: https://issues.apache.org/jira/browse/HIVE-3749 Project: Hive Issue Type: Test Components: Tests Reporter: Navis Assignee: Navis The test case inserts two rows and selects them all, but the display order can differ from environment to environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
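The standard cure for this kind of flakiness — when no ORDER BY constrains the output — is either to add an ORDER BY to the query or to compare results order-insensitively. A hedged sketch of the latter idea, not the actual QTestUtil machinery:

```python
def rows_match(actual, expected):
    # With no ORDER BY, Hive may emit the two inserted rows in either
    # order depending on the environment, so compare as sorted multisets
    # rather than as ordered lists.
    return sorted(actual) == sorted(expected)

# Same rows, different order: still a pass.
assert rows_match(["1\tone", "2\ttwo"], ["2\ttwo", "1\tone"])
```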
[jira] [Commented] (HIVE-3676) INSERT INTO regression caused by HIVE-3465
[ https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505076#comment-13505076 ] Ashutosh Chauhan commented on HIVE-3676: Thanks, Navis for taking a look. Yeah, it seems non-deterministic. It passed for me on one machine, but failed on other two. Appreciate your help. If you are going to open a new jira to fix this, feel free to resolve this one. INSERT INTO regression caused by HIVE-3465 -- Key: HIVE-3676 URL: https://issues.apache.org/jira/browse/HIVE-3676 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Carl Steinbach Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-3676.D6741.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Hive-trunk-h0.21 - Build # 1820 - Still Failing
Changes for Build #1775 [namit] HIVE-3673 Sort merge join not used when join columns have different names (Kevin Wilfong via namit) Changes for Build #1776 [kevinwilfong] HIVE-3627. eclipse misses library: javolution-@javolution-version@.jar. (Gang Tim Liu via kevinwilfong) Changes for Build #1777 [kevinwilfong] HIVE-3524. Storing certain Exception objects thrown in HiveMetaStore.java in MetaStoreEndFunctionContext. (Maheshwaran Srinivasan via kevinwilfong) [cws] HIVE-1977. DESCRIBE TABLE syntax doesn't support specifying a database qualified table name (Zhenxiao Luo via cws) [cws] HIVE-3674. Test case TestParse broken after recent checkin (Sambavi Muthukrishnan via cws) Changes for Build #1778 [cws] HIVE-1362. Column level scalar valued statistics on Tables and Partitions (Shreepadma Venugopalan via cws) Changes for Build #1779 Changes for Build #1780 [kevinwilfong] HIVE-3686. Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524. (Shreepadma Venugopalan via kevinwilfong) Changes for Build #1781 [namit] HIVE-3687 smb_mapjoin_13.q is nondeterministic (Kevin Wilfong via namit) Changes for Build #1782 [hashutosh] HIVE-2715: Upgrade Thrift dependency to 0.9.0 (Ashutosh Chauhan) Changes for Build #1783 [kevinwilfong] HIVE-3654. block relative path access in hive. (njain via kevinwilfong) [hashutosh] HIVE-3658 : Unable to generate the Hbase related unit tests using velocity templates on Windows (Kanna Karanam via Ashutosh Chauhan) [hashutosh] HIVE-3661 : Remove the Windows specific = related swizzle path changes from Proxy FileSystems (Kanna Karanam via Ashutosh Chauhan) [hashutosh] HIVE-3480 : Resource leak: Fix the file handle leaks in Symbolic Symlink related input formats. (Kanna Karanam via Ashutosh Chauhan) Changes for Build #1784 [kevinwilfong] HIVE-3675. NaN does not work correctly for round(n). (njain via kevinwilfong) [cws] HIVE-3651. 
bucketmapjoin?.q tests fail with hadoop 0.23 (Prasad Mujumdar via cws) Changes for Build #1785 [namit] HIVE-3613 Implement grouping_id function (Ian Gorbachev via namit) [namit] HIVE-3692 Update parallel test documentation (Ivan Gorbachev via namit) [namit] HIVE-3649 Hive List Bucketing - enhance DDL to specify list bucketing table (Gang Tim Liu via namit) Changes for Build #1786 [namit] HIVE-3696 Revert HIVE-3483 which causes performance regression (Gang Tim Liu via namit) Changes for Build #1787 [kevinwilfong] HIVE-3621. Make prompt in Hive CLI configurable. (Jingwei Lu via kevinwilfong) [kevinwilfong] HIVE-3695. TestParse breaks due to HIVE-3675. (njain via kevinwilfong) Changes for Build #1788 [kevinwilfong] HIVE-3557. Access to external URLs in hivetest.py. (Ivan Gorbachev via kevinwilfong) Changes for Build #1789 [hashutosh] HIVE-3662 : TestHiveServer: testScratchDirShouldClearWhileStartup is failing on Windows (Kanna Karanam via Ashutosh Chauhan) [hashutosh] HIVE-3659 : TestHiveHistory::testQueryloglocParentDirNotExist Test fails on Windows because of some resource leaks in ZK (Kanna Karanam via Ashutosh Chauhan) [hashutosh] HIVE-3663 Unable to display the MR Job file path on Windows in case of MR job failures. (Kanna Karanam via Ashutosh Chauhan) Changes for Build #1790 Changes for Build #1791 Changes for Build #1792 Changes for Build #1793 [hashutosh] HIVE-3704 : name of some metastore scripts are not per convention (Ashutosh Chauhan) Changes for Build #1794 [hashutosh] HIVE-3243 : ignore white space between entries of hive/hbase table mapping (Shengsheng Huang via Ashutosh Chauhan) [hashutosh] HIVE-3215 : JobDebugger should use RunningJob.getTrackingURL (Bhushan Mandhani via Ashutosh Chauhan) Changes for Build #1795 [cws] HIVE-3437. 
0.23 compatibility: fix unit tests when building against 0.23 (Chris Drome via cws) [hashutosh] HIVE-3626 : RetryingHMSHandler should wrap JDOException inside MetaException (Bhushan Mandhani via Ashutosh Chauhan) [hashutosh] HIVE-3560 : Hive always prints a warning message when using remote metastore (Travis Crawford via Ashutosh Chauhan) Changes for Build #1796 Changes for Build #1797 [hashutosh] HIVE-3664 : Avoid to create a symlink for hive-contrib.jar file in dist\lib folder. (Kanna Karanam via Ashutosh Chauhan) Changes for Build #1798 [namit] HIVE-3706 getBoolVar in FileSinkOperator can be optimized (Kevin Wilfong via namit) [namit] HIVE-3707 Round map/reduce progress down when it is in the range [99.5, 100) (Kevin Wilfong via namit) [namit] HIVE-3471 Implement grouping sets in hive (Ivan Gorbachev via namit) Changes for Build #1799 [hashutosh] HIVE-3291 : fix fs resolvers (Ashish Singh via Ashutosh Chauhan) [hashutosh] HIVE-3680 : Include Table information in Hive's AddPartitionEvent. (Mithun Radhakrishnan via Ashutosh Chauhan) Changes for Build #1800 [hashutosh] HIVE-3520 : ivysettings.xml does not let you override .m2/repository (Raja Aluri via Ashutosh Chauhan) [hashutosh] HIVE-3435 : Get pdk pluginTest
[jira] [Updated] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3749: Status: Patch Available (was: Open) New test cases added by HIVE-3676 in insert1.q is not deterministic --- Key: HIVE-3749 URL: https://issues.apache.org/jira/browse/HIVE-3749 Project: Hive Issue Type: Test Components: Tests Reporter: Navis Assignee: Navis Attachments: HIVE-3749.D7011.1.patch The test case inserts two rows and selects them all, but the display order can differ from environment to environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3749: -- Attachment: HIVE-3749.D7011.1.patch navis requested code review of HIVE-3749 [jira] New test cases added by HIVE-3676 in insert1.q is not deterministic. Reviewers: JIRA DPAL-1933 New test cases added by HIVE-3676 in insert1.q is not deterministic The test case inserts two rows and selects them all, but the display order can differ from environment to environment. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D7011 AFFECTED FILES ql/src/test/queries/clientpositive/insert1.q ql/src/test/results/clientpositive/insert1.q.out MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/16533/ To: JIRA, navis New test cases added by HIVE-3676 in insert1.q is not deterministic --- Key: HIVE-3749 URL: https://issues.apache.org/jira/browse/HIVE-3749 Project: Hive Issue Type: Test Components: Tests Reporter: Navis Assignee: Navis Attachments: HIVE-3749.D7011.1.patch The test case inserts two rows and selects them all, but the display order can differ from environment to environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3750) JDBCStatsPublisher fails when ID length exceeds length of ID column
Kevin Wilfong created HIVE-3750: --- Summary: JDBCStatsPublisher fails when ID length exceeds length of ID column Key: HIVE-3750 URL: https://issues.apache.org/jira/browse/HIVE-3750 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.11 Reporter: Kevin Wilfong Assignee: Kevin Wilfong When the length of the ID field passed to JDBCStatsPublisher exceeds the length of the column in the table (currently 255 characters) stats collection fails. This causes the entire query to fail when hive.stats.reliable is set to true. One way to prevent this would be to calculate a deterministic, very low collision hash of the ID prefix used for aggregation and use that when the length of the ID is too long. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
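The proposed fix — substituting a deterministic, low-collision hash when the ID outgrows the column — can be sketched as below. The 255-character limit comes from the issue description; the choice of SHA-256 is an assumption for illustration, not necessarily what the eventual patch used:

```python
import hashlib

MAX_ID_LEN = 255  # width of the ID column, per the issue description

def stats_id(raw_id, max_len=MAX_ID_LEN):
    # IDs that fit are stored verbatim; longer ones are replaced by a
    # deterministic digest so that stats rows published under the same
    # oversized ID prefix still aggregate together. SHA-256 yields a
    # 64-hex-char value, well within the column limit.
    if len(raw_id) <= max_len:
        return raw_id
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()

assert stats_id("short/id") == "short/id"
assert len(stats_id("x" * 300)) == 64
assert stats_id("x" * 300) == stats_id("x" * 300)  # deterministic
```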
[jira] [Commented] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505113#comment-13505113 ] Ashutosh Chauhan commented on HIVE-3749: +1 New test cases added by HIVE-3676 in insert1.q is not deterministic --- Key: HIVE-3749 URL: https://issues.apache.org/jira/browse/HIVE-3749 Project: Hive Issue Type: Test Components: Tests Reporter: Navis Assignee: Navis Attachments: HIVE-3749.D7011.1.patch The test case inserts two rows and selects them all, but the display order can differ from environment to environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-3676) INSERT INTO regression caused by HIVE-3465
[ https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3676. Resolution: Fixed This is getting taken care of in HIVE-3749 INSERT INTO regression caused by HIVE-3465 -- Key: HIVE-3676 URL: https://issues.apache.org/jira/browse/HIVE-3676 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Carl Steinbach Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-3676.D6741.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic
[ https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3749: --- Resolution: Fixed Fix Version/s: 0.10.0 Status: Resolved (was: Patch Available) Patch committed to trunk and 0.10. Thanks, Navis! New test cases added by HIVE-3676 in insert1.q is not deterministic --- Key: HIVE-3749 URL: https://issues.apache.org/jira/browse/HIVE-3749 Project: Hive Issue Type: Test Components: Tests Reporter: Navis Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-3749.D7011.1.patch The test case inserts two rows and selects them all, but the display order can differ from environment to environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3400) Add Retries to Hive MetaStore Connections
[ https://issues.apache.org/jira/browse/HIVE-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505174#comment-13505174 ] Ashutosh Chauhan commented on HIVE-3400: Bhushan, Can you upload the latest patch to the jira too? Add Retries to Hive MetaStore Connections - Key: HIVE-3400 URL: https://issues.apache.org/jira/browse/HIVE-3400 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Priority: Minor Attachments: HIVE-3400.1.patch.txt Currently, when using Thrift to access the MetaStore, if the Thrift host dies, there is no mechanism to reconnect to some other host even if the MetaStore URIs variable in the Conf contains multiple hosts. Hive should retry and reconnect rather than throwing a communication link error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3400) Add Retries to Hive MetaStore Connections
[ https://issues.apache.org/jira/browse/HIVE-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505175#comment-13505175 ] Bhushan Mandhani commented on HIVE-3400: Yes, I'll do Submit Patch after making one key change that Carl pointed out. Working on that now. Add Retries to Hive MetaStore Connections - Key: HIVE-3400 URL: https://issues.apache.org/jira/browse/HIVE-3400 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Priority: Minor Attachments: HIVE-3400.1.patch.txt Currently, when using Thrift to access the MetaStore, if the Thrift host dies, there is no mechanism to reconnect to some other host even if the MetaStore URIs variable in the Conf contains multiple hosts. Hive should retry and reconnect rather than throwing a communication link error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
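The behavior being requested here — cycle through the configured metastore URIs instead of failing on the first dead host — can be sketched generically. `connect` below is a hypothetical callable standing in for the Thrift transport open call; none of this is the actual patch code:

```python
def open_metastore(uris, connect, attempts_per_uri=2):
    # Try each configured metastore URI in turn, retrying rather than
    # surfacing a communication-link error on the first dead host.
    last_error = None
    for _ in range(attempts_per_uri):
        for uri in uris:
            try:
                return connect(uri)
            except ConnectionError as e:
                last_error = e
    raise last_error

# Example: the first host is down, the second answers.
def fake_connect(uri):
    if uri == "thrift://dead:9083":
        raise ConnectionError(uri)
    return f"connected:{uri}"

assert open_metastore(
    ["thrift://dead:9083", "thrift://live:9083"], fake_connect
) == "connected:thrift://live:9083"
```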
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3678: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and 0.10 Thanks, Shreepadma! Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema
[ https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3712: --- Resolution: Fixed Fix Version/s: 0.10.0 Status: Resolved (was: Patch Available) Taken care of in HIVE-3678 Use varbinary instead of longvarbinary to store min and max column values in column stats schema Key: HIVE-3712 URL: https://issues.apache.org/jira/browse/HIVE-3712 Project: Hive Issue Type: Bug Components: Metastore, Statistics Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min and max column values for numeric types takes up 8 bytes and hence doesn't require a BLOB. Storing these values in a BLOB will impact performance without providing much benefits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
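The sizing argument in the issue — numeric min/max values need only 8 bytes, so a bounded VARBINARY suffices and a BLOB-backed LONGVARBINARY is overkill — is easy to verify with standard binary packing (shown here in Python purely as an illustration of the byte counts):

```python
import struct

# A 64-bit integer min or max serializes to exactly 8 bytes (big-endian
# signed long here), comfortably inside any small VARBINARY column.
packed = struct.pack(">q", 2**62)
assert len(packed) == 8
assert struct.unpack(">q", packed)[0] == 2**62

# A double (for floating-point column stats) is also exactly 8 bytes.
assert len(struct.pack(">d", 3.14159)) == 8
```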
[jira] [Updated] (HIVE-3665) Allow URIs without port to be specified in metatool
[ https://issues.apache.org/jira/browse/HIVE-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3665: --- Resolution: Fixed Fix Version/s: 0.11 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Shreepadma! Allow URIs without port to be specified in metatool --- Key: HIVE-3665 URL: https://issues.apache.org/jira/browse/HIVE-3665 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.11 Attachments: HIVE-3665.1.patch.txt Metatool should accept input URIs where one URI contains a port and the other doesn't. While metatool today accepts input URIs without the port when both the input URIs (oldLoc and newLoc) don't contain the port, we should make the tool a little more flexible to allow for the case where one URI contains a valid port and the other input URI doesn't. This makes more sense when transitioning to HA and a user chooses to specify the port as part of the oldLoc, but the port doesn't mean much for the newLoc.
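The port-tolerant matching described above can be sketched roughly as follows. This is a hypothetical helper, not the actual metatool code: the class and method names are illustrative, and it treats a missing port (java.net.URI reports -1) as a wildcard when comparing two filesystem root URIs.

```java
import java.net.URI;

public class LocationMatcher {
    // Hypothetical sketch of port-tolerant URI comparison, not Hive's
    // actual metatool logic. java.net.URI.getPort() returns -1 when the
    // URI carries no explicit port; we treat that as "matches any port".
    static boolean sameLocation(String oldLoc, String newLoc) {
        URI o = URI.create(oldLoc);
        URI n = URI.create(newLoc);
        if (!o.getScheme().equals(n.getScheme())) {
            return false;
        }
        if (!o.getHost().equals(n.getHost())) {
            return false;
        }
        // A missing port on either side is accepted (HA-style newLoc
        // often has no meaningful port).
        return o.getPort() == -1 || n.getPort() == -1
            || o.getPort() == n.getPort();
    }

    public static void main(String[] args) {
        // oldLoc specifies a port, newLoc doesn't: still considered a match.
        System.out.println(sameLocation("hdfs://nn1:8020/warehouse",
                                        "hdfs://nn1/warehouse"));
    }
}
```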
Re: Review request: JIRAs useful for the Shark project
Hi Mikhail, I will take a look into those jiras. Thanks, Ashutosh

On Tue, Nov 27, 2012 at 11:43 AM, Mikhail Bautin bautin.mailing.li...@gmail.com wrote: Hello, The following pending review requests are very useful for the Shark project (http://shark.cs.berkeley.edu/). It would be great if someone could take a look and help us get these JIRAs committed.
- https://reviews.facebook.net/D6879 (HIVE-3731, https://issues.apache.org/jira/browse/HIVE-3731): adding an Ant target to create a Debian package, which allows deploying the patched version of Hive alongside Shark on Debian systems.
- https://reviews.facebook.net/D7005 (HIVE-3748, https://issues.apache.org/jira/browse/HIVE-3748): making QTestUtil work correctly when running the test suite, which helps with running Hive/Shark unit tests using Maven.
In addition, the following JIRA would make it a lot easier to work with Hive for anyone who is using JDK 1.7:
- https://reviews.facebook.net/D6873 (HIVE-3384, https://issues.apache.org/jira/browse/HIVE-3384): HIVE JDBC module won't compile under JDK1.7 as new methods added in JDBC specification
Your help in reviewing/committing these patches is greatly appreciated! Thanks, Mikhail
Transform.java Vs. PhysicalPlanResolver.java
Hello, 1) Does Hive make a clear-cut distinction between compile-time optimization and run-time optimization? 2) Does anybody know the difference between the optimizations implementing Transform and the ones implementing PhysicalPlanResolver? Why are such optimizations in separate packages? Thanks and Regards, Mahsa
Re: Transform.java Vs. PhysicalPlanResolver.java
Optimizations implementing Transform take the operator tree and transform it into a new operator tree. The operator tree is then broken into various tasks, and the physical optimizer takes a task and optimizes/changes that task. Both of these optimizations are done at compile time. There is nothing like runtime optimization right now; the plan does not change dynamically.

On 11/28/12 8:54 AM, Mahsa Mofidpoor mofidp...@gmail.com wrote: Hello, 1) Does Hive make a clear-cut distinction between compile-time optimization and run-time optimization? 2) Does anybody know the difference between the optimizations implementing Transform and the ones implementing PhysicalPlanResolver? Why are such optimizations in separate packages? Thanks and Regards, Mahsa
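The two hook points described above can be sketched with simplified stand-in interfaces. This is purely illustrative: the names and shapes below are hypothetical simplifications, not Hive's real Transform and PhysicalPlanResolver classes (which live in Hive's optimizer packages and operate on real operator trees and tasks, not strings).

```java
public class OptimizerSketch {
    // Hypothetical stand-in for a logical-phase Transform: it rewrites
    // the whole operator tree into a new operator tree at compile time.
    interface Transform {
        String transform(String operatorTree);
    }

    // Hypothetical stand-in for a PhysicalPlanResolver: it runs after
    // the tree has been broken into tasks and rewrites one task.
    interface PhysicalPlanResolver {
        String resolve(String task);
    }

    static String compile(String operatorTree) {
        Transform predicatePushdown = t -> t + " +pushdown";
        PhysicalPlanResolver mapJoinResolver = t -> t + " +mapjoin";

        String tree = predicatePushdown.transform(operatorTree); // whole-tree rewrite
        String task = "MapRedTask(" + tree + ")";                // tree split into a task
        return mapJoinResolver.resolve(task);                    // per-task rewrite
    }

    public static void main(String[] args) {
        // Both rewrites happen before execution; nothing changes at runtime.
        System.out.println(compile("TS->FIL->GBY->RS"));
    }
}
```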
[jira] [Commented] (HIVE-3552) performant manner for performing cubes and rollups in case of less aggretation
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505230#comment-13505230 ] Namit Jain commented on HIVE-3552: -- This approach won't work for distincts. performant manner for performing cubes and rollups in case of less aggretation -- Key: HIVE-3552 URL: https://issues.apache.org/jira/browse/HIVE-3552 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain This is a follow-up to HIVE-3433. Had an offline discussion with Sambavi - she pointed out a scenario where the implementation in HIVE-3433 will not scale. Assume that the user is performing a cube on many columns, say 8 columns. Each row would then generate 256 rows for the hash table, which may kill the current group by implementation. A better implementation would be to add an additional stage - in the first stage, perform the group by assuming there was no cube. Add another stage, where you would perform the cube. The assumption is that the group by would have decreased the output data significantly.
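The blow-up described in the issue is just the grouping-set count: a cube over n columns emits one output row per subset of the grouping columns, i.e. 2^n rows per input row, which is 256 for 8 columns. A small illustrative sketch (not Hive code; the class name is made up):

```java
public class CubeBlowup {
    // A CUBE over nCols columns produces one grouping set per subset of
    // the columns (including the empty grand-total set): 2^nCols in all,
    // so each input row is multiplied 2^nCols-fold in the group-by hash
    // table before aggregation.
    static long cubeRowsPerInput(int nCols) {
        return 1L << nCols;
    }

    public static void main(String[] args) {
        // The scenario from the issue: 8 cube columns.
        System.out.println(cubeRowsPerInput(8)); // 256
    }
}
```

This is why the proposed two-stage plan helps: an initial plain group by shrinks the input before the 2^n multiplication is applied.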
[jira] [Commented] (HIVE-3552) performant manner for performing cubes and rollups in case of less aggretation
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505231#comment-13505231 ] Namit Jain commented on HIVE-3552: -- https://reviews.facebook.net/D7029 performant manner for performing cubes and rollups in case of less aggretation -- Key: HIVE-3552 URL: https://issues.apache.org/jira/browse/HIVE-3552 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain
[jira] [Updated] (HIVE-3552) performant manner for performing cubes and rollups in case of less aggregation
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Grover updated HIVE-3552: -- Summary: performant manner for performing cubes and rollups in case of less aggregation (was: performant manner for performing cubes and rollups in case of less aggretation) -- Key: HIVE-3552 URL: https://issues.apache.org/jira/browse/HIVE-3552 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain
[jira] [Commented] (HIVE-3596) Regression - HiveConf static variable causes issues in long running JVM instances with /tmp/ data
[ https://issues.apache.org/jira/browse/HIVE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505273#comment-13505273 ] Carl Steinbach commented on HIVE-3596: -- @Chris: It turns out that Kevin has been working on this same problem in HIVE-3709. He was able to get a little bit farther, but his solution seems to have some concurrency issues. If you have time, it may be worth looking at his solution and seeing if you can spot the threading problem. Regression - HiveConf static variable causes issues in long running JVM instances with /tmp/ data - Key: HIVE-3596 URL: https://issues.apache.org/jira/browse/HIVE-3596 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.8.0, 0.8.1, 0.9.0 Reporter: Chris McConnell Assignee: Chris McConnell Fix For: 0.8.1, 0.9.0, 0.10.0 Attachments: HIVE-3596.patch With Hive 0.8.x, HiveConf was changed to use the private, static member confVarURL, which points to /tmp/hive-user-tmp_number.xml for job configuration settings. In long running JVMs, such as a Beeswax server, that create multiple HiveConf objects over time, this variable does not properly get updated between jobs and can cause job failure if the OS cleans /tmp/ during a cron job.