[jira] [Commented] (HIVE-5302) PartitionPruner fails on Avro non-partitioned data
[ https://issues.apache.org/jira/browse/HIVE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780718#comment-13780718 ] Sean Busbey commented on HIVE-5302:
---
Arg. Okay, tl;dr: I need to go back to the drawing board on finding a suitable test. Please lower priority or close as appropriate.

Long version: in setting up my test case I was too quick to presume that an AvroSerdeException showing up in the logs was a hard failure. There does, however, appear to be a non-fatal problem when the partition pruner optimization works with a non-partitioned Avro table. It attempts to make a shadow partition to represent the whole table. Creating this partition relies on an initializer that goes through a code path which instantiates the SerDe based only on feedback from MetaStoreUtils. So the AvroSerDe fails during initialization (and logs a WARN about it with an AvroSerdeException), but since this instance of the SerDe is never actually used, it doesn't result in a failure.

You can see this even by running the basic sanity test:
{noformat}
$ ant clean package
…
$ ant -Dmodule=ql -Dtestcase=TestCliDriver -Dqfile=avro_sanity_test.q test
…
BUILD SUCCESSFUL
Total time: 1 minute 15 seconds
$ less build/ql/tmp/hive.log
{noformat}
In the log, grep for AvroSerdeException (for me it's line 3198).

So sad Sean will need to go back to finding a case where this explodes in a way that stops things. On the matter of query plan bloat, we could isolate the related changes to the AvroSerde, so long as there's a way to get at table properties during SerDe initialization. That way it could check the partition-specific properties and then fall back to the table properties on its own. I'll worry about that once I find a test case.
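The partition-then-table fallback described above is, at bottom, a two-level Properties lookup. The sketch below is only an illustration of that lookup order; the method name `schemaUrl` and the property key layout are assumptions for the demo, not the actual AvroSerde API:

```java
import java.util.Properties;

public class PropertyFallbackDemo {
    // Prefer the partition-level value; fall back to the table-level value.
    // (Hypothetical helper, not Hive code.)
    static String schemaUrl(Properties partitionProps, Properties tableProps) {
        String url = partitionProps.getProperty("avro.schema.url");
        return (url != null) ? url : tableProps.getProperty("avro.schema.url");
    }

    public static void main(String[] args) {
        Properties table = new Properties();
        table.setProperty("avro.schema.url", "hdfs:///schemas/table.avsc");
        Properties partition = new Properties(); // no partition-specific override
        System.out.println(schemaUrl(partition, table)); // hdfs:///schemas/table.avsc
    }
}
```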
PartitionPruner fails on Avro non-partitioned data
--
Key: HIVE-5302
URL: https://issues.apache.org/jira/browse/HIVE-5302
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
Labels: avro
Attachments: HIVE-5302.1-branch-0.12.patch.txt, HIVE-5302.1.patch.txt, HIVE-5302.1.patch.txt

While updating HIVE-3585 I found a test case that causes the failure in the MetaStoreUtils partition retrieval from back in HIVE-4789. In this case, the failure is triggered when the partition pruner is handed a non-partitioned table and has to construct a pseudo-partition, e.g.:
{code}
INSERT OVERWRITE TABLE partitioned_table PARTITION(col)
  SELECT id, foo, col FROM non_partitioned_table WHERE col = 9;
{code}
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5391) make ORC predicate pushdown work with vectorization
[ https://issues.apache.org/jira/browse/HIVE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780724#comment-13780724 ] Hive QA commented on HIVE-5391:
---
{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12605630/HIVE-5391.01-vectorization.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4054 tests executed

*Failed tests:*
{noformat}
org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
org.apache.hive.hcatalog.mapreduce.TestHCatExternalDynamicPartitioned.testHCatDynamicPartitionedTable
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/948/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/948/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated.

make ORC predicate pushdown work with vectorization
---
Key: HIVE-5391
URL: https://issues.apache.org/jira/browse/HIVE-5391
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-5391.01-vectorization.patch, HIVE-5391-vectorization.patch

Vectorized execution doesn't utilize ORC predicate pushdown. It should.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4561) Column stats: LOW_VALUE (or HIGH_VALUE) will always be 0.0000 if all the column values are larger than 0.0 (or if all column values are smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780727#comment-13780727 ] Hive QA commented on HIVE-4561: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12586100/HIVE-4561.3.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/951/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/951/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-951/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1527155. At revision 1527155. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat} This message is automatically generated. 
Column stats: LOW_VALUE (or HIGH_VALUE) will always be 0.0000 if all the column values are larger than 0.0 (or if all column values are smaller than 0.0)
--
Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: caofangkun
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch

If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
{noformat}
hive (default)> create table src_test (price double);
hive (default)> load data local inpath './test.txt' into table src_test;
hive (default)> select * from src_test;
OK
1.0
2.0
3.0
Time taken: 0.313 seconds, Fetched: 3 row(s)
hive (default)> analyze table src_test compute statistics for columns price;

mysql> select * from TAB_COL_STATS \G
CS_ID: 16
DB_NAME: default
TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
TBL_ID: 2586
LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
DOUBLE_LOW_VALUE: 0.0    # Wrong result! Expected is 1.0
DOUBLE_HIGH_VALUE: 3.0
BIG_DECIMAL_LOW_VALUE: NULL
BIG_DECIMAL_HIGH_VALUE: NULL
NUM_NULLS: 0
NUM_DISTINCTS: 1
AVG_COL_LEN: 0.0
MAX_COL_LEN: 0
NUM_TRUES: 0
NUM_FALSES: 0
LAST_ANALYZED: 1368596151
2 rows in set (0.00 sec)
{noformat}
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
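The symptom above (a minimum of 0.0 over all-positive data) is consistent with the classic bug of seeding a running min/max with 0 instead of with the first observed value. The sketch below is a generic illustration of that pattern and its fix, not the actual metastore statistics code:

```java
public class StatsSeedDemo {
    // Correct: seed with +/- infinity so the first value always wins.
    static double[] minMax(double[] values) {
        double low = Double.POSITIVE_INFINITY, high = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        return new double[] {low, high};
    }

    public static void main(String[] args) {
        double[] prices = {1.0, 2.0, 3.0};
        // Buggy pattern behind a result like DOUBLE_LOW_VALUE = 0.0: seeding the
        // running minimum with 0 means 0 "wins" whenever every value is positive.
        double buggyLow = 0.0;
        for (double v : prices) buggyLow = Math.min(buggyLow, v);
        double[] fixed = minMax(prices);
        System.out.println("buggy low=" + buggyLow
            + " fixed low=" + fixed[0] + " high=" + fixed[1]);
        // buggy low=0.0 fixed low=1.0 high=3.0
    }
}
```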
[jira] [Commented] (HIVE-5395) Various cleanup in ptf code
[ https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780725#comment-13780725 ] Hive QA commented on HIVE-5395: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12605636/HIVE-5395.3.patch.txt Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/949/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/949/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-949/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1527155. At revision 1527155. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat} This message is automatically generated. 
Various cleanup in ptf code
---
Key: HIVE-5395
URL: https://issues.apache.org/jira/browse/HIVE-5395
Project: Hive
Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, HIVE-5395.3.patch.txt

Some minor issues:
* Implementing classes on the left side of equals
* Stack used instead of ArrayDeque
* Classes defined statically inside other files (when they do not need to be)
* Checkstyle errors like indenting
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
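The first two items reflect standard Java guidance: declare the interface on the left side of the assignment, and prefer ArrayDeque over the legacy, synchronized java.util.Stack for LIFO use. A generic illustration (not taken from the ptf code itself):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    // Interface on the left, implementation on the right; ArrayDeque gives the
    // same LIFO behavior as java.util.Stack without the Vector-era locking.
    static String popAfterPush() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(popAfterPush()); // b
    }
}
```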
[jira] [Commented] (HIVE-5395) Various cleanup in ptf code
[ https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780726#comment-13780726 ] Hive QA commented on HIVE-5395: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12605636/HIVE-5395.3.patch.txt Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/950/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/950/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-950/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1527155. At revision 1527155. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat} This message is automatically generated. 
Various cleanup in ptf code
---
Key: HIVE-5395
URL: https://issues.apache.org/jira/browse/HIVE-5395
Project: Hive
Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, HIVE-5395.3.patch.txt

Some minor issues:
* Implementing classes on the left side of equals
* Stack used instead of ArrayDeque
* Classes defined statically inside other files (when they do not need to be)
* Checkstyle errors like indenting
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780749#comment-13780749 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-5298:
---
[~xuefuz] and [~ashutoshc]: I looked at the exact piece of code and thought of doing a similar optimization to the one mentioned here while looking at one of my jiras, HIVE-5348. It seems like:
1. conf.getPathToAliases() gives the path-to-aliases mapping
2. conf.getPathToPartitionInfo() gives the path-to-partition-info mapping
It is clear that (1) and (2) return HashMaps of the same size, say numPaths. In the change, [~xuefuz] added the line below:
{code:title=MapOperator.java|borderStyle=solid}
...
Set<PartitionDesc> partDescSet = new HashSet<PartitionDesc>(conf.getPathToPartitionInfo().values());
...
{code}
The size of partDescSet is the number of distinct partitions associated with the map operator. That size, say numParts, can be far smaller than numPaths if a partition comprises many files; hence the relatively smaller number of iterations. I would therefore +1, since the idea behind this fix looks correct.
NB: The contents of the for loop in the original code look kind of hairy, and I am rewriting them as part of HIVE-5348.
Thanks, Hari

AvroSerde performance problem caused by HIVE-3833
--
Key: HIVE-5298
URL: https://issues.apache.org/jira/browse/HIVE-5298
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Fix For: 0.13.0
Attachments: HIVE-5298.1.patch, HIVE-5298.patch

HIVE-3833 fixed the targeted problem and made Hive use partition-level metadata to initialize the object inspector. In doing so, however, it goes through every file under the table to access the partition metadata, which is very inefficient, especially in the case of multiple files per partition. This causes more problems for AvroSerde, because AvroSerde initialization accesses the schema, which is located on the file system. As a result, before Hive can process any data, it needs to access every file of the table, which can take long enough to cause job failure for lack of job progress. The improvement is to access partition metadata only once per partition.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
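The numPaths-versus-numParts argument in the comment can be shown with plain collections. This is a self-contained sketch that mirrors the shapes involved (a path-to-partition map and a de-duplicating HashSet), using String stand-ins rather than Hive's PartitionDesc:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PartDescDedupDemo {
    // Collapses duplicate partition descriptors: work then runs numParts
    // times instead of numPaths times. (String stand-in for PartitionDesc;
    // the real class needs proper equals/hashCode for this to work.)
    static int distinctPartitions(Map<String, String> pathToPartitionInfo) {
        Set<String> partDescSet = new HashSet<>(pathToPartitionInfo.values());
        return partDescSet.size();
    }

    public static void main(String[] args) {
        // Many files, few distinct partitions.
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("/warehouse/t/p=1/file0", "p=1");
        pathToPartitionInfo.put("/warehouse/t/p=1/file1", "p=1");
        pathToPartitionInfo.put("/warehouse/t/p=1/file2", "p=1");
        pathToPartitionInfo.put("/warehouse/t/p=2/file0", "p=2");
        System.out.println(pathToPartitionInfo.size() + " paths, "
            + distinctPartitions(pathToPartitionInfo) + " partitions");
        // 4 paths, 2 partitions
    }
}
```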
[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics
[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780753#comment-13780753 ] Hudson commented on HIVE-5324:
--
SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #185 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/185/])
HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527149)
* /hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
* /hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java
* /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java

Extend record writer and ORC reader/writer interfaces to provide statistics
---
Key: HIVE-5324
URL: https://issues.apache.org/jira/browse/HIVE-5324
Project: Hive
Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Labels: orcfile, statistics
Fix For: 0.13.0
Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt

The current implementation for computing statistics (number of rows and raw data size) runs for every single row processed. The processOp() method in FileSinkOperator gets the raw data size for each row from the serde and accumulates the size in a hashmap while counting the number of rows. These accumulated statistics are then published to the metastore. ORC already stores enough statistics internally, which can be used when publishing the stats to the metastore; this will avoid the duplication of work happening in processOp(). Also, getting the statistics directly from ORC is very cheap (they can be read directly from the file footer).
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
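The per-row accumulation that the description wants to avoid looks, in the abstract, like the sketch below: count and raw size are built up one row at a time, whereas an ORC file already has those totals in its footer. This is a simplified stand-in, not FileSinkOperator code; `accumulate` and the literal row sizes are invented for the demo:

```java
public class RowStatsDemo {
    // Per-row accumulation, as processOp() does today: returns {rowCount, rawDataSize}.
    static long[] accumulate(long[] rowSizes) {
        long rowCount = 0, rawDataSize = 0;
        for (long size : rowSizes) {
            rowCount++;
            rawDataSize += size;
        }
        return new long[] {rowCount, rawDataSize};
    }

    public static void main(String[] args) {
        long[] rowSizes = {120, 80, 200}; // stand-in for per-row serde size lookups
        long[] stats = accumulate(rowSizes);
        // With ORC, these same totals could be read once from the file footer
        // instead of being recomputed row by row.
        System.out.println(stats[0] + " rows, " + stats[1] + " bytes");
        // 3 rows, 400 bytes
    }
}
```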
[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics
[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780763#comment-13780763 ] Hudson commented on HIVE-5324:
--
FAILURE: Integrated in Hive-trunk-hadoop2-ptest #119 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/119/])
HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527149)
* /hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
* /hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java
* /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java

Extend record writer and ORC reader/writer interfaces to provide statistics
---
Key: HIVE-5324
URL: https://issues.apache.org/jira/browse/HIVE-5324
Project: Hive
Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Labels: orcfile, statistics
Fix For: 0.13.0
Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt

The current implementation for computing statistics (number of rows and raw data size) runs for every single row processed. The processOp() method in FileSinkOperator gets the raw data size for each row from the serde and accumulates the size in a hashmap while counting the number of rows. These accumulated statistics are then published to the metastore. ORC already stores enough statistics internally, which can be used when publishing the stats to the metastore; this will avoid the duplication of work happening in processOp(). Also, getting the statistics directly from ORC is very cheap (they can be read directly from the file footer).
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics
[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780764#comment-13780764 ] Hudson commented on HIVE-5324:
--
FAILURE: Integrated in Hive-trunk-h0.21 #2364 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2364/])
HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527149)
* /hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
* /hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java
* /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java

Extend record writer and ORC reader/writer interfaces to provide statistics
---
Key: HIVE-5324
URL: https://issues.apache.org/jira/browse/HIVE-5324
Project: Hive
Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Labels: orcfile, statistics
Fix For: 0.13.0
Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt

The current implementation for computing statistics (number of rows and raw data size) runs for every single row processed. The processOp() method in FileSinkOperator gets the raw data size for each row from the serde and accumulates the size in a hashmap while counting the number of rows. These accumulated statistics are then published to the metastore. ORC already stores enough statistics internally, which can be used when publishing the stats to the metastore; this will avoid the duplication of work happening in processOp(). Also, getting the statistics directly from ORC is very cheap (they can be read directly from the file footer).
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5358) ReduceSinkDeDuplication should ignore column orders when checking the overlapping part of keys between parent and child
[ https://issues.apache.org/jira/browse/HIVE-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780765#comment-13780765 ] Chun Chen commented on HIVE-5358:
--
Thanks, [~yhuai]. I got it. It seems the first method (adjusting the first GBY to construct its key from both the key and the value of the reduce input) is easier and doesn't have to waste extra resources sorting the rows.

ReduceSinkDeDuplication should ignore column orders when checking the overlapping part of keys between parent and child
--
Key: HIVE-5358
URL: https://issues.apache.org/jira/browse/HIVE-5358
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Chun Chen
Assignee: Chun Chen
Attachments: D13113.1.patch, HIVE-5358.2.patch, HIVE-5358.patch

{code}
select key, value from (select key, value from src group by key, value) t group by key, value;
{code}
This can be optimized by ReduceSinkDeDuplication.
{code}
select key, value from (select key, value from src group by key, value) t group by value, key;
{code}
However, the query above can't currently be optimized by ReduceSinkDeDuplication, due to the different column orders of the parent and child operators.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
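The order-insensitive check the title asks for amounts to comparing the parent and child key columns as sets instead of as ordered lists. The sketch below shows only that set-equality idea; `keysMatchIgnoringOrder` is a hypothetical helper, not the actual ReduceSinkDeDuplication code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class KeyOverlapDemo {
    // Same key columns, order ignored (hypothetical helper).
    static boolean keysMatchIgnoringOrder(List<String> parentKeys, List<String> childKeys) {
        return new HashSet<>(parentKeys).equals(new HashSet<>(childKeys));
    }

    public static void main(String[] args) {
        // "group by key, value" vs "group by value, key": same key set.
        System.out.println(keysMatchIgnoringOrder(
            Arrays.asList("key", "value"), Arrays.asList("value", "key"))); // true
    }
}
```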
[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780793#comment-13780793 ] Hudson commented on HIVE-5297: -- ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/461/]) HIVE-5297 Hive does not honor type for partition columns (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1527024) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java * /hive/trunk/ql/src/test/queries/clientnegative/illegal_partition_type.q * /hive/trunk/ql/src/test/queries/clientnegative/illegal_partition_type2.q * /hive/trunk/ql/src/test/queries/clientpositive/alter_partition_coltype.q * /hive/trunk/ql/src/test/queries/clientpositive/partition_type_check.q * /hive/trunk/ql/src/test/results/clientnegative/alter_table_add_partition.q.out * /hive/trunk/ql/src/test/results/clientnegative/alter_view_failure5.q.out * /hive/trunk/ql/src/test/results/clientnegative/illegal_partition_type.q.out * /hive/trunk/ql/src/test/results/clientnegative/illegal_partition_type2.q.out * /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out * /hive/trunk/ql/src/test/results/clientpositive/partition_type_check.q.out Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.12.0 Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, HIVE-5297.4.patch, HIVE-5297.5.patch, 
HIVE-5297.6.patch, HIVE-5297.7.patch, HIVE-5297.8.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the partition addition/load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
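The check the report asks for amounts to parsing each partition value as its declared column type before accepting the partition. A minimal, self-contained sketch of that idea (class and method names are mine, not from the HIVE-5297 patch):

```java
public class PartitionTypeCheck {
    // Hypothetical sketch of the kind of validation HIVE-5297 calls for:
    // reject a partition value that cannot be parsed as the declared column
    // type, instead of silently storing it and reading back NULLs later.
    static boolean isValidPartitionValue(String type, String value) {
        try {
            switch (type) {
                case "int":    Integer.parseInt(value);   return true;
                case "bigint": Long.parseLong(value);     return true;
                case "double": Double.parseDouble(value); return true;
                case "string": return true;
                default:       return true; // unknown types pass through in this sketch
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

With this in place, `alter table tab1 add partition (month='June', day='second')` would be rejected at partition-add time, since `'second'` does not parse as `int`.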
[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780794#comment-13780794 ] Hudson commented on HIVE-5272: -- ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/461/]) HIVE-5272 : Column statistics on a invalid column name results in IndexOutOfBoundsException (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1527078) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java * /hive/trunk/ql/src/test/results/clientnegative/columnstats_tbllvl.q.out * /hive/trunk/ql/src/test/results/clientnegative/columnstats_tbllvl_incorrect_column.q.out Column statistics on a invalid column name results in IndexOutOfBoundsException --- Key: HIVE-5272 URL: https://issues.apache.org/jira/browse/HIVE-5272 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: statistics Fix For: 0.13.0 Attachments: HIVE-5272.1.patch.txt, HIVE-5272.2.patch.txt, HIVE-5272.3.patch.txt, junit-noframes.html When invalid column name is specified for column statistics IndexOutOfBoundsException is thrown. {code}hive analyze table customer_staging compute statistics for columns c_first_name, invalid_name, c_customer_sk; FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code} If the invalid column name appears at first or last then INVALID_COLUMN_REFERENCE is thrown at query planning stage. But if the invalid column name appears somewhere in the middle of column lists then IndexOutOfBoundsException is thrown at semantic analysis step. The problem is with getTableColumnType() and getPartitionColumnType() methods. 
The following segment {code}for (int i = 0; i < numCols; i++) { colName = colNames.get(i); for (FieldSchema col : cols) { if (colName.equalsIgnoreCase(col.getName())) { colTypes.add(i, new String(col.getType())); } } }{code} is the reason for it. If an invalid column name appears in the middle of the column list, equalsIgnoreCase() never matches it, so no entry is added at that position while i keeps advancing. Because the list is not pre-filled, the later colTypes.add(i, ...) then refers to an index past the end of the list and throws. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
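Both the failure mode and a fail-fast fix can be reproduced outside Hive. In this sketch (names are mine; `String[]{name, type}` pairs stand in for FieldSchema), buggyResolve mirrors the add(i, ...) pattern and fixedResolve appends after validating each name:

```java
import java.util.*;

public class ColumnTypeLookup {
    // Faithful to the buggy shape: colTypes.add(i, ...) assumes every earlier
    // index was filled, but a skipped (invalid) name leaves a hole, so the
    // next add(i, ...) points past the end of the list and throws.
    static List<String> buggyResolve(List<String> colNames, List<String[]> cols) {
        List<String> colTypes = new ArrayList<>();
        for (int i = 0; i < colNames.size(); i++) {
            for (String[] col : cols) {               // col = {name, type}
                if (colNames.get(i).equalsIgnoreCase(col[0])) {
                    colTypes.add(i, col[1]);
                }
            }
        }
        return colTypes;
    }

    // Fixed shape: look up each name, fail fast on an unknown one, and
    // append rather than insert at a positional index.
    static List<String> fixedResolve(List<String> colNames, List<String[]> cols) {
        List<String> colTypes = new ArrayList<>();
        for (String name : colNames) {
            String type = null;
            for (String[] col : cols) {
                if (name.equalsIgnoreCase(col[0])) {
                    type = col[1];
                }
            }
            if (type == null) {
                throw new IllegalArgumentException("Invalid column reference: " + name);
            }
            colTypes.add(type);
        }
        return colTypes;
    }
}
```

Running buggyResolve over the columns from the report (valid, invalid, valid) reproduces exactly the "Index: 2, Size: 1" situation: index 0 is filled, index 1 is skipped, and the insert at index 2 is out of bounds.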
[jira] [Commented] (HIVE-5231) Remove TestSerDe.jar from data/files
[ https://issues.apache.org/jira/browse/HIVE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780796#comment-13780796 ] Hudson commented on HIVE-5231: -- ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/461/]) HIVE-5231 : Remove TestSerDe.jar from data/files (Hari Sankar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1527004) * /hive/trunk/build-common.xml * /hive/trunk/data/files/TestSerDe.jar * /hive/trunk/ql/src/test/queries/clientnegative/deletejar.q * /hive/trunk/ql/src/test/queries/clientnegative/invalid_columns.q * /hive/trunk/ql/src/test/queries/clientpositive/alter1.q * /hive/trunk/ql/src/test/queries/clientpositive/input16.q * /hive/trunk/ql/src/test/queries/clientpositive/input16_cc.q Remove TestSerDe.jar from data/files Key: HIVE-5231 URL: https://issues.apache.org/jira/browse/HIVE-5231 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Fix For: 0.13.0 Attachments: HIVE-5231.1.patch.txt, HIVE-5231.2.patch.txt, HIVE-5231.3.patch.txt, HIVE-5231.4.patch.txt TestSerDe.jar should be removed from data/files. Even though, TestSerDe.java is present in ql/src/test/org/apache/hadoop/hive/serde2/TestSerDe.java, this is never compiled during build process. The jar file should be created as part of build process for testing purpose rather than using a hard-coded jar file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5379) NoClassDefFoundError is thrown when using lead/lag with kryo serialization
[ https://issues.apache.org/jira/browse/HIVE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780792#comment-13780792 ] Hudson commented on HIVE-5379: -- ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/461/]) HIVE-5379 - NoClassDefFoundError is thrown when using lead/lag with kryo serialization (Reviewed By Ashutosh, Contributed by Navis) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526941) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/LeadLagInfo.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingExprNodeEvaluatorFactory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java NoClassDefFoundError is thrown when using lead/lag with kryo serialization -- Key: HIVE-5379 URL: https://issues.apache.org/jira/browse/HIVE-5379 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.13.0 Attachments: D13155.1.patch {noformat} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:432) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.Child$4.run(Child.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.mapred.Child.main(Child.java:260) Caused by: 
java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 9 more Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/tree/TreeWizard$ContextVisitor at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.getDeclaringClass(Native Method) at java.lang.Class.getEnclosingClass(Class.java:1085) at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1054) at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1110) at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at
[jira] [Commented] (HIVE-5374) hive-schema-0.13.0.postgres.sql doesn't work
[ https://issues.apache.org/jira/browse/HIVE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780795#comment-13780795 ] Hudson commented on HIVE-5374: -- ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/461/]) HIVE-5374 : hive-schema-0.13.0.postgres.sql doesn't work (Kousuke Saruta via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1527007) * /hive/trunk/metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql * /hive/trunk/metastore/scripts/upgrade/postgres/hive-schema-0.12.0.postgres.sql * /hive/trunk/metastore/scripts/upgrade/postgres/hive-schema-0.13.0.postgres.sql * /hive/trunk/metastore/scripts/upgrade/postgres/upgrade-0.11.0-to-0.12.0.postgres.sql * /hive/trunk/metastore/scripts/upgrade/postgres/upgrade-0.12.0-to-0.13.0.postgres.sql hive-schema-0.13.0.postgres.sql doesn't work Key: HIVE-5374 URL: https://issues.apache.org/jira/browse/HIVE-5374 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5374.1.patch, HIVE-5374.patch.1, HIVE-5374.patch.2 hive-schema-0.13.0.postgres.sql doesn't work. In PostgreSQL, if we double-quote a name (column name, table name, etc.), that name is treated case-sensitively. But in the script, a table name and a column name appear unquoted even though they are double-quoted at the definition. {code} CREATE TABLE "VERSION" ( "VER_ID" bigint, "SCHEMA_VERSION" character varying(127) NOT NULL, "COMMENT" character varying(255) NOT NULL, PRIMARY KEY ("VER_ID") ); {code} {code} INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, '0.13.0', 'Hive release version 0.13.0'); {code} Also, the definition above defines the column COMMENT, but I think it should be named VERSION_COMMENT. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
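The PostgreSQL folding rule behind this report — an unquoted identifier folds to lower case, a double-quoted one keeps its case — can be modeled in a few lines. This is illustrative only, not a SQL parser:

```java
public class PgIdentifier {
    // Models PostgreSQL name resolution as described in the report: a
    // double-quoted identifier keeps its exact case, while an unquoted
    // one is folded to lower case before lookup.
    static String resolve(String identifier) {
        if (identifier.length() >= 2
                && identifier.startsWith("\"") && identifier.endsWith("\"")) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toLowerCase();
    }

    // Two spellings name the same object only if they resolve identically.
    static boolean sameObject(String a, String b) {
        return resolve(a).equals(resolve(b));
    }
}
```

This is exactly why the unquoted `INSERT INTO VERSION` cannot find a table created as `"VERSION"`: the INSERT looks for `version` while the CREATE registered `VERSION`.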
[jira] [Commented] (HIVE-5361) PTest2 should allow a different JVM for compilation versus execution
[ https://issues.apache.org/jira/browse/HIVE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780790#comment-13780790 ] Hudson commented on HIVE-5361: -- ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/461/]) HIVE-5361 - PTest2 should allow a different JVM for compilation versus execution (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526925) * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/CleanupPhase.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PTest.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestConfiguration.java * /hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudExecutionContextProvider.java * /hive/trunk/testutils/ptest2/src/main/resources/batch-exec.vm * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestScripts.java * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestScripts.testAlternativeTestJVM.approved.txt * /hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestScripts.testBatch.approved.txt PTest2 should allow a different JVM for compilation versus execution Key: HIVE-5361 URL: https://issues.apache.org/jira/browse/HIVE-5361 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5361.patch NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5357) ReduceSinkDeDuplication optimizer pick the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY
[ https://issues.apache.org/jira/browse/HIVE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780791#comment-13780791 ] Hudson commented on HIVE-5357: -- ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/461/]) HIVE-5357 : ReduceSinkDeDuplication optimizer pick the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY (Chun Chen via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526990) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java * /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q * /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out ReduceSinkDeDuplication optimizer pick the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY --- Key: HIVE-5357 URL: https://issues.apache.org/jira/browse/HIVE-5357 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Chun Chen Assignee: Chun Chen Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-5357.patch Example: {code} select key, count(distinct value) from (select key, value from src group by key, value) t group by key; //result 0 0 NULL 10 10 NULL 100 100 NULL 103 103 NULL 104 104 NULL {code} Obviously the result is wrong. 
When we have a simple group by query with a distinct column {code} explain select count(distinct value) from src group by key; {code} The plan is {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: src TableScan alias: src Select Operator expressions: expr: key type: string expr: value type: string outputColumnNames: key, value Group By Operator aggregations: expr: count(DISTINCT value) bucketGroup: false keys: expr: key type: string expr: value type: string mode: hash outputColumnNames: _col0, _col1, _col2 Reduce Output Operator key expressions: expr: _col0 type: string expr: _col1 type: string sort order: ++ Map-reduce partition columns: expr: _col0 type: string tag: -1 value expressions: expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col1 type: bigint outputColumnNames: _col0 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 {code} The map side GBY also adds the distinct columns (value in this case) to its key columns. When RSDedup optimizes a query involving a GBY with distinct keys, if map-side aggregation is enabled, currently it assigns the map-side GBY's key columns to the reduce-side GBY. So, for the example shown at the beginning, after we generate a plan with a single MR job, the second GBY in the reduce-side uses both key and value as its key columns. The correct key column is key. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
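The wrong-key choice can be simulated without Hive. Grouping by `key` alone and counting distinct values gives one output row per key, while keying by (key, value) — as the deduplicated plan effectively does — splits each key into one group per distinct value, so the distinct counts degenerate. A small stand-in (not Hive code; rows are `String[]{key, value}`):

```java
import java.util.*;

public class DistinctCount {
    // Correct reduce-side behavior: group rows by `key` and count the
    // distinct `value`s within each group.
    static Map<String, Integer> groupByKey(List<String[]> rows) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String[] r : rows) {
            groups.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[1]);
        }
        Map<String, Integer> out = new TreeMap<>();
        groups.forEach((k, v) -> out.put(k, v.size()));
        return out;
    }

    // The buggy plan in effect keys the reduce-side GBY on (key, value),
    // so a key with N distinct values is split into N groups, each of
    // which trivially holds exactly one distinct value.
    static int groupCountUsingBothColumns(List<String[]> rows) {
        Set<String> groups = new HashSet<>();
        for (String[] r : rows) {
            groups.add(r[0] + "\u0000" + r[1]);
        }
        return groups.size();
    }
}
```

For rows (0,a), (0,b), (10,a), grouping by key yields two groups with counts {0=2, 10=1}, while keying on both columns produces three groups and loses the per-key distinct count.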
[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics
[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780799#comment-13780799 ] Hudson commented on HIVE-5324: -- FAILURE: Integrated in Hive-trunk-hadoop2 #462 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/462/]) HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1527149) * /hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java * /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java * /hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java * 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java Extend record writer and ORC reader/writer interfaces to provide statistics --- Key: HIVE-5324 URL: https://issues.apache.org/jira/browse/HIVE-5324 Project: Hive Issue Type: New Feature Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile, statistics Fix For: 0.13.0 Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt The current implementation for computing statistics (number of rows and raw data size) happens for every single row processed. The processOp() method in FileSinkOperator gets raw data size for each row from the serde and accumulates the size in hashmap while counting the number of rows. This accumulated statistics is then published to metastore. In case of ORC, ORC already stores enough statistics internally which can be made use of when publishing the stats to metastore. This will avoid the duplication of work that is happening in the processOp(). 
Also getting the statistics directly from ORC is very cheap (can directly read from the file footer). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5302) PartitionPruner fails on Avro non-partitioned data
[ https://issues.apache.org/jira/browse/HIVE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780880#comment-13780880 ] Edward Capriolo commented on HIVE-5302: --- We do not necessarily need a documented testable case to justify the change; seeing a non-fatal error in the logs is reason enough to apply the patch. {quote} In the matter of query plan bloat, we could isolate related changes to the Avro Serde so long as there's a way to get at table properties during SerDe initialization. That way it could check partition-specific and then fall back to table on its own. I'll worry about that once I find a test case. {quote} I would focus less on finding a test case. We can treat this as an optimization, and take your word that there are cases where the current system does not work. See if you can find another way to solve this without affecting the plan; I think that is a big win for all parties. If it is not possible, there is nothing wrong with committing your original patch in my eyes. PartitionPruner fails on Avro non-partitioned data -- Key: HIVE-5302 URL: https://issues.apache.org/jira/browse/HIVE-5302 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.11.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Blocker Labels: avro Attachments: HIVE-5302.1-branch-0.12.patch.txt, HIVE-5302.1.patch.txt, HIVE-5302.1.patch.txt While updating HIVE-3585 I found a test case that causes the failure in the MetaStoreUtils partition retrieval from back in HIVE-4789. In this case, the failure is triggered when the partition pruner is handed a non-partitioned table and has to construct a pseudo-partition. e.g. {code} INSERT OVERWRITE TABLE partitioned_table PARTITION(col) SELECT id, foo, col FROM non_partitioned_table WHERE col = 9; {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
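The partition-then-table property fallback being discussed could look like the following during SerDe initialization. This is a hypothetical helper, not Hive's actual code path; `avro.schema.url` is used only as an example key:

```java
import java.util.Properties;

public class SerDeProps {
    // Sketch of the fallback: look a property up on the partition first,
    // and only fall back to the table-level properties when the partition
    // does not override it. Either argument may be null.
    static String lookup(Properties partition, Properties table, String key) {
        if (partition != null) {
            String v = partition.getProperty(key);
            if (v != null) {
                return v;
            }
        }
        return table == null ? null : table.getProperty(key);
    }
}
```

With this shape, the SerDe itself can resolve the effective property during initialization instead of requiring the planner to copy table properties into every (pseudo-)partition, which is the query-plan-bloat concern raised above.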
[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics
[ https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780883#comment-13780883 ] Edward Capriolo commented on HIVE-5324: --- Have we considered providing this interface as a property of the object rather than a ctor parameter? People have implemented record writers outside of Hive, and this could be a breaking change for them. Are there plans to produce stats in trunk for anything besides ORC? What type of load will publishing stats put on the metastore? Is this feature disabled via hive.stats.publish? Extend record writer and ORC reader/writer interfaces to provide statistics --- Key: HIVE-5324 URL: https://issues.apache.org/jira/browse/HIVE-5324 Project: Hive Issue Type: New Feature Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile, statistics Fix For: 0.13.0 Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt The current implementation for computing statistics (number of rows and raw data size) happens for every single row processed. The processOp() method in FileSinkOperator gets raw data size for each row from the serde and accumulates the size in hashmap while counting the number of rows. This accumulated statistics is then published to metastore. In case of ORC, ORC already stores enough statistics internally which can be made use of when publishing the stats to metastore. This will avoid the duplication of work that is happening in the processOp(). Also getting the statistics directly from ORC is very cheap (can directly read from the file footer). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
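One non-breaking alternative to a ctor parameter, along the lines Edward suggests, is an optional interface probed with instanceof: writers that can report statistics opt in, while pre-existing third-party writers stay source- and binary-compatible. Names here are illustrative; the committed patch may differ:

```java
public class StatsCapableWriters {
    interface RecordWriter {
        void write(String row);
    }

    // Optional capability interface: only writers that can report stats
    // implement it; existing RecordWriter implementations are untouched.
    interface StatsProvidingRecordWriter extends RecordWriter {
        long getRowCount();
    }

    // A legacy third-party writer, compiled before the interface existed.
    static class PlainWriter implements RecordWriter {
        public void write(String row) { /* no stats support */ }
    }

    // A stats-aware writer, e.g. one backed by a self-describing format.
    static class CountingWriter implements StatsProvidingRecordWriter {
        private long rows;
        public void write(String row) { rows++; }
        public long getRowCount() { return rows; }
    }

    // The operator publishes stats only when the writer opts in; -1 means
    // "fall back to per-row accumulation" in this sketch.
    static long rowCountOrMinusOne(RecordWriter w) {
        return (w instanceof StatsProvidingRecordWriter)
                ? ((StatsProvidingRecordWriter) w).getRowCount() : -1;
    }
}
```

The design trade-off: a ctor parameter forces every implementation to change its signature, while the instanceof probe costs one runtime check and leaves old implementations working with the fallback path.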
[jira] [Updated] (HIVE-5395) Various cleanup in ptf code
[ https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5395: -- Attachment: HIVE-5395.4.patch.txt Various cleanup in ptf code --- Key: HIVE-5395 URL: https://issues.apache.org/jira/browse/HIVE-5395 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, HIVE-5395.3.patch.txt, HIVE-5395.4.patch.txt Some minor issues: Implementing classes on left side of equals Stack used instead of ArrayDeque Classes defined statically inside other files (when they do not need to be) Checkstyle errors like indenting -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5253) Create component to compile and jar dynamic code
[ https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-5253: -- Attachment: HIVE-5253.8.patch.txt Create component to compile and jar dynamic code Key: HIVE-5253 URL: https://issues.apache.org/jira/browse/HIVE-5253 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.8.patch.txt, HIVE-5253.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3925) dependencies of fetch task are not shown by explain
[ https://issues.apache.org/jira/browse/HIVE-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780903#comment-13780903 ] Ashutosh Chauhan commented on HIVE-3925: [~navis] I think we should move forward with this. This is very useful to understand the behavior of query planner. If you refresh your source only patch, I will take care of updating .q files and committing it. dependencies of fetch task are not shown by explain --- Key: HIVE-3925 URL: https://issues.apache.org/jira/browse/HIVE-3925 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Navis Attachments: HIVE-3925.D8577.1.patch, HIVE-3925.D8577.2.patch, HIVE-3925.D8577.3.patch A simple query like: hive explain select * from src order by key; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME src))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_ORDERBY (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL key) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage Stage: Stage-0 Fetch Operator limit: -1 Stage-0 is not a root stage and depends on stage-1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5220) Add option for removing intermediate directory for partition, which is empty
[ https://issues.apache.org/jira/browse/HIVE-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780906#comment-13780906 ] Ashutosh Chauhan commented on HIVE-5220: I see the existing behavior as a bug, which your patch is fixing. I don't see a need for a config variable; this should be the default. Also, I could be mistaken, but I think {{FileSystem}} provides an rmr API. At least {{FsShell}} provides it. It would be better to reuse those APIs instead of writing our own recursive delete. Add option for removing intermediate directory for partition, which is empty Key: HIVE-5220 URL: https://issues.apache.org/jira/browse/HIVE-5220 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5220.D12729.1.patch For a deeply nested partitioned table, intermediate directories are not removed even when removing partitions leaves them empty. {noformat} /deep_part/c=09/d=01 /deep_part/c=09/d=01/e=01 /deep_part/c=09/d=01/e=02 /deep_part/c=09/d=02 /deep_part/c=09/d=02/e=01 /deep_part/c=09/d=02/e=02 {noformat} After removing partition (c='09'), the directories remain like this: {noformat} /deep_part/c=09/d=01 /deep_part/c=09/d=02 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
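The cleanup being requested — after deleting a partition directory, remove any ancestors that became empty, stopping at the table root — can be sketched as follows. This uses java.nio.file locally for illustration; Hive itself would go through the Hadoop FileSystem API:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyDirPruner {
    // After `deleted` has been removed, walk up toward `tableRoot` and
    // delete each ancestor directory that no longer has any entries.
    // Stops at the first non-empty ancestor or at the table root itself.
    static void pruneEmptyAncestors(Path deleted, Path tableRoot) throws IOException {
        Path dir = deleted.getParent();
        while (dir != null && !dir.equals(tableRoot)) {
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
                if (ds.iterator().hasNext()) {
                    return; // sibling entries remain; stop climbing
                }
            }
            Files.delete(dir);
            dir = dir.getParent();
        }
    }
}
```

On the example layout, dropping the only partition under /deep_part/c=09/d=01 would remove d=01 as well, while c=09 survives because d=02 still exists.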
[jira] [Commented] (HIVE-5395) Various cleanup in ptf code
[ https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780907#comment-13780907 ] Hive QA commented on HIVE-5395: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12605658/HIVE-5395.4.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 3179 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/952/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/952/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Various cleanup in ptf code --- Key: HIVE-5395 URL: https://issues.apache.org/jira/browse/HIVE-5395 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, HIVE-5395.3.patch.txt, HIVE-5395.4.patch.txt Some minor issues: Implementing classes on left side of equals Stack used instead of ArrayDeque Classes defined statically inside other files (when they do not need to be) Checkstyle errors like indenting -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5178) Wincompat : QTestUtil changes
[ https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780909#comment-13780909 ] Ashutosh Chauhan commented on HIVE-5178: +1 Wincompat : QTestUtil changes - Key: HIVE-5178 URL: https://issues.apache.org/jira/browse/HIVE-5178 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-5178.2.patch, HIVE-5178.patch Miscellaneous QTestUtil changes are needed to make tests work under windows: a) Aux jars needed to be set up for minimr b) Ignore empty test lines if windows -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4837) Union on void type fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780910#comment-13780910 ] Ashutosh Chauhan commented on HIVE-4837: +1

Union on void type fails with NPE - Key: HIVE-4837 URL: https://issues.apache.org/jira/browse/HIVE-4837 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4837.D11649.1.patch

From mailing list, http://www.mail-archive.com/user@hive.apache.org/msg08683.html

{noformat}
java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
	... 22 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64)
	at java.lang.String.valueOf(String.java:2826)
	at java.lang.StringBuilder.append(StringBuilder.java:115)
	at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
	at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:563)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100)
	... 22 more
{noformat}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HIVE-4501: - Status: Open (was: Patch Available) HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Attachments: HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.trunk.patch org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration. As a workaround disable impersonation by setting hive.server2.enable.doAs to false. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
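For reference, the cache-disabling workaround named above is plain Hadoop configuration; a sketch of the two properties as they would appear in hive-site.xml (or core-site.xml):

```xml
<!-- Workaround from the report: stop FileSystem.get() from caching
     instances so per-user objects are not retained by HiveServer2. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
<property>
  <name>fs.file.impl.disable.cache</name>
  <value>true</value>
</property>
```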
[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HIVE-4501: - Attachment: HIVE-4501.trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HIVE-4501: - Status: Patch Available (was: Open) Here's the trunk patch. The original one wasn't applicable to the trunk version and should have been clearly marked as such. Apologies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780916#comment-13780916 ] Ashutosh Chauhan commented on HIVE-3972: [~navis] HIVE-3562 and HIVE-1402 are in now. In light of that, is this optimization still relevant? Are there any queries that may still see further benefit from this patch even after both of those optimizations are on?

Support using multiple reducer for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch

Queries whose results end with an ORDER BY clause run the final MR job with a single reducer, which can be a bottleneck. For example,

{code}
select value, sum(key) as sum from src group by value order by sum;
{code}

If the number of reducers is reasonable, the multiple result files could be merged into a single sorted stream at the fetcher level.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
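The fetcher-level merge proposed here is a k-way merge of per-reducer output files that are each already sorted by the ORDER BY key, so the fetcher only interleaves streams rather than re-sorting. A small sketch of the idea using Python's `heapq.merge` (the data and names are illustrative, not Hive code):

```python
import heapq

# Each reducer emits its rows already sorted by the ORDER BY key;
# the fetcher only needs to interleave the sorted streams.
reducer_outputs = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e")],
    [(3, "c"), (6, "f"), (8, "h")],
]

merged = list(heapq.merge(*reducer_outputs, key=lambda row: row[0]))
print([k for k, _ in merged])  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

`heapq.merge` is lazy and keeps only one pending row per stream, which matches the streaming behavior a fetcher would want for large result sets.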
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780918#comment-13780918 ] Ashutosh Chauhan commented on HIVE-3959: This is useful work. [~bmadhvani] / [~gangtimliu] Do you guys want to refresh this patch? Update Partition Statistics in Metastore Layer -- Key: HIVE-3959 URL: https://issues.apache.org/jira/browse/HIVE-3959 Project: Hive Issue Type: Improvement Components: Metastore, Statistics Reporter: Bhushan Mandhani Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2 When partitions are created using queries (insert overwrite and insert into), the StatsTask updates all stats. However, when partitions are added through metadata-only operations (either the CLI or direct calls to the Thrift Metastore), no stats are populated even if hive.stats.reliable is set to true. This puts us in a situation where we can't decide whether stats are truly reliable or not. We propose that the fast stats (numFiles and totalSize), which don't require a scan of the data, should always be populated and be completely reliable. For now we are still excluding rowCount and rawDataSize because that would make these operations very expensive. Currently they are quick metadata-only ops. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3930) Generate and publish source jars
[ https://issues.apache.org/jira/browse/HIVE-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13781187#comment-13781187 ] Konstantin Boudnik commented on HIVE-3930: -- Any chance of having this fix in 0.12 (or trunk)? Generate and publish source jars Key: HIVE-3930 URL: https://issues.apache.org/jira/browse/HIVE-3930 Project: Hive Issue Type: Improvement Reporter: Mikhail Bautin Hive should generate and publish source jars to Maven. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-3694) Generate test jars and publish them to Maven
[ https://issues.apache.org/jira/browse/HIVE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HIVE-3694: - Affects Version/s: 0.9.0 Generate test jars and publish them to Maven Key: HIVE-3694 URL: https://issues.apache.org/jira/browse/HIVE-3694 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.9.0 Reporter: Mikhail Bautin Priority: Minor Attachments: D6843.1.patch, D6843.2.patch, D6843.3.patch, D6843.4.patch It should be possible to generate Hive test jars and publish them to Maven so that other projects that rely on Hive or extend it could reuse its test library. -- This message was sent by Atlassian JIRA (v6.1#6144)
Re: did you always have to log in to phabricator
Bump. Any update on this?

On Tue, Sep 17, 2013 at 12:41 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

I do not like this. It is an inconvenience when using a mobile device, but more importantly it does not seem very transparent to our end users. For example, a user browsing JIRA may want to review the code only on review board (not yet attached to the issue); they should not be forced to sign up to help in the process. Would anyone from Facebook care to chime in here? I think we all like Phabricator for the most part. Our docs suggest that Phabricator is our de-facto review system. As an ASF project I do not think requiring a login on some external service even to review a JIRA is correct.

On Tue, Sep 17, 2013 at 12:27 PM, Xuefu Zhang xzh...@cloudera.com wrote:

Yeah. I used to be able to view w/o login, but now I am not.

On Tue, Sep 17, 2013 at 7:27 AM, Brock Noland br...@cloudera.com wrote:

Personally I prefer Review Board.

On Tue, Sep 17, 2013 at 8:31 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

I never remember having to log into Phabricator to view a patch. Has this changed recently? I believe that having to create an external account to view a patch in progress is not something we should be doing.

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

--
Sean