[jira] [Commented] (HIVE-6298) Add config flag to turn off fetching partition stats
[ https://issues.apache.org/jira/browse/HIVE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880798#comment-13880798 ] Gunther Hagleitner commented on HIVE-6298: -- Thanks for taking on adding this particular flag as well, [~prasanth_j]. [~leftylev] thanks for the reminder. I wasn't sure that this particular flag needed to be documented (it's mostly for developers when debugging), but I think you're right. It should get added. Add config flag to turn off fetching partition stats Key: HIVE-6298 URL: https://issues.apache.org/jira/browse/HIVE-6298 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6298.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-5883) Plan is deserialized more often than necessary on Tez (in container reuse case)
[ https://issues.apache.org/jira/browse/HIVE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-5883. -- Resolution: Fixed This had been committed to branch before merge. Missed updating the ticket. My apologies. Plan is deserialized more often than necessary on Tez (in container reuse case) --- Key: HIVE-5883 URL: https://issues.apache.org/jira/browse/HIVE-5883 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-5883.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-5861) Fix exception in multi insert statement on Tez
[ https://issues.apache.org/jira/browse/HIVE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-5861. -- Resolution: Fixed This had been committed to branch before merge. Missed updating the ticket. My apologies. Fix exception in multi insert statement on Tez -- Key: HIVE-5861 URL: https://issues.apache.org/jira/browse/HIVE-5861 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-5861.1.patch Multi insert statements that have multiple group by clauses aren't handled properly in tez. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6218) Stats for row-count not getting updated with Tez insert + dbclass=counter
[ https://issues.apache.org/jira/browse/HIVE-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880805#comment-13880805 ] Gunther Hagleitner commented on HIVE-6218: -- Looked into this. It turns out the problem is that we're running analyze as MR via Tez's yarn runner, which drops the required counters on the floor. The best fix is probably to do the stats computation directly in Tez. I'll get on that. Stats for row-count not getting updated with Tez insert + dbclass=counter - Key: HIVE-6218 URL: https://issues.apache.org/jira/browse/HIVE-6218 Project: Hive Issue Type: Bug Components: Statistics, Tez Affects Versions: 0.13.0 Reporter: Gopal V Assignee: Gunther Hagleitner Priority: Minor When inserting data into Hive with Tez, the row-count stats are not updated when using the counter dbclass. To reproduce, run ANALYZE TABLE store_sales COMPUTE STATISTICS; with tez as the execution engine. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6268) Network resource leak with HiveClientCache when using HCatInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880842#comment-13880842 ] Lefty Leverenz commented on HIVE-6268: -- Where should *hcatalog.hive.client.cache.disabled* be documented, besides the release notes? AFAIK, none of the configuration properties in HCatConstants.java are mentioned in the wiki. HCatConstants itself is only mentioned a couple of times in Notification for a New Partition: https://cwiki.apache.org/confluence/display/Hive/HCatalog+Notification#HCatalogNotification-NotificationforaNewPartition. Network resource leak with HiveClientCache when using HCatInputFormat - Key: HIVE-6268 URL: https://issues.apache.org/jira/browse/HIVE-6268 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-6268.patch HCatInputFormat has a cache feature that allows HCat to cache hive client connections to the metastore, so as to not keep reinstantiating a new hive server every single time. This uses a guava cache of hive clients, which only evicts entries from cache on the next write, or by manually managing the cache. So, in a single threaded case, where we reuse the hive client, the cache works well, but in a massively multithreaded case, where each thread might perform one action, and then is never used, there are no more writes to the cache, and all the clients stay alive, thus keeping ports open. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
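The leak described above comes from a cache that only performs eviction as a side effect of writes. The following is a hypothetical, simplified simulation of that behavior (it is not the actual Guava-backed HiveClientCache; all names and the TTL mechanics are invented for illustration): when many threads each perform one cache write and then stop, the last batch of entries expires but is never swept, so those clients keep their ports open indefinitely.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch (not the actual Guava-backed HiveClientCache): a cache
// whose expired entries are swept only as a side effect of a write. When many
// threads each perform one write and then go away, the final batch of entries
// expires but is never swept, so the "clients" (and their ports) stay open.
public class WriteSweptCacheDemo {
    static final long TTL_TICKS = 5;
    static long now = 0;                                     // simulated clock
    static final Map<String, Long> cache = new HashMap<>();  // key -> creation tick
    static int openClients = 0;

    static void put(String key) {
        // The expiry sweep runs ONLY here, i.e. only when someone writes.
        Iterator<Map.Entry<String, Long>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() > TTL_TICKS) {
                it.remove();
                openClients--;                               // "close" the expired client
            }
        }
        cache.put(key, now);
        openClients++;
    }

    public static int leakedClients(int threads) {
        for (int i = 0; i < threads; i++) {
            put("thread-" + i);                              // each thread: one action, then gone
            now++;                                           // time passes between uses
        }
        now += 1000;                                         // ...then the workload stops
        // Every cached entry is long expired, but with no further writes nothing
        // sweeps them, so each still-cached client keeps its connection open.
        return openClients;
    }

    public static void main(String[] args) {
        System.out.println("clients leaked after workload: " + leakedClients(100));
    }
}
```

This is why the single-threaded reuse case works fine (each new write sweeps the previous entries) while the massively multithreaded one-shot case leaks.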
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: HIVE-5783.patch The updated patch. This fixes incorrect behavior when using HiveInputSplits. Regression tests have been added as a qtest (parquet_partitioned.q). Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 17262: HIVE-6246: Sign(a) UDF is not supported for decimal type
On Jan. 23, 2014, 10:57 p.m., Mohammad Islam wrote: Overall looks good. Is it possible to add a .q test or append to an existing .q test? Actually, a unit test such as the one provided is preferable. An additional .q test only prolongs the build process while not providing much value for this case. On Jan. 23, 2014, 10:57 p.m., Mohammad Islam wrote: ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFSign.java, line 31 https://reviews.apache.org/r/17262/diff/1/?file=436440#file436440line31 Minor issue: is testByte() a good name? A result of CP. Will change it. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17262/#review32677 --- On Jan. 23, 2014, 8:57 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17262/ --- (Updated Jan. 23, 2014, 8:57 p.m.) Review request for hive. Bugs: HIVE-6246 https://issues.apache.org/jira/browse/HIVE-6246 Repository: hive-git Description --- Please see the JIRA description. It's believed that this has never worked. Added a method in the UDFSign class to handle the Decimal data type to make it work. This method returns INT instead of DOUBLE to be in line with other databases. Diffs - common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 729908a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSign.java 0fef283 ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFSign.java PRE-CREATION Diff: https://reviews.apache.org/r/17262/diff/ Testing --- A unit test is added. Thanks, Xuefu Zhang
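The behavior discussed in the review above (sign of a decimal returning an INT of -1, 0, or 1, in line with other databases) can be sketched with the JDK alone; this is illustrative only and is not Hive's actual UDFSign implementation, which operates on Hive's own decimal and writable types:

```java
import java.math.BigDecimal;

// Illustrative sketch only (not Hive's UDFSign): the fix described above makes
// sign() of a decimal return an INT (-1, 0, or 1). java.math.BigDecimal
// already exposes exactly that behavior via signum().
public class DecimalSignDemo {
    public static int sign(BigDecimal d) {
        return d.signum();   // -1 for negative, 0 for zero, 1 for positive
    }

    public static void main(String[] args) {
        System.out.println(sign(new BigDecimal("-12.34")));  // -1
        System.out.println(sign(new BigDecimal("0.00")));    //  0
        System.out.println(sign(new BigDecimal("7")));       //  1
    }
}
```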
Could anyone take a look at the testing issue described in HIVE-6293?
Thanks, Xuefu
[jira] [Commented] (HIVE-6293) Not all minimr tests are executed or reported in precommit test run
[ https://issues.apache.org/jira/browse/HIVE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881078#comment-13881078 ] Brock Noland commented on HIVE-6293: I just created https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2 to explain this. I linked it from the Precommit page and the FAQ. Not all minimr tests are executed or reported in precommit test run --- Key: HIVE-6293 URL: https://issues.apache.org/jira/browse/HIVE-6293 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.13.0 Reporter: Xuefu Zhang It seems that not all q file tests for minimr are executed or reported in the pre-commit test run. Here is an example: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/987/testReport/org.apache.hadoop.hive.cli/TestMinimrCliDriver/ This might be due to ptest, because manually running TestMinimrCliDriver seems to execute all tests. My last run shows 38 tests run, with 8 test failures. This is identified in HIVE-5446. It needs to be fixed to have broader coverage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6293) Not all minimr tests are executed or reported in precommit test run
[ https://issues.apache.org/jira/browse/HIVE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881083#comment-13881083 ] Brock Noland commented on HIVE-6293: If we wanted to improve this, we would either: make a change here https://github.com/apache/hive/blob/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestParser.java#L99 to parse the property which specifies the minimr tests out of the pom.xml, or move the minimr tests to a different directory. I'd be in favor of the file move. Not all minimr tests are executed or reported in precommit test run --- Key: HIVE-6293 URL: https://issues.apache.org/jira/browse/HIVE-6293 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.13.0 Reporter: Xuefu Zhang It seems that not all q file tests for minimr are executed or reported in the pre-commit test run. Here is an example: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/987/testReport/org.apache.hadoop.hive.cli/TestMinimrCliDriver/ This might be due to ptest, because manually running TestMinimrCliDriver seems to execute all tests. My last run shows 38 tests run, with 8 test failures. This is identified in HIVE-5446. It needs to be fixed to have broader coverage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
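The first option above (teaching TestParser to read the minimr test list from a pom.xml property) could look roughly like the sketch below. The property name (minimr.query.files) and the pom fragment are assumptions for illustration only; this is not the actual ptest2 TestParser code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough sketch of parsing a comma-separated list of minimr .q files out of a
// pom.xml property, as proposed above. The property name is an assumption.
public class MinimrPomParser {
    public static List<String> parseMinimrTests(String pomXml) {
        List<String> tests = new ArrayList<>();
        Matcher m = Pattern.compile("<minimr\\.query\\.files>(.*?)</minimr\\.query\\.files>",
                                    Pattern.DOTALL).matcher(pomXml);
        if (m.find()) {
            // The pom value is typically comma-separated, possibly spanning lines.
            for (String f : m.group(1).split(",")) {
                String trimmed = f.trim();
                if (!trimmed.isEmpty()) {
                    tests.add(trimmed);
                }
            }
        }
        return tests;
    }

    public static void main(String[] args) {
        String pom = "<project><properties>"
                   + "<minimr.query.files>bucket4.q,\n  scriptfile1.q,\n  infer_bucket_sort.q"
                   + "</minimr.query.files></properties></project>";
        System.out.println(parseMinimrTests(pom));
    }
}
```

The alternative (moving the minimr tests into their own directory) avoids the parsing entirely, which is presumably why the file move was favored.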
Re: Could anyone take a look at the testing issue described in HIVE-6293?
Done On Fri, Jan 24, 2014 at 9:22 AM, Xuefu Zhang xzh...@cloudera.com wrote: Thanks, Xuefu -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Updated] (HIVE-3872) MAP JOIN for VIEW throws NULL pointer exception error
[ https://issues.apache.org/jira/browse/HIVE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yoni Ben-Meshulam updated HIVE-3872: Summary: MAP JOIN for VIEW throws NULL pointer exception error (was: MAP JOIN for VIEW thorws NULL pointer exception error) MAP JOIN for VIEW throws NULL pointer exception error -- Key: HIVE-3872 URL: https://issues.apache.org/jira/browse/HIVE-3872 Project: Hive Issue Type: Bug Components: Views Reporter: Santosh Achhra Assignee: Navis Priority: Critical Labels: HINTS, MAPJOIN Fix For: 0.11.0 Attachments: HIVE-3872.D7965.1.patch I have created a view as shown below. CREATE VIEW V1 AS select /*+ MAPJOIN(t1) ,MAPJOIN(t2) */ t1.f1, t1.f2, t1.f3, t1.f4, t2.f1, t2.f2, t2.f3 from TABLE1 t1 join TABLE t2 on ( t1.f2= t2.f2 and t1.f3 = t2.f3 and t1.f4 = t2.f4 ) group by t1.f1, t1.f2, t1.f3, t1.f4, t2.f1, t2.f2, t2.f3 The view gets created successfully; however, when I execute the below-mentioned SQL, or any SQL on the view, I get a NullPointerException error hive select count (*) from V1; FAILED: NullPointerException null hive Is there anything wrong with the view creation? Next I created the view without MAPJOIN hints CREATE VIEW V1 AS select t1.f1, t1.f2, t1.f3, t1.f4, t2.f1, t2.f2, t2.f3 from TABLE1 t1 join TABLE t2 on ( t1.f2= t2.f2 and t1.f3 = t2.f3 and t1.f4 = t2.f4 ) group by t1.f1, t1.f2, t1.f3, t1.f4, t2.f1, t2.f2, t2.f3 Before executing the select SQL I execute set hive.auto.convert.join=true; I get the below-mentioned warnings java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... And I see from the log that a total of 5 mapreduce jobs are started; however, when I don't set auto.convert.join to true, I see only 3 mapreduce jobs getting invoked. Total MapReduce jobs = 5 Ended Job = 1116112419, job is filtered out (removed at runtime). Ended Job = -33256989, job is filtered out (removed at runtime).
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881201#comment-13881201 ] Brock Noland commented on HIVE-5843: Great to hear! Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 17061: HIVE-5783 - Native Parquet Support in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17061/ --- (Updated Jan. 24, 2014, 5:55 p.m.) Review request for hive. Changes --- Latest patch rebased. Bugs: HIVE-5783 https://issues.apache.org/jira/browse/HIVE-5783 Repository: hive-git Description --- Adds native parquet support hive Diffs (updated) - data/files/parquet_create.txt PRE-CREATION data/files/parquet_partitioned.txt PRE-CREATION pom.xml 41f5337 ql/pom.xml 7087a4c ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableGroupConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetPrimitiveInspectorFactory.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/writable/BigDecimalWritable.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/writable/BinaryWritable.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 13d0a56 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g f83c15d ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g c15c4b5 ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 4147503 ql/src/java/parquet/hive/DeprecatedParquetInputFormat.java PRE-CREATION ql/src/java/parquet/hive/DeprecatedParquetOutputFormat.java PRE-CREATION ql/src/java/parquet/hive/MapredParquetInputFormat.java PRE-CREATION ql/src/java/parquet/hive/MapredParquetOutputFormat.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestHiveSchemaConverter.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java PRE-CREATION 
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestAbstractParquetMapInspector.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestDeepParquetHiveMapInspector.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveArrayInspector.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestStandardParquetHiveMapInspector.java PRE-CREATION ql/src/test/queries/clientpositive/parquet_create.q PRE-CREATION ql/src/test/queries/clientpositive/parquet_partitioned.q PRE-CREATION ql/src/test/results/clientpositive/parquet_create.q.out PRE-CREATION
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5783: --- Attachment: HIVE-5783.patch Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881208#comment-13881208 ] Brock Noland commented on HIVE-5783: Uploaded the latest patch rebased on trunk. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6287) batchSize computation in Vectorized ORC reader can cause BufferUnderFlowException when PPD is enabled
[ https://issues.apache.org/jira/browse/HIVE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6287: -- Description: nextBatch() method that computes the batchSize is only aware of stripe boundaries. This will not work when predicate pushdown (PPD) in ORC is enabled as PPD works at row group level (stripe contains multiple row groups). By default, row group stride is 1. When PPD is enabled, some row groups may get eliminated. After row group elimination, disk ranges are computed based on the selected row groups. If batchSize computation is not aware of this, it will lead to BufferUnderFlowException (reading beyond disk range). Following scenario should illustrate it more clearly {code} |- STRIPE 1 | |-- row grp 1 --|-- row grp 2 --|-- row grp 3 --|-- row grp 4 --|-- row grp 5 --| |- diskrange 1 -| |- diskrange 2 -| ^ (marker) {code} diskrange1 will have 2 rows and diskrange 2 will have 1 rows. Since nextBatch() was not aware of row groups and hence the diskranges, it tries to read 1024 values from the end of diskrange 1 where it should only read 2 % 1024 = 544 values. This will result in BufferUnderFlowException. To fix this, a marker is placed at the end of each range and batchSize is computed accordingly. {code}batchSize = Math.min(VectorizedRowBatch.DEFAULT_SIZE, (markerPosition - rowInStripe));{code} was: nextBatch() method that computes the batchSize is only aware of stripe boundaries. This will not work when PPD in ORC is enabled as PPD works at row group level (stripe contains multiple row groups). By default, row group stride is 1. When PPD is enabled, some row groups may get eliminated. After row group elimination, disk ranges are computed based on the selected row groups. If batchSize computation is not aware of this, it will lead to BufferUnderFlowException (reading beyond disk range). 
Following scenario should illustrate it more clearly {code} |- STRIPE 1 | |-- row grp 1 --|-- row grp 2 --|-- row grp 3 --|-- row grp 4 --|-- row grp 5 --| |- diskrange 1 -| |- diskrange 2 -| ^ (marker) {code} diskrange1 will have 2 rows and diskrange 2 will have 1 rows. Since nextBatch() was not aware of row groups and hence the diskranges, it tries to read 1024 values from the end of diskrange 1 where it should only read 2 % 1024 = 544 values. This will result in BufferUnderFlowException. To fix this, a marker is placed at the end of each range and batchSize is computed accordingly. {code}batchSize = Math.min(VectorizedRowBatch.DEFAULT_SIZE, (markerPosition - rowInStripe));{code} batchSize computation in Vectorized ORC reader can cause BufferUnderFlowException when PPD is enabled - Key: HIVE-6287 URL: https://issues.apache.org/jira/browse/HIVE-6287 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile, vectorization Attachments: HIVE-6287.1.patch, HIVE-6287.WIP.patch nextBatch() method that computes the batchSize is only aware of stripe boundaries. This will not work when predicate pushdown (PPD) in ORC is enabled as PPD works at row group level (stripe contains multiple row groups). By default, row group stride is 1. When PPD is enabled, some row groups may get eliminated. After row group elimination, disk ranges are computed based on the selected row groups. If batchSize computation is not aware of this, it will lead to BufferUnderFlowException (reading beyond disk range). Following scenario should illustrate it more clearly {code} |- STRIPE 1 | |-- row grp 1 --|-- row grp 2 --|-- row grp 3 --|-- row grp 4 --|-- row grp 5 --| |- diskrange 1 -| |- diskrange 2 -| ^ (marker) {code} diskrange1 will have 2 rows and diskrange 2 will have 1 rows. 
Since nextBatch() was not aware of row groups and hence the diskranges, it tries to read 1024 values from the end of diskrange 1 where it should only read 2 % 1024 = 544 values. This will result in BufferUnderFlowException. To fix this, a
[jira] [Commented] (HIVE-6287) batchSize computation in Vectorized ORC reader can cause BufferUnderFlowException when PPD is enabled
[ https://issues.apache.org/jira/browse/HIVE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881225#comment-13881225 ] Eric Hanson commented on HIVE-6287: --- I think that by PPD you mean predicate pushdown. This was not immediately obvious to me. I edited it into the description. It's a good idea to define acronyms on first use. Thanks! batchSize computation in Vectorized ORC reader can cause BufferUnderFlowException when PPD is enabled - Key: HIVE-6287 URL: https://issues.apache.org/jira/browse/HIVE-6287 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile, vectorization Attachments: HIVE-6287.1.patch, HIVE-6287.WIP.patch nextBatch() method that computes the batchSize is only aware of stripe boundaries. This will not work when predicate pushdown (PPD) in ORC is enabled as PPD works at row group level (stripe contains multiple row groups). By default, row group stride is 1. When PPD is enabled, some row groups may get eliminated. After row group elimination, disk ranges are computed based on the selected row groups. If batchSize computation is not aware of this, it will lead to BufferUnderFlowException (reading beyond disk range). Following scenario should illustrate it more clearly {code} |- STRIPE 1 | |-- row grp 1 --|-- row grp 2 --|-- row grp 3 --|-- row grp 4 --|-- row grp 5 --| |- diskrange 1 -| |- diskrange 2 -| ^ (marker) {code} diskrange1 will have 2 rows and diskrange 2 will have 1 rows. Since nextBatch() was not aware of row groups and hence the diskranges, it tries to read 1024 values from the end of diskrange 1 where it should only read 2 % 1024 = 544 values. This will result in BufferUnderFlowException. To fix this, a marker is placed at the end of each range and batchSize is computed accordingly. 
{code}batchSize = Math.min(VectorizedRowBatch.DEFAULT_SIZE, (markerPosition - rowInStripe));{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
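The batch sizing quoted in the HIVE-6287 description above clamps each read both to the vectorized batch size and to the marker at the end of the current disk range, so the reader never runs past the selected row groups. A minimal sketch of that arithmetic follows; only the min() line mirrors the snippet from the description, and the surrounding loop and values are invented for illustration:

```java
// Minimal sketch of the marker-aware batch sizing from the description above.
// Only nextBatchSize() mirrors the quoted snippet; the loop is illustrative.
public class MarkerBatchSizeDemo {
    static final int DEFAULT_SIZE = 1024;   // VectorizedRowBatch.DEFAULT_SIZE

    public static long nextBatchSize(long markerPosition, long rowInStripe) {
        // Clamp to the marker so we never read past the current disk range.
        return Math.min(DEFAULT_SIZE, markerPosition - rowInStripe);
    }

    public static void main(String[] args) {
        long marker = 5000;      // hypothetical end of the current disk range
        long rowInStripe = 0;
        while (rowInStripe < marker) {
            rowInStripe += nextBatchSize(marker, rowInStripe);
        }
        // Full batches of 1024 until the range boundary, then one short batch.
        System.out.println("rows read: " + rowInStripe);                   // 5000
        System.out.println("last batch: " + nextBatchSize(marker, 4096));  // 904
    }
}
```

Without the marker, the reader would request a full 1024-value batch at the end of the range and read beyond the buffered bytes, which is exactly the BufferUnderflowException scenario described.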
[jira] [Commented] (HIVE-6298) Add config flag to turn off fetching partition stats
[ https://issues.apache.org/jira/browse/HIVE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881244#comment-13881244 ] Sergey Shelukhin commented on HIVE-6298: lgtm Add config flag to turn off fetching partition stats Key: HIVE-6298 URL: https://issues.apache.org/jira/browse/HIVE-6298 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6298.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6157: --- Status: Open (was: Patch Available) Fetching column stats slower than the 101 during rush hour -- Key: HIVE-6157 URL: https://issues.apache.org/jira/browse/HIVE-6157 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Sergey Shelukhin Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, HIVE-6157.prelim.patch hive.stats.fetch.column.stats controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table 4000 partitions, 24 columns) the time spent in semantic analyze goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats... The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is if in addition to that we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. However, the way it stands right now column stats seem unusable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
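The cost described in the HIVE-6157 report above is simple arithmetic: one metastore call per column per partition means the reported setup (4000 partitions, 24 columns) issues 96,000 RPCs during planning. The sketch below just counts calls to make that concrete; the bulk-API shape is a hypothetical illustration, not an existing Hive metastore API:

```java
// Back-of-the-envelope sketch of the planning-time RPC cost described above.
// The call counters are illustrative only; the bulk API is hypothetical.
public class StatsFetchCostDemo {
    public static long perColumnCalls(long partitions, long columns) {
        return partitions * columns;        // one RPC per (partition, column)
    }

    public static long bulkCalls(long partitions, long batchSize) {
        // A hypothetical bulk API fetching all columns for a batch of
        // partitions in one round trip (ceiling division).
        return (partitions + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println("per-column RPCs: " + perColumnCalls(4000, 24)); // 96000
        System.out.println("bulk RPCs:       " + bulkCalls(4000, 1000));    // 4
    }
}
```

That gap (96,000 round trips versus a handful) is consistent with the ~65 seconds of stats fetching reported, and is why the API change is called out as the first thing to fix.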
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6157: --- Attachment: HIVE-6157.03.patch exact same patch, HiveQA won't run Fetching column stats slower than the 101 during rush hour -- Key: HIVE-6157 URL: https://issues.apache.org/jira/browse/HIVE-6157 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Sergey Shelukhin Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, HIVE-6157.prelim.patch hive.stats.fetch.column.stats controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table 4000 partitions, 24 columns) the time spent in semantic analyze goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats... The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is if in addition to that we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. However, the way it stands right now column stats seem unusable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6157: --- Status: Patch Available (was: Open) Fetching column stats slower than the 101 during rush hour -- Key: HIVE-6157 URL: https://issues.apache.org/jira/browse/HIVE-6157 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Sergey Shelukhin Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, HIVE-6157.prelim.patch hive.stats.fetch.column.stats controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table 4000 partitions, 24 columns) the time spent in semantic analyze goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats... The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is if in addition to that we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. However, the way it stands right now column stats seem unusable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
precommit builds backed up
Precommit builds for patches submitted yesterday didn't start. Is there an ETA for when they will start, and do we need to take action to make sure the builds start? Thanks, Eric
Re: precommit builds backed up
Hi, Apache Jenkins (which triggers our precommit builds via https://builds.apache.org/job/PreCommit-Admin/) is having issues. However, Apache Jenkins can be bypassed with the following command: curl "http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/buildWithParameters?token=$TOKEN&ISSUE_NUM=$JIRA" For example: curl "http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/buildWithParameters?token=X&ISSUE_NUM=5783" I can share the token with committers privately. Brock On Fri, Jan 24, 2014 at 12:30 PM, Eric Hanson (BIG DATA) eric.n.han...@microsoft.com wrote: Precommit builds for patches submitted yesterday didn't start. Is there an ETA for when they will start, and do we need to take action to make sure the builds start? Thanks, Eric -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Commented] (HIVE-6248) HCatReader/Writer should hide Hadoop and Hive classes
[ https://issues.apache.org/jira/browse/HIVE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881293#comment-13881293 ] Ashutosh Chauhan commented on HIVE-6248: +1 I think we should also create a wiki doc for it to document usage of this api. In case an overall page for this already exists somewhere, we need to update it with changes. HCatReader/Writer should hide Hadoop and Hive classes - Key: HIVE-6248 URL: https://issues.apache.org/jira/browse/HIVE-6248 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6248.patch HCat's HCatReader and HCatWriter interfaces expose Hadoop classes Configuration and InputSplit, as well as HCatInputSplit. This exposes users to changes over Hadoop or HCatalog versions. It also makes it harder to some day move this interface to use WebHCat, which we'd like to do. The eventual goal is for this interface to not require any other jars (no Hadoop, Hive, etc.) As a first step to this the references to Hadoop and HCat classes in the interface should be hidden. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6287) batchSize computation in Vectorized ORC reader can cause BufferUnderFlowException when PPD is enabled
[ https://issues.apache.org/jira/browse/HIVE-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6287: - Attachment: HIVE-6287.2.patch Reuploading the same patch for HIVE QA to pick up. Thanks [~ehans] for the update to the description. batchSize computation in Vectorized ORC reader can cause BufferUnderFlowException when PPD is enabled - Key: HIVE-6287 URL: https://issues.apache.org/jira/browse/HIVE-6287 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile, vectorization Attachments: HIVE-6287.1.patch, HIVE-6287.2.patch, HIVE-6287.WIP.patch The nextBatch() method that computes the batchSize is only aware of stripe boundaries. This will not work when predicate pushdown (PPD) in ORC is enabled, as PPD works at the row group level (a stripe contains multiple row groups). By default, the row group stride is 10000. When PPD is enabled, some row groups may get eliminated. After row group elimination, disk ranges are computed based on the selected row groups. If the batchSize computation is not aware of this, it will lead to a BufferUnderFlowException (reading beyond the disk range). The following scenario should illustrate it more clearly {code} |- STRIPE 1 | |-- row grp 1 --|-- row grp 2 --|-- row grp 3 --|-- row grp 4 --|-- row grp 5 --| |- diskrange 1 -| |- diskrange 2 -| ^ (marker) {code} diskrange 1 will have 2592 rows and diskrange 2 will have 1 row. Since nextBatch() is not aware of row groups, and hence of the disk ranges, it tries to read 1024 values from the end of diskrange 1 where it should only read 2592 % 1024 = 544 values. This will result in a BufferUnderFlowException. To fix this, a marker is placed at the end of each range and batchSize is computed accordingly. {code}batchSize = Math.min(VectorizedRowBatch.DEFAULT_SIZE, (markerPosition - rowInStripe));{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
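The marker-based batch sizing from the fix can be sketched in isolation. The `Math.min` formula comes straight from the JIRA; the standalone class and method names are hypothetical stand-ins, not the actual RecordReaderImpl code:

```java
// Sketch of the marker-based batch sizing described in HIVE-6287.
public class BatchSize {
    // VectorizedRowBatch.DEFAULT_SIZE is 1024 in Hive.
    static final int DEFAULT_SIZE = 1024;

    // Never read past the marker at the end of the current disk range:
    // clamp the batch to however many rows remain before the marker.
    static long nextBatchSize(long markerPosition, long rowInStripe) {
        return Math.min(DEFAULT_SIZE, markerPosition - rowInStripe);
    }

    public static void main(String[] args) {
        System.out.println(nextBatchSize(10_000, 9_456)); // 544 rows left: partial batch
        System.out.println(nextBatchSize(10_000, 0));     // plenty left: full batch of 1024
    }
}
```

Without the clamp, the reader would always attempt DEFAULT_SIZE rows and run past the end of the eliminated-row-group disk range.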
[jira] [Commented] (HIVE-2558) Timestamp comparisons don't work
[ https://issues.apache.org/jira/browse/HIVE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881331#comment-13881331 ] Jason Dere commented on HIVE-2558: -- This has changed as of HIVE-5204 - the comparison is done as string Timestamp comparisons don't work Key: HIVE-2558 URL: https://issues.apache.org/jira/browse/HIVE-2558 Project: Hive Issue Type: Bug Reporter: Robert Surówka I may be missing something, but: After performing: create table rrt (r timestamp); insert into table rrt select '1970-01-01 00:00:01' from src limit 1; Following queries give undesirable results: select * from rrt where r in ('1970-01-01 00:00:01'); select * from rrt where r in (0); select * from rrt where r = 0; select * from rrt where r = '1970-01-01 00:00:01'; At least for the first two, the reason may be the lack of timestamp in numericTypes Map from FunctionRegistry.java (591) . Yet whether we really want to have a linear hierarchy of primitive types in the end, is another question. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-2558) Timestamp comparisons don't work
[ https://issues.apache.org/jira/browse/HIVE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere resolved HIVE-2558. -- Resolution: Fixed Fix Version/s: 0.12.0 Timestamp comparisons don't work Key: HIVE-2558 URL: https://issues.apache.org/jira/browse/HIVE-2558 Project: Hive Issue Type: Bug Reporter: Robert Surówka Fix For: 0.12.0 I may be missing something, but: After performing: create table rrt (r timestamp); insert into table rrt select '1970-01-01 00:00:01' from src limit 1; Following queries give undesirable results: select * from rrt where r in ('1970-01-01 00:00:01'); select * from rrt where r in (0); select * from rrt where r = 0; select * from rrt where r = '1970-01-01 00:00:01'; At least for the first two, the reason may be the lack of timestamp in numericTypes Map from FunctionRegistry.java (591) . Yet whether we really want to have a linear hierarchy of primitive types in the end, is another question. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: VOTE: Remove phabricator instructions from hive-development guide (wiki), officially only support Apache's review board.
Good call, I made a very basic fix and noted on the Phabricator page that it's no longer used. Brock On Thu, Jan 23, 2014 at 3:32 PM, Lefty Leverenz leftylever...@gmail.com wrote: The wiki still has Phabricator information, with nothing about Apache's review board. How to Contribute: Review Process https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess See Phabricator https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview for instructions. - Use Hadoop's code review checklist http://wiki.apache.org/hadoop/CodeReviewChecklist as a rough guide when doing reviews. - In JIRA, use 'Submit Patch' to get your review request into the queue. - If a committer requests changes, set the issue status to 'Resume Progress', then once you're ready, submit an updated patch with necessary fixes and then request another round of review with 'Submit Patch' again. - Once your patch is accepted, be sure to upload a final version which grants rights to the ASF. Would someone please update this section with the appropriate link to review board instructions? I'm a review board newbie (or wanna-be) but can't even get registration to work so I won't volunteer. Should the link go to http://www.reviewboard.org/docs/manual/1.7/? -- Lefty On Sat, Oct 19, 2013 at 12:10 PM, Prasad Mujumdar pras...@cloudera.com wrote: +1 (non-binding) It's good to use a common review tool, and one that has no third-party dependency. thanks Prasad On Fri, Oct 18, 2013 at 1:59 PM, Ashutosh Chauhan hashut...@apache.org wrote: 0 IMO phabricator interface is better than review board, but the threat of losing comments and patches is also real. Actually, we already lost some in a few cases; ironically, it was on RB. Try to read the very first review request posted on HIVE-1634 Ashutosh On Thu, Oct 17, 2013 at 6:55 PM, Yin Huai huaiyin@gmail.com wrote: +1 On Thu, Oct 17, 2013 at 5:51 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: +1 Thanks, Gunther. 
On Thu, Oct 17, 2013 at 2:18 PM, Owen O'Malley omal...@apache.org wrote: Ed, I didn't remember being unable to see revisions without a login. That is uncool. I'll change my vote to +1. -- Owen On Wed, Oct 16, 2013 at 9:08 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Owen, In your issues: https://issues.apache.org/jira/browse/HIVE-5567 When I click this link: REVISION DETAIL https://reviews.facebook.net/D13479 I am prompted for a password. On Wed, Oct 16, 2013 at 11:16 PM, Owen O'Malley owen.omal...@gmail.com wrote: -0 I like phabricator, but it is a pain to set up. It doesn't require a fb account, but clearly it isn't managed or supported by Apache. -- Owen On Oct 16, 2013, at 17:32, Edward Capriolo edlinuxg...@gmail.com wrote: Our wiki has instructions for posting to phabricator for code reviews. https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview Phabricator now requires an external facebook account to review patches, and we have no technical support contact where phabricator is hosted. It also seems like some of the phabricator features are no longer working. Apache has a review board system many people are already using. https://reviews.apache.org/account/login/?next_page=/dashboard/ This vote is to remove the phabricator instructions from the wiki. The instructions will reference review board and that will be the only system that Hive supports for patch review process. +1 is a vote for removing the phabricator instructions from the wiki. Thank you, Edward
[jira] [Updated] (HIVE-6226) It should be possible to get hadoop, hive, and pig version being used by WebHCat
[ https://issues.apache.org/jira/browse/HIVE-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-6226: - Status: Open (was: Patch Available) It should be possible to get hadoop, hive, and pig version being used by WebHCat Key: HIVE-6226 URL: https://issues.apache.org/jira/browse/HIVE-6226 Project: Hive Issue Type: New Feature Components: WebHCat Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6226.patch Calling /version on WebHCat tells the caller the protocol version, but there is no way to determine the versions of software being run by the applications that WebHCat spawns. I propose to add an end-point: /version/\{module\} where module could be pig, hive, or hadoop. The response will then be: {code} { module : _module_name_, version : _version_string_ } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Status: Patch Available (was: Open) Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6226) It should be possible to get hadoop, hive, and pig version being used by WebHCat
[ https://issues.apache.org/jira/browse/HIVE-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-6226: - Status: Patch Available (was: Open) It should be possible to get hadoop, hive, and pig version being used by WebHCat Key: HIVE-6226 URL: https://issues.apache.org/jira/browse/HIVE-6226 Project: Hive Issue Type: New Feature Components: WebHCat Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6226.2.patch, HIVE-6226.patch Calling /version on WebHCat tells the caller the protocol version, but there is no way to determine the versions of software being run by the applications that WebHCat spawns. I propose to add an end-point: /version/\{module\} where module could be pig, hive, or hadoop. The response will then be: {code} { module : _module_name_, version : _version_string_ } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6226) It should be possible to get hadoop, hive, and pig version being used by WebHCat
[ https://issues.apache.org/jira/browse/HIVE-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-6226: - Attachment: HIVE-6226.2.patch New version of the patch that has three separate URLs, per Eugene's feedback. I don't think the .q test that failed on the last run of the build bot is related, as I didn't make any changes anywhere close to that code. It should be possible to get hadoop, hive, and pig version being used by WebHCat Key: HIVE-6226 URL: https://issues.apache.org/jira/browse/HIVE-6226 Project: Hive Issue Type: New Feature Components: WebHCat Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6226.2.patch, HIVE-6226.patch Calling /version on WebHCat tells the caller the protocol version, but there is no way to determine the versions of software being run by the applications that WebHCat spawns. I propose to add an end-point: /version/\{module\} where module could be pig, hive, or hadoop. The response will then be: {code} { module : _module_name_, version : _version_string_ } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
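A client of the proposed endpoint would get back the one-level JSON shape from the proposal, e.g. {"module": "hive", "version": "0.13.0"}. A hedged sketch of consuming that shape (the field names come from the JIRA text; the naive string scan is only for illustration, a real client would use a JSON library):

```java
// Extract a field value from the simple one-level JSON response the
// JIRA proposes for /version/{module}. Illustration only: assumes
// string-valued fields and no escaping, which a JSON library would handle.
public class VersionResponse {
    static String field(String json, String name) {
        int key = json.indexOf("\"" + name + "\"");
        int valueStart = json.indexOf('"', json.indexOf(':', key) + 1) + 1;
        return json.substring(valueStart, json.indexOf('"', valueStart));
    }

    public static void main(String[] args) {
        String body = "{\"module\": \"hive\", \"version\": \"0.13.0\"}";
        System.out.println(field(body, "module"));
        System.out.println(field(body, "version"));
    }
}
```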
[jira] [Created] (HIVE-6304) Update HCatReader/Writer docs to reflect recent changes
Alan Gates created HIVE-6304: Summary: Update HCatReader/Writer docs to reflect recent changes Key: HIVE-6304 URL: https://issues.apache.org/jira/browse/HIVE-6304 Project: Hive Issue Type: Improvement Components: Documentation Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 HIVE-6248 made changes to the HCatReader and HCatWriter classes. Those changes need to be reflected in the [HCatReader/Writer docs|https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6248) HCatReader/Writer should hide Hadoop and Hive classes
[ https://issues.apache.org/jira/browse/HIVE-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-6248: - Resolution: Fixed Release Note: HCatReader and HCatWriter API changed. See https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter for details. Status: Resolved (was: Patch Available) Patch committed. Thanks Ashutosh for the review. I agree we need to update the docs. Filed HIVE-6304 to track those changes. HCatReader/Writer should hide Hadoop and Hive classes - Key: HIVE-6248 URL: https://issues.apache.org/jira/browse/HIVE-6248 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6248.patch HCat's HCatReader and HCatWriter interfaces expose Hadoop classes Configuration and InputSplit, as well as HCatInputSplit. This exposes users to changes over Hadoop or HCatalog versions. It also makes it harder to some day move this interface to use WebHCat, which we'd like to do. The eventual goal is for this interface to not require any other jars (no Hadoop, Hive, etc.) As a first step to this the references to Hadoop and HCat classes in the interface should be hidden. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881418#comment-13881418 ] Hive QA commented on HIVE-5783: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12625077/HIVE-5783.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 4981 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1003/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1003/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12625077 Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5728) Make ORC InputFormat/OutputFormat usable outside Hive
[ https://issues.apache.org/jira/browse/HIVE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881423#comment-13881423 ] Brock Noland commented on HIVE-5728: This patch is now causing everyone's precommit tests to show failing tests. Agreed with Navis, let's make sure that all commits have precommit tests. Make ORC InputFormat/OutputFormat usable outside Hive - Key: HIVE-5728 URL: https://issues.apache.org/jira/browse/HIVE-5728 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.13.0 Attachments: HIVE-5728-1.patch, HIVE-5728-2.patch, HIVE-5728-3.patch, HIVE-5728-4.patch, HIVE-5728-5.patch, HIVE-5728-6.patch, HIVE-5728-7.patch, HIVE-5728-8.patch ORC InputFormat/OutputFormat is currently not usable outside Hive. There are several issues to solve: 1. Several classes are not public, e.g. OrcStruct. 2. There is no InputFormat/OutputFormat for the new API (some tools such as Pig need the new API). 3. There is no way to push WriteOption to OutputFormat outside Hive. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881424#comment-13881424 ] Brock Noland commented on HIVE-5783: Those tests failed due to HIVE-5728 (which was committed without testing) and will be fixed via HIVE-6302. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6302) annotate_stats_*.q are failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881425#comment-13881425 ] Brock Noland commented on HIVE-6302: FYI, Apache Jenkins is having trouble, so I kicked off the build for this manually. annotate_stats_*.q are failing on trunk --- Key: HIVE-6302 URL: https://issues.apache.org/jira/browse/HIVE-6302 Project: Hive Issue Type: Task Components: Tests Reporter: Navis Assignee: Navis Attachments: HIVE-6302.1.patch.txt I'm checking it out -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881435#comment-13881435 ] Thejas M Nair commented on HIVE-6013: - Harish, The doc seems to suggest that quoted identifiers are supported only for column names. But it seems to work when I try it with a user name in a grant statement. Is that not expected to work? - eg - {code} grant all on x to user `user-qa`; show grant user `user-qa` on table x; {code} Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, HIVE-6013.6.patch, HIVE-6013.7.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6293) Not all minimr tests are executed or reported in precommit test run
[ https://issues.apache.org/jira/browse/HIVE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881444#comment-13881444 ] Thejas M Nair commented on HIVE-6293: - +1 for the file move. Not all minimr tests are executed or reported in precommit test run --- Key: HIVE-6293 URL: https://issues.apache.org/jira/browse/HIVE-6293 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.13.0 Reporter: Xuefu Zhang It seems that not all q file tests for minimr are executed or reported in the pre-commit test run. Here is an example: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/987/testReport/org.apache.hadoop.hive.cli/TestMinimrCliDriver/ This might be due to ptest because manually running test TestMinimrCliDriver seems executing all tests. My last run shows 38 tests run, with 8 test failures. This is identified in HIVE-5446. It needs to be fixed to have broader coverage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6257) Add more unit tests for high-precision Decimal128 arithmetic
[ https://issues.apache.org/jira/browse/HIVE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6257: -- Attachment: hive-6257.03.patch Finished random tests for high-precision decimal add, subtract, multiply, and divide on Decimal128. Add more unit tests for high-precision Decimal128 arithmetic Key: HIVE-6257 URL: https://issues.apache.org/jira/browse/HIVE-6257 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Priority: Minor Attachments: HIVE-6257.02.patch, hive-6257.03.patch Add more unit tests for high-precision Decimal128 arithmetic, with arguments close to or at 38 digit limit. Consider some random stress tests for broader coverage. Coverage is pretty good now (after HIVE-6243) for precision up to about 18. This is to go beyond that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6257) Add more unit tests for high-precision Decimal128 arithmetic
[ https://issues.apache.org/jira/browse/HIVE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6257: -- Attachment: (was: hive-6257.03.patch) Add more unit tests for high-precision Decimal128 arithmetic Key: HIVE-6257 URL: https://issues.apache.org/jira/browse/HIVE-6257 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Priority: Minor Attachments: HIVE-6257.02.patch, HIVE-6257.03.patch Add more unit tests for high-precision Decimal128 arithmetic, with arguments close to or at 38 digit limit. Consider some random stress tests for broader coverage. Coverage is pretty good now (after HIVE-6243) for precision up to about 18. This is to go beyond that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881467#comment-13881467 ] Harish Butani commented on HIVE-6013: - At the language level any identifier can be quoted. The change made was at the Lexer level. Special characters are probably OK in usernames. I didn't want to make this assertion because there may be code in the metadata layer that doesn't like special characters. For example, we know this is an issue for tableNames. If you don't anticipate an issue, we can say that special characters are supported for usernames. Hopefully this can be extended to role/privilege names also. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, HIVE-6013.6.patch, HIVE-6013.7.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6257) Add more unit tests for high-precision Decimal128 arithmetic
[ https://issues.apache.org/jira/browse/HIVE-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6257: -- Attachment: HIVE-6257.03.patch fixed name capitalization Add more unit tests for high-precision Decimal128 arithmetic Key: HIVE-6257 URL: https://issues.apache.org/jira/browse/HIVE-6257 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Priority: Minor Attachments: HIVE-6257.02.patch, HIVE-6257.03.patch Add more unit tests for high-precision Decimal128 arithmetic, with arguments close to or at 38 digit limit. Consider some random stress tests for broader coverage. Coverage is pretty good now (after HIVE-6243) for precision up to about 18. This is to go beyond that. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
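The random stress testing HIVE-6257 describes typically pairs the arithmetic under test with a reference oracle. A hedged sketch of that structure; Decimal128 is a Hive-internal class, so java.math.BigDecimal stands in on both sides here purely to show the shape of the test, and the 126-bit operand size is an assumption chosen to approximate the 38-digit limit:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Random;

// Oracle-style randomized check: random operands near the precision
// limit, with an exact-arithmetic invariant as the correctness oracle.
public class RandomDecimalCheck {
    public static void main(String[] args) {
        Random rnd = new Random(42);  // fixed seed so failures are reproducible
        for (int i = 0; i < 1000; i++) {
            // 126-bit unscaled values give up to ~38 decimal digits.
            BigDecimal a = new BigDecimal(new BigInteger(126, rnd), rnd.nextInt(10));
            BigDecimal b = new BigDecimal(new BigInteger(126, rnd), rnd.nextInt(10));
            // Invariant: exact subtraction must invert exact addition.
            if (a.add(b).subtract(b).compareTo(a) != 0) {
                throw new AssertionError("mismatch at iteration " + i);
            }
        }
        System.out.println("1000 random add/subtract round-trips ok");
    }
}
```

In the real test the same random operands would be fed to both Decimal128 and BigDecimal and the results compared.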
[jira] [Created] (HIVE-6305) test use of quoted identifiers in user/role names
Thejas M Nair created HIVE-6305: --- Summary: test use of quoted identifiers in user/role names Key: HIVE-6305 URL: https://issues.apache.org/jira/browse/HIVE-6305 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Tests need to be added to verify that quoted identifiers can be used with user and role names. For example - {code} grant all on x to user `user-qa`; show grant user `user-qa` on table x; {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881486#comment-13881486 ] Thejas M Nair commented on HIVE-6013: - I have created a jira to test it out with user and role names - HIVE-6305. I think it should work fine. [~leftylev], is the documentation of this jira already part of any wiki page? I had trouble finding it. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, HIVE-6013.6.patch, HIVE-6013.7.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 17005: Vectorized reader for DECIMAL datatype for ORC format.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17005/ --- (Updated Jan. 24, 2014, 10:28 p.m.) Review request for hive and Eric Hanson. Bugs: HIVE-6178 https://issues.apache.org/jira/browse/HIVE-6178 Repository: hive-git Description --- vectorized reader for DECIMAL datatype for ORC format. Diffs (updated) - common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java 3939511 common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java d71ebb3 common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java fbb2aa0 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DecimalColumnVector.java 23564bb ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 0df82b9 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java 0d5b7ff Diff: https://reviews.apache.org/r/17005/diff/ Testing --- Thanks, Jitendra Pandey
[jira] [Updated] (HIVE-6178) Implement vectorized reader for DECIMAL datatype for ORC format.
[ https://issues.apache.org/jira/browse/HIVE-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-6178: --- Attachment: HIVE-6178.2.patch Implement vectorized reader for DECIMAL datatype for ORC format. Key: HIVE-6178 URL: https://issues.apache.org/jira/browse/HIVE-6178 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6178.1.patch, HIVE-6178.2.patch Implement vectorized reader for DECIMAL datatype for ORC format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6178) Implement vectorized reader for DECIMAL datatype for ORC format.
[ https://issues.apache.org/jira/browse/HIVE-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881492#comment-13881492 ] Jitendra Nath Pandey commented on HIVE-6178: Uploaded a new patch addressing a few comments. I have also posted an explanation for handling variable scales. Implement vectorized reader for DECIMAL datatype for ORC format. Key: HIVE-6178 URL: https://issues.apache.org/jira/browse/HIVE-6178 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6178.1.patch, HIVE-6178.2.patch Implement vectorized reader for DECIMAL datatype for ORC format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6178) Implement vectorized reader for DECIMAL datatype for ORC format.
[ https://issues.apache.org/jira/browse/HIVE-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-6178: --- Status: Open (was: Patch Available) Implement vectorized reader for DECIMAL datatype for ORC format. Key: HIVE-6178 URL: https://issues.apache.org/jira/browse/HIVE-6178 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6178.1.patch, HIVE-6178.2.patch Implement vectorized reader for DECIMAL datatype for ORC format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881495#comment-13881495 ] Lefty Leverenz commented on HIVE-6013: -- Not in the wiki yet. I'll bump its priority to the top. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, HIVE-6013.6.patch, HIVE-6013.7.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of the solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
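Since the feature is not on the wiki yet, here is a minimal usage sketch based on the proposal attached to this jira. The flag name hive.support.quoted.identifiers (values none/column) comes from the patch discussion; the table and column names are made up for illustration, so treat the exact syntax as an assumption until the wiki page lands:

```sql
-- Enable 'standard' (column-name) interpretation of backticked identifiers
SET hive.support.quoted.identifiers=column;

-- Column names may now contain otherwise-illegal characters
CREATE TABLE t1 (`x+y` INT, `a?b` STRING);
SELECT `x+y` FROM t1;

-- SET hive.support.quoted.identifiers=none; restores the legacy
-- regular-expression interpretation of backticks in select expressions
```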
[jira] [Commented] (HIVE-6243) error in high-precision division for Decimal128
[ https://issues.apache.org/jira/browse/HIVE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881502#comment-13881502 ] Hive QA commented on HIVE-6243: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12624837/HIVE-6243.02.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 4952 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hive.hcatalog.hbase.TestHiveHBaseStorageHandler.testTableCreateDrop {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1006/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1006/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12624837 error in high-precision division for Decimal128 --- Key: HIVE-6243 URL: https://issues.apache.org/jira/browse/HIVE-6243 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-6243.01.patch, HIVE-6243.02.patch, divide-error.01.patch a = 213474114411690 b = 5062120663 a * b = 1080631725579042037750470 (a * b) / b == actual: 251599050984618 expected: 213474114411690 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
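The expected arithmetic in the bug report is easy to check with Python's arbitrary-precision integers as an oracle (this is a sketch of the correct result only, not of the Decimal128 code itself):

```python
# Reproduce the expected (correct) result of the failing Decimal128 case
# using Python's arbitrary-precision ints as an oracle.
a = 213474114411690
b = 5062120663

product = a * b
assert product == 1080631725579042037750470  # product quoted in this jira

# Dividing the exact product by b must return a exactly;
# Decimal128 instead returned 251599050984618.
quotient = product // b
assert quotient == a
```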
[jira] [Commented] (HIVE-6226) It should be possible to get hadoop, hive, and pig version being used by WebHCat
[ https://issues.apache.org/jira/browse/HIVE-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881521#comment-13881521 ] Eugene Koifman commented on HIVE-6226: -- +1 It should be possible to get hadoop, hive, and pig version being used by WebHCat Key: HIVE-6226 URL: https://issues.apache.org/jira/browse/HIVE-6226 Project: Hive Issue Type: New Feature Components: WebHCat Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6226.2.patch, HIVE-6226.patch Calling /version on WebHCat tells the caller the protocol version, but there is no way to determine the versions of software being run by the applications that WebHCat spawns. I propose to add an end-point: /version/\{module\} where module could be pig, hive, or hadoop. The response will then be: {code} { module : _module_name_, version : _version_string_ } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-4764) support the authentication modes for thrift over http transport for HS2
[ https://issues.apache.org/jira/browse/HIVE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4764: --- Description: This subtask covers support for following functionality for thrift over http transport in hive server2 - Support for LDAP,kerberos, custom authorization modes was: This subtask covers support for following functionality for thrift over http transport in hive server2 - Support for LDAP,kerberos, custom authorization modes - Support for doAs functionality. support the authentication modes for thrift over http transport for HS2 --- Key: HIVE-4764 URL: https://issues.apache.org/jira/browse/HIVE-4764 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.13.0 This subtask covers support for following functionality for thrift over http transport in hive server2 - Support for LDAP,kerberos, custom authorization modes -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6306) HiveServer2 running in http mode should support for doAs functionality
Vaibhav Gumashta created HIVE-6306: -- Summary: HiveServer2 running in http mode should support for doAs functionality Key: HIVE-6306 URL: https://issues.apache.org/jira/browse/HIVE-6306 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Currently http mode does not support doAs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6306) HiveServer2 running in http mode should support for doAs functionality
[ https://issues.apache.org/jira/browse/HIVE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6306: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-4752 HiveServer2 running in http mode should support for doAs functionality -- Key: HIVE-6306 URL: https://issues.apache.org/jira/browse/HIVE-6306 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Currently http mode does not support doAs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-4026) Add HTTP support to HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta resolved HIVE-4026. Resolution: Duplicate Duplicate of HIVE-4752 Add HTTP support to HiveServer2 --- Key: HIVE-4026 URL: https://issues.apache.org/jira/browse/HIVE-4026 Project: Hive Issue Type: New Feature Components: CLI, Server Infrastructure Reporter: Mike Liddell Assignee: Mike Liddell Attachments: HIVE-4026.patch Add HTTP as endpoint option for HiveServer2. This supports environments for which TCP connectivity is inconvenient or impossible. One key scenario is beeline connecting to a HTTPS proxy/gateway which forwards to HS2-HTTP. Due to the proxy/gateway scenario being most secure, support for HS2 HTTPS has not been added. new behavior: new configuration options to use HTTP server mode rather than TCP http mode uses Jetty server/servlets new beeline client URI parsing and HTTP transport behavior. Usage: (1) TCP-mode: beeline !connect jdbc:hive2://server:port/ user password (2) HTTP-mode: beeline !connect jdbc:hive2:http://server:port/path/../ user password (3) via HTTPS proxy: beeline !connect jdbc:hive2:https://server:port/path/../ user password -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HIVE-4752) Add support for hs2 api to use thrift over http
[ https://issues.apache.org/jira/browse/HIVE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta reassigned HIVE-4752: -- Assignee: Vaibhav Gumashta (was: Thejas M Nair) Add support for hs2 api to use thrift over http --- Key: HIVE-4752 URL: https://issues.apache.org/jira/browse/HIVE-4752 Project: Hive Issue Type: New Feature Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.13.0 Hiveserver2 acts as a service on the cluster for external applications. One way to implement access control for services on a hadoop cluster is to have a gateway server authorize service requests before forwarding them to the server. The [knox project | http://wiki.apache.org/incubator/knox] has taken this approach to simplify cluster security management. Other services on the hadoop cluster such as webhdfs and webhcat already use HTTP. Having hiveserver2 also support thrift over http transport will enable securing hiveserver2 as well using the same approach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6234) Implement fast vectorized InputFormat extension for text files
[ https://issues.apache.org/jira/browse/HIVE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6234: -- Attachment: Vectorized Text InputFormat design.pdf Vectorized Text InputFormat design.docx Attaching version 01 of design specification for this feature. Implement fast vectorized InputFormat extension for text files -- Key: HIVE-6234 URL: https://issues.apache.org/jira/browse/HIVE-6234 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: Vectorized Text InputFormat design.docx, Vectorized Text InputFormat design.pdf Implement support for vectorized scan input of text files (plain text with configurable record and field separators). This should work for CSV files, tab delimited files, etc. The goal is to provide high-performance reading of these files using vectorized scans, and also to do it as an extension of existing Hive. Then, if vectorized query is enabled, existing tables based on text files will be able to benefit immediately without the need to use a different input format. After upgrading to new Hive bits that support this, faster, vectorized processing over existing text tables should just work, when vectorization is enabled. Another goal is to go beyond a simple layering of vectorized row batch iterator over the top of the existing row iterator. It should be possible to, say, read a chunk of data into a byte buffer (several thousand or even million rows), and then read data from it into vectorized row batches directly. Object creations should be minimized to save allocation time and GC overhead. If it is possible to save CPU for values like dates and numbers by caching the translation from string to the final data type, that should ideally be implemented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
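The chunk-then-batch idea described above can be illustrated with a small sketch. This is illustrative only; the function name, separators, and batch size are assumptions for the example, not the design's actual API:

```python
# Illustrative sketch: read a chunk of delimited text once, then hand out
# fixed-size columnar batches, avoiding per-row object creation downstream.
def to_column_batches(chunk: bytes, n_cols: int, batch_size: int = 4,
                      record_sep: bytes = b"\n", field_sep: bytes = b","):
    rows = [r.split(field_sep) for r in chunk.split(record_sep) if r]
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # one list per column (a "column vector"), not one object per cell
        yield [[row[c] for row in batch] for c in range(n_cols)]

chunk = b"1,a\n2,b\n3,c\n4,d\n5,e\n"
batches = list(to_column_batches(chunk, n_cols=2, batch_size=4))
# batches[0] holds the first four rows column-wise; batches[1] the remainder
```

A real implementation would also cache string-to-number and string-to-date conversions per batch, as the description suggests, but that is omitted here for brevity.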
[jira] [Commented] (HIVE-6263) Avoid sending input files multiple times on Tez
[ https://issues.apache.org/jira/browse/HIVE-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881554#comment-13881554 ] Hive QA commented on HIVE-6263: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12624934/HIVE-6263.3.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 4949 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1007/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1007/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12624934 Avoid sending input files multiple times on Tez --- Key: HIVE-6263 URL: https://issues.apache.org/jira/browse/HIVE-6263 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6263.1.patch, HIVE-6263.2.patch, HIVE-6263.3.patch Input paths can be reconstructed from the plan. No need to send them in the job conf as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6307) completed field description should be clarified.
Eugene Koifman created HIVE-6307: Summary: completed field description should be clarified. Key: HIVE-6307 URL: https://issues.apache.org/jira/browse/HIVE-6307 Project: Hive Issue Type: Bug Components: Documentation, WebHCat Affects Versions: 0.12.0 Reporter: Eugene Koifman https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Job explains the fields in the JSON document which contains status information for a particular job. completed field is set once the process that the Launcher task launched returns. For example, if user submitted a M/R job via webhcat, completed will be set to done once the hadoop jar command that the Launcher invokes exits. If one is looking for status of the job itself, the fields inside status element should be consulted (e.g. jobComplete or runState) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6307) completed field description should be clarified.
[ https://issues.apache.org/jira/browse/HIVE-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6307: - Description: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Job explains the fields in the JSON document which contains status information for a particular job. completed field is set once the process that the Launcher task launched returns. For example, if user submitted a M/R job via webhcat, completed will be set to done once the hadoop jar command that the Launcher invokes exits. If one is looking for status of the job itself, the fields inside status element should be consulted (e.g. jobComplete or runState). Current doc is not clear and may mislead WebHCat user into thinking completed is a property of the job itself. was: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Job explains the fields in the JSON document which contains status information for a particular job. completed field is set once the process that the Launcher task launched returns. For example, if user submitted a M/R job via webhcat, completed will be set to done once the hadoop jar command that the Launcher invokes exits. If one is looking for status of the job itself, the fields inside status element should be consulted (e.g. jobComplete or runState) completed field description should be clarified. -- Key: HIVE-6307 URL: https://issues.apache.org/jira/browse/HIVE-6307 Project: Hive Issue Type: Bug Components: Documentation, WebHCat Affects Versions: 0.12.0 Reporter: Eugene Koifman https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Job explains the fields in the JSON document which contains status information for a particular job. completed field is set once the process that the Launcher task launched returns. For example, if user submitted a M/R job via webhcat, completed will be set to done once the hadoop jar command that the Launcher invokes exits. 
If one is looking for the status of the job itself, the fields inside the status element should be consulted (e.g. jobComplete or runState). The current doc is not clear and may mislead a WebHCat user into thinking completed is a property of the job itself. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6293) Not all minimr tests are executed or reported in precommit test run
[ https://issues.apache.org/jira/browse/HIVE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881598#comment-13881598 ] Xuefu Zhang commented on HIVE-6293: --- Even if we copy/move the miniMR tests to a different directory, we still need to modify ptest so that it knows where to pick them up, right? What sort of change is required? I have temporarily modified the test property file to be consistent with the pom file w.r.t. miniMR tests. Let's see how many of them are going to fail. Not all minimr tests are executed or reported in precommit test run --- Key: HIVE-6293 URL: https://issues.apache.org/jira/browse/HIVE-6293 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.13.0 Reporter: Xuefu Zhang It seems that not all q file tests for minimr are executed or reported in the pre-commit test run. Here is an example: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/987/testReport/org.apache.hadoop.hive.cli/TestMinimrCliDriver/ This might be due to ptest, because manually running TestMinimrCliDriver seems to execute all tests. My last run shows 38 tests run, with 8 test failures. This is identified in HIVE-5446. It needs to be fixed to have broader coverage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 17005: Vectorized reader for DECIMAL datatype for ORC format.
On Jan. 20, 2014, 6:56 p.m., Eric Hanson wrote: ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 1119 https://reviews.apache.org/r/17005/diff/1/?file=425358#file425358line1119 It seems odd that we're reading from a scaleStream because the scale should be the same for every value in the column. Is this necessary? The orc decimal encoding currently supports arbitrary scale. Although, hive doesn't allow variable scales, the orc format allows it. We should have another decimal encoding in hive optimized for specific precision and scale, and correspondingly we will have to add additional vectorized reader as well for decimal. Since the reader is part of ORC code, I think it should also allow reading variable scales as per the encoding. If that doesn't match the scale in the schema, then we definitely have a data/schema corruption issue. On Jan. 20, 2014, 6:56 p.m., Eric Hanson wrote: ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java, line 1123 https://reviews.apache.org/r/17005/diff/1/?file=425358#file425358line1123 If any scale values are different inside a single DecimalColumnVector, I think that could cause unpredictable or wrong results. Later operations on DecimalColumnVector take the scale from the columnvector sometimes, not each individual object. If the scale in the data is different from the scale assumed in the vectorized reader, we would still have erroneous results. - Jitendra --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17005/#review32299 --- On Jan. 24, 2014, 10:28 p.m., Jitendra Pandey wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17005/ --- (Updated Jan. 24, 2014, 10:28 p.m.) Review request for hive and Eric Hanson. Bugs: HIVE-6178 https://issues.apache.org/jira/browse/HIVE-6178 Repository: hive-git Description --- vectorized reader for DECIMAL datatype for ORC format. 
Diffs - common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java 3939511 common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java d71ebb3 common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java fbb2aa0 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DecimalColumnVector.java 23564bb ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 0df82b9 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java 0d5b7ff Diff: https://reviews.apache.org/r/17005/diff/ Testing --- Thanks, Jitendra Pandey
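The risk Eric raises above — one assumed scale for a whole DecimalColumnVector versus per-value scales in the data — can be seen with a toy example. This is plain Python modeling the hazard, not the actual Decimal128/DecimalColumnVector classes:

```python
# Toy model of the scale-mismatch hazard: a decimal column vector stores
# unscaled integers plus a single scale for the whole vector; a value
# written under a different scale is silently reinterpreted.
def read_value(unscaled: int, vector_scale: int) -> float:
    return unscaled / 10 ** vector_scale

# The value 123.45 stored with its own scale of 2 -> unscaled 12345
unscaled = 12345
assert read_value(unscaled, 2) == 123.45   # scales agree: correct
assert read_value(unscaled, 3) == 12.345   # vector assumes scale 3: off by 10x
```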
[jira] [Commented] (HIVE-6302) annotate_stats_*.q are failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881611#comment-13881611 ] Gunther Hagleitner commented on HIVE-6302: -- [~navis] [~owen.omalley] wanted to keep HiveConf out of ORC to keep the dependencies limited to hadoop core. (That way you can use ORC outside Hive, e.g. in Pig.) It'd be good to have him weigh in at least. I think we should revert HIVE-5728 until we have a fix so that the trunk is healthy again. annotate_stats_*.q are failing on trunk --- Key: HIVE-6302 URL: https://issues.apache.org/jira/browse/HIVE-6302 Project: Hive Issue Type: Task Components: Tests Reporter: Navis Assignee: Navis Attachments: HIVE-6302.1.patch.txt I'm checking it out -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881624#comment-13881624 ] Carl Steinbach commented on HIVE-5783: -- I noticed that this SerDe doesn't support several of Hive's types: binary, timestamp, date, and probably a couple others as well. If there are other known limitations, it would be helpful to list them. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (HIVE-5728) Make ORC InputFormat/OutputFormat usable outside Hive
[ https://issues.apache.org/jira/browse/HIVE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair reopened HIVE-5728: - Make ORC InputFormat/OutputFormat usable outside Hive - Key: HIVE-5728 URL: https://issues.apache.org/jira/browse/HIVE-5728 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.13.0 Attachments: HIVE-5728-1.patch, HIVE-5728-2.patch, HIVE-5728-3.patch, HIVE-5728-4.patch, HIVE-5728-5.patch, HIVE-5728-6.patch, HIVE-5728-7.patch, HIVE-5728-8.patch ORC InputFormat/OutputFormat is currently not usable outside Hive. There are several issues to solve: 1. Several classes are not public, e.g. OrcStruct 2. There is no InputFormat/OutputFormat for the new api (some tools such as Pig need the new api) 3. There is no way to push WriteOption to OutputFormat outside Hive -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5728) Make ORC InputFormat/OutputFormat usable outside Hive
[ https://issues.apache.org/jira/browse/HIVE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881628#comment-13881628 ] Thejas M Nair commented on HIVE-5728: - Reverted this patch and re-opened the jira, as we need a different fix than the one in HIVE-6302 . Make ORC InputFormat/OutputFormat usable outside Hive - Key: HIVE-5728 URL: https://issues.apache.org/jira/browse/HIVE-5728 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.13.0 Attachments: HIVE-5728-1.patch, HIVE-5728-2.patch, HIVE-5728-3.patch, HIVE-5728-4.patch, HIVE-5728-5.patch, HIVE-5728-6.patch, HIVE-5728-7.patch, HIVE-5728-8.patch ORC InputFormat/OutputFormat is currently not usable outside Hive. There are several issues to solve: 1. Several classes are not public, e.g. OrcStruct 2. There is no InputFormat/OutputFormat for the new api (some tools such as Pig need the new api) 3. There is no way to push WriteOption to OutputFormat outside Hive -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
Alexander Behm created HIVE-6308: Summary: COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5181) RetryingRawStore should not retry on logical failures (e.g. from commit)
[ https://issues.apache.org/jira/browse/HIVE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881629#comment-13881629 ] Jayesh commented on HIVE-5181: -- Just wanted to report my finding. As I see this has been committed to hive 0.13: the patch provided here did not resolve the same issue for me on hive-0.12. It looks like the problem lies in the way the DB pool deals with transactions; switching to DBCP (HIVE-4996.patch) fixed it and in some way confirmed that. Thanks Jay RetryingRawStore should not retry on logical failures (e.g. from commit) Key: HIVE-5181 URL: https://issues.apache.org/jira/browse/HIVE-5181 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Prasad Mujumdar Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5181.1.patch, HIVE-5181.3.patch RetryingRawStore retries calls. Some methods (e.g. drop_table_core in HiveMetaStore) explicitly call openTransaction and commitTransaction on RawStore. When the commit call fails due to some real issue, it is retried, and instead of the real cause for failure one gets some bogus exception about the transaction open count. It doesn't make sense to retry logical errors, especially not from commitTransaction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881630#comment-13881630 ] Gunther Hagleitner commented on HIVE-6157: -- [~prasanth_j] do you want to also take a look? This would affect the stats annotation too. Fetching column stats slower than the 101 during rush hour -- Key: HIVE-6157 URL: https://issues.apache.org/jira/browse/HIVE-6157 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Sergey Shelukhin Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, HIVE-6157.prelim.patch hive.stats.fetch.column.stats controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table, 4000 partitions, 24 columns) the time spent in semantic analysis goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats... The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is if in addition to that we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. However, the way it stands right now column stats seem unusable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
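A back-of-envelope calculation from the numbers in the report shows why the per-column API shape dominates: even sub-millisecond metastore calls add up to the observed minute-plus planning time.

```python
partitions, columns = 4000, 24
calls = partitions * columns        # one metastore call per column per partition
observed_seconds = 65               # time spent fetching column stats, per the report
per_call_ms = observed_seconds * 1000 / calls
print(calls)                        # 96000
print(round(per_call_ms, 2))        # ~0.68 ms per call: each call is already fast;
                                    # the sheer count is the problem, hence batching
```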
[jira] [Commented] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction
[ https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881631#comment-13881631 ] Jayesh commented on HIVE-4996: -- Just want to add my experience and testing with this bug: - it looks like BoneCP has a bug. - switching to DBCP resolved this issue for me. unbalanced calls to openTransaction/commitTransaction - Key: HIVE-4996 URL: https://issues.apache.org/jira/browse/HIVE-4996 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0, 0.11.0, 0.12.0 Environment: hiveserver1 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Reporter: wangfeng Priority: Critical Labels: hive, metastore Attachments: hive-4996.path Original Estimate: 504h Remaining Estimate: 504h When we used hiveserver1 based on hive-0.10.0, we found the following Exception thrown: FAILED: Error in metadata: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask -- This message was sent by Atlassian JIRA (v6.1.5#6160)
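For anyone hitting this, the pool switch reported here is a configuration change. A sketch of the relevant hive-site.xml entry, assuming the DataNucleus connection-pool property Hive exposes (property name and accepted values should be verified against your Hive/DataNucleus version):

```xml
<property>
  <name>datanucleus.connectionPoolingType</name>
  <!-- BONECP was the default in this era; DBCP is the workaround reported here. -->
  <value>DBCP</value>
</property>
```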
[jira] [Commented] (HIVE-6293) Not all minimr tests are executed or reported in precommit test run
[ https://issues.apache.org/jira/browse/HIVE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881632#comment-13881632 ] Brock Noland commented on HIVE-6293: Ptest2 has configurable directories, so no change would be required. Not all minimr tests are executed or reported in precommit test run --- Key: HIVE-6293 URL: https://issues.apache.org/jira/browse/HIVE-6293 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.13.0 Reporter: Xuefu Zhang It seems that not all q file tests for minimr are executed or reported in the pre-commit test run. Here is an example: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/987/testReport/org.apache.hadoop.hive.cli/TestMinimrCliDriver/ This might be due to ptest, because manually running TestMinimrCliDriver seems to execute all tests. My last run shows 38 tests run, with 8 test failures. This is identified in HIVE-5446. It needs to be fixed to have broader coverage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 17002: alter table partition column throws NPE in authorization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17002/#review32774 --- Ship it! Ship It! - Thejas Nair On Jan. 24, 2014, 1:02 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17002/ --- (Updated Jan. 24, 2014, 1:02 a.m.) Review request for hive. Bugs: HIVE-6205 https://issues.apache.org/jira/browse/HIVE-6205 Repository: hive-git Description --- alter table alter_coltype partition column (dt int); {noformat} 2014-01-15 15:53:40,364 ERROR ql.Driver (SessionState.java:printError(457)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:599) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:996) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1039) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:932) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:922) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:197) {noformat} Operation for TOK_ALTERTABLE_ALTERPARTS is not defined. 
Diffs - hcatalog/core/src/main/java/org/apache/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzer.java 1d4a9a1 hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzer.java 97973db ql/src/java/org/apache/hadoop/hive/ql/Driver.java 5af1ec6 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e2d555 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g c15c4b5 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 835a654 ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java fe88a50 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveOperationType.java e20b183 ql/src/test/results/clientnegative/alter_partition_coltype_2columns.q.out e1f9a27 ql/src/test/results/clientpositive/alter_partition_coltype.q.out 685bf88 Diff: https://reviews.apache.org/r/17002/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-6205) alter table partition column throws NPE in authorization
[ https://issues.apache.org/jira/browse/HIVE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881644#comment-13881644 ] Thejas M Nair commented on HIVE-6205: - +1 alter table partition column throws NPE in authorization -- Key: HIVE-6205 URL: https://issues.apache.org/jira/browse/HIVE-6205 Project: Hive Issue Type: Bug Components: Authorization Reporter: Navis Assignee: Navis Attachments: HIVE-6205.1.patch.txt, HIVE-6205.2.patch.txt, HIVE-6205.3.patch.txt, HIVE-6205.4.patch.txt, HIVE-6205.5.patch.txt alter table alter_coltype partition column (dt int); {noformat} 2014-01-15 15:53:40,364 ERROR ql.Driver (SessionState.java:printError(457)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:599) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:996) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1039) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:932) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:922) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:197) {noformat} Operation for 
TOK_ALTERTABLE_ALTERPARTS is not defined. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
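The failure mode is a missing entry in the token-to-operation mapping, which the authorization code then dereferences without a guard. A toy model in Python (names are illustrative, not Hive's actual classes; the real patch registers the missing operation rather than merely guarding the lookup):

```python
# Token -> operation registry; TOK_ALTERTABLE_ALTERPARTS was never registered.
OPERATIONS = {"TOK_ALTERTABLE_RENAME": "ALTERTABLE_RENAME"}

def do_authorization(token):
    op = OPERATIONS.get(token)   # None for unregistered tokens
    return op.lower()            # AttributeError here mirrors the Java NPE

def do_authorization_guarded(token):
    op = OPERATIONS.get(token)
    if op is None:
        # Fail with a diagnosable error instead of a bare NPE.
        raise ValueError(f"Operation for {token} is not defined")
    return op.lower()
```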
[jira] [Commented] (HIVE-6298) Add config flag to turn off fetching partition stats
[ https://issues.apache.org/jira/browse/HIVE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881651#comment-13881651 ] Hive QA commented on HIVE-6298: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12624956/HIVE-6298.1.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 4963 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_file_with_header_footer_negative {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1008/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1008/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed 
{noformat} This message is automatically generated. ATTACHMENT ID: 12624956 Add config flag to turn off fetching partition stats Key: HIVE-6298 URL: https://issues.apache.org/jira/browse/HIVE-6298 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6298.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
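For reference, a flag like this would be toggled per session from the Hive CLI. The property name below is an assumption about what this patch introduces (the ticket does not state it; verify against your build):

```sql
-- Hypothetical flag name; disables fetching partition stats during planning:
SET hive.stats.fetch.partition.stats=false;
```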
[jira] [Updated] (HIVE-6261) Update metadata.q.out file for tez (after change to .q file)
[ https://issues.apache.org/jira/browse/HIVE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6261: - Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Update metadata.q.out file for tez (after change to .q file) Key: HIVE-6261 URL: https://issues.apache.org/jira/browse/HIVE-6261 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.13.0 Attachments: HIVE-6261.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6260) Compress plan when sending via RPC (Tez)
[ https://issues.apache.org/jira/browse/HIVE-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6260: - Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the review Vikram! Compress plan when sending via RPC (Tez) Key: HIVE-6260 URL: https://issues.apache.org/jira/browse/HIVE-6260 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.13.0 Attachments: HIVE-6260.1.patch When trying to send the plan via RPC it's helpful to compress the payload. That way larger plans can be sent within the RPC size limit. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
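Hive's implementation is on the Java side, but the principle is easy to sketch: serialized operator plans are highly repetitive, so deflating them buys substantial headroom under an RPC payload limit. A minimal Python sketch using zlib (not Hive's actual codec):

```python
import zlib

def compress_plan(serialized_plan: bytes) -> bytes:
    # Deflate so larger plans fit under the RPC payload size limit.
    return zlib.compress(serialized_plan, 6)

def decompress_plan(payload: bytes) -> bytes:
    return zlib.decompress(payload)

plan = b"<operator name='TableScan'>...</operator>" * 500  # repetitive, like real plans
wire = compress_plan(plan)
assert decompress_plan(wire) == plan   # lossless round trip
assert len(wire) < len(plan) // 10     # repetitive payloads compress dramatically
```

The trade-off is a little CPU on both ends for a much smaller payload, which is cheap compared to failing outright when a plan exceeds the size limit.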
[jira] [Commented] (HIVE-6263) Avoid sending input files multiple times on Tez
[ https://issues.apache.org/jira/browse/HIVE-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881659#comment-13881659 ] Gunther Hagleitner commented on HIVE-6263: -- Test failures are unrelated. Avoid sending input files multiple times on Tez --- Key: HIVE-6263 URL: https://issues.apache.org/jira/browse/HIVE-6263 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6263.1.patch, HIVE-6263.2.patch, HIVE-6263.3.patch Input paths can be reconstructed from the plan. No need to send them in the job conf as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6263) Avoid sending input files multiple times on Tez
[ https://issues.apache.org/jira/browse/HIVE-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881664#comment-13881664 ] Gunther Hagleitner commented on HIVE-6263: -- Committed to trunk. Thanks for the review Vikram! Avoid sending input files multiple times on Tez --- Key: HIVE-6263 URL: https://issues.apache.org/jira/browse/HIVE-6263 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.13.0 Attachments: HIVE-6263.1.patch, HIVE-6263.2.patch, HIVE-6263.3.patch Input paths can be reconstructed from the plan. No need to send them in the job conf as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6263) Avoid sending input files multiple times on Tez
[ https://issues.apache.org/jira/browse/HIVE-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6263: - Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Avoid sending input files multiple times on Tez --- Key: HIVE-6263 URL: https://issues.apache.org/jira/browse/HIVE-6263 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.13.0 Attachments: HIVE-6263.1.patch, HIVE-6263.2.patch, HIVE-6263.3.patch Input paths can be reconstructed from the plan. No need to send them in the job conf as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6183) Implement vectorized type cast from/to decimal(p, s)
[ https://issues.apache.org/jira/browse/HIVE-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881673#comment-13881673 ] Hive QA commented on HIVE-6183: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12624946/HIVE-6183.10.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 4969 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_file_with_header_footer_negative org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1009/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1009/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12624946 Implement vectorized type cast from/to decimal(p, s) Key: HIVE-6183 URL: https://issues.apache.org/jira/browse/HIVE-6183 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-6183.07.patch, HIVE-6183.08.patch, HIVE-6183.09.patch, HIVE-6183.09.patch, HIVE-6183.10.patch Add support for all the supported type casts to/from decimal(p,s) in vectorized mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)