[jira] [Commented] (HIVE-686) add UDF substring_index
[ https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911364#comment-13911364 ] CHEN GEN commented on HIVE-686: --- BUG: this function does not support substring_index(www.test.com, test, -1) = com FIX suggestion: last line: r.set(input.substring(k + delim.length())); add UDF substring_index --- Key: HIVE-686 URL: https://issues.apache.org/jira/browse/HIVE-686 Project: Hive Issue Type: New Feature Components: UDF Reporter: Namit Jain Assignee: Larry Ogrodnek Attachments: HIVE-686.patch, HIVE-686.patch add UDF substring_index; look at http://dev.mysql.com/doc/refman/5.0/en/func-op-summary-ref.html for details -- This message was sent by Atlassian JIRA (v6.1.5#6160)
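For context, the negative-count case in the comment above follows MySQL's SUBSTRING_INDEX semantics, where a negative count returns everything after the |count|-th occurrence of the delimiter counted from the right; that is why the suggested fix offsets the result by k + delim.length(). Below is a standalone sketch of those semantics for illustration only, not the Hive UDF's actual implementation.

{code}
// Standalone sketch of MySQL-style substring_index semantics (illustration only).
public class SubstringIndexSketch {
  public static String substringIndex(String input, String delim, int count) {
    if (input == null || delim == null || delim.isEmpty() || count == 0) {
      return "";
    }
    if (count > 0) {
      // Walk forward to the count-th occurrence; keep everything before it.
      int idx = -1;
      for (int i = 0; i < count; i++) {
        idx = input.indexOf(delim, idx + 1);
        if (idx < 0) {
          return input; // fewer than count occurrences: return the whole string
        }
      }
      return input.substring(0, idx);
    }
    // Negative count: walk backward |count| occurrences and keep everything
    // after that delimiter, i.e. substring(k + delim.length()) as suggested above.
    int idx = input.length();
    for (int i = 0; i < -count; i++) {
      idx = input.lastIndexOf(delim, idx - 1);
      if (idx < 0) {
        return input;
      }
    }
    return input.substring(idx + delim.length());
  }

  public static void main(String[] args) {
    System.out.println(substringIndex("www.test.com", ".", -1)); // com
    System.out.println(substringIndex("www.test.com", ".", 2));  // www.test
  }
}
{code}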
[jira] [Updated] (HIVE-6500) Stats collection via filesystem
[ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6500: --- Attachment: HIVE-6500.patch Stats collection via filesystem --- Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6500.patch Recently, support for stats gathering via counter was [added | https://issues.apache.org/jira/browse/HIVE-4632] Although, its useful it has following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] * [Number of distinct counter groups are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] * [Number of distinct counters are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although, these limits are configurable, but setting them to higher value implies increased memory load on AM and job history server. Now, whether these limits makes sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680] it is desirable that Hive doesn't make use of counters features of framework so that it we can evolve this feature without relying on support from framework. Filesystem based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6500) Stats collection via filesystem
[ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911365#comment-13911365 ] Ashutosh Chauhan commented on HIVE-6500: In FS-based stats collection, the idea is that each task will write the stats it has collected to a file on the FS, which will then be aggregated after the job has finished. Stats collection via filesystem --- Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6500.patch Recently, support for stats gathering via counter was [added | https://issues.apache.org/jira/browse/HIVE-4632] Although, its useful it has following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] * [Number of distinct counter groups are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] * [Number of distinct counters are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although, these limits are configurable, but setting them to higher value implies increased memory load on AM and job history server. Now, whether these limits makes sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680] it is desirable that Hive doesn't make use of counters features of framework so that it we can evolve this feature without relying on support from framework. Filesystem based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
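A rough sketch of the scheme described in the comment above: each task publishes its stats to its own file under a job-scoped directory, and a post-job step lists the directory and sums the values. Class names and the line-oriented file format here are illustrative only, not the ones used by the actual HIVE-6500 patch.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsStatsSketch {

  /** Called from a task: write "statKey=value" lines into a per-task file. */
  public static void publish(Configuration conf, Path statsDir, String taskId,
                             Map<String, Long> stats) throws Exception {
    FileSystem fs = statsDir.getFileSystem(conf);
    try (PrintWriter out = new PrintWriter(fs.create(new Path(statsDir, taskId), true))) {
      for (Map.Entry<String, Long> e : stats.entrySet()) {
        out.println(e.getKey() + "=" + e.getValue());
      }
    }
  }

  /** Called after the job finishes: sum the values across all per-task files. */
  public static Map<String, Long> aggregate(Configuration conf, Path statsDir)
      throws Exception {
    FileSystem fs = statsDir.getFileSystem(conf);
    Map<String, Long> totals = new HashMap<>();
    for (FileStatus f : fs.listStatus(statsDir)) {
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(fs.open(f.getPath()), StandardCharsets.UTF_8))) {
        String line;
        while ((line = in.readLine()) != null) {
          int eq = line.indexOf('=');
          totals.merge(line.substring(0, eq),
              Long.parseLong(line.substring(eq + 1)), Long::sum);
        }
      }
    }
    return totals;
  }
}
{code}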
[jira] [Updated] (HIVE-6500) Stats collection via filesystem
[ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6500: --- Attachment: HIVE-6500.patch Stats collection via filesystem --- Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6500.patch Recently, support for stats gathering via counter was [added | https://issues.apache.org/jira/browse/HIVE-4632] Although, its useful it has following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] * [Number of distinct counter groups are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] * [Number of distinct counters are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although, these limits are configurable, but setting them to higher value implies increased memory load on AM and job history server. Now, whether these limits makes sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680] it is desirable that Hive doesn't make use of counters features of framework so that it we can evolve this feature without relying on support from framework. Filesystem based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6500) Stats collection via filesystem
[ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6500: --- Attachment: (was: HIVE-6500.patch) Stats collection via filesystem --- Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6500.patch Recently, support for stats gathering via counter was [added | https://issues.apache.org/jira/browse/HIVE-4632] Although, its useful it has following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] * [Number of distinct counter groups are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] * [Number of distinct counters are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although, these limits are configurable, but setting them to higher value implies increased memory load on AM and job history server. Now, whether these limits makes sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680] it is desirable that Hive doesn't make use of counters features of framework so that it we can evolve this feature without relying on support from framework. Filesystem based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6500) Stats collection via filesystem
[ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6500: --- Status: Patch Available (was: Open) Stats collection via filesystem --- Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6500.patch Recently, support for stats gathering via counter was [added | https://issues.apache.org/jira/browse/HIVE-4632] Although, its useful it has following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] * [Number of distinct counter groups are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] * [Number of distinct counters are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although, these limits are configurable, but setting them to higher value implies increased memory load on AM and job history server. Now, whether these limits makes sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680] it is desirable that Hive doesn't make use of counters features of framework so that it we can evolve this feature without relying on support from framework. Filesystem based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Review Request 18459: FS based stats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18459/ --- Review request for hive and Navis Ryu. Bugs: HIVE-6500 https://issues.apache.org/jira/browse/HIVE-6500 Repository: hive Description --- FS based stats collection. Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 1571554 trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregatorTez.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsPublisher.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsCollectionTaskIndependent.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsAggregator.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java PRE-CREATION trunk/ql/src/test/queries/clientpositive/statsfs.q PRE-CREATION trunk/ql/src/test/results/clientpositive/statsfs.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18459/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-6380) Specify jars/files when creating permanent UDFs
[ https://issues.apache.org/jira/browse/HIVE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911387#comment-13911387 ] Lefty Leverenz commented on HIVE-6380: -- Well done, Jason. I tinkered with your wiki fixes by adding a few links. Specify jars/files when creating permanent UDFs --- Key: HIVE-6380 URL: https://issues.apache.org/jira/browse/HIVE-6380 Project: Hive Issue Type: Sub-task Components: UDF Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: HIVE-6380.1.patch, HIVE-6380.2.patch, HIVE-6380.3.patch, HIVE-6380.4.patch Need a way for a permanent UDF to reference jars/files. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: (was: HIVE-5687.v2.patch) Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687.patch, HIVE-5687.v2.patch Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.v2.patch updating patch v2 with minor tweaks Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687.patch, HIVE-5687.v2.patch Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5380) Non-default OI constructors should be supported for backwards compatibility
[ https://issues.apache.org/jira/browse/HIVE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911395#comment-13911395 ] Lefty Leverenz commented on HIVE-5380: -- [~lars_francke] identified the source of the exclamation point prefixes in the Hive SerDe Object Inspector sections of the Developer Guide: old MoinMoin syntax. (See his comment on the SerDes doc: https://cwiki.apache.org/confluence/display/Hive/SerDe?focusedCommentId=39620650#comment-39620650.) So I'm taking them out. Non-default OI constructors should be supported for backwards compatibility --- Key: HIVE-5380 URL: https://issues.apache.org/jira/browse/HIVE-5380 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5380.patch, HIVE-5380.patch, HIVE-5380.patch In HIVE-5263 we started serializing OI's when cloning the plan. This was a great boost in speed for many queries. In the future we'd like to stop copying the OI's, perhaps in HIVE-4396. Until then Custom Serdes will not work on trunk. This is a fix to allow custom serdes such as the Hive JSon Serde work until we address the fact we don't want to have to copy the OI's. Since this is modifying the byte code, we should recommend that the no-arg constructor be added. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
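Some context on the no-arg constructor recommendation above: plan cloning serializes object inspectors with Kryo, and Kryo's default object creation goes through a no-argument constructor, so a custom OI that only defines an args constructor can fail to deserialize. A minimal sketch of the round trip, using an illustrative class rather than a real Hive object inspector:

{code}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

public class OiCloneSketch {
  public static class MyCustomOi {
    String typeName;
    public MyCustomOi() { }                        // needed by Kryo's default instantiation
    public MyCustomOi(String typeName) { this.typeName = typeName; }
  }

  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    kryo.register(MyCustomOi.class);
    Output out = new Output(1024, -1);
    kryo.writeObject(out, new MyCustomOi("struct<a:int>"));
    MyCustomOi copy = kryo.readObject(new Input(out.toBytes()), MyCustomOi.class);
    System.out.println(copy.typeName);             // struct<a:int>
  }
}
{code}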
[jira] [Updated] (HIVE-3938) Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set.
[ https://issues.apache.org/jira/browse/HIVE-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3938: --- Attachment: HIVE-3938.trunk.2.patch I've rebased this patch for the latest trunk (0.13-ish). I've had to remove the support for multi-table add-partitions, because the metastore now seems to check that all partitions in add_partitions_core() actually belong to the same table. I've modified the TestNotificationListener accordingly. Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set. - Key: HIVE-3938 URL: https://issues.apache.org/jira/browse/HIVE-3938 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0, 0.11.0, 0.12.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-3938.trunk.2.patch, Hive-3938-Support_for_Multi-table-insert.patch HiveMetaStore::add_partitions() currently adds all partitions specified in one call using a single meta-store transaction. This acts correctly. However, there's one AddPartitionEvent created per partition specified. Ideally, the set of partitions added atomically can be communicated using a single AddPartitionEvent, such that they are consumed together. I'll post a patch that does this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-3938) Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set.
[ https://issues.apache.org/jira/browse/HIVE-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3938: --- Status: Patch Available (was: Open) Hive MetaStore should send a single AddPartitionEvent for atomically added partition-set. - Key: HIVE-3938 URL: https://issues.apache.org/jira/browse/HIVE-3938 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0, 0.11.0, 0.10.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-3938.trunk.2.patch, Hive-3938-Support_for_Multi-table-insert.patch HiveMetaStore::add_partitions() currently adds all partitions specified in one call using a single meta-store transaction. This acts correctly. However, there's one AddPartitionEvent created per partition specified. Ideally, the set of partitions added atomically can be communicated using a single AddPartitionEvent, such that they are consumed together. I'll post a patch that does this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911420#comment-13911420 ] Hive QA commented on HIVE-6147: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12630595/HIVE-6147.3.patch.txt {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 5186 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers org.apache.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt org.apache.hcatalog.pig.TestHCatLoader.testGetInputBytes org.apache.hcatalog.pig.TestHCatLoader.testProjectionsBasic org.apache.hcatalog.pig.TestHCatLoader.testReadDataBasic org.apache.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic org.apache.hcatalog.pig.TestHCatLoader.testSchemaLoadComplex org.apache.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData org.apache.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema org.apache.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag org.apache.hcatalog.pig.TestHCatStorer.testBagNStruct org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt org.apache.hive.hcatalog.pig.TestHCatLoader.testGetInputBytes org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic org.apache.hive.hcatalog.pig.TestHCatLoader.testSchemaLoadBasic org.apache.hive.hcatalog.pig.TestHCatLoader.testSchemaLoadComplex org.apache.hive.hcatalog.pig.TestHCatLoader.testSchemaLoadPrimitiveTypes org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag org.apache.hive.hcatalog.pig.TestHCatStorer.testBagNStruct org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1485/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1485/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12630595 Support avro data stored in HBase columns - Key: HIVE-6147 URL: https://issues.apache.org/jira/browse/HIVE-6147 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt Presently, the HBase Hive integration supports querying only primitive data types in columns. It would be nice to be able to store and query Avro objects in HBase columns by making them visible as structs to Hive. This will allow Hive to perform ad hoc analysis of HBase data which can be deeply structured. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6501) Change hadoop dependency on tez branch
[ https://issues.apache.org/jira/browse/HIVE-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6501: - Attachment: HIVE-6501.1.patch Change hadoop dependency on tez branch -- Key: HIVE-6501 URL: https://issues.apache.org/jira/browse/HIVE-6501 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6501.1.patch Now that 2.3.0 is out, we no longer need to pull the snapshot. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6501) Change hadoop dependency on tez branch
Gunther Hagleitner created HIVE-6501: Summary: Change hadoop dependency on tez branch Key: HIVE-6501 URL: https://issues.apache.org/jira/browse/HIVE-6501 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6501.1.patch Now that 2.3.0 is out, we no longer need to pull the snapshot. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911465#comment-13911465 ] Hive QA commented on HIVE-6329: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12630599/HIVE-6329.7.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5181 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1486/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1486/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12630599 Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt We have been receiving some requirements on encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.indices'='2,3', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
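For reference, the encoded values in the example above are plain Base64 of the original column text, i.e. reversible encoding rather than encryption (the full implementation is tracked in HIVE-5207 per the description). A quick JDK-only check, unrelated to Hive's own classes:

{code}
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64ColumnCheck {
  public static void main(String[] args) {
    // Encoding the phone column value reproduces the stored string.
    System.out.println(Base64.getEncoder()
        .encodeToString("010-0000-0000".getBytes(StandardCharsets.UTF_8)));
    // MDEwLTAwMDAtMDAwMA==

    // Decoding the stored address column value recovers the original text.
    System.out.println(new String(
        Base64.getDecoder().decode("U2VvdWwsIFNlb2Nobw=="), StandardCharsets.UTF_8));
    // Seoul, Seocho
  }
}
{code}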
Re: Review Request 18185: Support Kerberos HTTP authentication for HiveServer2 running in http mode
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18185/ --- (Updated Feb. 25, 2014, 12:23 p.m.) Review request for hive and Thejas Nair. Changes --- Review feedback + cleanup + simpler kerberos negotiation + kerberos doAs. Bugs: HIVE-4764 https://issues.apache.org/jira/browse/HIVE-4764 Repository: hive-git Description --- Support Kerberos HTTP authentication for HiveServer2 running in http mode Diffs (updated) - jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 4102d7a jdbc/src/java/org/apache/hive/jdbc/HttpBasicAuthInterceptor.java 66eba1b jdbc/src/java/org/apache/hive/jdbc/HttpKerberosRequestInterceptor.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java d8ba3aa service/src/java/org/apache/hive/service/auth/HttpAuthHelper.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HttpAuthenticationException.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HttpCLIServiceProcessor.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HttpCLIServiceUGIProcessor.java PRE-CREATION service/src/java/org/apache/hive/service/cli/CLIService.java 2b1e712 service/src/java/org/apache/hive/service/cli/session/SessionManager.java bfe0e7b service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 6fbc847 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 26bda5a service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java a6ff6ce service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java e77f043 shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java dc89de1 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 9e9a60d shims/common/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge.java 03f4e51 Diff: https://reviews.apache.org/r/18185/diff/ Testing --- Thanks, Vaibhav Gumashta
[jira] [Updated] (HIVE-4764) Support Kerberos HTTP authentication for HiveServer2 running in http mode
[ https://issues.apache.org/jira/browse/HIVE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4764: --- Attachment: HIVE-4764.2.patch Support Kerberos HTTP authentication for HiveServer2 running in http mode - Key: HIVE-4764 URL: https://issues.apache.org/jira/browse/HIVE-4764 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-4764.1.patch, HIVE-4764.2.patch Support Kerberos authentication for HiveServer2 running in http mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-4764) Support Kerberos HTTP authentication for HiveServer2 running in http mode
[ https://issues.apache.org/jira/browse/HIVE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4764: --- Status: Patch Available (was: Open) Support Kerberos HTTP authentication for HiveServer2 running in http mode - Key: HIVE-4764 URL: https://issues.apache.org/jira/browse/HIVE-4764 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-4764.1.patch, HIVE-4764.2.patch Support Kerberos authentication for HiveServer2 running in http mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.
[ https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911518#comment-13911518 ] Vaibhav Gumashta commented on HIVE-6486: [~shivshi]: Thanks a lot for the patch! Can you create a review link on the apache review board as well (https://reviews.apache.org/). It's very easy to browse through the code changes there. Let me know if you need any help in doing that. Thanks! Support secure Subject.doAs() in HiveServer2 JDBC client. - Key: HIVE-6486 URL: https://issues.apache.org/jira/browse/HIVE-6486 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.11.0, 0.12.0 Reporter: Shivaraju Gowda Attachments: Hive_011_Support-Subject_doAS.patch, TestHive_SujectDoAs.java HIVE-5155 addresses the problem of kerberos authentication in multi-user middleware server using proxy user. In this mode the principal used by the middle ware server has privileges to impersonate selected users in Hive/Hadoop. This enhancement is to support Subject.doAs() authentication in Hive JDBC layer so that the end users Kerberos Subject is passed through in the middle ware server. With this improvement there won't be any additional setup in the server to grant proxy privileges to some users and there won't be need to specify a proxy user in the JDBC client. This version should also be more secure since it won't require principals with the privileges to impersonate other users in Hive/Hadoop setup. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
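On the middleware side, the call pattern being proposed looks roughly like the sketch below: the servlet thread already holds the end user's Kerberos Subject from its own JAAS/SPNEGO login, and the JDBC connection is opened inside Subject.doAs() so the user's own credentials flow through instead of a proxy-user configuration. This is a hedged illustration only; the exact JDBC URL flag that enables this behaviour is whatever the attached patch defines.

{code}
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import javax.security.auth.Subject;

public class DoAsJdbcSketch {
  /** Open a HiveServer2 JDBC connection using the end user's own Subject. */
  public static Connection connectAs(Subject endUserSubject, String jdbcUrl)
      throws Exception {
    return Subject.doAs(endUserSubject,
        (PrivilegedExceptionAction<Connection>) () -> DriverManager.getConnection(jdbcUrl));
  }
}
{code}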
[jira] [Commented] (HIVE-6484) HiveServer2 doAs should be session aware both for secured and unsecured session implementation.
[ https://issues.apache.org/jira/browse/HIVE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911522#comment-13911522 ] Vaibhav Gumashta commented on HIVE-6484: [~navis]: Thanks for linking! That jira had slipped out of my radar. HiveServer2 doAs should be session aware both for secured and unsecured session implementation. --- Key: HIVE-6484 URL: https://issues.apache.org/jira/browse/HIVE-6484 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Currently in unsecured case, the doAs is performed by decorating TProcessor.process method. This has been causing cleanup issues as we end up creating a new clientUgi for each request rather than for each session. This also cleans up the code. [~thejas] Probably you can add more if you've seen other issues related to this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6418) MapJoinRowContainer has large memory overhead in typical cases
[ https://issues.apache.org/jira/browse/HIVE-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911542#comment-13911542 ] Hive QA commented on HIVE-6418: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12630604/HIVE-6418.05.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5170 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hive.service.cli.thrift.TestThriftHttpCLIService.org.apache.hive.service.cli.thrift.TestThriftHttpCLIService {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1488/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1488/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12630604 MapJoinRowContainer has large memory overhead in typical cases -- Key: HIVE-6418 URL: https://issues.apache.org/jira/browse/HIVE-6418 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, HIVE-6418.03.patch, HIVE-6418.04.patch, HIVE-6418.04.patch, HIVE-6418.05.patch, HIVE-6418.WIP.patch, HIVE-6418.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6412) SMB join on Decimal columns causes cast exception in JoinUtil.computeKeys
[ https://issues.apache.org/jira/browse/HIVE-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911568#comment-13911568 ] Remus Rusanu commented on HIVE-6412: I concur, this seems to no longer repro in current trunk. SMB join on Decimal columns causes cast exception in JoinUtil.computeKeys - Key: HIVE-6412 URL: https://issues.apache.org/jira/browse/HIVE-6412 Project: Hive Issue Type: Bug Reporter: Remus Rusanu Assignee: Xuefu Zhang Priority: Critical {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveDecimalWritable cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveWritableObject(JavaHiveDecimalObjectInspector.java:49) at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveWritableObject(JavaHiveDecimalObjectInspector.java:27) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:281) at org.apache.hadoop.hive.ql.exec.JoinUtil.computeKeys(JoinUtil.java:143) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.next(SMBMapJoinOperator.java:809) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:771) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) {code} Repro: {code} create table vsmb_bucket_1(key decimal(9,0), value decimal(38,10)) CLUSTERED BY (key) SORTED BY (key) INTO 1 BUCKETS STORED AS ORC; create table vsmb_bucket_2(key decimal(19,3), value decimal(28,0)) CLUSTERED BY (key) SORTED BY (key) INTO 1 BUCKETS STORED AS ORC; insert into table vsmb_bucket_1 select cast(cint as decimal(9,0)) as key, cast(cfloat as decimal(38,10)) as value from alltypesorc limit 2; insert into table vsmb_bucket_2 select cast(cint as decimal(19,3)) as key, cast(cfloat as decimal(28,0)) as value from alltypesorc limit 2; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.auto.convert.sortmerge.join.noconditionaltask = true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; explain select /*+MAPJOIN(a)*/ * from vsmb_bucket_1 a join vsmb_bucket_2 b on a.key = b.key; select /*+MAPJOIN(a)*/ * from vsmb_bucket_1 a join vsmb_bucket_2 b on a.key = b.key; {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6502) Add query for vectorized_decimal_smbjoin
Remus Rusanu created HIVE-6502: -- Summary: Add query for vectorized_decimal_smbjoin Key: HIVE-6502 URL: https://issues.apache.org/jira/browse/HIVE-6502 Project: Hive Issue Type: Test Reporter: Remus Rusanu Priority: Minor The patch for HIVE-6345 did not contain a query for SMB join because decimal SMB join failed (HIVE-6412). I've tested vectorized decimal SMB and it works fine now. This issue is the check-in vehicle for regression testing .q and .q.out for it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HIVE-6502) Add query for vectorized_decimal_smbjoin
[ https://issues.apache.org/jira/browse/HIVE-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu reassigned HIVE-6502: -- Assignee: Remus Rusanu Add query for vectorized_decimal_smbjoin Key: HIVE-6502 URL: https://issues.apache.org/jira/browse/HIVE-6502 Project: Hive Issue Type: Test Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor The patch for HIVE-6345 did not contain a query for SMB join because decimal SMB join failed (HIVE-6412). I've tested vectorized decimal SMB and it works fine now. This issue is the check-in vehicle for regression testing .q and .q.out for it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911585#comment-13911585 ] Justin Coffey commented on HIVE-6414: - Hi Szehon, I worked off of the trunk on this. We are applying cleanly to the latest commit and unit tests pass, but our qtest fails after the commit for #HIVE-5958. qtests for parquet_create.q work just fine though. We're digging into it. ParquetInputFormat provides data values that do not match the object inspectors --- Key: HIVE-6414 URL: https://issues.apache.org/jira/browse/HIVE-6414 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Justin Coffey Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-6414.patch While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, in disaccord with the row object inspectors. I though fine, and I worked my way around it. But I see now that the issue trigger failuers in other places, eg. in aggregates: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cint:528534767,ctinyint:31,csmallint:4963,cfloat:31.0,cdouble:4963.0,cstring1:cvLH6Eat2yFsyy7p} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 9 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) ... 
15 more {noformat} My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization): {noformat} create table if not exists alltypes_parquet ( cint int, ctinyint tinyint, csmallint smallint, cfloat float, cdouble double, cstring1 string) stored as parquet; insert overwrite table alltypes_parquet select cint, ctinyint, csmallint, cfloat, cdouble, cstring1 from alltypesorc; explain select * from alltypes_parquet limit 10; select * from alltypes_parquet limit 10; explain select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
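The stack trace above boils down to the reader handing back IntWritable for every int-like column while the Java object inspectors expect Byte/Short/Integer. One way to resolve the mismatch is to convert the value according to the column's declared type before the inspector sees it; the helper below is a hypothetical sketch of that idea, not the code from the attached patch.

{code}
import org.apache.hadoop.io.IntWritable;

public class ParquetIntLikeConversion {
  enum HiveIntLikeType { TINYINT, SMALLINT, INT }

  /** Convert the reader's IntWritable to what a Java object inspector expects. */
  static Object toJavaObject(IntWritable w, HiveIntLikeType declaredType) {
    switch (declaredType) {
      case TINYINT:  return (byte) w.get();   // JavaByteObjectInspector expects Byte
      case SMALLINT: return (short) w.get();  // JavaShortObjectInspector expects Short
      default:       return w.get();          // plain int column stays Integer
    }
  }
}
{code}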
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911587#comment-13911587 ] Justin Coffey commented on HIVE-6414: - Oh, and we don't appear to need the order by for deterministic tests, but I have added it and will submit an updated patch with it (once we have gotten to the bottom of these failures). btw are your qtests passing in #HIVE-6477? ParquetInputFormat provides data values that do not match the object inspectors --- Key: HIVE-6414 URL: https://issues.apache.org/jira/browse/HIVE-6414 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Justin Coffey Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-6414.patch While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, in disaccord with the row object inspectors. I though fine, and I worked my way around it. But I see now that the issue trigger failuers in other places, eg. in aggregates: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cint:528534767,ctinyint:31,csmallint:4963,cfloat:31.0,cdouble:4963.0,cstring1:cvLH6Eat2yFsyy7p} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 9 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) ... 
15 more {noformat} My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization): {noformat} create table if not exists alltypes_parquet ( cint int, ctinyint tinyint, csmallint smallint, cfloat float, cdouble double, cstring1 string) stored as parquet; insert overwrite table alltypes_parquet select cint, ctinyint, csmallint, cfloat, cdouble, cstring1 from alltypesorc; explain select * from alltypes_parquet limit 10; select * from alltypes_parquet limit 10; explain select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-6414: Attachment: HIVE-6414.2.patch Updated patch with working unit and qtests applicable to trunk commit: 6010e22bd24d5004990c63f0aeb232d75693dd94 (#HIVE-5954) ParquetInputFormat provides data values that do not match the object inspectors --- Key: HIVE-6414 URL: https://issues.apache.org/jira/browse/HIVE-6414 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Justin Coffey Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-6414.2.patch, HIVE-6414.patch While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, in disaccord with the row object inspectors. I though fine, and I worked my way around it. But I see now that the issue trigger failuers in other places, eg. in aggregates: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cint:528534767,ctinyint:31,csmallint:4963,cfloat:31.0,cdouble:4963.0,cstring1:cvLH6Eat2yFsyy7p} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 9 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) ... 
15 more {noformat} My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization): {noformat} create table if not exists alltypes_parquet ( cint int, ctinyint tinyint, csmallint smallint, cfloat float, cdouble double, cstring1 string) stored as parquet; insert overwrite table alltypes_parquet select cint, ctinyint, csmallint, cfloat, cdouble, cstring1 from alltypesorc; explain select * from alltypes_parquet limit 10; select * from alltypes_parquet limit 10; explain select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 18464: Support secure Subject.doAs() in HiveServer2 JDBC client
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18464/ --- (Updated Feb. 25, 2014, 2:50 p.m.) Review request for hive, Kevin Minder and Vaibhav Gumashta. Changes --- Added hive group Bugs: HIVE-6486 https://issues.apache.org/jira/browse/HIVE-6486 Repository: hive-git Description --- Support secure Subject.doAs() in HiveServer2 JDBC client Diffs - jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 17b4d39 service/src/java/org/apache/hive/service/auth/KerberosSaslHelper.java 379dafb service/src/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java PRE-CREATION Diff: https://reviews.apache.org/r/18464/diff/ Testing --- Manual testing Thanks, Kevin Minder
[jira] [Commented] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.
[ https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911632#comment-13911632 ] Kevin Minder commented on HIVE-6486: Added review https://reviews.apache.org/r/18464/ Support secure Subject.doAs() in HiveServer2 JDBC client. - Key: HIVE-6486 URL: https://issues.apache.org/jira/browse/HIVE-6486 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.11.0, 0.12.0 Reporter: Shivaraju Gowda Attachments: Hive_011_Support-Subject_doAS.patch, TestHive_SujectDoAs.java HIVE-5155 addresses the problem of kerberos authentication in multi-user middleware server using proxy user. In this mode the principal used by the middle ware server has privileges to impersonate selected users in Hive/Hadoop. This enhancement is to support Subject.doAs() authentication in Hive JDBC layer so that the end users Kerberos Subject is passed through in the middle ware server. With this improvement there won't be any additional setup in the server to grant proxy privileges to some users and there won't be need to specify a proxy user in the JDBC client. This version should also be more secure since it won't require principals with the privileges to impersonate other users in Hive/Hadoop setup. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-6412) SMB join on Decimal columns causes cast exception in JoinUtil.computeKeys
[ https://issues.apache.org/jira/browse/HIVE-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-6412. --- Resolution: Cannot Reproduce Close the issue as non-reproducible. SMB join on Decimal columns causes cast exception in JoinUtil.computeKeys - Key: HIVE-6412 URL: https://issues.apache.org/jira/browse/HIVE-6412 Project: Hive Issue Type: Bug Reporter: Remus Rusanu Assignee: Xuefu Zhang Priority: Critical {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveDecimalWritable cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveWritableObject(JavaHiveDecimalObjectInspector.java:49) at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveWritableObject(JavaHiveDecimalObjectInspector.java:27) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:281) at org.apache.hadoop.hive.ql.exec.JoinUtil.computeKeys(JoinUtil.java:143) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.next(SMBMapJoinOperator.java:809) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:771) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538) at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) {code} Repro: {code} create table vsmb_bucket_1(key decimal(9,0), value decimal(38,10)) CLUSTERED BY (key) SORTED BY (key) INTO 1 BUCKETS STORED AS ORC; create table vsmb_bucket_2(key decimal(19,3), value decimal(28,0)) CLUSTERED BY (key) SORTED BY (key) INTO 1 BUCKETS STORED AS ORC; insert into table vsmb_bucket_1 select cast(cint as decimal(9,0)) as key, cast(cfloat as decimal(38,10)) as value from alltypesorc limit 2; insert into table vsmb_bucket_2 select cast(cint as decimal(19,3)) as key, cast(cfloat as decimal(28,0)) as value from alltypesorc limit 2; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.auto.convert.sortmerge.join.noconditionaltask = true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; explain select /*+MAPJOIN(a)*/ * from vsmb_bucket_1 a join vsmb_bucket_2 b on a.key = b.key; select /*+MAPJOIN(a)*/ * from vsmb_bucket_1 a join vsmb_bucket_2 b on a.key = b.key; {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
[ https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911669#comment-13911669 ] Hive QA commented on HIVE-6429: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12630885/HIVE-6429.06.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5178 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_left_outer_join org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1489/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1489/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12630885 MapJoinKey has large memory overhead in typical cases - Key: HIVE-6429 URL: https://issues.apache.org/jira/browse/HIVE-6429 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, HIVE-6429.06.patch, HIVE-6429.WIP.patch, HIVE-6429.patch The only thing that MJK really needs it hashCode and equals (well, and construction), so there's no need to have array of writables in there. Assuming all the keys for a table have the same structure, for the common case where keys are primitive types, we can store something like a byte array combination of keys to reduce the memory usage. Will probably speed up compares too. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
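A minimal sketch of the idea in the HIVE-6429 description (not the actual patch): when all key columns are primitives, the map-join key can be flattened into a single byte[] whose hashCode/equals work on raw bytes, instead of holding an array of writable objects. The serialization of the columns into bytes is assumed to happen elsewhere.
{code}
import java.util.Arrays;

// Simplified stand-in for a map-join key: primitive key columns are packed
// into one byte[] so hashCode()/equals() compare raw bytes rather than an
// Object[] of writables, cutting per-key object overhead.
public final class FlatMapJoinKey {
  private final byte[] bytes;
  private final int hash;

  public FlatMapJoinKey(byte[] serializedKeyColumns) {
    this.bytes = serializedKeyColumns;
    this.hash = Arrays.hashCode(serializedKeyColumns); // cache the hash once
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof FlatMapJoinKey
        && Arrays.equals(bytes, ((FlatMapJoinKey) other).bytes);
  }
}
{code}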
[jira] [Resolved] (HIVE-4198) Move HCatalog code into Hive
[ https://issues.apache.org/jira/browse/HIVE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4198. Resolution: Fixed Fix Version/s: 0.11.0 Move HCatalog code into Hive Key: HIVE-4198 URL: https://issues.apache.org/jira/browse/HIVE-4198 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.11.0 The HCatalog code needs to be moved into Hive. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.
[ https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911810#comment-13911810 ] Shivaraju Gowda commented on HIVE-6486: --- Thanks Kevin for the review. Vaibhav, Let me know if you need any information or clarification on it. Support secure Subject.doAs() in HiveServer2 JDBC client. - Key: HIVE-6486 URL: https://issues.apache.org/jira/browse/HIVE-6486 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.11.0, 0.12.0 Reporter: Shivaraju Gowda Attachments: Hive_011_Support-Subject_doAS.patch, TestHive_SujectDoAs.java HIVE-5155 addresses the problem of kerberos authentication in multi-user middleware server using proxy user. In this mode the principal used by the middle ware server has privileges to impersonate selected users in Hive/Hadoop. This enhancement is to support Subject.doAs() authentication in Hive JDBC layer so that the end users Kerberos Subject is passed through in the middle ware server. With this improvement there won't be any additional setup in the server to grant proxy privileges to some users and there won't be need to specify a proxy user in the JDBC client. This version should also be more secure since it won't require principals with the privileges to impersonate other users in Hive/Hadoop setup. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6495) TableDesc.getDeserializer() should use correct classloader when calling Class.forName()
[ https://issues.apache.org/jira/browse/HIVE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911816#comment-13911816 ] Ashutosh Chauhan commented on HIVE-6495: (+)1 TableDesc.getDeserializer() should use correct classloader when calling Class.forName() --- Key: HIVE-6495 URL: https://issues.apache.org/jira/browse/HIVE-6495 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6495.1.patch User is getting an error with the following stack trace below. It looks like when Class.forName() is called, it may not be using the correct class loader (JavaUtils.getClassLoader() is used in other contexts when the loaded jar may be required). {noformat} FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: my.serde.ColonSerdejava.lang.RuntimeException: java.lang.ClassNotFoundException: my.serde.ColonSerde at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:68) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRowInspectorFromTable(FetchOperator.java:231) at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:608) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:80) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:497) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.lang.ClassNotFoundException: my.serde.ColonSerde at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:190) at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:66) ... 20 more {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HIVE-6495) TableDesc.getDeserializer() should use correct classloader when calling Class.forName()
[ https://issues.apache.org/jira/browse/HIVE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911816#comment-13911816 ] Ashutosh Chauhan edited comment on HIVE-6495 at 2/25/14 6:13 PM: - +1 was (Author: ashutoshc): (+)1 TableDesc.getDeserializer() should use correct classloader when calling Class.forName() --- Key: HIVE-6495 URL: https://issues.apache.org/jira/browse/HIVE-6495 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6495.1.patch User is getting an error with the following stack trace below. It looks like when Class.forName() is called, it may not be using the correct class loader (JavaUtils.getClassLoader() is used in other contexts when the loaded jar may be required). {noformat} FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: my.serde.ColonSerdejava.lang.RuntimeException: java.lang.ClassNotFoundException: my.serde.ColonSerde at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:68) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRowInspectorFromTable(FetchOperator.java:231) at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:608) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:80) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:497) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.lang.ClassNotFoundException: my.serde.ColonSerde at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:190) at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:66) ... 20 more {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
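The likely shape of the fix, sketched under the assumption stated in the description (use the session/context classloader the way JavaUtils.getClassLoader() does) rather than the attached patch itself:
{code}
// Resolve a SerDe class with an explicit classloader instead of the caller's
// defining loader, so classes from jars added at runtime (ADD JAR) are visible.
public class SerdeClassResolver {
  public static Class<?> resolveSerde(String serdeClassName) throws ClassNotFoundException {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = SerdeClassResolver.class.getClassLoader();
    }
    // The three-argument Class.forName honours the supplied loader, unlike the
    // single-argument form used at TableDesc.java:66 in the stack trace above.
    return Class.forName(serdeClassName, true, loader);
  }
}
{code}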
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Attachment: HIVE-5843-src-only.6.patch Latest version of the code minus the generated files. Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Attachment: HIVE-5843.6.patch Latest version of the code. This has been merged with trunk and should be ready for review and hopefully commit. Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Status: Patch Available (was: Open) Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6375) Fix CTAS for parquet
[ https://issues.apache.org/jira/browse/HIVE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911922#comment-13911922 ] Szehon Ho commented on HIVE-6375: - [~xuefuz] Can you please take a look? Thanks. Fix CTAS for parquet Key: HIVE-6375 URL: https://issues.apache.org/jira/browse/HIVE-6375 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Szehon Ho Priority: Critical Labels: Parquet Attachments: HIVE-6375.2.patch, HIVE-6375.3.patch, HIVE-6375.4.patch, HIVE-6375.patch More details here: https://github.com/Parquet/parquet-mr/issues/272 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Review Request 18478: HIVE-6459: Change the precision/scale for intermediate sum result in the avg() udf
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18478/ --- Review request for hive. Bugs: HIVE-6459 https://issues.apache.org/jira/browse/HIVE-6459 Repository: hive-git Description --- Patch addressed the issue by keeping the type of the sum field consistent with that of sum UDF. The type of the final avg result is unchanged. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFAvgDecimal.java 6f593f9 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java abd54be ql/src/test/queries/clientpositive/vector_decimal_aggregate.q eb9146e ql/src/test/results/clientpositive/create_genericudaf.q.out 96fe2fa ql/src/test/results/clientpositive/decimal_precision.q.out a80695c ql/src/test/results/clientpositive/decimal_udf.q.out 74ae554 ql/src/test/results/clientpositive/groupby10.q.out 341427f ql/src/test/results/clientpositive/groupby3.q.out a74f2b5 ql/src/test/results/clientpositive/groupby3_map.q.out 9424071 ql/src/test/results/clientpositive/groupby3_map_multi_distinct.q.out 9bcd7c9 ql/src/test/results/clientpositive/groupby3_map_skew.q.out f438f89 ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out 310a202 ql/src/test/results/clientpositive/limit_pushdown.q.out a8add4c ql/src/test/results/clientpositive/subquery_in.q.out 48be22b ql/src/test/results/clientpositive/subquery_in_having.q.out ef3dc18 ql/src/test/results/clientpositive/subquery_notin.q.out b2d687b ql/src/test/results/clientpositive/subquery_notin_having.q.out 5f4d96e ql/src/test/results/clientpositive/udaf_number_format.q.out 339ef94 ql/src/test/results/clientpositive/udf3.q.out 546f949 ql/src/test/results/clientpositive/udf8.q.out 79c3bff ql/src/test/results/clientpositive/vector_decimal_aggregate.q.out 8b73971 ql/src/test/results/clientpositive/vectorization_limit.q.out 51a4e81 ql/src/test/results/clientpositive/vectorization_pushdown.q.out df474d6 ql/src/test/results/clientpositive/vectorization_short_regress.q.out 07accb6 ql/src/test/results/clientpositive/vectorized_mapjoin.q.out 9590642 ql/src/test/results/clientpositive/vectorized_shufflejoin.q.out 928bc82 ql/src/test/results/compiler/plan/groupby3.q.xml cc88d5c Diff: https://reviews.apache.org/r/18478/diff/ Testing --- Existing tests cover this. Some test output is regenerated due to the output diff. Thanks, Xuefu Zhang
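The description above is terse, so here is a small illustration of the intent under one assumption: the intermediate sum field for avg() over decimal(p,s) should use the same widened type the sum() UDAF produces (same scale, extra integer digits, capped at Hive's 38-digit maximum). The +10 widening below is an assumption for illustration only.
{code}
// Hypothetical helper showing the widening rule assumed above: the sum
// accumulator for decimal(p, s) keeps the scale and adds head room to the
// precision, capped at the maximum of 38 digits.
public class DecimalSumType {
  private static final int MAX_PRECISION = 38;
  private static final int EXTRA_DIGITS = 10; // assumed head room, matching sum()

  public static int[] sumFieldType(int precision, int scale) {
    int sumPrecision = Math.min(MAX_PRECISION, precision + EXTRA_DIGITS);
    return new int[] { sumPrecision, scale };
  }

  public static void main(String[] args) {
    int[] t = sumFieldType(9, 2);
    // prints: sum field for decimal(9,2): decimal(19,2)
    System.out.println("sum field for decimal(9,2): decimal(" + t[0] + "," + t[1] + ")");
  }
}
{code}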
[jira] [Updated] (HIVE-5176) Wincompat : Changes for allowing various path compatibilities with Windows
[ https://issues.apache.org/jira/browse/HIVE-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5176: --- Assignee: Jason Dere (was: Sushanth Sowmyan) Wincompat : Changes for allowing various path compatibilities with Windows -- Key: HIVE-5176 URL: https://issues.apache.org/jira/browse/HIVE-5176 Project: Hive Issue Type: Sub-task Components: Windows Reporter: Sushanth Sowmyan Assignee: Jason Dere Attachments: HIVE-5176.2.patch, HIVE-5176.3.patch, HIVE-5176.patch We need to make certain changes across the board to allow us to read/parse windows paths. Some are escaping changes, some are being strict about how we read paths (through URL.encode/decode, etc) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911974#comment-13911974 ] Szehon Ho commented on HIVE-6414: - Hmm, I applied your patch on trunk, and new test (parquet_types) still fails for me with missing output due to HIVE-5958. Let's see how pre-commit tests go. Yea my tests pass pre-commit test in HIVE-6477, I had added regeneration of output. Other than that, +1 (non-binding). Thanks for doing order-by, from my experience its useful for group by, as each group goes to one reducer, and no guarantee from MR framework that they wont run in parallel. ParquetInputFormat provides data values that do not match the object inspectors --- Key: HIVE-6414 URL: https://issues.apache.org/jira/browse/HIVE-6414 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Justin Coffey Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-6414.2.patch, HIVE-6414.patch While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, in disaccord with the row object inspectors. I though fine, and I worked my way around it. But I see now that the issue trigger failuers in other places, eg. in aggregates: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cint:528534767,ctinyint:31,csmallint:4963,cfloat:31.0,cdouble:4963.0,cstring1:cvLH6Eat2yFsyy7p} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 9 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) ... 
15 more {noformat} My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization): {noformat} create table if not exists alltypes_parquet ( cint int, ctinyint tinyint, csmallint smallint, cfloat float, cdouble double, cstring1 string) stored as parquet; insert overwrite table alltypes_parquet select cint, ctinyint, csmallint, cfloat, cdouble, cstring1 from alltypesorc; explain select * from alltypes_parquet limit 10; select * from alltypes_parquet limit 10; explain select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
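One possible reconciliation of the mismatch described above, sketched here rather than the committed fix: unwrap the IntWritable the reader hands back into the boxed Java value the declared column type implies, so a Java object inspector for smallint/tinyint never sees an IntWritable. The type-name dispatch is illustrative.
{code}
import org.apache.hadoop.io.IntWritable;

// Illustrative coercion of the reader's IntWritable into the value type the
// row object inspector expects for "int like" columns.
public class ParquetIntCoercion {
  public static Object coerce(IntWritable value, String hiveTypeName) {
    if (value == null) {
      return null;
    }
    if ("tinyint".equals(hiveTypeName)) {
      return (byte) value.get();   // boxed Byte for a tinyint column
    }
    if ("smallint".equals(hiveTypeName)) {
      return (short) value.get();  // boxed Short for a smallint column
    }
    return value.get();            // plain int: a boxed Integer is fine
  }
}
{code}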
[jira] [Commented] (HIVE-6375) Fix CTAS for parquet
[ https://issues.apache.org/jira/browse/HIVE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911979#comment-13911979 ] Xuefu Zhang commented on HIVE-6375: --- +1 Fix CTAS for parquet Key: HIVE-6375 URL: https://issues.apache.org/jira/browse/HIVE-6375 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Szehon Ho Priority: Critical Labels: Parquet Attachments: HIVE-6375.2.patch, HIVE-6375.3.patch, HIVE-6375.4.patch, HIVE-6375.patch More details here: https://github.com/Parquet/parquet-mr/issues/272 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6356) Dependency injection in hbase storage handler is broken
[ https://issues.apache.org/jira/browse/HIVE-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912002#comment-13912002 ] Xuefu Zhang commented on HIVE-6356: --- [~ashutoshc] Can we close this out? I understand that patch v3 has been committed to trunk. Do we still need the addendum patch to be committed in order to close this JIRA? Dependency injection in hbase storage handler is broken --- Key: HIVE-6356 URL: https://issues.apache.org/jira/browse/HIVE-6356 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Navis Priority: Minor Fix For: 0.13.0 Attachments: HIVE-6356.1.patch.txt, HIVE-6356.2.patch.txt, HIVE-6356.3.patch.txt, HIVE-6356.addendum.00.patch Dependent jars for hbase are not added to tmpjars, which is caused by the changed method signature of TableMapReduceUtil.addDependencyJars. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-1459) wildcards in UDF/UDAF should expand to all columns (rather than no columns)
[ https://issues.apache.org/jira/browse/HIVE-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arvind Prabhakar updated HIVE-1459: --- Assignee: (was: Arvind Prabhakar) wildcards in UDF/UDAF should expand to all columns (rather than no columns) --- Key: HIVE-1459 URL: https://issues.apache.org/jira/browse/HIVE-1459 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.6.0 Reporter: John Sichi When a function is invoked with a wildcard * for its parameter, it should be passed all of the columns in the expansion, except in the special case of COUNT, where none of the columns should be passed. As part of this issue, we also need to test qualified wildcards (e.g. t.*) and Hive's extension for regular-expression selection of column subsets. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Status: Open (was: Patch Available) Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Attachment: HIVE-5843.7.patch New latest version of the patch. I had forgotten to add the new thrift generated files to the previous version. Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.7.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Status: Patch Available (was: Open) Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.7.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912036#comment-13912036 ] Lefty Leverenz commented on HIVE-5843: -- HiveConf comment nits: * hive.compactor.check.interval: // Time in seconds between checks to see if any partitions need compacted. -- need to be compacted. * hive.txn.timeout: // time after which ... -- init cap Time Also a question: If this goes into Hive 0.13.0, will it be useful immediately or just a piece of an incomplete feature? Thirteen new config parameters are added, and I'm wondering about documentation (as always). When HIVE-6037 gets committed we won't need to update hive-default.xml.template anymore but the parameter comments will have to be moved into the definitions. Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.7.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912041#comment-13912041 ] Lefty Leverenz commented on HIVE-6037: -- Parameter alert! HIVE-5843 (transaction manager) introduces 13 new config params. All have definitions in the comments except the first three, which speak for themselves. * HIVE-5843: hive.txn.manager, hive.txn.driver, hive.txn.connection.string, hive.txn.timeout, hive.txn.max.open.batch, hive.txn.testing, hive.compactor.initiator.on, hive.compactor.worker.threads, hive.compactor.worker.timeout, hive.compactor.check.interval, hive.compactor.delta.num.threshold, hive.compactor.delta.pct.threshold, hive.compactor.abortedtxn.threshold Synchronize HiveConf with hive-default.xml.template and support show conf - Key: HIVE-6037 URL: https://issues.apache.org/jira/browse/HIVE-6037 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.13.0 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, HIVE-6037.2.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch see HIVE-5879 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912059#comment-13912059 ] Alan Gates commented on HIVE-5843: -- I'm definitely hoping this makes it into 0.13. And no, it isn't just a piece of an incomplete feature. If it was, I'd wait until after 0.13 branched. HIVE-5687 depends on this, and the hope is to get it into 0.13. As for the comments in HiveConf, I didn't realize I was writing documentation there or I would have paid closer attention to my grammar. However, a nit on your nit: need compacted -- need to be compacted. Are you sure? What is the grammar rule there? Overall on documentation though, there will be a fair amount to write, especially once we have HIVE-6319 and HIVE-6060 there. Should I file a separate JIRA outlining the documentation needs? Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.7.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6455) Scalable dynamic partitioning and bucketing optimization
[ https://issues.apache.org/jira/browse/HIVE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6455: - Attachment: HIVE-6455.8.patch Added fix when dynamic partition context is null for bucketed tables. Scalable dynamic partitioning and bucketing optimization Key: HIVE-6455 URL: https://issues.apache.org/jira/browse/HIVE-6455 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: optimization Attachments: HIVE-6455.1.patch, HIVE-6455.1.patch, HIVE-6455.2.patch, HIVE-6455.3.patch, HIVE-6455.4.patch, HIVE-6455.4.patch, HIVE-6455.5.patch, HIVE-6455.6.patch, HIVE-6455.7.patch, HIVE-6455.8.patch The current implementation of dynamic partition works by keeping at least one record writer open per dynamic partition directory. In case of bucketing there can be multispray file writers which further adds up to the number of open record writers. The record writers of column oriented file format (like ORC, RCFile etc.) keeps some sort of in-memory buffers (value buffer or compression buffers) open all the time to buffer up the rows and compress them before flushing it to disk. Since these buffers are maintained per column basis the amount of constant memory that will required at runtime increases as the number of partitions and number of columns per partition increases. This often leads to OutOfMemory (OOM) exception in mappers or reducers depending on the number of open record writers. Users often tune the JVM heapsize (runtime memory) to get over such OOM issues. With this optimization, the dynamic partition columns and bucketing columns (in case of bucketed tables) are sorted before being fed to the reducers. Since the partitioning and bucketing columns are sorted, each reducers can keep only one record writer open at any time thereby reducing the memory pressure on the reducers. This optimization is highly scalable as the number of partition and number of columns per partition increases at the cost of sorting the columns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
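A compact sketch of the mechanism described in HIVE-6455, under two simplifying assumptions: rows arrive at the reducer already sorted by their dynamic-partition/bucket key, and the writer interface is reduced to a toy shape. Because consecutive rows then share the same key, the sink holds at most one open record writer, closing it whenever the key changes.
{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.Objects;

// Simplified file sink: rows are pre-sorted by partition key, so only one
// record writer needs to be open at any time.
public class SortedDynamicPartitionSink {

  interface RecordWriter extends Closeable {
    void write(Object row) throws IOException;
  }

  interface WriterFactory {
    RecordWriter create(String partitionKey) throws IOException;
  }

  private final WriterFactory factory;
  private String currentKey;
  private RecordWriter currentWriter;

  public SortedDynamicPartitionSink(WriterFactory factory) {
    this.factory = factory;
  }

  public void process(String partitionKey, Object row) throws IOException {
    if (!Objects.equals(partitionKey, currentKey)) {
      if (currentWriter != null) {
        currentWriter.close();               // previous partition is finished
      }
      currentWriter = factory.create(partitionKey);
      currentKey = partitionKey;
    }
    currentWriter.write(row);
  }

  public void close() throws IOException {
    if (currentWriter != null) {
      currentWriter.close();
    }
  }
}
{code}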
Re: Review Request 18459: FS based stats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18459/#review35452 --- trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/18459/#comment66000 Do you need to update hive-site template + test hive-site too? trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsAggregator.java https://reviews.apache.org/r/18459/#comment65998 how does this work with task attempts? is there a chance of counting failed stuff? trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java https://reviews.apache.org/r/18459/#comment65977 this would be easier to debug if the exception gets logged at a higher level (error/warn/exception) - multiple instances in both new files. - Gunther Hagleitner On Feb. 25, 2014, 8:09 a.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18459/ --- (Updated Feb. 25, 2014, 8:09 a.m.) Review request for hive and Navis Ryu. Bugs: HIVE-6500 https://issues.apache.org/jira/browse/HIVE-6500 Repository: hive Description --- FS based stats collection. Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 1571554 trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregatorTez.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsPublisher.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsCollectionTaskIndependent.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsAggregator.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java PRE-CREATION trunk/ql/src/test/queries/clientpositive/statsfs.q PRE-CREATION trunk/ql/src/test/results/clientpositive/statsfs.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18459/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
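To make the task-attempt question above concrete, here is one way per-task stats files can stay attempt-safe; this is a rough sketch, not the FSStatsPublisher/FSStatsAggregator code under review, and the file layout and naming are assumptions. Each attempt writes under its own attempt ID, and the aggregator only reads files whose attempt the framework reports as successful.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsStatsSketch {

  // Each task attempt publishes its row count to <statsDir>/<taskAttemptId>.
  public static void publish(Configuration conf, Path statsDir,
                             String taskAttemptId, long rowCount) throws IOException {
    FileSystem fs = statsDir.getFileSystem(conf);
    try (FSDataOutputStream out = fs.create(new Path(statsDir, taskAttemptId))) {
      out.write(Long.toString(rowCount).getBytes(StandardCharsets.UTF_8));
    }
  }

  // The aggregator sums only attempts known to have succeeded, so files left
  // behind by failed or speculative attempts are not double counted.
  public static long aggregate(Configuration conf, Path statsDir,
                               Set<String> successfulAttemptIds) throws IOException {
    FileSystem fs = statsDir.getFileSystem(conf);
    long total = 0L;
    for (FileStatus status : fs.listStatus(statsDir)) {
      if (!successfulAttemptIds.contains(status.getPath().getName())) {
        continue;
      }
      byte[] buf = new byte[(int) status.getLen()];
      try (FSDataInputStream in = fs.open(status.getPath())) {
        in.readFully(buf);
      }
      total += Long.parseLong(new String(buf, StandardCharsets.UTF_8).trim());
    }
    return total;
  }
}
{code}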
[jira] [Commented] (HIVE-6500) Stats collection via filesystem
[ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912103#comment-13912103 ] Gunther Hagleitner commented on HIVE-6500: -- Small comments/question on rb. Stats collection via filesystem --- Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6500.patch Recently, support for stats gathering via counter was [added | https://issues.apache.org/jira/browse/HIVE-4632] Although, its useful it has following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] * [Number of distinct counter groups are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] * [Number of distinct counters are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although, these limits are configurable, but setting them to higher value implies increased memory load on AM and job history server. Now, whether these limits makes sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680] it is desirable that Hive doesn't make use of counters features of framework so that it we can evolve this feature without relying on support from framework. Filesystem based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 18185: Support Kerberos HTTP authentication for HiveServer2 running in http mode
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18185/#review35436 --- jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/18185/#comment65958 lets track this TODO in a jira. It is not very useful comment here (ie not something like warning against an unimplemented part or so) jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/18185/#comment65957 It will be better to keep the position of createBinaryTransport and createHttpTransport same as before. That way the diff will be smaller and easier to read. Also, git blame will remain an useful tool for analyzing changes (it would be easier to find which line in createBinaryTransport changed when with it). jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/18185/#comment65959 this variable is not being used anywhere jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/18185/#comment65960 I am probably being too opinionated here! Feel free to disagree (if you do). I don't think we need this no-argument method, we can just use the method with single boolean argument. I think that will be more readable. jdbc/src/java/org/apache/hive/jdbc/HttpKerberosRequestInterceptor.java https://reviews.apache.org/r/18185/#comment65976 Can you add a class level comment ? service/src/java/org/apache/hive/service/auth/HttpAuthHelper.java https://reviews.apache.org/r/18185/#comment65989 can you add a class comment ?Something like utility functions for http mode authentication. Maybe call this class HttpAuthUtils, so that its more clear what it contains ? service/src/java/org/apache/hive/service/auth/HttpCLIServiceProcessor.java https://reviews.apache.org/r/18185/#comment65993 can you add a class comment ? service/src/java/org/apache/hive/service/auth/HttpCLIServiceUGIProcessor.java https://reviews.apache.org/r/18185/#comment65994 can you add a class comment ? service/src/java/org/apache/hive/service/auth/HttpCLIServiceUGIProcessor.java https://reviews.apache.org/r/18185/#comment66002 I think the better place to clear this is in ThriftHttpServlet, after the call to super.doPost(request, response), as it is set in the same place. shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java https://reviews.apache.org/r/18185/#comment65988 This is duplicating code in createClientTransport . Should we move this code to a static util class and re-use in both places ? shims/common/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge.java https://reviews.apache.org/r/18185/#comment65987 wouldn't it be sufficient to use HadoopShims.getUGIForConf() instead of new method in thrift shims ? - Thejas Nair On Feb. 25, 2014, 12:23 p.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18185/ --- (Updated Feb. 25, 2014, 12:23 p.m.) Review request for hive and Thejas Nair. 
Bugs: HIVE-4764 https://issues.apache.org/jira/browse/HIVE-4764 Repository: hive-git Description --- Support Kerberos HTTP authentication for HiveServer2 running in http mode Diffs - jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 4102d7a jdbc/src/java/org/apache/hive/jdbc/HttpBasicAuthInterceptor.java 66eba1b jdbc/src/java/org/apache/hive/jdbc/HttpKerberosRequestInterceptor.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java d8ba3aa service/src/java/org/apache/hive/service/auth/HttpAuthHelper.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HttpAuthenticationException.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HttpCLIServiceProcessor.java PRE-CREATION service/src/java/org/apache/hive/service/auth/HttpCLIServiceUGIProcessor.java PRE-CREATION service/src/java/org/apache/hive/service/cli/CLIService.java 2b1e712 service/src/java/org/apache/hive/service/cli/session/SessionManager.java bfe0e7b service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 6fbc847 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 26bda5a service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java a6ff6ce service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java e77f043 shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java dc89de1
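A rough sketch of what an HttpKerberosRequestInterceptor-style class does, heavily simplified from the patch under review: before each HTTP request to HiveServer2 it attaches a SPNEGO token in the Authorization header. The class name and the KerberosTokenSource helper below are hypothetical; generating the token would involve a GSS-API handshake run under the client's Subject.
{code}
import java.io.IOException;
import org.apache.http.HttpException;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.protocol.HttpContext;

// Simplified interceptor: adds "Authorization: Negotiate <token>" to every
// outgoing request so the servlet on the server side can authenticate it.
public class KerberosNegotiateInterceptor implements HttpRequestInterceptor {

  interface KerberosTokenSource {
    String newBase64Token(String serverPrincipal) throws IOException; // hypothetical helper
  }

  private final KerberosTokenSource tokenSource;
  private final String serverPrincipal;

  public KerberosNegotiateInterceptor(KerberosTokenSource tokenSource, String serverPrincipal) {
    this.tokenSource = tokenSource;
    this.serverPrincipal = serverPrincipal;
  }

  @Override
  public void process(HttpRequest request, HttpContext context) throws HttpException, IOException {
    String token = tokenSource.newBase64Token(serverPrincipal);
    request.addHeader("Authorization", "Negotiate " + token);
  }
}
{code}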
[jira] [Updated] (HIVE-6389) LazyBinaryColumnarSerDe-based RCFile tables break when looking up elements in null-maps.
[ https://issues.apache.org/jira/browse/HIVE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-6389: --- Status: Patch Available (was: Open) LazyBinaryColumnarSerDe-based RCFile tables break when looking up elements in null-maps. Key: HIVE-6389 URL: https://issues.apache.org/jira/browse/HIVE-6389 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: Hive-6389.patch RCFile tables that use the LazyBinaryColumnarSerDe don't seem to handle look-ups into map-columns when the value of the column is null. When an RCFile table is created with LazyBinaryColumnarSerDe (as is default in 0.12), and queried as follows: {code} select mymap['1024'] from mytable; {code} and if the mymap column has nulls, then one is treated to the following guttural utterance: {code} 2014-02-05 21:50:25,050 FATAL mr.ExecMapper (ExecMapper.java:map(194)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:null,mymap:null,isnull:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41) at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:226) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:560) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 10 more {code} A patch is on the way, but the short of it is that the LazyBinaryMapOI needs to return nulls if either the map or the lookup-key is null. This is handled correctly for Text data, and for RCFiles using ColumnarSerDe. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6389) LazyBinaryColumnarSerDe-based RCFile tables break when looking up elements in null-maps.
[ https://issues.apache.org/jira/browse/HIVE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-6389: --- Status: Open (was: Patch Available) LazyBinaryColumnarSerDe-based RCFile tables break when looking up elements in null-maps. Key: HIVE-6389 URL: https://issues.apache.org/jira/browse/HIVE-6389 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: Hive-6389.patch RCFile tables that use the LazyBinaryColumnarSerDe don't seem to handle look-ups into map-columns when the value of the column is null. When an RCFile table is created with LazyBinaryColumnarSerDe (as is default in 0.12), and queried as follows: {code} select mymap['1024'] from mytable; {code} and if the mymap column has nulls, then one is treated to the following guttural utterance: {code} 2014-02-05 21:50:25,050 FATAL mr.ExecMapper (ExecMapper.java:map(194)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {id:null,mymap:null,isnull:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.hadoop.io.Text at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41) at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:226) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:560) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 10 more {code} A patch is on the way, but the short of it is that the LazyBinaryMapOI needs to return nulls if either the map or the lookup-key is null. This is handled correctly for Text data, and for RCFiles using ColumnarSerDe. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
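The fix described above ("return nulls if either the map or the lookup-key is null") boils down to a guard of the following shape; this is a simplified sketch, not the method signature of the real LazyBinaryMapObjectInspector.
{code}
import java.util.Map;

// Simplified map-value lookup: a null map or a null key yields null instead of
// falling through to serialization code that assumes a real value.
public class NullSafeMapLookup {
  public static Object getMapValueElement(Map<Object, Object> map, Object key) {
    if (map == null || key == null) {
      return null;
    }
    return map.get(key);
  }
}
{code}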
[jira] [Commented] (HIVE-6499) Using Metastore-side Auth errors on non-resolvable IF/OF/SerDe
[ https://issues.apache.org/jira/browse/HIVE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912174#comment-13912174 ] Thejas M Nair commented on HIVE-6499: - +1 Using Metastore-side Auth errors on non-resolvable IF/OF/SerDe -- Key: HIVE-6499 URL: https://issues.apache.org/jira/browse/HIVE-6499 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-6499.patch In cases where a user needs to use a custom IF/OF/SerDe that is not accessible from the metastore, calls like msc.createTable and msc.dropTable should still work without being able to load the class. This is possible as long as one does not enable MetaStore-side authorization, at which point this becomes impossible, erroring out with a ClassNotFoundException. The reason this happens is that since the AuthorizationProvider interface is defined against a ql.metadata.Table, we wind up needing to instantiate a ql.metadata.Table object, which, in its constructor tries to instantiate IF/OF/SerDe elements in an attempt to pre-load those fields. And if we do not have access to those classes in the metastore, this is when that fails. The constructor/initialize methods of Table and Partition do not really need to pre-initialize these fields, since the fields are accessed only through the accessor, and will be instantiated on first-use. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
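The last paragraph of the description outlines the fix at a high level; a minimal sketch of the lazy-accessor pattern it relies on follows (names simplified, not the actual ql.metadata.Table code).
{code}
// Instead of instantiating the SerDe in the constructor (which fails in the
// metastore when the class is not on its classpath), resolve it lazily on
// first use through the accessor.
public class LazySerdeHolder {
  private final String serdeClassName;
  private Object deserializer; // created only when actually needed

  public LazySerdeHolder(String serdeClassName) {
    this.serdeClassName = serdeClassName; // constructor never loads the class
  }

  public synchronized Object getDeserializer() throws ReflectiveOperationException {
    if (deserializer == null) {
      deserializer = Class.forName(serdeClassName, true,
          Thread.currentThread().getContextClassLoader())
          .getDeclaredConstructor().newInstance();
    }
    return deserializer;
  }
}
{code}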
[jira] [Created] (HIVE-6503) document pluggable authentication modules (PAM) in template config, wiki
Thejas M Nair created HIVE-6503: --- Summary: document pluggable authentication modules (PAM) in template config, wiki Key: HIVE-6503 URL: https://issues.apache.org/jira/browse/HIVE-6503 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Priority: Blocker Fix For: 0.13.0 HIVE-6466 adds support for PAM as a supported value for hive.server2.authentication. It also adds a config parameter hive.server2.authentication.pam.services. The default template file needs to be updated to document these. The wiki docs should also document the support for pluggable authentication modules. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6466) Add support for pluggable authentication modules (PAM) in Hive
[ https://issues.apache.org/jira/browse/HIVE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6466: Resolution: Fixed Status: Resolved (was: Patch Available) I have committed this to trunk. Thanks for the contribution [~vgumashta]!. Thanks for reviewing [~kamrul]. I realized post commit that this does not update the default.xml.template file. I have filed a blocker jira to track that - HIVE-6503 . Vaibhav, can you please address that when you get a chance ? We should fix that in 0.13. Add support for pluggable authentication modules (PAM) in Hive -- Key: HIVE-6466 URL: https://issues.apache.org/jira/browse/HIVE-6466 Project: Hive Issue Type: New Feature Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-6466.1.patch, HIVE-6466.2.patch More on PAM in these articles: http://www.tuxradar.com/content/how-pam-works https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Pluggable_Authentication_Modules.html Usage from JPAM api: http://jpam.sourceforge.net/JPamUserGuide.html#id.s7.1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
[ https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912231#comment-13912231 ] Prasanth J commented on HIVE-5970: -- I just verified with the attached test data. This bug is solved by HIVE-6382. ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java --- Key: HIVE-5970 URL: https://issues.apache.org/jira/browse/HIVE-5970 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Eric Chu Priority: Critical Labels: orcfile Attachments: test_data A workload involving ORC tables starts getting the following ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is added as part of HIVE-4123. 2013-12-04 14:42:08,537 ERROR cause:java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running child java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302) ... 
11 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274) ... 15 more -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6466) Add support for pluggable authentication modules (PAM) in Hive
[ https://issues.apache.org/jira/browse/HIVE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912235#comment-13912235 ] Vaibhav Gumashta commented on HIVE-6466: [~thejas] Thanks for the review! Sure, I'll resolve HIVE-6503 by the end of the week. Add support for pluggable authentication modules (PAM) in Hive -- Key: HIVE-6466 URL: https://issues.apache.org/jira/browse/HIVE-6466 Project: Hive Issue Type: New Feature Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-6466.1.patch, HIVE-6466.2.patch More on PAM in these articles: http://www.tuxradar.com/content/how-pam-works https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Pluggable_Authentication_Modules.html Usage from JPAM api: http://jpam.sourceforge.net/JPamUserGuide.html#id.s7.1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6455) Scalable dynamic partitioning and bucketing optimization
[ https://issues.apache.org/jira/browse/HIVE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912237#comment-13912237 ] Gunther Hagleitner commented on HIVE-6455: -- This is cool. Still reviewing, but some ideas: - Instead of adding a column to the record to be used in the file sink, it'd be cleaner (and faster) to use the key to determine new files. I believe that could be achieved through startGroup/endGroup. - Looks like we'd end up duplicating partition column, bucket column, sort columns in both key and value on the reduce sink. It might be possible to avoid that, making the intermediate output smaller, although I'm not sure whether this would require additional changes to rebuild the row in the reduce task. Scalable dynamic partitioning and bucketing optimization Key: HIVE-6455 URL: https://issues.apache.org/jira/browse/HIVE-6455 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: optimization Attachments: HIVE-6455.1.patch, HIVE-6455.1.patch, HIVE-6455.2.patch, HIVE-6455.3.patch, HIVE-6455.4.patch, HIVE-6455.4.patch, HIVE-6455.5.patch, HIVE-6455.6.patch, HIVE-6455.7.patch, HIVE-6455.8.patch The current implementation of dynamic partitioning works by keeping at least one record writer open per dynamic partition directory. In case of bucketing there can be multispray file writers, which further add to the number of open record writers. The record writers of column-oriented file formats (like ORC, RCFile, etc.) keep in-memory buffers (value buffers or compression buffers) open all the time to buffer up the rows and compress them before flushing to disk. Since these buffers are maintained on a per-column basis, the amount of constant memory required at runtime increases as the number of partitions and the number of columns per partition increase. This often leads to OutOfMemory (OOM) exceptions in mappers or reducers depending on the number of open record writers. Users often tune the JVM heapsize (runtime memory) to get over such OOM issues. With this optimization, the dynamic partition columns and bucketing columns (in case of bucketed tables) are sorted before being fed to the reducers. Since the partitioning and bucketing columns are sorted, each reducer can keep only one record writer open at any time, thereby reducing the memory pressure on the reducers. This optimization scales well as the number of partitions and the number of columns per partition increase, at the cost of sorting the partitioning columns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
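To make the memory trade-off above concrete, here is a small self-contained Java sketch (hypothetical types and names, not Hive's actual FileSinkOperator) contrasting an unsorted sink that keeps one writer open per dynamic partition with a sorted sink that closes the previous writer whenever the partition key changes, so at most one writer is open per reducer at any time:
{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a columnar record writer that holds per-column buffers.
interface PartWriter extends Closeable {
    void write(String row) throws IOException;
}

public class DynPartSinkSketch {

    interface WriterFactory {
        PartWriter create(String partition);
    }

    // Unsorted input: every distinct partition value keeps a writer (and its buffers) open.
    static void unsortedSink(Iterable<String[]> rows, WriterFactory factory) throws IOException {
        Map<String, PartWriter> open = new HashMap<>();
        for (String[] r : rows) {                         // r[0] = partition value, r[1] = payload
            open.computeIfAbsent(r[0], factory::create).write(r[1]);
        }
        for (PartWriter w : open.values()) {
            w.close();                                    // memory stays proportional to #partitions until here
        }
    }

    // Partition-sorted input: at most one writer is open at any time.
    static void sortedSink(Iterable<String[]> rows, WriterFactory factory) throws IOException {
        String current = null;
        PartWriter writer = null;
        for (String[] r : rows) {
            if (!r[0].equals(current)) {                  // partition key changed -> rotate the writer
                if (writer != null) {
                    writer.close();
                }
                writer = factory.create(r[0]);
                current = r[0];
            }
            writer.write(r[1]);
        }
        if (writer != null) {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        WriterFactory f = part -> new PartWriter() {
            public void write(String row) { System.out.println(part + " <- " + row); }
            public void close()           { System.out.println("closed writer for " + part); }
        };
        sortedSink(java.util.List.of(
                new String[]{"ds=2014-02-25", "r1"},
                new String[]{"ds=2014-02-25", "r2"},
                new String[]{"ds=2014-02-26", "r3"}), f);
    }
}
{code}
Gunther's startGroup/endGroup suggestion corresponds to the "partition key changed" branch in the sorted variant: the reducer learns the boundary from the grouping key rather than from an extra column carried in the record.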
[jira] [Created] (HIVE-6504) Refactor JDBC HiveConnection to use a factory to create client transport.
Vaibhav Gumashta created HIVE-6504: -- Summary: Refactor JDBC HiveConnection to use a factory to create client transport. Key: HIVE-6504 URL: https://issues.apache.org/jira/browse/HIVE-6504 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 The client transport creation is quite messy. Need to clean it up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
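As a rough illustration of the refactoring direction (hypothetical class and method names, not the actual HiveConnection or Thrift API), a small factory could centralize the binary-vs-HTTP transport construction that is currently inlined in the connection setup:
{code}
import java.util.Map;

// Placeholder for the Thrift transport type used by the JDBC driver.
interface ClientTransport {
    void open();
}

// Hypothetical factory that hides the per-mode construction details.
interface ClientTransportFactory {
    ClientTransport create(String host, int port, Map<String, String> sessionConf);
}

class BinaryTransportFactory implements ClientTransportFactory {
    @Override
    public ClientTransport create(String host, int port, Map<String, String> sessionConf) {
        // plain socket / SASL wrapping would go here
        return () -> System.out.println("opening binary transport to " + host + ":" + port);
    }
}

class HttpTransportFactory implements ClientTransportFactory {
    @Override
    public ClientTransport create(String host, int port, Map<String, String> sessionConf) {
        // HTTP client setup (path, cookies, SSL) would go here
        return () -> System.out.println("opening http transport to " + host + ":" + port);
    }
}

public class TransportFactorySketch {
    // The connection only picks a factory; construction details live in one place per mode.
    static ClientTransportFactory forMode(String transportMode) {
        return "http".equalsIgnoreCase(transportMode)
                ? new HttpTransportFactory()
                : new BinaryTransportFactory();
    }

    public static void main(String[] args) {
        ClientTransport t = forMode("http").create("localhost", 10001, Map.of());
        t.open();
    }
}
{code}
The point of the split is testability: each mode's construction logic can be unit-tested in isolation instead of being exercised only through a full connection attempt.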
[jira] [Updated] (HIVE-5504) OrcOutputFormat honors compression properties only from within hive
[ https://issues.apache.org/jira/browse/HIVE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5504: --- Attachment: HIVE-5504.2.patch Updated patch per reviewboard comments. OrcOutputFormat honors compression properties only from within hive - Key: HIVE-5504 URL: https://issues.apache.org/jira/browse/HIVE-5504 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Venkat Ranganathan Assignee: Sushanth Sowmyan Attachments: HIVE-5504.2.patch, HIVE-5504.patch When we import data into a HCatalog table created with the following storage description .. stored as orc tblproperties (orc.compress=SNAPPY) the resultant orc file still uses the default zlib compression It looks like HCatOutputFormat is ignoring the tblproperties specified. show tblproperties shows that the table indeed has the properties properly saved. An insert/select into the table has the resulting orc file honor the tbl property. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6393) Support unqualified column references in Joining conditions
[ https://issues.apache.org/jira/browse/HIVE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912261#comment-13912261 ] Gunther Hagleitner commented on HIVE-6393: -- [~rhbutani] could you open a review board request for this one? Support unqualified column references in Joining conditions --- Key: HIVE-6393 URL: https://issues.apache.org/jira/browse/HIVE-6393 Project: Hive Issue Type: Improvement Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-6393.1.patch, HIVE-6393.2.patch Support queries of the form: {noformat} create table r1(a int); create table r2(b int); select a, b from r1 join r2 on a = b {noformat} This becomes more useful in old-style syntax: {noformat} select a, b from r1, r2 where a = b {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6393) Support unqualified column references in Joining conditions
[ https://issues.apache.org/jira/browse/HIVE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912264#comment-13912264 ] Harish Butani commented on HIVE-6393: - https://reviews.apache.org/r/18293/ Support unqualified column references in Joining conditions --- Key: HIVE-6393 URL: https://issues.apache.org/jira/browse/HIVE-6393 Project: Hive Issue Type: Improvement Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-6393.1.patch, HIVE-6393.2.patch Support queries of the form: {noformat} create table r1(a int); create table r2(b int); select a, b from r1 join r2 on a = b {noformat} This becomes more useful in old-style syntax: {noformat} select a, b from r1, r2 where a = b {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Review Request 18492: HIVE-6473: Allow writing HFiles via HBaseStorageHandler table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18492/ --- Review request for hive. Bugs: HIVE-6473 https://issues.apache.org/jira/browse/HIVE-6473 Repository: hive-git Description --- From the JIRA: Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. Diffs - hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 8cd594b hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java 6d383b5 hbase-handler/src/test/queries/negative/generatehfiles_require_family_path.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_bulk.m f8bb47d hbase-handler/src/test/queries/positive/hbase_bulk.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_handler_bulk.q PRE-CREATION hbase-handler/src/test/results/negative/generatehfiles_require_family_path.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_handler_bulk.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18492/diff/ Testing --- Thanks, nick dimiduk
[jira] [Commented] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912291#comment-13912291 ] Nick Dimiduk commented on HIVE-6473: Sure thing. I've opened https://reviews.apache.org/r/18492/. Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HIVE-6473.0.patch.txt Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5504) OrcOutputFormat honors compression properties only from within hive
[ https://issues.apache.org/jira/browse/HIVE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912302#comment-13912302 ] Thejas M Nair commented on HIVE-5504: - +1 OrcOutputFormat honors compression properties only from within hive - Key: HIVE-5504 URL: https://issues.apache.org/jira/browse/HIVE-5504 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Venkat Ranganathan Assignee: Sushanth Sowmyan Attachments: HIVE-5504.2.patch, HIVE-5504.patch When we import data into a HCatalog table created with the following storage description .. stored as orc tblproperties (orc.compress=SNAPPY) the resultant orc file still uses the default zlib compression It looks like HCatOutputFormat is ignoring the tblproperties specified. show tblproperties shows that the table indeed has the properties properly saved. An insert/select into the table has the resulting orc file honor the tbl property. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6456) Implement Parquet schema evolution
[ https://issues.apache.org/jira/browse/HIVE-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6456: -- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks to Brock for the patch. Implement Parquet schema evolution -- Key: HIVE-6456 URL: https://issues.apache.org/jira/browse/HIVE-6456 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Trivial Fix For: 0.13.0 Attachments: HIVE-6456.patch In HIVE-5783 we removed schema evolution: https://github.com/Parquet/parquet-mr/pull/297/files#r9824155 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6505) Make stats optimizer more robust in presence of distinct clause
Ashutosh Chauhan created HIVE-6505: -- Summary: Make stats optimizer more robust in presence of distinct clause Key: HIVE-6505 URL: https://issues.apache.org/jira/browse/HIVE-6505 Project: Hive Issue Type: Bug Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Currently it throws exceptions in a few cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6505) Make stats optimizer more robust in presence of distinct clause
[ https://issues.apache.org/jira/browse/HIVE-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6505: --- Attachment: HIVE-6505.patch More checks to make sure stats optimizer fires correctly. Make stats optimizer more robust in presence of distinct clause --- Key: HIVE-6505 URL: https://issues.apache.org/jira/browse/HIVE-6505 Project: Hive Issue Type: Bug Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6505.patch Currently it throws exceptions in a few cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6505) Make stats optimizer more robust in presence of distinct clause
[ https://issues.apache.org/jira/browse/HIVE-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6505: --- Status: Patch Available (was: In Progress) Make stats optimizer more robust in presence of distinct clause --- Key: HIVE-6505 URL: https://issues.apache.org/jira/browse/HIVE-6505 Project: Hive Issue Type: Bug Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6505.patch Currently it throws exceptions in a few cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HIVE-6505) Make stats optimizer more robust in presence of distinct clause
[ https://issues.apache.org/jira/browse/HIVE-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-6505 started by Ashutosh Chauhan. Make stats optimizer more robust in presence of distinct clause --- Key: HIVE-6505 URL: https://issues.apache.org/jira/browse/HIVE-6505 Project: Hive Issue Type: Bug Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6505.patch Currently it throws exceptions in a few cases. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Review Request 18494: Fix stats optimizer for distinct clause.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18494/ --- Review request for hive. Bugs: HIVE-6505 https://issues.apache.org/jira/browse/HIVE-6505 Repository: hive-git Description --- Fix stats optimizer for distinct clause. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java 1d23449 ql/src/test/queries/clientpositive/distinct_stats.q PRE-CREATION ql/src/test/results/clientpositive/distinct_stats.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18494/diff/ Testing --- Added new .q test Thanks, Ashutosh Chauhan
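For context on the kind of guard such a fix needs (an illustrative, self-contained sketch only, not the actual StatsOptimizer code), the optimizer should decline to answer a query from metastore statistics, rather than throw, whenever it sees an aggregate it cannot serve, such as a distinct aggregate:
{code}
import java.util.List;

public class StatsOptimizerGuardSketch {
    // Hypothetical aggregate descriptor; the real optimizer inspects the operator tree instead.
    static class Aggregate {
        final String function;
        final boolean distinct;
        Aggregate(String function, boolean distinct) { this.function = function; this.distinct = distinct; }
    }

    /**
     * Answer a count(*)-style aggregate from stored row counts; return null (meaning "do not
     * optimize, run the query normally") instead of throwing when a distinct aggregate or an
     * unsupported function is present.
     */
    static Long answerFromStats(List<Aggregate> aggs, long rowCountFromStats) {
        for (Aggregate a : aggs) {
            if (a.distinct || !"count".equalsIgnoreCase(a.function)) {
                return null;   // bail out gracefully; the query still runs, just without the shortcut
            }
        }
        return rowCountFromStats;
    }

    public static void main(String[] args) {
        System.out.println(answerFromStats(List.of(new Aggregate("count", false)), 42L)); // 42
        System.out.println(answerFromStats(List.of(new Aggregate("count", true)), 42L));  // null
    }
}
{code}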
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912318#comment-13912318 ] Xuefu Zhang commented on HIVE-6414: --- Quick comment on the code change: Hive doesn't throw runtime exception when the data isn't right. In terms of error handling, Hive returns null for data errors, including data-out-of-bound as in this case. ParquetInputFormat provides data values that do not match the object inspectors --- Key: HIVE-6414 URL: https://issues.apache.org/jira/browse/HIVE-6414 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Justin Coffey Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-6414.2.patch, HIVE-6414.patch While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, in disaccord with the row object inspectors. I though fine, and I worked my way around it. But I see now that the issue trigger failuers in other places, eg. in aggregates: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cint:528534767,ctinyint:31,csmallint:4963,cfloat:31.0,cdouble:4963.0,cstring1:cvLH6Eat2yFsyy7p} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 9 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) ... 
15 more {noformat} My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization): {noformat} create table if not exists alltypes_parquet ( cint int, ctinyint tinyint, csmallint smallint, cfloat float, cdouble double, cstring1 string) stored as parquet; insert overwrite table alltypes_parquet select cint, ctinyint, csmallint, cfloat, cdouble, cstring1 from alltypesorc; explain select * from alltypes_parquet limit 10; select * from alltypes_parquet limit 10; explain select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
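A minimal sketch of the error-handling convention described in Xuefu's comment above (a hypothetical helper, not the actual Parquet serde code): when narrowing the reader's int value to the declared smaller Hive type, out-of-range data yields null rather than a runtime exception:
{code}
// Hypothetical converter illustrating "return null on bad data" rather than throwing.
public class NarrowingConverterSketch {
    /** Convert an int read from the file to the declared SMALLINT value, or null if out of range. */
    static Short toSmallint(int fileValue) {
        if (fileValue < Short.MIN_VALUE || fileValue > Short.MAX_VALUE) {
            return null;                       // data error -> null, per Hive's convention
        }
        return (short) fileValue;
    }

    public static void main(String[] args) {
        System.out.println(toSmallint(4963));      // 4963
        System.out.println(toSmallint(528534767)); // null
    }
}
{code}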
[jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912320#comment-13912320 ] Hive QA commented on HIVE-6414: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12630955/HIVE-6414.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5196 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1491/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1491/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12630955 ParquetInputFormat provides data values that do not match the object inspectors --- Key: HIVE-6414 URL: https://issues.apache.org/jira/browse/HIVE-6414 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Justin Coffey Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-6414.2.patch, HIVE-6414.patch While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, in disaccord with the row object inspectors. I though fine, and I worked my way around it. But I see now that the issue trigger failuers in other places, eg. in aggregates: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cint:528534767,ctinyint:31,csmallint:4963,cfloat:31.0,cdouble:4963.0,cstring1:cvLH6Eat2yFsyy7p} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 
9 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) ... 15 more {noformat} My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization): {noformat} create table if not exists alltypes_parquet ( cint int, ctinyint tinyint, csmallint smallint, cfloat float, cdouble double, cstring1 string) stored as parquet; insert overwrite table alltypes_parquet select cint, ctinyint, csmallint, cfloat, cdouble, cstring1 from
[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912323#comment-13912323 ] Thejas M Nair commented on HIVE-6037: - I think it is better to commit this now, rather than later. When will you be able to rebase? Once it's ready and reviewed we should commit this one without waiting for another 24 hours (so that it does not go stale). Maybe have more than one committer review it instead. What do people think? Synchronize HiveConf with hive-default.xml.template and support show conf - Key: HIVE-6037 URL: https://issues.apache.org/jira/browse/HIVE-6037 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.13.0 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, HIVE-6037.2.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch see HIVE-5879 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6506) hcatalog should automatically work with new tableproperties in ORC
Thejas M Nair created HIVE-6506: --- Summary: hcatalog should automatically work with new tableproperties in ORC Key: HIVE-6506 URL: https://issues.apache.org/jira/browse/HIVE-6506 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Thejas M Nair HIVE-5504 has changes to handle existing table properties for ORC file format. But it does not automatically pick newly added table properties. We should refactor ORC so that its table property list can be automatically determined. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6506) hcatalog should automatically work with new tableproperties in ORC
[ https://issues.apache.org/jira/browse/HIVE-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912339#comment-13912339 ] Thejas M Nair commented on HIVE-6506: - ORC should return a list of table property names (maybe use an enum instead of final strings), and HCatalog should add that list to the jobconf. hcatalog should automatically work with new tableproperties in ORC -- Key: HIVE-6506 URL: https://issues.apache.org/jira/browse/HIVE-6506 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Thejas M Nair HIVE-5504 has changes to handle existing table properties for ORC file format. But it does not automatically pick newly added table properties. We should refactor ORC so that its table property list can be automatically determined. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6434) Restrict function create/drop to admin roles
[ https://issues.apache.org/jira/browse/HIVE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912352#comment-13912352 ] Thejas M Nair commented on HIVE-6434: - I think we should require admin privileges for temporary functions as well. This is not a backward compatibility issue as the requirement would apply only if the new sql standard auth is enabled. Restrict function create/drop to admin roles Key: HIVE-6434 URL: https://issues.apache.org/jira/browse/HIVE-6434 Project: Hive Issue Type: Sub-task Components: Authorization, UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6434.1.patch, HIVE-6434.2.patch, HIVE-6434.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6507) OrcFile table property names are specified as strings
Sushanth Sowmyan created HIVE-6507: -- Summary: OrcFile table property names are specified as strings Key: HIVE-6507 URL: https://issues.apache.org/jira/browse/HIVE-6507 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan In HIVE-5504, we had to do some special casing in HCatalog to add a particular set of orc table properties from table properties to job properties. In doing so, it's obvious that that is a bit cumbersome, and ideally, the list of all orc file table properties should really be an enum, rather than individual loosely tied constant strings. If we were to clean this up, we can clean up other code that references this to reference the entire enum, and avoid future errors when new table properties are introduced, but other referencing code is not updated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
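A rough sketch of the direction proposed here (hypothetical enum and method names; the property keys shown are illustrative): an enum that enumerates the ORC table properties lets HCatalog copy whatever is set on the table into the job properties without hard-coding each key:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class OrcPropertyCopySketch {

    // Hypothetical enum mirroring the idea in this issue; names shown are illustrative.
    enum OrcTableProperty {
        COMPRESSION("orc.compress"),
        STRIPE_SIZE("orc.stripe.size"),
        ROW_INDEX_STRIDE("orc.row.index.stride");

        private final String propName;
        OrcTableProperty(String propName) { this.propName = propName; }
        String propName() { return propName; }
    }

    /** Copy every known ORC table property that is set on the table into the job properties. */
    static void copyOrcProperties(Properties tableProps, Map<String, String> jobProps) {
        for (OrcTableProperty p : OrcTableProperty.values()) {
            String value = tableProps.getProperty(p.propName());
            if (value != null) {
                jobProps.put(p.propName(), value);
            }
        }
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("orc.compress", "SNAPPY");
        Map<String, String> job = new HashMap<>();
        copyOrcProperties(tbl, job);
        System.out.println(job);   // {orc.compress=SNAPPY}
    }
}
{code}
A newly introduced table property would then only need a new enum constant, and any code that iterates over values() picks it up automatically, which is exactly the failure mode the description warns about.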
[jira] [Updated] (HIVE-6507) OrcFile table property names are specified as strings
[ https://issues.apache.org/jira/browse/HIVE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6507: --- Attachment: HIVE-6507.patch Attaching patch. This applies on top of HIVE-5504, and depends on that being committed first. OrcFile table property names are specified as strings - Key: HIVE-6507 URL: https://issues.apache.org/jira/browse/HIVE-6507 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-6507.patch In HIVE-5504, we had to do some special casing in HCatalog to add a particular set of orc table properties from table properties to job properties. In doing so, it's obvious that that is a bit cumbersome, and ideally, the list of all orc file table properties should really be an enum, rather than individual loosely tied constant strings. If we were to clean this up, we can clean up other code that references this to reference the entire enum, and avoid future errors when new table properties are introduced, but other referencing code is not updated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6137) Hive should report that the file/path doesn’t exist when it doesn’t (it now reports SocketTimeoutException)
[ https://issues.apache.org/jira/browse/HIVE-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6137: Attachment: HIVE-6137.2.patch cc-ing [~thejas] for review. I have made changes in HiveMetaStore to throw a MetaException which gets caught at the client side. Hive should report that the file/path doesn’t exist when it doesn’t (it now reports SocketTimeoutException) --- Key: HIVE-6137 URL: https://issues.apache.org/jira/browse/HIVE-6137 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6137.1.patch, HIVE-6137.2.patch Hive should report that the file/path doesn’t exist when it doesn’t (it now reports SocketTimeoutException): Execute a Hive DDL query with a reference to a non-existent blob (such as CREATE EXTERNAL TABLE...) and check Hive logs (stderr): FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask This error message is not intuitive. If a file doesn't exist, Hive should report FileNotFoundException -- This message was sent by Atlassian JIRA (v6.1.5#6160)
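A minimal illustration of the approach mentioned in the comment above (hypothetical types; the real MetaException is a Thrift-generated class in the metastore API, and the example path is invented): validate the external location up front and surface a message-bearing checked exception that the client can report directly:
{code}
// Stand-in for the metastore's checked exception (the real MetaException is Thrift-generated).
class MetaExceptionSketch extends Exception {
    MetaExceptionSketch(String message) { super(message); }
}

public class ExternalTablePathCheckSketch {
    /** Validate the external table location up front and report a meaningful error to the caller. */
    static void validateLocation(String location, boolean pathExists) throws MetaExceptionSketch {
        if (!pathExists) {
            // Translate the low-level condition into a message the client can surface directly,
            // instead of the connection-level SocketTimeoutException seen today.
            throw new MetaExceptionSketch("File does not exist: " + location);
        }
    }

    public static void main(String[] args) {
        try {
            validateLocation("wasb://container@account/path/does/not/exist", false);
        } catch (MetaExceptionSketch e) {
            System.out.println("FAILED: " + e.getMessage());
        }
    }
}
{code}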
[jira] [Resolved] (HIVE-6506) hcatalog should automatically work with new tableproperties in ORC
[ https://issues.apache.org/jira/browse/HIVE-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved HIVE-6506. - Resolution: Duplicate Resolving as duplicate hcatalog should automatically work with new tableproperties in ORC -- Key: HIVE-6506 URL: https://issues.apache.org/jira/browse/HIVE-6506 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Thejas M Nair HIVE-5504 has changes to handle existing table properties for ORC file format. But it does not automatically pick newly added table properties. We should refactor ORC so that its table property list can be automatically determined. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912361#comment-13912361 ] Lefty Leverenz commented on HIVE-5843: -- Thanks Alan, a doc JIRA seems like a good idea for this. About the nit, I'm sure that "partitions need compacted" sounds wrong -- maybe you meant "need compaction" or maybe I misunderstood the concept. I read it as similar to "The dishes in the sink need cleaned" vs. "need to be cleaned" or "need cleaning." Not so? But I can't cite a grammar rule without doing some research. So far I've found out that "need" can be a regular verb or a modal. "I need to look it up," but "I need not obsess over it," respectively. Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.7.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server
[ https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912370#comment-13912370 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-5099: - non-binding +1. cc-ing [~ashutoshc], [~thejas] for review. HIVE-5218 should be marked as a duplicate of this jira, since this upgrades datanucleus to an even newer version. Some partition publish operation cause OOM in metastore backed by SQL Server Key: HIVE-5099 URL: https://issues.apache.org/jira/browse/HIVE-5099 Project: Hive Issue Type: Bug Components: Metastore, Windows Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch For certain metastore operation combinations, the metastore operation hangs and the metastore server eventually fails due to OOM. This happens when the metastore is backed by SQL Server. Here is a testcase to reproduce: {code} CREATE TABLE tbl_repro_oom1 (a STRING, b INT) PARTITIONED BY (c STRING, d STRING); CREATE TABLE tbl_repro_oom_2 (a STRING ) PARTITIONED BY (e STRING); ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='France', d=4); ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='Russia', d=3); ALTER TABLE tbl_repro_oom_2 ADD PARTITION (e='Russia'); ALTER TABLE tbl_repro_oom1 DROP PARTITION (c = 'India'); --failure {code} The code causing the issue is in ExpressionTree.java: {code} valString = "partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + keyEqualLength + ").substring(0, partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + keyEqualLength + ").indexOf(\"/\"))"; {code} The snapshot of the partition table before the drop partition statement is: {code} PART_ID CREATE_TIME LAST_ACCESS_TIME PART_NAME SD_ID TBL_ID 93 1376526718 0 c=France/d=4 127 33 94 1376526718 0 c=Russia/d=3 128 33 95 1376526718 0 e=Russia 129 34 {code} The DataNucleus query tries to find the value of a particular key by locating $key= as the start and / as the end. For example, it finds the value of c in c=France/d=4 by locating c= as the start and the following / as the end. However, this query fails if we try to find the value of e in e=Russia since there is no trailing /. Other databases work since their query plans first filter out the partitions not belonging to tbl_repro_oom1. Whether this error surfaces or not depends on the query optimizer. When this exception happens, the metastore keeps retrying and throwing exceptions. The memory image of the metastore contains a large number of exception objects: {code} com.microsoft.sqlserver.jdbc.SQLServerException: Invalid length parameter passed to the LEFT or SUBSTRING function.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197) at com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4762) at com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1682) at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:955) at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207) at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207) at org.datanucleus.store.rdbms.query.ForwardQueryResult.init(ForwardQueryResult.java:90) at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:686) at org.datanucleus.store.query.Query.executeQuery(Query.java:1791) at org.datanucleus.store.query.Query.executeWithMap(Query.java:1694) at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:334) at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1715) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1590) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111) at $Proxy4.getPartitionsByFilter(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2163) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at
[jira] [Updated] (HIVE-6507) OrcFile table property names are specified as strings
[ https://issues.apache.org/jira/browse/HIVE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6507: --- Affects Version/s: 0.13.0 OrcFile table property names are specified as strings - Key: HIVE-6507 URL: https://issues.apache.org/jira/browse/HIVE-6507 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-6507.patch In HIVE-5504, we had to do some special casing in HCatalog to add a particular set of orc table properties from table properties to job properties. In doing so, it's obvious that that is a bit cumbersome, and ideally, the list of all orc file table properties should really be an enum, rather than individual loosely tied constant strings. If we were to clean this up, we can clean up other code that references this to reference the entire enum, and avoid future errors when new table properties are introduced, but other referencing code is not updated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6507) OrcFile table property names are specified as strings
[ https://issues.apache.org/jira/browse/HIVE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6507: --- Component/s: Serializers/Deserializers HCatalog OrcFile table property names are specified as strings - Key: HIVE-6507 URL: https://issues.apache.org/jira/browse/HIVE-6507 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-6507.patch In HIVE-5504, we had to do some special casing in HCatalog to add a particular set of orc table properties from table properties to job properties. In doing so, it's obvious that that is a bit cumbersome, and ideally, the list of all orc file table properties should really be an enum, rather than individual loosely tied constant strings. If we were to clean this up, we can clean up other code that references this to reference the entire enum, and avoid future errors when new table properties are introduced, but other referencing code is not updated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6434) Restrict function create/drop to admin roles
[ https://issues.apache.org/jira/browse/HIVE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912391#comment-13912391 ] Jason Dere commented on HIVE-6434: -- Ok, I can add the restriction on temp functions/macros back to the patch. Restrict function create/drop to admin roles Key: HIVE-6434 URL: https://issues.apache.org/jira/browse/HIVE-6434 Project: Hive Issue Type: Sub-task Components: Authorization, UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6434.1.patch, HIVE-6434.2.patch, HIVE-6434.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6434) Restrict function create/drop to admin roles
[ https://issues.apache.org/jira/browse/HIVE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6434: - Release Note: (was: Restrict function create/drop to admin roles, if sql std auth is enabled. This would include temp/permanent functions, as well as macros. ) Restrict function create/drop to admin roles Key: HIVE-6434 URL: https://issues.apache.org/jira/browse/HIVE-6434 Project: Hive Issue Type: Sub-task Components: Authorization, UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6434.1.patch, HIVE-6434.2.patch, HIVE-6434.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6434) Restrict function create/drop to admin roles
[ https://issues.apache.org/jira/browse/HIVE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6434: - Description: Restrict function create/drop to admin roles, if sql std auth is enabled. This would include temp/permanent functions, as well as macros. Restrict function create/drop to admin roles Key: HIVE-6434 URL: https://issues.apache.org/jira/browse/HIVE-6434 Project: Hive Issue Type: Sub-task Components: Authorization, UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6434.1.patch, HIVE-6434.2.patch, HIVE-6434.3.patch Restrict function create/drop to admin roles, if sql std auth is enabled. This would include temp/permanent functions, as well as macros. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 18459: FS based stats.
On Feb. 25, 2014, 9:36 p.m., Gunther Hagleitner wrote: trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 626 https://reviews.apache.org/r/18459/diff/1/?file=503283#file503283line626 Do you need to update hive-site template + test hive-site too? The template file will be generated from HiveConf.java after HIVE-6037 gets committed, so updating it would be wasted effort. But a parameter description is needed, and it can go in a comment for now but once HIVE-6037 commits the description has to be part of the parameter definition like this example: CLIPROMPT("hive.cli.prompt", "hive", "Command line prompt configuration value. Other hiveconf can be used in this configuration value. \n" + "Variable substitution will only be invoked at the Hive CLI startup."), - Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18459/#review35452 --- On Feb. 25, 2014, 8:09 a.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18459/ --- (Updated Feb. 25, 2014, 8:09 a.m.) Review request for hive and Navis Ryu. Bugs: HIVE-6500 https://issues.apache.org/jira/browse/HIVE-6500 Repository: hive Description --- FS based stats collection. Diffs - trunk/common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 1571554 trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregator.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsAggregatorTez.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/CounterStatsPublisher.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsCollectionTaskIndependent.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 1571554 trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsAggregator.java PRE-CREATION trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/fs/FSStatsPublisher.java PRE-CREATION trunk/ql/src/test/queries/clientpositive/statsfs.q PRE-CREATION trunk/ql/src/test/results/clientpositive/statsfs.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18459/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
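The diff above introduces FSStatsPublisher and FSStatsAggregator; as a purely hypothetical sketch of such a publisher/aggregator pair (plain java.nio file I/O and invented method names, not the actual classes, which use the Hadoop FileSystem API), per-task stat files can be written and later summed:
{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsStatsSketch {

    /** Each task writes its own "statKey=value" lines to a file named after the task. */
    static void publish(Path statsDir, String taskId, String statKey, long value) throws IOException {
        Files.createDirectories(statsDir);
        Files.writeString(statsDir.resolve(taskId),
                statKey + "=" + value + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** After the job finishes, sum a given stat across all task files in the directory. */
    static long aggregate(Path statsDir, String statKey) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(statsDir)) {
            for (Path f : files) {
                for (String line : Files.readAllLines(f)) {
                    if (line.startsWith(statKey + "=")) {
                        total += Long.parseLong(line.substring(statKey.length() + 1));
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fs_stats_sketch");
        publish(dir, "task_0001", "rowCount", 120);
        publish(dir, "task_0002", "rowCount", 80);
        System.out.println(aggregate(dir, "rowCount"));   // 200
    }
}
{code}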
[jira] [Commented] (HIVE-4545) HS2 should return describe table results without space padding
[ https://issues.apache.org/jira/browse/HIVE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912418#comment-13912418 ] Ashutosh Chauhan commented on HIVE-4545: +1 HS2 should return describe table results without space padding -- Key: HIVE-4545 URL: https://issues.apache.org/jira/browse/HIVE-4545 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Attachments: HIVE-4545-1.patch, HIVE-4545.2.patch, HIVE-4545.3.patch, HIVE-4545.4.patch, HIVE-4545.5.patch HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE FORMATTED table;'. HIVE-3140 introduced changes to not print header in 'DESCRIBE table;'. But jdbc/odbc calls still get fields padded with space for the 'DESCRIBE table;' query. As the jdbc/odbc results are not for direct human consumption the space padding should not be done for hive server2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6434) Restrict function create/drop to admin roles
[ https://issues.apache.org/jira/browse/HIVE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6434: - Attachment: HIVE-6434.4.patch Patch v4 adds back the restriction of creating temp functions/macros to admin roles. This is only in effect if sql standard auth is enabled. Restrict function create/drop to admin roles Key: HIVE-6434 URL: https://issues.apache.org/jira/browse/HIVE-6434 Project: Hive Issue Type: Sub-task Components: Authorization, UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6434.1.patch, HIVE-6434.2.patch, HIVE-6434.3.patch, HIVE-6434.4.patch Restrict function create/drop to admin roles, if sql std auth is enabled. This would include temp/permanent functions, as well as macros. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
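As a self-contained illustration of the rule being enforced (hypothetical types and role name; the real mapping lives in the SQL standard authorizer, not in code like this), function and macro create/drop is allowed only for principals holding the admin role, and only when the new authorization mode is enabled:
{code}
import java.util.Set;

// Hypothetical model of the check; Hive's actual implementation maps operations to privileges.
public class FunctionAuthSketch {
    enum Operation { CREATE_FUNCTION, DROP_FUNCTION, CREATE_MACRO, DROP_MACRO, QUERY }

    static void check(Operation op, Set<String> currentRoles, boolean sqlStdAuthEnabled) {
        // Only guarded when SQL standard authorization is turned on, so legacy setups keep working.
        if (!sqlStdAuthEnabled) {
            return;
        }
        boolean needsAdmin = op != Operation.QUERY;
        if (needsAdmin && !currentRoles.contains("admin")) {
            throw new SecurityException(op + " requires the admin role");
        }
    }

    public static void main(String[] args) {
        check(Operation.CREATE_FUNCTION, Set.of("admin"), true);    // allowed
        check(Operation.CREATE_FUNCTION, Set.of("public"), false);  // allowed, auth disabled
        try {
            check(Operation.CREATE_FUNCTION, Set.of("public"), true);
        } catch (SecurityException e) {
            System.out.println(e.getMessage());                     // CREATE_FUNCTION requires the admin role
        }
    }
}
{code}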
Re: Review Request 18162: HIVE-6434: Restrict function create/drop to admin roles
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18162/ --- (Updated Feb. 26, 2014, 2:10 a.m.) Review request for hive and Thejas Nair. Changes --- Adds back the restricting of create temp function/macro to admin roles. This is only in effect if sql standard auth is enabled. Bugs: HIVE-6434 https://issues.apache.org/jira/browse/HIVE-6434 Repository: hive-git Description --- Add output entity of DB object to make sure only admin roles can add/drop functions/macros. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 68a25e0 ql/src/java/org/apache/hadoop/hive/ql/parse/MacroSemanticAnalyzer.java 0ae07e3 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java c43bcea ql/src/test/queries/clientnegative/authorization_create_func1.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_create_func2.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_create_macro1.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_create_func1.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_create_macro1.q PRE-CREATION ql/src/test/results/clientnegative/authorization_create_func1.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_create_func2.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_create_macro1.q.out PRE-CREATION ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out 747aa6a ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out 393a3e8 ql/src/test/results/clientnegative/create_function_nonexistent_db.q.out ebb069e ql/src/test/results/clientnegative/create_function_nonudf_class.q.out dd66afc ql/src/test/results/clientnegative/create_udaf_failure.q.out 3fc3d36 ql/src/test/results/clientnegative/create_unknown_genericudf.q.out af3d50b ql/src/test/results/clientnegative/create_unknown_udf_udaf.q.out e138fd0 ql/src/test/results/clientnegative/drop_native_udf.q.out 1913df9 ql/src/test/results/clientnegative/udf_function_does_not_implement_udf.q.out 9ea8668 ql/src/test/results/clientnegative/udf_local_resource.q.out b6ea77d ql/src/test/results/clientnegative/udf_nonexistent_resource.q.out ad70d54 ql/src/test/results/clientnegative/udf_test_error.q.out a788a10 ql/src/test/results/clientnegative/udf_test_error_reduce.q.out 98b42e0 ql/src/test/results/clientpositive/authorization_create_func1.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_create_macro1.q.out PRE-CREATION ql/src/test/results/clientpositive/autogen_colalias.q.out a074b96 ql/src/test/results/clientpositive/compile_processor.q.out 7e9bb29 ql/src/test/results/clientpositive/create_func1.q.out 5a249c3 ql/src/test/results/clientpositive/create_genericudaf.q.out 96fe2fa ql/src/test/results/clientpositive/create_genericudf.q.out bf1f4ac ql/src/test/results/clientpositive/create_udaf.q.out 2e86a36 ql/src/test/results/clientpositive/create_view.q.out ecc7618 ql/src/test/results/clientpositive/drop_udf.q.out 422933a ql/src/test/results/clientpositive/macro.q.out c483029 ql/src/test/results/clientpositive/ptf_register_tblfn.q.out 11c9724 ql/src/test/results/clientpositive/udaf_sum_list.q.out b1922d9 ql/src/test/results/clientpositive/udf_compare_java_string.q.out 8e6e365 ql/src/test/results/clientpositive/udf_context_aware.q.out 10414fa ql/src/test/results/clientpositive/udf_logic_java_boolean.q.out 88c1984 ql/src/test/results/clientpositive/udf_testlength.q.out 4d75482 
ql/src/test/results/clientpositive/udf_testlength2.q.out 8a1e03e ql/src/test/results/clientpositive/udf_using.q.out 69e5f3b ql/src/test/results/clientpositive/windowing_udaf2.q.out 5043a45 Diff: https://reviews.apache.org/r/18162/diff/ Testing --- positive/negative q files added Thanks, Jason Dere
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5843: - Status: Open (was: Patch Available) Found an NPE that shows up on a cluster but not in .q file tests. Will post a new version of the patch soon. Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.7.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160)