[jira] [Commented] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
[ https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030321#comment-14030321 ] Remus Rusanu commented on HIVE-7105: Can you share the rb link? Enable ReduceRecordProcessor to generate VectorizedRowBatches - Key: HIVE-7105 URL: https://issues.apache.org/jira/browse/HIVE-7105 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Rajesh Balamohan Assignee: Gopal V Fix For: 0.14.0 Attachments: HIVE-7105.1.patch, HIVE-7105.2.patch Currently, ReduceRecordProcessor sends one key/value pair at a time to its operator pipeline. It would be beneficial to send VectorizedRowBatches to downstream operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
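The batching idea described above can be sketched outside of Hive. This is a minimal, hypothetical illustration (not Hive's actual VectorizedRowBatch API): rows arriving one at a time are grouped into fixed-size batches, so downstream operators do one call per batch instead of one call per row.

```python
from itertools import islice

def batch_rows(rows, batch_size=1024):
    """Group an iterator of (key, value) rows into fixed-size batches.

    Mimics the idea behind VectorizedRowBatch: downstream operators
    receive one batch per call instead of one row per call.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

rows = ((k, k * 10) for k in range(2500))
batches = list(batch_rows(rows, batch_size=1024))
print([len(b) for b in batches])  # three batches: 1024, 1024, 452
```

The per-row virtual-call overhead is what vectorization amortizes; the last, short batch shows why batch consumers must read a batch's actual size rather than assume it is full.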
[jira] [Commented] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
[ https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030328#comment-14030328 ] Szehon Ho commented on HIVE-7220: - OK, never mind about this patch. Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7220.patch While looking at the root_dir_external_table.q failure, which queries an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to a failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. I tried with an external table in a normal HDFS directory, and it returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7005) MiniTez tests have non-deterministic explain plans
[ https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7005: - Attachment: HIVE-7005.1.patch I believe the problem was that the FileSinkOperators were kept in a HashSet in Tez. I've run the tests a few times with the patch and didn't get any non-deterministic output. MiniTez tests have non-deterministic explain plans -- Key: HIVE-7005 URL: https://issues.apache.org/jira/browse/HIVE-7005 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7005.1.patch TestMiniTezCliDriver has a few test failures where there is a diff in the explain plan generated. According to Vikram, the plan generated is correct, but the plan can be generated in a couple of different ways and so sometimes the plan will not diff against the expected output. We should probably come up with a way to validate this explain plan in a reproducible way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7005) MiniTez tests have non-deterministic explain plans
[ https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030335#comment-14030335 ] Gunther Hagleitner commented on HIVE-7005: -- rb: https://reviews.apache.org/r/22547 MiniTez tests have non-deterministic explain plans -- Key: HIVE-7005 URL: https://issues.apache.org/jira/browse/HIVE-7005 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7005.1.patch TestMiniTezCliDriver has a few test failures where there is a diff in the explain plan generated. According to Vikram, the plan generated is correct, but the plan can be generated in a couple of different ways and so sometimes the plan will not diff against the expected output. We should probably come up with a way to validate this explain plan in a reproducible way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7005) MiniTez tests have non-deterministic explain plans
[ https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7005: - Status: Patch Available (was: Open) MiniTez tests have non-deterministic explain plans -- Key: HIVE-7005 URL: https://issues.apache.org/jira/browse/HIVE-7005 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7005.1.patch TestMiniTezCliDriver has a few test failures where there is a diff in the explain plan generated. According to Vikram, the plan generated is correct, but the plan can be generated in a couple of different ways and so sometimes the plan will not diff against the expected output. We should probably come up with a way to validate this explain plan in a reproducible way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030340#comment-14030340 ] Lefty Leverenz commented on HIVE-7158: -- Does the design doc need guidance about this (or is it time to add Tez documentation to the user docs)? * [Hive on Tez | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez] At a minimum, Configuration Properties needs to document these parameters: * new parameter: hive.tez.auto.reducer.parallelism * new parameter: hive.tez.max.partition.factor * new parameter: hive.tez.min.partition.factor * new default for [hive.exec.reducers.bytes.per.reducer | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.bytes.per.reducer] (with version information) * new default for [hive.exec.reducers.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.max] (with version information) Use Tez auto-parallelism in Hive Key: HIVE-7158 URL: https://issues.apache.org/jira/browse/HIVE-7158 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch, HIVE-7158.5.patch Tez can optionally sample data from a fraction of the tasks of a vertex and use that information to choose the number of downstream tasks for any given scatter-gather edge. Hive estimates the count of reducers by looking at stats and estimates for each operator in the operator pipeline leading up to the reducer. However, if this estimate turns out to be too large, Tez can rein in the resources used to compute the reducer. It does so by combining partitions of the upstream vertex. It cannot, however, add reducers at this stage. I'm proposing to let users specify whether they want to use auto-parallelism or not.
If they do, there will be scaling factors to determine the max and min number of reducers Tez can choose from. We will then partition by the max number of reducers, letting Tez sample and rein in the count down to the specified min. -- This message was sent by Atlassian JIRA (v6.2#6252)
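The interplay of these settings can be sketched with some illustrative arithmetic. This is a simplified model, not Hive's actual code: the initial estimate comes from input size divided by hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max, and the min/max partition factors bound the range Tez may later sample within.

```python
def reducer_range(input_bytes, bytes_per_reducer, min_factor, max_factor, reducers_max):
    """Illustrative model of the reducer count under Tez auto-parallelism.

    Hive first estimates a reducer count from data size, then scales it by
    the min/max partition factors; Tez may later sample runtime data and
    shrink the count within [min_reducers, max_reducers].
    """
    estimate = max(1, -(-input_bytes // bytes_per_reducer))  # ceiling division
    estimate = min(estimate, reducers_max)                   # hive.exec.reducers.max cap
    max_reducers = max(1, int(estimate * max_factor))        # hive.tez.max.partition.factor
    min_reducers = max(1, int(estimate * min_factor))        # hive.tez.min.partition.factor
    return min_reducers, max_reducers

# 10 GB of input at 256 MB per reducer -> estimate of 40 reducers,
# scaled to a [10, 80] range with factors 0.25 and 2.0.
print(reducer_range(10 * 2**30, 256 * 2**20, 0.25, 2.0, 1009))  # (10, 80)
```

The asymmetry noted in the description follows from this model: Tez can only merge partitions downward from the max, never add reducers beyond it, which is why Hive starts at the high end of the range.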
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7158: - Labels: TODOC14 (was: ) Use Tez auto-parallelism in Hive Key: HIVE-7158 URL: https://issues.apache.org/jira/browse/HIVE-7158 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch, HIVE-7158.5.patch Tez can optionally sample data from a fraction of the tasks of a vertex and use that information to choose the number of downstream tasks for any given scatter-gather edge. Hive estimates the count of reducers by looking at stats and estimates for each operator in the operator pipeline leading up to the reducer. However, if this estimate turns out to be too large, Tez can rein in the resources used to compute the reducer. It does so by combining partitions of the upstream vertex. It cannot, however, add reducers at this stage. I'm proposing to let users specify whether they want to use auto-parallelism or not. If they do, there will be scaling factors to determine the max and min number of reducers Tez can choose from. We will then partition by the max number of reducers, letting Tez sample and rein in the count down to the specified min. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
[ https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7220: Status: Open (was: Patch Available) Cancelling for now, unless there's interest in having a workaround in Hive. It will not be necessary to pursue if MAPREDUCE-5756 is fixed. Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7220.patch While looking at the root_dir_external_table.q failure, which queries an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to a failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. I tried with an external table in a normal HDFS directory, and it returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030351#comment-14030351 ] Ravi Prakash commented on HIVE-7100: Purge is an acceptable option for us. Thanks Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
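Based on the "purge" option Ravi accepts above, the eventual user-facing syntax might look like the following. This is a sketch only; the exact form was still under review in this thread, and the table name is illustrative:

```sql
-- Drop the table and delete its data immediately, bypassing the HDFS trash.
DROP TABLE IF EXISTS staging_events PURGE;
```

This keeps ordinary DROP TABLE semantics (data goes to the trash and still counts against quota until expunged) unchanged unless the user opts in.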
[jira] [Commented] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers
[ https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030365#comment-14030365 ] Sushanth Sowmyan commented on HIVE-7209: Looks good to me. +1. allow metastore authorization api calls to be restricted to certain invokers Key: HIVE-7209 URL: https://issues.apache.org/jira/browse/HIVE-7209 Project: Hive Issue Type: Bug Components: Authentication, Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch Any user who has direct access to metastore can make metastore api calls that modify the authorization policy. The users who can make direct metastore api calls in a secure cluster configuration are usually the 'cluster insiders' such as Pig and MR users, who are not (securely) covered by the metastore based authorization policy. But it makes sense to disallow access from such users as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030369#comment-14030369 ] Sushanth Sowmyan commented on HIVE-6584: Teng, I'd be interested in how your patch winds up being. If you mean that at runtime, the HBaseStorageHandler decides to deputize a subclass of itself to do the work, then that might work. But if you mean that your approach would lead to the user having to create a separate table (kinda like a view) that associates with a snapshot, then speaking from the hive side, I think I would prefer having only one SH to deal with, and having it decide what to do with various set parameters as opposed to creating separate hive tables with a different SH in hive. That way, using the same hive table definition, a query could decide to use a snapshot or not. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Documentation Policy
One more question: what should we do after the documentation is done for a JIRA ticket? (a) Just remove the TODOC## label. (b) Replace TODOC## with docdone (no caps, no version number). (c) Add a docdone label but keep TODOC##. (d) Something else. -- Lefty On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com wrote: Thank you guys! This is great work. On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Going through the issues, I think overall Lefty did an awesome job catching and documenting most of them in time. Following are some of the 0.13 and 0.14 ones I found which either do not have documentation or have outdated documentation, and probably need it to be consumable. Contributors, feel free to remove the label if you disagree. *TODOC13:* https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed) *TODOC14:* https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed) I'll continue digging through the queue going backwards to 0.12 and 0.11 and see if I find similar stuff there as well. On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Feel free to label such jiras with this keyword and ask the contributors for more information if you need any. Cool. I'll start chugging through the queue today adding labels as apt. On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote: Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13? Sounds good to me. -- Swarnim
[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030391#comment-14030391 ] Lefty Leverenz commented on HIVE-7022: -- No user doc needed, right? Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.14.0 Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, the existing Hadoop class, BytesWritable, already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected results. The proposal here is to replace it with Hadoop's BytesWritable. The issue was identified in HIVE-6367; this serves as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Status: Patch Available (was: Open) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
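The equivalence behind this rewrite is easy to demonstrate outside Hive. A small illustrative sketch (not Hive's implementation): for an inner join on a non-null-safe condition, filtering null keys from each side first cannot change the result, because a null key never matches anything under SQL equality.

```python
def inner_join(a_rows, b_rows):
    """Naive inner join on the first field; SQL semantics: NULL (None) never matches."""
    return [
        (ak, av, bv)
        for ak, av in a_rows
        for bk, bv in b_rows
        if ak is not None and bk is not None and ak == bk
    ]

A = [(1, "a1"), (None, "a2"), (2, "a3")]
B = [(1, "b1"), (2, "b2"), (None, "b3")]

direct = inner_join(A, B)
prefiltered = inner_join(
    [r for r in A if r[0] is not None],  # A where x is not null
    [r for r in B if r[0] is not None],  # B where y is not null
)
assert direct == prefiltered  # the pushed-down filter preserves the join result
print(direct)  # [(1, 'a1', 'b1'), (2, 'a3', 'b2')]
```

In the real query, the win is that null-keyed rows are dropped before the shuffle, which also avoids piling all the nulls onto one reducer (the skew case the description mentions).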
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Attachment: HIVE-7159.3.patch +golden file updates For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7155) WebHCat controller job exceeds container memory limit
[ https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7155: - Labels: TODOC14 (was: ) WebHCat controller job exceeds container memory limit - Key: HIVE-7155 URL: https://issues.apache.org/jira/browse/HIVE-7155 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7155.1.patch, HIVE-7155.2.patch, HIVE-7155.patch Submitting a Hive query on a large table via WebHCat results in failure because the WebHCat controller job is killed by YARN, since it exceeds the memory limit (set by mapreduce.map.memory.mb, which defaults to 1GB): {code} INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and LogTimestamp <= '2014-03-01 01:00:00'; {code} We could increase mapreduce.map.memory.mb to solve this problem, but that way we would be changing the setting system-wide. We need to provide a WebHCat configuration to override mapreduce.map.memory.mb when submitting the controller job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit
[ https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030413#comment-14030413 ] Lefty Leverenz commented on HIVE-7155: -- Need to document *templeton.mapper.memory.mb* in the wiki with version information (0.14.0): * [WebHCat Configuration: Configuration Variables | https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables] WebHCat controller job exceeds container memory limit - Key: HIVE-7155 URL: https://issues.apache.org/jira/browse/HIVE-7155 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7155.1.patch, HIVE-7155.2.patch, HIVE-7155.patch Submitting a Hive query on a large table via WebHCat results in failure because the WebHCat controller job is killed by YARN, since it exceeds the memory limit (set by mapreduce.map.memory.mb, which defaults to 1GB): {code} INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and LogTimestamp <= '2014-03-01 01:00:00'; {code} We could increase mapreduce.map.memory.mb to solve this problem, but that way we would be changing the setting system-wide. We need to provide a WebHCat configuration to override mapreduce.map.memory.mb when submitting the controller job. -- This message was sent by Atlassian JIRA (v6.2#6252)
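Based on the parameter named in the comment above, the webhcat-site.xml entry would presumably look like this (a sketch; the value shown is an arbitrary example, not a recommended default):

```xml
<property>
  <name>templeton.mapper.memory.mb</name>
  <!-- Overrides mapreduce.map.memory.mb for the WebHCat controller job only,
       leaving the cluster-wide MapReduce setting untouched. -->
  <value>2048</value>
</property>
```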
[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030414#comment-14030414 ] Hive QA commented on HIVE-7224: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650083/HIVE-7224.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5535 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmpty org.apache.hive.beeline.TestBeeLineWithArgs.testNullEmptyCmdArg org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/453/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/453/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-453/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650083 Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7224.1.patch See HIVE-7221.
By default, Beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, Beeline has the option of incremental printing. We should make that the default instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
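Until the default changes, users can presumably opt in per session with Beeline's --incremental flag (the JDBC URL and table name here are placeholders):

```shell
# Stream rows as they arrive instead of buffering the whole result set.
beeline -u jdbc:hive2://localhost:10000 --incremental=true -e "SELECT * FROM big_table"
```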
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6473: - Labels: TODOC14 (was: ) Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch, HIVE-6473.1.patch.txt, HIVE-6473.2.patch, HIVE-6473.3.patch, HIVE-6473.4.patch, HIVE-6473.5.patch, HIVE-6473.6.patch Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7136: - Labels: TODOC14 (was: ) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system --- Key: HIVE-7136 URL: https://issues.apache.org/jira/browse/HIVE-7136 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.13.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Priority: Minor Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7136.01.patch, HIVE-7136.patch The current Hive CLI assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in the Hadoop ecosystem (HDFS, S3, etc.) as well, keeping the default behavior intact: the source file is read from the default (local) file system when no scheme is provided in its URL. -- This message was sent by Atlassian JIRA (v6.2#6252)
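The intended usage would presumably look like the following; the paths, namenode address, and bucket name are all hypothetical examples, not from the patch:

```shell
# Default behavior (no scheme): script read from the local file system.
hive -f /tmp/query.hql

# With this change, a URI scheme can select another supported file system.
hive -f hdfs://namenode:8020/scripts/query.hql
hive -f s3n://mybucket/scripts/query.hql
```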
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Attachment: HIVE-7159.4.patch For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Status: Patch Available (was: Open) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Status: Open (was: Patch Available) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7143) Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval)
[ https://issues.apache.org/jira/browse/HIVE-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030424#comment-14030424 ] Lefty Leverenz commented on HIVE-7143: -- What user doc does this need? * [Language Manual -- Windowing and Analytics Functions | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics] Add Streaming support in Windowing mode for more UDAFs (min/max, lead/lag, fval/lval) - Key: HIVE-7143 URL: https://issues.apache.org/jira/browse/HIVE-7143 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-7143.1.patch, HIVE-7143.3.patch Provided streaming implementations for the above functions. Min/Max are based on the algorithm by Daniel Lemire: http://www.archipel.uqam.ca/309/1/webmaximinalgo.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
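The Lemire algorithm referenced above maintains a monotonic deque of candidate maxima so each window's max is available in amortized O(1) per row. A rough standalone sketch of the idea (not the Hive UDAF code):

```python
from collections import deque

def sliding_max(values, window):
    """Streaming max over a fixed-size sliding window, using the
    monotonic-deque idea from Lemire's algorithm. Emits one max per
    full window; O(n) total work since each index enters/leaves once."""
    dq = deque()  # indices of candidate maxima, values strictly decreasing
    out = []
    for i, v in enumerate(values):
        while dq and values[dq[-1]] <= v:
            dq.pop()           # dominated values can never be a future max
        dq.append(i)
        if dq[0] <= i - window:
            dq.popleft()       # drop the candidate that left the window
        if i >= window - 1:
            out.append(values[dq[0]])
    return out

print(sliding_max([1, 3, 2, 5, 4, 1], 3))  # [3, 5, 5, 5]
```

The same structure with the comparison flipped gives a streaming min, which is why the JIRA covers both in one patch.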
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7168: - Labels: TODOC14 (was: ) Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7168.1.patch, HIVE-7168.2.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
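Judging by the JIRA title, the intended usage is presumably that the column list can be omitted when stats are wanted for every column (a sketch; the table and column names are illustrative):

```sql
-- Before: every column had to be named explicitly.
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS id, amount, ts;

-- After this change: omit the list to collect stats for all columns.
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;
```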
[jira] [Updated] (HIVE-7119) Extended ACL's should be inherited if warehouse perm inheritance enabled
[ https://issues.apache.org/jira/browse/HIVE-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7119: - Labels: TODOC14 (was: ) Extended ACL's should be inherited if warehouse perm inheritance enabled Key: HIVE-7119 URL: https://issues.apache.org/jira/browse/HIVE-7119 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7119.2.patch, HIVE-7119.3.patch, HIVE-7119.4.patch, HIVE-7119.patch HDFS recently came out with support for extended ACLs, i.e. permissions for a specific group/user in addition to the general owner/group/other permissions. Hive permission inheritance should inherit those as well, if the user has set them at any point in the warehouse directory. -- This message was sent by Atlassian JIRA (v6.2#6252)
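For context, HDFS extended ACLs of the kind Hive would need to propagate are managed with the hdfs dfs -setfacl/-getfacl commands; the path and principal below are illustrative:

```shell
# Grant user 'analyst' rwx on a warehouse directory via an extended ACL entry.
hdfs dfs -setfacl -m user:analyst:rwx /user/hive/warehouse/mydb.db

# Inspect the ACL entries Hive inheritance would need to copy downward.
hdfs dfs -getfacl /user/hive/warehouse/mydb.db
```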
[jira] [Updated] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
[ https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6586: - Labels: TODOC14 (was: ) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos) --- Key: HIVE-6586 URL: https://issues.apache.org/jira/browse/HIVE-6586 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Lefty Leverenz Labels: TODOC14 HIVE-6037 puts the definitions of configuration parameters into the HiveConf.java file, but several recent jiras for release 0.13.0 introduce new parameters that aren't in HiveConf.java yet and some parameter definitions need to be altered for 0.13.0. This jira will patch HiveConf.java after HIVE-6037 gets committed. Also, four typos patched in HIVE-6582 need to be fixed in the new HiveConf.java. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7050: - Labels: TODOC14 (was: ) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Components: Statistics Reporter: Prasanth J Assignee: Prasanth J Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7062) Support Streaming mode in Windowing
[ https://issues.apache.org/jira/browse/HIVE-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7062: - Labels: TODOC14 (was: ) Support Streaming mode in Windowing --- Key: HIVE-7062 URL: https://issues.apache.org/jira/browse/HIVE-7062 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7062.1.patch, HIVE-7062.4.patch, HIVE-7062.5.patch, HIVE-7062.6.patch 1. Have the Windowing Table Function support streaming mode. 2. Have special handling for Ranking UDAFs. 3. Have special handling for Sum/Avg for fixed-size windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5072) [WebHCat]Enable directly invoke Sqoop job through Templeton
[ https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-5072: - Labels: TODOC14 (was: ) [WebHCat]Enable directly invoke Sqoop job through Templeton --- Key: HIVE-5072 URL: https://issues.apache.org/jira/browse/HIVE-5072 Project: Hive Issue Type: Improvement Components: WebHCat Affects Versions: 0.12.0 Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, HIVE-5072.3.patch, HIVE-5072.4.patch, HIVE-5072.5.patch, Templeton-Sqoop-Action.pdf Now it is hard to invoke a Sqoop job through Templeton. The only way is to use the classpath jar generated by a Sqoop job and use the jar delegator in Templeton. We should implement a Sqoop Delegator to enable directly invoking a Sqoop job through Templeton. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6940) [WebHCat]Update documentation for Templeton-Sqoop action
[ https://issues.apache.org/jira/browse/HIVE-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6940: - Labels: TODOC14 (was: ) [WebHCat]Update documentation for Templeton-Sqoop action Key: HIVE-6940 URL: https://issues.apache.org/jira/browse/HIVE-6940 Project: Hive Issue Type: Bug Components: Documentation, WebHCat Affects Versions: 0.14.0 Reporter: Shuaishuai Nie Labels: TODOC14 WebHCat documentation needs to be updated based on the new feature introduced in HIVE-5072. Here are some examples using the endpoint templeton/v1/sqoop example1: (passing the Sqoop command directly) curl -s -d command=import --connect jdbc:sqlserver://localhost:4033;databaseName=SqoopDB;user=hadoop;password=password --table mytable --target-dir user/hadoop/importtable -d statusdir=sqoop.output 'http://localhost:50111/templeton/v1/sqoop?user.name=hadoop' example2: (passing a source file which contains the Sqoop command) curl -s -d optionsfile=/sqoopcommand/command0.txt -d statusdir=sqoop.output 'http://localhost:50111/templeton/v1/sqoop?user.name=hadoop' example3: (using --options-file in the middle of the Sqoop command to enable reusing part of the Sqoop command, like the connection string) curl -s -d files=/sqoopcommand/command1.txt,/sqoopcommand/command2.txt -d command=import --options-file command1.txt --options-file command2.txt -d statusdir=sqoop.output 'http://localhost:50111/templeton/v1/sqoop?user.name=hadoop' Also, for users to pass their JDBC driver jar, they can use the -libjars generic option in the Sqoop command. This is functionality provided by Sqoop. A set of parameters can be passed to the endpoint: command (Sqoop command string to run) optionsfile (Options file which contains the Sqoop command to run; each section of the Sqoop command separated by a space should be a single line in the options file) files (Comma-separated files to be copied to the map-reduce cluster) statusdir (A directory where WebHCat will write the status of the Sqoop job. 
If provided, it is the caller’s responsibility to remove this directory when done) callback (Define a URL to be called upon job completion. You may embed a specific job ID into the URL using $jobId. This tag will be replaced in the callback URL with the job’s job ID. ) enablelog (when set to true, WebHCat will upload the job log to statusdir. statusdir must be defined when this is enabled) All the above parameters are optional, but users have to provide either command or optionsfile in the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5072) [WebHCat]Enable directly invoke Sqoop job through Templeton
[ https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030434#comment-14030434 ] Lefty Leverenz commented on HIVE-5072: -- Doc jira for this feature is HIVE-6940. [WebHCat]Enable directly invoke Sqoop job through Templeton --- Key: HIVE-5072 URL: https://issues.apache.org/jira/browse/HIVE-5072 Project: Hive Issue Type: Improvement Components: WebHCat Affects Versions: 0.12.0 Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, HIVE-5072.3.patch, HIVE-5072.4.patch, HIVE-5072.5.patch, Templeton-Sqoop-Action.pdf Now it is hard to invoke a Sqoop job through Templeton. The only way is to use the classpath jar generated by a Sqoop job and use the jar delegator in Templeton. We should implement a Sqoop Delegator to enable directly invoking a Sqoop job through Templeton. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7227) Configuration parameters without descriptions
Lefty Leverenz created HIVE-7227: Summary: Configuration parameters without descriptions Key: HIVE-7227 URL: https://issues.apache.org/jira/browse/HIVE-7227 Project: Hive Issue Type: Bug Components: Documentation Reporter: Lefty Leverenz More than 50 configuration parameters lack descriptions in hive-default.xml.template (or in HiveConf.java, after HIVE-6037 gets committed). They are listed by release number in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7227) Configuration parameters without descriptions
[ https://issues.apache.org/jira/browse/HIVE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030445#comment-14030445 ] Lefty Leverenz commented on HIVE-7227: -- Here's a list (possibly incomplete) of 51 Hive configuration parameters that don't have descriptions in hive-default.xml.template. Parameters created after Hive 0.13.0 are not covered here. _Release 1 or 2_ hive.exec.submitviachild hive.metastore.metadb.dir hive.jar.path hive.aux.jars.path hive.table.name hive.partition.name hive.alias _Release 3_ hive.cli.errors.ignore _Release 4_ hive.added.files.path hive.added.jars.path _Release 5_ hive.intermediate.compression.codec hive.intermediate.compression.type hive.added.archives.path _Release 6_ hive.metastore.archive.intermediate.archived hive.metastore.archive.intermediate.extracted hive.mapred.partitioner hive.exec.script.trust hive.hadoop.supports.splittable.combineinputformat _Release 7_ hive.lockmgr.zookeeper.default.partition.name hive.metastore.fs.handler.class hive.query.result.fileformat hive.hashtable.initialCapacity hive.hashtable.loadfactor hive.debug.localtask hive.lock.manager hive.outerjoin.supports.filters hive.semantic.analyzer.hook _Release 8_ hive.exec.job.debug.timeout hive.exec.tasklog.debug.timeout hive.merge.rcfile.block.level hive.merge.input.format.block.level hive.merge.current.job.has.dynamic.partitions hive.stats.collect.rawdatasize _Release 8.1_ hive.optimize.metadataonly _Release 9_ _Release 10_ _Release 11_ hive.exec.rcfile.use.sync.cache hive.stats.key.prefix _(internal)_ _Release 12_ hive.scratch.dir.permission datanucleus.fixedDatastore datanucleus.rdbms.useLegacyNativeValueStrategy hive.optimize.sampling.orderby _(internal?)_ hive.optimize.sampling.orderby.number hive.optimize.sampling.orderby.percent hive.server2.authentication.ldap.Domain hive.server2.session.hook hive.typecheck.on.insert _Release 13_ hive.metastore.expression.proxy hive.txn.manager hive.stageid.rearrange 
hive.explain.dependency.append.tasktype hive.compute.splits.in.am _(comment in HiveConf.java can be used as description)_ hive.rpc.query.plan _(comment in HiveConf.java can be used as description)_ Configuration parameters without descriptions - Key: HIVE-7227 URL: https://issues.apache.org/jira/browse/HIVE-7227 Project: Hive Issue Type: Bug Components: Documentation Reporter: Lefty Leverenz More than 50 configuration parameters lack descriptions in hive-default.xml.template (or in HiveConf.java, after HIVE-6037 gets committed). They are listed by release number in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: 49 config params without descriptions
This list of Hive configuration parameters without descriptions has been transferred to HIVE-7227 https://issues.apache.org/jira/browse/HIVE-7227. -- Lefty On Tue, Apr 22, 2014 at 2:58 AM, Lefty Leverenz leftylever...@gmail.com wrote: Found two more from HIVE-5522 https://issues.apache.org/jira/browse/HIVE-5522 (also HIVE-6098 https://issues.apache.org/jira/browse/HIVE-6098, Merge Tez branch into trunk) so the current total is 51 configs that don't have descriptions in 0.13.0: *Release 13 * hive.compute.splits.in.am hive.rpc.query.plan But these both have comments in HiveConf.java that can be used as descriptions, although they aren't included in hive-default.xml.template. I missed them because I was working from the patch for HIVE-6037 https://issues.apache.org/jira/browse/HIVE-6037 and Navis had used the HiveConf comments for descriptions. (That means there could be more parameters missing from the 0.13.0 template file.) -- Lefty On Mon, Apr 14, 2014 at 1:53 AM, Lefty Leverenz leftylever...@gmail.com wrote: Here's a list of 49 configuration parameters in RC0 (and trunk) that don't have descriptions in hive-default.xml.template: *Release 1 or 2 * hive.exec.submitviachild hive.metastore.metadb.dir hive.jar.path hive.aux.jars.path hive.table.name hive.partition.name hive.alias *Release 3 * hive.cli.errors.ignore *Release 4 * hive.added.files.path hive.added.jars.path *Release 5 * hive.intermediate.compression.codec hive.intermediate.compression.type hive.added.archives.path *Release 6 * hive.metastore.archive.intermediate.archived hive.metastore.archive.intermediate.extracted hive.mapred.partitioner hive.exec.script.trust hive.hadoop.supports.splittable.combineinputformat *Release 7 * hive.lockmgr.zookeeper.default.partition.name hive.metastore.fs.handler.class hive.query.result.fileformat hive.hashtable.initialCapacity hive.hashtable.loadfactor hive.debug.localtask hive.lock.manager hive.outerjoin.supports.filters hive.semantic.analyzer.hook *Release 8 * 
hive.exec.job.debug.timeout hive.exec.tasklog.debug.timeout hive.merge.rcfile.block.level hive.merge.input.format.block.level hive.merge.current.job.has.dynamic.partitions hive.stats.collect.rawdatasize *Release 8.1 * hive.optimize.metadataonly *Release 9 * *Release 10 * *Release 11 * hive.exec.rcfile.use.sync.cache hive.stats.key.prefix--- *internal* *Release 12 * hive.scratch.dir.permission datanucleus.fixedDatastore datanucleus.rdbms.useLegacyNativeValueStrategy hive.optimize.sampling.orderby --- *internal?* hive.optimize.sampling.orderby.number hive.optimize.sampling.orderby.percent hive.server2.authentication.ldap.Domain hive.server2.session.hook hive.typecheck.on.insert *Release 13 * hive.metastore.expression.proxy hive.txn.manager hive.stageid.rearrange hive.explain.dependency.append.tasktype What's the best way to deal with these? 1. Ignore them (or identify those that can be ignored). 2. Add some descriptions in Hive 0.13.0 RC1. 3. Deal with them after HIVE-6037 https://issues.apache.org/jira/browse/HIVE-6037 gets committed. - Try to cover all of them by Hive 0.14.0: - Put the list in a JIRA and create a common HiveConf.java patch, which can be appended until release 0.14.0 is ready. - Accumulate descriptions in JIRA comments, then create a patch from the comments. - Deal with them as soon as possible: - Put the list in an umbrella JIRA and use sub-task JIRAs to add descriptions individually or in small groups. 4. Deal with them in the wiki, then patch HiveConf.java before release 0.14.0. 5. [Your idea goes here.] -- Lefty
[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030448#comment-14030448 ] Lefty Leverenz commented on HIVE-6037: -- HIVE-7227 lists 51 parameters in releases up to 0.13 that don't have descriptions. Synchronize HiveConf with hive-default.xml.template and support show conf - Key: HIVE-6037 URL: https://issues.apache.org/jira/browse/HIVE-6037 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.14.0 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.2.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch see HIVE-5879 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
[ https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030451#comment-14030451 ] Lefty Leverenz commented on HIVE-6586: -- See HIVE-7227 for a list of parameters that don't have descriptions yet. Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos) --- Key: HIVE-6586 URL: https://issues.apache.org/jira/browse/HIVE-6586 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Lefty Leverenz Labels: TODOC14 HIVE-6037 puts the definitions of configuration parameters into the HiveConf.java file, but several recent jiras for release 0.13.0 introduce new parameters that aren't in HiveConf.java yet and some parameter definitions need to be altered for 0.13.0. This jira will patch HiveConf.java after HIVE-6037 gets committed. Also, four typos patched in HIVE-6582 need to be fixed in the new HiveConf.java. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6430: - Labels: TODOC14 (was: ) MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.14.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single-row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030467#comment-14030467 ] Lefty Leverenz commented on HIVE-6430: -- The configuration parameters *hive.mapjoin.optimized.hashtable* and *hive.mapjoin.optimized.hashtable.wbsize* need to be documented in the wiki for release 0.14.0. * [Hive Configuration Properties | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties] MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.14.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single-row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
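For documentation purposes, the two parameters named in the comment above are toggled like any other Hive setting. A minimal sketch of a Hive session (the wbsize value shown is illustrative only, not the shipped default):

```sql
-- Use the new low-overhead MapJoin hash table implementation from HIVE-6430
SET hive.mapjoin.optimized.hashtable=true;
-- Write-buffer size in bytes for the optimized hash table (illustrative value)
SET hive.mapjoin.optimized.hashtable.wbsize=8388608;
```

The exact defaults and value ranges should be confirmed against HiveConf.java before being added to the wiki.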
[jira] [Updated] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6187: - Labels: TODOC14 (was: ) Add test to verify that DESCRIBE TABLE works with quoted table names Key: HIVE-6187 URL: https://issues.apache.org/jira/browse/HIVE-6187 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Andy Mok Assignee: Carl Steinbach Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6187.1.patch Backticks around tables named after special keywords, such as items, allow us to create, drop, and alter the table. For example
{code:sql}
CREATE TABLE foo.`items` (bar INT);
DROP TABLE foo.`items`;
ALTER TABLE `items` RENAME TO `items_`;
{code}
However, we cannot call
{code:sql}
DESCRIBE foo.`items`;
DESCRIBE `items`;
{code}
The DESCRIBE query does not permit backticks to surround table names. The error returned is
{code:sql}
FAILED: SemanticException [Error 10001]: Table not found `items`
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030498#comment-14030498 ] Lefty Leverenz commented on HIVE-6187: -- This fix should be documented in the wiki for 0.14.0. * [Language Manual -- DDL -- Describe | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe] Add test to verify that DESCRIBE TABLE works with quoted table names Key: HIVE-6187 URL: https://issues.apache.org/jira/browse/HIVE-6187 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Andy Mok Assignee: Carl Steinbach Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6187.1.patch Backticks around tables named after special keywords, such as items, allow us to create, drop, and alter the table. For example
{code:sql}
CREATE TABLE foo.`items` (bar INT);
DROP TABLE foo.`items`;
ALTER TABLE `items` RENAME TO `items_`;
{code}
However, we cannot call
{code:sql}
DESCRIBE foo.`items`;
DESCRIBE `items`;
{code}
The DESCRIBE query does not permit backticks to surround table names. The error returned is
{code:sql}
FAILED: SemanticException [Error 10001]: Table not found `items`
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6411) Support more generic way of using composite key for HBaseHandler
[ https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030506#comment-14030506 ] Lefty Leverenz commented on HIVE-6411: -- The release note says this should be documented at the Hive-HBase Integration page, which is in the Design Docs: * [Design Docs -- Completed: Hive HBase Integration | https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration] Support more generic way of using composite key for HBaseHandler Key: HIVE-6411 URL: https://issues.apache.org/jira/browse/HIVE-6411 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt HIVE-2599 introduced using a custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is in turn an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.
{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6411) Support more generic way of using composite key for HBaseHandler
[ https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6411: - Labels: TODOC14 (was: ) Support more generic way of using composite key for HBaseHandler Key: HIVE-6411 URL: https://issues.apache.org/jira/browse/HIVE-6411 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt HIVE-2599 introduced using a custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is in turn an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.
{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6500) Stats collection via filesystem
[ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6500: - Labels: TODOC14 (was: ) Stats collection via filesystem --- Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Labels: TODOC14 Fix For: 0.13.0 Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch Recently, support for stats gathering via counters was [added | https://issues.apache.org/jira/browse/HIVE-4632]. Although it's useful, it has the following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] * [Number of distinct counter groups is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] * [Number of distinct counters is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although these limits are configurable, setting them to a higher value implies increased memory load on the AM and the job history server. 
Now, whether these limits make sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680], but it is desirable that Hive not make use of the counter features of the framework, so that we can evolve this feature without relying on support from the framework. Filesystem-based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7099) Add Decimal datatype support for Windowing
[ https://issues.apache.org/jira/browse/HIVE-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7099: - Labels: TODOC14 (was: ) Add Decimal datatype support for Windowing -- Key: HIVE-7099 URL: https://issues.apache.org/jira/browse/HIVE-7099 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7099.1.patch, HIVE-7099.2.patch Decimal datatype is not handled by Windowing -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7061) sql std auth - insert queries without overwrite should not require delete privileges
[ https://issues.apache.org/jira/browse/HIVE-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7061: - Labels: TODOC14 (was: ) sql std auth - insert queries without overwrite should not require delete privileges Key: HIVE-7061 URL: https://issues.apache.org/jira/browse/HIVE-7061 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7061.1.patch, HIVE-7061.2.patch, HIVE-7061.3.patch Insert queries can do the equivalent of delete and insert of all rows of a table or partition, if the overwrite keyword is used. As a result DELETE privilege is applicable to such queries. However, SQL Standard auth requires DELETE privilege even for queries that don't have the overwrite keyword. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6367) Implement Decimal in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6367: - Labels: Parquet TODOC14 (was: Parquet) Implement Decimal in ParquetSerde - Key: HIVE-6367 URL: https://issues.apache.org/jira/browse/HIVE-6367 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Xuefu Zhang Labels: Parquet, TODOC14 Fix For: 0.14.0 Attachments: HIVE-6367.patch, dec.parq Some code in the Parquet SerDe deals with decimal and other code does not. For example, in ETypeConverter we convert Decimal to double (which is invalid), whereas in DataWritableWriter and other locations we throw an exception if decimal is used. This JIRA is to implement decimal support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5961) Add explain authorize for checking privileges
[ https://issues.apache.org/jira/browse/HIVE-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-5961: - Labels: TODOC14 (was: ) Add explain authorize for checking privileges - Key: HIVE-5961 URL: https://issues.apache.org/jira/browse/HIVE-5961 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Navis Assignee: Navis Priority: Trivial Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5961.1.patch.txt, HIVE-5961.2.patch.txt, HIVE-5961.3.patch.txt, HIVE-5961.4.patch.txt, HIVE-5961.5.patch.txt, HIVE-5961.6.patch.txt For easy checking of needed privileges for a query:
{noformat}
explain authorize select * from src join srcpart
INPUTS:
default@srcpart
default@srcpart@ds=2008-04-08/hr=11
default@srcpart@ds=2008-04-08/hr=12
default@srcpart@ds=2008-04-09/hr=11
default@srcpart@ds=2008-04-09/hr=12
default@src
OUTPUTS:
file:/home/navis/apache/oss-hive/itests/qtest/target/tmp/localscratchdir/hive_2013-12-04_21-57-53_748_5323811717799107868-1/-mr-1
CURRENT_USER: hive_test_user
OPERATION: QUERY
AUTHORIZATION_FAILURES:
No privilege 'Select' found for inputs { database:default, table:srcpart, columnName:key}
No privilege 'Select' found for inputs { database:default, table:src, columnName:key}
No privilege 'Select' found for inputs { database:default, table:src, columnName:key}
{noformat}
Hopefully good for debugging of authorization, which is in progress on HIVE-5837. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030560#comment-14030560 ] Hive QA commented on HIVE-6928: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650101/HIVE-6928.2.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5610 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/454/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/454/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-454/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12650101 Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.2.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like: {code} +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | | | | Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto | +-+--+ 5 rows selected (0.4 seconds) {code} This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7065: - Labels: TODOC14 (was: ) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to Hive client on node executing a job submitted through WebHCat (hive query, for example). this should include hive.execution.engine -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6122) Implement show grant on resource
[ https://issues.apache.org/jira/browse/HIVE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6122: - Labels: TODOC13 (was: ) Implement show grant on resource -- Key: HIVE-6122 URL: https://issues.apache.org/jira/browse/HIVE-6122 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-6122.1.patch.txt, HIVE-6122.2.patch.txt, HIVE-6122.3.patch.txt, HIVE-6122.4.patch, HIVE-6122.4.patch, HIVE-6122.5.patch, HIVE-6122.6.patch Currently, hive shows privileges owned by a principal. Reverse API is also needed, which shows all principals for a resource. {noformat} show grant user hive_test_user on database default; show grant user hive_test_user on table dummy; show grant user hive_test_user on all; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030624#comment-14030624 ] Xuefu Zhang commented on HIVE-7200: --- The result looks good. Could you update RB with your latest patch? Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch A few minor/cosmetic issues with the beeline CLI. 1) Tool prints the column headers despite setting the --showHeader to false. This property only seems to affect the subsequent header information that gets printed based on the value of property headerInterval (default value is 100). 2) When showHeader is true headerInterval 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer(the line that closes the table) does not get printed if the showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1 !set headerInterval 2 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1 !set showHeader false 0: jdbc:hive2://localhost:1 select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
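Issue 2 above (the initial header being counted as a row) can be sketched in a self-contained toy; this is illustrative pseudologic, not BeeLine's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the off-by-one: if the row counter also counts the
// initial header, the first repeated header fires after headerInterval - 1
// data rows instead of headerInterval.
public class HeaderIntervalDemo {
    static String render(List<String> rows, int interval, boolean countHeaderAsRow) {
        List<String> out = new ArrayList<>();
        out.add("HEADER");
        // The bug: starting the counter at 1 treats the header as a row.
        int count = countHeaderAsRow ? 1 : 0;
        for (String row : rows) {
            if (count > 0 && count % interval == 0) {
                out.add("HEADER");
            }
            out.add(row);
            count++;
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("t", "f", "T", "F");
        // Buggy: header repeats after only 1 row the first time, as in the report.
        System.out.println("buggy: " + render(rows, 2, true));
        // Fixed: header repeats after every 2 data rows.
        System.out.println("fixed: " + render(rows, 2, false));
    }
}
```

With `headerInterval` 2 the buggy variant reproduces the pattern in the transcript above: a repeated header after the first data row, then every two rows.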
[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030630#comment-14030630 ] Xuefu Zhang commented on HIVE-6928: --- [~chinnalalam] could you please update RB with your latest patch? Thanks. Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.2.patch, HIVE-6928.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6394) Implement Timestamp in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030675#comment-14030675 ] Hive QA commented on HIVE-6394: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650102/HIVE-6394.7.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5613 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/455/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/455/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-455/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12650102 Implement Timestamp in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.7.patch, HIVE-6394.patch This JIRA is to implement timestamp support in the Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
Questions about Hive authorization under HDFS permissions
Hi all, I have enabled Hive authorization in my testing cluster. I used the user hive to create database hivedb and granted the create privilege on hivedb to user root. But I have come across the following problem: root cannot create a table in hivedb even though it has the create privilege. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=root, access=WRITE, inode=/tmp/user/hive/warehouse/hivedb.db:hive:hadoop:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5499) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5481) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5455) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3455) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3425) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3397) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:724) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:502) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48089) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) Clearly the hivedb.db directory in HDFS is not writable by other users. So how does Hive authorization work under HDFS permissions? PS: if I create a table as user hive and grant the update privilege to user root, the same error occurs when I load data into the table as root. Looking forward to your reply! Thanks, Alex
[jira] [Created] (HIVE-7228) StreamPrinter should be joined to calling thread
Pankit Thapar created HIVE-7228: --- Summary: StreamPrinter should be joined to calling thread Key: HIVE-7228 URL: https://issues.apache.org/jira/browse/HIVE-7228 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Pankit Thapar Priority: Minor ISSUE: The StreamPrinter class connects the input stream of a process (attached to that process's output) with the output stream of a Session (CliSessionState/SessionState). It acts as a pipe between the two, transferring data from the input stream to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD. In some of the current usages of this class, the calling threads do not wait for the transfer to complete; that is, the calling thread does not join the StreamPrinter threads. The calling thread moves forward assuming the output stream already has the data it needs, but that assumption does not always hold: the StreamPrinter thread may not have finished by the time the calling thread expects it to. FIX: To ensure the calling thread waits for the StreamPrinter threads to complete, join the StreamPrinter threads to the calling thread. Note that without the fix, TestCliDriverMethods#testRun failed intermittently (roughly 1 in 30 runs); with the fix it no longer fails. -- This message was sent by Atlassian JIRA (v6.2#6252)
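The race described above can be sketched in plain Java. The class and method names here are illustrative stand-ins, not Hive's actual StreamPrinter API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative pump thread, analogous in spirit to Hive's StreamPrinter:
// copies everything from an InputStream to an OutputStream on its own thread.
class StreamPump extends Thread {
    private final InputStream in;
    private final OutputStream out;

    StreamPump(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        byte[] buf = new byte[4096];
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.flush();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

public class StreamPumpDemo {
    public static void main(String[] args) throws InterruptedException {
        InputStream in = new ByteArrayInputStream("process output".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        StreamPump pump = new StreamPump(in, out);
        pump.start();
        // Without this join(), the caller may read `out` before the pump
        // thread has finished copying -- the intermittent race this issue fixes.
        pump.join();

        System.out.println(out.toString());
    }
}
```

Dropping the `join()` makes the read of `out` racy: it usually works, which matches the reported roughly 1-in-30 test failure rate.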
[jira] [Updated] (HIVE-7201) Fix TestHiveConf#testConfProperties test case
[ https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankit Thapar updated HIVE-7201: Status: Patch Available (was: Open) Fix TestHiveConf#testConfProperties test case - Key: HIVE-7201 URL: https://issues.apache.org/jira/browse/HIVE-7201 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Pankit Thapar Priority: Minor Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, HIVE-7201.03.patch, HIVE-7201.patch CHANGE 1: TEST CASE : The intention of TestHiveConf#testConfProperties() is to test the HiveConf properties being set in the priority as expected. Each HiveConf object is initialized as follows: 1) Hadoop configuration properties are applied. 2) ConfVar properties with non-null values are overlayed. 3) hive-site.xml properties are overlayed. ISSUE : The mapreduce related configurations are loaded by JobConf and not Configuration. The current test tries to get the configuration properties like : HADOOPNUMREDUCERS (mapred.job.reduces) from Configuration class. But these mapreduce related properties are loaded by JobConf class from mapred-default.xml. DETAILS : LINE 63 : checkHadoopConf(ConfVars.HADOOPNUMREDUCERS.varname, 1); --fails Because, private void checkHadoopConf(String name, String expectedHadoopVal) { Assert.assertEquals(expectedHadoopVal, new Configuration().get(name)); Second parameter is null, since its the JobConf class and not the Configuration class that initializes mapred-default values. } Code that loads mapreduce resources is in ConfigUtil and JobConf makes a call like this (in static block): public class JobConf extends Configuration { private static final Log LOG = LogFactory.getLog(JobConf.class); static{ ConfigUtil.loadResources(); -- loads mapreduce related resources (mapreduce-default.xml) } . 
} Please note, the test case assertion works fine if HiveConf() constructor is called before this assertion since, HiveConf() triggers JobConf() which basically sets the default values of the properties pertaining to mapreduce. This is why, there won't be any failures if testHiveSitePath() was run before testConfProperties() as that would load mapreduce properties into config properties. FIX: Instead of using a Configuration object, we can use the JobConf object to get the default values used by hadoop/mapreduce. CHANGE 2: In TestHiveConf#testHiveSitePath(), a call to static method getHiveSiteLocation() should be called statically instead of using an object. -- This message was sent by Atlassian JIRA (v6.2#6252)
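The JobConf-versus-Configuration behavior described above can be modeled with a self-contained toy. The names below are illustrative, not Hadoop's real classes; the point is that a subclass's static initializer registers defaults that the base class alone never loads:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of why `new Configuration().get("mapred.job.reduces")` returns
// null until JobConf's static block has run: JobConf, not Configuration,
// registers the mapred-default values into the shared defaults.
public class StaticDefaultsDemo {
    static class Config {
        static final Map<String, String> DEFAULTS = new HashMap<>();
        String get(String key) { return DEFAULTS.get(key); }
    }

    static class JobConfLike extends Config {
        static {
            // analogous to ConfigUtil.loadResources() in JobConf's static block
            DEFAULTS.put("mapred.job.reduces", "1");
        }
    }

    public static void main(String[] args) {
        // Before JobConfLike is initialized, the base class sees no default.
        System.out.println("before: " + new Config().get("mapred.job.reduces"));
        // Constructing the subclass triggers its static initializer...
        new JobConfLike();
        // ...after which the same lookup through the base class succeeds.
        System.out.println("after: " + new Config().get("mapred.job.reduces"));
    }
}
```

This also shows why test ordering mattered: any earlier test that happened to construct the subclass would make the lookup pass.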
[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread
[ https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankit Thapar updated HIVE-7228: Attachment: HIVE-7228.patch Added join() to usages of StreamPrinter StreamPrinter should be joined to calling thread - Key: HIVE-7228 URL: https://issues.apache.org/jira/browse/HIVE-7228 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Pankit Thapar Priority: Minor Attachments: HIVE-7228.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread
[ https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankit Thapar updated HIVE-7228: Status: Patch Available (was: Open) StreamPrinter should be joined to calling thread - Key: HIVE-7228 URL: https://issues.apache.org/jira/browse/HIVE-7228 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Pankit Thapar Priority: Minor Attachments: HIVE-7228.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7201) Fix TestHiveConf#testConfProperties test case
[ https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7201: --- Assignee: Pankit Thapar Fix TestHiveConf#testConfProperties test case - Key: HIVE-7201 URL: https://issues.apache.org/jira/browse/HIVE-7201 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Pankit Thapar Assignee: Pankit Thapar Priority: Minor Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, HIVE-7201.03.patch, HIVE-7201.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7201) Fix TestHiveConf#testConfProperties test case
[ https://issues.apache.org/jira/browse/HIVE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030766#comment-14030766 ] Ashutosh Chauhan commented on HIVE-7201: +1 Fix TestHiveConf#testConfProperties test case - Key: HIVE-7201 URL: https://issues.apache.org/jira/browse/HIVE-7201 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Pankit Thapar Assignee: Pankit Thapar Priority: Minor Attachments: HIVE-7201-1.patch, HIVE-7201-2.patch, HIVE-7201.03.patch, HIVE-7201.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Disk out of space error
Hi, One of my job keeps facing FSError: java.io.IOException: No space left on device with some tasks fail with org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out at on Host node72-142.prod-aws.eadpdata.ea.com OR org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_201405211957_566618_m_01_0/intermediate.34 at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381) at ... The nodes failed the tasks don't look that full and the stats for this job is attached below. The job is doing a self inner join in the subquery then do some aggregation. Does anybody possibly know what's the reason the job fails on space issue while we still have some space? And is there any way to optimize the query itself besides the space cleanup? Thanks a lot! SET mapred.max.split.size=134217728; SET mapred.min.split.size.per.node=1; SET mapred.min.split.size.per.rack=1; CREATE EXTERNAL TABLE IF NOT EXISTS mpst.score_per_min_v2 ( game_name STRING, hosted_platform STRING, s_kit STRING, vehicle STRING, score_amt FLOAT, min_spent FLOAT, score_per_min FLOAT ) PARTITIONED BY (load_datetime STRING) STORED AS RCFILE LOCATION '/hive/warehouse/mpst/score_per_min_v2'; INSERT OVERWRITE TABLE score_per_min_v2 PARTITION(load_datetime='2014-06-09 23-58-00') SELECT game_name, hosted_platform, CASE WHEN s_kit IS NOT NULL THEN s_kit ELSE NA END AS s_kit, vehicle, SUM(score_amt), SUM(time_duration/60) AS min_spent, CASE WHEN SUM(time_duration/60)=0 THEN 0.0 ELSE round(SUM(score_amt)/SUM(time_duration/60),2) END AS score_per_min FROM ( SELECT c.round_guid AS round_guid, c.persona_id AS persona_id, c.player_id AS player_id, c.round_start_datetime AS round_start_datetime, c.s_kit AS s_kit, c.vehicle AS vehicle, a.round_time AS start_time, c.round_time AS end_time, (c.round_time - a.round_time) AS time_duration, c.score_amt, c.hosted_platform, 
c.game_name FROM mpst.spm_stg_v2 c INNER JOIN mpst.spm_stg_v2 a ON a.dt= '2014-06-10' AND c.dt = '2014-06-10' AND a.dt = c.dt AND a.service = c.service AND a.hour = c.hour AND a.round_guid = c.round_guid AND a.player_id = c.player_id AND a.hosted_platform = c.hosted_platform AND a.persona_id = c.persona_id AND a.player_id = c.player_id AND a.round_start_datetime = c.round_start_datetime AND a.rank = (c.rank - 1) ) x GROUP BY game_name, hosted_platform, s_kit, vehicle; Map-Reduce Framework Map output materialized bytes 173,033,990,918 0 173,033,990,918 Map input records 555,343,308 0 555,343,308 Reduce shuffle bytes 0 173,033,990,918 173,033,990,918 Spilled Records 4,188,988,304 1,350,009,594 5,538,997,898 Map output bytes 169,705,718,344 0 169,705,718,344 Total committed heap usage (bytes) 3,002,007,552 553,385,984 3,555,393,536 CPU time spent (ms) 26,347,260 10,932,050 37,279,310 Map input bytes 1,275,536,063 0 1,275,536,063 SPLIT_RAW_BYTES 13,493 0 13,493 Combine input records 0 0 0 Reduce input records 0 1,110,686,616 1,110,686,616 Reduce input groups 0 1,110,686,616 1,110,686,616 Combine output records 0 0 0 Physical memory (bytes) snapshot 3,628,310,528 493,240,320 4,121,550,848 Reduce output records 0 0 0 Virtual memory (bytes) snapshot 21,354,807,296 4,420,263,936 25,775,071,232 Map output records 1,110,686,616 0 1,110,686,616 Regards, Y. Chen --- Perspiration never betray you ---
[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread
[ https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7228: --- Assignee: Pankit Thapar StreamPrinter should be joined to calling thread - Key: HIVE-7228 URL: https://issues.apache.org/jira/browse/HIVE-7228 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Pankit Thapar Assignee: Pankit Thapar Priority: Minor Attachments: HIVE-7228.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7228) StreamPrinter should be joined to calling thread
[ https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030771#comment-14030771 ] Ashutosh Chauhan commented on HIVE-7228: +1 StreamPrinter should be joined to calling thread - Key: HIVE-7228 URL: https://issues.apache.org/jira/browse/HIVE-7228 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Pankit Thapar Assignee: Pankit Thapar Priority: Minor Attachments: HIVE-7228.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030784#comment-14030784 ] Naveen Gangam commented on HIVE-7200: - Done. The review has been updated with the latest diff. Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()
[ https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7182: --- Assignee: steve, Oh Status: Open (was: Patch Available) Patch fails to compile.
ResultSet is not closed in JDBCStatsPublisher#init() Key: HIVE-7182 URL: https://issues.apache.org/jira/browse/HIVE-7182 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: steve, Oh Priority: Minor Attachments: HIVE-7182.1.patch, HIVE-7182.patch
{code}
ResultSet rs = dbm.getTables(null, null, JDBCStatsUtils.getStatTableName(), null);
boolean tblExists = rs.next();
{code}
rs is not closed upon return from init(). If stmt.executeUpdate() throws an exception, stmt.close() would be skipped; the close() call should be placed in a finally block.
-- This message was sent by Atlassian JIRA (v6.2#6252)
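The pattern the report asks for — releasing JDBC resources on every exit path, including the exception path — is what a finally block or try-with-resources guarantees. The sketch below demonstrates it with a stand-in resource, since exercising a real Statement/ResultSet would require a live database; FakeResource and its methods are invented for this illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CloseSketch {
    // Minimal stand-in for a JDBC resource; real code would use
    // java.sql.ResultSet / java.sql.Statement, which are AutoCloseable too.
    static class FakeResource implements AutoCloseable {
        static final AtomicInteger closed = new AtomicInteger();
        @Override public void close() { closed.incrementAndGet(); }
        void use(boolean fail) { if (fail) throw new RuntimeException("executeUpdate failed"); }
    }

    public static void main(String[] args) {
        // Normal path: the resource is closed when the try block exits.
        try (FakeResource rs = new FakeResource()) {
            rs.use(false);
        }
        // Failure path: close() still runs before the catch block, which is
        // exactly what the bug report says the original init() code misses.
        try (FakeResource stmt = new FakeResource()) {
            stmt.use(true);
        } catch (RuntimeException expected) {
            // close() has already run by the time we get here
        }
        System.out.println("closed=" + FakeResource.closed.get()); // prints closed=2
    }
}
```

On pre-Java-7 code the equivalent is an explicit try/finally with the close() calls in the finally block, which is the fix the report suggests.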
[jira] [Updated] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7183: --- Resolution: Fixed Fix Version/s: 0.14.0 Assignee: SUYEON LEE Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Suyeon!
Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7183.patch
Here is the related code:
{code}
List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants(
    mRol.getRoleName(), PrincipalType.ROLE);
if (tblColumnGrants.size() > 0) {
  pm.deletePersistentAll(partColumnGrants);
{code}
The size of tblColumnGrants is currently checked. The size of partColumnGrants should be checked instead.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7216) Hive Query Failure on Hive 0.10.0
[ https://issues.apache.org/jira/browse/HIVE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030793#comment-14030793 ] Ashutosh Chauhan commented on HIVE-7216: {{org.apache.hive.hcatalog.data.JsonSerDe}} is a JSON serde shipped with Hive and is supported by the project. Please switch to using that.
Hive Query Failure on Hive 0.10.0 - Key: HIVE-7216 URL: https://issues.apache.org/jira/browse/HIVE-7216 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Environment: hadoop 0.20.0, hive 0.10.0, Ubuntu 13.04 LTS Reporter: Suddhasatwa Bhaumik Attachments: HadoopTaskDetails.html
Hello, I have created a table and a view in hive as below:
ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS ulf_raw ( transactionid STRING, externaltraceid STRING, externalreferenceid STRING, usecaseid STRING, timestampin STRING, timestampout STRING, component STRING, destination STRING, callerid STRING, service STRING, logpoint STRING, requestin STRING, status STRING, errorcode STRING, error STRING, servername STRING, inboundrequestip STRING, inboundrequestport STRING, outboundurl STRING, messagesize STRING, jmsdestination STRING, msisdn STRING, countrycode STRING, acr STRING, imei STRING, imsi STRING, iccid STRING, email STRING, payload STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( "mapping.transactionid" = "transaction-id", "mapping.timestampin" = "timestamp-in" ) LOCATION '/home/bhaumik/input';
ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar;
create view IF NOT EXISTS parse_soap_payload as select transactionid, component, logpoint, g.service as service, case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'retrieveHierarchyNode' then
xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') end as opcoNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoId\']/text()') end as opcoId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') end as partnerParentNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 
'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerId\']/text()') end as partnerId from ulf_raw g;
When I run the hive query select * from parse_soap_payload; it fails with the attached error. I only have the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar file in the Hadoop lib and Hive lib folders. Please advise if other JAR files need to be added here; if so, where can I download them? Thanks, Suddhasatwa
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030807#comment-14030807 ] Sergey Shelukhin commented on HIVE-6430: They are already documented in the config template, as far as I recall. Should we have that copied to the wiki automatically somehow?
MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.14.patch, HIVE-6430.patch
Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 for the row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row storage structure without an object per row (similar to vectorization).
-- This message was sent by Atlassian JIRA (v6.2#6252)
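A primitive-keyed open-addressing table of the kind mentioned (HPPC-style) can be sketched as below. Keys hash directly into a long[] array, so there is no boxed key/value object per entry, unlike java.util.HashMap<Long, ...>. This illustrates the general technique only, not Hive's implementation; resizing and deletion are omitted for brevity:

```java
// Sketch of an open-addressing long -> int map with primitive backing arrays.
public class LongToIntMap {
    private final long[] keys;
    private final int[] values;
    private final boolean[] used;
    private int size;

    public LongToIntMap(int capacity) {
        // Round up to a power of two so we can mask instead of mod.
        int cap = Integer.highestOneBit(Math.max(4, capacity) * 2);
        keys = new long[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    // Linear probing: walk from the hash slot until we find the key or a free slot.
    private int slot(long key) {
        int mask = keys.length - 1;
        int i = (int) (key ^ (key >>> 32)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    public void put(long key, int value) {
        int i = slot(key);
        if (!used[i]) { used[i] = true; keys[i] = key; size++; }
        values[i] = value;
    }

    public int getOrDefault(long key, int dflt) {
        int i = slot(key);
        return (used[i] && keys[i] == key) ? values[i] : dflt;
    }

    public int size() { return size; }
}
```

The stored int could be an offset into a flat row-storage byte array, which is how "single row storage structure without an object per row" is typically realized.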
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5771: --- Attachment: HIVE-5771.12.patch updated .q.out files.
Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly
Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants in the UDF initialization phase; however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to the outside. Introducing such an optimizer may reduce I/O and accelerate processing.
-- This message was sent by Atlassian JIRA (v6.2#6252)
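The core of constant folding can be illustrated on a toy expression tree: fold children bottom-up, then collapse any operator whose operands are all constants. The classes below are invented for this sketch and do not correspond to Hive's actual expression/operator classes:

```java
// Illustrative bottom-up constant folding over a toy expression tree.
public class ConstFold {
    static abstract class Expr {}
    static class Const extends Expr { final int v; Const(int v) { this.v = v; } }
    static class Col extends Expr { final String name; Col(String n) { this.name = n; } }
    static class Add extends Expr { final Expr l, r; Add(Expr l, Expr r) { this.l = l; this.r = r; } }

    // Fold children first, then collapse Add(Const, Const) into a single Const.
    static Expr fold(Expr e) {
        if (!(e instanceof Add)) {
            return e; // constants and column references are already irreducible
        }
        Add a = (Add) e;
        Expr l = fold(a.l);
        Expr r = fold(a.r);
        if (l instanceof Const && r instanceof Const) {
            return new Const(((Const) l).v + ((Const) r).v);
        }
        return new Add(l, r);
    }
}
```

For example, (1 + 2) + x folds to 3 + x at plan time, so the addition of the two literals is never evaluated per row; propagating such folded constants out of subqueries is the part HIVE-2470 did not cover.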
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5771: --- Status: Patch Available (was: Open)
Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly
Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants in the UDF initialization phase; however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to the outside. Introducing such an optimizer may reduce I/O and accelerate processing.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5771: --- Status: Open (was: Patch Available)
Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly
Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants in the UDF initialization phase; however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to the outside. Introducing such an optimizer may reduce I/O and accelerate processing.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030841#comment-14030841 ] Ashutosh Chauhan commented on HIVE-5771: Test subquery_in.q failed with exception: {code} java.lang.Exception: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0x1 with properties {columns=reducesinkkey0,reducesinkkey1, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=++, columns.types=int,int} at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0x1 with properties {columns=reducesinkkey0,reducesinkkey1, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=++, columns.types=int,int} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from 
x1x128x0x0x1 with properties {columns=reducesinkkey0,reducesinkkey1, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=++, columns.types=int,int} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222) ... 9 more Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:191) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:220) ... 9 more Caused by: java.io.EOFException at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:201) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) ... 10 more {code} subquery_views.q is failing with following exception {code} java.lang.Exception: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from with properties {columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=string} at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from with properties {columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=string} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at 
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to
[jira] [Commented] (HIVE-7005) MiniTez tests have non-deterministic explain plans
[ https://issues.apache.org/jira/browse/HIVE-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030863#comment-14030863 ] Jason Dere commented on HIVE-7005: -- +1 if tests pass MiniTez tests have non-deterministic explain plans -- Key: HIVE-7005 URL: https://issues.apache.org/jira/browse/HIVE-7005 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7005.1.patch TestMiniTezCliDriver has a few test failures where there is a diff in the explain plan generated. According to Vikram, the plan generated is correct, but the plan can be generated in a couple of different ways and so sometimes the plan will not diff against the expected output. We should probably come up with a way to validate this explain plan in a reproducible way. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run
On June 12, 2014, 9:03 p.m., Eugene Koifman wrote: hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java, line 352 https://reviews.apache.org/r/22329/diff/2/?file=607831#file607831line352 Is there a reason org.apache.hadoop.util.ClassUtil.findContainingJar(Class<?> clazz) won't work? ClassUtil is declared as a private interface, so I don't think we should take a dependency on it. Besides this, there was another problem: I wanted to match the file name to hive-shims, to avoid accidentally picking up hive-exec.jar, which also contains shim classes and is a 15MB jar (not sure if hive-exec including shims is intentional or by accident, though). - Ivan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review45536 --- On June 12, 2014, 12:04 a.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 12, 2014, 12:04 a.m.) Review request for hive. Repository: hive-git Description --- The approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, the launcher queries the RM for the list of jobs that have the tag and kills them. After that, it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp, later used to reduce the search window when the RM is queried. To validate the patch, you will need to add webhcat shim jars to templeton.libjars, as the webhcat launcher now also has a dependency on hadoop shims. I have noticed that in the case of the SqoopDelegator, webhcat currently does not set the MR delegation token when the optionsFile flag is used. This also creates the problem in this scenario. This looks like something that should be handled via a separate Jira. 
Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java fff4b68 hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java 8b46d38 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
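The kill-by-tag step described above can be sketched as a pure selection over the RM's application list: keep applications that carry the launcher's tag and were submitted at or after the recorded launch time, then kill each survivor. App and its fields below are invented stand-ins for the YARN ApplicationReport data the real shim code would consult:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the tag-based cleanup on launcher-task restart.
public class TagCleanupSketch {
    // Stand-in for a YARN application report (id, tags, submit time).
    static class App {
        final String id;
        final Set<String> tags;
        final long submitTime;
        App(String id, Set<String> tags, long submitTime) {
            this.id = id; this.tags = tags; this.submitTime = submitTime;
        }
    }

    // Select apps tagged with the launcher job id whose submit time falls
    // inside the window starting at the recorded templeton.job.launch.time.
    static List<String> appsToKill(List<App> apps, String launcherTag, long launchTime) {
        List<String> out = new ArrayList<>();
        for (App a : apps) {
            if (a.tags.contains(launcherTag) && a.submitTime >= launchTime) {
                out.add(a.id);
            }
        }
        return out;
    }
}
```

The launch-time filter is what the new templeton.job.launch.time property enables: without it, the RM query would have to scan every application ever tagged, not just the ones from the current launcher attempt.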
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review45631 --- Ship it! Ship It! - Eugene Koifman On June 12, 2014, 12:04 a.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 12, 2014, 12:04 a.m.) Review request for hive. Repository: hive-git Description --- Approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, launcher queries RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and later used to reduce the search window when RM is queried. To validate the patch, you will need to add webhcat shim jars to templeton.libjars as now webhcat launcher also has a dependency on hadoop shims. I have noticed that in case of the SqoopDelegator webhcat currently does not set the MR delegation token when optionsFile flag is used. This also creates the problem in this scenario. This looks like something that should be handled via a separate Jira. 
Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java fff4b68 hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java 8b46d38 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/#review45630 --- Ship it! Ship It! - Eugene Koifman On June 12, 2014, 12:04 a.m., Ivan Mitic wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22329/ --- (Updated June 12, 2014, 12:04 a.m.) Review request for hive. Repository: hive-git Description --- Approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, launcher queries RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and later used to reduce the search window when RM is queried. To validate the patch, you will need to add webhcat shim jars to templeton.libjars as now webhcat launcher also has a dependency on hadoop shims. I have noticed that in case of the SqoopDelegator webhcat currently does not set the MR delegation token when optionsFile flag is used. This also creates the problem in this scenario. This looks like something that should be handled via a separate Jira. 
Diffs - hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java fff4b68 hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java 8b46d38 shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 Diff: https://reviews.apache.org/r/22329/diff/ Testing --- I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure. Thanks, Ivan Mitic
[jira] [Commented] (HIVE-7190) WebHCat launcher task failure can cause two concurrent user jobs to run
[ https://issues.apache.org/jira/browse/HIVE-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030933#comment-14030933 ] Eugene Koifman commented on HIVE-7190: -- +1
WebHCat launcher task failure can cause two concurrent user jobs to run -- Key: HIVE-7190 URL: https://issues.apache.org/jira/browse/HIVE-7190 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Ivan Mitic Attachments: HIVE-7190.2.patch, HIVE-7190.3.patch, HIVE-7190.patch
Templeton uses launcher jobs to launch the actual user jobs. Launcher jobs are 1-map jobs (single-task jobs) which kick off the actual user job and monitor it until it finishes. Given that the launcher is a task like any other MR task, it has a retry policy in case it fails (due to a task crash, tasktracker/nodemanager crash, machine-level outage, etc.). Further, when the launcher task is retried, it will again launch the same user job, *however* the previous attempt's user job is already running. What this means is that we can have two identical user jobs running in parallel. In the case of MRv2, there will be an MRAppMaster and the launcher task, both of which are subject to failure. If either of the two fails, another instance of the user job will be launched again in parallel. The above situation is already a bug. Now going further to RM HA: what the RM does on failover/restart is kill all containers and restart all applications. This means that if our customer had 10 jobs on the cluster (that is, 10 launcher jobs and 10 user jobs), on RM failover all 20 jobs will be restarted, and launcher jobs will queue user jobs again. There are two issues with this design: 1. There are *possible* chances for corruption of job outputs (it would be useful to analyze this scenario more and confirm this statement). 2. 
Cluster resources are spent on jobs redundantly. To address the issue at least on Yarn (Hadoop 2.0) clusters, webhcat should do the same thing Oozie does in this scenario: tag all its child jobs with an id, and kill those jobs on task restart before they are kicked off again. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-7200: Attachment: HIVE-7200.3.patch Fixed a code style issue in this revision of the patch.
Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch, HIVE-7200.3.patch
A few minor/cosmetic issues with the beeline CLI:
1) The tool prints the column headers despite --showHeader being set to false. The property only seems to affect the subsequent headers that are printed based on the headerInterval property (default value 100).
2) When showHeader is true and headerInterval > 0, the header after the first interval is printed after headerInterval - 1 rows; the code seems to count the initial header as a row, if you will.
3) The table footer (the line that closes the table) is not printed when showHeader is false. The table should be closed regardless of whether the header is printed.
{code}
0: jdbc:hive2://localhost:1 select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
+------+
6 rows selected (3.998 seconds)
0: jdbc:hive2://localhost:1 !set headerInterval 2
0: jdbc:hive2://localhost:1 select * from stringvals;
+------+
| val  |
+------+
| t    |
+------+
| val  |
+------+
| f    |
| T    |
+------+
| val  |
+------+
| F    |
| 0    |
+------+
| val  |
+------+
| 1    |
+------+
6 rows selected (0.691 seconds)
0: jdbc:hive2://localhost:1 !set showHeader false
0: jdbc:hive2://localhost:1 select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
6 rows selected (1.728 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030957#comment-14030957 ] Xuefu Zhang commented on HIVE-7200: --- +1
Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch, HIVE-7200.2.patch, HIVE-7200.3.patch
A few minor/cosmetic issues with the beeline CLI:
1) The tool prints the column headers despite --showHeader being set to false. The property only seems to affect the subsequent headers that are printed based on the headerInterval property (default value 100).
2) When showHeader is true and headerInterval > 0, the header after the first interval is printed after headerInterval - 1 rows; the code seems to count the initial header as a row, if you will.
3) The table footer (the line that closes the table) is not printed when showHeader is false. The table should be closed regardless of whether the header is printed.
{code}
0: jdbc:hive2://localhost:1 select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
+------+
6 rows selected (3.998 seconds)
0: jdbc:hive2://localhost:1 !set headerInterval 2
0: jdbc:hive2://localhost:1 select * from stringvals;
+------+
| val  |
+------+
| t    |
+------+
| val  |
+------+
| f    |
| T    |
+------+
| val  |
+------+
| F    |
| 0    |
+------+
| val  |
+------+
| 1    |
+------+
6 rows selected (0.691 seconds)
0: jdbc:hive2://localhost:1 !set showHeader false
0: jdbc:hive2://localhost:1 select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
6 rows selected (1.728 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions
[ https://issues.apache.org/jira/browse/HIVE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030964#comment-14030964 ] Hive QA commented on HIVE-7226: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650117/HIVE-7226.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5535 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/456/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/456/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-456/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650117 Windowing Streaming mode causes NPE for empty partitions Key: HIVE-7226 URL: https://issues.apache.org/jira/browse/HIVE-7226 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7226.1.patch Change in HIVE-7062 doesn't handle empty partitions properly. StreamingState is not correctly initialized for empty partition -- This message was sent by Atlassian JIRA (v6.2#6252)
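The HIVE-7226 failure mode above — per-partition state left uninitialized when a partition delivers no rows — is a common lazy-initialization pitfall. A minimal sketch under assumed names (this is not Hive's actual windowing code; `StreamingStateSketch` and its methods are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative guard for the empty-partition case described in HIVE-7226.
// Class and method names are hypothetical, not Hive's windowing internals.
public class StreamingStateSketch {

    static class StreamingState {
        final List<Object> rows = new ArrayList<>();
    }

    private StreamingState state;

    // Initializing here, once per partition, also covers partitions that
    // deliver no rows; initializing lazily on the first row is what leaves
    // the state null (and the finish step throwing NPE) for an empty partition.
    public void startPartition() {
        state = new StreamingState();
    }

    public void processRow(Object row) {
        state.rows.add(row);
    }

    public int finishPartition() {
        int n = state.rows.size();
        state = null;
        return n;
    }
}
```

With initialization tied to partition start rather than first row, an empty partition finishes cleanly with a count of zero instead of an NPE.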
Re: Documentation Policy
Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and confuses the state of a JIRA, so it's probably best to remove it. The idea of docdone is to query what docs got produced and need review? It might be nice to have a tag for that, to easily signal to the contributor or interested parties to take a look. On a side note, I already find your JIRA comments with links to doc wikis very helpful, both to inform the contributor and just as reference for others. Thanks again for the great work. On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz leftylever...@gmail.com wrote: One more question: what should we do after the documentation is done for a JIRA ticket? (a) Just remove the TODOC## label. (b) Replace TODOC## with docdone (no caps, no version number). (c) Add a docdone label but keep TODOC##. (d) Something else. -- Lefty On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com wrote: Thank you guys! This is great work. On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Going through the issues, I think overall Lefty did an awesome job catching and documenting most of them in time. Following are some of the 0.13 and 0.14 ones which I found which either do not have documentation or have outdated one and probably need one to be consumeable. Contributors, feel free to remove the label if you disagree. *TODOC13:* https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed) *TODOC14:* https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed) I'll continue digging through the queue going backwards to 0.12 and 0.11 and see if I find similar stuff there as well. 
On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Feel free to label such jiras with this keyword and ask the contributors for more information if you need any. Cool. I'll start chugging through the queue today adding labels as apt. On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote: Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13? Sounds good to me. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. -- Swarnim -- Swarnim
[jira] [Updated] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads
[ https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7210: - Attachment: HIVE-7210.1.patch Patch to prevent getSplits() from removing cached plans from other queries. Talked to Gunther and he said he can eliminate the call to clear the cached plan from getSplits() altogether, so this may not be the final fix. NPE with No plan file found when running Driver instances on multiple threads --- Key: HIVE-7210 URL: https://issues.apache.org/jira/browse/HIVE-7210 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7210.1.patch Informatica has a multithreaded application running multiple instances of CLIDriver. When running concurrent queries they sometimes hit the following error: {noformat} 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.lang.NullPointerException(null)' java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) at 
org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86) at com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247) at 
com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194) at com.informatica.platform.ldtm.executor.common.workflow.taskhandler.impl.BaseTaskHandlerImpl.run(BaseTaskHandlerImpl.java:126) at
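The race described in HIVE-7210 — one thread's getSplits() evicting a cached plan that another concurrently running query still needs — is the classic problem of cleanup scoped too broadly. A minimal sketch of scoping the cache by query, with hypothetical names (this is not Hive's actual Utilities plan cache):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative per-query plan cache for the HIVE-7210 scenario.
// Names are hypothetical, not Hive's actual implementation.
public class PlanCacheSketch {
    private static final Map<String, Object> PLANS = new ConcurrentHashMap<>();

    static void cachePlan(String queryId, Object plan) {
        PLANS.put(queryId, plan);
    }

    static Object getPlan(String queryId) {
        return PLANS.get(queryId);
    }

    // Removes only this query's plan. A global clear here is what lets one
    // thread's cleanup break a concurrent query's job submission.
    static void clearPlan(String queryId) {
        PLANS.remove(queryId);
    }
}
```

Keyed removal makes the cleanup in getSplits() safe regardless of how many Driver instances share the JVM.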
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.4.patch Rebased onto trunk and fixed two broken hbase tests. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7226) Windowing Streaming mode causes NPE for empty partitions
[ https://issues.apache.org/jira/browse/HIVE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7226: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Harish! Windowing Streaming mode causes NPE for empty partitions Key: HIVE-7226 URL: https://issues.apache.org/jira/browse/HIVE-7226 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-7226.1.patch Change in HIVE-7062 doesn't handle empty partitions properly. StreamingState is not correctly initialized for empty partition -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads
[ https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031059#comment-14031059 ] Gunther Hagleitner commented on HIVE-7210: -- Thanks [~jdere]. My plan was to do this purely in HiveSplitGen for Tez. But I think Vikram re-introduced a path that doesn't go through HiveSplitGen (rather - I broke something, he fixed it by adding that path back in). [~vikram.dixit] - can you confirm that? If that's the case the patch you uploaded is probably the best fix. NPE with No plan file found when running Driver instances on multiple threads --- Key: HIVE-7210 URL: https://issues.apache.org/jira/browse/HIVE-7210 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7210.1.patch Informatica has a multithreaded application running multiple instances of CLIDriver. When running concurrent queries they sometimes hit the following error: {noformat} 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.lang.NullPointerException(null)' java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86) at com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358) at 
com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194) at
[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers
[ https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7209: Labels: TODOC14 (was: ) allow metastore authorization api calls to be restricted to certain invokers Key: HIVE-7209 URL: https://issues.apache.org/jira/browse/HIVE-7209 Project: Hive Issue Type: Bug Components: Authentication, Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC14 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch Any user who has direct access to metastore can make metastore api calls that modify the authorization policy. The users who can make direct metastore api calls in a secure cluster configuration are usually the 'cluster insiders' such as Pig and MR users, who are not (securely) covered by the metastore based authorization policy. But it makes sense to disallow access from such users as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers
[ https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7209: Release Note: With this change hive.security.metastore.authorization.manager configuration parameter allows you to specify more than one authorization manager class (comma separated). This patch introduces a new authorization manager for use under this configuration - org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly. It will disallow any of the authorization api calls to be invoked in a remote metastore. HiveServer2 can be configured to use embedded metastore, and that will allow it to invoke metastore authorization api. Hive cli and any other remote metastore users would be denied authorization when they try to make authorization api calls. This allows restricting the authorization api use to privileged HiveServer2 process. allow metastore authorization api calls to be restricted to certain invokers Key: HIVE-7209 URL: https://issues.apache.org/jira/browse/HIVE-7209 Project: Hive Issue Type: Bug Components: Authentication, Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC14 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch Any user who has direct access to metastore can make metastore api calls that modify the authorization policy. The users who can make direct metastore api calls in a secure cluster configuration are usually the 'cluster insiders' such as Pig and MR users, who are not (securely) covered by the metastore based authorization policy. But it makes sense to disallow access from such users as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
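Based on the release note above, the configuration would look roughly like the following hive-site.xml fragment. The MetaStoreAuthzAPIAuthorizerEmbedOnly class name is taken from the release note; the first class in the list is a placeholder for whatever metastore authorization manager a deployment already uses:

```xml
<!-- Sketch of hive-site.xml for HIVE-7209. The first class is a placeholder;
     substitute your existing metastore authorization manager. -->
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.example.YourMetastoreAuthorizer,org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly</value>
</property>
```

With this in place, authorization API calls succeed only when the metastore is embedded (as in HiveServer2), and are denied for remote metastore clients such as the Hive CLI.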
[jira] [Updated] (HIVE-7209) allow metastore authorization api calls to be restricted to certain invokers
[ https://issues.apache.org/jira/browse/HIVE-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7209: Attachment: HIVE-7209.4.patch HIVE-7209.4.patch - also updating hive-default.xml.template to mention that more than one metastore authorization manager classes can be specified under hive.security.metastore.authorization.manager . allow metastore authorization api calls to be restricted to certain invokers Key: HIVE-7209 URL: https://issues.apache.org/jira/browse/HIVE-7209 Project: Hive Issue Type: Bug Components: Authentication, Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC14 Attachments: HIVE-7209.1.patch, HIVE-7209.2.patch, HIVE-7209.3.patch, HIVE-7209.4.patch Any user who has direct access to metastore can make metastore api calls that modify the authorization policy. The users who can make direct metastore api calls in a secure cluster configuration are usually the 'cluster insiders' such as Pig and MR users, who are not (securely) covered by the metastore based authorization policy. But it makes sense to disallow access from such users as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Documentation Policy
+1 on deleting the TODOC tag as I think it's assumed by default that once an enhancement is done, it will be doc'ed. We may consider adding an additional docdone tag but I think we can instead just wait for a +1 from the contributor that the documentation is satisfactory (and assume an implicit +1 for no reply) before deleting the TODOC tag. On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho sze...@cloudera.com wrote: Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and confuses the state of a JIRA, so its probably best to remove it. The idea of docdone is to query what docs got produced and needs review? It might be nice to have a tag for that, to easily signal to contributor or interested parties to take a look. On a side note, I already find very helpful your JIRA comments with links to doc-wikis, both to inform the contributor and just as reference for others. Thanks again for the great work. On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz leftylever...@gmail.com wrote: One more question: what should we do after the documentation is done for a JIRA ticket? (a) Just remove the TODOC## label. (b) Replace TODOC## with docdone (no caps, no version number). (c) Add a docdone label but keep TODOC##. (d) Something else. -- Lefty On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com wrote: Thank you guys! This is great work. On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Going through the issues, I think overall Lefty did an awesome job catching and documenting most of them in time. Following are some of the 0.13 and 0.14 ones which I found which either do not have documentation or have outdated one and probably need one to be consumeable. Contributors, feel free to remove the label if you disagree. 
*TODOC13:* https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed) *TODOC14:* https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed) I'll continue digging through the queue going backwards to 0.12 and 0.11 and see if I find similar stuff there as well. On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Feel free to label such jiras with this keyword and ask the contributors for more information if you need any. Cool. I'll start chugging through the queue today adding labels as apt. On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote: Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13? Sounds good to me. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. -- Swarnim -- Swarnim -- Swarnim
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031188#comment-14031188 ] Carl Steinbach commented on HIVE-7094: -- [~davidzchen]: +1. Can you please attach a new version of the patch to trigger testing? If everything passes I will commit. Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
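The HIVE-7094 plan above is a textbook template-method refactoring. A minimal sketch under assumed names (heavily simplified; the real classes live in HCatalog and carry RecordWriter, ObjectInspector, SerDe, and OutputJobInfo state rather than strings):

```java
// Illustrative template-method split for the HIVE-7094 refactoring.
// Names and return types are simplified stand-ins, not the real HCatalog API.
public abstract class FileRecordWriterContainerSketch {

    // Subclasses supply the partition-specific writer lookup; in the real
    // code this would return the local RecordWriter plus serde/inspector info.
    protected abstract String writerFor(Object record);

    // Common write path shared by both implementations.
    public final String write(Object record) {
        // ...common serialization work would happen here...
        return writerFor(record);
    }
}

class StaticSketch extends FileRecordWriterContainerSketch {
    @Override
    protected String writerFor(Object record) {
        return "static-writer";             // one writer for the whole task
    }
}

class DynamicSketch extends FileRecordWriterContainerSketch {
    @Override
    protected String writerFor(Object record) {
        return "writer-for-" + record;      // one writer per partition value
    }
}
```

This removes the scattered `if (dynamicPartitioning)` branches: each subclass carries exactly one partitioning strategy, and the shared write path lives in the base class.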
[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Chen updated HIVE-7094: - Attachment: HIVE-7094.3.patch Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
David Chen created HIVE-7230: Summary: Add Eclipse formatter file for Hive coding conventions Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031197#comment-14031197 ] David Chen commented on HIVE-7094: -- Thanks, [~cwsteinbach]! I have addressed the remaining formatting issues using the Eclipse formatter and uploaded a new patch. Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 22590: HIVE-7230: Add Eclipse formatter file.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22590/ --- Review request for hive. Bugs: HIVE-7230 https://issues.apache.org/jira/browse/HIVE-7230 Repository: hive-git Description --- HIVE-7230: Add Eclipse formatter file. Diffs - eclipse-styles.xml PRE-CREATION Diff: https://reviews.apache.org/r/22590/diff/ Testing --- Manual Thanks, David Chen
[jira] [Updated] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads
[ https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7210: - Status: Patch Available (was: Open) NPE with No plan file found when running Driver instances on multiple threads --- Key: HIVE-7210 URL: https://issues.apache.org/jira/browse/HIVE-7210 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7210.1.patch Informatica has a multithreaded application running multiple instances of CLIDriver. When running concurrent queries they sometimes hit the following error: {noformat} 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.lang.NullPointerException(null)' java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86) at com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194) at com.informatica.platform.ldtm.executor.common.workflow.taskhandler.impl.BaseTaskHandlerImpl.run(BaseTaskHandlerImpl.java:126) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031205#comment-14031205 ] David Chen commented on HIVE-7230: -- I took the Hadoop Eclipse formatter file (https://github.com/cloudera/blog-eclipse) and adapted it for Hive's coding style, namely changing the line lengths from 80 to 100 characters. RB: https://reviews.apache.org/r/22590/ Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Chen updated HIVE-7230: - Attachment: HIVE-7230.1.patch
[jira] [Updated] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Chen updated HIVE-7230: - Status: Patch Available (was: Open)
Re: Jenkins permissions, and auto-trigger help
+ dev Good call, yep that will need to be configured. Brock On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com wrote: I was studying this a bit more; I believe the MiniTezCliDriver tests are hitting the 2-hour timeout, since the exit code is 124. The framework is running all of them in one call; I'll try to chunk the tests into batches like the other q-tests. I'll take a look at this next week. Thanks Szehon On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com wrote: It looks like a JVM OOM crash during the MiniTezCliDriver tests, or it's otherwise crashing. The 407 log has failures, but the 408 log is cut off. http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt MAVEN_OPTS is already set to -Xmx2g -XX:MaxPermSize=256M. Do you guys know of any such issues? Thanks, Szehon On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com wrote: Looks like it's failing to generate a test output: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/ http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt exiting with 124 here: + wait 21961 + timeout 2h mvn -B -o test -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver + ret=124 On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan hashut...@apache.org wrote: Build #407 ran MiniTezCliDriver http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/407/testReport/org.apache.hadoop.hive.cli/ but Build #408 didn't http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/408/testReport/org.apache.hadoop.hive.cli/ On Sat, Jun 7, 2014 at 12:25 PM, Szehon Ho
sze...@cloudera.com wrote: Sounds like there's randomness, either in the PTest test-parser or in the Maven test run itself. In the history now, it's running between 5633 and 5707 tests, which is similar to your range. http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport/history/ I didn't see any builds in the history without MiniTezCliDriver; can you point me to a build number if you see one? If nobody else knows immediately, I can dig deeper into it next week to try to find out. On Sat, Jun 7, 2014 at 9:00 AM, Ashutosh Chauhan hashut...@apache.org wrote: I noticed that the PTest2 framework runs a different number of tests on various runs. For example, on yesterday's runs I saw it ran 5585 and 5510 tests on subsequent runs. In particular, it seems it's running the MiniTezCliDriver tests in only half the runs. Has anyone observed this?
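A note on the exit status quoted above: 124 is the code GNU coreutils `timeout` returns when it kills a command that overran its limit, so the `mvn test` invocation never finished on its own. A minimal demonstration (assuming GNU `timeout` is on the PATH, as it is on the ptest slaves):

```shell
#!/bin/sh
# `timeout` kills the wrapped command once the limit expires and exits
# with status 124 -- the same code the ptest framework captured from
# `timeout 2h mvn ... -Dtest=TestMiniTezCliDriver` above.
timeout 1s sleep 5
echo "exit=$?"
```

Running this prints `exit=124`, confirming the build was killed for time rather than failing on its own.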