[jira] [Updated] (HIVE-6564) WebHCat E2E tests that launch MR jobs fail on check job completion timeout
[ https://issues.apache.org/jira/browse/HIVE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepesh Khandelwal updated HIVE-6564:
-------------------------------------
    Attachment: HIVE-6564.patch

Attaching a patch that deals with the JSON module incompatibility.

WebHCat E2E tests that launch MR jobs fail on check job completion timeout
--------------------------------------------------------------------------
    Key: HIVE-6564
    URL: https://issues.apache.org/jira/browse/HIVE-6564
    Project: Hive
    Issue Type: Bug
    Components: Tests, WebHCat
    Affects Versions: 0.13.0
    Reporter: Deepesh Khandelwal
    Assignee: Deepesh Khandelwal
    Attachments: HIVE-6564.patch

WebHCat E2E tests that fire off an MR job are not being correctly detected as complete, so those tests time out. The problem happens because the JSON module available through CPAN returns 1 or 0 instead of true or false.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6564) WebHCat E2E tests that launch MR jobs fail on check job completion timeout
[ https://issues.apache.org/jira/browse/HIVE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepesh Khandelwal updated HIVE-6564:
-------------------------------------
    Status: Patch Available (was: Open)

WebHCat E2E tests that launch MR jobs fail on check job completion timeout
--------------------------------------------------------------------------
    Key: HIVE-6564
    URL: https://issues.apache.org/jira/browse/HIVE-6564
    Project: Hive
    Issue Type: Bug
    Components: Tests, WebHCat
    Affects Versions: 0.13.0
    Reporter: Deepesh Khandelwal
    Assignee: Deepesh Khandelwal
    Attachments: HIVE-6564.patch

WebHCat E2E tests that fire off an MR job are not being correctly detected as complete, so those tests time out. The problem happens because the JSON module available through CPAN returns 1 or 0 instead of true or false.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6564) WebHCat E2E tests that launch MR jobs fail on check job completion timeout
[ https://issues.apache.org/jira/browse/HIVE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepesh Khandelwal updated HIVE-6564:
-------------------------------------
    Description:
WebHCat E2E tests that fire off an MR job are not being correctly detected as complete, so those tests time out. The problem happens because the JSON module available through CPAN returns 1 or 0 instead of true or false.

NO PRECOMMIT TESTS

    was:
WebHCat E2E tests that fire off an MR job are not being correctly detected as complete, so those tests time out. The problem happens because the JSON module available through CPAN returns 1 or 0 instead of true or false.

WebHCat E2E tests that launch MR jobs fail on check job completion timeout
--------------------------------------------------------------------------
    Key: HIVE-6564
    URL: https://issues.apache.org/jira/browse/HIVE-6564
    Project: Hive
    Issue Type: Bug
    Components: Tests, WebHCat
    Affects Versions: 0.13.0
    Reporter: Deepesh Khandelwal
    Assignee: Deepesh Khandelwal
    Attachments: HIVE-6564.patch

WebHCat E2E tests that fire off an MR job are not being correctly detected as complete, so those tests time out. The problem happens because the JSON module available through CPAN returns 1 or 0 instead of true or false.

NO PRECOMMIT TESTS

--
This message was sent by Atlassian JIRA (v6.2#6252)
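[Editor's note] The mismatch described in HIVE-6564 lives in WebHCat's Perl E2E harness; as a hedged illustration in Java (all names here are hypothetical, not Hive code), a completion check that tolerates both canonical JSON booleans and the 1/0 values the CPAN JSON module emits might look like:

```java
// Hypothetical helper illustrating the class of bug: a JSON decoder that
// emits 1/0 for booleans breaks checks written against true/false.
public class JsonBool {
    // Accept canonical booleans, numeric 1/0, and string "true"/"1".
    public static boolean asBoolean(Object v) {
        if (v instanceof Boolean) return (Boolean) v;
        if (v instanceof Number) return ((Number) v).intValue() != 0;
        if (v instanceof String) {
            String s = ((String) v).trim();
            return s.equalsIgnoreCase("true") || s.equals("1");
        }
        return false; // null or unrecognized type: treat as not complete
    }

    public static void main(String[] args) {
        System.out.println(asBoolean(1));       // completed flag from a 1/0-style decoder
        System.out.println(asBoolean("false")); // canonical string form
    }
}
```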
[jira] [Created] (HIVE-6565) OrcSerde should be added as NativeSerDe in SerDeUtils
Branky Shao created HIVE-6565:
------------------------------
    Summary: OrcSerde should be added as NativeSerDe in SerDeUtils
    Key: HIVE-6565
    URL: https://issues.apache.org/jira/browse/HIVE-6565
    Project: Hive
    Issue Type: Bug
    Components: Serializers/Deserializers
    Affects Versions: 0.12.0
    Reporter: Branky Shao

If the table is defined in the ORC format, the column info can be fetched from the StorageDescriptor; there is no need to get it from the SerDe. And since ORC is obviously a native Hive file format, OrcSerde should be added as a NativeSerDe in SerDeUtils.

The fix is fairly simple, just add a single line in SerDeUtils:

nativeSerDeNames.add(org.apache.hadoop.hive.ql.io.orc.OrcSerde.class.getName());

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-6430:
-----------------------------------
    Attachment: HIVE-6430.patch

The new code probably has tons of bugs, but some old tests I ran have passed; let's try HiveQA. I will run the Tez tests.

MapJoin hash table has large memory overhead
--------------------------------------------
    Key: HIVE-6430
    URL: https://issues.apache.org/jira/browse/HIVE-6430
    Project: Hive
    Issue Type: Improvement
    Reporter: Sergey Shelukhin
    Assignee: Sergey Shelukhin
    Attachments: HIVE-6430.patch

Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 for the row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need a Java hash table there. We can either use a primitive-friendly hash table like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row-storage structure without an object per row (similar to vectorization).

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-6430:
-----------------------------------
    Status: Patch Available (was: Open)

MapJoin hash table has large memory overhead
--------------------------------------------
    Key: HIVE-6430
    URL: https://issues.apache.org/jira/browse/HIVE-6430
    Project: Hive
    Issue Type: Improvement
    Reporter: Sergey Shelukhin
    Assignee: Sergey Shelukhin
    Attachments: HIVE-6430.patch

Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 for the row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need a Java hash table there. We can either use a primitive-friendly hash table like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row-storage structure without an object per row (similar to vectorization).

--
This message was sent by Atlassian JIRA (v6.2#6252)
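[Editor's note] The HPPC-style structure the jira mentions can be sketched as a flat-array, open-addressing map. This is a minimal illustration of the memory argument only (not Hive's or HPPC's actual code): keys and values live in plain int arrays, so each entry costs a few bytes instead of a HashMap.Entry object per row. Resizing is omitted and one sentinel key is reserved, so this is a sketch, not a production map.

```java
import java.util.Arrays;

// Open-addressing int->int map with linear probing; no per-entry objects.
public class IntIntOpenHashMap {
    private static final int EMPTY = Integer.MIN_VALUE; // reserved sentinel; cannot be used as a key
    private final int[] keys;
    private final int[] values;
    private int size;

    public IntIntOpenHashMap(int expectedEntries) {
        // Power-of-two capacity at least twice the expected entries.
        int cap = Integer.highestOneBit(Math.max(4, expectedEntries) * 2);
        keys = new int[cap];
        values = new int[cap];
        Arrays.fill(keys, EMPTY);
    }

    private int slot(int key) {
        int mask = keys.length - 1;
        int i = ((key * 0x9E3779B9) >>> 1) & mask; // cheap multiplicative spreader
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probe to the next slot
        }
        return i;
    }

    public void put(int key, int value) { // resize omitted for brevity; keep load low
        int i = slot(key);
        if (keys[i] == EMPTY) { keys[i] = key; size++; }
        values[i] = value;
    }

    public int getOrDefault(int key, int dflt) {
        int i = slot(key);
        return keys[i] == key ? values[i] : dflt;
    }

    public int size() { return size; }
}
```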
[jira] [Created] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
Gunther Hagleitner created HIVE-6566:
-------------------------------------
    Summary: Incorrect union-all plan with map-joins on Tez
    Key: HIVE-6566
    URL: https://issues.apache.org/jira/browse/HIVE-6566
    Project: Hive
    Issue Type: Bug
    Affects Versions: 0.13.0
    Reporter: Gunther Hagleitner
    Assignee: Gunther Hagleitner

The Tez DAG is hooked up incorrectly for some union-all queries involving map joins.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-887) Allow SELECT col without a mapreduce job
[ https://issues.apache.org/jira/browse/HIVE-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922156#comment-13922156 ]

Navis commented on HIVE-887:
----------------------------
We should make "more" the default for hive.fetch.task.conversion someday.

Allow SELECT col without a mapreduce job
----------------------------------------
    Key: HIVE-887
    URL: https://issues.apache.org/jira/browse/HIVE-887
    Project: Hive
    Issue Type: New Feature
    Environment: All
    Reporter: Eric Sun
    Assignee: Ning Zhang
    Fix For: 0.10.0

I often find myself needing to take a quick look at a particular column of a Hive table. I usually do this with a SELECT * FROM table LIMIT 20; from the CLI. This is pretty fast since it doesn't require a mapreduce job. However, it's tough to examine just 1 or 2 columns when the table is very wide. So I might do SELECT col FROM table LIMIT 20; but that's much slower since it requires a map-reduce job. It'd be really convenient if a map-reduce job weren't necessary. Currently a good workaround is hive -e "select * from table" | cut -f n, but a built-in would be more convenient since it alleviates the need for column counting.

--
This message was sent by Atlassian JIRA (v6.2#6252)
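[Editor's note] hive.fetch.task.conversion is the knob this jira introduced and that Navis's comment refers to. As a hedged sketch (accepted values and the default vary by Hive version; "more" is the value suggested in the comment):

```sql
-- Enable fetch-task conversion so simple projections skip the MR job.
-- Typical values circa Hive 0.10-0.13: none | minimal | more (version-dependent).
set hive.fetch.task.conversion=more;

-- With "more", a plain projection with a LIMIT is served by a fetch task;
-- no MapReduce job is launched.
select col from some_table limit 20;
```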
[jira] [Updated] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
[ https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-6566:
-------------------------------------
    Attachment: HIVE-6566.1.patch

Incorrect union-all plan with map-joins on Tez
----------------------------------------------
    Key: HIVE-6566
    URL: https://issues.apache.org/jira/browse/HIVE-6566
    Project: Hive
    Issue Type: Bug
    Affects Versions: 0.13.0
    Reporter: Gunther Hagleitner
    Assignee: Gunther Hagleitner
    Attachments: HIVE-6566.1.patch

The Tez DAG is hooked up incorrectly for some union-all queries involving map joins.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
[ https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-6566:
-------------------------------------
    Status: Patch Available (was: Open)

Incorrect union-all plan with map-joins on Tez
----------------------------------------------
    Key: HIVE-6566
    URL: https://issues.apache.org/jira/browse/HIVE-6566
    Project: Hive
    Issue Type: Bug
    Affects Versions: 0.13.0
    Reporter: Gunther Hagleitner
    Assignee: Gunther Hagleitner
    Attachments: HIVE-6566.1.patch

The Tez DAG is hooked up incorrectly for some union-all queries involving map joins.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6332) HCatConstants Documentation needed
[ https://issues.apache.org/jira/browse/HIVE-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922159#comment-13922159 ]

Lefty Leverenz commented on HIVE-6332:
--------------------------------------
Looks good overall. Of course I have some editorial nits, but they shouldn't clutter up this jira. One typo you could fix now: hcat.dynamic.partitioning.custom.patttern (triple t).

An introduction would be helpful, mentioning the HCatConstants.java file and explaining basic usage.

Why are the cache parameters hcatalog.hive.xxx while all other parameters are hcat.xxx? (I'm asking about hcat vs. hcatalog, not the hive part.)

This sentence in the first section confuses me: "An override to specify where HCatStorer will write to, defined from pig jobs, either directly by user, or by using org.apache.hive.hcatalog.pig.HCatStorerWrapper." Does it mean that Pig jobs specify hcat.pig.storer.external.location? Could you give examples of specifying by user and by HCatStorerWrapper?

In the Data Promotion section, this sentence seems a bit off: "On the write side, it is expected that the user pass in valid HCatRecords with data correctly." Does that mean with data correctly typed for Hive?

That's it for my first pass. I'll take another look later.

HCatConstants Documentation needed
----------------------------------
    Key: HIVE-6332
    URL: https://issues.apache.org/jira/browse/HIVE-6332
    Project: Hive
    Issue Type: Task
    Reporter: Sushanth Sowmyan
    Assignee: Sushanth Sowmyan

HCatConstants documentation is near non-existent, being defined only as comments in code for the various parameters. Given that a lot of the API winds up being implemented as knobs that can be tweaked here, we should have a public-facing doc for this.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6551) group by after join with skew join optimization references invalid task sometimes
[ https://issues.apache.org/jira/browse/HIVE-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922295#comment-13922295 ]

Hive QA commented on HIVE-6551:
-------------------------------
{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12632789/HIVE-6551.1.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5358 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1637/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1637/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12632789

group by after join with skew join optimization references invalid task sometimes
----------------------------------------------------------------------------------
    Key: HIVE-6551
    URL: https://issues.apache.org/jira/browse/HIVE-6551
    Project: Hive
    Issue Type: Bug
    Reporter: Navis
    Assignee: Navis
    Priority: Trivial
    Attachments: HIVE-6551.1.patch.txt

For example,
{noformat}
hive> set hive.auto.convert.join = true;
hive> set hive.optimize.skewjoin = true;
hive> set hive.skewjoin.key = 3;
hive> EXPLAIN FROM (SELECT src.* FROM src) x JOIN (SELECT src.* FROM src) Y ON (x.key = Y.key)
    > SELECT sum(hash(Y.key)), sum(hash(Y.value));
OK
STAGE DEPENDENCIES:
  Stage-8 is a root stage
  Stage-6 depends on stages: Stage-8
  Stage-5 depends on stages: Stage-6 , consists of Stage-4, Stage-2
  Stage-4
  Stage-2 depends on stages: Stage-4, Stage-1
  Stage-0 is a root stage
  ...
{noformat}
Stage-2 references the non-existent Stage-1.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6508) Mismatched results between vector and non-vector mode with decimal field
[ https://issues.apache.org/jira/browse/HIVE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922304#comment-13922304 ]

Remus Rusanu commented on HIVE-6508:
------------------------------------
The value 0 comes in the input vector unscaled (scale 0). As aggregates (SUM, STDxx) are updated, they use the scale of the input value, not the scale of the input column. So any 0 in the input rounds away the fractional part of the intermediate, and the final result is off. AVG uses a special scale so it is not affected. MIN/MAX use the input value scale, but there it has no side effects. The fix is to pass in the column scale explicitly, rather than assume the input value's scale is the column's scale. Ultimately, the behavior of passing in unscaled 0s is wrong, but this comes from the row-mode join modus operandi and I don't want to change that. Hardening the aggregates against this case is more robust.

Mismatched results between vector and non-vector mode with decimal field
------------------------------------------------------------------------
    Key: HIVE-6508
    URL: https://issues.apache.org/jira/browse/HIVE-6508
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.13.0
    Reporter: Remus Rusanu
    Assignee: Remus Rusanu

The following query has a small mismatch in its result as compared to non-vector mode.
{code}
select d_year, i_brand_id, i_brand, sum(ss_ext_sales_price) as sum_agg
from date_dim
join store_sales on date_dim.d_date_sk = store_sales.ss_sold_date_sk
join item on store_sales.ss_item_sk = item.i_item_sk
where i_manufact_id = 128 and d_moy = 11
group by d_year, i_brand, i_brand_id
order by d_year, sum_agg desc, i_brand_id
limit 100;
{code}
This query is on tpcds data. The field ss_ext_sales_price is of type decimal(7,2) and everything else is an integer.

--
This message was sent by Atlassian JIRA (v6.2#6252)
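[Editor's note] A minimal model of the scale bug described above, using BigDecimal in place of Hive's vectorized decimal intermediates (method names are hypothetical; Hive's actual fix touches the vectorized aggregate expressions). The buggy accumulator adopts each input value's scale, so an unscaled 0 truncates the fraction accumulated so far; the fixed one always uses the column's declared scale:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalSumScale {
    // Buggy: force the accumulator to the incoming value's scale before adding.
    static BigDecimal buggySum(BigDecimal[] vals) {
        BigDecimal acc = BigDecimal.ZERO;
        for (BigDecimal v : vals) {
            acc = acc.setScale(v.scale(), RoundingMode.HALF_UP).add(v);
        }
        return acc;
    }

    // Fixed: always accumulate at the column's declared scale.
    static BigDecimal fixedSum(BigDecimal[] vals, int columnScale) {
        BigDecimal acc = BigDecimal.ZERO.setScale(columnScale);
        for (BigDecimal v : vals) {
            // Increasing scale never loses digits, so no rounding is needed.
            acc = acc.add(v.setScale(columnScale, RoundingMode.UNNECESSARY));
        }
        return acc;
    }

    public static void main(String[] args) {
        BigDecimal[] vals = {
            new BigDecimal("1.25"), new BigDecimal("0"), new BigDecimal("2.50")
        };
        System.out.println(buggySum(vals));    // prints 3.50: the 0 (scale 0) rounded 1.25 down to 1
        System.out.println(fixedSum(vals, 2)); // prints 3.75
    }
}
```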
[jira] [Updated] (HIVE-6508) Mismatched results between vector and non-vector mode with decimal field
[ https://issues.apache.org/jira/browse/HIVE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HIVE-6508:
-------------------------------
    Status: Patch Available (was: Open)

Mismatched results between vector and non-vector mode with decimal field
------------------------------------------------------------------------
    Key: HIVE-6508
    URL: https://issues.apache.org/jira/browse/HIVE-6508
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.13.0
    Reporter: Remus Rusanu
    Assignee: Remus Rusanu
    Attachments: HIVE-6508.1.patch

The following query has a small mismatch in its result as compared to non-vector mode.
{code}
select d_year, i_brand_id, i_brand, sum(ss_ext_sales_price) as sum_agg
from date_dim
join store_sales on date_dim.d_date_sk = store_sales.ss_sold_date_sk
join item on store_sales.ss_item_sk = item.i_item_sk
where i_manufact_id = 128 and d_moy = 11
group by d_year, i_brand, i_brand_id
order by d_year, sum_agg desc, i_brand_id
limit 100;
{code}
This query is on tpcds data. The field ss_ext_sales_price is of type decimal(7,2) and everything else is an integer.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6508) Mismatched results between vector and non-vector mode with decimal field
[ https://issues.apache.org/jira/browse/HIVE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HIVE-6508:
-------------------------------
    Attachment: HIVE-6508.1.patch

Mismatched results between vector and non-vector mode with decimal field
------------------------------------------------------------------------
    Key: HIVE-6508
    URL: https://issues.apache.org/jira/browse/HIVE-6508
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.13.0
    Reporter: Remus Rusanu
    Assignee: Remus Rusanu
    Attachments: HIVE-6508.1.patch

The following query has a small mismatch in its result as compared to non-vector mode.
{code}
select d_year, i_brand_id, i_brand, sum(ss_ext_sales_price) as sum_agg
from date_dim
join store_sales on date_dim.d_date_sk = store_sales.ss_sold_date_sk
join item on store_sales.ss_item_sk = item.i_item_sk
where i_manufact_id = 128 and d_moy = 11
group by d_year, i_brand, i_brand_id
order by d_year, sum_agg desc, i_brand_id
limit 100;
{code}
This query is on tpcds data. The field ss_ext_sales_price is of type decimal(7,2) and everything else is an integer.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5998) Add vectorized reader for Parquet files
[ https://issues.apache.org/jira/browse/HIVE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HIVE-5998:
-------------------------------
    Status: Open (was: Patch Available)

Add vectorized reader for Parquet files
---------------------------------------
    Key: HIVE-5998
    URL: https://issues.apache.org/jira/browse/HIVE-5998
    Project: Hive
    Issue Type: Sub-task
    Components: Serializers/Deserializers, Vectorization
    Reporter: Remus Rusanu
    Assignee: Remus Rusanu
    Priority: Minor
    Labels: Parquet, vectorization
    Attachments: HIVE-5998.1.patch, HIVE-5998.2.patch, HIVE-5998.3.patch, HIVE-5998.4.patch, HIVE-5998.5.patch, HIVE-5998.6.patch, HIVE-5998.7.patch, HIVE-5998.8.patch

HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar format, it makes sense to provide a vectorized reader, similar to the ones the RC and ORC formats have, to benefit from the vectorized execution engine.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5998) Add vectorized reader for Parquet files
[ https://issues.apache.org/jira/browse/HIVE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HIVE-5998:
-------------------------------
    Status: Patch Available (was: Open)

Looks like Jenkins lost its queue; resubmitting.

Add vectorized reader for Parquet files
---------------------------------------
    Key: HIVE-5998
    URL: https://issues.apache.org/jira/browse/HIVE-5998
    Project: Hive
    Issue Type: Sub-task
    Components: Serializers/Deserializers, Vectorization
    Reporter: Remus Rusanu
    Assignee: Remus Rusanu
    Priority: Minor
    Labels: Parquet, vectorization
    Attachments: HIVE-5998.1.patch, HIVE-5998.2.patch, HIVE-5998.3.patch, HIVE-5998.4.patch, HIVE-5998.5.patch, HIVE-5998.6.patch, HIVE-5998.7.patch, HIVE-5998.8.patch, HIVE-5998.9.patch

HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar format, it makes sense to provide a vectorized reader, similar to the ones the RC and ORC formats have, to benefit from the vectorized execution engine.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5998) Add vectorized reader for Parquet files
[ https://issues.apache.org/jira/browse/HIVE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HIVE-5998:
-------------------------------
    Attachment: HIVE-5998.9.patch

.8 resubmitted

Add vectorized reader for Parquet files
---------------------------------------
    Key: HIVE-5998
    URL: https://issues.apache.org/jira/browse/HIVE-5998
    Project: Hive
    Issue Type: Sub-task
    Components: Serializers/Deserializers, Vectorization
    Reporter: Remus Rusanu
    Assignee: Remus Rusanu
    Priority: Minor
    Labels: Parquet, vectorization
    Attachments: HIVE-5998.1.patch, HIVE-5998.2.patch, HIVE-5998.3.patch, HIVE-5998.4.patch, HIVE-5998.5.patch, HIVE-5998.6.patch, HIVE-5998.7.patch, HIVE-5998.8.patch, HIVE-5998.9.patch

HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar format, it makes sense to provide a vectorized reader, similar to the ones the RC and ORC formats have, to benefit from the vectorized execution engine.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6536) Reduce dependencies of org.apache.hive:hive-jdbc maven module
[ https://issues.apache.org/jira/browse/HIVE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Minder updated HIVE-6536:
-------------------------------
    Attachment: hive-hdbc-maven-dependencies-0-13.log

Here is the tree for 0.13.0. Note that this appears to be missing org.apache.hadoop:hadoop-mapreduce-client-core, which seems to be required to use the driver.

Reduce dependencies of org.apache.hive:hive-jdbc maven module
-------------------------------------------------------------
    Key: HIVE-6536
    URL: https://issues.apache.org/jira/browse/HIVE-6536
    Project: Hive
    Issue Type: Improvement
    Components: JDBC
    Affects Versions: 0.12.0
    Environment: org.apache.hive:hive-jdbc:jar:0.12.0
    Reporter: Kevin Minder
    Attachments: hive-hdbc-maven-dependencies-0-13.log, hive-jdbc-maven-dependencies.log

The Hive JDBC driver maven module requires a significant number of dependencies that are likely unnecessary and will bloat consumers. Most of this is a result of the dependency on org.apache.hive:hive-cli. I have attached a portion of the mvn dependency:tree output for a client that depends on the org.apache.hive:hive-jdbc module. Note that the extra 2.0.6.1-102 in the output is the result of our local build and publish to a local nexus repo.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6536) Reduce dependencies of org.apache.hive:hive-jdbc maven module
[ https://issues.apache.org/jira/browse/HIVE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Minder updated HIVE-6536:
-------------------------------
    Attachment: (was: hive-hdbc-maven-dependencies-0-13.log)

Reduce dependencies of org.apache.hive:hive-jdbc maven module
-------------------------------------------------------------
    Key: HIVE-6536
    URL: https://issues.apache.org/jira/browse/HIVE-6536
    Project: Hive
    Issue Type: Improvement
    Components: JDBC
    Affects Versions: 0.12.0
    Environment: org.apache.hive:hive-jdbc:jar:0.12.0
    Reporter: Kevin Minder
    Attachments: hive-jdbc-maven-dependencies-0-13.log, hive-jdbc-maven-dependencies.log

The Hive JDBC driver maven module requires a significant number of dependencies that are likely unnecessary and will bloat consumers. Most of this is a result of the dependency on org.apache.hive:hive-cli. I have attached a portion of the mvn dependency:tree output for a client that depends on the org.apache.hive:hive-jdbc module. Note that the extra 2.0.6.1-102 in the output is the result of our local build and publish to a local nexus repo.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6536) Reduce dependencies of org.apache.hive:hive-jdbc maven module
[ https://issues.apache.org/jira/browse/HIVE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Minder updated HIVE-6536:
-------------------------------
    Attachment: hive-jdbc-maven-dependencies-0-13.log

Reduce dependencies of org.apache.hive:hive-jdbc maven module
-------------------------------------------------------------
    Key: HIVE-6536
    URL: https://issues.apache.org/jira/browse/HIVE-6536
    Project: Hive
    Issue Type: Improvement
    Components: JDBC
    Affects Versions: 0.12.0
    Environment: org.apache.hive:hive-jdbc:jar:0.12.0
    Reporter: Kevin Minder
    Attachments: hive-jdbc-maven-dependencies-0-13.log, hive-jdbc-maven-dependencies.log

The Hive JDBC driver maven module requires a significant number of dependencies that are likely unnecessary and will bloat consumers. Most of this is a result of the dependency on org.apache.hive:hive-cli. I have attached a portion of the mvn dependency:tree output for a client that depends on the org.apache.hive:hive-jdbc module. Note that the extra 2.0.6.1-102 in the output is the result of our local build and publish to a local nexus repo.

--
This message was sent by Atlassian JIRA (v6.2#6252)
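[Editor's note] Until the module's own dependency list is trimmed, a consumer can prune the heaviest transitive subtree by hand. A sketch of a consumer-side POM fragment, assuming (per the report above) that org.apache.hive:hive-cli is the main culprit and that the consumer does not need it at runtime; verify against your own mvn dependency:tree output before excluding anything:

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>0.12.0</version>
  <exclusions>
    <!-- hive-cli drags in most of the unnecessary transitive bloat -->
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-cli</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```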
Re: Review Request 15873: Query cancel should stop running MR tasks
On Feb. 27, 2014, 11:08 p.m., Thejas Nair wrote:
> ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java, line 110
> https://reviews.apache.org/r/15873/diff/3/?file=478815#file478815line110
>
> When pollFinished is running, this shutdown() function will not be able to make progress, which means that the query cancellation will happen only after a task (could be an MR task) is complete. It seems synchronizing around shutdown should be sufficient, either by making it volatile or having synchronized methods around it. Since thread-safe concurrent collection classes are being used here, I don't see other concurrency issues that would make it necessary to make all these functions synchronized.

Navis Ryu wrote:
    It just polls the status of running tasks and goes into the wait state quite quickly, so it would not hinder the shutdown process. Furthermore, the two threads, polling and shutdown, have a race condition on both collections, runnable and running, so those should be guarded by something shared.

Thejas Nair wrote:
    Yes, it will go into the wait state quickly. But I haven't understood how the wait helps here. There is no notify in this code, so the wait will always wait for 2 seconds; it will be no different from a sleep(2000). So it looks like the outer polling loop will continue until all the currently running jobs are complete.

Navis Ryu wrote:
    From the javadoc of Object.wait(): "The current thread must own this object's monitor. The thread releases ownership of this monitor and waits until another thread notifies threads waiting on this object's monitor." In the wait state, any other thread can take the monitor (during sleep, that's not possible), so the shutdown thread does not need to wait for 2 seconds. The polling thread might notice 2 seconds after shutdown, as you said, because it's not notified, but I think that's not a big deal. Isn't it?

Thanks for the explanation!

- Thejas

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15873/#review35625
-----------------------------------------------------------

On March 4, 2014, 8:02 a.m., Navis Ryu wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15873/
> -----------------------------------------------------------
>
> (Updated March 4, 2014, 8:02 a.m.)
>
> Review request for hive.
>
> Bugs: HIVE-5901
>     https://issues.apache.org/jira/browse/HIVE-5901
>
> Repository: hive-git
>
> Description
> -----------
> Currently, query canceling does not stop the running MR job immediately.
>
> Diffs
> -----
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 332cadb
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java c51a9c8
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java 854cd52
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java ead7b59
>
> Diff: https://reviews.apache.org/r/15873/diff/
>
> Testing
> -------
>
> Thanks,
> Navis Ryu
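[Editor's note] The wait-versus-sleep point settled in the review above can be demonstrated directly (hypothetical class, not the Hive code under review): a thread parked in Object.wait(timeout) releases the monitor, so a synchronized shutdown() both proceeds immediately and can cut the wait short with notifyAll(). A Thread.sleep(2000) in the same place could not be ended early by a notify.

```java
// Minimal sketch of the DriverContext polling/shutdown interaction.
public class PollShutdownDemo {
    private boolean shutdown = false;

    public synchronized void poll() throws InterruptedException {
        while (!shutdown) {
            wait(2000); // releases this object's monitor while waiting
        }
    }

    public synchronized void shutdown() {
        // Acquirable even while poll() runs, because wait() released the monitor.
        shutdown = true;
        notifyAll(); // wakes poll() immediately instead of after the 2s timeout
    }

    /** Returns how long shutdown() took to stop a running poller, in ms. */
    public static long measureShutdownMillis() throws InterruptedException {
        PollShutdownDemo d = new PollShutdownDemo();
        Thread poller = new Thread(() -> {
            try { d.poll(); } catch (InterruptedException ignored) { }
        });
        poller.start();
        Thread.sleep(200); // let the poller enter wait()
        long t0 = System.nanoTime();
        d.shutdown();      // does not block for the remaining wait timeout
        poller.join();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shutdown completed in ~" + measureShutdownMillis() + " ms");
    }
}
```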
Re: Review Request 15873: Query cancel should stop running MR tasks
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15873/#review36359
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
https://reviews.apache.org/r/15873/#comment67310

    This second addToRunnable(tsk) is redundant.

- Thejas Nair

On March 4, 2014, 8:02 a.m., Navis Ryu wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15873/
> -----------------------------------------------------------
>
> (Updated March 4, 2014, 8:02 a.m.)
>
> Review request for hive.
>
> Bugs: HIVE-5901
>     https://issues.apache.org/jira/browse/HIVE-5901
>
> Repository: hive-git
>
> Description
> -----------
> Currently, query canceling does not stop the running MR job immediately.
>
> Diffs
> -----
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 332cadb
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java c51a9c8
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java 854cd52
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java ead7b59
>
> Diff: https://reviews.apache.org/r/15873/diff/
>
> Testing
> -------
>
> Thanks,
> Navis Ryu
[jira] [Commented] (HIVE-5901) Query cancel should stop running MR tasks
[ https://issues.apache.org/jira/browse/HIVE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922571#comment-13922571 ]

Thejas M Nair commented on HIVE-5901:
-------------------------------------
[~navis] I have added a comment on reviewboard on the latest update there. But there seems to be a slight difference between the HIVE-5901.6.patch.txt patch here and the latest one on reviewboard. In HIVE-5901.6.patch.txt the 'boolean shutdown' has been made volatile, which is not necessary, as you pointed out, given the changes to add synchronization.

Query cancel should stop running MR tasks
-----------------------------------------
    Key: HIVE-5901
    URL: https://issues.apache.org/jira/browse/HIVE-5901
    Project: Hive
    Issue Type: Improvement
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Priority: Minor
    Attachments: HIVE-5901.1.patch.txt, HIVE-5901.2.patch.txt, HIVE-5901.3.patch.txt, HIVE-5901.4.patch.txt, HIVE-5901.5.patch.txt, HIVE-5901.6.patch.txt

Currently, query canceling does not stop the running MR job immediately.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HIVE-5901) Query cancel should stop running MR tasks
[ https://issues.apache.org/jira/browse/HIVE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922571#comment-13922571 ]

Thejas M Nair edited comment on HIVE-5901 at 3/6/14 2:42 PM:
-------------------------------------------------------------
[~navis] I have added a comment on reviewboard on the latest update there. But there seems to be a slight difference between the HIVE-5901.6.patch.txt patch here and the latest one on reviewboard. In HIVE-5901.6.patch.txt the 'boolean shutdown' has been made volatile, which is not necessary, as you pointed out, given the changes to make the functions synchronized.

was (Author: thejas):
[~navis] I have added a comment on reviewboard on the latest update there. But there seems to be slight difference in the HIVE-5901.6.patch.txt patch here and latest one in reviewboard. In HIVE-5901.6.patch.txt the 'boolean shutdown' has been made volatile, which is not necessary as you pointed out with the changes to add synchronization.

Query cancel should stop running MR tasks
-----------------------------------------
    Key: HIVE-5901
    URL: https://issues.apache.org/jira/browse/HIVE-5901
    Project: Hive
    Issue Type: Improvement
    Components: Query Processor
    Reporter: Navis
    Assignee: Navis
    Priority: Minor
    Attachments: HIVE-5901.1.patch.txt, HIVE-5901.2.patch.txt, HIVE-5901.3.patch.txt, HIVE-5901.4.patch.txt, HIVE-5901.5.patch.txt, HIVE-5901.6.patch.txt

Currently, query canceling does not stop the running MR job immediately.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5218) datanucleus does not work with MS SQLServer in Hive metastore
[ https://issues.apache.org/jira/browse/HIVE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5218: Resolution: Duplicate Status: Resolved (was: Patch Available) Marking as duplicate. HIVE-5099 has a patch that upgrades to a newer datanucleus version. datanucleus does not work with MS SQLServer in Hive metastore - Key: HIVE-5218 URL: https://issues.apache.org/jira/browse/HIVE-5218 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 0.13.0 Attachments: 0001-HIVE-5218-datanucleus-does-not-work-with-SQLServer-i.patch, HIVE-5218-trunk.patch, HIVE-5218-trunk.patch, HIVE-5218-v2.patch, HIVE-5218.2.patch, HIVE-5218.patch HIVE-3632 upgraded the datanucleus version to 3.2.x; however, this version of datanucleus doesn't work with SQLServer as the metastore. The problem is that datanucleus tries to use the fully qualified object name to find a table in the database but can't find it. If I downgrade to the version from HIVE-2084, SQLServer works fine. It could be a bug in datanucleus. This is the detailed exception I'm getting when using datanucleus 3.2.x with SQL Server:
{noformat}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:javax.jdo.JDOException: Exception thrown calling table.exists() for a2ee36af45e9f46c19e995bfd2d9b5fd1hivemetastore..SEQUENCE_TABLE
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
…
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
at $Proxy0.createTable(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1071)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1104)
…
at $Proxy11.create_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:6417)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:6401)
NestedThrowablesStackTrace:
com.microsoft.sqlserver.jdbc.SQLServerException: There is already an object named 'SEQUENCE_TABLE' in the database.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1493)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:775)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:676)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4615)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1400)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:179)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:154)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:649)
at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:300)
at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:760)
at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:711)
at org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:425)
at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:488)
at org.datanucleus.store.rdbms.valuegenerator.TableGenerator.repositoryExists(TableGenerator.java:242)
at org.datanucleus.store.rdbms.valuegenerator.AbstractRDBMSGenerator.obtainGenerationBlock(AbstractRDBMSGenerator.java:86)
at org.datanucleus.store.valuegenerator.AbstractGenerator.obtainGenerationBlock(AbstractGenerator.java:197)
at org.datanucleus.store.valuegenerator.AbstractGenerator.next(AbstractGenerator.java:105)
at org.datanucleus.store.rdbms.RDBMSStoreManager.getStrategyValueForGenerator(RDBMSStoreManager.java:2019)
at
{noformat}
[jira] [Updated] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server
[ https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5099: Attachment: HIVE-5099.3.patch Some partition publish operation cause OOM in metastore backed by SQL Server Key: HIVE-5099 URL: https://issues.apache.org/jira/browse/HIVE-5099 Project: Hive Issue Type: Bug Components: Metastore, Windows Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch, HIVE-5099.3.patch For certain combinations of metastore operations, the metastore operation hangs and the metastore server eventually fails due to OOM. This happens when the metastore is backed by SQL Server. Here is a testcase to reproduce:
{code}
CREATE TABLE tbl_repro_oom1 (a STRING, b INT) PARTITIONED BY (c STRING, d STRING);
CREATE TABLE tbl_repro_oom_2 (a STRING) PARTITIONED BY (e STRING);
ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='France', d=4);
ALTER TABLE tbl_repro_oom1 ADD PARTITION (c='Russia', d=3);
ALTER TABLE tbl_repro_oom_2 ADD PARTITION (e='Russia');
ALTER TABLE tbl_repro_oom1 DROP PARTITION (c = 'India'); --failure
{code}
The code causing the issue is in ExpressionTree.java:
{code}
valString = "partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + keyEqualLength + ").substring(0, partitionName.substring(partitionName.indexOf(\"" + keyEqual + "\")+" + keyEqualLength + ").indexOf(\"/\"))";
{code}
The snapshot of the partition table before the drop partition statement is:
{code}
PART_ID  CREATE_TIME  LAST_ACCESS_TIME  PART_NAME     SD_ID  TBL_ID
93       1376526718   0                 c=France/d=4  127    33
94       1376526718   0                 c=Russia/d=3  128    33
95       1376526718   0                 e=Russia      129    34
{code}
The Datanucleus query tries to find the value of a particular key by locating $key= as the start and / as the end. For example, it finds the value of c in c=France/d=4 by locating c= as the start and the following / as the end. However, the query fails when looking for the value of e in e=Russia, since there is no trailing /. Other databases work because their query plans first filter out the partitions that do not belong to tbl_repro_oom1; whether this error surfaces or not depends on the query optimizer. When this exception happens, the metastore keeps retrying and throwing exceptions. The memory image of the metastore contains a large number of exception objects:
{code}
com.microsoft.sqlserver.jdbc.SQLServerException: Invalid length parameter passed to the LEFT or SUBSTRING function.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet$FetchBuffer.nextRow(SQLServerResultSet.java:4762)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.fetchBufferNext(SQLServerResultSet.java:1682)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:955)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
at org.datanucleus.store.rdbms.query.ForwardQueryResult.init(ForwardQueryResult.java:90)
at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:686)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1791)
at org.datanucleus.store.query.Query.executeWithMap(Query.java:1694)
at org.datanucleus.api.jdo.JDOQuery.executeWithMap(JDOQuery.java:334)
at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1715)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1590)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
at $Proxy4.getPartitionsByFilter(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2163)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at $Proxy5.get_partitions_by_filter(Unknown Source)
at
{code}
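The failure mode described above can be sketched in plain Java (the method name is illustrative; the real code generates an equivalent JDOQL/SQL substring expression): extracting a key's value works when a / follows it, but the last key in a partition name has no trailing /, so indexOf returns -1 and the substring gets a negative length, the same "Invalid length parameter" that SQL Server rejects.

```java
// Sketch of the key=value extraction the generated query performs on a
// partition name such as "c=France/d=4". Names are illustrative.
public class PartitionNameSketch {
    static String valueOf(String partitionName, String key) {
        String keyEqual = key + "=";
        String rest = partitionName.substring(
                partitionName.indexOf(keyEqual) + keyEqual.length());
        // When the key is the last component ("e=Russia"), there is no
        // trailing '/': indexOf returns -1 and substring(0, -1) fails,
        // just as SQL SUBSTRING rejects a negative length.
        return rest.substring(0, rest.indexOf("/"));
    }

    public static void main(String[] args) {
        System.out.println(valueOf("c=France/d=4", "c")); // France
        try {
            valueOf("e=Russia", "e");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("failed for e=Russia: " + e);
        }
    }
}
```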
[jira] [Updated] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server
[ https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5099: Status: Patch Available (was: Open) Some partition publish operation cause OOM in metastore backed by SQL Server Key: HIVE-5099 URL: https://issues.apache.org/jira/browse/HIVE-5099 Project: Hive Issue Type: Bug Components: Metastore, Windows Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch, HIVE-5099.3.patch
[jira] [Commented] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server
[ https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922640#comment-13922640 ] Thejas M Nair commented on HIVE-5099: - +1. Rebased the patch, changes were only to surrounding line version numbers. Some partition publish operation cause OOM in metastore backed by SQL Server Key: HIVE-5099 URL: https://issues.apache.org/jira/browse/HIVE-5099 Project: Hive Issue Type: Bug Components: Metastore, Windows Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch, HIVE-5099.3.patch
[jira] [Updated] (HIVE-6487) PTest2 do not copy failed source directories
[ https://issues.apache.org/jira/browse/HIVE-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6487: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you Szehon! I have committed this to trunk!! PTest2 do not copy failed source directories Key: HIVE-6487 URL: https://issues.apache.org/jira/browse/HIVE-6487 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Szehon Ho Fix For: 0.14.0 Attachments: HIVE-6487.2.patch, HIVE-6487.patch Right now we copy the entire source directory for failed tests back to the master (up to 5). They are 10GB each, so it takes a very long time. We should remove this feature. Remove the cp command from batch-exec.vm: https://github.com/apache/hive/blob/trunk/testutils/ptest2/src/main/resources/batch-exec.vm#L91 also don't publish the number of failed tests as a template variable: NO_PRECOMMIT_TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5099) Some partition publish operation cause OOM in metastore backed by SQL Server
[ https://issues.apache.org/jira/browse/HIVE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922684#comment-13922684 ] Thejas M Nair commented on HIVE-5099: - FYI, the new versions of datanucleus in this patch have been tested to work with the following databases: MySQL 5.x, Oracle 11g R2, Postgres 8.x, Postgres 9.x, SQL Server (Windows). Some partition publish operation cause OOM in metastore backed by SQL Server Key: HIVE-5099 URL: https://issues.apache.org/jira/browse/HIVE-5099 Project: Hive Issue Type: Bug Components: Metastore, Windows Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-5099-1.patch, HIVE-5099-2.patch, HIVE-5099.3.patch
[jira] [Commented] (HIVE-6495) TableDesc.getDeserializer() should use correct classloader when calling Class.forName()
[ https://issues.apache.org/jira/browse/HIVE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922685#comment-13922685 ] Ashutosh Chauhan commented on HIVE-6495: [~jdere] Looks like this needs to be reuploaded for jenkins to pick it up. TableDesc.getDeserializer() should use correct classloader when calling Class.forName() --- Key: HIVE-6495 URL: https://issues.apache.org/jira/browse/HIVE-6495 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6495.1.patch User is getting an error with the stack trace below. It looks like when Class.forName() is called, it may not be using the correct class loader (JavaUtils.getClassLoader() is used in other contexts when the loaded jar may be required).
{noformat}
FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: my.serde.ColonSerde
java.lang.RuntimeException: java.lang.ClassNotFoundException: my.serde.ColonSerde
at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:68)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRowInspectorFromTable(FetchOperator.java:231)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:608)
at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:80)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:497)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: my.serde.ColonSerde
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:66)
... 20 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
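The fix the report suggests can be sketched as follows (a minimal illustration, not the patch's actual code; the report says JavaUtils.getClassLoader is used this way elsewhere in Hive): prefer the thread context classloader, which includes session-added jars, over the calling class's defining loader, and fall back when no context loader is set.

```java
// Sketch: load a class via the thread context classloader so that jars
// added at runtime (e.g. a user's SerDe jar) are visible, falling back
// to this class's own loader when no context loader is set.
public class LoaderAwareForName {
    static Class<?> loadClass(String name) throws ClassNotFoundException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = LoaderAwareForName.class.getClassLoader();
        }
        return Class.forName(name, true, cl);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadClass("java.lang.String")); // class java.lang.String
    }
}
```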
[jira] [Commented] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type
[ https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922710#comment-13922710 ] Brock Noland commented on HIVE-3635: +1 LGTM allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type --- Key: HIVE-3635 URL: https://issues.apache.org/jira/browse/HIVE-3635 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.9.0 Reporter: Alexander Alten-Lorenz Assignee: Xuefu Zhang Attachments: HIVE-3635.1.patch, HIVE-3635.2.patch, HIVE-3635.patch interpret t as true and f as false for boolean types. PostgreSQL exports represent booleans that way. -- This message was sent by Atlassian JIRA (v6.2#6252)
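The lenient parsing the request describes can be sketched as a small helper (the name and exact shape are illustrative, not the patch's actual code): accept 't'/'T'/'1' as true and 'f'/'F'/'0' as false, in addition to the usual true/false spellings.

```java
// Illustrative sketch of lenient boolean parsing for the values listed
// in the issue title; returns null for anything unrecognized.
public class LenientBoolean {
    static Boolean parse(String s) {
        if (s == null) {
            return null;
        }
        switch (s) {
            case "t": case "T": case "1": case "true": case "TRUE":
                return Boolean.TRUE;
            case "f": case "F": case "0": case "false": case "FALSE":
                return Boolean.FALSE;
            default:
                return null; // not a recognized boolean spelling
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("t")); // true
        System.out.println(parse("F")); // false
    }
}
```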
[jira] [Updated] (HIVE-6446) Ability to specify hadoop.bin.path from command line -D
[ https://issues.apache.org/jira/browse/HIVE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6446: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Remus! Ability to specify hadoop.bin.path from command line -D --- Key: HIVE-6446 URL: https://issues.apache.org/jira/browse/HIVE-6446 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Fix For: 0.14.0 Attachments: HIVE-6446.1.patch, HIVE-6446.2.patch the surefire plugin configures hadoop.bin.path as a system property: {code} <hadoop.bin.path>${basedir}/${hive.path.to.root}/testutils/hadoop</hadoop.bin.path> {code} On Windows testing, this should be: {code} <hadoop.bin.path>${basedir}/${hive.path.to.root}/testutils/hadoop.cmd</hadoop.bin.path> {code} Additionally, it would be useful to be able to specify the Hadoop CLI location via -D on the mvn command line. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6567) show grant ... on all fails with NPE
Thejas M Nair created HIVE-6567: --- Summary: show grant ... on all fails with NPE Key: HIVE-6567 URL: https://issues.apache.org/jira/browse/HIVE-6567 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair With sql std auth -
{code}
hive> show grant user user1 on all;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. null
2014-03-06 08:52:39,238 ERROR exec.DDLTask (DDLTask.java:execute(423)) - java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.Utilities.getDbTableName(Utilities.java:2033)
at org.apache.hadoop.hive.ql.exec.DDLTask.getHivePrivilegeObject(DDLTask.java:819)
at org.apache.hadoop.hive.ql.exec.DDLTask.showGrantsV2(DDLTask.java:612)
at org.apache.hadoop.hive.ql.exec.DDLTask.showGrants(DDLTask.java:515)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:388)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1456)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1229)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:874)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:864)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5933) SQL std auth - add support to metastore api to list all privileges for a user
[ https://issues.apache.org/jira/browse/HIVE-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922748#comment-13922748 ] Thejas M Nair commented on HIVE-5933: - 'show grant user hive_test_user on all;', added in HIVE-6122, provides the equivalent functionality of 'SHOW GRANTS FOR user;' and 'SHOW GRANTS FOR role;'. But the command fails with an NPE under sql std auth. Created HIVE-6567 to track it. SQL std auth - add support to metastore api to list all privileges for a user - Key: HIVE-5933 URL: https://issues.apache.org/jira/browse/HIVE-5933 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Original Estimate: 24h Remaining Estimate: 24h This is for supporting SHOW GRANTS statements - SHOW GRANTS; SHOW GRANTS FOR user; SHOW GRANTS FOR role; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5933) SQL std auth - add support to metastore api to list all privileges for a user
[ https://issues.apache.org/jira/browse/HIVE-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922752#comment-13922752 ] Thejas M Nair commented on HIVE-5933: - Duplicates HIVE-6122 except for the 'show grants;' command, which is not supported via HIVE-6122. SQL std auth - add support to metastore api to list all privileges for a user - Key: HIVE-5933 URL: https://issues.apache.org/jira/browse/HIVE-5933 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6508) Mismatched results between vector and non-vector mode with decimal field
[ https://issues.apache.org/jira/browse/HIVE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922775#comment-13922775 ] Jitendra Nath Pandey commented on HIVE-6508: +1. The patch looks good to me. Mismatched results between vector and non-vector mode with decimal field Key: HIVE-6508 URL: https://issues.apache.org/jira/browse/HIVE-6508 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Attachments: HIVE-6508.1.patch The following query has a small mismatch in results compared to the non-vector mode.
{code}
select d_year, i_brand_id, i_brand, sum(ss_ext_sales_price) as sum_agg
from date_dim
join store_sales on date_dim.d_date_sk = store_sales.ss_sold_date_sk
join item on store_sales.ss_item_sk = item.i_item_sk
where i_manufact_id = 128 and d_moy = 11
group by d_year, i_brand, i_brand_id
order by d_year, sum_agg desc, i_brand_id
limit 100;
{code}
This query is on tpcds data. The field ss_ext_sales_price is of type decimal(7,2) and everything else is an integer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6568) Vectorized cast of decimal to string produces strings with trailing zeros.
Jitendra Nath Pandey created HIVE-6568: -- Summary: Vectorized cast of decimal to string produces strings with trailing zeros. Key: HIVE-6568 URL: https://issues.apache.org/jira/browse/HIVE-6568 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey A decimal value 1.23 with scale 5 is represented in string as 1.23000. This behavior is different from HiveDecimal behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
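The difference can be illustrated with Python's decimal module (an analogy only, not Hive code): a value held at a fixed scale of 5 keeps its trailing zeros when stringified, while a normalized value drops them, which matches the HiveDecimal behavior the report contrasts against.

```python
from decimal import Decimal

# Fixed scale 5, as the vectorized cast reportedly produces
v = Decimal("1.23").quantize(Decimal("0.00001"))
print(str(v))              # 1.23000 - trailing zeros kept

# HiveDecimal-style output strips the trailing zeros
print(str(v.normalize()))  # 1.23
```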
[jira] [Commented] (HIVE-6551) group by after join with skew join optimization references invalid task sometimes
[ https://issues.apache.org/jira/browse/HIVE-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922810#comment-13922810 ] Ashutosh Chauhan commented on HIVE-6551: +1 group by after join with skew join optimization references invalid task sometimes - Key: HIVE-6551 URL: https://issues.apache.org/jira/browse/HIVE-6551 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6551.1.patch.txt For example, {noformat} hive> set hive.auto.convert.join = true; hive> set hive.optimize.skewjoin = true; hive> set hive.skewjoin.key = 3; hive> EXPLAIN FROM (SELECT src.* FROM src) x JOIN (SELECT src.* FROM src) Y ON (x.key = Y.key) SELECT sum(hash(Y.key)), sum(hash(Y.value)); OK STAGE DEPENDENCIES: Stage-8 is a root stage Stage-6 depends on stages: Stage-8 Stage-5 depends on stages: Stage-6 , consists of Stage-4, Stage-2 Stage-4 Stage-2 depends on stages: Stage-4, Stage-1 Stage-0 is a root stage ... {noformat} Stage-2 references the non-existent Stage-1 -- This message was sent by Atlassian JIRA (v6.2#6252)
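The dangling dependency in the quoted EXPLAIN output can be detected mechanically. A hypothetical Python sketch — the stage names are transcribed from the report, the validation itself is illustrative:

```python
# Stage dependency graph transcribed from the EXPLAIN output above
stages = {
    "Stage-8": [],
    "Stage-6": ["Stage-8"],
    "Stage-5": ["Stage-6"],
    "Stage-4": [],
    "Stage-2": ["Stage-4", "Stage-1"],
    "Stage-0": [],
}

# Any dependency that is not itself a declared stage is dangling
missing = {dep for deps in stages.values() for dep in deps if dep not in stages}
print(missing)  # {'Stage-1'} - the invalid task reference the bug describes
```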
[jira] [Commented] (HIVE-6558) HiveServer2 Plain SASL authentication broken after hadoop 2.3 upgrade
[ https://issues.apache.org/jira/browse/HIVE-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922823#comment-13922823 ] Prasad Mujumdar commented on HIVE-6558: --- [~thejas], [~ashutoshc] would you mind taking a look. This should be a blocker for 0.13 release. Thanks! HiveServer2 Plain SASL authentication broken after hadoop 2.3 upgrade - Key: HIVE-6558 URL: https://issues.apache.org/jira/browse/HIVE-6558 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.13.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Priority: Blocker Attachments: HIVE-6558.2.patch Java only includes Plain SASL client and not server. Hence HiveServer2 includes a Plain SASL server implementation. Now Hadoop has its own Plain SASL server [HADOOP-9020|https://issues.apache.org/jira/browse/HADOOP-9020] which is part of Hadoop 2.3 [release|http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/releasenotes.html]. The two servers use different Sasl callbacks and the servers are registered in java.security.Provider via static code. As a result the HiveServer2 instance could be using Hadoop's Plain SASL server which breaks the authentication. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Thoughts on new metastore APIs
Thanks for discussing this, Brock. I agree that it is important to consider while writing a new metastore api call. But I think this (single input/output struct) should be a guideline; I am not sure it should be used in every case. What you are saying shows that there is a tradeoff between ending up with more functions vs ending up with more input/output structs/classes. I am not sure if having more input/output structs is any better. Take the case of create_table/create_table_with_environment_context that you mentioned. Even though create_table had a single input argument Table, instead of adding EnvironmentContext contents to the Table struct, the authors decided to create a new function with an additional EnvironmentContext argument. This makes sense because the Table struct is used by other functions as well, such as get_table, and EnvironmentContext fields don't make sense for those cases as it is not persisted as part of the table. Which means that the only way to prevent creation of the create_table_with_environment_context method would have been to have a CreateTableArg struct as the input argument instead of Table. i.e., creating a different struct for the single input/output of every function is the only way you can be sure that you don't need more functions. This approach of reducing the number of functions also means that you would start encoding different types of actions within the single input argument. Consider the case of get_partition vs get_partition_by_name. It would need a single struct with an enum that tells it whether to look up based on the partition key-values or the name, and based on the enum it would use different fields in the struct. I feel having different functions is more readable for this case. 
For example, the api in HIVE-5931 would need to change from list<RolePrincipalGrant> get_principals_in_role(1:string role_name) to struct GetPrincipalInRoleOutput { 1: list<RolePrincipalGrant> rolePrincList; } struct GetPrincipalInRoleInput { 1: string role_name; } GetPrincipalInRoleOutput get_principals_in_role(1: GetPrincipalInRoleInput input); I am not sure the insurance costs in terms of readability are low here. I think we should consider the risk in each case of function proliferation and pay the costs accordingly. Let me know if I have misunderstood what you are proposing here. Thanks, Thejas On Wed, Mar 5, 2014 at 11:39 AM, Brock Noland br...@cloudera.com wrote: Hi, There is a ton of great work going into the 0.13 release. Specifically, we are adding a ton of APIs to the metastore: https://github.com/apache/hive/blame/trunk/metastore/if/hive_metastore.thrift Few of these new APIs follow the best practice of a single request and response struct. Some follow this partially by having a single response object but taking no arguments, while others return void and take a single request object. Still others, mostly related to authorization, do not even partially follow this pattern. The single request/response struct model is extremely important, as changing the number of arguments is a backwards-incompatible change. Therefore the only way to change an api is to add *new* method calls. This is why we have so many crazy APIs in the hive metastore, such as create_table/create_table_with_environment_context and 12 (yes, twelve) ways to get partitions. I would like to suggest that we require all new APIs to follow the single request/response struct model, that is, any new API committed *after* today. I have heard the following arguments against this approach, which I believe to be invalid: *This API will never change (or never return a value or never take another value)* We have all been writing code long enough to know that there are unknown unknowns. 
By following the single request/response struct model for *all* APIs we can future proof ourselves. Why wouldn't we want to buy insurance now when it's cheap? *The performance impact of wrapping an object is too much* These calls are being made over the network which is orders of magnitude slower than creating a small, simple, and lightweight object to wrap method arguments and response values. Cheers, Brock
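The tradeoff being debated can be sketched in plain Python (hypothetical names, not the actual Thrift-generated metastore code): with positional arguments, adding a parameter changes the method signature, while a single request object can grow optional fields without breaking existing callers.

```python
from dataclasses import dataclass
from typing import Optional

# Positional style: adding an environment_context parameter later forces
# either a breaking signature change or a new *_with_environment_context
# method, as happened with create_table.
def create_table_v1(table_name: str) -> None:
    pass

# Request/response style: a new optional field is backward compatible,
# mirroring how optional Thrift struct fields evolve.
@dataclass
class CreateTableRequest:
    table_name: str
    environment_context: Optional[dict] = None  # added later; old callers unaffected

@dataclass
class CreateTableResponse:
    success: bool = True

def create_table_v2(request: CreateTableRequest) -> CreateTableResponse:
    # the server reads only the fields it knows; None means "not provided"
    return CreateTableResponse(success=bool(request.table_name))
```

An old caller can still write `create_table_v2(CreateTableRequest("t1"))` unchanged after the new field is introduced.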
Hung Precommit Jenkins Jobs
Ashutosh informed me that the precommit build was hung. Long story short, PTest2 had completed. For example, see this job which took 8 hours: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1636/ That corresponds to: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-1637/execution.txt which, if you scroll all the way to the bottom, you will note finished long before that: 2014-03-06 05:48:22,194 INFO PTest.run:207 Executed 5358 tests 2014-03-06 05:48:22,194 INFO PTest.run:209 PERF: Phase ExecutionPhase took 101 minutes 2014-03-06 05:48:22,194 INFO PTest.run:209 PERF: Phase PrepPhase took 5 minutes 2014-03-06 05:48:22,194 INFO PTest.run:209 PERF: Phase ReportingPhase took 0 minutes 2014-03-06 05:48:22,194 INFO JIRAService.postComment:136 Comment: {color:red}Overall{color}: -1 at least one tests failed Long story short, it looks like Bigtop Jenkins is not as reliable as we would like.
[jira] [Updated] (HIVE-6555) TestSchemaTool is failing on trunk after branching
[ https://issues.apache.org/jira/browse/HIVE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6555: --- Status: Patch Available (was: Open) TestSchemaTool is failing on trunk after branching -- Key: HIVE-6555 URL: https://issues.apache.org/jira/browse/HIVE-6555 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6555-branch13.patch, HIVE-6555.patch This is because version was bumped to 0.14 in pom file and there are no metastore scripts for 0.14 yet. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6338) Improve exception handling in createDefaultDb() in Metastore
[ https://issues.apache.org/jira/browse/HIVE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6338: --- Status: Open (was: Patch Available) Improve exception handling in createDefaultDb() in Metastore Key: HIVE-6338 URL: https://issues.apache.org/jira/browse/HIVE-6338 Project: Hive Issue Type: Task Components: Metastore Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6338.1.patch, HIVE-6338.patch There is a suggestion on HIVE-5959 comment list on possible improvements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6417) sql std auth - new users in admin role config should get added
[ https://issues.apache.org/jira/browse/HIVE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6417: --- Attachment: HIVE-6417.1.patch Re-attach for Hive QA to pick up. sql std auth - new users in admin role config should get added -- Key: HIVE-6417 URL: https://issues.apache.org/jira/browse/HIVE-6417 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Attachments: HIVE-6417.1.patch, HIVE-6417.patch if metastore is started with hive.users.in.admin.role=user1, then user1 is added admin role to metastore. If the value is changed to hive.users.in.admin.role=user2, then user2 should get added to the role in metastore. Right now, if the admin role exists, new users don't get added. A work-around is - user1 adding user2 to the admin role using grant role statement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6338) Improve exception handling in createDefaultDb() in Metastore
[ https://issues.apache.org/jira/browse/HIVE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6338: --- Attachment: HIVE-6338.1.patch Reattaching for Hive QA to pick up. Improve exception handling in createDefaultDb() in Metastore Key: HIVE-6338 URL: https://issues.apache.org/jira/browse/HIVE-6338 Project: Hive Issue Type: Task Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6338.1.patch, HIVE-6338.patch There is a suggestion on HIVE-5959 comment list on possible improvements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6529) Tez output files are out of date
[ https://issues.apache.org/jira/browse/HIVE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6529: --- Resolution: Implemented Status: Resolved (was: Patch Available) looks like it was already done as part of 60ff41c Tue Feb 25 07:58:52 2014 + Merge latest trunk into branch. (Gunther Hagleitner) Tez output files are out of date Key: HIVE-6529 URL: https://issues.apache.org/jira/browse/HIVE-6529 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-6529.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HIVE-6538) yet another annoying exception in test logs
[ https://issues.apache.org/jira/browse/HIVE-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reopened HIVE-6538: sorry wrong jira yet another annoying exception in test logs --- Key: HIVE-6538 URL: https://issues.apache.org/jira/browse/HIVE-6538 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Szehon Ho Priority: Trivial Attachments: HIVE-6538.2.patch, HIVE-6538.patch Whenever you look at failed q tests you have to go thru this useless exception. {noformat} 2014-03-03 11:22:54,872 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - MetaException(message:NoSuchObjectException(message:Function default.qtest_get_java_boolean does not exist)) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:4575) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_function(HiveMetaStore.java:4702) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at $Proxy8.get_function(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getFunction(HiveMetaStoreClient.java:1526) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) at $Proxy9.getFunction(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:2603) at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfoFromMetastore(FunctionRegistry.java:546) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getQualifiedFunctionInfo(FunctionRegistry.java:578) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:599) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:606) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeDropFunction(FunctionSemanticAnalyzer.java:94) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeInternal(FunctionSemanticAnalyzer.java:60) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:345) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1078) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1121) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1014) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004) at org.apache.hadoop.hive.ql.QTestUtil.runCmd(QTestUtil.java:655) at org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:772) at org.apache.hadoop.hive.cli.TestCliDriver.clinit(TestCliDriver.java:46) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:34) at org.junit.internal.runners.SuiteMethod.init(SuiteMethod.java:23) at org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:14) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at 
org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at
[jira] [Updated] (HIVE-6538) yet another annoying exception in test logs
[ https://issues.apache.org/jira/browse/HIVE-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6538: --- Resolution: Implemented Status: Resolved (was: Patch Available) Looks like this was done as part of 60ff41c Tue Feb 25 07:58:52 2014 + Merge latest trunk into branch. (Gunther Hagleitner) yet another annoying exception in test logs --- Key: HIVE-6538 URL: https://issues.apache.org/jira/browse/HIVE-6538 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Szehon Ho Priority: Trivial Attachments: HIVE-6538.2.patch, HIVE-6538.patch Whenever you look at failed q tests you have to go thru this useless exception. {noformat} 2014-03-03 11:22:54,872 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - MetaException(message:NoSuchObjectException(message:Function default.qtest_get_java_boolean does not exist)) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:4575) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_function(HiveMetaStore.java:4702) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at $Proxy8.get_function(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getFunction(HiveMetaStoreClient.java:1526) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) at $Proxy9.getFunction(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:2603) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfoFromMetastore(FunctionRegistry.java:546) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getQualifiedFunctionInfo(FunctionRegistry.java:578) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:599) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:606) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeDropFunction(FunctionSemanticAnalyzer.java:94) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeInternal(FunctionSemanticAnalyzer.java:60) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:345) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1078) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1121) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1014) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004) at org.apache.hadoop.hive.ql.QTestUtil.runCmd(QTestUtil.java:655) at org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:772) at org.apache.hadoop.hive.cli.TestCliDriver.clinit(TestCliDriver.java:46) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:34) at org.junit.internal.runners.SuiteMethod.init(SuiteMethod.java:23) at 
org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:14) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262) at
Re: Thoughts on new metastore APIs
On Thu, Mar 6, 2014 at 12:13 PM, Thejas Nair the...@hortonworks.com wrote: Thanks for discussing this, Brock. I agree that it is important to consider while writing a new metastore api call. But I think this (single input/output struct) should be a guideline; I am not sure it should be used in every case. As with any rule, there are always exceptions. However, looking at the new APIs I don't see an instance where it would have been harmful to use this model. Exceptions should be extremely rare since thousands of RPC implementations successfully use the request/response model. However, as always, it would be up to the developers working on the change to make this call. They'd do so knowing they are going against the guideline and could be asked to justify why they are doing so. What you are saying shows that there is a tradeoff between ending up with more functions vs ending up with more input/output structs/classes. I am not sure if having more input/output structs is any better. Take the case of create_table/create_table_with_environment_context that you mentioned. Even though create_table had a single input argument Table, instead of adding EnvironmentContext contents to the Table struct, the authors decided to create a new function with an additional EnvironmentContext argument. This makes sense because the Table struct is used by other functions as well, such as get_table, and EnvironmentContext fields don't make sense for those cases as it is not persisted as part of the table. Which means that the only way to prevent creation of the create_table_with_environment_context method would have been to have a CreateTableArg struct as the input argument instead of Table. i.e., creating a different struct for the single input/output of every function is the only way you can be sure that you don't need more functions. RPC methods are special. They are published to the world and therefore cannot be easily modified or refactored. 
Once we create a new RPC method, we are stuck with it for a very long time. In this way, Thrift is rather strange in that it allows you to expose the api signatures. The request/response model is far more common. Therefore, if we were creating create_table from scratch, I would suggest we use: CreateTableRequest/CreateTableResponse That way, we can add optional arguments such as environment context very easily. Although I'd love to take credit for this idea, I didn't just come up with it myself. This is a standard way of handling RPC. This approach of reducing the number of functions also means that you would start encoding different types of actions within the single input argument. Consider the case of get_partition vs get_partition_by_name. It would need a single struct with an enum that tells it whether to look up based on the partition key-values or the name, and based on the enum it would use different fields in the struct. I feel having different functions is more readable for this case. There will be cases where we'll need similar methods. The point here is that with the request/response model, adding a single parameter doesn't require an entirely new method. The developers working on the change would have to make the call as to whether new functionality requires a new API or can be handled within the current API. For example, the api in HIVE-5931 would need to change from list<RolePrincipalGrant> get_principals_in_role(1:string role_name) to struct GetPrincipalInRoleOutput { 1: list<RolePrincipalGrant> rolePrincList; } struct GetPrincipalInRoleInput { 1: string role_name; } GetPrincipalInRoleOutput get_principals_in_role(1: GetPrincipalInRoleInput input); I am not sure the insurance costs in terms of readability are low here. I think we should consider the risk in each case of function proliferation and pay the costs accordingly. Let me know if I have misunderstood what you are proposing here. The Output and Input names do feel odd. As I said above. 
Request/Response are standard names for these kinds of objects. Today, for a method like the one above, it may feel like overhead or extra work. However, in the future when you want to add another parameter such as isFilter or encoding etc, then the insurance pays off big time. Brock
[jira] [Commented] (HIVE-6468) HS2 out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922869#comment-13922869 ] Xuefu Zhang commented on HIVE-6468: --- I agree that guard against this is good. Just curious, however, why a http get request would put HS2 in OOM? It's understandable that HS2 doesn't understand the request, but how it runs out of memory seems interesting. HS2 out of memory error when curl sends a get request - Key: HIVE-6468 URL: https://issues.apache.org/jira/browse/HIVE-6468 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Environment: Centos 6.3, hive 12, hadoop-2.2 Reporter: Abin Shahab Assignee: Navis Attachments: HIVE-6468.1.patch.txt We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary) curl localhost:1 Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181) at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.2#6252)
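One plausible mechanism for the OOM Xuefu asks about — a sketch, not traced through the actual Thrift source: a length-prefixed binary transport that reads the first four bytes of a plain HTTP request as a big-endian frame length will attempt an allocation of over a gigabyte, because the ASCII bytes of "GET " decode to a huge integer.

```python
import struct

# The first bytes of a stray HTTP request, as seen by a length-prefixed
# binary protocol such as a SASL/Thrift framing layer.
data = b"GET / HTTP/1.1\r\n"
(frame_len,) = struct.unpack(">I", data[:4])  # b"GET " as a big-endian uint32
print(frame_len)  # 1195725856 - a buffer allocation of ~1.1 GiB, enough
                  # to exhaust a default JVM heap in one request
```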
[jira] [Commented] (HIVE-6434) Restrict function create/drop to admin roles
[ https://issues.apache.org/jira/browse/HIVE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922877#comment-13922877 ] Thejas M Nair commented on HIVE-6434: - +1 Restrict function create/drop to admin roles Key: HIVE-6434 URL: https://issues.apache.org/jira/browse/HIVE-6434 Project: Hive Issue Type: Sub-task Components: Authorization, UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6434.1.patch, HIVE-6434.2.patch, HIVE-6434.3.patch, HIVE-6434.4.patch, HIVE-6434.5.patch Restrict function create/drop to admin roles, if sql std auth is enabled. This would include temp/permanent functions, as well as macros. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6417) sql std auth - new users in admin role config should get added
[ https://issues.apache.org/jira/browse/HIVE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6417: --- Status: Open (was: Patch Available) sql std auth - new users in admin role config should get added -- Key: HIVE-6417 URL: https://issues.apache.org/jira/browse/HIVE-6417 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Attachments: HIVE-6417.patch if metastore is started with hive.users.in.admin.role=user1, then user1 is added admin role to metastore. If the value is changed to hive.users.in.admin.role=user2, then user2 should get added to the role in metastore. Right now, if the admin role exists, new users don't get added. A work-around is - user1 adding user2 to the admin role using grant role statement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6338) Improve exception handling in createDefaultDb() in Metastore
[ https://issues.apache.org/jira/browse/HIVE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6338: --- Status: Patch Available (was: Open) Improve exception handling in createDefaultDb() in Metastore Key: HIVE-6338 URL: https://issues.apache.org/jira/browse/HIVE-6338 Project: Hive Issue Type: Task Components: Metastore Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6338.1.patch, HIVE-6338.patch There is a suggestion on HIVE-5959 comment list on possible improvements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6417) sql std auth - new users in admin role config should get added
[ https://issues.apache.org/jira/browse/HIVE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6417: --- Status: Patch Available (was: Open) sql std auth - new users in admin role config should get added -- Key: HIVE-6417 URL: https://issues.apache.org/jira/browse/HIVE-6417 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Attachments: HIVE-6417.1.patch, HIVE-6417.patch if metastore is started with hive.users.in.admin.role=user1, then user1 is added admin role to metastore. If the value is changed to hive.users.in.admin.role=user2, then user2 should get added to the role in metastore. Right now, if the admin role exists, new users don't get added. A work-around is - user1 adding user2 to the admin role using grant role statement. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 18162: HIVE-6434: Restrict function create/drop to admin roles
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18162/#review36395 --- Ship it! Ship It! - Thejas Nair On Feb. 26, 2014, 2:10 a.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18162/ --- (Updated Feb. 26, 2014, 2:10 a.m.) Review request for hive and Thejas Nair. Bugs: HIVE-6434 https://issues.apache.org/jira/browse/HIVE-6434 Repository: hive-git Description --- Add output entity of DB object to make sure only admin roles can add/drop functions/macros. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 68a25e0 ql/src/java/org/apache/hadoop/hive/ql/parse/MacroSemanticAnalyzer.java 0ae07e3 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java c43bcea ql/src/test/queries/clientnegative/authorization_create_func1.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_create_func2.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_create_macro1.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_create_func1.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_create_macro1.q PRE-CREATION ql/src/test/results/clientnegative/authorization_create_func1.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_create_func2.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_create_macro1.q.out PRE-CREATION ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out 747aa6a ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out 393a3e8 ql/src/test/results/clientnegative/create_function_nonexistent_db.q.out ebb069e ql/src/test/results/clientnegative/create_function_nonudf_class.q.out dd66afc ql/src/test/results/clientnegative/create_udaf_failure.q.out 3fc3d36 ql/src/test/results/clientnegative/create_unknown_genericudf.q.out af3d50b 
ql/src/test/results/clientnegative/create_unknown_udf_udaf.q.out e138fd0 ql/src/test/results/clientnegative/drop_native_udf.q.out 1913df9 ql/src/test/results/clientnegative/udf_function_does_not_implement_udf.q.out 9ea8668 ql/src/test/results/clientnegative/udf_local_resource.q.out b6ea77d ql/src/test/results/clientnegative/udf_nonexistent_resource.q.out ad70d54 ql/src/test/results/clientnegative/udf_test_error.q.out a788a10 ql/src/test/results/clientnegative/udf_test_error_reduce.q.out 98b42e0 ql/src/test/results/clientpositive/authorization_create_func1.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_create_macro1.q.out PRE-CREATION ql/src/test/results/clientpositive/autogen_colalias.q.out a074b96 ql/src/test/results/clientpositive/compile_processor.q.out 7e9bb29 ql/src/test/results/clientpositive/create_func1.q.out 5a249c3 ql/src/test/results/clientpositive/create_genericudaf.q.out 96fe2fa ql/src/test/results/clientpositive/create_genericudf.q.out bf1f4ac ql/src/test/results/clientpositive/create_udaf.q.out 2e86a36 ql/src/test/results/clientpositive/create_view.q.out ecc7618 ql/src/test/results/clientpositive/drop_udf.q.out 422933a ql/src/test/results/clientpositive/macro.q.out c483029 ql/src/test/results/clientpositive/ptf_register_tblfn.q.out 11c9724 ql/src/test/results/clientpositive/udaf_sum_list.q.out b1922d9 ql/src/test/results/clientpositive/udf_compare_java_string.q.out 8e6e365 ql/src/test/results/clientpositive/udf_context_aware.q.out 10414fa ql/src/test/results/clientpositive/udf_logic_java_boolean.q.out 88c1984 ql/src/test/results/clientpositive/udf_testlength.q.out 4d75482 ql/src/test/results/clientpositive/udf_testlength2.q.out 8a1e03e ql/src/test/results/clientpositive/udf_using.q.out 69e5f3b ql/src/test/results/clientpositive/windowing_udaf2.q.out 5043a45 Diff: https://reviews.apache.org/r/18162/diff/ Testing --- positive/negative q files added Thanks, Jason Dere
[jira] [Updated] (HIVE-6538) yet another annoying exception in test logs
[ https://issues.apache.org/jira/browse/HIVE-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6538: Attachment: HIVE-6538.2.patch Looks like pre-commit queue died last night, I'm resubmitting yet another annoying exception in test logs --- Key: HIVE-6538 URL: https://issues.apache.org/jira/browse/HIVE-6538 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Szehon Ho Priority: Trivial Attachments: HIVE-6538.2.patch, HIVE-6538.2.patch, HIVE-6538.patch Whenever you look at failed q tests you have to go thru this useless exception. {noformat} 2014-03-03 11:22:54,872 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - MetaException(message:NoSuchObjectException(message:Function default.qtest_get_java_boolean does not exist)) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:4575) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_function(HiveMetaStore.java:4702) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at $Proxy8.get_function(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getFunction(HiveMetaStoreClient.java:1526) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) at $Proxy9.getFunction(Unknown Source) at 
org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:2603) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfoFromMetastore(FunctionRegistry.java:546) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getQualifiedFunctionInfo(FunctionRegistry.java:578) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:599) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:606) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeDropFunction(FunctionSemanticAnalyzer.java:94) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeInternal(FunctionSemanticAnalyzer.java:60) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:345) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1078) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1121) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1014) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004) at org.apache.hadoop.hive.ql.QTestUtil.runCmd(QTestUtil.java:655) at org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:772) at org.apache.hadoop.hive.cli.TestCliDriver.clinit(TestCliDriver.java:46) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:34) at org.junit.internal.runners.SuiteMethod.init(SuiteMethod.java:23) at org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:14) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at 
org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at
[jira] [Commented] (HIVE-6565) OrcSerde should be added as NativeSerDe in SerDeUtils
[ https://issues.apache.org/jira/browse/HIVE-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922895#comment-13922895 ] Xuefu Zhang commented on HIVE-6565: --- [~branky] Thanks for pointing this out. I'm curious if this solves any particular problem. If so, putting the problem in the description would be very helpful. I asked the question because I thought this might solve HIVE-4703. I tried, but it did not seem to help. Thanks. OrcSerde should be added as NativeSerDe in SerDeUtils - Key: HIVE-6565 URL: https://issues.apache.org/jira/browse/HIVE-6565 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Branky Shao If the table is defined in ORC format, the column info can be fetched from the StorageDescriptor; there is no need to get it from the SerDe. And since ORC is obviously one of Hive's native file formats, OrcSerde should be added as a NativeSerDe in SerDeUtils. The fix is fairly simple: just add a single line in SerDeUtils : nativeSerDeNames.add(org.apache.hadoop.hive.ql.io.orc.OrcSerde.class.getName()); -- This message was sent by Atlassian JIRA (v6.2#6252)
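The one-line fix above can be illustrated with a standalone sketch of the SerDeUtils pattern it targets: a static set of native SerDe class names that callers consult before deciding whether to ask the SerDe for column info. The class and method names below (NativeSerDeRegistry, shouldGetColsFromSerDe) are illustrative stand-ins, not Hive's actual API; only the registered OrcSerde class name mirrors the proposed patch.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for the SerDeUtils pattern: a static set of
// "native" SerDe class names. For native SerDes, column info can come
// from the StorageDescriptor instead of the SerDe itself.
class NativeSerDeRegistry {
    private static final Set<String> nativeSerDeNames = new HashSet<>();

    static {
        nativeSerDeNames.add("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
        // The proposed one-line fix: register OrcSerde as a native SerDe too.
        nativeSerDeNames.add("org.apache.hadoop.hive.ql.io.orc.OrcSerde");
    }

    // Hypothetical helper: only non-native SerDes must be asked for columns.
    static boolean shouldGetColsFromSerDe(String serDeClassName) {
        return !nativeSerDeNames.contains(serDeClassName);
    }
}
```

With OrcSerde registered, a lookup for it reports that columns need not come from the SerDe, while an unregistered custom SerDe still does.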
Re: Timeline for the Hive 0.13 release?
ok sure. Tracking these with the JQL below. I don’t have permission to set up a Shared Filter; can someone help with this? Of the 35 issues: 11 are still open, 22 are patch available, 2 are resolved. regards, Harish. JQL: id in (HIVE-5317, HIVE-5843, HIVE-6060, HIVE-6319, HIVE-6460, HIVE-5687, HIVE-5943, HIVE-5942, HIVE-6547, HIVE-5155, HIVE-6486, HIVE-6455, HIVE-4177, HIVE-4764, HIVE-6306, HIVE-6350, HIVE-6485, HIVE-6507, HIVE-6499, HIVE-6325, HIVE-6558, HIVE-6403, HIVE-4790, HIVE-4293, HIVE-6551, HIVE-6359, HIVE-6314, HIVE-6241, HIVE-5768, HIVE-2752, HIVE-6312, HIVE-6129, HIVE-6012, HIVE-6434, HIVE-6562) ORDER BY status ASC, assignee On Mar 5, 2014, at 6:50 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Can you consider HIVE-6562 as well? HIVE-6562 - Protection from exceptions in ORC predicate evaluation Thanks Prasanth Jayachandran On Mar 5, 2014, at 5:56 PM, Jason Dere jd...@hortonworks.com wrote: Would like to get these in, if possible: HIVE-6012 restore backward compatibility of arithmetic operations HIVE-6434 Restrict function create/drop to admin roles On Mar 5, 2014, at 5:41 PM, Navis류승우 navis@nexr.com wrote: I have a really big wish list (65 pending), but it is time to focus on finalization. - Small bugs HIVE-6403 uncorrelated subquery is failing with auto.convert.join=true HIVE-4790 MapredLocalTask task does not make virtual columns HIVE-4293 Predicates following UDTF operator are removed by PPD - Trivials HIVE-6551 group by after join with skew join optimization references invalid task sometimes HIVE-6359 beeline -f fails on scripts with tabs in them.
HIVE-6314 The logging (progress reporting) is too verbose HIVE-6241 Remove direct reference of Hadoop23Shims in QTestUtil HIVE-5768 Beeline connection cannot be closed with !close command HIVE-2752 Index names are case sensitive - Memory leakage HIVE-6312 doAs with plain sasl auth should be session aware - Implementation is not in accord with documentation HIVE-6129 alter exchange is implemented in inverted manner I'll update the wiki, too. 2014-03-05 12:18 GMT+09:00 Harish Butani hbut...@hortonworks.com: Tracking jiras to be applied to branch 0.13 here: https://cwiki.apache.org/confluence/display/Hive/Hive+0.13+release+status On Mar 4, 2014, at 5:45 PM, Harish Butani hbut...@hortonworks.com wrote: the branch is created. have changed the poms in both branches. Planning to set up a wiki page to track jiras that will get ported to 0.13 regards, Harish. On Mar 4, 2014, at 5:05 PM, Harish Butani hbut...@hortonworks.com wrote: branching now. Will be changing the pom files on trunk. Will send another email when the branch and trunk changes are in. On Mar 4, 2014, at 4:03 PM, Sushanth Sowmyan khorg...@gmail.com wrote: I have two patches still as patch-available, that have had +1s as well but are waiting on pre-commit tests to pick them up, to go into 0.13: https://issues.apache.org/jira/browse/HIVE-6507 (refactor of table property names from string constants to an enum in OrcFile) https://issues.apache.org/jira/browse/HIVE-6499 (fixes bug where calls like create table and drop table can fail if metastore-side authorization is used in conjunction with custom inputformat/outputformat/serdes that are not loadable from the metastore-side) -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law.
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6414: Attachment: HIVE-6414.3.patch Re-submitting the patch, on behalf of Justin, to retrigger the pre-commit test. ParquetInputFormat provides data values that do not match the object inspectors --- Key: HIVE-6414 URL: https://issues.apache.org/jira/browse/HIVE-6414 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Remus Rusanu Assignee: Justin Coffey Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-6414.2.patch, HIVE-6414.3.patch, HIVE-6414.3.patch, HIVE-6414.patch While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, in disagreement with the row object inspectors. I thought that was fine and worked my way around it. But I now see that the issue triggers failures in other places, e.g. in aggregates: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cint:528534767,ctinyint:31,csmallint:4963,cfloat:31.0,cdouble:4963.0,cstring1:cvLH6Eat2yFsyy7p} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 
8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) ... 9 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) ... 
15 more {noformat} My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization): {noformat} create table if not exists alltypes_parquet ( cint int, ctinyint tinyint, csmallint smallint, cfloat float, cdouble double, cstring1 string) stored as parquet; insert overwrite table alltypes_parquet select cint, ctinyint, csmallint, cfloat, cdouble, cstring1 from alltypesorc; explain select * from alltypes_parquet limit 10; select * from alltypes_parquet limit 10; explain select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; select ctinyint, max(cint), min(csmallint), count(cstring1), avg(cfloat), stddev_pop(cdouble) from alltypes_parquet group by ctinyint; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
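The ClassCastException above can be reproduced without Hadoop on the classpath. The sketch below uses an IntBox stand-in for IntWritable and mimics the unchecked cast that JavaShortObjectInspector.get performs; all names other than the failure mode itself are hypothetical.

```java
// Hadoop-free repro of the failure mode: a boxed int handed to code that
// blindly casts to Short, as JavaShortObjectInspector.get does.
class CastMismatchDemo {
    // Stand-in for org.apache.hadoop.io.IntWritable.
    static final class IntBox {
        final int v;
        IntBox(int v) { this.v = v; }
    }

    // Mimics JavaShortObjectInspector.get(Object): an unchecked cast to Short.
    static short shortInspectorGet(Object o) {
        return ((Short) o).shortValue();
    }

    // Returns true when the cast fails the same way as in the stack trace.
    static boolean failsLikeTheReport() {
        try {
            shortInspectorGet(new IntBox(4963)); // a csmallint value from the row dump
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The fix direction implied by the report is for the reader to hand back values that match the declared inspectors (a short-typed wrapper for smallint), rather than IntWritable for every 'int like' type.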
[jira] [Commented] (HIVE-6549) removed templeton.jar from webhcat-default.xml
[ https://issues.apache.org/jira/browse/HIVE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922914#comment-13922914 ] Lefty Leverenz commented on HIVE-6549: -- More advantages of wikidocs: * Access -- you can look up the information even when the code isn't available. * Elaboration -- additional notes and guidance. * Search -- well, I'd like to say you can always find a config variable by googling it, but a random check of Hive config properties had more misses than hits. And one search found an svn copy of hive-default.xml. * Review -- after initial release, descriptions are more likely to get reviewed in the wiki and corrections are easier. Of course, that leads to a major disadvantage: divergence of the wikidoc from the source file. removed templeton.jar from webhcat-default.xml -- Key: HIVE-6549 URL: https://issues.apache.org/jira/browse/HIVE-6549 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Minor this property is no longer used; also removed the corresponding AppConfig.TEMPLETON_JAR_NAME -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6537) NullPointerException when loading hashtable for MapJoin directly
[ https://issues.apache.org/jira/browse/HIVE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922933#comment-13922933 ] Gunther Hagleitner commented on HIVE-6537: -- +1 NullPointerException when loading hashtable for MapJoin directly Key: HIVE-6537 URL: https://issues.apache.org/jira/browse/HIVE-6537 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6537.01.patch, HIVE-6537.2.patch.txt, HIVE-6537.patch We see the following error: {noformat} 2014-02-20 23:33:15,743 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:103) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:149) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:164) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1026) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1030) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1030) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.NullPointerException at java.util.Arrays.fill(Arrays.java:2685) at 
org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.loadDirectly(HashTableLoader.java:155) at org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:81) ... 15 more {noformat} It appears that the tables array in the Arrays.fill call is null. I don't really have a full understanding of this path, but here is what I've gleaned so far... From what I see, tables would be set unconditionally in initializeOp of the sink, and in no other place, so for this code to ever work I assume startForward calls it at some point. Here, it doesn't call it, so it's null. The previous loop also uses tables and should have NPE-d before fill was ever called; it didn't, so I'd assume it never executed. There's a little inconsistency in the above code where directWorks are added to parents unconditionally but the sink is only added as a child conditionally. I think it may be that some of the direct works are not table scans; in fact, given that the loop never executes, they may be null (which is rather strange). Regardless, it seems that the logic should be fixed; it may be the root cause -- This message was sent by Atlassian JIRA (v6.2#6252)
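The reported NPE at Arrays.fill is easy to reproduce in isolation: fill dereferences the array to read its length, so a null array fails immediately, matching the HashTableLoader stack trace. A minimal sketch (variable names chosen to echo the report, not Hive's actual code):

```java
import java.util.Arrays;

// Tiny isolated repro: Arrays.fill dereferences the array to read its
// length, so a null array throws NullPointerException immediately,
// matching the HashTableLoader stack trace above.
class FillNpeDemo {
    static boolean fillThrowsOnNull() {
        Object[] tables = null; // 'tables' never initialized, as hypothesized in the report
        try {
            Arrays.fill(tables, Boolean.TRUE);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```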
[jira] [Commented] (HIVE-6411) Support more generic way of using composite key for HBaseHandler
[ https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922962#comment-13922962 ] Xuefu Zhang commented on HIVE-6411: --- [~navis], would you mind updating the review board with the latest patch? Thanks. Support more generic way of using composite key for HBaseHandler Key: HIVE-6411 URL: https://issues.apache.org/jira/browse/HIVE-6411 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6411.1.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt HIVE-2599 introduced using a custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is in turn an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface: {code} public interface HBaseKeyFactory { void init(SerDeParameters parameters, Properties properties) throws SerDeException; ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException; LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
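The factory inversion described above can be sketched standalone. Hive's SerDeParameters, ObjectInspector, and LazyObjectBase types are replaced with plain stand-ins here, so this is only an analog of the quoted interface, not the real one; the point is that the handler asks a pluggable factory for the key object and its inspector instead of forcing an HBaseCompositeKey subclass.

```java
import java.util.Properties;

// Standalone analog of the HBaseKeyFactory idea: the serde consults a
// pluggable factory for the key object and its inspector rather than
// requiring an HBaseCompositeKey subclass. All types here are stand-ins.
interface KeyFactoryAnalog {
    void init(Properties properties);              // stand-in for init(SerDeParameters, Properties)
    String createObjectInspector(String typeName); // stand-in for createObjectInspector(TypeInfo)
    Object createObject(String inspector);         // stand-in for createObject(ObjectInspector)
}

class StructKeyFactory implements KeyFactoryAnalog {
    private String prefix = "";
    public void init(Properties p) { prefix = p.getProperty("key.prefix", ""); }
    public String createObjectInspector(String typeName) { return "struct<" + typeName + ">"; }
    public Object createObject(String inspector) { return prefix + inspector; }
}

class KeyFactoryDemo {
    // Mimics the serde replacing its internal key/keyOI with factory output.
    static Object buildKey(KeyFactoryAnalog factory, Properties props, String typeName) {
        factory.init(props);
        return factory.createObject(factory.createObjectInspector(typeName));
    }

    static String demo() {
        Properties p = new Properties();
        p.setProperty("key.prefix", "k:");
        return String.valueOf(buildKey(new StructKeyFactory(), p, "int,string"));
    }
}
```

Swapping StructKeyFactory for another KeyFactoryAnalog implementation changes the key representation without the serde hard-wiring any particular key class, which is the flexibility the patch aims for.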
[jira] [Updated] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
[ https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6566: - Attachment: HIVE-6566.2.patch Incorrect union-all plan with map-joins on Tez -- Key: HIVE-6566 URL: https://issues.apache.org/jira/browse/HIVE-6566 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6566.1.patch, HIVE-6566.2.patch The tez dag is hooked up incorrectly for some union all queries involving map joins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
[ https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922968#comment-13922968 ] Gunther Hagleitner commented on HIVE-6566: -- .2 adds comments (per review request). Incorrect union-all plan with map-joins on Tez -- Key: HIVE-6566 URL: https://issues.apache.org/jira/browse/HIVE-6566 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6566.1.patch, HIVE-6566.2.patch The tez dag is hooked up incorrectly for some union all queries involving map joins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
[ https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6566: - Status: Open (was: Patch Available) Incorrect union-all plan with map-joins on Tez -- Key: HIVE-6566 URL: https://issues.apache.org/jira/browse/HIVE-6566 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6566.1.patch, HIVE-6566.2.patch The tez dag is hooked up incorrectly for some union all queries involving map joins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
[ https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6566: - Status: Patch Available (was: Open) Incorrect union-all plan with map-joins on Tez -- Key: HIVE-6566 URL: https://issues.apache.org/jira/browse/HIVE-6566 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6566.1.patch, HIVE-6566.2.patch The tez dag is hooked up incorrectly for some union all queries involving map joins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6338) Improve exception handling in createDefaultDb() in Metastore
[ https://issues.apache.org/jira/browse/HIVE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923008#comment-13923008 ] Hive QA commented on HIVE-6338: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12633195/HIVE-6338.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5358 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 org.apache.hive.beeline.TestSchemaTool.testSchemaInit org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1638/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1638/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12633195 Improve exception handling in createDefaultDb() in Metastore Key: HIVE-6338 URL: https://issues.apache.org/jira/browse/HIVE-6338 Project: Hive Issue Type: Task Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6338.1.patch, HIVE-6338.patch There is a suggestion on HIVE-5959 comment list on possible improvements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5728) Make ORC InputFormat/OutputFormat usable outside Hive
[ https://issues.apache.org/jira/browse/HIVE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-5728: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch is committed. Marking it as resolved. Make ORC InputFormat/OutputFormat usable outside Hive - Key: HIVE-5728 URL: https://issues.apache.org/jira/browse/HIVE-5728 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.13.0 Attachments: HIVE-5728-1.patch, HIVE-5728-10.patch, HIVE-5728-2.patch, HIVE-5728-3.patch, HIVE-5728-4.patch, HIVE-5728-5.patch, HIVE-5728-6.patch, HIVE-5728-7.patch, HIVE-5728-8.patch, HIVE-5728-9.patch, HIVE-5728.10.patch, HIVE-5728.11.patch, HIVE-5728.12.patch, HIVE-5728.13.patch ORC InputFormat/OutputFormat is currently not usable outside Hive. There are several issues to solve: 1. Several classes are not public, e.g. OrcStruct. 2. There is no InputFormat/OutputFormat for the new API (some tools such as Pig need the new API). 3. There is no way to push WriteOption to the OutputFormat outside Hive. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6507) OrcFile table property names are specified as strings
[ https://issues.apache.org/jira/browse/HIVE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6507: --- Status: Open (was: Patch Available) OrcFile table property names are specified as strings - Key: HIVE-6507 URL: https://issues.apache.org/jira/browse/HIVE-6507 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-6507.2.patch, HIVE-6507.patch In HIVE-5504, we had to do some special casing in HCatalog to add a particular set of orc table properties from table properties to job properties. In doing so, it's obvious that that is a bit cumbersome, and ideally, the list of all orc file table properties should really be an enum, rather than individual loosely tied constant strings. If we were to clean this up, we can clean up other code that references this to reference the entire enum, and avoid future errors when new table properties are introduced, but other referencing code is not updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6507) OrcFile table property names are specified as strings
[ https://issues.apache.org/jira/browse/HIVE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6507: --- Status: Patch Available (was: Open) OrcFile table property names are specified as strings - Key: HIVE-6507 URL: https://issues.apache.org/jira/browse/HIVE-6507 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-6507.2.patch, HIVE-6507.patch In HIVE-5504, we had to do some special casing in HCatalog to add a particular set of orc table properties from table properties to job properties. In doing so, it's obvious that that is a bit cumbersome, and ideally, the list of all orc file table properties should really be an enum, rather than individual loosely tied constant strings. If we were to clean this up, we can clean up other code that references this to reference the entire enum, and avoid future errors when new table properties are introduced, but other referencing code is not updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6566) Incorrect union-all plan with map-joins on Tez
[ https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923033#comment-13923033 ] Sergey Shelukhin commented on HIVE-6566: +1 Incorrect union-all plan with map-joins on Tez -- Key: HIVE-6566 URL: https://issues.apache.org/jira/browse/HIVE-6566 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6566.1.patch, HIVE-6566.2.patch The tez dag is hooked up incorrectly for some union all queries involving map joins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6137) Hive should report that the file/path doesn’t exist when it doesn’t
[ https://issues.apache.org/jira/browse/HIVE-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6137: Attachment: HIVE-6137.6.patch cc-ing [~ashutoshc] , Slight difference from the previous patch which caused e.getCause() to return null. Hive should report that the file/path doesn’t exist when it doesn’t --- Key: HIVE-6137 URL: https://issues.apache.org/jira/browse/HIVE-6137 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6137.1.patch, HIVE-6137.2.patch, HIVE-6137.3.patch, HIVE-6137.4.patch, HIVE-6137.5.patch, HIVE-6137.6.patch Hive should report that the file/path doesn’t exist when it doesn’t (it now reports SocketTimeoutException): Execute a Hive DDL query with a reference to a non-existent blob (such as CREATE EXTERNAL TABLE...) and check Hive logs (stderr): FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.io.IOException) This error message is not detailed enough. If a file doesn't exist, Hive should report that it received an error while trying to locate the file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6060) Define API for RecordUpdater and UpdateReader
[ https://issues.apache.org/jira/browse/HIVE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923044#comment-13923044 ] Sergey Shelukhin commented on HIVE-6060: is it possible to post rb? Define API for RecordUpdater and UpdateReader - Key: HIVE-6060 URL: https://issues.apache.org/jira/browse/HIVE-6060 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-6060.patch, acid-io.patch, h-5317.patch, h-5317.patch, h-5317.patch, h-6060.patch, h-6060.patch We need to define some new APIs for how Hive interacts with the file formats since they need to be much richer than the current RecordReader and RecordWriter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6338) Improve exception handling in createDefaultDb() in Metastore
[ https://issues.apache.org/jira/browse/HIVE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6338: --- Resolution: Fixed Fix Version/s: (was: 0.13.0) 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Improve exception handling in createDefaultDb() in Metastore Key: HIVE-6338 URL: https://issues.apache.org/jira/browse/HIVE-6338 Project: Hive Issue Type: Task Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-6338.1.patch, HIVE-6338.patch There is a suggestion in the HIVE-5959 comments on possible improvements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6569) HCatalog still has references to deprecated property hive.metastore.local
Sushanth Sowmyan created HIVE-6569: -- Summary: HCatalog still has references to deprecated property hive.metastore.local Key: HIVE-6569 URL: https://issues.apache.org/jira/browse/HIVE-6569 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Attachments: HIVE-6569.patch HIVE-2585 removed the conf parameter hive.metastore.local, but HCatalog still has references to it. Most of it is in tests, but one is in PigHCatUtil, which leads to HCatLoader/HCatStorer jobs giving warnings. We need to remove them. -- This message was sent by Atlassian JIRA (v6.2#6252)
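For context, the convention after HIVE-2585 is that a client decides between a remote and an embedded metastore by whether hive.metastore.uris is set, rather than by the removed hive.metastore.local flag. A hedged sketch of that check, where the class and method names are hypothetical and plain java.util.Properties stands in for a Hadoop Configuration:

```java
// Sketch of the post-HIVE-2585 convention: infer remote vs. embedded metastore
// from hive.metastore.uris instead of the removed hive.metastore.local flag.
// Properties is a stand-in for org.apache.hadoop.conf.Configuration.
import java.util.Properties;

public class MetastoreMode {
  public static boolean isRemoteMetastore(Properties conf) {
    String uris = conf.getProperty("hive.metastore.uris", "");
    return !uris.trim().isEmpty();
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("hive.metastore.uris", "thrift://metastore-host:9083");
    System.out.println(isRemoteMetastore(conf)); // prints true
  }
}
```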
[jira] [Updated] (HIVE-6569) HCatalog still has references to deprecated property hive.metastore.local
[ https://issues.apache.org/jira/browse/HIVE-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6569: --- Attachment: HIVE-6569.patch Patch attached. HCatalog still has references to deprecated property hive.metastore.local - Key: HIVE-6569 URL: https://issues.apache.org/jira/browse/HIVE-6569 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Labels: cleanup, hcatalog Attachments: HIVE-6569.patch HIVE-2585 removed the conf parameter hive.metastore.local, but HCatalog still has references to it. Most of it is in tests, but one is in PigHCatUtil, which leads to HCatLoader/HCatStorer jobs giving warnings. We need to remove them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6569) HCatalog still has references to deprecated property hive.metastore.local
[ https://issues.apache.org/jira/browse/HIVE-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6569: --- Status: Patch Available (was: Open) HCatalog still has references to deprecated property hive.metastore.local - Key: HIVE-6569 URL: https://issues.apache.org/jira/browse/HIVE-6569 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Labels: cleanup, hcatalog Attachments: HIVE-6569.patch HIVE-2585 removed the conf parameter hive.metastore.local, but HCatalog still has references to it. Most of it is in tests, but one is in PigHCatUtil, which leads to HCatLoader/HCatStorer jobs giving warnings. We need to remove them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6569) HCatalog still has references to deprecated property hive.metastore.local
[ https://issues.apache.org/jira/browse/HIVE-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923077#comment-13923077 ] Eugene Koifman commented on HIVE-6569: -- webhcat-default.xml has a ref to it as well; it should probably be removed too. HCatalog still has references to deprecated property hive.metastore.local - Key: HIVE-6569 URL: https://issues.apache.org/jira/browse/HIVE-6569 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Labels: cleanup, hcatalog Attachments: HIVE-6569.patch HIVE-2585 removed the conf parameter hive.metastore.local, but HCatalog still has references to it. Most of it is in tests, but one is in PigHCatUtil, which leads to HCatLoader/HCatStorer jobs giving warnings. We need to remove them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6570) Hive variable substitution does not work with the source command
Anthony Hsu created HIVE-6570: - Summary: Hive variable substitution does not work with the source command Key: HIVE-6570 URL: https://issues.apache.org/jira/browse/HIVE-6570 Project: Hive Issue Type: Bug Reporter: Anthony Hsu The following does not work: {code} source ${hivevar:test-dir}/test.q; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6570) Hive variable substitution does not work with the source command
[ https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923079#comment-13923079 ] Anthony Hsu commented on HIVE-6570: --- I have a fix for this issue and will upload a patch shortly. Hive variable substitution does not work with the source command -- Key: HIVE-6570 URL: https://issues.apache.org/jira/browse/HIVE-6570 Project: Hive Issue Type: Bug Reporter: Anthony Hsu The following does not work: {code} source ${hivevar:test-dir}/test.q; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6538) yet another annoying exception in test logs
[ https://issues.apache.org/jira/browse/HIVE-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923087#comment-13923087 ] Sergey Shelukhin commented on HIVE-6538: has a long line; +1 otherwise yet another annoying exception in test logs --- Key: HIVE-6538 URL: https://issues.apache.org/jira/browse/HIVE-6538 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Szehon Ho Priority: Trivial Attachments: HIVE-6538.2.patch, HIVE-6538.2.patch, HIVE-6538.patch Whenever you look at failed q tests you have to go thru this useless exception. {noformat} 2014-03-03 11:22:54,872 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - MetaException(message:NoSuchObjectException(message:Function default.qtest_get_java_boolean does not exist)) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:4575) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_function(HiveMetaStore.java:4702) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at $Proxy8.get_function(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getFunction(HiveMetaStoreClient.java:1526) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) at $Proxy9.getFunction(Unknown Source) at 
org.apache.hadoop.hive.ql.metadata.Hive.getFunction(Hive.java:2603) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfoFromMetastore(FunctionRegistry.java:546) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getQualifiedFunctionInfo(FunctionRegistry.java:578) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:599) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:606) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeDropFunction(FunctionSemanticAnalyzer.java:94) at org.apache.hadoop.hive.ql.parse.FunctionSemanticAnalyzer.analyzeInternal(FunctionSemanticAnalyzer.java:60) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:345) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1078) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1121) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1014) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004) at org.apache.hadoop.hive.ql.QTestUtil.runCmd(QTestUtil.java:655) at org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:772) at org.apache.hadoop.hive.cli.TestCliDriver.<clinit>(TestCliDriver.java:46) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:34) at org.junit.internal.runners.SuiteMethod.<init>(SuiteMethod.java:23) at org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:14) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at 
org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57) at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at
[jira] [Updated] (HIVE-6569) HCatalog still has references to deprecated property hive.metastore.local
[ https://issues.apache.org/jira/browse/HIVE-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6569: --- Attachment: HIVE-6569.2.patch HCatalog still has references to deprecated property hive.metastore.local - Key: HIVE-6569 URL: https://issues.apache.org/jira/browse/HIVE-6569 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Labels: cleanup, hcatalog Attachments: HIVE-6569.2.patch, HIVE-6569.patch HIVE-2585 removed the conf parameter hive.metastore.local, but HCatalog still has references to it. Most of it is in tests, but one is in PigHCatUtil, which leads to HCatLoader/HCatStorer jobs giving warnings. We need to remove them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6569) HCatalog still has references to deprecated property hive.metastore.local
[ https://issues.apache.org/jira/browse/HIVE-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923118#comment-13923118 ] Sushanth Sowmyan commented on HIVE-6569: Good catch - updating patch with a couple more instances I found in .xml files. HCatalog still has references to deprecated property hive.metastore.local - Key: HIVE-6569 URL: https://issues.apache.org/jira/browse/HIVE-6569 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Labels: cleanup, hcatalog Attachments: HIVE-6569.2.patch, HIVE-6569.patch HIVE-2585 removed the conf parameter hive.metastore.local, but HCatalog still has references to it. Most of it is in tests, but one is in PigHCatUtil, which leads to HCatLoader/HCatStorer jobs giving warnings. We need to remove them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923121#comment-13923121 ] Xuefu Zhang commented on HIVE-6147: --- [~swarnim] I'm not totally convinced that these tests are unrelated, as they consistently appeared in the test result. In addition, I manually ran TestHCatLoader, and got errors like the following: {code} testProjectionsBasic(org.apache.hive.hcatalog.pig.TestHCatLoader) Time elapsed: 0.184 sec ERROR! java.io.IOException: Failed to execute create table junit_unparted_complex(name string, studentid int, contact struct<phno:string,email:string>, currently_registered_courses array<string>, current_grades map<string,string>, phnos array<struct<phno:string,type:string>>) stored as RCFILE tblproperties('hcat.isd'='org.apache.hive.hcatalog.rcfile.RCFileInputDriver','hcat.osd'='org.apache.hive.hcatalog.rcfile.RCFileOutputDriver'). Driver returned 1 Error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.NullPointerException at org.apache.hive.hcatalog.pig.TestHCatLoader.executeStatementOnDriver(TestHCatLoader.java:125) at org.apache.hive.hcatalog.pig.TestHCatLoader.createTable(TestHCatLoader.java:111) at org.apache.hive.hcatalog.pig.TestHCatLoader.createTable(TestHCatLoader.java:101) at org.apache.hive.hcatalog.pig.TestHCatLoader.createTable(TestHCatLoader.java:115) at org.apache.hive.hcatalog.pig.TestHCatLoader.setup(TestHCatLoader.java:154) {code} Please further investigate. Support avro data stored in HBase columns - Key: HIVE-6147 URL: https://issues.apache.org/jira/browse/HIVE-6147 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt Presently, the HBase Hive integration supports querying only primitive data types in columns. 
It would be nice to be able to store and query Avro objects in HBase columns by making them visible as structs to Hive. This will allow Hive to perform ad hoc analysis of HBase data which can be deeply structured. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6571) query id should be available for logging during query compilation
Gunther Hagleitner created HIVE-6571: Summary: query id should be available for logging during query compilation Key: HIVE-6571 URL: https://issues.apache.org/jira/browse/HIVE-6571 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Would be nice to have the query id set during compilation to tie logs together etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6417) sql std auth - new users in admin role config should get added
[ https://issues.apache.org/jira/browse/HIVE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923144#comment-13923144 ] Hive QA commented on HIVE-6417: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12633196/HIVE-6417.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5359 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2 org.apache.hive.beeline.TestSchemaTool.testSchemaInit org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1640/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1640/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12633196 sql std auth - new users in admin role config should get added -- Key: HIVE-6417 URL: https://issues.apache.org/jira/browse/HIVE-6417 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Attachments: HIVE-6417.1.patch, HIVE-6417.patch if metastore is started with hive.users.in.admin.role=user1, then user1 is added to the admin role in the metastore. If the value is changed to hive.users.in.admin.role=user2, then user2 should get added to the role in metastore. Right now, if the admin role exists, new users don't get added. A work-around is user1 adding user2 to the admin role using a grant role statement. -- This message was sent by Atlassian JIRA (v6.2#6252)
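The behavior HIVE-6417 requests can be sketched as a simple reconciliation at metastore startup: compute which users listed in hive.users.in.admin.role are not yet members of the admin role and grant only those. The helper below is a hypothetical illustration, not actual metastore code:

```java
// Hypothetical reconciliation for hive.users.in.admin.role: on startup, add
// only the configured users who are not already members of the admin role.
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class AdminRoleSync {
  // Returns the configured users that still need a grant.
  public static Set<String> usersToAdd(String configValue, Set<String> currentMembers) {
    Set<String> wanted = new LinkedHashSet<>();
    for (String u : configValue.split(",")) {
      if (!u.trim().isEmpty()) {
        wanted.add(u.trim());
      }
    }
    wanted.removeAll(currentMembers);
    return wanted;
  }

  public static void main(String[] args) {
    Set<String> current = new HashSet<>(Arrays.asList("user1"));
    System.out.println(usersToAdd("user1,user2", current)); // prints [user2]
  }
}
```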
[jira] [Commented] (HIVE-6571) query id should be available for logging during query compilation
[ https://issues.apache.org/jira/browse/HIVE-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923150#comment-13923150 ] Sergey Shelukhin commented on HIVE-6571: +1 query id should be available for logging during query compilation - Key: HIVE-6571 URL: https://issues.apache.org/jira/browse/HIVE-6571 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-6571.1.patch Would be nice to have the query id set during compilation to tie logs together etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6571) query id should be available for logging during query compilation
[ https://issues.apache.org/jira/browse/HIVE-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6571: - Attachment: HIVE-6571.1.patch query id should be available for logging during query compilation - Key: HIVE-6571 URL: https://issues.apache.org/jira/browse/HIVE-6571 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-6571.1.patch Would be nice to have the query id set during compilation to tie logs together etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6571) query id should be available for logging during query compilation
[ https://issues.apache.org/jira/browse/HIVE-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6571: - Status: Patch Available (was: Open) query id should be available for logging during query compilation - Key: HIVE-6571 URL: https://issues.apache.org/jira/browse/HIVE-6571 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-6571.1.patch Would be nice to have the query id set during compilation to tie logs together etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6571) query id should be available for logging during query compilation
[ https://issues.apache.org/jira/browse/HIVE-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923152#comment-13923152 ] Sergey Shelukhin commented on HIVE-6571: this queryId really ties the logs together... query id should be available for logging during query compilation - Key: HIVE-6571 URL: https://issues.apache.org/jira/browse/HIVE-6571 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-6571.1.patch Would be nice to have the query id set during compilation to tie logs together etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6555) TestSchemaTool is failing on trunk after branching
[ https://issues.apache.org/jira/browse/HIVE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6555: --- Status: Open (was: Patch Available) TestSchemaTool is failing on trunk after branching -- Key: HIVE-6555 URL: https://issues.apache.org/jira/browse/HIVE-6555 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6555-branch13.patch, HIVE-6555.1.patch, HIVE-6555.patch This is because version was bumped to 0.14 in pom file and there are no metastore scripts for 0.14 yet. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6555) TestSchemaTool is failing on trunk after branching
[ https://issues.apache.org/jira/browse/HIVE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6555: --- Status: Patch Available (was: Open) TestSchemaTool is failing on trunk after branching -- Key: HIVE-6555 URL: https://issues.apache.org/jira/browse/HIVE-6555 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6555-branch13.patch, HIVE-6555.1.patch, HIVE-6555.patch This is because version was bumped to 0.14 in pom file and there are no metastore scripts for 0.14 yet. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6555) TestSchemaTool is failing on trunk after branching
[ https://issues.apache.org/jira/browse/HIVE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6555: --- Attachment: HIVE-6555.1.patch Same patch. Reupload for Hive QA to pick up. TestSchemaTool is failing on trunk after branching -- Key: HIVE-6555 URL: https://issues.apache.org/jira/browse/HIVE-6555 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6555-branch13.patch, HIVE-6555.1.patch, HIVE-6555.patch This is because version was bumped to 0.14 in pom file and there are no metastore scripts for 0.14 yet. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6430: --- Attachment: HIVE-6430.patch Reattaching the patch, with some fixes in new code (not working yet). Looks like QA didn't pick it up MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have java hash table there. We can either use primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
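To illustrate the proposal in HIVE-6430, a primitive-friendly hash table in the spirit of HPPC maps long keys to int offsets with open addressing, so there is no boxed object per entry. This is a hedged sketch only (fixed power-of-two capacity, no resizing or deletion), not Hive's eventual implementation:

```java
// Illustrative open-addressing map from primitive long keys to int offsets.
// Sketch: capacity must be a power of two comfortably larger than the number
// of entries, since there is no resizing.
public class LongIntOpenMap {
  private final long[] keys;
  private final int[] vals;
  private final boolean[] used;
  private final int mask;

  public LongIntOpenMap(int capacityPowerOfTwo) {
    keys = new long[capacityPowerOfTwo];
    vals = new int[capacityPowerOfTwo];
    used = new boolean[capacityPowerOfTwo];
    mask = capacityPowerOfTwo - 1;
  }

  // Linear probing from a mixed hash of the key; stops at the key's slot or
  // the first empty slot.
  private int slot(long key) {
    int i = ((int) (key ^ (key >>> 32)) * 0x9E3779B9) & mask;
    while (used[i] && keys[i] != key) {
      i = (i + 1) & mask;
    }
    return i;
  }

  public void put(long key, int value) {
    int i = slot(key);
    used[i] = true;
    keys[i] = key;
    vals[i] = value;
  }

  public int get(long key, int missing) {
    int i = slot(key);
    return used[i] ? vals[i] : missing;
  }

  public static void main(String[] args) {
    LongIntOpenMap m = new LongIntOpenMap(16);
    m.put(42L, 7);
    System.out.println(m.get(42L, -1)); // prints 7
  }
}
```

Three parallel primitive arrays cost a few words per slot, versus several hundred bytes per entry for boxed keys and row containers as described above.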
[jira] [Updated] (HIVE-6570) Hive variable substitution does not work with the source command
[ https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-6570: - Assignee: Anthony Hsu Hive variable substitution does not work with the source command -- Key: HIVE-6570 URL: https://issues.apache.org/jira/browse/HIVE-6570 Project: Hive Issue Type: Bug Reporter: Anthony Hsu Assignee: Anthony Hsu The following does not work: {code} source ${hivevar:test-dir}/test.q; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6572) Use shimmed version of hadoop conf names for mapred.{min,max}.split.size{.*}
Sushanth Sowmyan created HIVE-6572: -- Summary: Use shimmed version of hadoop conf names for mapred.{min,max}.split.size{.*} Key: HIVE-6572 URL: https://issues.apache.org/jira/browse/HIVE-6572 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan HadoopShims has a method to fetch config parameters by name so that they return the appropriate config param name for the appropriate hadoop version. We need to be consistent about using these shimmed names. For example, mapred.min.split.size is deprecated in Hadoop 2.x and is instead called mapreduce.input.fileinputformat.split.minsize. Also, there is a bug in Hadoop20SShims and Hadoop20Shims that defines MAPREDMINSPLITSIZEPERNODE as mapred.min.split.size.per.rack and MAPREDMINSPLITSIZEPERRACK as mapred.min.split.size.per.node. This is wrong and confusing. -- This message was sent by Atlassian JIRA (v6.2#6252)
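A sketch of the kind of lookup table this implies, with the per-node/per-rack names mapped the right way around for Hadoop 2.x. The class and method names here are hypothetical; the real HadoopShims code differs:

```java
// Hypothetical shim table: one symbolic name per split-size parameter,
// resolved to the Hadoop 2.x property names (per-node/per-rack not swapped).
import java.util.HashMap;
import java.util.Map;

public class SplitSizeShim {
  private static final Map<String, String> HADOOP2 = new HashMap<>();
  static {
    HADOOP2.put("MAPREDMINSPLITSIZE", "mapreduce.input.fileinputformat.split.minsize");
    HADOOP2.put("MAPREDMAXSPLITSIZE", "mapreduce.input.fileinputformat.split.maxsize");
    HADOOP2.put("MAPREDMINSPLITSIZEPERNODE", "mapreduce.input.fileinputformat.split.minsize.per.node");
    HADOOP2.put("MAPREDMINSPLITSIZEPERRACK", "mapreduce.input.fileinputformat.split.minsize.per.rack");
  }

  public static String getHadoopConfName(String symbolicName) {
    return HADOOP2.get(symbolicName);
  }

  public static void main(String[] args) {
    // prints: mapreduce.input.fileinputformat.split.minsize.per.node
    System.out.println(getHadoopConfName("MAPREDMINSPLITSIZEPERNODE"));
  }
}
```

Callers that resolve parameters only through such a table stay correct across Hadoop versions, which is the consistency the issue asks for.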
[jira] [Updated] (HIVE-6570) Hive variable substitution does not work with the source command
[ https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Hsu updated HIVE-6570: -- Attachment: HIVE-6570.1.patch.txt Added support for Hive variable substitution with the source command, and added a test for this in source.q. Hive variable substitution does not work with the source command -- Key: HIVE-6570 URL: https://issues.apache.org/jira/browse/HIVE-6570 Project: Hive Issue Type: Bug Reporter: Anthony Hsu Assignee: Anthony Hsu Attachments: HIVE-6570.1.patch.txt The following does not work: {code} source ${hivevar:test-dir}/test.q; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
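The gist of such a fix is to run variable substitution over the command text before the source path is resolved. The regex-based helper below is an illustrative stand-in for Hive's substitution logic, not the attached patch:

```java
// Illustrative stand-in for ${hivevar:...} substitution applied to a "source"
// command before the file path is opened. Unknown variables are left as-is.
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceSubstitution {
  private static final Pattern VAR = Pattern.compile("\\$\\{hivevar:([^}]+)\\}");

  public static String substitute(String cmd, Map<String, String> vars) {
    Matcher m = VAR.matcher(cmd);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String val = vars.getOrDefault(m.group(1), m.group(0));
      m.appendReplacement(sb, Matcher.quoteReplacement(val));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> vars = Collections.singletonMap("test-dir", "/tmp/tests");
    // prints: source /tmp/tests/test.q;
    System.out.println(substitute("source ${hivevar:test-dir}/test.q;", vars));
  }
}
```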
[jira] [Commented] (HIVE-6060) Define API for RecordUpdater and UpdateReader
[ https://issues.apache.org/jira/browse/HIVE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923235#comment-13923235 ] Owen O'Malley commented on HIVE-6060: - I'm not sure why it didn't link, but here: https://reviews.apache.org/r/18810/diff/ Define API for RecordUpdater and UpdateReader - Key: HIVE-6060 URL: https://issues.apache.org/jira/browse/HIVE-6060 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-6060.patch, acid-io.patch, h-5317.patch, h-5317.patch, h-5317.patch, h-6060.patch, h-6060.patch We need to define some new APIs for how Hive interacts with the file formats since they need to be much richer than the current RecordReader and RecordWriter. -- This message was sent by Atlassian JIRA (v6.2#6252)