[jira] [Commented] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION by better way.
[ https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771625#comment-13771625 ] Hive QA commented on HIVE-5315: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603915/HIVE-5315.patch {color:red}ERROR:{color} -1 due to 91 failed/errored test(s), 129 tests executed *Failed tests:* {noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hive.beeline.src.test.TestBeeLineWithArgs.testPositiveScriptFile
org.apache.hive.jdbc.TestJdbcDriver2.testBadURL
org.apache.hive.jdbc.TestJdbcDriver2.testBuiltInUDFCol
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes2
org.apache.hive.jdbc.TestJdbcDriver2.testDatabaseMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testDescribeTable
org.apache.hive.jdbc.TestJdbcDriver2.testDriverProperties
org.apache.hive.jdbc.TestJdbcDriver2.testDuplicateColumnNameOrder
org.apache.hive.jdbc.TestJdbcDriver2.testErrorDiag
org.apache.hive.jdbc.TestJdbcDriver2.testErrorMessages
org.apache.hive.jdbc.TestJdbcDriver2.testExecutePreparedStatement
org.apache.hive.jdbc.TestJdbcDriver2.testExecuteQueryException
org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt
org.apache.hive.jdbc.TestJdbcDriver2.testExprCol
org.apache.hive.jdbc.TestJdbcDriver2.testImportedKeys
[jira] [Updated] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION by better way.
[ https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HIVE-5315: - Status: Open (was: Patch Available) bin/hive should retrieve HADOOP_VERSION by better way. -- Key: HIVE-5315 URL: https://issues.apache.org/jira/browse/HIVE-5315 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Kousuke Saruta Fix For: 0.11.1 Attachments: HIVE-5315.patch In the current implementation, bin/hive retrieves HADOOP_VERSION as follows: {code} HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}'); {code} But sometimes hadoop version doesn't show the version information on the first line. If HADOOP_VERSION is not retrieved correctly, Hive or related processes will not start. I faced this situation when I tried to debug HiveServer2 with debug options such as: {code} -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876 {code} Then hadoop version shows -Xdebug... on the first line. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
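The report above boils down to the awk one-liner trusting that the version is on line 1 of `hadoop version` output. A hedged sketch of a more tolerant extraction, matching the line that actually begins with "Hadoop" instead of assuming its position (the canned sample output below is invented for illustration, not captured from a real cluster):

```shell
# Hypothetical `hadoop version` output where JVM debug flags are echoed
# before the "Hadoop X.Y.Z" line, as described in the issue.
sample_output='-Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1503152'

# Take the version from the first line starting with "Hadoop <digit>",
# rather than blindly taking field 2 of line 1.
HADOOP_VERSION=$(printf '%s\n' "$sample_output" | awk '/^Hadoop [0-9]/ {print $2; exit}')
echo "$HADOOP_VERSION"   # 1.2.1
```

In bin/hive the pipeline input would be `$HADOOP version` itself; the anchored pattern makes the result independent of whatever noise precedes the version line.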
[jira] [Commented] (HIVE-5310) commit futuama_episodes
[ https://issues.apache.org/jira/browse/HIVE-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771628#comment-13771628 ] Hudson commented on HIVE-5310: -- SUCCESS: Integrated in Hive-trunk-h0.21 #2340 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2340/]) HIVE-5310 futurama-episodes (ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524448) * /hive/trunk/data/files/futurama_episodes.avro commit futuama_episodes --- Key: HIVE-5310 URL: https://issues.apache.org/jira/browse/HIVE-5310 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Edward Capriolo Assignee: Edward Capriolo This is a small binary file that will be used for Trevni. We can run the pre-commit build if this is committed.
[jira] [Commented] (HIVE-5166) TestWebHCatE2e is failing intermittently on trunk
[ https://issues.apache.org/jira/browse/HIVE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771627#comment-13771627 ] Hudson commented on HIVE-5166: -- SUCCESS: Integrated in Hive-trunk-h0.21 #2340 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2340/]) HIVE-5166 : TestWebHCatE2e is failing intermittently on trunk (Eugene Koifman via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524441) * /hive/trunk/hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/TestWebHCatE2e.java TestWebHCatE2e is failing intermittently on trunk - Key: HIVE-5166 URL: https://issues.apache.org/jira/browse/HIVE-5166 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 0.12.0 Reporter: Ashutosh Chauhan Assignee: Eugene Koifman Fix For: 0.13.0 Attachments: HIVE-5166.patch I observed these while running the full test suite the last couple of times.
[jira] [Created] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10
Brad Ruderman created HIVE-5318: --- Summary: Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10 Key: HIVE-5318 URL: https://issues.apache.org/jira/browse/HIVE-5318 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.10.0, 0.9.0 Reporter: Brad Ruderman Priority: Critical When exporting Hive tables in Hive 0.9 using EXPORT table TO 'hdfs_path', then importing into another Hive 0.10 instance using IMPORT FROM 'hdfs_path', Hive throws this error:
13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while processing
org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
 at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
 at java.util.ArrayList.<init>(ArrayList.java:131)
 at org.apache.hadoop.hive.ql.plan.CreateTableDesc.<init>(CreateTableDesc.java:128)
 at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99)
 ... 16 more
13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=compile start=1379535241411 end=1379535242332 duration=921
13/09/18 13:14:02 INFO ql.Driver: PERFLOG method=releaseLocks
13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=releaseLocks start=1379535242332 end=1379535242332 duration=0
13/09/18 13:14:02 INFO ql.Driver: PERFLOG method=releaseLocks
13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=releaseLocks start=1379535242333 end=1379535242333 duration=0
This is probably a critical blocker for people who are trying to test Hive 0.10 in their staging environments prior to the upgrade from 0.9.
Re: How long will we support Hadoop 0.20.2?
How is 0.20.2 easier for setup/development than the stable 1.x line? On Wed, Sep 18, 2013 at 7:18 PM, Xuefu Zhang xzh...@cloudera.com wrote: Even if not for production, I think 0.20.2 is very useful for development as well. It's simple and easy to set up, avoiding a lot of hassles during development. Thus, I think it makes sense to keep supporting it, especially when there isn't much cost involved. --Xuefu On Wed, Sep 18, 2013 at 7:06 PM, Ashish Thusoo athu...@qubole.com wrote: +1 on what Ed said. I think 0.20.2 is still very real. It would be a bummer if we do not support it, as a lot of companies are still on that version. Ashish Ashish Thusoo http://www.linkedin.com/pub/ashish-thusoo/0/5a8/50 CEO and Co-founder, Qubole http://www.qubole.com - a cloud based service that makes big data easy for analysts and data engineers On Wed, Sep 18, 2013 at 5:57 PM, Edward Capriolo edlinuxg...@gmail.com wrote: BTW: I am very likely to install Hive 0.12 on Hadoop 0.20.2 clusters. I have been running Hive since version 0.2, and Hadoop since version 0.17.2. After 0.17.2 I moved to 0.20.2. Since then Hadoop has seemingly had tens of releases: 0.21, 0.21.append (dead on arrival), cloudera this, cloudera that, the Yahoo Hadoop distribution (dead on arrival), 0.20.2.203, 0.20.2.205, 1? 2.0? 2.1. None of them really has much shelf life or a very clear upgrade path. The only thing that has remained constant for our environment is Hive and Hadoop 0.20.2. I have been happily just upgrading Hive on these clusters for years now. So in a nutshell: I'm a long-time committer, and I actively support and develop Hive on Hadoop 0.20.2 clusters; I do not see supporting the shims as complicated or difficult. On Wed, Sep 18, 2013 at 7:02 PM, Owen O'Malley omal...@apache.org wrote: On Wed, Sep 18, 2013 at 1:54 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I am not fine with dropping it. I still run it in several places.
The question is not whether you run Hadoop 0.20.2, but whether you are likely to install Hive 0.12 on those very old clusters. Believe it or not, many people still run 0.20.2. I believe (correct me if I am wrong) Facebook is still running a heavily patched 0.20.2. It is more accurate to say that Facebook is running a fork of Hadoop whose last common point was Hadoop 0.20.1. I haven't heard anyone (other than you in this thread) say they are running 0.20.2 in years. I could see dropping 0.20.2 if it were a huge burden, but I do not see it that way: it works, it is reliable, and it is a known quantity. It is a large burden in that we have relatively complicated shims and a lack of testing. Unless you are signing up to test every release on 0.20.2, we don't have anyone doing the relevant testing. -- Owen -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771652#comment-13771652 ] Ashutosh Chauhan commented on HIVE-4113: Thanks, [~yhuai] for taking this one up. It's a known existing problem that predicate pushdown doesn't happen for HCatalog today. I will say that if it is getting burdensome, we can tackle that in a separate jira. I am fine with removing the flag for column pruning. It's been around for a long time ( HIVE-279 ) and I haven't come across a case where a user has run into a problem with it. I didn't get your comment about READ_ALL_COLUMNS_DEFAULT. If we set it to true, will that imply that this optimization will be off by default? That seems like a bad choice. In HCatInputFormat, we can probably set the config such that it always selects all columns for now. That way Hive will still get the benefit of the optimization and hcatalog will continue with what it is doing today. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column of every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} Which is 11% of the data size read by the COUNT(1).
This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code}
[jira] [Commented] (HIVE-5209) JDBC support for varchar
[ https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771659#comment-13771659 ] Phabricator commented on HIVE-5209: --- thejas has commented on the revision HIVE-5209 [jira] JDBC support for varchar. INLINE COMMENTS jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:32 TableSchema was used prior to this patch, but since the classes it uses have changed, it is possible that there is a new dependency. But if this patch isn't changing the situation, we can fix that separately. I think the best next step would be to test what the hive client dependencies are with and without the patch. In a separate jira, I think we should start looking at creating a service-core module that the jdbc classes use, instead of having the whole service module, which includes the server pieces, as a dependency. REVISION DETAIL https://reviews.facebook.net/D12999 To: JIRA, jdere Cc: cwsteinbach, thejas JDBC support for varchar Key: HIVE-5209 URL: https://issues.apache.org/jira/browse/HIVE-5209 Project: Hive Issue Type: Improvement Components: HiveServer2, JDBC, Types Reporter: Jason Dere Assignee: Jason Dere Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, HIVE-5209.4.patch, HIVE-5209.D12705.1.patch Support returning varchar length in result set metadata
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3764: -- Attachment: HIVE-3764-12.2.patch Rebased patch for 0.12 Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-3764-12.2.patch, HIVE-3764.1.patch, HIVE-3764.2.patch Today there's no version/compatibility information stored in the Hive metastore. Also, the DataNucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer Hive, or don't run the correct upgrade scripts during migration, the metastore can end up corrupted. The autoCreate schema support is not always sufficient to upgrade the metastore when migrating to a newer release: it's not supported with all databases, and the migration often involves altering existing tables, changing or moving data, etc. Hence it's very useful to have a consistency check to make sure that Hive is using the correct metastore, and that for production systems the schema is not automatically modified just by running Hive.
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3764: -- Attachment: HIVE-3764-trunk.2.patch Rebased patch for trunk
[jira] [Commented] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771664#comment-13771664 ] Prasad Mujumdar commented on HIVE-5301: --- [~ashutoshc] Agreed. I have tested extensively with Derby and MySQL with 0.10, and did test the upgrade of an empty schema (generated using this tool) as well. But the 0.7 to 0.8 upgrade is more complex due to a data move, which is not covered by that. I will set up 0.7 with test data, verify the upgrade options, and attach the output of the tests. Add a schema tool for offline metastore schema upgrade -- Key: HIVE-5301 URL: https://issues.apache.org/jira/browse/HIVE-5301 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-5301.1.patch, HIVE-5301-with-HIVE-3764.0.patch HIVE-3764 is addressing metastore version consistency. Besides that, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute those against the configured metastore. Now that Hive includes the Beeline client, it can be used to execute the scripts.
[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)
[ https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771672#comment-13771672 ] Hudson commented on HIVE-5198: -- FAILURE: Integrated in Hive-trunk-hadoop2 #440 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/440/]) HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524617) * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java WebHCat returns exitcode 143 (w/o an explanation) - Key: HIVE-5198 URL: https://issues.apache.org/jira/browse/HIVE-5198 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5198.patch The message might look like this: {"statement":"use default; show table extended like xyz;","error":"unable to show table: xyz","exec":{"stdout":"","stderr":"","exitcode":143}} WebHCat has a templeton.exec.timeout property which kills an HCat request (i.e. something like a DDL statement that gets routed to the HCat CLI) if it takes longer than this timeout. Since WebHCat does a fork/exec of the 'hcat' script, the timeout is implemented as a SIGTERM sent to the subprocess. The SIGTERM value is 15, so it's reported as 128 + 15 = 143. Error logging/reporting should be improved in this case.
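The 128 + 15 = 143 arithmetic above is standard shell behavior for a child killed by a signal and can be reproduced directly; a minimal sketch, independent of WebHCat:

```shell
# A long-running child stands in for the forked 'hcat' subprocess.
sleep 30 &
pid=$!

# Simulate templeton.exec.timeout firing: send SIGTERM (signal 15).
kill -TERM "$pid"

# The shell reports "killed by signal N" as exit status 128 + N.
wait "$pid"
status=$?
echo "$status"   # 143
```

This is why the {"exitcode":143} in the error payload carries no message of its own: the number only encodes which signal terminated the subprocess, not why it was sent.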
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771678#comment-13771678 ] Thejas M Nair commented on HIVE-4487: - I am seeing several intermittent precommit test failures in the last few builds, which seem to be caused by permission errors. I am wondering if it might be related to this change. I also saw this on my linux machine, but not in another run on my mac. The tests have errors like this - Copying data from file:/home/hiveptest/ip-10-74-50-170-hiveptest-2/apache-svn-trunk-source/data/files/kv1.txt Failed with exception Failed to set permissions of path: /home/hiveptest/ip-10-74-50-170-hiveptest-2/apache-svn-trunk-source/build/ql/scratchdir/hive_2013-09-18_19-22-30_852_73877859563099-1/-ext-1 to 0777 For example in - https://builds.apache.org/job/PreCommit-HIVE-Build/813/testReport/org.apache.hadoop.hive.ql.parse/TestParseNegative/testParseNegative_ambiguous_join_col/ Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data, especially when operating on a Kerberos-enabled cluster. Hive should probably default these directories to only be readable by the owner.
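The "world readable" claim in HIVE-4487 follows from the umask math: a directory created without an explicit mode gets 0777 & ~umask, so umask 022 yields 0755. A minimal local sketch (local filesystem standing in for HDFS, since the mask arithmetic is the same):

```shell
# Create a throwaway parent directory for the demo.
tmpdir=$(mktemp -d)

# Under umask 022, mkdir without an explicit mode produces 0777 & ~022 = 0755,
# i.e. group and others can read and traverse the directory.
(
  umask 022
  mkdir "$tmpdir/scratch"
)

# First 10 chars of the long listing: type bit plus the permission string.
mode=$(ls -ld "$tmpdir/scratch" | cut -c1-10)
echo "$mode"   # drwxr-xr-x

rm -rf "$tmpdir"
```

Setting an explicit restrictive mode (e.g. `mkdir -m 700`, or `FsPermission` on HDFS) is what the issue asks Hive to do instead of inheriting whatever the umask allows.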
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3764: -- Attachment: (was: HIVE-3764-trunk.2.patch)
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3764: -- Attachment: (was: HIVE-3764-12.2.patch)
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3764: -- Attachment: HIVE-3764-12.3.patch
[jira] [Updated] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3764: -- Attachment: HIVE-3764-trunk.3.patch Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-3764-12.3.patch, HIVE-3764.1.patch, HIVE-3764.2.patch, HIVE-3764-trunk.3.patch Today there's no version/compatibility information stored in hive metastore. Also the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade metastore when migrating to newer release. It's not supported with all databases. Besides the migration often involves altering existing table, changing or moving data etc. Hence it's very useful to have some consistency check to make sure that hive is using correct metastore and for production systems the schema is not automatically by running hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5311) TestHCatPartitionPublish can fail randomly
[ https://issues.apache.org/jira/browse/HIVE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771712#comment-13771712 ] Hudson commented on HIVE-5311: -- FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2341/]) HIVE-5311 : TestHCatPartitionPublish can fail randomly (Brock Noland via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524515) * /hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatPartitionPublish.java * /hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java TestHCatPartitionPublish can fail randomly -- Key: HIVE-5311 URL: https://issues.apache.org/jira/browse/HIVE-5311 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5311.patch {noformat} org.apache.thrift.TApplicationException: create_table_with_environment_context failed: out of sequence response at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:793) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:779) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:482) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:471) at org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.createTable(TestHCatPartitionPublish.java:241) at org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:133) {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771713#comment-13771713 ] Hudson commented on HIVE-4487: -- FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2341/]) HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524578) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java HIVE-4487 - Hive does not set explicit permissions on hive.exec.scratchdir (Chaoyu Tang via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524509) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
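The leak described in HIVE-4487 is plain octal arithmetic: with the default umask 022, a directory created with default permissions ends up 0755, i.e. world readable. A minimal self-contained sketch of that masking (illustrative only; Hive's fix sets an explicit FsPermission instead):

```java
public class UmaskDemo {
    public static void main(String[] args) {
        int requested = 0777;          // permissions requested at mkdir time
        int umask = 022;               // default HDFS/POSIX umask
        int effective = requested & ~umask;  // bits actually granted
        // 0755: owner rwx, group r-x, other r-x -> world readable scratch dir
        System.out.println(Integer.toOctalString(effective));
    }
}
```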
[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
[ https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771711#comment-13771711 ] Hudson commented on HIVE-5313: -- FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2341/]) HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524578) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) - Key: HIVE-5313 URL: https://issues.apache.org/jira/browse/HIVE-5313 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-5313.patch As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have to shim it out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)
[ https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771710#comment-13771710 ] Hudson commented on HIVE-5198: -- FAILURE: Integrated in Hive-trunk-h0.21 #2341 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2341/]) HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524617) * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java WebHCat returns exitcode 143 (w/o an explanation) - Key: HIVE-5198 URL: https://issues.apache.org/jira/browse/HIVE-5198 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5198.patch The message might look like this: {statement:use default; show table extended like xyz;,error:unable to show table: xyz,exec:{stdout:,stderr:,exitcode:143}} WebHCat has a templeton.exec.timeout property which kills an HCat request (i.e. something like a DDL statement that gets routed to HCat CLI) if it takes longer than this timeout. Since WebHCat does a fork/exec to 'hcat' script, the timeout is implemented as SIGTERM sent to the subprocess. SIGTERM value is 15. So it's reported as 128 + 15 = 143. Error logging/reporting should be improved in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
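The 143 arithmetic in the description can be checked directly: POSIX shells and the JVM report a process killed by a signal as 128 plus the signal number, so 143 decodes to SIGTERM (15). A small sketch of that decoding (illustrative; not WebHCat's code):

```java
public class ExitCodeDemo {
    public static void main(String[] args) {
        int exitCode = 143;                 // what WebHCat reports
        boolean killedBySignal = exitCode > 128;
        int signal = exitCode - 128;        // 143 - 128 = 15 (SIGTERM)
        System.out.println(killedBySignal + " " + signal);
    }
}
```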
[jira] [Commented] (HIVE-5032) Enable hive creating external table at the root directory of DFS
[ https://issues.apache.org/jira/browse/HIVE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771763#comment-13771763 ] Hive QA commented on HIVE-5032: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603946/HIVE-5032.2.patch {color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1241 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5 org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
[jira] [Updated] (HIVE-5319) Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column
[ https://issues.apache.org/jira/browse/HIVE-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Tomar updated HIVE-5319: - Labels: avro (was: ) Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column -- Key: HIVE-5319 URL: https://issues.apache.org/jira/browse/HIVE-5319 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Environment: Linux Ubuntu Reporter: Neha Tomar Labels: avro 1 Created a table in Hive with AVRO data. CREATE EXTERNAL TABLE tweets (username string, tweet string, timestamp bigint) COMMENT 'A table backed by Avro data with the Avro schema stored in HDFS' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/home/neha/test_data/avro_create_data' TBLPROPERTIES ('avro.schema.literal'='{namespace:com.miguno.avro,name:Tweet,type:record,fields:[ {name : username,type : string,doc : Name of the user account on Twitter.com},{name : tweet,type:string,doc : The content of the Twitter message}, {name : timestamp, type : long, doc : Unix epoch time in seconds}]}'); 2 Altered type of a column (to a compatible type) using ALTER TABLE. In this example, altered type for column timestamp from long to int. ALTER TABLE tweets SET TBLPROPERTIES ('avro.schema.literal'='{namespace:com.miguno.avro,name:Tweet,type:record,fields:[ {name : username,type : string,doc : Name of the user account on Twitter.com},{name : tweet,type:string,doc : The content of the Twitter message}, {name : timestamp, type : int, doc : Unix epoch time in seconds}]}'); 3 Now, a select query on this table fails with following error. hive select * from tweets; OK Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting int Time taken: 4.514 seconds -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5319) Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column
Neha Tomar created HIVE-5319: Summary: Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column Key: HIVE-5319 URL: https://issues.apache.org/jira/browse/HIVE-5319 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Environment: Linux Ubuntu Reporter: Neha Tomar 1 Created a table in Hive with AVRO data. CREATE EXTERNAL TABLE tweets (username string, tweet string, timestamp bigint) COMMENT 'A table backed by Avro data with the Avro schema stored in HDFS' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/home/neha/test_data/avro_create_data' TBLPROPERTIES ('avro.schema.literal'='{namespace:com.miguno.avro,name:Tweet,type:record,fields:[ {name : username,type : string,doc : Name of the user account on Twitter.com},{name : tweet,type:string,doc : The content of the Twitter message}, {name : timestamp, type : long, doc : Unix epoch time in seconds}]}'); 2 Altered type of a column (to a compatible type) using ALTER TABLE. In this example, altered type for column timestamp from long to int. ALTER TABLE tweets SET TBLPROPERTIES ('avro.schema.literal'='{namespace:com.miguno.avro,name:Tweet,type:record,fields:[ {name : username,type : string,doc : Name of the user account on Twitter.com},{name : tweet,type:string,doc : The content of the Twitter message}, {name : timestamp, type : int, doc : Unix epoch time in seconds}]}'); 3 Now, a select query on this table fails with following error. hive select * from tweets; OK Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting int Time taken: 4.514 seconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5319) Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column
[ https://issues.apache.org/jira/browse/HIVE-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771780#comment-13771780 ] Neha Tomar commented on HIVE-5319: -- Pasting the exception trace below. Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting int 13/09/19 16:03:46 ERROR CliDriver: Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting int java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting int at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.avro.AvroTypeException: Found long, expecting int at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:231) at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:82) at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:341) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:146) at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129) at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233) at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220) at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:140) at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:49) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:514) ... 13 more Executing SELECT on an AVRO table fails after executing ALTER to modify type of an existing column -- Key: HIVE-5319 URL: https://issues.apache.org/jira/browse/HIVE-5319 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Environment: Linux Ubuntu Reporter: Neha Tomar Labels: avro 1 Created a table in Hive with AVRO data. CREATE EXTERNAL TABLE tweets (username string, tweet string, timestamp bigint) COMMENT 'A table backed by Avro data with the Avro schema stored in HDFS' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/home/neha/test_data/avro_create_data' TBLPROPERTIES ('avro.schema.literal'='{namespace:com.miguno.avro,name:Tweet,type:record,fields:[ {name : username,type : string,doc : Name of the user account on Twitter.com},{name : tweet,type:string,doc : The content of the Twitter message}, {name : timestamp, type : long, doc : Unix epoch time in seconds}]}'); 2 Altered type of a column (to a compatible type) using ALTER TABLE. In this example, altered type for column timestamp from long to int. 
ALTER TABLE tweets SET TBLPROPERTIES ('avro.schema.literal'='{"namespace":"com.miguno.avro","name":"Tweet","type":"record","fields":[{"name":"username","type":"string","doc":"Name of the user account on Twitter.com"},{"name":"tweet","type":"string","doc":"The content of the Twitter message"},{"name":"timestamp","type":"int","doc":"Unix epoch time in seconds"}]}'); 3. Now, a select query on this table fails with the following error. hive> select * from tweets; OK Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting int
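Note that the schema literals in this report must be valid JSON (the double quotes were stripped in this archive), and per the Avro specification schema resolution permits promoting int to long but not long to int, which is exactly why a reader schema declaring timestamp as int cannot resolve data written as long. A hedged sketch of an ALTER that stays resolvable, keeping the writer's long type (illustrative restatement of the reporter's DDL, not a patch):

```sql
-- Illustrative only: keep the reader's "timestamp" type at "long" so it
-- matches (or legally promotes from) the type the data was written with.
ALTER TABLE tweets SET TBLPROPERTIES ('avro.schema.literal'='{
  "namespace": "com.miguno.avro",
  "name": "Tweet",
  "type": "record",
  "fields": [
    {"name": "username",  "type": "string", "doc": "Name of the user account on Twitter.com"},
    {"name": "tweet",     "type": "string", "doc": "The content of the Twitter message"},
    {"name": "timestamp", "type": "long",   "doc": "Unix epoch time in seconds"}
  ]
}');
```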
[jira] [Commented] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java
[ https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771809#comment-13771809 ] Hive QA commented on HIVE-5309: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603954/HIVE-5309.1-vectorization.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 3955 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json org.apache.hcatalog.listener.TestMsgBusConnection.testConnection org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection org.apache.hive.hcatalog.mapreduce.TestHCatExternalPartitioned.testHCatPartitionedTable {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/820/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/820/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java --- Key: HIVE-5309 URL: https://issues.apache.org/jira/browse/HIVE-5309 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5309.1-vectorization.patch, HIVE-5309.1.vectorization.patch This jira provides fixes for some of the review comments on HIVE-5283. 1) Update hive-default.xml.template for vectorization flag. 2) remove unused imports from MetaStoreUtils. 3) Add a test to run vectorization with non-orc format. 
The test must still pass because vectorization optimization should fall back to non-vector mode. 4) Hardcode the table name in QTestUtil.java. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Operators && and || do not work
Hello, Though the documentation https://cwiki.apache.org/Hive/languagemanual-udf.html says && and || are the same as AND and OR, they do not even get parsed. The user gets a parse error when they are used. Was that intentional or is it a regression? hive> select key from src where key=a || key =b; FAILED: Parse Error: line 1:33 cannot recognize input near '|' 'key' '=' in expression specification hive> select key from src where key=a && key =b; FAILED: Parse Error: line 1:33 cannot recognize input near '&' 'key' '=' in expression specification Thanks, Amareshwari
[jira] [Commented] (HIVE-5168) Extend Hive for spatial query support
[ https://issues.apache.org/jira/browse/HIVE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771817#comment-13771817 ] Fusheng Wang commented on HIVE-5168: A draft design document has been uploaded: https://cwiki.apache.org/confluence/display/Hive/Spatial+queries Extend Hive for spatial query support - Key: HIVE-5168 URL: https://issues.apache.org/jira/browse/HIVE-5168 Project: Hive Issue Type: New Feature Reporter: Fusheng Wang Labels: Hadoop-GIS, Spatial, I would like to propose to incorporate a newly developed spatial querying component into Hive. We have recently developed a high performance MapReduce based spatial querying system Hadoop-GIS, to support large scale spatial queries and analytics. Hadoop-GIS is a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through space partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects on MapReduce. Hadoop-GIS takes advantage of global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. We have an alpha release. We look forward to contributors in Hive community to contribute to the system. github: https://github.com/hadoop-gis Hadoop-GIS wiki: https://web.cci.emory.edu/confluence/display/HadoopGIS References: 1. Ablimit Aji, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, Joel Saltz: Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce. In Proceedings of the 39th International Conference on Very Large Databases (VLDB'2013), Trento, Italy, August 26-30, 2013. 
http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p726-aji.pdf 2. Ablimit Aji, Fusheng Wang and Joel Saltz: Towards Building a High Performance Spatial Query System for Large Scale Medical Imaging Data. In Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2012), Redondo Beach, California, USA, November 6-9, 2012. http://confluence.cci.emory.edu:8090/download/attachments/6193390/SIGSpatial2012TechReport.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4113: --- Status: Open (was: Patch Available) Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column of every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} which is 11% of the data size read by the count(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
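The quoted RCFile snippet can be reproduced in a self-contained sketch (not Hive's actual reader, just the skip-flag logic it describes): when no column IDs are requested, every column is marked as not skipped, so a count(1) query still pulls all column data off HDFS.

```java
import java.util.List;

public class ColumnSkipDemo {
    // Mirrors the RCFile.java logic quoted above: an empty needed-column
    // list falls through to "skip nothing", i.e. a full-row read.
    static boolean[] skippedColumns(List<Integer> neededColIDs, int numCols) {
        boolean[] skipped = new boolean[numCols];
        if (neededColIDs == null || neededColIDs.isEmpty()) {
            for (int i = 0; i < numCols; i++) {
                skipped[i] = false;   // nothing skipped: every column is read
            }
        } else {
            for (int i = 0; i < numCols; i++) {
                skipped[i] = !neededColIDs.contains(i);  // prune the rest
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // count(1): no needed columns, yet all 3 columns are still read
        System.out.println(java.util.Arrays.toString(skippedColumns(List.of(), 3)));
        // count(ss_sold_date_sk): only column 0 is read
        System.out.println(java.util.Arrays.toString(skippedColumns(List.of(0), 3)));
    }
}
```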
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771847#comment-13771847 ] Yin Huai commented on HIVE-4113: READ_ALL_COLUMNS and READ_ALL_COLUMNS_DEFAULT are mainly created for HCat, because I think it is a kind of burden to users if they have to be aware of ColumnProjectionUtils and use it every time. So, through HCat, if users do not use ColumnProjectionUtils to set needed columns, we will read all columns. If we set READ_ALL_COLUMNS_DEFAULT=false, no column will be read if a user does not use ColumnProjectionUtils. In Hive, if we get rid of the flag of column pruning, the list of neededColumnIDs in TS will not be null. Thus, in Hive, we will always set READ_ALL_COLUMNS to false (the .2 patch has an issue on it... I will fix it later). In summary, in Hive, we use neededColumnIDs in TS as the only way to tell an underlying record reader what to read. If neededColumnIDs is an empty list, we will know no column is needed. Otherwise, we will read the columns specified in neededColumnIDs (if we have select * in a sub-query, neededColumnIDs should be populated to include all columns). In HCat, if a user wants to use the MapReduce interface, he or she has two ways to tell what columns are needed. 1) The user does nothing. In this case, we will read all columns. 2) The user uses utility functions in ColumnProjectionUtils (e.g. setReadColumnIDs) to specify needed columns. In this case, READ_ALL_COLUMNS will be set to false and we only read the columns specified in READ_COLUMN_IDS_CONF_STR. I hope what I am proposing makes sense.
Any suggestions are welcome :) Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column of every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} which is 11% of the data size read by the count(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
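The two HCat paths described in the comment above can be sketched as follows. This is a hypothetical model of the proposed behavior, not the actual ColumnProjectionUtils implementation (the flag and list names merely echo the discussion):

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnProjectionDemo {
    // Hypothetical model of the proposal: a null read-column list means the
    // user did nothing, so READ_ALL_COLUMNS stays at its default (true) and
    // every column is read; an explicit list restricts the read to those IDs.
    static List<Integer> columnsToRead(List<Integer> readColumnIds, int numCols) {
        boolean readAllColumns = (readColumnIds == null);  // path 1: do nothing
        List<Integer> result = new ArrayList<>();
        if (readAllColumns) {
            for (int i = 0; i < numCols; i++) {
                result.add(i);                 // HCat default: all columns
            }
        } else {
            result.addAll(readColumnIds);      // path 2: only what was asked for
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(columnsToRead(null, 3));        // user did nothing
        System.out.println(columnsToRead(List.of(1), 3));  // explicit projection
    }
}
```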
[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class
[ https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771887#comment-13771887 ] Hive QA commented on HIVE-5306: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603955/HIVE-5306.4.patch {color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1245 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5 org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
[jira] [Created] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors
Chaoyu Tang created HIVE-5320: - Summary: Querying a table with nested struct type over JSON data results in errors Key: HIVE-5320 URL: https://issues.apache.org/jira/browse/HIVE-5320 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Chaoyu Tang Querying a table with a nested struct datatype like == create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; == over JSON data causes errors, including java.lang.IndexOutOfBoundsException, or corrupted data. The JsonSerDe used is json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar. The cause is that the method public List<Object> getStructFieldsDataAsList(Object o) in JsonStructObjectInspector.java returns a list referencing a static ArrayList 'values'. So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe class carries the same reference across its recursive calls, and its element values keep being overwritten in the STRUCT case. Solutions: 1. Fix it in JsonSerDe: change the field 'values' in java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java to instance scope. Filed a ticket against JsonSerDe (https://github.com/rcongiu/Hive-JSON-Serde/issues/31). 2. Ideally, in the serialize method of LazySimpleSerDe, we should defensively save a copy of the list returned by list = soi.getStructFieldsDataAsList(obj) when the soi is an instance of JsonStructObjectInspector, so that the recursive calls of serialize work properly regardless of the extended SerDe implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
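The aliasing bug described above can be reproduced in isolation. The sketch below is a hypothetical, minimal model (none of these names are Hive's or JsonSerDe's): the "inspector" returns one reused static list, so a recursive serializer clobbers the outer struct's field list while serializing the inner one; the defensive copy of solution 2 avoids it. Here the symptom is silently dropped data rather than the exact IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal model of the aliasing bug: the "inspector" reuses one list. */
public class SharedListBug {
    private static final List<Object> SHARED = new ArrayList<Object>();

    /** Mimics getStructFieldsDataAsList returning a reused static list. */
    static List<Object> fieldsOf(Object[] struct) {
        SHARED.clear();
        SHARED.addAll(Arrays.asList(struct));
        return SHARED;
    }

    /** Serializes a struct; nested structs trigger recursion. */
    static String serialize(Object[] struct, boolean defensiveCopy) {
        List<Object> list = fieldsOf(struct);
        if (defensiveCopy) {
            list = new ArrayList<Object>(list); // the fix proposed in solution 2
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.size(); i++) {
            Object f = list.get(i);
            if (i > 0) {
                sb.append(',');
            }
            // the recursive call re-enters fieldsOf and clobbers SHARED
            sb.append(f instanceof Object[]
                ? "[" + serialize((Object[]) f, defensiveCopy) + "]" : f);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object[] nested = {"outer1", new Object[]{"inner1", "inner2"}, "outer3"};
        System.out.println(serialize(nested, true));   // correct: outer1,[inner1,inner2],outer3
        System.out.println(serialize(nested, false));  // corrupted: outer3 is lost
    }
}
```

Without the copy, the outer loop's `list` shrinks to the inner struct's two elements mid-iteration, so the third outer field is never emitted; with a larger inner struct the same aliasing produces the out-of-bounds access.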
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771894#comment-13771894 ] Alan Gates commented on HIVE-5317: -- The only requirement is that the file format must be able to support a rowid. With things like text and sequence file this can be done via a byte offset. I'm not seeing why this falls apart in the file based authorization. Are you worried that different users will own the base and delta files? It's no different than the current case where different users may own different partitions. We will need to make sure the compactions can still happen in this case, that is that the compaction can be run as the user who owns the table, not as Hive. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. 
The table is partitioned and bucketed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
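Alan's point that text and sequence files can support a rowid via a byte offset can be sketched as a (file, byte offset) pair. This is an illustrative type only, under the assumption stated in the comment, not Hive's eventual record-identifier design:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/**
 * Illustrative row id for formats without a native row index: the base
 * file plus the byte offset of the row's start. Delta files recording
 * deletes/updates can then address base rows by this id.
 */
public class OffsetRowId implements Comparable<OffsetRowId> {
    final String file;
    final long byteOffset;

    OffsetRowId(String file, long byteOffset) {
        this.file = file;
        this.byteOffset = byteOffset;
    }

    @Override public int compareTo(OffsetRowId o) {
        int c = file.compareTo(o.file);
        return c != 0 ? c : Long.compare(byteOffset, o.byteOffset);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof OffsetRowId)) return false;
        OffsetRowId r = (OffsetRowId) o;
        return byteOffset == r.byteOffset && file.equals(r.file);
    }

    @Override public int hashCode() { return Objects.hash(file, byteOffset); }

    public static void main(String[] args) {
        // A delta file marks one base row as deleted (paths are made up).
        Set<OffsetRowId> deleted = new HashSet<OffsetRowId>();
        deleted.add(new OffsetRowId("base/000000_0", 128L));
        // While reading the base file, suppress rows whose id is in the delta.
        System.out.println(deleted.contains(new OffsetRowId("base/000000_0", 128L))); // true
        System.out.println(deleted.contains(new OffsetRowId("base/000000_0", 256L))); // false
    }
}
```

Compaction then rewrites base plus deltas into a new base, which is why it matters (per the comment) that compaction can run as the table owner when file-based authorization is in play.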
[jira] [Commented] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table
[ https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771904#comment-13771904 ] Chaoyu Tang commented on HIVE-4223: --- I was able to reproduce a similar issue, but with JsonSerDe 1.1.4 (json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar). See HIVE-5320 for details. LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table Key: HIVE-4223 URL: https://issues.apache.org/jira/browse/HIVE-4223 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Environment: Hive 0.9.0 Reporter: Yong Zhang Attachments: nest_struct.data The LazySimpleSerDe will throw an IndexOutOfBoundsException if the column structure is a struct containing an array of structs. I have a table with one column defined like this: columnA array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:primiType, col9:primiType>>>> In this example, the outer struct has 8 columns (including the array), and the inner struct has 9 columns. 
As long as the outer struct has FEWER columns than the inner struct, I think we will get the following exception and stack trace in LazySimpleSerDe when it tries to serialize a row: Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) ... 9 more I am not entirely sure of the exact cause of this problem. 
I believe that the method public static void serialize(ByteStream.Output out, Object obj, ObjectInspector objInspector, byte[] separators, int level, Text nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) recursively invokes itself when it encounters a nested structure. But for a nested struct structure, the list reference gets messed up, and size() returns the wrong value. In the case I faced, for these 2 lines: List<? extends StructField> fields = soi.getAllStructFieldRefs(); list = soi.getStructFieldsDataAsList(obj); my StructObjectInspector (soi) returns the CORRECT data from the getAllStructFieldRefs() and getStructFieldsDataAsList() methods. For example, for one row, for the outer 8-column struct, I have 2 elements in the inner array of structs, and each element has 9 columns (as there are 9 columns in the inner struct). At runtime, after I added more logging to LazySimpleSerDe, I see the following behavior in the log: loop over the 8 outer columns; loop over the 9 inner columns to serialize; loop over the 9 inner columns to serialize; the code breaks here: in the outer loop, it tries to access the 9th element, which does not exist.
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771905#comment-13771905 ] Ashutosh Chauhan commented on HIVE-4113: Sounds good to me. Go ahead and make the changes. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column of every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} Which is 11% of the data size read by the COUNT(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
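The TODO in the quoted snippet asks for `select count(1)` to be distinguished from `select *`. A sketch of the intended skip-array behavior, with illustrative names (this is not the actual RCFile.java patch):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative sketch of the behavior the TODO asks for: when the caller
 * explicitly says no columns are needed, skip them all instead of falling
 * back to reading everything.
 */
public class SkippedCols {
    /**
     * @param readColumnIDs column ids requested via pruning, or null when the
     *        caller never configured pruning (treated as "read all").
     * @return skipped[i] == true means column i is not materialized.
     */
    static boolean[] computeSkipped(int totalColumns, List<Integer> readColumnIDs) {
        boolean[] skipped = new boolean[totalColumns];
        if (readColumnIDs == null) {
            return skipped;                // select *: read every column
        }
        Arrays.fill(skipped, true);        // count(1): skip everything...
        for (int id : readColumnIDs) {
            skipped[id] = false;           // ...except explicitly needed ids
        }
        return skipped;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(computeSkipped(3, null)));
        System.out.println(Arrays.toString(computeSkipped(3, Collections.<Integer>emptyList())));
        System.out.println(Arrays.toString(computeSkipped(3, Arrays.asList(0))));
    }
}
```

The whole fix hinges on keeping "nothing requested" (empty list) distinct from "nothing configured" (null), which is exactly the distinction the quoted loop collapses.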
[jira] [Commented] (HIVE-5271) Convert join op to a map join op in the planning phase
[ https://issues.apache.org/jira/browse/HIVE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771914#comment-13771914 ] Ashutosh Chauhan commented on HIVE-5271: A couple of suggestions: * It seems like the changes in MapRedTask are unintentional. * Instead of modifying existing test files, I would recommend creating new test cases with names like tez_* Convert join op to a map join op in the planning phase -- Key: HIVE-5271 URL: https://issues.apache.org/jira/browse/HIVE-5271 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5271.WIP.patch This captures the planning changes required in Hive to support hash joins. We need to convert the join operator to a map join operator. This is hooked into the infrastructure provided by HIVE-5095. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors
[ https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-5320: -- Attachment: HIVE-5320.patch Querying a table with nested struct type over JSON data results in errors - Key: HIVE-5320 URL: https://issues.apache.org/jira/browse/HIVE-5320 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Chaoyu Tang Attachments: HIVE-5320.patch Querying a table with a nested struct datatype like == create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; == over JSON data causes errors, including java.lang.IndexOutOfBoundsException, or corrupted data. The JsonSerDe used is json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar. The cause is that the method public List<Object> getStructFieldsDataAsList(Object o) in JsonStructObjectInspector.java returns a list referencing a static ArrayList 'values'. So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe class carries the same reference across its recursive calls, and its element values keep being overwritten in the STRUCT case. Solutions: 1. Fix it in JsonSerDe: change the field 'values' in java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java to instance scope. Filed a ticket against JsonSerDe (https://github.com/rcongiu/Hive-JSON-Serde/issues/31). 2. Ideally, in the serialize method of LazySimpleSerDe, we should defensively save a copy of the list returned by list = soi.getStructFieldsDataAsList(obj) when the soi is an instance of JsonStructObjectInspector, so that the recursive calls of serialize work properly regardless of the extended SerDe implementation. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Operators && and || do not work
I have not tested it on historical versions, so I don't know on which versions it used to work (if ever), but possibly the antlr upgrade [1] may have impacted this. [1] : https://issues.apache.org/jira/browse/HIVE-2439 Ashutosh On Thu, Sep 19, 2013 at 4:52 AM, amareshwari sriramdasu amareshw...@gmail.com wrote: Hello, Though the documentation https://cwiki.apache.org/Hive/languagemanual-udf.html says they are the same as AND and OR, they do not even get parsed. Users get a parse error when they are used. Was that intentional or is it a regression? hive> select key from src where key=a || key =b; FAILED: Parse Error: line 1:33 cannot recognize input near '|' 'key' '=' in expression specification hive> select key from src where key=a && key =b; FAILED: Parse Error: line 1:33 cannot recognize input near '&' 'key' '=' in expression specification Thanks Amareshwari
[jira] [Assigned] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors
[ https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang reassigned HIVE-5320: - Assignee: Chaoyu Tang Querying a table with nested struct type over JSON data results in errors - Key: HIVE-5320 URL: https://issues.apache.org/jira/browse/HIVE-5320 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-5320.patch Querying a table with a nested struct datatype like == create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; == over JSON data causes errors, including java.lang.IndexOutOfBoundsException, or corrupted data. The JsonSerDe used is json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar. The cause is that the method public List<Object> getStructFieldsDataAsList(Object o) in JsonStructObjectInspector.java returns a list referencing a static ArrayList 'values'. So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe class carries the same reference across its recursive calls, and its element values keep being overwritten in the STRUCT case. Solutions: 1. Fix it in JsonSerDe: change the field 'values' in java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java to instance scope. Filed a ticket against JsonSerDe (https://github.com/rcongiu/Hive-JSON-Serde/issues/31). 2. Ideally, in the serialize method of LazySimpleSerDe, we should defensively save a copy of the list returned by list = soi.getStructFieldsDataAsList(obj) when the soi is an instance of JsonStructObjectInspector, so that the recursive calls of serialize work properly regardless of the extended SerDe implementation. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors
[ https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771916#comment-13771916 ] Chaoyu Tang commented on HIVE-5320: --- Please review the attached patch for the fix. Querying a table with nested struct type over JSON data results in errors - Key: HIVE-5320 URL: https://issues.apache.org/jira/browse/HIVE-5320 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-5320.patch Querying a table with a nested struct datatype like == create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; == over JSON data causes errors, including java.lang.IndexOutOfBoundsException, or corrupted data. The JsonSerDe used is json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar. The cause is that the method public List<Object> getStructFieldsDataAsList(Object o) in JsonStructObjectInspector.java returns a list referencing a static ArrayList 'values'. So the local variable 'list' in the serialize method of Hive's LazySimpleSerDe class carries the same reference across its recursive calls, and its element values keep being overwritten in the STRUCT case. Solutions: 1. Fix it in JsonSerDe: change the field 'values' in java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java to instance scope. Filed a ticket against JsonSerDe (https://github.com/rcongiu/Hive-JSON-Serde/issues/31). 2. Ideally, in the serialize method of LazySimpleSerDe, we should defensively save a copy of the list returned by list = soi.getStructFieldsDataAsList(obj) when the soi is an instance of JsonStructObjectInspector, so that the recursive calls of serialize work properly regardless of the extended SerDe implementation. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771940#comment-13771940 ] Johndee Burks commented on HIVE-2615: - An example of what the cast would look like: create table new_table as select column, cast(null as type) column_name from table_name; create table null_test as select user, cast(null as bigint) test from a; CTAS with literal NULL creates VOID type Key: HIVE-2615 URL: https://issues.apache.org/jira/browse/HIVE-2615 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: David Phillips Assignee: Zhuoluo (Clark) Yang Attachments: HIVE-2615.1.patch Create the table with a column that always contains NULL: {quote} hive> create table bad as select 1 x, null z from dual; {quote} Because there's no type, Hive gives it the VOID type: {quote} hive> describe bad; OK x int z void {quote} This seems weird, because AFAIK, there is no normal way to create a column of type VOID. The problem is that the table can't be queried: {quote} hive> select * from bad; OK Failed with exception java.io.IOException:java.lang.RuntimeException: Internal error: no LazyObject for VOID {quote} Worse, even if you don't select that field, the query fails at runtime: {quote} hive> select x from bad; ... FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5302) PartitionPruner fails on Avro non-partitioned data
[ https://issues.apache.org/jira/browse/HIVE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771970#comment-13771970 ] Edward Capriolo commented on HIVE-5302: --- [~mwagner] [~busbey] Can the two of you come to a consensus as to whether the bug still exists? [~ashutoshc] I understand your debate about bloating the plan; however, the plan is fairly ephemeral and changes quite often. If we can confirm the issue, this is surely a 0.12 blocker. You have mentioned that you would like to see this issue resolved a different way. Without a concrete suggestion as to what the better way might be, we are at a standstill. I do not think we want to hold up 0.12 longer than we need to, and I do not think we want Avro broken. Does anyone want to add anything? If not, I am +1 on this patch. PartitionPruner fails on Avro non-partitioned data -- Key: HIVE-5302 URL: https://issues.apache.org/jira/browse/HIVE-5302 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.11.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Blocker Labels: avro Attachments: HIVE-5302.1-branch-0.12.patch.txt, HIVE-5302.1.patch.txt, HIVE-5302.1.patch.txt While updating HIVE-3585 I found a test case that causes the failure in the MetaStoreUtils partition retrieval from back in HIVE-4789. In this case, the failure is triggered when the partition pruner is handed a non-partitioned table and has to construct a pseudo-partition. e.g. {code} INSERT OVERWRITE TABLE partitioned_table PARTITION(col) SELECT id, foo, col FROM non_partitioned_table WHERE col = 9; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771954#comment-13771954 ] Brock Noland commented on HIVE-4487: Full error message from: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-813/failed/TestParseNegative/hive.log {noformat} 20_510_7475863120290716577-1/-ext-1 to 0777 java.io.IOException: Failed to set permissions of path: /home/hiveptest/ip-10-74-50-170-hiveptest-2/apache-svn-trunk-source/build/ql/scratchdir/hive_2013-09-18_19-22-20_510_7475863120290716577-1/-ext-1 to 0777 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:217) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126) at org.apache.hadoop.hive.ql.exec.CopyTask.execute(CopyTask.java:74) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1415) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1193) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1021) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889) at org.apache.hadoop.hive.ql.QTestUtil.runLoadCmd(QTestUtil.java:539) at org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:586) at org.apache.hadoop.hive.ql.QTestUtil.init(QTestUtil.java:678) at 
org.apache.hadoop.hive.ql.parse.TestParseNegative.runTest(TestParseNegative.java:248) at org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_ambiguous_join_col(TestParseNegative.java:117) {noformat} Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
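The fix direction Joey describes (create the scratch directory with explicit owner-only permissions rather than relying on the process umask) can be sketched in plain Java with java.nio. Hive itself would use Hadoop's FsPermission with FileSystem.mkdirs(Path, FsPermission); the path below is illustrative, and note that if the directory already exists, createDirectories leaves its permissions untouched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/**
 * Sketch: create a scratch directory readable only by its owner (rwx------),
 * independent of the umask. Since owner-only mode has no group/other bits,
 * no umask value can widen it.
 */
public class ScratchDir {
    public static Path createOwnerOnly(Path dir) throws IOException {
        Set<PosixFilePermission> ownerOnly =
            PosixFilePermissions.fromString("rwx------");
        // The permission attribute applies to every directory created here.
        return Files.createDirectories(dir,
            PosixFilePermissions.asFileAttribute(ownerOnly));
    }

    public static void main(String[] args) throws IOException {
        // Illustrative location, mirroring the /tmp/hive-${user.name} default.
        Path scratch = createOwnerOnly(
            Paths.get(System.getProperty("java.io.tmpdir"),
                      "hive-" + System.getProperty("user.name") + "-demo"));
        System.out.println(
            PosixFilePermissions.toString(Files.getPosixFilePermissions(scratch)));
    }
}
```

The same reasoning explains the leak in the bug report: with umask 022 and no explicit mode, mkdir yields 755, so staging data under the scratch dir becomes world readable.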
[jira] [Commented] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.
[ https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771971#comment-13771971 ] Hive QA commented on HIVE-5202: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603964/HIVE-5202.2.patch.txt {color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1241 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5 org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
[jira] [Updated] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-5317: Attachment: InsertUpdatesinHive.pdf Here are my thoughts about how it can be approached. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: How long will we support Hadoop 0.20.2?
+1 to dropping Hadoop 0.20.2 support in Hive 0.13, which given that Hive 0.12 has just branched means it isn't likely that Hive 0.13 will come out in the next 6 months. -- Owen On Thu, Sep 19, 2013 at 8:35 AM, Brock Noland br...@cloudera.com wrote: First off, I have to apologize, I didn't know there would be such passions on both sides of the 0.20.2 argument! On Thu, Sep 19, 2013 at 10:11 AM, Edward Capriolo edlinuxg...@gmail.com wrote: That rant being done, No worries man, Hadoop versions are something worth ranting about. IMHO Hadoop has a history of changing API's and breaking end users. However, I feel this is improving. we can not and should not support hadoop 0.20.2 forever. Discontinuing hadoop 0.20.2 in say 6 months might be reasonable, but I think dropping it on the floor due to a one line change for a missing convenience constructor is a bit knee-jerk. Very sorry if I came across with the opinion that we should drop 0.20.2 now because of the constructor issue. The issue brought up 0.20.2's age in my mind and the logical next step is to ask how long we plan on supporting it! :) I like the time bounding idea and I feel 6 months is reasonable. FWIW, the 1.X series is stable for my needs. Brock
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771952#comment-13771952 ] Brock Noland commented on HIVE-4487: Very strange. I don't see why this would be occurring since hiveptest owns everything in /home/hiveptest/. It's not a privileged user, so it cannot change ownership. The only way I can see this happening is if hive_2013-09-18_19-22-30_852_73877859563099-1 somehow got created with 000 (or anything but 700). Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data, especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner.
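The umask arithmetic behind this bug can be sketched in a few lines. This is an illustrative stand-in, not Hive's actual code: a umask clears bits from the requested mode, so the default HDFS umask of 022 turns an unqualified directory creation into world-readable 755, while an explicit 700 on hive.exec.scratchdir stays private.

```java
// Sketch (not Hive's implementation): how a umask turns a requested mode
// into the permissions a new directory actually receives.
public class UmaskDemo {
    // Effective mode = requested mode with the umask bits cleared.
    static int apply(int requested, int umask) {
        return requested & ~umask;
    }

    public static void main(String[] args) {
        // Default request 0777 under HDFS umask 022 -> 0755: world-readable.
        System.out.println(Integer.toOctalString(apply(0777, 022))); // 755
        // Requesting an explicit 0700 closes the leak: the umask removes nothing.
        System.out.println(Integer.toOctalString(apply(0700, 022))); // 700
    }
}
```

This is why the fix sets an explicit permission rather than relying on the filesystem default.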
[jira] [Commented] (HIVE-5311) TestHCatPartitionPublish can fail randomly
[ https://issues.apache.org/jira/browse/HIVE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772065#comment-13772065 ] Hudson commented on HIVE-5311: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/]) HIVE-5311 : TestHCatPartitionPublish can fail randomly (Brock Noland via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524515) * /hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/TestHCatPartitionPublish.java * /hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java TestHCatPartitionPublish can fail randomly -- Key: HIVE-5311 URL: https://issues.apache.org/jira/browse/HIVE-5311 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5311.patch {noformat} org.apache.thrift.TApplicationException: create_table_with_environment_context failed: out of sequence response at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:793) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:779) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:482) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:471) at org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.createTable(TestHCatPartitionPublish.java:241) at org.apache.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:133) {noformat} -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
[ https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772064#comment-13772064 ] Hudson commented on HIVE-5313: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/]) HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524578) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) - Key: HIVE-5313 URL: https://issues.apache.org/jira/browse/HIVE-5313 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-5313.patch As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have to shim it out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
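The shim described above works around the missing FsPermission(String) convenience constructor in Hadoop 0.20.2. One way a shim can do this (illustrative only; names here are hypothetical, not Hive's actual shim API) is to parse the octal mode string itself and hand the result to the FsPermission(short) constructor, which exists in both old and new Hadoop versions.

```java
// Hypothetical shim sketch: older Hadoop only offers FsPermission(short),
// so parse the octal string ourselves before constructing the permission.
public class PermissionShim {
    // "700" (octal) -> 448 (decimal), suitable for FsPermission(short).
    static short parseOctal(String mode) {
        return Short.parseShort(mode, 8);
    }

    public static void main(String[] args) {
        System.out.println(parseOctal("700")); // decimal value of octal 700
    }
}
```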
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772086#comment-13772086 ] Bikas Saha commented on HIVE-5317: -- Some questions which I am sure have been considered but are not clear in the document. Should metastore heartbeat be in the job itself and not the client since the job is the source of truth and the client can disappear. What happens if the client disappears but the job completes with success and manages to promote the output files? Is transaction id per file or per metastore? Where does the metastore recover the last transaction id(s) from after restart?
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772066#comment-13772066 ] Hudson commented on HIVE-4487: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/]) HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java HIVE-4487 - Hive does not set explicit permissions on hive.exec.scratchdir (Chaoyu Tang via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524509) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772046#comment-13772046 ] stack commented on HIVE-5317: - [~alangates] Looks like a bunch of hbase primitives done as mapreduce jobs. At first blush, on 1., percolator would be a bunch of work but looks less than what is proposed here (would you need percolator given you write the transaction id into the row?). On 2., if hbase were made to write ORC, couldn't you MR the files hbase writes after asking hbase to snapshot.
[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)
[ https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772063#comment-13772063 ] Hudson commented on HIVE-5198: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #106 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/106/]) HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1524617) * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java WebHCat returns exitcode 143 (w/o an explanation) - Key: HIVE-5198 URL: https://issues.apache.org/jira/browse/HIVE-5198 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5198.patch The message might look like this: {statement:use default; show table extended like xyz;,error:unable to show table: xyz,exec:{stdout:,stderr:,exitcode:143}} WebHCat has a templeton.exec.timeout property which kills an HCat request (i.e. something like a DDL statement that gets routed to HCat CLI) if it takes longer than this timeout. Since WebHCat does a fork/exec to 'hcat' script, the timeout is implemented as SIGTERM sent to the subprocess. SIGTERM value is 15. So it's reported as 128 + 15 = 143. Error logging/reporting should be improved in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
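The exit code in the HIVE-5198 report follows the standard shell convention spelled out in the comment: a process killed by signal N is reported with status 128 + N, so SIGTERM (15) yields 143. A minimal sketch of that arithmetic:

```java
// The shell convention described above: exit status of a process
// terminated by a signal is 128 plus the signal number.
public class ExitCode {
    static final int SIGTERM = 15; // signal sent by WebHCat's exec timeout

    static int exitStatusForSignal(int signal) {
        return 128 + signal;
    }

    public static void main(String[] args) {
        System.out.println(exitStatusForSignal(SIGTERM)); // 143
    }
}
```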
[jira] [Updated] (HIVE-5070) Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 shim
[ https://issues.apache.org/jira/browse/HIVE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated HIVE-5070: -- Summary: Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 shim (was: Need to implement listLocatedStatus() in ProxyFileSystem) Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 shim -- Key: HIVE-5070 URL: https://issues.apache.org/jira/browse/HIVE-5070 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.12.0 Reporter: shanyu zhao Fix For: 0.13.0 Attachments: HIVE-5070.patch.txt, HIVE-5070-v2.patch, HIVE-5070-v3.patch MAPREDUCE-1981 introduced a new API for FileSystem - listLocatedStatus. It is used in Hadoop's FileInputFormat.getSplits(). Hive's ProxyFileSystem class needs to implement this API in order to make Hive unit test work. Otherwise, you'll see these exceptions when running TestCliDriver test case, e.g. results of running allcolref_in_udf.q: [junit] Running org.apache.hadoop.hive.cli.TestCliDriver [junit] Begin query: allcolref_in_udf.q [junit] java.lang.IllegalArgumentException: Wrong FS: pfile:/GitHub/Monarch/project/hive-monarch/build/ql/test/data/warehouse/src, expected: file:/// [junit] at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) [junit] at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69) [junit] at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375) [junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482) [junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522) [junit] at org.apache.hadoop.fs.FileSystem$4.init(FileSystem.java:1798) [junit] at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1797) [junit] at org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:579) [junit] at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235) [junit] at 
org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235) [junit] at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264) [junit] at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217) [junit] at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69) [junit] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:385) [junit] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:351) [junit] at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:389) [junit] at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503) [junit] at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495) [junit] at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390) [junit] at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) [junit] at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481) [junit] at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) [junit] at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) [junit] at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:552) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481) [junit] at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:552) [junit] at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:543) [junit] at 
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448) [junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:688) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at
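The "Wrong FS: pfile:/... expected: file:///" failure above happens because a path carrying the proxy's scheme reaches the wrapped FileSystem unchanged. The essence of the fix, sketched here in plain Java without Hadoop dependencies (this is an illustration of the idea, not Hive's actual ProxyFileSystem code), is that every delegated call, including the new listLocatedStatus(), must first rewrite the proxy scheme back to the underlying one:

```java
import java.net.URI;

// Illustrative sketch: a proxy filesystem must swizzle its own scheme
// ("pfile") back to the underlying scheme ("file") before delegating,
// or the wrapped FileSystem rejects the path with "Wrong FS".
public class SchemeSwizzle {
    static URI toUnderlying(URI proxyUri, String underlyingScheme) {
        return URI.create(proxyUri.toString()
                .replaceFirst("^" + proxyUri.getScheme() + ":", underlyingScheme + ":"));
    }

    public static void main(String[] args) {
        URI proxy = URI.create("pfile:/build/ql/test/data/warehouse/src");
        System.out.println(toUnderlying(proxy, "file"));
    }
}
```

Because listLocatedStatus() was added by MAPREDUCE-1981 and FileInputFormat.getSplits() now calls it, a proxy that does not override it falls through to the default implementation, which lists with the un-swizzled path and fails as shown in the trace.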
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772077#comment-13772077 ] stack commented on HIVE-5317: - bq. The HBase scan rate is much lower than HDFS, especially with short-circuit reads. What kind of numbers are you talking, Owen? Would be interested in knowing what they are. Implication would be also that it cannot be improved? Or scanning the files written by hbase offline from a snapshot wouldn't work for you (snapshots are cheap in hbase. Going by your use cases, you'd be doing these runs infrequently enough). bq. HBase is tuned for a write-heavy workloads. Funny. Often we're accused of the other extreme. bq. HBase doesn't have a columnar format and can't support column projection. It doesn't. Too much work to add a storage engine that wrote columnar? bq. HBase doesn't have the equivalent of partitions or buckets. In hbase we call them 'Regions'.
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772115#comment-13772115 ] Owen O'Malley commented on HIVE-5317: - Bikas, In Hive, if the client disappears, the query fails, because the final work (output promotion, display to the user) is done by the client. Also don't forget that a single query may be composed of many MR jobs, although obviously that changes on Tez. The transaction id is global for all of the tasks working on the same query. The metastore's data is stored in an underlying SQL database, so the transaction information will need to be there also.
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Attachment: HIVE-4732.6.patch Incorporating [~appodictic] comments. Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, HIVE-4732.6.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
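The optimization HIVE-4732 describes, comparing cached hashcodes before falling back to a full equality check, can be illustrated with a toy class (this is an assumed sketch of the technique, not the actual AvroSerde code, and the String field stands in for an Avro Schema):

```java
import java.util.Objects;

// Toy illustration of the hashcode-first equality optimization:
// compute an expensive hash once, reuse it as a cheap negative check,
// and only run the full equals() when the hashes collide.
public class CachedSchema {
    private final String definition;   // stand-in for an Avro Schema
    private final int cachedHash;      // computed once, reused thereafter

    CachedSchema(String definition) {
        this.definition = definition;
        this.cachedHash = definition.hashCode(); // the "expensive" step, done once
    }

    boolean fastEquals(CachedSchema other) {
        if (this == other) return true;
        if (cachedHash != other.cachedHash) return false; // cheap rejection path
        return Objects.equals(definition, other.definition); // rare full check
    }

    public static void main(String[] args) {
        System.out.println(new CachedSchema("s").fastEquals(new CachedSchema("s"))); // true
        System.out.println(new CachedSchema("s").fastEquals(new CachedSchema("t"))); // false
    }
}
```

The win comes from the common case: unequal schemas almost always have unequal hashes, so the deep structural comparison is skipped.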
[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772142#comment-13772142 ] Doug Sedlak commented on HIVE-4051: --- I've noticed that the more partitions in a Hive table, the slower the following operations come back. With thousands of partitions they become painfully slow: SELECT * FROM TABNAME SHOW TABLE EXTENDED LIKE `TABNAME` Do you know if this fix takes care of these issues? If not, is it something you could test? If not, I'll enter a new case. Thanks, Doug doug.sed...@sas.com Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: 0.12.0 Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch, HIVE-4051.D11805.6.patch, HIVE-4051.D11805.7.patch, HIVE-4051.D11805.8.patch, HIVE-4051.D11805.9.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client.
The following 12 queries were repeated for each partition, generating a storm of SQL queries:
{code}
4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` = 4871
4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` = 4871
4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` = 4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` = 4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` = 4871 AND `STRING_LIST_ID_KID` IS NOT NULL
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` = 4871
4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` = 4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL)
{code}
This data is not detached or cached, so this operation is performed during every query plan for the partitions, even in the same Hive client. The queries are automatically generated by JDO/DataNucleus, which makes it nearly impossible to rewrite them into a single denormalized join operation & process it locally. Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
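The 1+N pattern described above can be illustrated outside Hive. The following sqlite3 sketch uses simplified stand-ins for the real metastore tables (only PARTITIONS and SDS, with invented sample locations); it contrasts the per-partition round trips with the single denormalized join the report wishes for:

```python
# Illustration of the 1+N query pattern (not Hive code): one query fetches the
# partition list, then one additional query runs per partition. Table names
# mimic the metastore schema; the data is made up for the demo.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, SD_ID INTEGER);
    CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
    INSERT INTO PARTITIONS VALUES (1, 10), (2, 11), (3, 12);
    INSERT INTO SDS VALUES (10, '/wh/p1'), (11, '/wh/p2'), (12, '/wh/p3');
""")

# 1+N: list the partitions, then issue one lookup per partition.
part_ids = [r[0] for r in conn.execute("SELECT PART_ID FROM PARTITIONS")]
n_plus_one = []
for pid in part_ids:  # N extra round trips to the database
    row = conn.execute(
        "SELECT S.LOCATION FROM PARTITIONS P JOIN SDS S ON P.SD_ID = S.SD_ID "
        "WHERE P.PART_ID = ?", (pid,)).fetchone()
    n_plus_one.append(row[0])

# Denormalized alternative: a single join fetches everything in one query.
batched = [r[0] for r in conn.execute(
    "SELECT S.LOCATION FROM PARTITIONS P JOIN SDS S ON P.SD_ID = S.SD_ID "
    "ORDER BY P.PART_ID")]

assert n_plus_one == batched  # same data, 1 query instead of 1+N
```

With 1800 partitions and a dozen queries each, the per-object style multiplies round trips exactly as the mysql general log above shows.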
[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class
[ https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772149#comment-13772149 ] Mohammad Kamrul Islam commented on HIVE-5306: - [~jdere]: 2. How about float/string? I think those can be converted to double. Were string and float supported in the original case? Will apply your other comments. Use new GenericUDF instead of basic UDF for UDFAbs class Key: HIVE-5306 URL: https://issues.apache.org/jira/browse/HIVE-5306 Project: Hive Issue Type: Improvement Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch, HIVE-5306.4.patch GenericUDF is the latest and recommended base class for any UDF. This JIRA is to change the current UDFAbs class to extend GenericUDF. The general benefit of GenericUDF is described in its comments as: The GenericUDFs are superior to normal UDFs in the following ways: 1. It can accept arguments of complex types, and return complex types. 2. It can accept variable length of arguments. 3. It can accept an infinite number of function signatures - for example, it's easy to write a GenericUDF that accepts array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It can do short-circuit evaluations using DeferredObject.
Review Request 14232: HIVE-5070: Need to implement listLocatedStatus() in ProxyFileSystem for 0.23 shim.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14232/ --- Review request for hive, Jason Dere and Thejas Nair. Repository: hive-git Description --- Please see HIVE-5070 for a detailed description of the problem: https://issues.apache.org/jira/browse/HIVE-5070 This patch creates a new shim method: createProxyFileSystem(). In shims 0.20 and 0.20S, it simply creates a ProxyFileSystem object. In shim 0.23, it creates a ProxyFileSystem23 that derives from ProxyFileSystem and implements the listLocatedStatus() method to handle the proxy correctly. Diffs - shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java cf5c175 shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 9351411 shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java 28843e0 shims/src/common/java/org/apache/hadoop/fs/ProxyFileSystem.java 28a18f6 shims/src/common/java/org/apache/hadoop/fs/ProxyLocalFileSystem.java 9f35769 shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java 5b91267 Diff: https://reviews.apache.org/r/14232/diff/ Testing --- Ran Hive unit tests against Hadoop 2.1.1-beta; they now pass. Thanks, shanyu zhao
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772021#comment-13772021 ] Alan Gates commented on HIVE-5317: -- Brock, we did look at that. We didn't go that route for a couple of reasons: # Adding transactions to HBase is a fair amount of work. See Google's Percolator paper for one approach to that. # HBase can't offer the same scan speed as HDFS. Since we're choosing to focus this on updates done in OLAP-style workloads, HBase isn't going to be a great storage mechanism for the data. I agree it might make sense to have transactions on HBase for a more OLTP-style workload. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the forms of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed.
Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14221/ --- (Updated Sept. 19, 2013, 5:48 p.m.) Review request for hive. Bugs: HIVE-4113 https://issues.apache.org/jira/browse/HIVE-4113 Repository: hive-git Description --- Modifies ColumnProjectionUtils such that there are two flags: one for the column ids and one indicating whether all columns should be read. Additionally, the patch updates all locations which used the old method of an empty string indicating that all columns should be read. The automatic formatter generated by ant eclipse-files is fairly aggressive, so there is some unrelated import/whitespace cleanup. This one is based on https://reviews.apache.org/r/11770/ and has been rebased to the latest trunk. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f37d0c conf/hive-default.xml.template 545026d hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 766056b hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java 553446a hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatRecordReader.java 3ee6157 hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java 1980ef5 hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java 577e06d hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java d38bb8d ql/src/java/org/apache/hadoop/hive/ql/Driver.java 31a52ba ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ab0494e ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java a5a8943 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0f29a0e ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 49145b7 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cccdc1b ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java a83f223 
ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 50c5093 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java cbdc2db ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java ed14e82 ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java b97d869 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java 0550bf6 ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java fb9fca1 ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java dd1276d ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 83c5c38 serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0b3ef7b serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 11f5f07 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 1335446 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java e1270cc serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java b717278 serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java 0317024 serde/src/test/org/apache/hadoop/hive/serde2/TestColumnProjectionUtils.java PRE-CREATION Diff: https://reviews.apache.org/r/14221/diff/ Testing --- Thanks, Yin Huai
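The two-flag scheme the description outlines can be sketched simply. Class and method names below are illustrative assumptions, not the real ColumnProjectionUtils API; the point is that an explicit read-all flag makes "no columns needed" (count(1)) distinguishable from "all columns needed" (select *), which the old empty-string convention conflated:

```python
# Sketch of a two-flag column projection: read_all_columns plus an explicit
# id set. Names are hypothetical, not Hive's actual API.
class ColumnProjection:
    def __init__(self):
        self.read_all_columns = True   # default: behave like "select *"
        self.column_ids = set()

    def set_read_columns(self, ids):
        """Request specific columns, e.g. from the column pruner."""
        self.read_all_columns = False
        self.column_ids |= set(ids)

    def set_read_none(self):
        """No columns needed at all, e.g. select count(1)."""
        self.read_all_columns = False
        self.column_ids = set()

    def should_read(self, col_id):
        return self.read_all_columns or col_id in self.column_ids

proj = ColumnProjection()
proj.set_read_none()
assert not proj.should_read(0)   # count(1): every column can be skipped
proj.set_read_columns([2])
assert proj.should_read(2) and not proj.should_read(0)
```

Under the old convention an empty column list was ambiguous, so RCFile conservatively read everything; the explicit flag removes that ambiguity.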
[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class
[ https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772158#comment-13772158 ] Jason Dere commented on HIVE-5306: -- As mentioned in the first comment, for non-generic UDFs it does attempt to see if the input argument can be mapped to one of the supported argument types. So it should work for float/string: hive> create view view1 as select abs('1'), abs(cast(1.0 as float)) from src limit 1; OK Time taken: 0.099 seconds hive> describe view1; OK _c0 double None _c1 double None Time taken: 0.055 seconds, Fetched: 2 row(s) Use new GenericUDF instead of basic UDF for UDFAbs class Key: HIVE-5306 URL: https://issues.apache.org/jira/browse/HIVE-5306 Project: Hive Issue Type: Improvement Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch, HIVE-5306.4.patch GenericUDF is the latest and recommended base class for any UDF. This JIRA is to change the current UDFAbs class to extend GenericUDF. The general benefit of GenericUDF is described in its comments as: The GenericUDFs are superior to normal UDFs in the following ways: 1. It can accept arguments of complex types, and return complex types. 2. It can accept variable length of arguments. 3. It can accept an infinite number of function signatures - for example, it's easy to write a GenericUDF that accepts array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It can do short-circuit evaluations using DeferredObject.
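The implicit conversion Jason describes, where abs('1') and abs(float) both resolve to a double result, can be modelled generically. This is an illustrative Python sketch of coercing float/string arguments to double before applying abs, not Hive's actual type resolver:

```python
# Hypothetical model of implicit argument coercion for an abs UDF:
# int/float widen to double, numeric strings parse to double, everything
# else is rejected. Not Hive's implementation.
def coerce_to_double(value):
    if isinstance(value, bool):
        raise TypeError("boolean cannot be coerced to double")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        return float(value)  # raises ValueError for non-numeric strings
    raise TypeError("no implicit conversion from %s" % type(value).__name__)

def udf_abs(value):
    """abs() over a value coerced to double, mirroring abs('1') -> 1.0."""
    return abs(coerce_to_double(value))

assert udf_abs('1') == 1.0    # string argument, double result (_c0 double)
assert udf_abs(-2.5) == 2.5   # float argument, double result (_c1 double)
```

This matches the describe output in the comment: both view columns come back as double regardless of the original argument type.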
[jira] [Created] (HIVE-5321) Join filters do not work correctly with outer joins again
Alexander Pivovarov created HIVE-5321: - Summary: Join filters do not work correctly with outer joins again Key: HIVE-5321 URL: https://issues.apache.org/jira/browse/HIVE-5321 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.9.0 Reporter: Alexander Pivovarov
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10) and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10) do not give correct results.
To reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1 2 3 4
$ vi tt2
1 2 8 9
$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/
Wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
1 1
2 2
3 NULL
4 NULL
Correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1 1
2 2
alexp@t1:~/hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt
Release Notes - Hive - Version 0.11.0
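The distinction at the heart of this report can be sketched outside Hive. A minimal Python model of a left outer join reproduces both listings: an ON-clause filter only controls which right-side rows match (unmatched left rows still appear, NULL-padded), while a WHERE clause filters the already-joined rows. Note that the NULL-padded listing is what standard outer-join semantics prescribe for the ON-filter query:

```python
# Toy left outer join: the ON predicate decides matching; unmatched left rows
# are emitted with None (NULL). A WHERE filter is applied after the join.
def left_outer_join(left, right, on):
    out = []
    for l in left:
        matches = [(l, r) for r in right if on(l, r)]
        out.extend(matches if matches else [(l, None)])
    return out

tt1 = [1, 2, 3, 4]
tt2 = [1, 2, 8, 9]

# ON (tt1.c1 = tt2.c1 AND tt1.c1 <= 2): non-matching left rows survive as NULL.
on_result = left_outer_join(tt1, tt2, lambda l, r: l == r and l <= 2)
assert on_result == [(1, 1), (2, 2), (3, None), (4, None)]

# ... ON (tt1.c1 = tt2.c1) WHERE tt1.c1 <= 2: the filter drops joined rows.
where_result = [(l, r)
                for (l, r) in left_outer_join(tt1, tt2, lambda l, r: l == r)
                if l <= 2]
assert where_result == [(1, 1), (2, 2)]
```

The two queries are therefore not equivalent by design; whether Hive's behavior here is a bug is exactly what the follow-up discussion on HIVE-1534 debates.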
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772192#comment-13772192 ] Eric Hanson commented on HIVE-5317: --- Overall this looks like a workable approach given the use cases described (mostly coarse-grained updates with a low transaction rate), and it has the benefit that it doesn't take a dependency on another large piece of software like an update-aware DBMS or NoSQL store. Regarding use cases, it appears that this design won't be able to deliver fast performance for fine-grained inserts. E.g. there might be scenarios where you want to insert one row into a fact table every 10 milliseconds in a separate transaction and have the rows immediately visible to readers. Are you willing to forgo that use case? It sounds like yes. This may be reasonable. If you want to handle it, then a different design for the delta insert file information is probably needed, i.e. a store that's optimized for short write transactions. I didn't see any obvious problem, due to the versioned scans, but is this design safe from the Halloween problem? That's the problem where an update scan sees its own updates again, causing an infinite loop or incorrect update. An argument that the design is safe from this would be good. You mention that you will have one type of delta file that encodes updates directly, for sorted files. Is this really necessary, or can you make updates illegal for sorted files? If updates can always be modelled as insert plus delete, that simplifies things. How do you ensure that the delta files are fully written (committed) to the storage system before the metastore treats the transaction that created the delta file as committed? It's not completely clear why you need exactly the transaction ID information specified in the delta file names. E.g. would just the transaction ID (start timestamp) be enough? A precise specification of how they are used would be useful.
Explicitly explaining what happens when a transaction aborts, and how its delta files get ignored and then cleaned up, would be useful. Is there any issue with the correctness of task retry in the presence of updates if a task fails? It appears that it is safe due to the snapshot isolation. Explicitly addressing this in the specification would be good. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the forms of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed.
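One way to picture the snapshot-isolation argument in this thread is a visibility check keyed on transaction IDs. The sketch below is a hedged illustration under assumed naming (it is not the design in the attached PDF): each delta carries the txn id that wrote it, a reader's snapshot fixes the committed set at read start, and a scan skips deltas from open/aborted txns as well as its own, which is also what would protect an UPDATE scan from re-reading its own writes (the Halloween problem Eric raises):

```python
# Hypothetical snapshot-visibility filter for delta files. delta_files is a
# list of (writer_txn_id, path); snapshot_committed is the set of txns that
# had committed when this reader's snapshot was taken; my_txn_id lets an
# updating scan exclude its own in-flight writes.
def visible_deltas(delta_files, snapshot_committed, my_txn_id):
    return [path for txn_id, path in delta_files
            if txn_id in snapshot_committed and txn_id != my_txn_id]

deltas = [(5, "delta_5"), (7, "delta_7"), (9, "delta_9")]
# txn 9 is still open (or aborted) -> invisible; txn 7 is this reader's own
# update transaction -> also skipped, so the scan never sees its own updates.
assert visible_deltas(deltas, snapshot_committed={5, 7}, my_txn_id=7) == ["delta_5"]
```

Under this model, an aborted transaction's deltas are simply never in any later snapshot's committed set, so they are ignored until a cleaner removes them; task retries are likewise safe because every attempt reads the same frozen snapshot.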
[jira] [Updated] (HIVE-5321) Join filters do not work correctly with outer joins again
[ https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-5321: -- Description:
SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10) and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10) do not give correct results.
To reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1 2 3 4
$ vi tt2
1 2 8 9
$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/
Wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
1 1
2 2
3 NULL
4 NULL
Correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1 1
2 2
hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt
Release Notes - Hive - Version 0.11.0
was: SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10) and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10) do not give correct results. To reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 alexp@t1:~/hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0
Join filters do not work correctly with outer joins again - Key: HIVE-5321 URL: https://issues.apache.org/jira/browse/HIVE-5321 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.11.0 Reporter: Alexander Pivovarov SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10) and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10) do not give correct results. To reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0
[jira] [Commented] (HIVE-1534) Join filters do not work correctly with outer joins
[ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772169#comment-13772169 ] Alexander Pivovarov commented on HIVE-1534: --- I still see this issue in hive-0.11.0 Join filters do not work correctly with outer joins --- Key: HIVE-1534 URL: https://issues.apache.org/jira/browse/HIVE-1534 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.7.0 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534-4.txt, patch-1534.txt SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10) and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10) do not give correct results.
[jira] [Updated] (HIVE-5321) Join filters do not work correctly with outer joins again
[ https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-5321: -- Description:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); does not give correct results.
To reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1 2 3 4
$ vi tt2
1 2 8 9
$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/
Wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
1 1
2 2
3 NULL
4 NULL
Correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1 1
2 2
hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt
Release Notes - Hive - Version 0.11.0
was: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); does not give correct results. To reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0
Join filters do not work correctly with outer joins again - Key: HIVE-5321 URL: https://issues.apache.org/jira/browse/HIVE-5321 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.11.0 Reporter: Alexander Pivovarov select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); does not give correct results. To reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772106#comment-13772106 ] Yin Huai commented on HIVE-4487: {code} drwxrwxrwt 22 root root 56K Sep 19 13:36 tmp {code} {code} drwxrwxrwx 2 yhuai yhuai 4.0K Sep 19 13:50 yhuai {code} Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive creates this directory it doesn't set any explicit permission on it. This means that if you have the default HDFS umask setting of 022, these directories end up being world-readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world-readable. This can cause a potential leak of data, especially when operating on a Kerberos-enabled cluster. Hive should probably default these directories to only be readable by the owner.
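The umask arithmetic behind the report can be checked directly. A small Python sketch (umask 022 is the HDFS default the description mentions): a directory requested with mode 777 but filtered through umask 022 lands world-readable, whereas an explicitly set mode like 700 is what the fix needs to apply after creation:

```python
# New files/dirs get (requested mode) & ~umask. With the default umask 022,
# Hive's scratch dirs end up 755 (world-readable) unless an explicit
# permission is set afterwards.
def effective_mode(requested, umask):
    """Permission bits a newly created file or directory actually gets."""
    return requested & ~umask

assert effective_mode(0o777, 0o022) == 0o755  # world-readable scratchdir
assert effective_mode(0o700, 0o022) == 0o700  # owner-only survives umask 022
```

This is why the patch must set an explicit permission (owner-only) rather than rely on whatever the cluster's umask happens to leave.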
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772186#comment-13772186 ] Chaoyu Tang commented on HIVE-4487: --- [~yhuai] It works in my Eclipse. The log tells that it failed in the line outStream = fs.create(resFile) of DDLTask. Could you debug and check, before this line is executed, what the permission and owner of the dir (e.g. /tmp/yhuai/hive_2013-09-19_/, one level up -local-1) are? Which Hadoop version are you using? Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive creates this directory it doesn't set any explicit permission on it. This means that if you have the default HDFS umask setting of 022, these directories end up being world-readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world-readable. This can cause a potential leak of data, especially when operating on a Kerberos-enabled cluster. Hive should probably default these directories to only be readable by the owner.
[jira] [Commented] (HIVE-1534) Join filters do not work correctly with outer joins
[ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772179#comment-13772179 ] Alexander Pivovarov commented on HIVE-1534: --- To reproduce:
hive> create table tt1 (c1 int);
hive> create table tt2 (c1 int);
$ vi tt1
1 2 3 4
$ vi tt2
1 2 8 9
$ hadoop fs -put tt1 /user/hive/warehouse/tt1/
$ hadoop fs -put tt2 /user/hive/warehouse/tt2/
Wrong result:
hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2);
1 1
2 2
3 NULL
4 NULL
Correct result:
select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2;
1 1
2 2
alexp@t1:~/hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt
Release Notes - Hive - Version 0.11.0
Join filters do not work correctly with outer joins --- Key: HIVE-1534 URL: https://issues.apache.org/jira/browse/HIVE-1534 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.7.0 Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534-4.txt, patch-1534.txt SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10) and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10) do not give correct results.
[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
[ https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772183#comment-13772183 ] Hudson commented on HIVE-5313: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #173 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/173/]) HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) - Key: HIVE-5313 URL: https://issues.apache.org/jira/browse/HIVE-5313 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-5313.patch As per HIVE-4487, 0.20.2 does not contain FSPermission(string), so we'll have to shim it out.
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772184#comment-13772184 ] Hudson commented on HIVE-4487: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #173 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/173/]) HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive creates this directory it doesn't set any explicit permission on it. This means that if you have the default HDFS umask setting of 022, these directories end up being world-readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world-readable. This can cause a potential leak of data, especially when operating on a Kerberos-enabled cluster. Hive should probably default these directories to only be readable by the owner.
[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4113: --- Attachment: HIVE-4113.3.patch Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column of every row when used with RCFile. select count(1) from store_sales_10_rc gives:
{code}
Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS
{code}
Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less:
{code}
Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS
{code}
which is 11% of the data read by the COUNT(1). This was tracked down to the following code in RCFile.java:
{code}
} else {
  // TODO: if no column name is specified e.g., in select count(1) from tt;
  // skip all columns, this should be distinguished from the case:
  // select * from tt;
  for (int i = 0; i < skippedColIDs.length; i++) {
    skippedColIDs[i] = false;
  }
{code}
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772103#comment-13772103 ] Yin Huai commented on HIVE-4487: I meant when I did show tables in hive cli launched in eclipse. Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772101#comment-13772101 ] Brock Noland commented on HIVE-4487: Can you share the file permissions on each directory in the tree? Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4113: --- Status: Patch Available (was: Open) Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas, select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} Which is 11% of the data size read by the COUNT(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5321) Join filters do not work correctly with outer joins again
[ https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-5321: -- Description: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); does not give correct results. to reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0 was: SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10) and SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10) do not give correct results. to reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0 Join filters do not work correctly with outer joins again - Key: HIVE-5321 URL: https://issues.apache.org/jira/browse/HIVE-5321 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.11.0 Reporter: Alexander Pivovarov select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); does not give correct results. 
to reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
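How the two queries above should differ comes down to outer-join semantics: a filter inside the ON clause only limits which row pairs *match* (left rows that fail it are still emitted, padded with NULL), while the same filter in WHERE drops whole rows after the join. A minimal left-outer-join sketch of that distinction (hypothetical code, not Hive's join operator; NULL on the right is represented by Integer.MIN_VALUE for simplicity):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Contrast a filter evaluated in the ON clause with the same filter in WHERE,
// for tt1 = {1,2,3,4} left outer join tt2 = {1,2,8,9}.
public class OuterJoinDemo {
    static final int NULL = Integer.MIN_VALUE; // stand-in for SQL NULL

    // Left outer join; 'onFilter' is part of the match condition, so left rows
    // that fail it are still emitted, padded with NULL.
    static List<int[]> leftOuterJoin(int[] left, int[] right, IntPredicate onFilter) {
        List<int[]> out = new ArrayList<>();
        for (int l : left) {
            boolean matched = false;
            for (int r : right) {
                if (l == r && onFilter.test(l)) {
                    out.add(new int[]{l, r});
                    matched = true;
                }
            }
            if (!matched) out.add(new int[]{l, NULL});
        }
        return out;
    }

    // Same join with no ON filter, then the predicate applied as WHERE:
    // it now drops whole rows after the join.
    static List<int[]> joinThenWhere(int[] left, int[] right, IntPredicate where) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : leftOuterJoin(left, right, x -> true)) {
            if (where.test(row[0])) out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] tt1 = {1, 2, 3, 4}, tt2 = {1, 2, 8, 9};
        // ON (c1 = c1 AND c1 <= 2): all four left rows appear, two padded with NULL
        leftOuterJoin(tt1, tt2, c -> c <= 2).forEach(r -> System.out.println(Arrays.toString(r)));
        // WHERE c1 <= 2: rows 3 and 4 are filtered out entirely
        joinThenWhere(tt1, tt2, c -> c <= 2).forEach(r -> System.out.println(Arrays.toString(r)));
    }
}
```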
[jira] [Commented] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)
[ https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772182#comment-13772182 ] Hudson commented on HIVE-5198: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #173 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/173/]) HIVE-5198: WebHCat returns exitcode 143 (w/o an explanation) (Eugene Koifman via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524617) * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/ExecServiceImpl.java WebHCat returns exitcode 143 (w/o an explanation) - Key: HIVE-5198 URL: https://issues.apache.org/jira/browse/HIVE-5198 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5198.patch The message might look like this: {"statement":"use default; show table extended like xyz;","error":"unable to show table: xyz","exec":{"stdout":"","stderr":"","exitcode":"143"}} WebHCat has a templeton.exec.timeout property which kills an HCat request (i.e. something like a DDL statement that gets routed to HCat CLI) if it takes longer than this timeout. Since WebHCat does a fork/exec to 'hcat' script, the timeout is implemented as SIGTERM sent to the subprocess. SIGTERM value is 15. So it's reported as 128 + 15 = 143. Error logging/reporting should be improved in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
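The 128 + signal convention described in the issue can be captured in a pair of helpers (a sketch; the names are illustrative, not WebHCat's API):

```java
// Sketch: encoding and decoding the shell-style exit status of a subprocess
// killed by a signal, as WebHCat's templeton.exec.timeout does with SIGTERM.
public class ExitCodeDemo {
    static final int SIGTERM = 15;

    // Convention: a process terminated by signal N exits with 128 + N.
    static int signalExitCode(int signal) {
        return 128 + signal;
    }

    // Inverse: recover the signal number, or -1 if the code is not a signal exit.
    static int signalFromExitCode(int exitCode) {
        return exitCode > 128 ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        System.out.println(signalExitCode(SIGTERM));   // the 143 seen in the error message
        System.out.println(signalFromExitCode(143));   // 15, i.e. SIGTERM
    }
}
```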
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772093#comment-13772093 ] Yin Huai commented on HIVE-4487: Here is my error log when I am launching hive cli through eclipse. {code} Caused by: java.io.FileNotFoundException: /tmp/yhuai/hive_2013-09-19_13-43-12_206_2528583202954923226-1/-local-1 (Permission denied) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:209) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:180) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:176) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:234) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:364) at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2252) ... 13 more {code} Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. 
This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5321) Join filters do not work correctly with outer joins again
[ https://issues.apache.org/jira/browse/HIVE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772259#comment-13772259 ] Alexander Pivovarov commented on HIVE-5321: --- Most probably the fix should validate a query and prevent executing query having filter predicate in join on Join filters do not work correctly with outer joins again - Key: HIVE-5321 URL: https://issues.apache.org/jira/browse/HIVE-5321 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.11.0 Reporter: Alexander Pivovarov select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); does not give correct results. to reproduce: hive> create table tt1 (c1 int); hive> create table tt2 (c1 int); $ vi tt1 1 2 3 4 $ vi tt2 1 2 8 9 $ hadoop fs -put tt1 /user/hive/warehouse/tt1/ $ hadoop fs -put tt2 /user/hive/warehouse/tt2/ wrong result: hive> select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1 and tt1.c1 <= 2); 1 1 2 2 3 NULL 4 NULL correct result: select * from tt1 left outer join tt2 on (tt1.c1 = tt2.c1) where tt1.c1 <= 2; 1 1 2 2 hive-0.11.0-bin$ head -1 RELEASE_NOTES.txt Release Notes - Hive - Version 0.11.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772258#comment-13772258 ] Thejas M Nair commented on HIVE-4487: - I think the problem might have to do with HIVE-5313 change. It converts the octal string into short using Short.parseShort(scratchDirPermission) but that function expects decimal. So 700 gets converted to 700 instead of 448. Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
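The mis-parse Thejas describes is easy to demonstrate: Short.parseShort uses radix 10 unless told otherwise, so an octal permission string needs the explicit-radix overload, Short.parseShort(s, 8):

```java
// The HIVE-5313 bug in miniature: parsing the octal string "700" as decimal
// yields the number 700 (01274 in octal), not the intended 448 (0700 octal).
public class OctalParseDemo {
    public static void main(String[] args) {
        short wrong = Short.parseShort("700");    // radix 10: 700
        short right = Short.parseShort("700", 8); // radix 8:  448
        System.out.println(wrong + " vs " + right);
        // Round-trip check: 448 printed back in octal is "700" again.
        System.out.println(Integer.toOctalString(right));
    }
}
```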
[jira] [Commented] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513
[ https://issues.apache.org/jira/browse/HIVE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772274#comment-13772274 ] Thejas M Nair commented on HIVE-5322: - I won't be able to get to fixing this today. So it would be great if anyone can take a stab at it. Supporting 20.2 is not trivial work ! (cc [~appodictic]) FsPermission is initialized incorrectly in HIVE 5513 Key: HIVE-5322 URL: https://issues.apache.org/jira/browse/HIVE-5322 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Priority: Blocker Fix For: 0.12.0 The change in HIVE-5313 converts the octal string into short using Short.parseShort(scratchDirPermission) but Short.parseShort function expects decimal. So 700 gets converted to 700 instead of 448. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772278#comment-13772278 ] Thejas M Nair commented on HIVE-4487: - I have created HIVE-5322 to track the permission issue. Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Fix For: 0.12.0 Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14221/#review26274 --- Few comments, mostly around unnecessary null checks, which I think are no longer required, now that column pruning will always be happening. Secondly, I think we should be representing column list as LinkedHashSet instead of List. ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java https://reviews.apache.org/r/14221/#comment51336 Seems like this null check is now redundant. List can never be null. ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java https://reviews.apache.org/r/14221/#comment51341 If above is true, this else can also be removed. ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java https://reviews.apache.org/r/14221/#comment51339 Seems like this null check is now redundant. Can we remove this? ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java https://reviews.apache.org/r/14221/#comment51340 If above is true, then this can also be removed. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51337 Seems like this method is called only from tests, no one actually uses it. I will suggest just to remove this method altogether to minimize. appendReadColumnIds() can readily be used in place of this and that is what Hive uses everywhere. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51338 This method is also only called either from previous method and from test. Once we remove previous one, I don't think we need to introduce this new method. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51355 ids should be of type LinkedHashSet<Integer> instead of List. 
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51343 Seems like this null check is no longer needed. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51344 I don't think old can be null at this point either. We should remove this null check. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51356 Similarly here cols should be of type LinkedHashSet<String> serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51345 This null check is not required anymore. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51342 Seems like there is no way that ids could be null now. Let's remove this null check. If someone is indeed passing null, then we are just masking that bug, which should be fixed at caller site. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51358 This should return LinkedHashSet<Integer> instead. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51346 Caller should never pass null conf, Returning empty list is dangerous. Better to let it throw NPE on the caller than this. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51347 It doesn't seem like that this method is needed. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51348 It should be caller's responsibility to not pass in null here. We should not do this null check. 
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51349 Remove call to this new public method and just inline that logic in this method. We should keep our public methods to a minimum. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51350 Again, no null check please : ) - Ashutosh Chauhan On Sept. 19, 2013, 5:48 p.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14221/ --- (Updated Sept. 19, 2013, 5:48 p.m.) Review request for hive. Bugs: HIVE-4113 https://issues.apache.org/jira/browse/HIVE-4113 Repository: hive-git Description --- Modifies ColumnProjectionUtils such that there are two flags. One for the column ids and one indicating whether all columns should be read.
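The LinkedHashSet suggestion in the review buys two properties over a List: duplicate column ids appended by different operators collapse into one, and first-insertion order is preserved. A small, standalone illustration (not the patched ColumnProjectionUtils code):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Why LinkedHashSet over List for read-column ids: appending the same id
// twice leaves one copy, and iteration follows first-insertion order.
public class ReadColumnIdsDemo {
    public static void main(String[] args) {
        List<Integer> asList = new ArrayList<>();
        LinkedHashSet<Integer> asSet = new LinkedHashSet<>();
        for (int id : new int[]{2, 0, 2, 1, 0}) { // ids appended by several callers
            asList.add(id);
            asSet.add(id);
        }
        System.out.println(asList); // duplicates survive in the List
        System.out.println(asSet);  // deduplicated, insertion order kept
    }
}
```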
[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772163#comment-13772163 ] Ashutosh Chauhan commented on HIVE-4113: [~yhuai] I left some comments on RB. But, it seems like you updated the patch in the meantime, so some of those you may have already addressed. Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads up every column every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas, select count(ss_sold_date_sk) from store_sales_10_rc; reads far less {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} Which is 11% of the data size read by the COUNT(1). This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513
[ https://issues.apache.org/jira/browse/HIVE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772286#comment-13772286 ] Mark Wagner commented on HIVE-5322: --- I'll take a stab at it. FsPermission is initialized incorrectly in HIVE 5513 Key: HIVE-5322 URL: https://issues.apache.org/jira/browse/HIVE-5322 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Priority: Blocker Fix For: 0.12.0 The change in HIVE-5313 converts the octal string into short using Short.parseShort(scratchDirPermission) but Short.parseShort function expects decimal. So 700 gets converted to 700 instead of 448. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513
[ https://issues.apache.org/jira/browse/HIVE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner reassigned HIVE-5322: - Assignee: Mark Wagner FsPermission is initialized incorrectly in HIVE 5513 Key: HIVE-5322 URL: https://issues.apache.org/jira/browse/HIVE-5322 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Mark Wagner Priority: Blocker Fix For: 0.12.0 The change in HIVE-5313 converts the octal string into short using Short.parseShort(scratchDirPermission) but Short.parseShort function expects decimal. So 700 gets converted to 700 instead of 448. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5322) FsPermission is initialized incorrectly in HIVE 5513
Thejas M Nair created HIVE-5322: --- Summary: FsPermission is initialized incorrectly in HIVE 5513 Key: HIVE-5322 URL: https://issues.apache.org/jira/browse/HIVE-5322 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Priority: Blocker Fix For: 0.12.0 The change in HIVE-5313 converts the octal string into short using Short.parseShort(scratchDirPermission) but Short.parseShort function expects decimal. So 700 gets converted to 700 instead of 448. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5086) Fix scriptfile1.q on Windows
[ https://issues.apache.org/jira/browse/HIVE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5086: --- Resolution: Fixed Fix Version/s: (was: 0.12.0) 0.13.0 Status: Resolved (was: Patch Available) Nevermind, I found the missing file in original patch. Committed to trunk. Thanks, Daniel! Fix scriptfile1.q on Windows Key: HIVE-5086 URL: https://issues.apache.org/jira/browse/HIVE-5086 Project: Hive Issue Type: Bug Components: Tests, Windows Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.13.0 Attachments: HIVE-5086-1.patch, HIVE-5086-2.patch Test failed with error message: [junit] Task with the most failures(4): [junit] - [junit] Task ID: [junit] task_20130814023904691_0001_m_00 [junit] [junit] URL: [junit] http://localhost:50030/taskdetails.jsp?jobid=job_20130814023904691_0001tipid=task_20130814023904691_0001_m_00 [junit] - [junit] Diagnostic Messages for this Task: [junit] java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {key:238,value:val_238} [junit] at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175) [junit] at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) [junit] at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365) [junit] at org.apache.hadoop.mapred.Child$4.run(Child.java:271) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at javax.security.auth.Subject.doAs(Subject.java:396) [junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) [junit] at org.apache.hadoop.mapred.Child.main(Child.java:265) [junit] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {key:238,value:val_238} [junit] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:538) [junit] at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) [junit] ... 8 more [junit] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 2]: Unable to initialize custom script. [junit] at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:357) [junit] at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) [junit] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848) [junit] at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88) [junit] at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) [junit] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848) [junit] at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90) [junit] at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) [junit] at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848) [junit] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:528) [junit] ... 9 more [junit] Caused by: java.io.IOException: Cannot run program D:\tmp\hadoop-Administrator\mapred\local\3_0\taskTracker\Administrator\jobcache\job_20130814023904691_0001\attempt_20130814023904691_0001_m_00_3\work\.\testgrep: CreateProcess error=193, %1 is not a valid Win32 application [junit] at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) [junit] at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:316) [junit] ... 18 more [junit] Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application [junit] at java.lang.ProcessImpl.create(Native Method) [junit] at java.lang.ProcessImpl.<init>(ProcessImpl.java:81) [junit] at java.lang.ProcessImpl.start(ProcessImpl.java:30) [junit] at java.lang.ProcessBuilder.start(ProcessBuilder.java:453) [junit] ... 
19 more [junit] [junit] [junit] Exception: Client Execution failed with error code = 2 [junit] See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. [junit] junit.framework.AssertionFailedError: Client Execution failed with error code = 2 [junit] See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772005#comment-13772005 ] Brock Noland commented on HIVE-5317: Just curious, I was surprised I didn't see adding transactions to HBase + support in the hbase storage handler as a potential alternative implementation. Could you speak to why your approach is superior to that approach? Also, it'd be great if you posted design document on the design document section of the wiki: https://cwiki.apache.org/confluence/display/Hive/DesignDocs Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772028#comment-13772028 ] Owen O'Malley commented on HIVE-5317: - Expanding on Alan's comments: * The HBase scan rate is much lower than HDFS's, especially with HDFS short-circuit reads. * HBase is tuned for write-heavy workloads. * HBase doesn't have a columnar format and can't support column projection. * HBase doesn't have predicate pushdown into the file format. * HBase doesn't have the equivalent of partitions or buckets. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-5209) JDBC support for varchar
[ https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772335#comment-13772335 ] Phabricator commented on HIVE-5209: --- jdere has commented on the revision HIVE-5209 [jira] JDBC support for varchar. INLINE COMMENTS jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:32 I tried the sample JDBC client from https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients; the example still works fine with just the hive-jdbc and hive-service JARs on my classpath. So it looks like we are still OK here, even without pulling the methods above out into a separate utility class. REVISION DETAIL https://reviews.facebook.net/D12999 To: JIRA, jdere Cc: cwsteinbach, thejas JDBC support for varchar Key: HIVE-5209 URL: https://issues.apache.org/jira/browse/HIVE-5209 Project: Hive Issue Type: Improvement Components: HiveServer2, JDBC, Types Reporter: Jason Dere Assignee: Jason Dere Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, HIVE-5209.4.patch, HIVE-5209.5.patch, HIVE-5209.D12705.1.patch Support returning varchar length in result set metadata
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772337#comment-13772337 ] Hive QA commented on HIVE-4732: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12604084/HIVE-4732.6.patch {color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1242 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5 org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1 org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772336#comment-13772336 ] Phabricator commented on HIVE-2206: --- yhuai has closed the revision HIVE-2206 [jira] add a new optimizer for query correlation discovery and optimization. Closed by commit rHIVE1504395 (authored by hashutosh). CHANGED PRIOR TO COMMIT https://reviews.facebook.net/D11097?vs=39099&id=40161#toc REVISION DETAIL https://reviews.facebook.net/D11097 COMMIT https://reviews.facebook.net/rHIVE1504395 To: JIRA, ashutoshc, yhuai Cc: brock add a new optimizer for query correlation discovery and optimization Key: HIVE-2206 URL: https://issues.apache.org/jira/browse/HIVE-2206 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.12.0 Reporter: He Yongqiang Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, HIVE-2206.D11097.10.patch, HIVE-2206.D11097.11.patch, HIVE-2206.D11097.12.patch, HIVE-2206.D11097.13.patch, HIVE-2206.D11097.14.patch, HIVE-2206.D11097.15.patch, HIVE-2206.D11097.16.patch, HIVE-2206.D11097.17.patch, HIVE-2206.D11097.18.patch, HIVE-2206.D11097.19.patch, HIVE-2206.D11097.1.patch, HIVE-2206.D11097.20.patch, HIVE-2206.D11097.21.patch, HIVE-2206.D11097.22.patch, HIVE-2206.D11097.2.patch, HIVE-2206.D11097.3.patch, HIVE-2206.D11097.4.patch, HIVE-2206.D11097.5.patch, 
HIVE-2206.D11097.6.patch, HIVE-2206.D11097.7.patch, HIVE-2206.D11097.8.patch, HIVE-2206.D11097.9.patch, HIVE-2206.patch, testQueries.2.q, YSmartPatchForHive.patch This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom. Since Hive translates queries in a sentence-by-sentence fashion, for every operation which may need to shuffle the data (e.g. join and aggregation operations), Hive will generate a MapReduce job for that operation. However, those shuffling operations may involve the correlations explained below and thus can be executed in a single MR job. # Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint; # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation, but also the same partition key; # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node. The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions. # There exists an MR job for a reduce-side join operator or reduce-side aggregation operator which has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists); # All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and # No self join is involved in those correlated MR jobs. The correlation optimizer is implemented as a logical optimizer. 
The main reasons are that it only needs to manipulate the query plan tree and it can leverage the existing components for generating MR jobs. The current implementation can serve as a framework for correlation-related optimizations, which I think is better than adding individual optimizers. There is further work that can be done in the future to improve this optimizer. Here are three examples. # Support queries that involve only TC; # Support queries in which input tables of correlated MR jobs involve intermediate tables; and # Optimize queries involving self joins. References: Paper and presentation of YSmart. Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf Slides: http://sdrv.ms/UpwJJc
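The three correlation types defined above can be made concrete with a toy model. The sketch below uses invented classes (not Hive's actual optimizer code), assuming each MR job is reduced to just its input tables and its shuffle (partition) key:

```java
import java.util.List;
import java.util.Objects;

// Toy model of the correlations the optimizer looks for. Hypothetical
// classes for illustration only; Hive's real optimizer works on the
// operator tree, not on a structure like this.
public class CorrelationSketch {
    static class MRJob {
        final List<String> inputs;   // input relations read by the job
        final String partitionKey;   // key the job shuffles (partitions) on
        MRJob(List<String> inputs, String partitionKey) {
            this.inputs = inputs;
            this.partitionKey = partitionKey;
        }
    }

    // Input Correlation (IC): input relation sets are not disjoint
    static boolean inputCorrelated(MRJob a, MRJob b) {
        for (String t : a.inputs) {
            if (b.inputs.contains(t)) return true;
        }
        return false;
    }

    // Transit Correlation (TC): IC plus the same partition key
    static boolean transitCorrelated(MRJob a, MRJob b) {
        return inputCorrelated(a, b)
            && Objects.equals(a.partitionKey, b.partitionKey);
    }

    // Job Flow Correlation (JFC): a job shuffles on the same key as its child
    static boolean jobFlowCorrelated(MRJob parent, MRJob child) {
        return Objects.equals(parent.partitionKey, child.partitionKey);
    }

    public static void main(String[] args) {
        // hypothetical query plan: an aggregation job feeding a join job
        MRJob agg  = new MRJob(List.of("lineitem"), "l_orderkey");
        MRJob join = new MRJob(List.of("lineitem", "orders"), "l_orderkey");
        System.out.println(inputCorrelated(agg, join));    // both read lineitem
        System.out.println(transitCorrelated(agg, join));  // and share a key
        System.out.println(jobFlowCorrelated(agg, join));  // mergeable into one MR job
    }
}
```

When all three predicates hold, as in the `main` example, the two jobs shuffle the same data on the same key, which is exactly the situation where YSmart-style merging saves a full MapReduce pass.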
[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc
[ https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4113: --- Status: Open (was: Patch Available) Optimize select count(1) with RCFile and Orc Key: HIVE-4113 URL: https://issues.apache.org/jira/browse/HIVE-4113 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Yin Huai Fix For: 0.12.0 Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, HIVE-4113.3.patch, HIVE-4113.patch, HIVE-4113.patch select count(1) loads every column of every row when used with RCFile. select count(1) from store_sales_10_rc gives {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS {code} Whereas select count(ss_sold_date_sk) from store_sales_10_rc; reads far less: {code} Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS {code} That is about 12% of the data read by the COUNT(1) query. This was tracked down to the following code in RCFile.java {code} } else { // TODO: if no column name is specified e.g, in select count(1) from tt; // skip all columns, this should be distinguished from the case: // select * from tt; for (int i = 0; i < skippedColIDs.length; i++) { skippedColIDs[i] = false; } {code}
[jira] [Commented] (HIVE-4996) unbalanced calls to openTransaction/commitTransaction
[ https://issues.apache.org/jira/browse/HIVE-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772313#comment-13772313 ] Jonathan Sharley commented on HIVE-4996: We have seen similar issues in an environment with a lot of concurrent Hive access from multiple machines, all running version 0.11. Our first thought was that we needed to turn on hive.support.concurrency. However, after doing this on all of our hosts, including the ones running hiveserver, we still see an intermittent issue. I've not been able to reliably reproduce it, but it does happen at least a few times a day as part of our regularly scheduled jobs. unbalanced calls to openTransaction/commitTransaction - Key: HIVE-4996 URL: https://issues.apache.org/jira/browse/HIVE-4996 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Environment: hiveserver1 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Reporter: wangfeng Priority: Critical Labels: hive, metastore Original Estimate: 504h Remaining Estimate: 504h When we used hiveserver1 based on hive-0.10.0, we found this exception thrown: FAILED: Error in metadata: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask help
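The error message points at a nesting counter: each openTransaction() increments a depth, each commitTransaction() decrements it, and a commit at depth zero means the calls got out of balance somewhere. The following is a minimal sketch of that behavior (an assumed simplification for illustration, not the metastore's actual ObjectStore code):

```java
// Minimal sketch of a nesting counter for logically nested transactions.
// Hypothetical class; the real metastore also manages a JDO transaction
// underneath, starting it at depth 1 and committing it at depth 0.
public class TxnCounter {
    private int openTransactionCalls = 0;

    public void openTransaction() {
        openTransactionCalls++;
    }

    public void commitTransaction() {
        if (openTransactionCalls <= 0) {
            // this is the situation behind the error in the report above
            throw new RuntimeException(
                "commitTransaction was called but openTransactionCalls = 0. "
                + "This probably indicates that there are unbalanced calls to "
                + "openTransaction/commitTransaction");
        }
        openTransactionCalls--;
    }

    public int depth() {
        return openTransactionCalls;
    }
}
```

A typical way the counter goes negative is an error path that rolls back (resetting the count) while an outer caller still issues its commit, so the "unbalanced calls" message often implicates the caller's error handling rather than the commit site itself.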
[jira] [Updated] (HIVE-5209) JDBC support for varchar
[ https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5209: - Attachment: HIVE-5209.5.patch Uploading patch v5, with modifications based on Thejas' feedback. I tried a JDBC client example and it did not require any additional JARs other than hive-jdbc and hive-service, so it looks like we do not need to rework the TypeDescriptor/TypeQualifier changes to avoid a dependency on serde classes.
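The feature HIVE-5209 targets is the standard JDBC contract for parameterized types: for a column declared varchar(n), ResultSetMetaData.getPrecision() should report n. Exercising the real Hive driver needs a live HiveServer2, so the sketch below stands a dynamic proxy in for the driver's metadata object purely to illustrate the client-side calls (the stub and its values are invented, not Hive code):

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.Types;

// Illustrates the JDBC metadata calls a client makes to discover a
// varchar column's declared length. The fake metadata object stands in
// for what a HiveServer2 ResultSet would hand back after HIVE-5209.
public class VarcharMetadataDemo {
    static ResultSetMetaData fakeMetadata(int varcharLength) {
        return (ResultSetMetaData) Proxy.newProxyInstance(
            ResultSetMetaData.class.getClassLoader(),
            new Class<?>[] { ResultSetMetaData.class },
            (proxy, method, args) -> switch (method.getName()) {
                case "getColumnCount"    -> 1;
                case "getColumnType"     -> Types.VARCHAR;
                case "getColumnTypeName" -> "varchar";
                case "getPrecision"      -> varcharLength; // declared max length
                default -> throw new UnsupportedOperationException(method.getName());
            });
    }

    public static void main(String[] args) throws Exception {
        // as if the query were: SELECT name FROM t  -- name VARCHAR(50)
        ResultSetMetaData md = fakeMetadata(50);
        if (md.getColumnType(1) == Types.VARCHAR) {
            // prints varchar(50)
            System.out.println("varchar(" + md.getPrecision(1) + ")");
        }
    }
}
```

Before the fix, a driver that does not plumb the type qualifier through would return a fallback precision here even though the table declared varchar(50); the patch's point is making getPrecision() reflect the declared length.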
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772311#comment-13772311 ] Sushanth Sowmyan commented on HIVE-4388: As an update, I've been working on updating this patch and finally have all my tests succeeding, but I'm now told that the KV/Cell changes are being rolled back in 0.96. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23), builds fail because of HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with HBASE-6396.
Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14221/#review26284 --- ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java https://reviews.apache.org/r/14221/#comment51360 removed in diff3 ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java https://reviews.apache.org/r/14221/#comment51361 From the context, I think op should be a TS. So I will remove this else. ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java https://reviews.apache.org/r/14221/#comment51362 removed in diff3 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java https://reviews.apache.org/r/14221/#comment51363 will remove it. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51364 it is only used in tests. I will remove it. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51365 will remove it. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51366 will remove it serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51367 If we use LinkedHashSet, there will be no duplicates in ids. But we also need to check whether there are any duplicates in the read column string we are appending to (this happens on the node running the compiler). The current version leaves the deduplication work in getReadColumnIDs (it happens in every task). I think your suggestion is better. Will change it serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51370 Right now, we do the deduplication work in getReadColumnIDs (at every task). I think your suggestion is better. Will change it. 
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51369 I will check its usage and make sure ids will not be null. Since ids is an input parameter, is it better to add an annotation or have an assertion? serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51371 OK, will remove it. serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java https://reviews.apache.org/r/14221/#comment51372 OK, will also remove the other null checks - Yin Huai On Sept. 19, 2013, 5:48 p.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14221/ --- (Updated Sept. 19, 2013, 5:48 p.m.) Review request for hive. Bugs: HIVE-4113 https://issues.apache.org/jira/browse/HIVE-4113 Repository: hive-git Description --- Modifies ColumnProjectionUtils such that there are two flags: one for the column ids and one indicating whether all columns should be read. Additionally, the patch updates all locations which use the old method of an empty string indicating all columns should be read. The automatic formatter generated by ant eclipse-files is fairly aggressive, so there is some unrelated import/whitespace cleanup. This one is based on https://reviews.apache.org/r/11770/ and has been rebased to the latest trunk. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f37d0c conf/hive-default.xml.template 545026d hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 766056b hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java 553446a hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatRecordReader.java 3ee6157 hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java 1980ef5 hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java 577e06d hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java d38bb8d ql/src/java/org/apache/hadoop/hive/ql/Driver.java 31a52ba ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ab0494e ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java a5a8943 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0f29a0e ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 49145b7 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cccdc1b ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java a83f223 ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060
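Putting the review thread together, the two-flag scheme plus the suggested LinkedHashSet-based deduplication can be sketched as follows. The names and API here are hypothetical simplifications, not the real ColumnProjectionUtils (which stores its state in a Hadoop Configuration):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of the two-flag column projection scheme from the review: a
// readAllColumns flag replaces the old empty-string convention, and the
// id list is kept in a LinkedHashSet so duplicates collapse once at plan
// time instead of in getReadColumnIds() inside every task.
public class ProjectionSketch {
    private boolean readAllColumns = true;  // default: select * reads everything
    private final LinkedHashSet<Integer> readColumnIds = new LinkedHashSet<>();

    // e.g. for "select count(1)" pass an empty list: read no columns,
    // which is now distinguishable from "read all columns"
    public void appendReadColumns(List<Integer> ids) {
        readAllColumns = false;
        readColumnIds.addAll(ids);  // duplicates are dropped here, once
    }

    public boolean isReadAllColumns() {
        return readAllColumns;
    }

    public List<Integer> getReadColumnIds() {
        return new ArrayList<>(readColumnIds);  // already deduplicated
    }
}
```

Under this scheme the HIVE-4113 bug disappears: count(1) sets the flag to false with an empty id list, so the RCFile reader skips every column instead of falling through to the read-everything path.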