[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182481#comment-14182481 ] Lars Francke commented on HIVE-8583: As far as I understand, the work is done on the non-replaced original configuration properties:
{code}
void addJobConfToEnvironment(Configuration conf, Map<String, String> env) {
  Iterator<Map.Entry<String, String>> it = conf.iterator();
  while (it.hasNext()) {
    Map.Entry<String, String> en = it.next();
    String name = en.getKey();
    if (!blackListed(name)) {
      String value = conf.get(name); // does variable expansion
      name = safeEnvVarName(name);
{code}
So the replacing happens later. BTW, the replaceAll is wrong too: it takes a regex, so "." means every character, and it'd replace everything with underscores. HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist --- Key: HIVE-8583 URL: https://issues.apache.org/jira/browse/HIVE-8583 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8583.1.patch [~alangates] added the following in HIVE-8341:
{code}
String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
if (bl != null && bl.length() > 0) {
  String[] bls = bl.split(",");
  for (String b : bls) {
    b.replaceAll(".", "_");
    blackListedConfEntries.add(b);
  }
}
{code}
The {{replaceAll}} call is confusing as its result is not used at all. This patch contains the following:
* Minor style modification (missorted modifiers)
* Adds reading of the default value for HIVESCRIPT_ENV_BLACKLIST
* Removes replaceAll
* Lets blackListed take a Configuration object as parameter, which allowed me to add a test for this
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
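To illustrate the regex pitfall Lars points out, here is a minimal, self-contained sketch (the sample string is just an illustration, not taken from the Hive code):

```java
public class ReplaceAllDemo {
    public static void main(String[] args) {
        String name = "hive.script.operator.env.blacklist";
        // replaceAll() interprets its first argument as a regex:
        // an unescaped "." matches ANY character, so every character
        // gets replaced, not just the dots.
        System.out.println(name.replaceAll(".", "_"));
        // What was presumably intended: escape the dot...
        System.out.println(name.replaceAll("\\.", "_"));
        // ...or use the non-regex replace(char, char) overload.
        System.out.println(name.replace('.', '_'));
        // Also note: Java strings are immutable, so the result must be
        // assigned; a bare b.replaceAll(...) is a no-op, as the comment
        // on HIVE-8583 observes.
    }
}
```

Running this prints a string of underscores first, then `hive_script_operator_env_blacklist` twice.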
[jira] [Updated] (HIVE-8532) return code of source xxx clause is missing
[ https://issues.apache.org/jira/browse/HIVE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8532: Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, vitthal (Suhas) Gogate, for the contribution. return code of source xxx clause is missing - Key: HIVE-8532 URL: https://issues.apache.org/jira/browse/HIVE-8532 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.12.0, 0.13.1 Reporter: Gordon Wang Fix For: 0.15.0 Attachments: HIVE-8532.patch When executing a source <hql-file> clause, the Hive client driver does not capture the return code of the command. This behaviour causes an issue when running Hive queries in an Oozie workflow: when the source clause is put into an Oozie workflow, Oozie cannot get its return code and therefore considers the source clause successful all the time. So when the source clause fails, the Hive query does not abort, and the Oozie workflow does not abort either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
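The report above implies a fix of this shape: whatever processes the statements of a sourced file must propagate the first failing return code instead of discarding it. A minimal sketch under that assumption (hypothetical names; the actual fix lives in Hive's CLI driver code):

```java
import java.util.List;
import java.util.function.ToIntFunction;

public class SourceReturnCode {
    // Hypothetical sketch: run each statement of a sourced file and
    // propagate the first non-zero return code, rather than swallowing
    // it the way the bug report describes.
    static int processSourcedLines(List<String> lines, ToIntFunction<String> processLine) {
        for (String line : lines) {
            int rc = processLine.applyAsInt(line);
            if (rc != 0) {
                return rc; // abort on the first failing statement
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // A statement that "fails" with return code 3 should make the
        // whole sourced file fail with 3.
        int rc = processSourcedLines(List.of("ok", "fail", "ok"),
                l -> l.equals("fail") ? 3 : 0);
        System.out.println(rc); // prints 3
    }
}
```

With this contract an orchestrator such as Oozie sees the failure, because the non-zero code reaches the process exit status.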
[jira] [Updated] (HIVE-8586) Record counters aren't updated correctly for vectorized queries
[ https://issues.apache.org/jira/browse/HIVE-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8586: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk + branch. Record counters aren't updated correctly for vectorized queries --- Key: HIVE-8586 URL: https://issues.apache.org/jira/browse/HIVE-8586 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.14.0 Attachments: HIVE-8586.1.patch Counts batches not rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8590) With different parameters or column number dense_rank function gets different count distinct results
ericni created HIVE-8590: Summary: With different parameters or column number dense_rank function gets different count distinct results Key: HIVE-8590 URL: https://issues.apache.org/jira/browse/HIVE-8590 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.13.1 Environment: cdh 4.6.0/hive0.13 Reporter: ericni We create a table with SQL which contains the dense_rank function, and then run count distinct on this table. We found that with different dense_rank parameters, or even different columns, we get different count distinct results:
1. Less data may be OK (in our test case, 200 million rows got the same results, but 300 million rows got different results)
2. Different dense_rank parameters may get different results, e.g. dense_rank() over(distribute by a,b sort by c desc) and dense_rank() over(distribute by a sort by c desc)
3. All window functions (rank, row_number, dense_rank) have this problem
4. A smaller column number may be OK
5. Count(1) is OK, but count distinct gets different results
6. It seems that some rows have been lost and some rows repeated
test data (file is too large to upload): http://pan.baidu.com/s/1hqnCzze test sql: http://pan.baidu.com/s/1eQna8q2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8517: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch and trunk. Thanks [~mmokhtar] When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression - Key: HIVE-8517 URL: https://issues.apache.org/jira/browse/HIVE-8517 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8517.1.patch, HIVE-8517.2.patch, HIVE-8517.3.patch When joining on a partition column, the number of partitions is used as the NDV, but this gets overridden by StatsUtils.getColStatisticsFromExpression: the number of partitions used as NDV is replaced by the number of rows, which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results in a very small cardinality estimate, which negatively affects query performance. This is the call stack.
{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node, Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<? extends Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
Driver.compile(String, boolean) line: 415
{code}
Query
{code}
select ss_item_sk item_sk, d_date, sum(ss_sales_price),
       sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}
Plan: notice in the Map Join operator that the number of rows drops from 82,510,879,939 to 36,524 after the join.
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
          Map Operator Tree:
              TableScan
                alias: store_sales
                filterExpr: ss_item_sk is not null (type: boolean)
                Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: ss_item_sk is not null (type: boolean)
                  Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                      1 {d_date_sk} {d_date} {d_month_seq}
                    keys:
                      0 ss_sold_date_sk (type: int)
                      1 d_date_sk (type: int)
                    outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                    input vertices:
                      1 Map 4
                    Statistics: Num rows: 36524 Data size: 4163736 Basic stats: COMPLETE Column stats: COMPLETE
                    Filter Operator
{code}
[jira] [Updated] (HIVE-8567) Vectorized queries output extra stuff for Binary columns
[ https://issues.apache.org/jira/browse/HIVE-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8567: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and branch. Thanks [~mmccline]! Vectorized queries output extra stuff for Binary columns Key: HIVE-8567 URL: https://issues.apache.org/jira/browse/HIVE-8567 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8567.01.patch See vector_data_types.q query output. Non-vectorized output is shorter than vectorized binary column output which seems to include characters from earlier rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8582) Outer Join Simplification is broken
[ https://issues.apache.org/jira/browse/HIVE-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8582: - Priority: Critical (was: Major) Outer Join Simplification is broken --- Key: HIVE-8582 URL: https://issues.apache.org/jira/browse/HIVE-8582 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8582.patch, HIVE-8582.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8409) SMB joins fail intermittently on tez
[ https://issues.apache.org/jira/browse/HIVE-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-8409: - Labels: TODOC14 (was: ) SMB joins fail intermittently on tez Key: HIVE-8409 URL: https://issues.apache.org/jira/browse/HIVE-8409 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8409.1.patch, HIVE-8409.10.patch, HIVE-8409.11.patch, HIVE-8409.2.patch, HIVE-8409.3.patch, HIVE-8409.7.patch, HIVE-8409.8.patch, HIVE-8409.9.patch Flakiness with regard to SMB joins in tez. TEZ-1647 is required to complete the fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8409) SMB joins fail intermittently on tez
[ https://issues.apache.org/jira/browse/HIVE-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182517#comment-14182517 ] Lefty Leverenz commented on HIVE-8409: -- Doc note: This adds configuration parameter *hive.tez.smb.number.waves* to HiveConf.java, so it needs to be documented in the wiki. * [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez] SMB joins fail intermittently on tez Key: HIVE-8409 URL: https://issues.apache.org/jira/browse/HIVE-8409 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8409.1.patch, HIVE-8409.10.patch, HIVE-8409.11.patch, HIVE-8409.2.patch, HIVE-8409.3.patch, HIVE-8409.7.patch, HIVE-8409.8.patch, HIVE-8409.9.patch Flakiness with regard to SMB joins in tez. TEZ-1647 is required to complete the fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8582) Outer Join Simplification is broken
[ https://issues.apache.org/jira/browse/HIVE-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182538#comment-14182538 ] Hive QA commented on HIVE-8582: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676820/HIVE-8582.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6578 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1438/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1438/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1438/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12676820 - PreCommit-HIVE-TRUNK-Build Outer Join Simplification is broken --- Key: HIVE-8582 URL: https://issues.apache.org/jira/browse/HIVE-8582 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8582.patch, HIVE-8582.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8534) sql std auth : update configuration whitelist for 0.14
[ https://issues.apache.org/jira/browse/HIVE-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-8534: - Labels: TODOC14 (was: ) sql std auth : update configuration whitelist for 0.14 -- Key: HIVE-8534 URL: https://issues.apache.org/jira/browse/HIVE-8534 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Blocker Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8534.1.patch, HIVE-8534.2.patch, HIVE-8534.3.patch, HIVE-8534.4.patch, HIVE-8534.5.patch New config parameters have been introduced in hive 0.14. SQL standard authorization needs to be updated to allow some new parameters to be set, when the authorization mode is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8534) sql std auth : update configuration whitelist for 0.14
[ https://issues.apache.org/jira/browse/HIVE-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182571#comment-14182571 ] Lefty Leverenz commented on HIVE-8534: -- Doc note: This adds *hive.security.authorization.sqlstd.confwhitelist.append* and changes the description of *hive.security.authorization.sqlstd.confwhitelist* in HiveConf.java, so they need to be documented in the wiki. * [Configuration Properties -- SQL Standard Based Authorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-SQLStandardBasedAuthorization] * [SQL Standard Based Hive Authorization -- Restrictions on Hive Commands and Statements | https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-RestrictionsonHiveCommandsandStatements] * and optionally [SQL Standard Based Hive Authorization -- Configuration | https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Configuration] sql std auth : update configuration whitelist for 0.14 -- Key: HIVE-8534 URL: https://issues.apache.org/jira/browse/HIVE-8534 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Blocker Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8534.1.patch, HIVE-8534.2.patch, HIVE-8534.3.patch, HIVE-8534.4.patch, HIVE-8534.5.patch New config parameters have been introduced in hive 0.14. SQL standard authorization needs to be updated to allow some new parameters to be set, when the authorization mode is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8582) Outer Join Simplification is broken
[ https://issues.apache.org/jira/browse/HIVE-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8582: - Resolution: Fixed Status: Resolved (was: Patch Available) Failures unrelated. Committed to trunk and branch. Outer Join Simplification is broken --- Key: HIVE-8582 URL: https://issues.apache.org/jira/browse/HIVE-8582 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8582.patch, HIVE-8582.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6806) CREATE TABLE should support STORED AS AVRO
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182579#comment-14182579 ] Navis commented on HIVE-6806: - [~leftylev] Right. I'll book that into a new issue. CREATE TABLE should support STORED AS AVRO -- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Fix For: 0.14.0 Attachments: HIVE-6806.1.patch, HIVE-6806.2.patch, HIVE-6806.3.patch, HIVE-6806.patch Avro is well established and widely used within Hive; however, creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
Navis created HIVE-8591: --- Summary: hive.default.fileformat should accept all formats described by StorageFormatDescriptor Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
[ https://issues.apache.org/jira/browse/HIVE-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8591: Description: NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. was:FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. hive.default.fileformat should accept all formats described by StorageFormatDescriptor -- Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8543) Compactions fail on metastore using postgres
[ https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182584#comment-14182584 ] Damien Carol commented on HIVE-8543: [~alangates] You're welcome. I'm sorry, I was very busy these last few weeks and have not been able to take care of these postgres tickets. You're doing a good job with these ones. Compactions fail on metastore using postgres Key: HIVE-8543 URL: https://issues.apache.org/jira/browse/HIVE-8543 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8543.patch The worker fails to update the stats when the metastore is using Postgres as the RDBMS. {code} org.postgresql.util.PSQLException: ERROR: relation "tab_col_stats" does not exist {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
[ https://issues.apache.org/jira/browse/HIVE-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8591: Status: Patch Available (was: Open) hive.default.fileformat should accept all formats described by StorageFormatDescriptor -- Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8591.1.patch.txt NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
[ https://issues.apache.org/jira/browse/HIVE-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8591: Attachment: HIVE-8591.1.patch.txt hive.default.fileformat should accept all formats described by StorageFormatDescriptor -- Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8591.1.patch.txt NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8564) DROP TABLE IF EXISTS throws exception if the table does not exist.
[ https://issues.apache.org/jira/browse/HIVE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8564: Description: NO PRECOMMIT TESTS DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy7.getTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:975)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3846)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
OK
was: DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at
[jira] [Updated] (HIVE-8564) DROP TABLE IF EXISTS throws exception if the table does not exist.
[ https://issues.apache.org/jira/browse/HIVE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8564: Assignee: Navis Status: Patch Available (was: Open) DROP TABLE IF EXISTS throws exception if the table does not exist. Key: HIVE-8564 URL: https://issues.apache.org/jira/browse/HIVE-8564 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Reporter: Ben Assignee: Navis Priority: Minor Attachments: HIVE-8564.1.patch.txt NO PRECOMMIT TESTS DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy7.getTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:975)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3846)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
OK
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8564) DROP TABLE IF EXISTS throws exception if the table does not exist.
[ https://issues.apache.org/jira/browse/HIVE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8564: Attachment: HIVE-8564.1.patch.txt It's just a log message (DDLTask returns 0), but it seemed annoying. DROP TABLE IF EXISTS throws exception if the table does not exist. Key: HIVE-8564 URL: https://issues.apache.org/jira/browse/HIVE-8564 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Reporter: Ben Priority: Minor Attachments: HIVE-8564.1.patch.txt NO PRECOMMIT TESTS DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy7.getTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:975)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3846)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
OK
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182648#comment-14182648 ] Hive QA commented on HIVE-6165: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676822/HIVE-6165.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6563 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_correctness org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_correctness org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1439/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1439/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1439/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12676822 - PreCommit-HIVE-TRUNK-Build Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182669#comment-14182669 ] Hive QA commented on HIVE-8585: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676825/HIVE-8585.patch {color:red}ERROR:{color} -1 due to 71 failed/errored test(s), 6578 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cluster org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppd org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join38 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_vc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_clusterby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_random org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_case org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
[jira] [Created] (HIVE-8592) 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same
Aidan Semple created HIVE-8592: -- Summary: 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same Key: HIVE-8592 URL: https://issues.apache.org/jira/browse/HIVE-8592 Project: Hive Issue Type: Bug Components: Database/Schema, SQL Affects Versions: 0.13.0 Environment: Running Apache Hive version 0.13.0 using HortonWorks 2.1.2.1 with hadoop version 2.4.0.2.1.2.1-471, on Linux operating system centos5 (also occurs on centos6) Reporter: Aidan Semple Fix For: 0.13.0 I am trying to load zero values into Hive DECIMAL fields in a Hive table where the precision and scale are defined to be the same, e.g. DECIMAL(1,1) or DECIMAL(3,3) etc... However, every time I run a Hive QL insert statement containing zero values, or run a LOAD DATA command to load a text file of data containing zero values into these columns / fields, querying the table shows those zero values displayed and treated as NULL. On further investigation, I was able to narrow the problem down to simple selects with casts. See the example and output from Hive below. So attempting to cast 0, 0.0, or '.0' to DECIMAL(1,1) returns NULL instead of 0. The same happens for every precision 1-38 where the scale equals the precision. If there is a workaround for this then please let me know. Thanks!
hive> select cast('.0' as DECIMAL(1,1)), cast('0.0' as DECIMAL(1,1)), cast('0' as DECIMAL(1,1)), cast(0 as DECIMAL(1,1)), cast(0.0 as DECIMAL(1,1)); Query ID = xxx_2014102414_e4dfdcc1-e4ad-4f84-bd48-198e29fd3757 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1413470329106_0052, Tracking URL = http://hdp8:8088/proxy/application_1413470329106_0052/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1413470329106_0052 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-10-24 14:01:10,256 Stage-1 map = 0%, reduce = 0% 2014-10-24 14:01:27,644 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.51 sec MapReduce Total cumulative CPU time: 6 seconds 510 msec Ended Job = job_1413470329106_0052 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 6.51 sec HDFS Read: 269 HDFS Write: 15 SUCCESS Total MapReduce CPU Time Spent: 6 seconds 510 msec OK NULL NULL NULL NULL NULL Time taken: 36.281 seconds, Fetched: 1 row(s) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
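For context on why this is surprising: DECIMAL(p,s) allows p - s digits before the decimal point, and zero needs none, so zero should fit even when p = s. A minimal fits-check sketched with java.math.BigDecimal (an illustration of the expected semantics, not Hive's actual enforcement code):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalFit {
    // A value fits DECIMAL(precision, scale) when, after rescaling,
    // its integer digits do not exceed precision - scale.
    static boolean fits(BigDecimal v, int precision, int scale) {
        BigDecimal rescaled = v.setScale(scale, RoundingMode.HALF_UP);
        return rescaled.precision() - rescaled.scale() <= precision - scale;
    }

    public static void main(String[] args) {
        // Zero has no integer digits, so it fits DECIMAL(1,1).
        System.out.println(fits(new BigDecimal("0.0"), 1, 1)); // true
        System.out.println(fits(new BigDecimal("0.5"), 1, 1)); // true
        System.out.println(fits(new BigDecimal("1.5"), 1, 1)); // false: one integer digit
    }
}
```

By this rule the casts above should all yield 0, not NULL, which is why the behavior is reported as a bug.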
[jira] [Commented] (HIVE-8535) Enable compile time skew join optimization for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182755#comment-14182755 ] Rui Li commented on HIVE-8535: -- The failed test is because I added the SORT_QUERY_RESULTS label to the qfile, which has non-deterministic result order. Maybe we have to merge that change to the trunk. Enable compile time skew join optimization for spark [Spark Branch] --- Key: HIVE-8535 URL: https://issues.apache.org/jira/browse/HIVE-8535 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8535.1-spark.patch, HIVE-8535.2-spark.patch Sub-task of HIVE-8406 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8406) Research on skewed join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8406: - Attachment: Skew join background.pdf Uploading the doc so it may help people get a better understanding of how skew join is done. Comments and suggestions are welcome. The doc may change as I dig deeper into the details and begin implementation. Research on skewed join [Spark Branch] -- Key: HIVE-8406 URL: https://issues.apache.org/jira/browse/HIVE-8406 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: Skew join background.pdf Research on how to handle skewed join for Hive on Spark. Here is the original Hive design doc for skewed join: https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8577) Cannot deserialize Avro schema with a map<string,string> with null values
[ https://issues.apache.org/jira/browse/HIVE-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182764#comment-14182764 ] Hive QA commented on HIVE-8577: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676823/HIVE-8577.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6562 tests executed *Failed tests:* {noformat} org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1441/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1441/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1441/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12676823 - PreCommit-HIVE-TRUNK-Build Cannot deserialize Avro schema with a map<string,string> with null values - Key: HIVE-8577 URL: https://issues.apache.org/jira/browse/HIVE-8577 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Labels: regression Attachments: HIVE-8577.1.patch, HIVE-8577.1.patch, map_null_schema.avro, map_null_val.avro An avro table with a map<string,string> column that contains null values cannot be deserialized when running the select statement. 
Create the following table: {noformat} CREATE TABLE avro_table (avreau_col_1 map<string,string>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ('avro.schema.url'='file:///tmp/map_null_schema.avro'); {noformat} Then load the avro data: {noformat} LOAD DATA LOCAL INPATH '/tmp/map_null_val.avro' OVERWRITE INTO TABLE avro_table; {noformat} And do the select (it fails): {noformat} SELECT * FROM avro_table; Error: java.io.IOException: org.apache.avro.AvroRuntimeException: Not a map: null (state=,code=0) {noformat} This is a regression bug (it works correctly on hive 0.13.1 version). This is the output that hive 0.13.1 displays: {noformat} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
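For reference, Avro represents a map with nullable values by declaring the value type as a union with null. A sketch of a schema shape that produces such data (an assumption for illustration, not necessarily the attached map_null_schema.avro):

```json
{
  "type": "record",
  "name": "AvroTable",
  "fields": [
    {
      "name": "avreau_col_1",
      "type": {"type": "map", "values": ["null", "string"]}
    }
  ]
}
```

The failure suggests the new deserialization path handles the plain {"type": "map", "values": "string"} case but not the union-valued variant that the 0.13.1 output above shows working.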
[jira] [Commented] (HIVE-8535) Enable compile time skew join optimization for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182798#comment-14182798 ] Xuefu Zhang commented on HIVE-8535: --- Hi [~lirui], for those tests that you added SORT_QUERY_RESULTS, please create a JIRA on trunk. We will merge it to Spark branch once it's committed. Thanks. Enable compile time skew join optimization for spark [Spark Branch] --- Key: HIVE-8535 URL: https://issues.apache.org/jira/browse/HIVE-8535 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8535.1-spark.patch, HIVE-8535.2-spark.patch Sub-task of HIVE-8406 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8592) 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same
[ https://issues.apache.org/jira/browse/HIVE-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8592. --- Resolution: Duplicate Dupe of HIVE-8559. 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same -- Key: HIVE-8592 URL: https://issues.apache.org/jira/browse/HIVE-8592 Project: Hive Issue Type: Bug Components: Database/Schema, SQL Affects Versions: 0.13.0 Reporter: Aidan Semple Fix For: 0.13.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Priority: Critical (was: Major) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Attachments: HIVE-8588.1.patch This is originally discovered by [~deepesh] When running a Sqoop integration test from WebHCat {noformat} curl --show-error -d command=export -libjars hdfs:///tmp/mysql-connector-java.jar --connect jdbc:mysql://deepesh-c6-1.cs1cloud.internal/sqooptest --username sqoop --password passwd --export-dir /tmp/templeton_test_data/sqoop --table person -d statusdir=sqoop.output -X POST http://deepesh-c6-1.cs1cloud.internal:50111/templeton/v1/sqoop?user.name=hrt_qa; {noformat} the job is failing with the following error: {noformat} $ hadoop fs -cat /user/hrt_qa/sqoop.output/stderr 14/10/15 23:52:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.0.0-897 14/10/15 23:52:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 14/10/15 23:52:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
14/10/15 23:52:54 INFO tool.CodeGenTool: Beginning code generation 14/10/15 23:52:54 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:848) at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:736) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:759) at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:269) at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240) at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226) at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295) at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773) at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578) at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96) at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64) at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100) at org.apache.sqoop.Sqoop.run(Sqoop.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) at org.apache.sqoop.Sqoop.main(Sqoop.java:236) {noformat} Note that the Sqoop tar bundle does not contain the JDBC connector jar. I think the problem here may be that the mysql connector jar added to libjars isn't available to the Sqoop tool, which first connects to the database through the JDBC driver to collect some table information before running the MR job. libjars will only add the connector jar for the MR job and not for the local one. 
NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
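If that theory holds, the driver jar would also need to be on the local client's classpath, since -libjars only populates the distributed cache for the MR tasks. A sketch only; the jar path and the direct sqoop invocation here are assumptions, not the committed fix:

```shell
# -libjars ships the connector jar to the MR tasks through the
# distributed cache; the local Sqoop client process never sees it.
# Putting the driver on the client's own classpath covers the local
# metadata query too (the jar path is an assumption):
export HADOOP_CLASSPATH=/tmp/mysql-connector-java.jar:$HADOOP_CLASSPATH
# ...then run the same sqoop export command as in the curl request above.
```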
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Status: Patch Available (was: Open) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-8588.1.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Attachment: HIVE-8588.1.patch sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-8588.1.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8593) Unintended regex is used in ScriptOperator#blackListed()
Ted Yu created HIVE-8593: Summary: Unintended regex is used in ScriptOperator#blackListed() Key: HIVE-8593 URL: https://issues.apache.org/jira/browse/HIVE-8593 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} for (String b : bls) { b.replaceAll(".", "_"); {code} The dot can match any character. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceAll(java.lang.String,%20java.lang.String) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
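A short sketch demonstrating both issues with that line: "." is interpreted as a regex that matches any character, and the returned string is discarded because Java strings are immutable:

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String b = "hive.script.operator.env.blacklist";

        // replaceAll takes a regex; "." matches any character,
        // so every character is replaced and the result is all underscores.
        System.out.println(b.replaceAll(".", "_"));

        // replace does a literal substitution, which is what was intended.
        System.out.println(b.replace('.', '_'));
        // prints "hive_script_operator_env_blacklist"

        // Strings are immutable, so the result must be assigned
        // (b = b.replace('.', '_');) or it is silently lost:
        b.replaceAll(".", "_"); // no effect on b
    }
}
```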
[jira] [Commented] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182970#comment-14182970 ] Eugene Koifman commented on HIVE-8588: -- I meant to say in my previous comment that it would be good to get this into 0.14. sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Attachments: HIVE-8588.1.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182969#comment-14182969 ] Eugene Koifman commented on HIVE-8588: -- [~vikram.dixit] w/o this change, for users to submit Sqoop jobs via WebHCat requires them to modify the Sqoop tar file to include the additional JDBC jars in it which is a major usability issue especially when working with multiple DBs and upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8579) Guaranteed NPE in DDLSemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182971#comment-14182971 ] Hive QA commented on HIVE-8579: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676824/HIVE-8579.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6563 tests executed *Failed tests:* {noformat} org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1442/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1442/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1442/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12676824 - PreCommit-HIVE-TRUNK-Build Guaranteed NPE in DDLSemanticAnalyzer - Key: HIVE-8579 URL: https://issues.apache.org/jira/browse/HIVE-8579 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Jason Dere Attachments: HIVE-8579.1.patch, HIVE-8579.1.patch This was added by [~jdere] in HIVE-8411. I don't fully understand the code (i.e. what it means when desc is null) but I'm sure, Jason, you can fix it without much trouble? 
{code} if (desc == null || !AlterTableDesc.doesAlterTableTypeSupportPartialPartitionSpec(desc.getOp())) { throw new SemanticException( ErrorMsg.ALTER_TABLE_TYPE_PARTIAL_PARTITION_SPEC_NO_SUPPORTED, desc.getOp().name()); } else if (!conf.getBoolVar(HiveConf.ConfVars.DYNAMICPARTITIONING)) { throw new SemanticException(ErrorMsg.DYNAMIC_PARTITION_DISABLED); } {code} You check for whether {{desc}} is null but then use it to do {{desc.getOp()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
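The guard-ordering problem can be shown in a standalone sketch (the names below are illustrative stand-ins, not the Hive classes): when the null case and the unsupported-operation case share one branch, the error path itself dereferences the null.

```java
public class NullGuardDemo {
    // Stand-in for desc.getOp().name(); a null op models a missing AlterTableDesc.
    static String check(String op, boolean dynamicPartitioning) {
        // Split the null case out of the combined condition so the
        // error message never dereferences a null op.
        if (op == null) {
            return "error: no ALTER TABLE operation";
        } else if (!op.equals("ADD_PARTITION")) {
            return "error: partial partition spec not supported for " + op;
        } else if (!dynamicPartitioning) {
            return "error: dynamic partitioning disabled";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(check(null, true));            // handled, no NPE
        System.out.println(check("ADD_PARTITION", true)); // ok
    }
}
```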
[jira] [Created] (HIVE-8594) Wrong condition in SettableConfigUpdater#setHiveConfWhiteList()
Ted Yu created HIVE-8594: Summary: Wrong condition in SettableConfigUpdater#setHiveConfWhiteList() Key: HIVE-8594 URL: https://issues.apache.org/jira/browse/HIVE-8594 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} if(whiteListParamsStr == null && whiteListParamsStr.trim().isEmpty()) { {code} If whiteListParamsStr is null, the first operand is true, so the second is evaluated and the call to trim() would result in NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
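A standalone sketch of the condition (the helper name is illustrative, not the Hive method): with `&&` a null string falls through into `trim()` and throws, so the null test must be combined with `||` to guard the `trim()` call.

```java
public class BlankCheckDemo {
    // Buggy shape: when s is null the first operand is true,
    // so s.trim() is evaluated and throws NullPointerException.
    static boolean isBlankBuggy(String s) {
        return s == null && s.trim().isEmpty();
    }

    // Corrected shape: || short-circuits on null before trim() runs.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBlank(null));  // true
        System.out.println(isBlank("  "));  // true
        System.out.println(isBlank("a,b")); // false
        try {
            isBlankBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from the buggy condition");
        }
    }
}
```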
[jira] [Commented] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182976#comment-14182976 ] Vikram Dixit K commented on HIVE-8588: -- +1 for 0.14 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8595) Slow IO when stdout directed to NAS with large blocksize
Kevin English created HIVE-8595: --- Summary: Slow IO when stdout directed to NAS with large blocksize Key: HIVE-8595 URL: https://issues.apache.org/jira/browse/HIVE-8595 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.13.1 Environment: nfs4 rsize=1048576,wsize=1048576 Reporter: Kevin English Priority: Minor Very slow IO when executing a SQL command file using the following command line, when the target file system is an nfs4-mounted NAS with a large blocksize: hive -f sqlscript.sql 2>results.log >results.tab Workaround (thousands of times faster): hive -f sqlscript.sql 2>results.log | cat >results.tab For instance, I had a command finish 10 hours ago; I forgot to use cat and it is still writing out the output, which after 10 hours is in the 180 GB range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8585: --- Status: Open (was: Patch Available) Constant folding should happen before ppd - Key: HIVE-8585 URL: https://issues.apache.org/jira/browse/HIVE-8585 Project: Hive Issue Type: Improvement Components: Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8585.1.patch, HIVE-8585.patch, HIVE-8585.patch will help {{NullScanOptimizer}} to kick in more places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8585: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8585: --- Attachment: HIVE-8585.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183002#comment-14183002 ] Alan Gates commented on HIVE-8583: -- Yes, Lars is correct. That is just a piece of earlier code that I neglected to take out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27117: HIVE-8457 - MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27117/ --- (Updated Oct. 24, 2014, 4:51 p.m.) Review request for hive and Xuefu Zhang. Changes --- Thanks Xuefu for the comments. I've updated my patch. Bugs: HIVE-8457 https://issues.apache.org/jira/browse/HIVE-8457 Repository: hive-git Description --- Currently, on the Spark branch, each thread is bound with a thread-local IOContext, which gets initialized when we generate an input HadoopRDD, and is later used in MapOperator, FilterOperator, etc. Given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input HadoopRDD, and we would like the HadoopRDD to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:

inputRDD   inputRDD
    |          |
  MT_11      MT_12
    |          |
  RT_1       RT_2

Here, MT_11 and MT_12 are MapTrans from a split MapWork, and RT_1 and RT_2 are two ReduceTrans. Note that this example is simplified, as we may also have a ShuffleTran between a MapTran and a ReduceTran. When multiple Spark threads are running, MT_11 may be executed first; asking for an iterator from the HadoopRDD triggers the creation of the iterator, which in turn triggers the initialization of the IOContext associated with that particular thread. Now, the problem is: before MT_12 starts executing, it will also ask for an iterator from the HadoopRDD, and since the RDD is already cached, instead of creating a new iterator it will just fetch it from the cached result. However, this skips the initialization of the IOContext associated with this particular thread. And, when MT_12 starts executing, it will try to initialize the MapOperator, but since the IOContext is not initialized, this will fail miserably.
Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 20ea977 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 00a6f3d ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 4de3ad4 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 58e1ceb ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 5fb3b13 Diff: https://reviews.apache.org/r/27117/diff/ Testing --- All multi-insertion related tests are passing on my local machine. Thanks, Chao Sun
[jira] [Updated] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8457: --- Attachment: HIVE-8457.2-spark.patch Addressing RB comments. MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound with a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. Given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following: {noformat}
inputRDD   inputRDD
    |          |
  MT_11      MT_12
    |          |
  RT_1       RT_2
{noformat} Here, {{MT_11}} and {{MT_12}} are {{MapTran}}s from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}s. Note that this example is simplified, as we may also have a {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first; asking for an iterator from the {{HadoopRDD}} triggers the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. 
And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
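The caching-skips-initialization pattern can be sketched standalone (all names below are hypothetical stand-ins, not the Hive/Spark classes): a per-thread flag is set only on the code path that actually computes the value, so a thread served from the cache never runs the initializer.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedInitDemo {
    static final ThreadLocal<Boolean> initialized = ThreadLocal.withInitial(() -> false);
    static final Map<String, String> cache = new HashMap<>();

    // Stand-in for computing an RDD partition: the per-thread context is
    // initialized only when the value is actually computed, not on a cache hit.
    static String load(String key) {
        return cache.computeIfAbsent(key, k -> {
            initialized.set(true); // side effect skipped when served from cache
            return k + "-data";
        });
    }

    public static void main(String[] args) {
        load("t1"); // computes: this thread's context is initialized
        System.out.println(initialized.get()); // true
        // A second thread hitting the cache never runs the initializer,
        // modeling the uninitialized IOContext described above.
        new Thread(() -> {
            load("t1"); // cache hit, no side effect
            System.out.println(initialized.get()); // false
        }).start();
    }
}
```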
[jira] [Updated] (HIVE-8573) Fix some non-deterministic vectorization tests
[ https://issues.apache.org/jira/browse/HIVE-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8573: -- Status: Patch Available (was: Open) Fix some non-deterministic vectorization tests -- Key: HIVE-8573 URL: https://issues.apache.org/jira/browse/HIVE-8573 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-8573.1.patch, HIVE-8573.2.patch I found the following vectorization tests are not deterministic: vectorization_16.q vectorization_short_regress.q vector_distinct_2.q vector_groupby_3.q vector_mapjoin_reduce.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8573) Fix some non-deterministic vectorization tests
[ https://issues.apache.org/jira/browse/HIVE-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8573: -- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25550: HIVE-8021 CBO: support CTAS and insert ... select
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/#review58290 --- ql/src/test/queries/clientpositive/ctas_colname.q https://reviews.apache.org/r/25550/#comment99237 Why this change - John Pullokkaran On Oct. 23, 2014, 9:11 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/ --- (Updated Oct. 23, 2014, 9:11 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see JIRA Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java dee7d7e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 37cbf7f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d8c50e3 ql/src/test/queries/clientpositive/cbo_correctness.q 4d8f156 ql/src/test/queries/clientpositive/ctas_colname.q 5322626 ql/src/test/queries/clientpositive/decimal_serde.q cf3a86c ql/src/test/queries/clientpositive/insert0.q PRE-CREATION ql/src/test/results/clientpositive/ctas_colname.q.out 97dacf6 ql/src/test/results/clientpositive/decimal_serde.q.out e461c2e ql/src/test/results/clientpositive/insert0.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25550/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183060#comment-14183060 ] Marcelo Vanzin commented on HIVE-8528: -- Actually, Lefty, that's a good point, this might need some end-user documentation since the recommended setup is to have a full Spark installation available on the HS2 node. I don't know if the plan is to somehow package that with HS2 or leave it as a configuration step. Add remote Spark client to Hive [Spark Branch] -- Key: HIVE-8528 URL: https://issues.apache.org/jira/browse/HIVE-8528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Fix For: spark-branch Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.3-spark.patch For the time being, at least, we've decided to build the Spark client (see SPARK-3215) inside Hive. This task tracks merging the ongoing work into the Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8444) update pom to junit 4.11
[ https://issues.apache.org/jira/browse/HIVE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183067#comment-14183067 ] Jason Dere commented on HIVE-8444: -- [~brocknoland] any issue with bumping up to Junit 4.11? update pom to junit 4.11 Key: HIVE-8444 URL: https://issues.apache.org/jira/browse/HIVE-8444 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-8444.1.patch, HIVE-8444.2.patch allows deterministic ordering of tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8532) return code of source xxx clause is missing
[ https://issues.apache.org/jira/browse/HIVE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183072#comment-14183072 ] vitthal (Suhas) Gogate commented on HIVE-8532: -- Thanks [~navis]! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183073#comment-14183073 ] Xuefu Zhang commented on HIVE-8528: --- [~vanzin], I thought spark installation on HS2 host was optional. Let me know if this has changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6165: -- Attachment: HIVE-6165.2.patch.txt Reload the same patch to re-run test. Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183079#comment-14183079 ] Marcelo Vanzin commented on HIVE-8528: -- It is optional, but I don't really think we should encourage that. A full install should be the recommended setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183081#comment-14183081 ] Xuefu Zhang commented on HIVE-8528: --- Got it. Thanks for the clarification. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26854: HIVE-2573 Create per-session function registry
On Oct. 23, 2014, 9:50 p.m., Jason Dere wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java, line 465 https://reviews.apache.org/r/26854/diff/1-3/?file=723909#file723909line465 There is no longer a way to query the metastore for UDFs apart from the static initialization. So if one CLI user creates a permanent UDF, another user on CLI, or HS2, will not be able to use that new UDF if the 2nd CLI or HS2 was initialized before this UDF was created. Navis Ryu wrote: Permanent functions (persistent function seemed better name, imho) are registered to system registry, which is shared to all clients. So if one user creates new permanent function, it's shared to all clients. The time a user accesses the function, the class is loaded with required resources and registered to session registry as a temporary function. So this would work if all clients are using hiveserver2, because all clients in this scenario would share the same system registry. But if one or more clients are using the Hive CLI, any persistent UDFs created/dropped by this CLI client would not be reflected in the other clients (or HS2), since it's a different process/system registry. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26854/#review57952 --- On Oct. 23, 2014, 12:20 a.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26854/ --- (Updated Oct. 23, 2014, 12:20 a.m.) Review request for hive, Navis Ryu and Thejas Nair. Bugs: HIVE-2573 https://issues.apache.org/jira/browse/HIVE-2573 Repository: hive-git Description --- Small updates to Navis' changes: - session registry doesn't lookup metastore for UDFs - my feedback from Navis' original patch - metastore udfs should not be considered native. 
This allows them to be added/removed from registry Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9ac540e ql/src/java/org/apache/hadoop/hive/ql/exec/CommonFunctionInfo.java 93c15c0 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java 074255b ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 08e1136 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 569c125 ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java efecb05 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b900627 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java 31f906a ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java e43d39f ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 22e5b47 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java af633cb ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java 46f8052 ql/src/test/queries/clientnegative/drop_native_udf.q ae047bb ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out c7405ed ql/src/test/results/clientnegative/create_function_nonudf_class.q.out d0dd50a ql/src/test/results/clientnegative/drop_native_udf.q.out 9f0eaa5 service/src/test/org/apache/hadoop/hive/service/TestHiveServerSessions.java fd38907 Diff: https://reviews.apache.org/r/26854/diff/ Testing --- Thanks, Jason Dere
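A side note on the registry design discussed above: the two-level lookup Navis describes (per-session registries backed by one shared system registry) can be sketched in a few lines. This is a toy model with illustrative names, not Hive's actual Registry/FunctionRegistry API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the lookup order described above: a session first consults its
// own registry, then falls back to the shared system registry and caches the
// result locally as a "temporary" function. Names are illustrative, not Hive's.
public class RegistrySketch {
    // Shared across all sessions in the same process (e.g. one HS2 instance).
    static final Map<String, String> SYSTEM_REGISTRY = new ConcurrentHashMap<>();

    static class SessionRegistry {
        final Map<String, String> local = new ConcurrentHashMap<>();

        String lookup(String fnName) {
            String fn = local.get(fnName);
            if (fn == null) {
                fn = SYSTEM_REGISTRY.get(fnName); // shared, process-wide lookup
                if (fn != null) {
                    local.put(fnName, fn);        // cache in the session registry
                }
            }
            return fn;
        }
    }

    public static void main(String[] args) {
        SYSTEM_REGISTRY.put("my_udf", "com.example.MyUdf");
        SessionRegistry s1 = new SessionRegistry();
        SessionRegistry s2 = new SessionRegistry();
        // Both sessions see the function because they share one system registry;
        // a separate CLI process has its own SYSTEM_REGISTRY, hence Jason's concern.
        System.out.println(s1.lookup("my_udf"));
        System.out.println(s2.lookup("my_udf"));
    }
}
```

This also makes Jason's objection concrete: the sharing only works within one JVM, so a CLI process and an HS2 process each hold an independent `SYSTEM_REGISTRY`.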
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8528: -- Labels: TODOC-SPARK (was: ) Add remote Spark client to Hive [Spark Branch] -- Key: HIVE-8528 URL: https://issues.apache.org/jira/browse/HIVE-8528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Labels: TODOC-SPARK Fix For: spark-branch Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.3-spark.patch For the time being, at least, we've decided to build the Spark client (see SPARK-3215) inside Hive. This task tracks merging the ongoing work into the Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26854: HIVE-2573 Create per-session function registry
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26854/ --- (Updated Oct. 24, 2014, 5:34 p.m.) Review request for hive, Navis Ryu and Thejas Nair. Changes --- Updating with HIVE-2573.10.patch.txt from Navis Bugs: HIVE-2573 https://issues.apache.org/jira/browse/HIVE-2573 Repository: hive-git Description --- Small updates to Navis' changes: - session registry doesn't lookup metastore for UDFs - my feedback from Navis' original patch - metastore udfs should not be considered native. This allows them to be added/removed from registry Diffs (updated) - common/src/java/org/apache/hadoop/hive/common/JavaUtils.java 9aa917c metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 88b0791 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9ac540e ql/src/java/org/apache/hadoop/hive/ql/exec/CommonFunctionInfo.java 93c15c0 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java 074255b ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e43a328 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 569c125 ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7443f8a ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java efecb05 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b900627 ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 13277a9 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 211ab6c ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java e2768ff ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java 793f117 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1796b7b ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 22e5b47 ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java 2b239ab ql/src/java/org/apache/hadoop/hive/ql/session/SessionConf.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java af633cb ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java 46f8052 ql/src/test/queries/clientnegative/drop_native_udf.q ae047bb ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out c7405ed ql/src/test/results/clientnegative/create_function_nonudf_class.q.out d0dd50a ql/src/test/results/clientnegative/drop_native_udf.q.out 9f0eaa5 service/src/test/org/apache/hadoop/hive/service/TestHiveServerSessions.java fd38907 Diff: https://reviews.apache.org/r/26854/diff/ Testing --- Thanks, Jason Dere
[jira] [Commented] (HIVE-8486) TPC-DS Query 96 parallelism is not set correcly
[ https://issues.apache.org/jira/browse/HIVE-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183107#comment-14183107 ] Xuefu Zhang commented on HIVE-8486: --- Since HIVE-8496 is resolved, parallelism on shuffle is no longer a problem. [~csun], please create a separate JIRA to track the spill issue you described. I'm closing this ticket shortly. TPC-DS Query 96 parallelism is not set correcly --- Key: HIVE-8486 URL: https://issues.apache.org/jira/browse/HIVE-8486 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chao When we run the query on a 20B we only have a parallelism factor of 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8486) TPC-DS Query 96 parallelism is not set correcly
[ https://issues.apache.org/jira/browse/HIVE-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8486. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8496. TPC-DS Query 96 parallelism is not set correcly --- Key: HIVE-8486 URL: https://issues.apache.org/jira/browse/HIVE-8486 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chao Fix For: spark-branch When we run the query on a 20B we only have a parallelism factor of 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7731) Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7731. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8118. Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch] - Key: HIVE-7731 URL: https://issues.apache.org/jira/browse/HIVE-7731 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Chao Fix For: spark-branch Encountered when running on spark. Suppose we have three tables: {noformat} table1(x int, y int); table2(x int); table3(x int); {noformat} I run the following query: {noformat} from table1 insert overwrite table table2 select x group by x insert overwrite table table3 select y group by y; {noformat} The query generates 1 map and 2 reduces. The map operator has 2 RS, so I suppose it has output for both reduces. The problem is all (incorrect) results go to table2 and table3 is empty. I tried the same query on MR and it gives correct results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8208) Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8208. --- Resolution: Won't Fix Fix Version/s: spark-branch With HIVE-8118, this is no longer needed. Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch] --- Key: HIVE-8208 URL: https://issues.apache.org/jira/browse/HIVE-8208 Project: Hive Issue Type: Improvement Reporter: Chao Fix For: spark-branch Currently, the multi-table insertion patch breaks the plan whenever one TableScanOperator can lead to multiple FileSinkOperators. It identifies the lowest common ancestor (LCA) and breaks the tree there, creating as many child SparkTasks as there are FileSinkOperators. However, it's better not to break the operator tree in the following situation: of all the paths from these FileSinkOperators to the LCA, ReduceSinkOperators appear in at most one of them. In this case, we can do it in one Spark job, with no need to break the operator tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
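The break/no-break rule described in HIVE-8208 above can be modeled in a few lines: represent each path from the LCA down to a FileSinkOperator as a list of operator names, and only break the tree when more than one path contains a ReduceSinkOperator. This is an illustrative simplification, not the actual Spark-branch planner code:

```java
import java.util.Arrays;
import java.util.List;

// Toy model of HIVE-8208's rule: break the operator tree only if
// ReduceSinkOperators ("RS") appear in two or more of the LCA-to-FileSink paths.
public class MultiInsertBreakRule {
    static boolean needsBreak(List<List<String>> pathsFromLcaToFileSinks) {
        long pathsWithRs = pathsFromLcaToFileSinks.stream()
                .filter(path -> path.contains("RS"))
                .count();
        // 0 or 1 path with a shuffle => a single Spark job suffices.
        return pathsWithRs > 1;
    }

    public static void main(String[] args) {
        // One branch shuffles, the other writes directly: no break needed.
        List<List<String>> oneRs = Arrays.asList(
                Arrays.asList("SEL", "RS", "GBY", "FS"),
                Arrays.asList("SEL", "FS"));
        // Both branches shuffle: the tree must be split at the LCA.
        List<List<String>> twoRs = Arrays.asList(
                Arrays.asList("SEL", "RS", "GBY", "FS"),
                Arrays.asList("SEL", "RS", "GBY", "FS"));
        System.out.println(needsBreak(oneRs)); // false
        System.out.println(needsBreak(twoRs)); // true
    }
}
```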
[jira] [Resolved] (HIVE-8215) Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8215. --- Resolution: Won't Fix With HIVE-8118, this is no longer needed. Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch] Key: HIVE-8215 URL: https://issues.apache.org/jira/browse/HIVE-8215 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chao Currently, multi-table insertion generates 1+N tasks - 1 is the task that generates the input, and N are the insert queries that read from the input and write to separate output tables. To make these N tasks run in parallel, we rely on {{hive.exec.parallel}} being set to {{true}}. In this patch, we propose an alternative approach: combine these N tasks into a single task containing N separate operator trees, which in execution lead to N result RDDs. We may then be able to execute these N RDDs in parallel inside Spark, without needing {{hive.exec.parallel}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
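The "1+1" scheduling idea in HIVE-8215 above — one input stage, then N result computations submitted concurrently from within a single task instead of N tasks gated on {{hive.exec.parallel}} — can be sketched with plain Java concurrency. This models the scheduling idea only; it uses none of Hive's or Spark's actual APIs, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

// Sketch of the 1+1 idea: one shared "input", then N branch computations
// (standing in for the N result RDD actions) submitted concurrently from a
// single task, so parallelism no longer depends on hive.exec.parallel.
public class OnePlusOneSketch {
    static List<Integer> run(int input, List<IntUnaryOperator> inserts) {
        ExecutorService pool = Executors.newFixedThreadPool(inserts.size());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (IntUnaryOperator op : inserts) {
                futures.add(pool.submit(() -> op.applyAsInt(input))); // N concurrent branches
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                try {
                    results.add(f.get()); // wait for all N outputs
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // One shared input (42), two concurrent "insert" branches.
        System.out.println(run(42, List.of(x -> x + 1, x -> x * 2))); // [43, 84]
    }
}
```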
[jira] [Resolved] (HIVE-8209) Multi-table insertion optimization #2: use separate context [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8209. --- Resolution: Won't Fix With HIVE-8118, this is no longer needed. Multi-table insertion optimization #2: use separate context [Spark Branch] -- Key: HIVE-8209 URL: https://issues.apache.org/jira/browse/HIVE-8209 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chao Priority: Minor Currently, the multi-table insertion patch uses {{GenSparkProcContext}} and added some states of its own. It's better to use a separate context only for the purpose of handling multi-table insertion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7387) Guava version conflict between hadoop and spark [Spark-Branch]
[ https://issues.apache.org/jira/browse/HIVE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183150#comment-14183150 ] Xuefu Zhang commented on HIVE-7387: --- With SPARK-2848, shading guava in Spark, this is no longer a problem in Hive. Guava version conflict between hadoop and spark [Spark-Branch] -- Key: HIVE-7387 URL: https://issues.apache.org/jira/browse/HIVE-7387 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7387-spark.patch The guava conflict happens in the hive driver compile stage, as shown in the following exception stacktrace. The conflict occurs while initiating a Spark RDD in SparkClient: the hive driver has both guava 11 (from the hadoop classpath) and the spark assembly jar, which contains guava 14 classes, on its classpath. Spark invoked HashFunction.hashInt, a method that does not exist in guava 11, so evidently the guava 11 version of HashFunction was loaded into the JVM, leading to a NoSuchMethodError while initiating the Spark RDD. 
{code} java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode; at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102) at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210) at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169) at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161) at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155) at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75) at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661) at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812) at org.apache.spark.broadcast.HttpBroadcast.init(HttpBroadcast.scala:52) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776) at org.apache.spark.rdd.HadoopRDD.init(HadoopRDD.scala:112) at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:527) at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:307) at 
org.apache.hadoop.hive.ql.exec.spark.SparkClient.createRDD(SparkClient.java:204) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:167) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:32) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:159) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72) {code} NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7387) Guava version conflict between hadoop and spark [Spark-Branch]
[ https://issues.apache.org/jira/browse/HIVE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7387. --- Resolution: Not a Problem Guava version conflict between hadoop and spark [Spark-Branch] -- Key: HIVE-7387 URL: https://issues.apache.org/jira/browse/HIVE-7387 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7387-spark.patch The guava conflict happens in the hive driver compile stage, as shown in the following exception stacktrace. The conflict occurs while initiating a Spark RDD in SparkClient: the hive driver has both guava 11 (from the hadoop classpath) and the spark assembly jar, which contains guava 14 classes, on its classpath. Spark invoked HashFunction.hashInt, a method that does not exist in guava 11, so evidently the guava 11 version of HashFunction was loaded into the JVM, leading to a NoSuchMethodError while initiating the Spark RDD. {code} java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode; at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102) at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210) at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169) at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161) at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155) at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75) at 
org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661) at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812) at org.apache.spark.broadcast.HttpBroadcast.init(HttpBroadcast.scala:52) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776) at org.apache.spark.rdd.HadoopRDD.init(HadoopRDD.scala:112) at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:527) at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:307) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.createRDD(SparkClient.java:204) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:167) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:32) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:159) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72) {code} NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines
[ https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183151#comment-14183151 ] Laljo John Pullokkaran commented on HIVE-8561: -- Na Yang, if I understand correctly, the goal of this patch is to use Hive for query parsing, resolution, and cost-based optimization, and to use Drill as the execution engine. If my guess is right, this patch makes Hive's Optiq Op tree a public interface. Hive's Optiq Op tree is not meant to be a public interface, and it will go through many changes as we extend CBO support to more operators. Why can't Drill be plugged in as another execution engine, just like MR, Tez, and Spark? Expose Hive optiq operator tree to be able to support other sql on hadoop query engines --- Key: HIVE-8561 URL: https://issues.apache.org/jira/browse/HIVE-8561 Project: Hive Issue Type: Task Components: CBO Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-8561.patch Hive-0.14 added cost-based optimization, and an optiq operator tree is created for select queries. However, the optiq operator tree is not visible from the outside and is hard for other SQL-on-Hadoop query engines, such as Apache Drill, to use. To allow Drill to access the hive optiq operator tree, we need to add a public API that returns it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8533) Enable all q-tests for multi-insertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8533: --- Attachment: HIVE-8533.1-spark.patch Spark tests enabled in this patch: {noformat} auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q {noformat} The result for {{groupby_complex_types_multi_single_reducer.q}} is different from MR's, but this is because it uses {{limit 10}}. The result for {{groupby3_map_skew.q}} also is slightly different: {noformat} 130091.0 260.182 256.10355987055016 98.00.0 142.92680950752379 143.06995106518903 20428.07288 20469.0109 --- 130091.0 260.182 256.10355987055016 98.00.0 142.9268095075238 143.06995106518906 20428.07288 20469.0109 {noformat} I think this is just something about decimal precision, not a correctness issue. Enable all q-tests for multi-insertion [Spark Branch] - Key: HIVE-8533 URL: https://issues.apache.org/jira/browse/HIVE-8533 Project: Hive Issue Type: Test Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8533.1-spark.patch As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
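The tiny mismatch in the last digits of the {{groupby3_map_skew.q}} averages above (e.g. 142.92680950752379 vs. 142.9268095075238) is consistent with Chao's diagnosis: double summation is order sensitive, and Spark and MR may aggregate partial results in different orders. A minimal, self-contained illustration (not Hive code):

```java
// Demonstrates that double addition is not associative: summing the same
// values in a different order can change the last few digits of the result.
// That is presentation-level noise, not a correctness bug.
public class FpOrder {
    static double sumForward(double[] xs) {
        double s = 0.0;
        for (int i = 0; i < xs.length; i++) s += xs[i];
        return s;
    }

    static double sumBackward(double[] xs) {
        double s = 0.0;
        for (int i = xs.length - 1; i >= 0; i--) s += xs[i];
        return s;
    }

    public static void main(String[] args) {
        double[] xs = {1e16, 1.0, -1e16, 3.14};
        double a = sumForward(xs);  // the 1.0 is absorbed by 1e16, then cancels
        double b = sumBackward(xs); // a different rounding path
        System.out.println(a);
        System.out.println(b);
        System.out.println(a == b); // false: same values, different order
    }
}
```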
[jira] [Resolved] (HIVE-7916) Snappy-java error when running hive query on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7916. --- Resolution: Not a Problem With the latest Spark-Hive integration, the problem seems to have disappeared. Snappy-java error when running hive query on spark [Spark Branch] - Key: HIVE-7916 URL: https://issues.apache.org/jira/browse/HIVE-7916 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 Recently the spark branch upgraded its dependency on Spark to 1.1.0-SNAPSHOT. While the new version addressed some lib conflicts (such as guava), I'm afraid that it also introduced new problems. The following might be one, seen when I set the master URL to a spark standalone cluster: {code} hive> set hive.execution.engine=spark; hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer; hive> set spark.master=spark://xzdt:7077; hive> select name, avg(value) from dec group by name; 14/08/28 16:41:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 333.0 KB, free 128.0 MB) java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317) at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219) at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44) at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79) at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:124) at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83) at 
org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.rdd.HadoopRDD.init(HadoopRDD.scala:116) at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:541) at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:318) at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateRDD(SparkPlanGenerator.java:160) at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:88) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:156) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:52) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:77) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1537) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1304) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1116) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:940) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:930) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860) at java.lang.Runtime.loadLibrary0(Runtime.java:845) at java.lang.System.loadLibrary(System.java:1084) at
[jira] [Updated] (HIVE-8533) Enable all q-tests for multi-insertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8533: --- Status: Patch Available (was: Open) Enable all q-tests for multi-insertion [Spark Branch] - Key: HIVE-8533 URL: https://issues.apache.org/jira/browse/HIVE-8533 Project: Hive Issue Type: Test Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8533.1-spark.patch As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8426) paralle.q assert failed.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8426. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8362. paralle.q assert failed.[Spark Branch] -- Key: HIVE-8426 URL: https://issues.apache.org/jira/browse/HIVE-8426 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch parallel.q failed to assert output in qtests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 27148: HIVE-8533 - Enable all q-tests for multi-insertion [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27148/ --- Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8533 https://issues.apache.org/jira/browse/HIVE-8533 Repository: hive-git Description --- As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. Diffs - itests/src/test/resources/testconfiguration.properties db8866d ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby9.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/pcr.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out PRE-CREATION Diff: 
https://reviews.apache.org/r/27148/diff/ Testing --- auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q Thanks, Chao Sun
Re: Review Request 27148: HIVE-8533 - Enable all q-tests for multi-insertion [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27148/ --- (Updated Oct. 24, 2014, 6:03 p.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8533 https://issues.apache.org/jira/browse/HIVE-8533 Repository: hive-git Description --- As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. Diffs - itests/src/test/resources/testconfiguration.properties db8866d ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby9.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/pcr.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27148/diff/ Testing --- auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q Thanks, Chao Sun
[jira] [Resolved] (HIVE-8220) Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8220. --- Resolution: Won't Fix Not needed with HIVE-8118. Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch] --- Key: HIVE-8220 URL: https://issues.apache.org/jira/browse/HIVE-8220 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 This is a follow-up for HIVE-7053. Currently the code to split the operator tree and to generate tasks is mingled and thus hard to understand and maintain. Logically the two seem independent. This can be improved by modularizing both. The following might be helpful:
{code}
@Override
protected void generateTaskTree(List<Task<? extends Serializable>> rootTasks, ParseContext pCtx,
    List<Task<MoveWork>> mvTask, Set<ReadEntity> inputs, Set<WriteEntity> outputs)
    throws SemanticException {
  // 1. Identify if the plan is for multi-insert and split the plan if necessary
  List<Set<Operator>> operatorSets = multiInsertSplit(...);
  // 2. For each operator set, generate a task.
  for (Set<Operator> topOps : operatorSets) {
    SparkTask task = generateTask(topOps);
    ...
  }
  // 3. wire up the tasks
  ...
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
Vaibhav Gumashta created HIVE-8596: -- Summary: HiveServer2 dynamic service discovery: ZK throws too many connections error Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
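For context on the warning quoted above: it is emitted by ZooKeeper's NIOServerCnxnFactory when a single client IP exceeds the server's per-host connection cap, controlled by the {{maxClientCnxns}} property (the "max is 60" in the log is that setting's value). Independent of fixing the connection churn on the HiveServer2/JDBC side, the cap can be raised in the ZooKeeper server's zoo.cfg; the value below is only an example:

```
# zoo.cfg -- per-client-IP connection limit enforced by NIOServerCnxnFactory.
# The "Too many connections from /x.x.x.x - max is 60" warning fires when a
# single host exceeds this cap. Raising it masks, but does not fix, a
# connection leak in the client.
maxClientCnxns=200
```

Raising the limit is a mitigation only; the underlying issue tracked here is that clients should share or properly close their ZooKeeper connections.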
[jira] [Commented] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183172#comment-14183172 ] Hive QA commented on HIVE-8457: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676940/HIVE-8457.2-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6809 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_smb_1
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/260/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/260/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-260/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12676940 - PreCommit-HIVE-SPARK-Build MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. And, given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:
{noformat}
inputRDD    inputRDD
   |           |
 MT_11       MT_12
   |           |
  RT_1        RT_2
{noformat}
Here, {{MT_11}} and {{MT_12}} are {{MapTran}} from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}. Note that this example is simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first, and it will ask for an iterator from the {{HadoopRDD}}, which will trigger the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
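The initialization gap described in the HIVE-8457 report above can be reduced to a small standalone sketch: a lazily-built, cached result whose construction has a thread-local side effect. All names below ({{CachedSource}}, {{CONTEXT_INITIALIZED}}) are hypothetical, not Hive's actual IOContext code; this only illustrates the failure mode, assuming the second consumer runs on a different thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only -- not Hive's IOContext implementation.
public class CachedSource {
    // Stand-in for the thread-local IOContext: set to true as a side effect
    // of building a fresh iterator over the input.
    static final ThreadLocal<AtomicBoolean> CONTEXT_INITIALIZED =
            ThreadLocal.withInitial(() -> new AtomicBoolean(false));

    private List<String> cached; // simulates the cached HadoopRDD result

    synchronized List<String> iterator() {
        if (cached == null) {
            // First caller: creating the iterator initializes this thread's
            // context, and the computed rows are cached.
            CONTEXT_INITIALIZED.get().set(true);
            cached = Arrays.asList("row1", "row2");
        }
        // Later callers (possibly on other threads) hit the cache and skip
        // context initialization entirely -- the bug described above.
        return cached;
    }

    public static void main(String[] args) throws InterruptedException {
        CachedSource src = new CachedSource();
        Thread a = new Thread(() -> {
            src.iterator();
            System.out.println("A initialized: " + CONTEXT_INITIALIZED.get().get());
        });
        a.start();
        a.join();
        // Thread B reuses the cache, so its own context never gets set up.
        Thread b = new Thread(() -> {
            src.iterator();
            System.out.println("B initialized: " + CONTEXT_INITIALIZED.get().get());
        });
        b.start();
        b.join();
    }
}
```

Running this prints `A initialized: true` but `B initialized: false`, mirroring how {{MT_12}} sees an uninitialized context.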
[jira] [Commented] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183169#comment-14183169 ] Vaibhav Gumashta commented on HIVE-8596: [~vikram.dixit] This will be an issue with concurrent use. I feel this should be resolved in 14. cc [~thejas] HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183171#comment-14183171 ] Alan Gates commented on HIVE-8583: -- +1, patch looks fine. The statement "missorted modifiers" implies there is a correct order. If the compiler doesn't care about {{final static private}} versus {{private static final}}, why should we? HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist --- Key: HIVE-8583 URL: https://issues.apache.org/jira/browse/HIVE-8583 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8583.1.patch [~alangates] added the following in HIVE-8341:
{code}
String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
if (bl != null && bl.length() > 0) {
  String[] bls = bl.split(",");
  for (String b : bls) {
    b.replaceAll(".", "_");
    blackListedConfEntries.add(b);
  }
}
{code}
The {{replaceAll}} call is confusing as its result is not used at all. This patch contains the following:
* Minor style modification (missorted modifiers)
* Adds reading of the default value for HIVESCRIPT_ENV_BLACKLIST
* Removes replaceAll
* Lets blackListed take a Configuration job as parameter, which allowed me to add a test for this
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
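Both problems flagged in the comments above are easy to demonstrate in isolation: {{String.replaceAll}} returns a new string rather than mutating its receiver (Java strings are immutable), so a call whose result is discarded is a no-op; and its first argument is a regex, so an unescaped {{"."}} matches every character. A minimal standalone demo:

```java
public class ReplaceAllDemo {
    public static void main(String[] args) {
        String b = "hive.script.operator.env.blacklist";

        // Result discarded: b is unchanged, because String is immutable and
        // replaceAll returns a new string instead of modifying b in place.
        b.replaceAll(".", "_");
        System.out.println(b); // hive.script.operator.env.blacklist

        // replaceAll treats its first argument as a regex; the regex "."
        // matches any character, so the whole string becomes underscores.
        System.out.println(b.replaceAll(".", "_")); // 34 underscores

        // To replace only literal dots, escape the pattern -- or avoid the
        // regex machinery entirely with the char overload of replace.
        System.out.println(b.replaceAll("\\.", "_")); // hive_script_operator_env_blacklist
        System.out.println(b.replace('.', '_'));      // hive_script_operator_env_blacklist
    }
}
```

This is why removing the dead {{replaceAll}} call is the right fix: had its result actually been used, the unescaped pattern would have mangled every blacklist entry into underscores.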
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Attachment: HIVE-8588.2.patch sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Attachments: HIVE-8588.1.patch, HIVE-8588.2.patch This is originally discovered by [~deepesh] When running a Sqoop integration test from WebHCat {noformat} curl --show-error -d command=export -libjars hdfs:///tmp/mysql-connector-java.jar --connect jdbc:mysql://deepesh-c6-1.cs1cloud.internal/sqooptest --username sqoop --password passwd --export-dir /tmp/templeton_test_data/sqoop --table person -d statusdir=sqoop.output -X POST http://deepesh-c6-1.cs1cloud.internal:50111/templeton/v1/sqoop?user.name=hrt_qa; {noformat} the job is failing with the following error: {noformat} $ hadoop fs -cat /user/hrt_qa/sqoop.output/stderr 14/10/15 23:52:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.0.0-897 14/10/15 23:52:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 14/10/15 23:52:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
14/10/15 23:52:54 INFO tool.CodeGenTool: Beginning code generation
14/10/15 23:52:54 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
	at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:848)
	at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:736)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:759)
	at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:269)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
	at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
{noformat}
Note that the Sqoop tar bundle does not contain the JDBC connector jar. I think the problem here may be that the mysql connector jar added to libjars isn't available to the Sqoop tool, which first connects to the database through the JDBC driver to collect some table information before running the MR job. libjars will only add the connector jar for the MR job and not the local one.
NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183197#comment-14183197 ] Vikram Dixit K commented on HIVE-8596: -- Ack for 0.14. HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6165: - Description: org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. was: org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. CLEAR LIBRARY CACHE Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8596: --- Fix Version/s: 0.14.0 HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183203#comment-14183203 ] Xuefu Zhang commented on HIVE-8457: --- +1 MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. And, given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:
{noformat}
inputRDD    inputRDD
   |           |
 MT_11       MT_12
   |           |
  RT_1        RT_2
{noformat}
Here, {{MT_11}} and {{MT_12}} are {{MapTran}} from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}. Note that this example is simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first, and it will ask for an iterator from the {{HadoopRDD}}, which will trigger the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183213#comment-14183213 ] Gunther Hagleitner commented on HIVE-6165: -- [~xuefuz] - I've already done that. The last Hive QA entry is for the .2 patch (I stripped the .txt by mistake). Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8457: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to Spark branch. Thanks to Chao for the contribution. MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. And, given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:
{noformat}
inputRDD    inputRDD
   |           |
 MT_11       MT_12
   |           |
  RT_1        RT_2
{noformat}
Here, {{MT_11}} and {{MT_12}} are {{MapTran}} from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}. Note that this example is simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first, and it will ask for an iterator from the {{HadoopRDD}}, which will trigger the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8437) Modify SparkPlan generation to set toCache flag to SparkTrans where caching is needed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8437. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8457. Modify SparkPlan generation to set toCache flag to SparkTrans where caching is needed [Spark Branch] Key: HIVE-8437 URL: https://issues.apache.org/jira/browse/HIVE-8437 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Fix For: spark-branch HIVE-8436 may modify the SparkWork right before SparkPlan generation. When this happens, the output from some SparkTrans needs to be cached to avoid regenerating the RDD. For more information, please refer to the design doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183226#comment-14183226 ] Hive QA commented on HIVE-8435: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676828/HIVE-8435.03.patch {color:red}ERROR:{color} -1 due to 539 failed/errored test(s), 6549 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_predicate_pushdown org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_queries org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_create_temp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
[jira] [Commented] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183231#comment-14183231 ] Xuefu Zhang commented on HIVE-6165: --- Yeah, I knew. However, the test failures seem unrelated, but I'm not quite sure. Thus, I'd like to have another run to confirm. Thanks for pointing it out, though. Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8596: --- Attachment: HIVE-8596.1.patch HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8596.1.patch {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8118) Support work that have multiple child works to work around SPARK-3622 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8118. --- Resolution: Fixed Fix Version/s: spark-branch All sub tasks are completed. Thus, this JIRA is closed as fixed as well. Support work that have multiple child works to work around SPARK-3622 [Spark Branch] - Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Chao Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-8118.pdf In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to only one child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents:
{code}
select name, sum(value) from dec group by name
union all
select name, value from dec order by name
{code}
It's possible in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should take this as a general case. Tez is currently providing a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Spark currently doesn't have a transformation that supports multiple output datasets from a single input dataset (SPARK-3622). This is a workaround for this gap. Likely this is a big change and subtasks are possible. With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731 and HIVE-7503. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
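The single-collector limitation described in the HIVE-8118 summary above can be sketched in plain Java. This is an illustrative toy, not Hive code: {{MultiChildHandler}} and the {{Consumer}}-based collectors are hypothetical stand-ins for a record handler that holds one collector per child work, so one input row stream can feed several downstream children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the HIVE-8118 idea: instead of binding a record
// handler to exactly one output collector, give it a collector per child
// work, emulating a one-input/many-outputs transformation (cf. SPARK-3622).
public class MultiChildHandler {
    private final List<Consumer<String>> childCollectors = new ArrayList<>();

    public void addChild(Consumer<String> collector) {
        childCollectors.add(collector);
    }

    // Every processed row is forwarded to every registered child, so two
    // downstream works both see the full input without a second table scan.
    public void process(String row) {
        for (Consumer<String> collector : childCollectors) {
            collector.accept(row);
        }
    }
}
```

In the multi-insert query quoted above, the group-by branch and the order-by branch would each register as a child and both receive every input row.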
Re: Review Request 25550: HIVE-8021 CBO: support CTAS and insert ... select
On Oct. 24, 2014, 5:10 p.m., John Pullokkaran wrote: ql/src/test/queries/clientpositive/ctas_colname.q, line 9 https://reviews.apache.org/r/25550/diff/8/?file=731255#file731255line9 Why this change? See HIVE-8512; the original query is not valid and should fail - Sergey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/#review58290 --- On Oct. 23, 2014, 9:11 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/ --- (Updated Oct. 23, 2014, 9:11 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see JIRA Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java dee7d7e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 37cbf7f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d8c50e3 ql/src/test/queries/clientpositive/cbo_correctness.q 4d8f156 ql/src/test/queries/clientpositive/ctas_colname.q 5322626 ql/src/test/queries/clientpositive/decimal_serde.q cf3a86c ql/src/test/queries/clientpositive/insert0.q PRE-CREATION ql/src/test/results/clientpositive/ctas_colname.q.out 97dacf6 ql/src/test/results/clientpositive/decimal_serde.q.out e461c2e ql/src/test/results/clientpositive/insert0.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25550/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Created] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks
Siddharth Seth created HIVE-8597: Summary: SMB join small table side should use the same set of serialized payloads across tasks Key: HIVE-8597 URL: https://issues.apache.org/jira/browse/HIVE-8597 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 0.14.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: 0.14.0 Each task sees all splits belonging to the bucket being processed by the task. At the moment, we end up using different instances of the same serialized split, which adds unnecessary memory pressure.
[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks
[ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-8597: - Attachment: HIVE-8597.1.patch Patch to create one set of serialized splits for each bucket, and re-use them across tasks processing the same bucket. Also removes some unused variables, and cleans up variables to allow for GC. [~vikram.dixit] - please review. SMB join small table side should use the same set of serialized payloads across tasks - Key: HIVE-8597 URL: https://issues.apache.org/jira/browse/HIVE-8597 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 0.14.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: 0.14.0 Attachments: HIVE-8597.1.patch Each task sees all splits belonging to the bucket being processed by the task. At the moment, we end up using different instances of the same serialized split, which adds unnecessary memory pressure.
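The caching idea in the patch can be sketched roughly like this. The names (`BucketSplitCache`, `getOrSerialize`) and the use of `String` as a stand-in for a split are illustrative assumptions, not the patch's actual code: serialize each bucket's splits once and hand the same payload list to every task that processes that bucket.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: one set of serialized split payloads per bucket,
// shared by all tasks that process that bucket.
public class BucketSplitCache {
    private final Map<Integer, List<byte[]>> serializedByBucket = new HashMap<>();

    // computeIfAbsent guarantees the splits are serialized exactly once
    // per bucket; later callers get the already-built payload list.
    public List<byte[]> getOrSerialize(int bucket, List<String> splits) {
        return serializedByBucket.computeIfAbsent(bucket,
            b -> splits.stream()
                       .map(String::getBytes)
                       .collect(Collectors.toList()));
    }
}
```

A second task asking for the same bucket receives the identical list instance rather than a fresh serialization, which is the memory saving the patch describes.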
[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks
[ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-8597: - Status: Patch Available (was: Open) SMB join small table side should use the same set of serialized payloads across tasks - Key: HIVE-8597 URL: https://issues.apache.org/jira/browse/HIVE-8597 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 0.14.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: 0.14.0 Attachments: HIVE-8597.1.patch Each task sees all splits belonging to the bucket being processed by the task. At the moment, we end up using different instances of the same serialized split, which adds unnecessary memory pressure.
[jira] [Created] (HIVE-8598) Push constant filters through joins
Ashutosh Chauhan created HIVE-8598: -- Summary: Push constant filters through joins Key: HIVE-8598 URL: https://issues.apache.org/jira/browse/HIVE-8598 Project: Hive Issue Type: Improvement Components: Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Will make {{NullScanOptimizer}} more effective.
[jira] [Commented] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183263#comment-14183263 ] Thejas M Nair commented on HIVE-8596: - Changes look good. Should we just catch Exception, so that any unchecked exceptions are also silenced and we don't lose the original exception if there is one? HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8596.1.patch {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat}
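The review suggestion can be illustrated with a small sketch. The class and method names here (`QuietCloser`, `closeQuietly`) are hypothetical, not from the patch: catching `Exception` rather than only a checked type ensures an unchecked failure during cleanup cannot replace the exception that triggered the cleanup in the first place.

```java
// Hypothetical sketch: catch Exception during cleanup so that even an
// unchecked exception from close() is silenced, and the caller's
// original exception (if any) is the one that propagates.
public class QuietCloser {
    public static void closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            // swallow: cleanup failures must not mask the root cause
        }
    }
}
```

Catching only `IOException` here would let a `RuntimeException` from `close()` escape and overwrite the exception already in flight, which is exactly the scenario the comment warns about.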
[jira] [Updated] (HIVE-8021) CBO: support CTAS and insert ... select
[ https://issues.apache.org/jira/browse/HIVE-8021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8021: --- Attachment: HIVE-8021.07.patch Fix a silly NPE CBO: support CTAS and insert ... select --- Key: HIVE-8021 URL: https://issues.apache.org/jira/browse/HIVE-8021 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8021.01.patch, HIVE-8021.01.patch, HIVE-8021.02.patch, HIVE-8021.03.patch, HIVE-8021.04.patch, HIVE-8021.05.patch, HIVE-8021.06.patch, HIVE-8021.06.patch, HIVE-8021.07.patch, HIVE-8021.patch, HIVE-8021.preliminary.patch Need to send only the select part to CBO for now
Re: Review Request 26721: HIVE-8433 CBO loses a column during AST conversion
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26721/#review58319 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java https://reviews.apache.org/r/26721/#comment99274 This is unused. - John Pullokkaran On Oct. 22, 2014, 11:18 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26721/ --- (Updated Oct. 22, 2014, 11:18 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java 0428263 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/PlanModifierForASTConv.java 4f96d02 ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java 10ac4b2 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d8c50e3 ql/src/test/queries/clientpositive/cbo_correctness.q 4d8f156 ql/src/test/queries/clientpositive/select_same_col.q PRE-CREATION ql/src/test/results/clientpositive/cbo_correctness.q.out 7c25e1f ql/src/test/results/clientpositive/select_same_col.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/cbo_correctness.q.out e467773 Diff: https://reviews.apache.org/r/26721/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8596: --- Attachment: HIVE-8596.2.patch [~thejas] Done. HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8596.1.patch, HIVE-8596.2.patch {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat}