[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843757#comment-15843757 ]

Eugene Koifman commented on HIVE-14949:
---

The "cardinality clause" is the last one in the text order of the generated multi-insert, and the parser preserves this order. (I'm pretty sure we rely on this in other places.)

I agree that it has a non-trivial cost, but it seems that "training wheels on" should be the default. If this condition is violated, we may write different events with the same ROW__ID to the same file (same op, same current txnid). It's not clear to me how the base/delta merge logic will react to that. Maybe it will crash, maybe it will silently produce bad data. So with this on, bugs in user-supplied ON clauses will hopefully be detected before it's too late.

> Enforce that target:source is not 1:N
> -
>
> Key: HIVE-14949
> URL: https://issues.apache.org/jira/browse/HIVE-14949
> Project: Hive
> Issue Type: Sub-task
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Attachments: HIVE-14949.01.patch, HIVE-14949.02.patch, HIVE-14949.03.patch, HIVE-14949.03.patch, HIVE-14949.04.patch, HIVE-14949.05.patch
>
> If > 1 row on source side matches the same row on target side that means that we are forced to update (or delete) the same row in target more than once as part of the same SQL statement. This should raise an error per SQL Spec
> ISO/IEC 9075-2:2011(E)
> Section 14.2 under "General Rules" Item 6/Subitem a/Subitem 2/Subitem B
> There is no sure way to do this via static analysis of the query.
> Can we add something to ROJ operator to pay attention to ROW__ID of target side row and compare it with ROW__ID of target side of previous row output? If they are the same, that means > 1 source row matched.
> Or perhaps just mark each row in the hash table that it matched. And if it matches again, throw an error.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
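For illustration only, the "mark each row in the hash table that it matched" idea from the issue description could look roughly like the sketch below. This is not code from any of the attached patches. The class, the exception, and the simplified model of a join (one target ROW__ID per join key, ROW__ID represented as a string) are all assumptions made to keep the example small.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Hive.
final class MergeCardinalityCheck {

    /** Raised when more than one source row matches the same target row. */
    static final class CardinalityViolationException extends RuntimeException {
        CardinalityViolationException(String rowId) {
            super("More than one source row matched target row " + rowId);
        }
    }

    /**
     * @param targetRowIdByKey join key -> ROW__ID of the target row
     *                         (assumes one target row per key, for simplicity)
     * @param sourceKeys       join keys of the source rows, in arrival order
     * @return the target ROW__IDs that were matched exactly once
     */
    static Set<String> matchOnce(Map<String, String> targetRowIdByKey,
                                 List<String> sourceKeys) {
        Set<String> matched = new HashSet<>();
        for (String key : sourceKeys) {
            String rowId = targetRowIdByKey.get(key);
            if (rowId == null) {
                continue; // no target row for this source row
            }
            // Set.add returns false if the ROW__ID was already marked as matched.
            if (!matched.add(rowId)) {
                throw new CardinalityViolationException(rowId);
            }
        }
        return matched;
    }
}
```

The check fails on the second match rather than after the whole join has run, which is the property the description asks for. A real operator would also have to handle several target rows sharing a join key.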
[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843743#comment-15843743 ]

Alan Gates commented on HIVE-14949:
---

A couple of comments:

UpdateDeleteSemanticAnalyzer.java line 705 (in your patch): how do you know that the cardinality violation clause will be last in the tree? (We really need PRs here, so I can comment directly on them.)

This looks like it will be quite expensive, since it runs a separate group by query on the source table. Shouldn't it be off by default rather than on by default?
[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837993#comment-15837993 ]

Eugene Koifman commented on HIVE-14949:
---

No related failures.

[~alangates], could you review please?
[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837514#comment-15837514 ]

Hive QA commented on HIVE-14949:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849218/HIVE-14949.05.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10999 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3168/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3168/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3168/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849218 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837079#comment-15837079 ]

Hive QA commented on HIVE-14949:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849184/HIVE-14949.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10982 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=125)
	[table_access_keys_stats.q,bucketmapjoin11.q,auto_join4.q,mapjoin_decimal.q,join34.q,nullgroup.q,mergejoins_mixed.q,sort.q,stats8.q,auto_join28.q,join17.q,union17.q,skewjoinopt11.q,groupby1_map.q,load_dyn_part11.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] (batchId=67)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3160/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3160/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3160/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849184 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836670#comment-15836670 ]

Hive QA commented on HIVE-14949:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12849162/HIVE-14949.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 10992 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_directory] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore] (batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] (batchId=67)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket5] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[list_bucket_dml_10] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[reduce_deduplicate] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=93)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeOnTezEdges (batchId=275)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3154/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3154/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3154/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12849162 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832435#comment-15832435 ]

Hive QA commented on HIVE-14949:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848378/HIVE-14949.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10966 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] (batchId=67)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop] (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path] (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=149)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge (batchId=263)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testDynamicPartitionsMerge (batchId=273)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDynamicPartitionsMerge (batchId=270)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3075/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3075/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3075/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848378 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-14949) Enforce that target:source is not 1:N
[ https://issues.apache.org/jira/browse/HIVE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828820#comment-15828820 ]

Eugene Koifman commented on HIVE-14949:
---

Maybe a simpler strategy is to create a RaiseErrorForMerge() UDF:

    insert into tmp_table select RaiseErrorForMerge() where group by target.ROW__ID having count(*) > 1

We never actually write anything to tmp_table, but if the select produces any rows at all, RaiseErrorForMerge() will throw and kill the query. This avoids any need for a post hook to check the table.
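As a rough illustration of the strategy in this comment, the sketch below models the "group by target.ROW__ID having count(*) > 1" branch in plain Java. It is not the patch's implementation: raiseErrorForMerge here is a stand-in for the proposed UDF, the class name is made up, and the input is assumed to be the list of target ROW__ID values (as strings) produced by the join, one entry per matched source row.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: models the group-by check outside of Hive.
final class GroupByCardinalityCheck {

    /** Stand-in for the proposed RaiseErrorForMerge() UDF: always throws. */
    static void raiseErrorForMerge(String rowId, long count) {
        throw new IllegalStateException(
            "Cardinality violation: target row " + rowId
                + " matched " + count + " source rows");
    }

    /**
     * Equivalent of: group by target.ROW__ID having count(*) > 1.
     * Nothing is written anywhere; the error is raised only if some
     * group survives the HAVING filter.
     */
    static void check(List<String> matchedTargetRowIds) {
        Map<String, Long> counts = matchedTargetRowIds.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        counts.forEach((rowId, n) -> {
            if (n > 1) {
                raiseErrorForMerge(rowId, n);
            }
        });
    }
}
```

Unlike a check inside the join operator, this needs the full set of matches before it can fail, which is where the extra group-by cost comes from.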