[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-06-26 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523928#comment-16523928 ] Xuefu Zhang commented on HIVE-19671: Yeah. I think it makes sense. Thanks. > Distribute by rand() can

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-06-24 Thread Rui Li (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521790#comment-16521790 ] Rui Li commented on HIVE-19671: --- We can check all RS (ReduceSink) operators and look for non-deterministic UDFs in the partition keys

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-06-21 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519830#comment-16519830 ] Xuefu Zhang commented on HIVE-19671: Printing a warning is good, but we may not know if a

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-06-21 Thread Rui Li (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519357#comment-16519357 ] Rui Li commented on HIVE-19671: --- [~xuefuz], I agree it's not trivial to solve this on the Hive side. Maybe we

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-06-20 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518745#comment-16518745 ] Xuefu Zhang commented on HIVE-19671: Based on your analysis, it seems that rand(seed) depends on a

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-06-20 Thread Rui Li (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517846#comment-16517846 ] Rui Li commented on HIVE-19671: --- [~xuefuz], thanks for your input. I think rand(seed) may not work if the

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-05-29 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493796#comment-16493796 ] Xuefu Zhang commented on HIVE-19671: [~lirui] I think #1 is better. Nondeterministic partitioning

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-05-28 Thread Rui Li (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493065#comment-16493065 ] Rui Li commented on HIVE-19671: --- Verified that the issue only happens when there are task retries. I can think of

[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-05-23 Thread Rui Li (JIRA)
[ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487148#comment-16487148 ] Rui Li commented on HIVE-19671: --- I haven't verified it but my guess is the issue happens with task failover.