[
https://issues.apache.org/jira/browse/PIG-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-4148:
Issue Type: Bug (was: Sub-task)
Parent: (was: PIG-3446)
Tez order-by is often skewed because
Daniel Dai updated PIG-4148:
Fix Version/s: (was: 0.14.0)
0.14.1
Push to 0.14.1 since I cannot find a reproducible
Cheolsoo Park updated PIG-4148:
---
Attachment: popackage.log
generate_sample.py
samples_logs.tar.gz
Cheolsoo Park updated PIG-4148:
---
Attachment: metric_retention.explain
I am also attaching the explain output of my job. To summarize, it
Cheolsoo Park updated PIG-4148:
---
Description:
In Tez, the FindQuantiles UDF is called with a smaller number of samples than in MR, resulting in
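Why a smaller sample can skew an order-by can be shown with a toy model of range partitioning. The Java sketch below is hypothetical and is NOT Pig's FindQuantiles UDF. `QuantileSketch`, `findQuantiles`, and `partitionSizes` are invented names. It assumes split points are taken at evenly spaced ranks of the sorted sample, and it compares an unlucky 4-element sample against the full key set for 4 partitions over uniform keys 0..99.

```java
import java.util.Arrays;

// Hypothetical, simplified model of order-by range partitioning.
// It is NOT Pig's FindQuantiles UDF; it only shows how sample size
// affects the quality of the split points.
public class QuantileSketch {

    // Returns numPartitions - 1 split points taken at evenly spaced
    // ranks of the sorted sample.
    static int[] findQuantiles(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        if (sorted.length == 0 || numPartitions < 2) {
            return new int[0];
        }
        int[] cuts = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Rank of the i-th boundary; always < sorted.length because i < numPartitions.
            int rank = (int) ((long) i * sorted.length / numPartitions);
            cuts[i - 1] = sorted[rank];
        }
        return cuts;
    }

    // Counts how many keys land in each partition: a key goes to the
    // partition whose index equals the number of split points <= key.
    static int[] partitionSizes(int[] keys, int[] cuts) {
        int[] sizes = new int[cuts.length + 1];
        for (int key : keys) {
            int p = 0;
            while (p < cuts.length && cuts[p] <= key) {
                p++;
            }
            sizes[p]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] keys = new int[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i; // uniformly distributed keys 0..99
        }
        // Tiny sample: split points depend on a handful of values.
        int[] tinyCuts = findQuantiles(new int[] {90, 2, 8, 5}, 4);
        // Full sample: split points match the true quartiles.
        int[] fullCuts = findQuantiles(keys, 4);
        System.out.println(Arrays.toString(partitionSizes(keys, tinyCuts))); // [5, 3, 82, 10]
        System.out.println(Arrays.toString(partitionSizes(keys, fullCuts))); // [25, 25, 25, 25]
    }
}
```

With only four samples the split points come out as 5, 8 and 90, so one reducer receives 82 of the 100 keys. The full sample yields balanced partitions of 25 keys each.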
Cheolsoo Park updated PIG-4148:
---
Attachment: PIG-4148-1.patch
The patch changes the number of samples to parallelism x per-task sample
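The sample-count change described for PIG-4148-1.patch scales the sample budget with the parallelism. Below is a minimal Java sketch. It assumes the truncated factor is a fixed per-task sample count. `SampleCount`, `totalSamples`, and `samplesPerTask` are hypothetical names, not Pig configuration keys or APIs.

```java
// Hypothetical illustration of a sample budget of parallelism x per-task samples;
// names are invented for this sketch and do not come from the Pig code base.
public class SampleCount {

    // Total number of samples handed to the quantile computation:
    // samplesPerTask samples from each of `parallelism` tasks.
    // The multiplication is done in long to avoid int overflow.
    static long totalSamples(int parallelism, int samplesPerTask) {
        return (long) parallelism * samplesPerTask;
    }

    public static void main(String[] args) {
        // 10 tasks contributing 100 samples each.
        System.out.println(totalSamples(10, 100)); // 1000
    }
}
```

If the parallelism here is also the number of range partitions, this keeps the average number of samples per partition fixed at the per-task count as the job scales. Under the same assumption, a fixed total would spread ever fewer samples across each split point.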
Cheolsoo Park updated PIG-4148:
---
Attachment: (was: PIG-4148-1.patch)
Tez order-by is often skewed because FindQuantiles UDF is