[jira] [Updated] (PIG-4409) fs.defaultFS is overwritten in JobConf by replicated join at runtime
[ https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-4409:
---
Status: Patch Available (was: Open)

fs.defaultFS is overwritten in JobConf by replicated join at runtime

Key: PIG-4409
URL: https://issues.apache.org/jira/browse/PIG-4409
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.14.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Critical
Fix For: 0.15.0
Attachments: PIG-4409-1.patch

This is a regression of PIG-4257. Pig accidentally overwrites {{fs.defaultFS}} in the JobConf during a replicated join at runtime. This can cause various side effects, because UDFs and store/load funcs may depend on the value of {{fs.defaultFS}} at runtime.

Here is an example. I have a store func that does a two-phase commit to S3. Each reducer first writes its output to local disk, then copies it to the final destination on S3 during the task commit phase. Once copying is done, the reducer writes a commit log to an HDFS location. During the job commit phase, the AM reads all the commit logs and updates the Hive metastore accordingly.

This store func stopped working in 0.14 whenever there is a replicated join in the reduce phase, because the replicated join changes {{fs.defaultFS}} from HDFS to the local FS at runtime.

The root cause is that PIG-4257 changed {{ConfigurationUtil.getLocalFSProperties()}} to return a reference to the JobConf instead of a copy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
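To make the reference-vs-copy distinction concrete, here is a minimal, hypothetical sketch. A plain java.util.Map stands in for the task's JobConf, and the method names localFsPropsByReference/localFsPropsByCopy and the address hdfs://namenode:8020 are made up for illustration; this is not Pig's actual ConfigurationUtil code and not necessarily what PIG-4409-1.patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the task's JobConf, and the
// method names are illustrative only (not Pig's ConfigurationUtil API).
final class LocalFsPropsSketch {

    static final String DEFAULT_FS = "fs.defaultFS";

    // Regression pattern: the "local FS properties" object IS the job config,
    // so forcing the local FS for the join also rewrites the shared config.
    static Map<String, String> localFsPropsByReference(Map<String, String> jobConf) {
        Map<String, String> props = jobConf;      // alias, not a copy
        props.put(DEFAULT_FS, "file:///");
        return props;
    }

    // Copy semantics: override fs.defaultFS only on a private copy.
    static Map<String, String> localFsPropsByCopy(Map<String, String> jobConf) {
        Map<String, String> props = new HashMap<>(jobConf);
        props.put(DEFAULT_FS, "file:///");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(DEFAULT_FS, "hdfs://namenode:8020");   // hypothetical NameNode address

        localFsPropsByCopy(jobConf);
        System.out.println(jobConf.get(DEFAULT_FS));       // hdfs://namenode:8020

        localFsPropsByReference(jobConf);
        // Any later reader of fs.defaultFS (e.g. a store func) now sees the local FS.
        System.out.println(jobConf.get(DEFAULT_FS));       // file:///
    }
}
```

Because the values are immutable Strings, the shallow copy from new HashMap<>(jobConf) is enough to isolate the override. With real Hadoop objects, the analogous move would be to build the local-FS settings on a new Configuration(conf) or new JobConf(conf) rather than on the task's own JobConf.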
[jira] [Updated] (PIG-4409) fs.defaultFS is overwritten in JobConf by replicated join at runtime
[ https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-4409:
---
Attachment: PIG-4409-1.patch

Uploading a patch that fixes the issue.
[jira] [Updated] (PIG-4409) fs.defaultFS is overwritten in JobConf by replicated join at runtime
[ https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-4409:
---
Resolution: Fixed
Fix Version/s: 0.14.1
Status: Resolved (was: Patch Available)

Thank you Daniel for the quick review. Committed to 0.14 and trunk.