[jira] [Updated] (PIG-4409) fs.defaultFS is overwritten in JobConf by replicated join at runtime

2015-02-04 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4409:
---
Status: Patch Available  (was: Open)

 fs.defaultFS is overwritten in JobConf by replicated join at runtime
 

 Key: PIG-4409
 URL: https://issues.apache.org/jira/browse/PIG-4409
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.14.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Critical
 Fix For: 0.15.0

 Attachments: PIG-4409-1.patch


 This is a regression introduced by PIG-4257.
 Pig accidentally overwrites {{fs.defaultFS}} in JobConf during a replicated 
 join at runtime. This can cause various side effects because UDFs and 
 store/load funcs might depend on the value of {{fs.defaultFS}} at runtime.
 Here is an example. I have a store func that does a 2-phase commit to S3. Each 
 reducer writes its output to local disk first and copies it to the final 
 destination on S3 during the task commit phase. Once it is done copying, the 
 reducer writes a commit log to an HDFS location. During the job commit phase, 
 the AM reads all the commit logs and updates the Hive metastore accordingly.
 This store func stopped working in 0.14 when there is a replicated join in the 
 reduce phase. This is because the replicated join overwrites {{fs.defaultFS}} 
 from HDFS to the local FS at runtime.
 The root cause is that PIG-4257 changed 
 {{ConfigurationUtil.getLocalFSProperties()}} to return a reference to JobConf 
 instead of a copy.
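
 The reference-versus-copy problem described above can be sketched in a few lines. This is only an illustration, not Pig's actual code: {{java.util.Properties}} stands in for JobConf, the class and method names are hypothetical, and the {{hdfs://namenode:8020}} and {{file:///}} values are assumed for the example.

```java
import java.util.Properties;

// Stand-in sketch: java.util.Properties plays the role of JobConf.
// All class and method names here are hypothetical, not Pig's real API.
class DefaultFsAliasing {
    static final String FS_KEY = "fs.defaultFS";

    // Problematic shape: mutates and returns the caller's own object,
    // so the caller's fs.defaultFS is silently changed.
    static Properties localFsPropertiesByReference(Properties jobConf) {
        jobConf.setProperty(FS_KEY, "file:///");
        return jobConf;
    }

    // Safer shape: copies the entries first, so only the copy is changed.
    static Properties localFsPropertiesByCopy(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        copy.setProperty(FS_KEY, "file:///");
        return copy;
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty(FS_KEY, "hdfs://namenode:8020");

        localFsPropertiesByCopy(jobConf);
        System.out.println(jobConf.getProperty(FS_KEY)); // hdfs://namenode:8020

        localFsPropertiesByReference(jobConf);
        System.out.println(jobConf.getProperty(FS_KEY)); // file:///
    }
}
```

 With the by-reference variant, any code that later reads {{fs.defaultFS}} from the shared object sees the local FS value, which matches the symptom reported for the store func.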



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4409) fs.defaultFS is overwritten in JobConf by replicated join at runtime

2015-02-04 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4409:
---
Attachment: PIG-4409-1.patch

Uploading a patch that fixes the issue.



[jira] [Updated] (PIG-4409) fs.defaultFS is overwritten in JobConf by replicated join at runtime

2015-02-04 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4409:
---
   Resolution: Fixed
Fix Version/s: 0.14.1
   Status: Resolved  (was: Patch Available)

Thank you Daniel for the quick review. Committed to 0.14 and trunk.

 fs.defaultFS is overwritten in JobConf by replicated join at runtime
 

 Key: PIG-4409
 URL: https://issues.apache.org/jira/browse/PIG-4409
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.14.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Critical
 Fix For: 0.14.1, 0.15.0

 Attachments: PIG-4409-1.patch

