[jira] Updated: (PIG-958) Splitting output data on key field

2009-10-26 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-958:
--

Attachment: 958.v4.patch

1. When run in cluster mode, the static variable PigMapReduce.sJobConf is null 
when checked in the UDF constructor but NOT null when the UDF is actually 
invoked. As a result, the FileSystem object 'fs' was incorrectly initialized 
to the local filesystem, causing the test to fail. I moved the 'fs' 
initialization to the intijobSpecificParams() method.

2. Deleting the temporary directory manually in finish() causes the job to 
fail, so I removed the manual deletion. As a side effect, the user-specified 
PARENT output directory of the UDF will contain empty part-* files, which the 
user must delete manually.
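Item 1 describes a general pattern: do not read job-wide configuration in a UDF constructor, because in cluster mode it may not be populated yet; resolve it on first use instead. The following is a minimal, framework-free sketch of that idea, not the patch itself. `JobConfHolder`, `LazyFsStoreFunc` and `ensureInitialized()` are hypothetical stand-ins for `PigMapReduce.sJobConf`, the UDF and its initialization method, and the filesystem is modeled as a plain URI string rather than a Hadoop `FileSystem`.

```java
// Hypothetical stand-in for a static, job-wide configuration slot that the
// framework fills in only after UDF objects have been constructed.
final class JobConfHolder {
    static volatile String defaultFsUri = null; // e.g. "hdfs://namenode:8020"
}

// Hypothetical store function that defers filesystem resolution until first use.
final class LazyFsStoreFunc {
    private String fsUri; // resolved lazily; deliberately NOT set in the constructor

    LazyFsStoreFunc() {
        // Reading JobConfHolder.defaultFsUri here could observe null in cluster
        // mode and wrongly fall back to the local filesystem.
    }

    // Called by every method that needs the filesystem; performs the lookup once.
    // Falls back to the local filesystem only if no configuration exists even then.
    private void ensureInitialized() {
        if (fsUri == null) {
            String configured = JobConfHolder.defaultFsUri;
            fsUri = (configured != null) ? configured : "file:///";
        }
    }

    String fileSystemUri() {
        ensureInitialized();
        return fsUri;
    }
}
```

Because the lookup happens on the first real call rather than at construction, the object picks up configuration that the framework sets in between.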

Verified that the UDF works correctly and that the unit tests pass.
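Item 2 leaves empty part-* files in the parent output directory for the user to remove. A minimal cleanup sketch with java.nio.file follows; it assumes the output directory is reachable as a local path (on HDFS the same "check size, then delete" logic would go through Hadoop's FileSystem API or `hadoop fs` instead). `EmptyPartCleaner` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: deletes zero-length files named part-* directly under parentDir.
// Subdirectories (e.g. one per key value) and non-empty part files are left untouched.
final class EmptyPartCleaner {
    static List<Path> deleteEmptyPartFiles(Path parentDir) {
        List<Path> empty = new ArrayList<>();
        try {
            // First pass: collect candidates, so the directory is not modified mid-iteration.
            try (DirectoryStream<Path> parts = Files.newDirectoryStream(parentDir, "part-*")) {
                for (Path p : parts) {
                    if (Files.isRegularFile(p) && Files.size(p) == 0L) {
                        empty.add(p);
                    }
                }
            }
            // Second pass: delete the collected empty files.
            for (Path p : empty) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return empty;
    }
}
```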

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch, 958.v4.patch
>
>
> Pig users often need to split output records into multiple files and 
> directories according to record type. Pig's SPLIT operator is useful when the 
> record types are few and known in advance. When the type is not known in 
> advance but is derived dynamically from the value of a key field in the output 
> tuple, a custom store function is a better solution.
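The core idea of such a store function is to route every record to an output location chosen by the value of one key field. The sketch below shows only that routing logic in plain Java, writing locally with java.nio.file; it is not the PIG-958 patch and does not use Pig's store-function API. `KeySplitWriter`, the `parentDir/<key value>/<partName>` layout and the tab-delimited output are assumptions for illustration, and key values are assumed to be safe directory names.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: writes each record (a list of string fields) to
// parentDir/<value of field keyIndex>/<partName>, one open writer per distinct key.
final class KeySplitWriter implements Closeable {
    private final Path parentDir;
    private final int keyIndex;
    private final String partName;
    private final Map<String, BufferedWriter> writers = new HashMap<>();

    KeySplitWriter(Path parentDir, int keyIndex, String partName) {
        this.parentDir = parentDir;
        this.keyIndex = keyIndex;
        this.partName = partName;
    }

    void write(List<String> fields) throws IOException {
        String key = fields.get(keyIndex);
        BufferedWriter out = writers.get(key);
        if (out == null) {
            // First record for this key: create its directory and open a writer.
            Path keyDir = Files.createDirectories(parentDir.resolve(key));
            out = Files.newBufferedWriter(keyDir.resolve(partName), StandardCharsets.UTF_8);
            writers.put(key, out);
        }
        out.write(String.join("\t", fields)); // tab-delimited, key field included
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        IOException first = null;
        for (BufferedWriter out : writers.values()) {
            try {
                out.close();
            } catch (IOException e) {
                if (first == null) first = e; // keep closing the rest
            }
        }
        writers.clear();
        if (first != null) throw first;
    }
}
```

Keeping one open writer per key is simple, but it assumes the number of distinct keys seen by a single task stays modest; with very many keys, open file handles could become the limiting factor.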

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-958) Splitting output data on key field

2009-10-09 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Patch Info: [Patch Available]

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-10-09 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Status: In Progress  (was: Patch Available)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-10-09 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Attachment: 958.v3.patch

Pradeep,
 Thanks for your review comments. I have incorporated the suggestions from 
the code review. The code is now much simpler, cleaner and more readable :-). 

The unit tests now pass in local mode but fail in cluster mode after updating 
to the latest Pig code base. The error I see is:
hdfs://localhost.localdomain:40352/user/gankur/output/_temporary/_attempt_20091009030519686_0001_m_00_0/output,
 expected: file:///

This looks like a configuration issue with org.apache.pig.test.MiniCluster in 
the latest Pig code. I did not have time to debug it because I am going on 
vacation. Regardless, I have attached the new patch for your review. Please 
suggest what needs to be done for the unit tests to pass in cluster mode.

-Ankur
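The cluster-mode error above is consistent with a filesystem mismatch: an hdfs:// path handed to a filesystem object bound to file:///. Hadoop's FileSystem performs a check of this kind and reports "Wrong FS: <path>, expected: <filesystem URI>". The sketch below is a simplified illustration of such a scheme/authority check in plain Java; `FsCheck.checkPath` is a hypothetical name, and the real Hadoop logic handles more cases (for example default ports).

```java
import java.net.URI;

// Simplified illustration (not Hadoop's implementation) of a path check:
// a filesystem bound to fsUri accepts a path only if the path has no scheme,
// or has the same scheme and, when present, the same authority.
final class FsCheck {
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // scheme-less paths would be resolved against fsUri
        }
        boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
        String authority = path.getAuthority();
        boolean sameAuthority =
            authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
        if (!sameScheme || !sameAuthority) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }
}
```

Under this reading, a FileSystem object initialized to the local filesystem too early (see item 1 of the 958.v4.patch comment) would reject the hdfs:// task output path in exactly this way.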

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-10-09 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Attachment: (was: 958.v2.patch)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-25 Thread Pradeep Kamath (JIRA)

Pradeep Kamath updated PIG-958:
---

Status: Open  (was: Patch Available)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-25 Thread Pradeep Kamath (JIRA)

Pradeep Kamath updated PIG-958:
---

Status: Patch Available  (was: Open)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-23 Thread Pradeep Kamath (JIRA)

Pradeep Kamath updated PIG-958:
---

Status: Open  (was: Patch Available)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-23 Thread Pradeep Kamath (JIRA)

Pradeep Kamath updated PIG-958:
---

Status: Patch Available  (was: Open)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-23 Thread Pradeep Kamath (JIRA)

Pradeep Kamath updated PIG-958:
---

Status: Open  (was: Patch Available)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-23 Thread Pradeep Kamath (JIRA)

Pradeep Kamath updated PIG-958:
---

Status: Patch Available  (was: Open)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-22 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Status: Patch Available  (was: Open)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-22 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Attachment: (was: 958.v1.patch)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-22 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Status: Open  (was: Patch Available)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v1.patch, 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-22 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Attachment: 958.v2.patch

Fixed the wrong source path of the classes.

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v1.patch, 958.v2.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-15 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Status: Patch Available  (was: Open)

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v1.patch



[jira] Updated: (PIG-958) Splitting output data on key field

2009-09-14 Thread Ankur (JIRA)

Ankur updated PIG-958:
--

Attachment: 958.v1.patch

Attached is an implementation of a custom store function that splits the data 
dynamically based on the values of a user-specified key field in the output tuple.
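A store function like this typically receives the key field's position as a string argument and must turn each tuple's key value into a directory name. The sketch below shows one hypothetical way to do that: parse the index once, then map each value to a filesystem-safe name. The class name `SplitKey`, the `__null__` placeholder for null keys and the character whitelist are assumptions for illustration, not the patch's behaviour.

```java
import java.util.List;

// Hypothetical helper: turns a user-specified key field index (given as a string,
// as UDF constructor arguments are) into a per-tuple output directory name.
final class SplitKey {
    private final int fieldIndex;

    SplitKey(String fieldIndexArg) {
        this.fieldIndex = Integer.parseInt(fieldIndexArg.trim());
        if (fieldIndex < 0) {
            throw new IllegalArgumentException("key field index must be >= 0, got " + fieldIndex);
        }
    }

    String directoryFor(List<?> tuple) {
        if (fieldIndex >= tuple.size()) {
            throw new IllegalArgumentException(
                "tuple has " + tuple.size() + " fields; key index is " + fieldIndex);
        }
        Object value = tuple.get(fieldIndex);
        String raw = (value == null) ? "__null__" : value.toString();
        // Replace anything outside a conservative whitelist so a key value
        // cannot introduce path separators or other special characters.
        String safe = raw.replaceAll("[^A-Za-z0-9._-]", "_");
        // Avoid empty names and the special directory entries "." and "..".
        if (safe.isEmpty() || safe.equals(".") || safe.equals("..")) {
            safe = "_" + safe;
        }
        return safe;
    }
}
```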

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v1.patch