[jira] Commented: (PIG-1350) [Zebra] Zebra column names cannot have leading _

2010-04-02 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852752#action_12852752 ] Hadoop QA commented on PIG-1350: +1 overall. Here are the results of testing the latest

[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-02 Thread Chao Wang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1342: --- Attachment: PIG-1342.patch [Zebra] Avoid making unnecessary name node calls for writes in Zebra

[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-02 Thread Chao Wang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1342: --- Attachment: (was: PIG-1342.patch) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

[jira] Created: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-02 Thread Chao Wang (JIRA)
[Zebra] No type check when we write to the basic table -- Key: PIG-1351 URL: https://issues.apache.org/jira/browse/PIG-1351 Project: Pig Issue Type: Improvement Affects Versions:

[jira] Created: (PIG-1352) piggybank UPPER udf throws exception if argument is null

2010-04-02 Thread Thejas M Nair (JIRA)
piggybank UPPER udf throws exception if argument is null Key: PIG-1352 URL: https://issues.apache.org/jira/browse/PIG-1352 Project: Pig Issue Type: Bug Reporter: Thejas M
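The failure mode reported here (a string function that throws when handed a null) is usually avoided with an early null guard. The following is a minimal sketch of that pattern in Python, not the actual piggybank code or the patch attached to this issue; the function name `upper_or_none` is hypothetical, and returning null for null input is an assumption about the desired behavior.

```python
from typing import Optional


def upper_or_none(value: Optional[str]) -> Optional[str]:
    """Upper-case a string, passing a null (None) input through unchanged.

    Hypothetical helper for illustration only: it models the null-guard
    pattern, not the real piggybank UPPER implementation.
    """
    if value is None:
        # Guard first so we never call a method on a missing value.
        return None
    return value.upper()
```

With the guard in place, a null field flows through the function as null instead of raising an exception partway through a job.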

[jira] Updated: (PIG-1352) piggybank UPPER udf throws exception if argument is null

2010-04-02 Thread Thejas M Nair (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1352: --- Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 piggybank UPPER udf throws exception if

What should FLATTEN do?

2010-04-02 Thread hc busy
Guys, I have a row containing a map 'id','data', {((1,2)), ((2,3)), ((4,5))} What is the expected behavior when I flatten on that bag? I had expected it to result in 'id','data', (1,2) 'id','data', (2,3) 'id','data', (4,5) But it appears to me that the result of applying FLATTEN to that bag

Re: What should FLATTEN do?

2010-04-02 Thread hc busy
doh s/map/bag/g I seem to get maps and bags mixed up for some reason... Guys, I have a row containing a *bag* 'id','data', {((1,2)), ((2,3)), ((4,5))} What is the expected behavior when I flatten on that bag? I had expected it to result in 'id','data', (1,2) 'id','data', (2,3) 'id','data',

[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _

2010-04-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Status: Open (was: Patch Available) [Zebra] Zebra column names cannot have leading _

[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _

2010-04-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Status: Patch Available (was: Open) [Zebra] Zebra column names cannot have leading _

[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _

2010-04-02 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Attachment: pig-1350.patch Added fix for a testcase that's causing random failures. [Zebra] Zebra column

Re: What should FLATTEN do?

2010-04-02 Thread Dmitriy Ryaboy
CDH2 or CDH3? CDH2 is basically 0.{4,5}. CDH3 is in between 5 and 6. I expect the first result -- a flattened bag of tuples results in multiple rows, each containing the (not-flattened) tuple. Btw, Pig 0.6 is out. -D On Fri, Apr 2, 2010 at 11:32 AM, hc busy hc.b...@gmail.com wrote: doh
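The behavior described in this reply (flattening a bag yields one row per bag element, with the inner tuple left intact) can be modeled in a few lines. This is a sketch in Python, not Pig code: bags are modeled as lists, tuples as Python tuples, and the function name `flatten_bag` is made up for illustration.

```python
def flatten_bag(prefix, bag):
    """Model FLATTEN applied to a bag column.

    prefix: the other fields of the row, as a tuple.
    bag:    a list of tuples (the bag's elements).

    Each bag element produces one output row; the element's fields are
    appended to the prefix. Only one level of nesting is removed, so a
    field that is itself a tuple stays a tuple.
    """
    return [prefix + element for element in bag]


# The bag from the question: every element is a one-field tuple whose
# single field is itself a tuple, i.e. ((1,2)).
bag = [((1, 2),), ((2, 3),), ((4, 5),)]
rows = flatten_bag(("id", "data"), bag)

for row in rows:
    print(row)
```

Under this model the three output rows are ('id', 'data', (1, 2)), ('id', 'data', (2, 3)) and ('id', 'data', (4, 5)), which matches the first result expected in the thread.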

[jira] Assigned: (PIG-1352) piggybank UPPER udf throws exception if argument is null

2010-04-02 Thread Thejas M Nair (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair reassigned PIG-1352: -- Assignee: Thejas M Nair piggybank UPPER udf throws exception if argument is null

[jira] Updated: (PIG-1352) piggybank UPPER udf throws exception if argument is null

2010-04-02 Thread Thejas M Nair (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1352: --- Status: Patch Available (was: Open) piggybank UPPER udf throws exception if argument is null

[jira] Updated: (PIG-1352) piggybank UPPER udf throws exception if argument is null

2010-04-02 Thread Thejas M Nair (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1352: --- Attachment: UPPER.patch piggybank UPPER udf throws exception if argument is null

Re: What should FLATTEN do?

2010-04-02 Thread hc busy
Yeah, I'm sure it has nested tuples. Pig doesn't natively support introducing tuples: h = foreach g generate ((x,y,z)), (x), x doesn't work, but I have a udf that does that (don't ask why), and I've seen it print a double pair of parens when I ran a dump. Our hadoop guys here

Re: What should FLATTEN do?

2010-04-02 Thread Russell Jurney
Not sure if this is exactly the same, but when I've created tuples within tuples in UDFs (to preserve order of pairs), from bag input, Pig has allowed it - but I can't work with that data in subsequent steps. On Fri, Apr 2, 2010 at 12:37 PM, hc busy hc.b...@gmail.com wrote: Yeah, I'm sure it

Re: What should FLATTEN do?

2010-04-02 Thread hc busy
yeah, you have to implement the outputSchema() method on the udf in order to make the content of the tuple visible... There's a nice example in the UDF Manual http://hadoop.apache.org/pig/docs/r0.6.0/udf.html search for 'package myudf' until u

[jira] Assigned: (PIG-864) Record graph of execution of Map-Reduce jobs executed by a Pig script

2010-04-02 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-864: Assignee: Richard Ding Record graph of execution of Map-Reduce jobs executed by a Pig script

[jira] Assigned: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-04-02 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-1280: - Assignee: Richard Ding Add a pig-script-id to the JobConf of all jobs run in a pig-script

[jira] Assigned: (PIG-809) number of input lines it processed, number of output lines it produced for PIG job

2010-04-02 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-809: Assignee: Richard Ding number of input lines it processed, number of output lines it produced for

[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-02 Thread Chao Wang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1342: --- Attachment: PIG-1342.patch [Zebra] Avoid making unnecessary name node calls for writes in Zebra

[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-02 Thread Chao Wang (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1342: --- Attachment: (was: PIG-1342.patch) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

Re: What should FLATTEN do?

2010-04-02 Thread hc busy
Okay guys some details after some digging. We've got this version of pig from CDH2 installed: hadoop-pig-0.5.0+11.1-1 the list of patches that they applied on top of 0.5.0 are listed here: http://archive.cloudera.com/cdh/2/pig-0.5.0+11.1.CHANGES.txt

Re: What should FLATTEN do?

2010-04-02 Thread hc busy
The hadoop version: hadoop-0.20-0.20.1+169.68-1 On Fri, Apr 2, 2010 at 2:33 PM, hc busy hc.b...@gmail.com wrote: Okay guys some details after some digging. We've got this version of pig from CDH2 installed: hadoop-pig-0.5.0+11.1-1 the list of patches that they applied on top of 0.5.0

[jira] Commented: (PIG-864) Record graph of execution of Map-Reduce jobs executed by a Pig script

2010-04-02 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852985#action_12852985 ] Dmitriy V. Ryaboy commented on PIG-864: --- Richard, is the idea to log something that

[jira] Assigned: (PIG-857) Pig should implement Tool interface from Hadoop

2010-04-02 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-857: Assignee: Richard Ding Pig should implement Tool interface from Hadoop

Re: Begin a discussion about Pig as a top level project

2010-04-02 Thread Thejas Nair
I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and heavily influenced by its roadmap. I think it makes sense to continue as a sub-project of hadoop. -Thejas On 3/31/10 4:04 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: Over time, Pig is increasing its coupling to Hadoop

[jira] Assigned: (PIG-62) Need to add pig script and input dirs (in clear text format) to jobconf

2010-04-02 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-62: --- Assignee: Richard Ding Need to add pig script and input dirs (in clear text format) to jobconf

[jira] Commented: (PIG-1348) InternalCachedBag running out of memory

2010-04-02 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853016#action_12853016 ] Richard Ding commented on PIG-1348: --- The problem seems not with the InternalCachedBag. The

[jira] Created: (PIG-1353) Map-side joins

2010-04-02 Thread Ashutosh Chauhan (JIRA)
Map-side joins -- Key: PIG-1353 URL: https://issues.apache.org/jira/browse/PIG-1353 Project: Pig Issue Type: Improvement Components: impl Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan

[jira] Commented: (PIG-864) Record graph of execution of Map-Reduce jobs executed by a Pig script

2010-04-02 Thread Richard Ding (JIRA)
[ https://issues.apache.org/jira/browse/PIG-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853031#action_12853031 ] Richard Ding commented on PIG-864: -- I'm thinking about logging this information as part of

[jira] Commented: (PIG-864) Record graph of execution of Map-Reduce jobs executed by a Pig script

2010-04-02 Thread Dmitriy V. Ryaboy (JIRA)
[ https://issues.apache.org/jira/browse/PIG-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853039#action_12853039 ] Dmitriy V. Ryaboy commented on PIG-864: --- In that case... think you can knock out PIG-908

[jira] Updated: (PIG-1353) Map-side joins

2010-04-02 Thread Ashutosh Chauhan (JIRA)
[ https://issues.apache.org/jira/browse/PIG-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1353: -- Attachment: pig-1353.patch An illustrative patch which achieves this. Map-side joins