[jira] Commented: (PIG-1167) [zebra] Zebra does not support Hadoop Globs

2010-01-04 Thread Chao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796259#action_12796259
 ] 

Chao Wang commented on PIG-1167:


Patch looks good +1.

 [zebra] Zebra does not support Hadoop Globs
 ---

 Key: PIG-1167
 URL: https://issues.apache.org/jira/browse/PIG-1167
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Yan Zhou
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1167.patch


 Passing the following path to Zebra causes an error, but it works with Hadoop 
 directly: /projects/FETL/sample/ABF1/{2009120204}
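
 For illustration: the braces in the path above are glob alternation, where
 {a,b} stands for either a or b and a single-entry group such as {2009120204}
 stands for just that entry. The Python sketch below expands non-nested brace
 groups into concrete paths. It only illustrates what such a pattern denotes;
 it is not the Hadoop or Zebra implementation and says nothing about how the
 attached patch works. The function name expand_braces and the second example
 path are hypothetical.

```python
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand non-nested {a,b,...} groups in a path pattern.

    Hypothetical helper for illustration only; nested groups are not handled.
    """
    # Find the leftmost brace group that contains no other braces.
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # Recurse so that any later groups in the pattern are expanded too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


# The path from the issue contains a single-entry group.
print(expand_braces("/projects/FETL/sample/ABF1/{2009120204}"))
# A made-up path with two alternatives.
print(expand_braces("/data/{2009,2010}/part"))
```

 Under this reading, the path from the report names exactly one directory, so
 a reader that accepts Hadoop globs should treat it like the plain path.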

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1167) [zebra] Zebra does not support Hadoop Globs

2010-01-04 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1167:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to both Apache trunk and the 0.6 branch.

 [zebra] Zebra does not support Hadoop Globs
 ---

 Key: PIG-1167
 URL: https://issues.apache.org/jira/browse/PIG-1167
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Yan Zhou
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1167.patch


 Passing the following path to Zebra causes an error, but it works with Hadoop 
 directly: /projects/FETL/sample/ABF1/{2009120204}




[jira] Commented: (PIG-1094) Fix unit tests corresponding to source changes so far

2010-01-04 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796316#action_12796316
 ] 

Pradeep Kamath commented on PIG-1094:
-

+1 to PIG-1094_6.patch , patch committed - thanks Thejas!

Here is the output of test-patch for the same:

 [exec]
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec]


 Fix unit tests corresponding to source changes so far
 -

 Key: PIG-1094
 URL: https://issues.apache.org/jira/browse/PIG-1094
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, 
 PIG-1094_4.patch, PIG-1094_5.patch, PIG-1094_6.patch


 The check-ins so far on the load-store-redesign branch have not addressed unit 
 test failures due to interface changes. This jira is to track the task of 
 making the common case unit tests work with the new interfaces. Some aspects 
 of the new proposal, like using the LoadCaster interface for casting and making 
 local mode work, have not been completed yet. Tests which are failing due to 
 those reasons will not be fixed in this jira and will be addressed in the jiras 
 corresponding to those tasks.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-04 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: PIG-1090-9.patch

This patch replaced msStorage with a Configuration object in LOLoad and fixed 
corresponding test cases.

The results of test-patch run:

{code}
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 15 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

{code}

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796386#action_12796386
 ] 

Daniel Dai commented on PIG-1090:
-

+1 for PIG-1090-8.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Commented: (PIG-1172) PushDownForeachFlatten shall not push ForEach below Join if the flattened fields is used in Join

2010-01-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796394#action_12796394
 ] 

Alan Gates commented on PIG-1172:
-

Changes look good, +1.

The patch lists a new hadoop20.jar.  Is this intentional?

 PushDownForeachFlatten shall not push ForEach below Join if the flattened 
 fields is used in Join
 

 Key: PIG-1172
 URL: https://issues.apache.org/jira/browse/PIG-1172
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1172-1.patch


 Currently the following script will push B below D. But because the flattened 
 column is used in the join, we cannot push it.
 A = load '1.txt' as (bg:bag{t:tuple(a0,a1)});
 B = FOREACH A generate flatten($0);
 C = load '3.txt' AS (c0, c1);
 D = JOIN B by a1, C by c1;
 E = limit D 10;
 explain E;
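
 For illustration: the condition in the issue title can be read as a simple
 disjointness check between the fields produced by the flatten and the keys
 the join uses. The Python sketch below expresses that reading. It is not the
 PushDownForeachFlatten code from the patch; the function name and the plain
 string field names are hypothetical.

```python
from typing import Iterable


def can_push_foreach_below_join(flattened_fields: Iterable[str],
                                join_keys: Iterable[str]) -> bool:
    """Return True only if no join key is produced by the flatten.

    Hypothetical check for illustration; the real optimizer works on
    logical plan operators rather than field-name strings.
    """
    return set(flattened_fields).isdisjoint(join_keys)


# In the script above, B flattens the bag into a0 and a1,
# and D joins B by a1, so the ForEach must stay above the Join.
print(can_push_foreach_below_join({"a0", "a1"}, ["a1"]))
```

 In the script from the report the check fails because a1 only exists after
 the flatten, which is why B cannot be moved below D.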




[jira] Updated: (PIG-1172) PushDownForeachFlatten shall not push ForEach below Join if the flattened fields is used in Join

2010-01-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1172:


Attachment: PIG-1172-2.patch

hadoop20.jar should not be in the patch. I reattached the patch. Thanks.

 PushDownForeachFlatten shall not push ForEach below Join if the flattened 
 fields is used in Join
 

 Key: PIG-1172
 URL: https://issues.apache.org/jira/browse/PIG-1172
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1172-1.patch, PIG-1172-2.patch


 Currently the following script will push B below D. But because the flattened 
 column is used in the join, we cannot push it.
 A = load '1.txt' as (bg:bag{t:tuple(a0,a1)});
 B = FOREACH A generate flatten($0);
 C = load '3.txt' AS (c0, c1);
 D = JOIN B by a1, C by c1;
 E = limit D 10;
 explain E;




[jira] Commented: (PIG-1174) Creation of output path should be done by storage function

2010-01-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796449#action_12796449
 ] 

Alan Gates commented on PIG-1174:
-

Delegating creation of the output path to the storage function is not trivial.  
The storage function is invoked on every reducer (or every mapper for map only 
jobs).  So delaying creation until the storage function is invoked will create a race 
condition that the storage functions will have to handle.  And if the solution is just 
to let the first one win and all the rest error out and ignore the error, for a 
large job this will still bombard the namenode with hundreds or thousands of 
bogus mkdir requests.  It also has the problem that all the storage functions 
that get an error can't tell if it's really an error (there's old data there 
they are overwriting) versus they just lost the race and another function has 
already created it.

We are reworking the way load and store functions interact with InputFormats and 
OutputFormats (see PIG-966 for full details).  This will push the 
responsibility of file creation onto the OutputFormat.  This may partially 
address your concerns.
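
For illustration, the race described in the comment above can be sketched in
Python. The sketch uses threads and a local temporary directory as stand-ins
for reducers and HDFS, which is an assumption made only to keep it runnable;
all function names are hypothetical. Every simulated task tries to create the
same output directory, and a task that fails cannot tell from the error alone
whether it lost the race or whether the directory was left by an earlier run.

```python
import os
import tempfile
import threading


def create_output_dir(path, results, index):
    """One simulated task attempting to create the shared output directory."""
    try:
        os.mkdir(path)
        results[index] = "created"
    except FileExistsError:
        # Ambiguous: either a sibling task won the race,
        # or old output from a previous run is already there.
        results[index] = "exists"


def simulate_tasks(path, num_tasks):
    """Run num_tasks threads that all try to create the same directory."""
    results = [None] * num_tasks
    threads = [
        threading.Thread(target=create_output_dir, args=(path, results, i))
        for i in range(num_tasks)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def demo(num_tasks=8):
    """First run starts with no directory; second run finds stale output."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "output")
        first_run = simulate_tasks(target, num_tasks)
        second_run = simulate_tasks(target, num_tasks)
    return first_run, second_run


first_run, second_run = demo()
print(first_run.count("created"), second_run.count("created"))
```

In the first run exactly one task creates the directory and the rest see it
already existing. In the second run every task sees the same "exists" result,
even though the cause is stale output rather than a lost race, and each run
still issues one create request per task.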

 Creation of output path should be done by storage function
 --

 Key: PIG-1174
 URL: https://issues.apache.org/jira/browse/PIG-1174
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham

 When executing a STORE command, Pig creates the output location before the 
 storage function gets called. This causes problems with storage functions 
 that have logic to determine the output location. See this thread:
 http://www.mail-archive.com/pig-user%40hadoop.apache.org/msg01538.html
 For example, when making a request like this:
 STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0', 
 'none', '\t');
 Pig creates a file '/my/home/output' and then an exception is thrown when 
 MultiStorage tries to make a directory under '/my/home/output'. The 
 workaround is to instead specify a dummy location as the first path like so:
 STORE A INTO '/my/home/output/temp' USING MultiStorage('/my/home/output','0', 
 'none', '\t');
 Two changes should be made:
 1. The path specified in the INTO clause should be available to the storage 
 function so it doesn't need to be duplicated.
 2. The creation of the output paths should be delegated to the storage 
 function.




Hudson build is back to normal: Pig-trunk #658

2010-01-04 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/658/changes




[jira] Updated: (PIG-1175) Pig 0.6 Docs - Store v. Dump

2010-01-04 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-1175:
-

Attachment: PIG-1175.patch

Patch file.

 Pig 0.6 Docs - Store v. Dump
 

 Key: PIG-1175
 URL: https://issues.apache.org/jira/browse/PIG-1175
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.6.0
Reporter: Corinne Chandel
 Fix For: 0.6.0

 Attachments: PIG-1175.patch


 Pig 0.6 Docs
 (1) Pig Latin Ref Manual
  Update STORE
  Update DUMP (and move under Diagnostic Operators)
 (2) Pig Latin User Guide
  Under Multi-Query Execution, add new section: Store v. Dump
 Updates clarify how STORE and DUMP work with multi-query execution 
 (optimization).




[jira] Created: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-01-04 Thread Daniel Dai (JIRA)
Column Pruner issues in union of loader with and without schema
---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0


The column pruner for union could fail if one source of the union has a schema and 
the other does not. For example, the following script fails:

{code}
a = load '1.txt' as (a0, a1, a2);
b = foreach a generate a0;
c = load '2.txt';
d = foreach c generate $0;
e = union b, d;
dump e;
{code}

However, this issue is in trunk only and is not applicable to 0.6 branch.




[jira] Updated: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-01-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1176:


Status: Patch Available  (was: Open)

 Column Pruner issues in union of loader with and without schema
 ---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1176-1.patch


 The column pruner for union could fail if one source of the union has a schema and 
 the other does not. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2);
 b = foreach a generate a0;
 c = load '2.txt';
 d = foreach c generate $0;
 e = union b, d;
 dump e;
 {code}
 However, this issue is in trunk only and is not applicable to 0.6 branch.




[jira] Updated: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-01-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1176:


Attachment: PIG-1176-1.patch

 Column Pruner issues in union of loader with and without schema
 ---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1176-1.patch


 The column pruner for union could fail if one source of the union has a schema and 
 the other does not. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2);
 b = foreach a generate a0;
 c = load '2.txt';
 d = foreach c generate $0;
 e = union b, d;
 dump e;
 {code}
 However, this issue is in trunk only and is not applicable to 0.6 branch.




[jira] Updated: (PIG-1172) PushDownForeachFlatten shall not push ForEach below Join if the flattened fields is used in Join

2010-01-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1172:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.6 branch.

 PushDownForeachFlatten shall not push ForEach below Join if the flattened 
 fields is used in Join
 

 Key: PIG-1172
 URL: https://issues.apache.org/jira/browse/PIG-1172
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1172-1.patch, PIG-1172-2.patch


 Currently the following script will push B below D. But because the flattened 
 column is used in the join, we cannot push it.
 A = load '1.txt' as (bg:bag{t:tuple(a0,a1)});
 B = FOREACH A generate flatten($0);
 C = load '3.txt' AS (c0, c1);
 D = JOIN B by a1, C by c1;
 E = limit D 10;
 explain E;




[jira] Commented: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796562#action_12796562
 ] 

Hadoop QA commented on PIG-1176:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429411/PIG-1176-1.patch
  against trunk revision 895753.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/165/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/165/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/165/console

This message is automatically generated.

 Column Pruner issues in union of loader with and without schema
 ---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1176-1.patch


 The column pruner for union could fail if one source of the union has a schema and 
 the other does not. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2);
 b = foreach a generate a0;
 c = load '2.txt';
 d = foreach c generate $0;
 e = union b, d;
 dump e;
 {code}
 However, this issue is in trunk only and is not applicable to 0.6 branch.




[jira] Updated: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-01-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1176:


Status: Patch Available  (was: Open)

 Column Pruner issues in union of loader with and without schema
 ---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1176-1.patch


 The column pruner for union could fail if one source of the union has a schema and 
 the other does not. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2);
 b = foreach a generate a0;
 c = load '2.txt';
 d = foreach c generate $0;
 e = union b, d;
 dump e;
 {code}
 However, this issue is in trunk only and is not applicable to 0.6 branch.




[jira] Updated: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-01-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1176:


Status: Open  (was: Patch Available)

 Column Pruner issues in union of loader with and without schema
 ---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1176-1.patch


 The column pruner for union could fail if one source of the union has a schema and 
 the other does not. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2);
 b = foreach a generate a0;
 c = load '2.txt';
 d = foreach c generate $0;
 e = union b, d;
 dump e;
 {code}
 However, this issue is in trunk only and is not applicable to 0.6 branch.




[jira] Commented: (PIG-1173) pig cannot be built without an internet connection

2010-01-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796564#action_12796564
 ] 

Daniel Dai commented on PIG-1173:
-

+1, will commit patch shortly.

 pig cannot be built without an internet connection
 --

 Key: PIG-1173
 URL: https://issues.apache.org/jira/browse/PIG-1173
 Project: Pig
  Issue Type: Bug
Reporter: Jeff Hodges
Priority: Minor
 Attachments: offlinebuild-v2.patch, offlinebuild.patch


 Pig's build.xml does not allow for offline building even when it's been built 
 before. This is because the ivy-download target has no conditional 
 associated with it to turn it off. Hadoop seems to be adding an 
 unless=offline to the ivy-download target.
