Multiple successors

2010-07-20 Thread Swati Jain
I noticed a number of places in the code where the successors of a
LogicalRelationalOperator are accessed as op.successors.get(0). Is it
always the case that logical relational operators (in the new logical
optimizer framework) have only one successor? Why don't the rules iterate over
the successors instead of assuming there is a single successor?

Here is an example in which an LOFilter has multiple successors (correct me if
I am wrong):

A1 = Load(..);
A2 = Load(..);
B = LOFilter(...);
C = LOJoin(A1,B);
D = LOJoin(A2,B);

Thanks!
Swati


[jira] Commented: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-07-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890316#action_12890316
 ] 

Alan Gates commented on PIG-1379:
-

Won't this cause a backward compatibility issue?  Have we determined we're 
willing to make this semantic change in 0.8?

 Jars registered from command line should override the ones present in the 
 script 
 -

 Key: PIG-1379
 URL: https://issues.apache.org/jira/browse/PIG-1379
 Project: Pig
  Issue Type: Improvement
Reporter: Ankur
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1379.patch


 Jars that are registered from the command line when executing the pig script 
 should override the ones that are specified via 'register' in the pig script 
 itself.
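The description does not say how "override" is decided. A minimal Java sketch, assuming (hypothetically) that two jars conflict when they share a file name and that the command-line jar then wins; `JarPrecedence`, `fileName`, and `merge` are illustrative names, not Pig's API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Pig code: a command-line jar replaces a script-registered
// jar with the same file name. Matching by file name is an assumption for illustration.
final class JarPrecedence {
    // "lib/udfs.jar" -> "udfs.jar"
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Command-line jars are inserted first, so they also come first in the result.
    static List<String> merge(List<String> commandLineJars, List<String> scriptJars) {
        Map<String, String> byName = new LinkedHashMap<>();
        for (String jar : commandLineJars) {
            byName.put(fileName(jar), jar);
        }
        for (String jar : scriptJars) {
            byName.putIfAbsent(fileName(jar), jar); // keep the command-line entry if present
        }
        return new ArrayList<>(byName.values());
    }
}
```

With `/opt/new/udfs.jar` given on the command line and `lib/udfs.jar` plus `lib/extra.jar` registered in the script, this sketch yields `/opt/new/udfs.jar, lib/extra.jar`.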

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-07-20 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1379:


Hadoop Flags: [Incompatible change]

 Jars registered from command line should override the ones present in the 
 script 
 -

 Key: PIG-1379
 URL: https://issues.apache.org/jira/browse/PIG-1379
 Project: Pig
  Issue Type: Improvement
Reporter: Ankur
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1379.patch





Announcing Howl development list

2010-07-20 Thread Alan Gates


On Jul 14, 2010, at 2:11 AM, Jeff Hammerbacher wrote:


Hey,

Thanks for writing up these notes, they're very useful.

Pradeep Kamath gave a short presentation on Howl, the work he is leading to
create a shared metadata system between Pig, Hive, and Map Reduce. Dmitriy
noted that we need to get this work more in the open so others can
participate and contribute.

Is there a public JIRA where one could follow this work? Any chance we can
break it up into incremental milestones rather than have a single code drop
as with previous large features in Pig? I understand it may be difficult to
coordinate internal development with external user groups, but I hope the
feedback from third parties might make such a process worthwhile.



A wiki page outlining Howl is at http://wiki.apache.org/pig/Howl

A howldev mailing list has been set up on Yahoo! groups for discussions on
Howl. You can subscribe by sending mail to howldev-subscr...@yahoogroups.com.
We plan on putting the code on github in a read-only repository. It will be
a few more days before we get there. It will be announced on the list when
it is.


Alan.



[jira] Created: (PIG-1507) Full outer join fails while doing a filter on joined data

2010-07-20 Thread Daniel Dai (JIRA)
Full outer join fails while doing a filter on joined data
-

 Key: PIG-1507
 URL: https://issues.apache.org/jira/browse/PIG-1507
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


The following script produces a wrong result:

test1.dat:
1
2
3

test2.dat:
1
2

pig script:
{code}
a = LOAD 'test1.dat' USING PigStorage() AS (d1:int);
b = LOAD 'test2.dat' USING PigStorage() AS (d2:int);
c = JOIN a BY d1 FULL OUTER, b BY d2;
d = FILTER c BY d2 IS NULL;
STORE d INTO 'test.out' USING PigStorage();
{code}

expected:
3

We get:
1
2
3

This is because we erroneously push the filter before the full outer join. A
similar issue was addressed in
[PIG-1289|https://issues.apache.org/jira/browse/PIG-1289], but that fix only
covered left/right outer joins.
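The wrong result can be reproduced with a small in-memory model. This hypothetical Java sketch (not Pig's optimizer code) joins the two inputs from the report with full outer semantics and compares filtering after the join with pushing `d2 IS NULL` into input `b`. `canPushFilterToInput` states the general condition: a predicate on one input may only move below the join if that input is never null-padded by it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's optimizer: models the script above on single-column inputs.
final class FullOuterJoinDemo {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // Each joined row is {d1, d2}; null means "no matching row on that side".
    static List<Integer[]> fullOuterJoin(List<Integer> a, List<Integer> b) {
        List<Integer[]> out = new ArrayList<>();
        for (Integer d1 : a) {
            boolean matched = false;
            for (Integer d2 : b) {
                if (d1.equals(d2)) {
                    out.add(new Integer[] {d1, d2});
                    matched = true;
                }
            }
            if (!matched) out.add(new Integer[] {d1, null}); // right side null-padded
        }
        for (Integer d2 : b) {
            if (!a.contains(d2)) out.add(new Integer[] {null, d2}); // left side null-padded
        }
        return out;
    }

    // Correct plan: join first, then keep rows where d2 IS NULL; returns the d1 values.
    static List<Integer> filterAfterJoin(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        for (Integer[] row : fullOuterJoin(a, b)) {
            if (row[1] == null) result.add(row[0]);
        }
        return result;
    }

    // Buggy plan: the predicate on d2 is moved into input b, so it runs before the join.
    static List<Integer> filterPushedBeforeJoin(List<Integer> a, List<Integer> b) {
        List<Integer> filteredB = new ArrayList<>();
        for (Integer d2 : b) {
            if (d2 == null) filteredB.add(d2);
        }
        List<Integer> result = new ArrayList<>();
        for (Integer[] row : fullOuterJoin(a, filteredB)) {
            result.add(row[0]);
        }
        return result;
    }

    // A single-input filter can be pushed only to an input the join never null-pads.
    static boolean canPushFilterToInput(JoinType type, int inputIndex) {
        switch (type) {
            case INNER: return true;
            case LEFT_OUTER: return inputIndex == 0;  // left rows are preserved
            case RIGHT_OUTER: return inputIndex == 1; // right rows are preserved
            default: return false;                    // FULL_OUTER null-pads both sides
        }
    }
}
```

With a = {1, 2, 3} and b = {1, 2}, `filterAfterJoin` returns [3] and `filterPushedBeforeJoin` returns [1, 2, 3], matching the expected and actual outputs in the report.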




[jira] Assigned: (PIG-602) Pass global configurations to UDF

2010-07-20 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-602:
--

Assignee: (was: Alan Gates)

 Pass global configurations to UDF
 -

 Key: PIG-602
 URL: https://issues.apache.org/jira/browse/PIG-602
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Yiping Han
 Fix For: 0.8.0


 We are seeking an easy way to pass a large number of global configurations to 
 UDFs.
 Our application contains many Pig jobs and has a large number of 
 configurations. Passing configurations through the command line is not ideal 
 (e.g. modifying a single parameter requires changing multiple command lines), 
 and putting everything into the Hadoop conf is not ideal either.
 We would like to see if Pig can provide a facility that allows us to 
 pass a configuration file in some format (XML?) and then make it available 
 throughout all the UDFs.
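A minimal Java sketch of the lookup side of the requested facility, assuming (hypothetically) a key=value file loaded once and read by UDFs through a shared holder. `GlobalUdfConfig` is an illustrative name, not a Pig API; `java.util.Properties` also offers `loadFromXML` if XML is preferred.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: one configuration source shared by all UDFs in a JVM.
final class GlobalUdfConfig {
    private static volatile Properties props = new Properties();

    private GlobalUdfConfig() {}

    // Would be called once before any UDF runs; replaces the settings atomically.
    static void load(Reader source) {
        Properties loaded = new Properties();
        try {
            loaded.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        props = loaded; // publish the fully loaded settings in one step
    }

    // UDFs read a setting with a fallback instead of taking it on the command line.
    static String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}
```

Note that on a cluster the UDFs run in task JVMs, so the file would also have to be shipped to each task; this sketch only shows how a UDF might read the values once they are there.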




[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: (was: ScalarImpl1.patch)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 A couple of additional comments:
 (1) You can only cast a relation that contains a single value; otherwise an 
 error is reported.
 (2) Name resolution is needed, since relation X might have a field named C, in 
 which case that field takes precedence.
 (3) Y will look for the C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into a scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) store C
 (2) convert the cast to the UDF
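The check in comment (1) can be sketched in Java. The sketch assumes (hypothetically) that the stored contents of C have already been read back as a list of rows; `ScalarCast.toLong` is an illustrative name, not the UDF mentioned above.

```java
import java.util.List;

// Hypothetical sketch of "(long) C": C must hold exactly one row with exactly one field.
final class ScalarCast {
    static long toLong(List<List<Object>> relation, String alias) {
        if (relation.size() != 1 || relation.get(0).size() != 1) {
            throw new IllegalArgumentException(
                "Relation " + alias + " cannot be cast to a scalar: expected a single value");
        }
        Object value = relation.get(0).get(0);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException("Value of " + alias + " is not numeric: " + value);
        }
        return ((Number) value).longValue();
    }
}
```

In the example, C = foreach B generate COUNT(A) over a group-all produces one row with one field, so the cast succeeds; a relation with several rows is rejected.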




[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImpl1.patch

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch





[jira] Assigned: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs

2010-07-20 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-480:
--

Assignee: (was: Ying He)

 PERFORMANCE: Use identity mapper in a chain of M-R jobs
 ---

 Key: PIG-480
 URL: https://issues.apache.org/jira/browse/PIG-480
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch


 For a chain of two or more MR jobs, use the identity mapper wherever possible 
 in the second and subsequent MR jobs. The identity mapper is about 50% faster 
 than an empty Pig map job because it doesn't parse the data. 




[jira] Commented: (PIG-1295) Binary comparator for secondary sort

2010-07-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890384#action_12890384
 ] 

Daniel Dai commented on PIG-1295:
-

Patch looks pretty good. Thanks Gianmarco! A couple of comments:
1. PigTupleRawComparatorNew:324,332,343,357,367,377,387,399,416,474,483,501,512,etc.:
if GeneralizedDataType is not equal, we should throw an exception to flag the
error.
2. PigTupleRawComparatorNew:455-464: if the comparison of two items is not
equal, we should return the result without comparing the remaining items; that
is how we get the performance gain.
3. I am unable to run TestPigTupleRawComparator.main due to an OOM error. What is
the speedup after the change?
4. PigTupleRawComparatorNew:132: we should move the logic that chooses the right
comparator into Pig code, and move the comparators into BinSedesTuple and
DefaultTuple. This is part of the integration work; let's mark it as the first
item for phase 2.
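Points 1 and 2 can be illustrated with a hypothetical Java sketch. The byte layout is invented for the example and is not Pig's serialization format: each field is a 1-byte type tag followed by a 4-byte big-endian int, and only int fields are modeled. `RawTupleCompare` and `encodeInts` are illustrative names.

```java
// Hypothetical sketch; the layout (1-byte tag + 4-byte big-endian int per field) is invented.
final class RawTupleCompare {
    static final int FIELD_WIDTH = 5;
    static final byte INT_TAG = 1; // hypothetical tag value

    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    static int compare(byte[] t1, byte[] t2) {
        int n = Math.min(t1.length, t2.length) / FIELD_WIDTH;
        for (int i = 0; i < n; i++) {
            int off = i * FIELD_WIDTH;
            // Point 1: mismatched type tags indicate a bug upstream, so fail loudly.
            if (t1[off] != t2[off]) {
                throw new IllegalStateException("Type mismatch at field " + i);
            }
            int c = Integer.compare(readInt(t1, off + 1), readInt(t2, off + 1));
            // Point 2: the first differing field decides; later fields are never decoded.
            if (c != 0) return c;
        }
        return Integer.compare(t1.length, t2.length); // shorter tuple sorts first
    }

    // Test helper: serializes ints in the invented layout.
    static byte[] encodeInts(int... values) {
        byte[] out = new byte[values.length * FIELD_WIDTH];
        for (int i = 0; i < values.length; i++) {
            int off = i * FIELD_WIDTH;
            out[off] = INT_TAG;
            out[off + 1] = (byte) (values[i] >>> 24);
            out[off + 2] = (byte) (values[i] >>> 16);
            out[off + 3] = (byte) (values[i] >>> 8);
            out[off + 4] = (byte) values[i];
        }
        return out;
    }
}
```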

 Binary comparator for secondary sort
 

 Key: PIG-1295
 URL: https://issues.apache.org/jira/browse/PIG-1295
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Gianmarco De Francisci Morales
 Fix For: 0.8.0

 Attachments: PIG-1295_0.1.patch, PIG-1295_0.10.patch, 
 PIG-1295_0.2.patch, PIG-1295_0.3.patch, PIG-1295_0.4.patch, 
 PIG-1295_0.5.patch, PIG-1295_0.6.patch, PIG-1295_0.7.patch, 
 PIG-1295_0.8.patch, PIG-1295_0.9.patch


 When the Hadoop framework does the sorting, it tries to use a binary version of 
 the comparator if one is available. The benefit of a binary comparator is that we 
 do not need to instantiate the objects before we compare them. We see a ~30% 
 speedup after we switch to a binary comparator. Currently, Pig uses a binary 
 comparator in the following cases:
 1. When the semantics of the order don't matter. For example, in distinct we need 
 to sort in order to filter out duplicate values, but we do not care how the 
 comparator orders the keys. Group by has the same property. In this case we 
 rely on Hadoop's default binary comparator.
 2. When the semantics of the order matter but the key is of a simple type. In 
 this case we have implementations for simple types such as integer, long, float, 
 chararray, databytearray, and string.
 However, if the key is a tuple and the sort semantics matter, we do not have 
 a binary comparator implementation. This matters especially when we switch to 
 secondary sort. In secondary sort, we convert the inner sort of a nested 
 foreach into the secondary key and rely on Hadoop to sort on both the main key 
 and the secondary key. The sort key becomes a two-item tuple. Since the 
 secondary key is the sort key of the nested foreach, the sort semantics 
 matter. It turns out we lose the binary comparator once we use secondary 
 sort, and we see a significant slowdown.
 A binary comparator for tuples should be doable once we understand the binary 
 structure of the serialized tuple. We can focus on the most common use case 
 first, which is a group by followed by a nested sort. In this case we 
 use secondary sort. The semantics of the main key do not matter, but the 
 semantics of the secondary key do. We need to identify the boundary between the 
 main key and the secondary key in the binary tuple buffer without instantiating 
 the tuple itself. Then, if the main keys are equal, we use a binary comparator 
 to compare the secondary keys. The secondary key can also be a complex data type, 
 but for the first step we focus on a simple secondary key, which is the most 
 common use case.
 We mark this issue as a candidate project for the Google Summer of Code 2010 
 program. 
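A hypothetical Java sketch of the comparator described above, under an invented layout (not Pig's actual serialized tuple): an 8-byte key holding a 4-byte main key followed by a 4-byte big-endian signed int secondary key. The main key is compared byte by byte, which gives an arbitrary but consistent order that is enough for grouping; the secondary key is decoded as a signed int because its order is visible to the nested sort. `SecondarySortRawComparator` is an illustrative name.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; assumed key layout: [4-byte main key][4-byte big-endian signed int].
final class SecondarySortRawComparator {
    // Main key: only equality matters for grouping, so unsigned byte order suffices.
    static int compareMainKey(byte[] k1, byte[] k2) {
        for (int i = 0; i < 4; i++) {
            int c = Integer.compare(k1[i] & 0xff, k2[i] & 0xff);
            if (c != 0) return c;
        }
        return 0;
    }

    // Secondary key: order is visible to the nested sort, so decode it as a signed int.
    static int secondaryKey(byte[] k) {
        return ((k[4] & 0xff) << 24) | ((k[5] & 0xff) << 16)
             | ((k[6] & 0xff) << 8) | (k[7] & 0xff);
    }

    // Compare main keys first; only when they are equal look at the secondary key.
    static int compare(byte[] k1, byte[] k2) {
        int c = compareMainKey(k1, k2);
        if (c != 0) return c;
        return Integer.compare(secondaryKey(k1), secondaryKey(k2));
    }

    // Test helper: builds a key in the assumed layout (ByteBuffer is big-endian by default).
    static byte[] key(int main, int secondary) {
        return ByteBuffer.allocate(8).putInt(main).putInt(secondary).array();
    }
}
```

For example, a secondary key of -1 (bytes `ff ff ff ff`) would sort after 3 under raw unsigned byte comparison, which is why the secondary key needs a type-aware comparison while the main key does not.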




Re: Multiple successors

2010-07-20 Thread Daniel Dai

Hi, Swati,
The only logical operator that can have multiple outputs is LOSplit. So, for
now, it is safe to assume that every logical operator other than LOSplit has
only one output.
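In other words, a rule may read `successors.get(0)` safely only when the operator is not a split. A hypothetical Java sketch of both styles, using a stand-in operator class rather than Pig's `LogicalRelationalOperator` (`LogicalOpSketch`, `soleSuccessor`, and `successorNames` are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a logical operator; NOT Pig's class hierarchy.
final class LogicalOpSketch {
    final String name;
    final boolean isSplit;
    final List<LogicalOpSketch> successors = new ArrayList<>();

    LogicalOpSketch(String name, boolean isSplit) {
        this.name = name;
        this.isSplit = isSplit;
    }

    LogicalOpSketch connectTo(LogicalOpSketch next) {
        successors.add(next);
        return next;
    }
}

final class Successors {
    // Encodes the invariant: only a split may fan out; others have at most one successor.
    static LogicalOpSketch soleSuccessor(LogicalOpSketch op) {
        if (op.isSplit) {
            throw new IllegalArgumentException(op.name + " is a split; iterate over its successors");
        }
        if (op.successors.size() > 1) {
            throw new IllegalStateException(op.name + " has " + op.successors.size() + " successors");
        }
        return op.successors.isEmpty() ? null : op.successors.get(0); // null for a sink
    }

    // A rule that does not rely on the invariant simply visits every successor.
    static List<String> successorNames(LogicalOpSketch op) {
        List<String> names = new ArrayList<>();
        for (LogicalOpSketch s : op.successors) {
            names.add(s.name);
        }
        return names;
    }
}
```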


Daniel

Swati Jain wrote:

I noticed a number of places in the code where the successors of a
LogicalRelationalOperator are accessed as op.successors.get(0). Is it
always the case that logical relational operators (in the new logical
optimizer framework) have only one successor? Why don't the rules iterate over
the successors instead of assuming there is a single successor?

Here is an example in which an LOFilter has multiple successors (correct me if
I am wrong):

A1 = Load(..);
A2 = Load(..);
B = LOFilter(...);
C = LOJoin(A1,B);
D = LOJoin(A2,B);

Thanks!
Swati
  




[jira] Updated: (PIG-1507) Full outer join fails while doing a filter on joined data

2010-07-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1507:


Attachment: PIG-1507-1.patch

 Full outer join fails while doing a filter on joined data
 -

 Key: PIG-1507
 URL: https://issues.apache.org/jira/browse/PIG-1507
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1507-1.patch





[jira] Updated: (PIG-1507) Full outer join fails while doing a filter on joined data

2010-07-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1507:


Status: Patch Available  (was: Open)

 Full outer join fails while doing a filter on joined data
 -

 Key: PIG-1507
 URL: https://issues.apache.org/jira/browse/PIG-1507
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1507-1.patch





Re: explicitly close a mr job

2010-07-20 Thread Daniel Dai
You can refer to MRCompiler.startNew. You need to add a store to close the
current MapReduceOper, create a new MapReduceOper, add a load to it, and then
add the new MapReduceOper to the MRPlan.
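The steps above can be modeled roughly as follows. This is a hypothetical Java sketch: `MRJobSketch`, `MRPlanSketch`, and the string-based plans stand in for Pig's MapReduceOper and MR plan, whose real APIs are not reproduced here. It assumes the current job ends in its reduce phase, as in the question quoted below.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one MR job: operator names in its map and reduce phases.
final class MRJobSketch {
    final List<String> mapPlan = new ArrayList<>();
    final List<String> reducePlan = new ArrayList<>();
}

// Hypothetical model of the MR plan: jobs plus (predecessor, successor) edges.
final class MRPlanSketch {
    final List<MRJobSketch> jobs = new ArrayList<>();
    final List<MRJobSketch[]> edges = new ArrayList<>();

    // Close `current` with a store to tmpPath, then open a new job that loads tmpPath.
    MRJobSketch startNew(MRJobSketch current, String tmpPath) {
        current.reducePlan.add("STORE " + tmpPath);   // 1. a store ends the current job
        MRJobSketch next = new MRJobSketch();          // 2. create a new job
        next.mapPlan.add("LOAD " + tmpPath);          // 3. its map phase reads the stored data
        jobs.add(next);                               // 4. register it in the plan,
        edges.add(new MRJobSketch[] {current, next}); //    running after the job it depends on
        return next;
    }
}
```

In Gang's scenario, operator 1 would stay at the end of the first job's reduce plan, followed by the store, and operator 2 would be appended to the second job's map plan after the load.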


Daniel

Gang Luo wrote:

Hi all,
When compiling a physical plan into an MR plan, the current rule is to put as
many operators as possible into the reduce phase of the current MR job. But
sometimes we want to control this in the physical plan. Say we want to put
operator 1 into the reduce phase of the current MR job, end that job, and then
put operator 2 into the map phase of the next MR job (operators 1 and 2 are both
non-blocking). It seems that inserting store and load operators in the physical
plan doesn't help. Is there a better way to do this than implementing new
operators (e.g. starter and ender)?


Thanks,
-Gang



  
  




[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImpl1.patch

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch





[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-07-20 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: (was: ScalarImpl1.patch)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch





Re: Announcing Howl development list

2010-07-20 Thread Jeff Hammerbacher

  A wiki page outlining Howl is at http://wiki.apache.org/pig/Howl

 A howldev mailing list has been set up on Yahoo! groups for discussions on
 Howl.  You can subscribe by sending mail to
 howldev-subscr...@yahoogroups.com.  We plan on putting the code on github
 in a read only repository.  It will be a few more days before we get there.
  It will be announced on the list when it is.


Awesome, thanks Alan!


[jira] Created: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6

2010-07-20 Thread Carl Steinbach (JIRA)
Make 'docs' target (forrest) work with Java 1.6
---

 Key: PIG-1508
 URL: https://issues.apache.org/jira/browse/PIG-1508
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Carl Steinbach


FOR-984 covers the very inconvenient fact that Forrest 0.8 does not work with
Java 1.6. The same ticket also suggests a workaround: disabling sitemap and
stylesheet validation by setting the forrest.validate.sitemap and
forrest.validate.stylesheets properties to false.




[jira] Created: (PIG-1509) Add .gitignore file

2010-07-20 Thread Carl Steinbach (JIRA)
Add .gitignore file
---

 Key: PIG-1509
 URL: https://issues.apache.org/jira/browse/PIG-1509
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Carl Steinbach


Add a .gitignore file (equivalent to svn:ignore) for those using git-svn.




[jira] Updated: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6

2010-07-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated PIG-1508:


Status: Patch Available  (was: Open)

 Make 'docs' target (forrest) work with Java 1.6
 ---

 Key: PIG-1508
 URL: https://issues.apache.org/jira/browse/PIG-1508
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Carl Steinbach
 Attachments: PIG-1508.patch.txt





[jira] Updated: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6

2010-07-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated PIG-1508:


Attachment: PIG-1508.patch.txt

PIG-1508.patch.txt:
* set forrest.validate.sitemap=false in forrest.properties
* remove Java 5-specific settings from build.xml
* remove Java 5-specific settings from test-patch.sh


 Make 'docs' target (forrest) work with Java 1.6
 ---

 Key: PIG-1508
 URL: https://issues.apache.org/jira/browse/PIG-1508
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Carl Steinbach
 Attachments: PIG-1508.patch.txt





[jira] Updated: (PIG-1509) Add .gitignore file

2010-07-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated PIG-1509:


Status: Patch Available  (was: Open)

 Add .gitignore file
 ---

 Key: PIG-1509
 URL: https://issues.apache.org/jira/browse/PIG-1509
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Carl Steinbach
 Attachments: PIG-1509.patch.txt





[jira] Updated: (PIG-1509) Add .gitignore file

2010-07-20 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated PIG-1509:


Attachment: PIG-1509.patch.txt

 Add .gitignore file
 ---

 Key: PIG-1509
 URL: https://issues.apache.org/jira/browse/PIG-1509
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Carl Steinbach
 Attachments: PIG-1509.patch.txt





[jira] Updated: (PIG-1309) Map-side Cogroup

2010-07-20 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1309:


Fix Version/s: 0.7.0

 Map-side Cogroup
 

 Key: PIG-1309
 URL: https://issues.apache.org/jira/browse/PIG-1309
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0, 0.8.0

 Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch, 
 PIG_1309_7.patch


 In a never-ending quest to make Pig go faster, we want to parallelize as many 
 relational operations as possible. It's already possible to do group-by 
 (PIG-984) and joins (PIG-845, PIG-554) purely on the map side in Pig. This jira 
 is to add a map-side implementation of cogroup in Pig. Details to follow.




[jira] Commented: (PIG-1507) Full outer join fails while doing a filter on joined data

2010-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890521#action_12890521
 ] 

Hadoop QA commented on PIG-1507:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449962/PIG-1507-1.patch
  against trunk revision 965559.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/348/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/348/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/348/console

This message is automatically generated.

 Full outer join fails while doing a filter on joined data
 -

 Key: PIG-1507
 URL: https://issues.apache.org/jira/browse/PIG-1507
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1507-1.patch

