date:20091223

[
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793971#action_12793971
]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

OK, I will upload the 0.7.0 implementation today. It will still not implement
fieldsToRead; the method is just an empty stub. I'll have a look at it after
Christmas.

Pig reading hive columnar rc tables
---

Key: PIG-1117
URL: https://issues.apache.org/jira/browse/PIG-1117
Project: Pig
Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Fix For: 0.7.0

Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch,
PIG-1117.patch, PIG-117-v.0.6.0.patch

I've coded a LoadFunc implementation that can read from Hive Columnar RC
tables. This is needed for a project that I'm working on because all our data
is stored using the Hive Thrift-serialized Columnar RC format. I have looked
at the piggybank but did not find any implementation that could do this.
We've been running it on our cluster for the last week and have worked out
most bugs.

There are still some improvements to be made, such as setting the number of
mappers based on date partitioning. It has been optimized to read only
specific columns and can churn through a data set almost 8 times faster with
this improvement, because not all column data is read.
I would like to contribute the class to the piggybank. Can you guide me in
what I need to do?
I've used Hive-specific classes to implement this. Is it possible to add them
to the piggybank Ivy build so that the dependencies are downloaded
automatically?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1117) Pig reading hive columnar rc tables


 [ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerrit Jansen van Vuuren updated PIG-1117:
--

Affects Version/s: (was: 0.6.0)
   Status: Open  (was: Patch Available)

 Pig reading hive columnar rc tables
 ---

 Key: PIG-1117
 URL: https://issues.apache.org/jira/browse/PIG-1117
 Project: Pig
  Issue Type: New Feature
Reporter: Gerrit Jansen van Vuuren
 Fix For: 0.7.0

 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
 PIG-1117.patch, PIG-117-v.0.6.0.patch



[jira] Updated: (PIG-1117) Pig reading hive columnar rc tables

[
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gerrit Jansen van Vuuren updated PIG-1117:
--

Attachment: PIG-117-v.0.7.0.patch

Changes:
- Slicing done per block and not per file.
- Automatic download of hive dependencies from the apache website. This is
only done once.
- Added empty implementation for fieldsToRead (will implement this soon).
- Refactored out code duplication.
- Changed Byte value to be cast to Integer
- Changed Boolean values to be 1 if true else 0
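
The last two items in the list above can be illustrated with a minimal sketch. It is not taken from the patch: the class name, the method names and the null handling are assumptions made for illustration only.

```java
// Hypothetical helper, not part of the patch: shows the two value
// conversions named in the change list.
final class HiveValueConversion {

    /** Widens a Byte to an Integer. Returning null for null is an assumption. */
    static Integer byteToInteger(Byte value) {
        return value == null ? null : Integer.valueOf(value.intValue());
    }

    /** Maps a Boolean to 1 if true, else 0. Returning null for null is an assumption. */
    static Integer booleanToInteger(Boolean value) {
        if (value == null) {
            return null;
        }
        return value ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(byteToInteger((byte) 7));
        System.out.println(booleanToInteger(true));
        System.out.println(booleanToInteger(false));
    }
}
```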

Test: ant hive-test
Jar: ant hive-jar

Dependencies:
The hive_exec.jar needs to be either on the classpath of all task nodes or
registered in the Pig script, e.g.
REGISTER hive_exec.jar
REGISTER piggybank.jar

Pig reading hive columnar rc tables
---

Key: PIG-1117
URL: https://issues.apache.org/jira/browse/PIG-1117
Project: Pig
Issue Type: New Feature
Reporter: Gerrit Jansen van Vuuren
Fix For: 0.7.0

Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch,
PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch


[jira] Updated: (PIG-1117) Pig reading hive columnar rc tables


 [ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerrit Jansen van Vuuren updated PIG-1117:
--

 Tags: PIG-117-v.0.7.0.patch  (was: PIG-117-v.0.6.0.patch)
Affects Version/s: 0.7.0
   Status: Patch Available  (was: Open)

 Pig reading hive columnar rc tables
 ---

 Key: PIG-1117
 URL: https://issues.apache.org/jira/browse/PIG-1117
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Gerrit Jansen van Vuuren
 Fix For: 0.7.0

 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
 PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch



[jira] Commented: (PIG-761) ERROR 2086 on simple JOIN

2009-12-23 Thread Ankur (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794005#action_12794005
 ] 

Ankur commented on PIG-761:
---

Here is a very simple script to reproduce the issue:

- Start -
data1 = LOAD 'data1' as (a:int, b:int, c:chararray);
proj1 = LIMIT data1 5;

data2 = LOAD 'data2' as (x:int, y:chararray, z:chararray);
proj2 = FOREACH data2 GENERATE x, y;

cogrouped = COGROUP proj1 BY a, proj2 BY x INNER PARALLEL 2;
joined = FOREACH cogrouped GENERATE FLATTEN(proj1), FLATTEN(proj2);

store joined into 'results';
- End 

The problem seems to be with the LIMIT operator on one of the relations
participating in the join. It seems to cause the mismatch between the
expected and found LocalRearrange operators.

 ERROR 2086 on simple JOIN
 -

 Key: PIG-761
 URL: https://issues.apache.org/jira/browse/PIG-761
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
 Environment: mapreduce mode
Reporter: Vadim Zaliva

 ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
 I am doing a pretty straightforward join in one of my Pig scripts. I am able to
 'dump' both relations involved in this join, but when I try to join them I
 get this error.
 Here is the full log:
 ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
     at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
     at org.apache.pig.Main.main(Main.java:319)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
     at org.apache.pig.PigServer.execute(PigServer.java:691)
     at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
     ... 5 more
 Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
     at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
     at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
     at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
     at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
     at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
     ... 8 more
 ERROR 1002: Unable to store alias 398
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 398
     at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
     at org.apache.pig.Main.main(Main.java:319)
 Caused by: java.lang.NullPointerException
     at

[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

[
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794077#action_12794077
]

Hadoop QA commented on PIG-1117:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428803/PIG-117-v.0.7.0.patch
against trunk revision 893373.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 5 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

-1 release audit. The applied patch generated 410 release audit warnings
(more than the trunk's current 408 warnings).

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/testReport/
Release audit warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/console

This message is automatically generated.

Pig reading hive columnar rc tables
---

Key: PIG-1117
URL: https://issues.apache.org/jira/browse/PIG-1117
Project: Pig
Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Gerrit Jansen van Vuuren
Fix For: 0.7.0

Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch,
PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch


[jira] Assigned: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)


 [ 
https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang reassigned PIG-1166:
---

Assignee: Jeff Zhang

 A bit change of the interface of Tuple  DataBag ( make the set and append 
 method return this)
 --

 Key: PIG-1166
 URL: https://issues.apache.org/jira/browse/PIG-1166
 Project: Pig
  Issue Type: Improvement
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor

 When people write unit tests for UDFs, they always need to build a tuple or
 bag. If we change the interface of Tuple and DataBag so that the set and
 append methods return this, it can decrease the code size. For example,
 people currently have to write the following code to build a Tuple:
 {code}
 Tuple tuple=TupleFactory.getInstance().newTuple(3);
 tuple.set(0,item_0);
 tuple.set(1,item_1);
 tuple.set(2,item_2);
 {code}
 With the changed interface, where the set and append methods return this,
 the above code can be rewritten like this:
 {code}
 Tuple tuple=TupleFactory.getInstance().newTuple(3);
 tuple.set(0,item_0).set(1,item_1).set(2,item_2);
 {code}
 This interface change won't cause backward-compatibility problems, and I
 think it has no performance impact either.
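
 The proposal can be tried out in isolation with a small sketch. SimpleTuple
 below is a hypothetical stand-in written for this illustration, not Pig's
 real Tuple interface; it only shows how set and append chain once they
 return this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FluentTupleSketch {

    /** Hypothetical stand-in for a tuple; not the real org.apache.pig.data.Tuple. */
    static final class SimpleTuple {
        private final List<Object> fields;

        SimpleTuple(int size) {
            // Pre-fill with nulls so that set(index, value) works on every position.
            fields = new ArrayList<>(Collections.<Object>nCopies(size, null));
        }

        /** Sets a field and returns this tuple so calls can be chained. */
        SimpleTuple set(int index, Object value) {
            fields.set(index, value);
            return this;
        }

        /** Appends a field and returns this tuple so calls can be chained. */
        SimpleTuple append(Object value) {
            fields.add(value);
            return this;
        }

        Object get(int index) {
            return fields.get(index);
        }

        int size() {
            return fields.size();
        }
    }

    public static void main(String[] args) {
        SimpleTuple tuple = new SimpleTuple(3).set(0, "a").set(1, 2).set(2, 3.0);
        System.out.println(tuple.size() + " fields, first = " + tuple.get(0));
    }
}
```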


[jira] Updated: (PIG-1148) Move splitable logic from pig latin to InputFormat


 [ 
https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-1148:


Status: Patch Available  (was: Open)

 Move splitable logic from pig latin to InputFormat
 --

 Key: PIG-1148
 URL: https://issues.apache.org/jira/browse/PIG-1148
 Project: Pig
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: PIG-1148.patch





[jira] Updated: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)


 [ 
https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-1166:


Status: Patch Available  (was: Open)

 A bit change of the interface of Tuple  DataBag ( make the set and append 
 method return this)
 --

 Key: PIG-1166
 URL: https://issues.apache.org/jira/browse/PIG-1166
 Project: Pig
  Issue Type: Improvement
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1166.patch



[jira] Updated: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)


 [ 
https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-1166:


Attachment: Pig_1166.patch

 A bit change of the interface of Tuple  DataBag ( make the set and append 
 method return this)
 --

 Key: PIG-1166
 URL: https://issues.apache.org/jira/browse/PIG-1166
 Project: Pig
  Issue Type: Improvement
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1166.patch



[jira] Commented: (PIG-1148) Move splitable logic from pig latin to InputFormat


[ 
https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794085#action_12794085
 ] 

Hadoop QA commented on PIG-1148:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428829/PIG-1148.patch
  against trunk revision 893373.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/154/console

This message is automatically generated.

 Move splitable logic from pig latin to InputFormat
 --

 Key: PIG-1148
 URL: https://issues.apache.org/jira/browse/PIG-1148
 Project: Pig
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: PIG-1148.patch





[jira] Updated: (PIG-1094) Fix unit tests corresponding to source changes so far


 [ 
https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1094:
---

Attachment: PIG-1094_5.patch

This patch (PIG-1094_5.patch) fixes the order-by, skew-join and merge-join test
failures.

TestPoissonSampleLoader.java - testNumSamples() - Unlike the earlier version of
the sampler, if there are very few rows (3 in this case), only one sample is
selected.
WeightedRangePartitioner.java - If the sample file was empty, there was a check
to ensure that the input is also empty, using FileLocalizer.getSize(). Removed
that check, since the input location need not be a file.
PoissonSampleLoader.java - Additional comments, fixed indentation.
GetMemNumRows.java - Handles the case where the second-to-last column is null
(while looking for the specially marked last tuple).


Output of test-patch:
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


 Fix unit tests corresponding to source changes so far
 -

 Key: PIG-1094
 URL: https://issues.apache.org/jira/browse/PIG-1094
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, 
 PIG-1094_4.patch, PIG-1094_5.patch


 The check-ins so far on the load-store-redesign branch have not addressed unit
 test failures due to interface changes. This jira is to track the task of
 making the common-case unit tests work with the new interfaces. Some aspects
 of the new proposal, like using the LoadCaster interface for casting and
 making local mode work, have not been completed yet. Tests that are failing
 for those reasons will not be fixed in this jira; they will be addressed in
 the jiras corresponding to those tasks.


[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'

2009-12-23 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1136:
-

Status: Patch Available  (was: Open)

 [zebra] Map Split of Storage info do not allow for leading underscore char '_'
 --

 Key: PIG-1136
 URL: https://issues.apache.org/jira/browse/PIG-1136
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Attachments: pig-1136-xuefu.patch


 Some users need map keys with a leading underscore to be supported. Pig
 column names do not allow a leading underscore, but apparently no such
 restriction is placed on map keys.
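
 The difference between the two naming rules can be sketched as follows. Both
 patterns are assumptions made for illustration; they are not taken from the
 zebra storage-hint parser or from the attached patch.

```java
import java.util.regex.Pattern;

final class MapKeyNameCheck {
    // Assumed rule for column names: must start with a letter.
    static final Pattern COLUMN_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // Assumed relaxed rule for map keys: a leading underscore is also accepted.
    static final Pattern MAP_KEY = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidColumnName(String name) {
        return COLUMN_NAME.matcher(name).matches();
    }

    static boolean isValidMapKey(String key) {
        return MAP_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidColumnName("_k1"));
        System.out.println(isValidMapKey("_k1"));
    }
}
```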


[jira] Commented: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'


[ 
https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794143#action_12794143
 ] 

Yan Zhou commented on PIG-1136:
---

Patch reviewed +1

 [zebra] Map Split of Storage info do not allow for leading underscore char '_'
 --

 Key: PIG-1136
 URL: https://issues.apache.org/jira/browse/PIG-1136
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Attachments: pig-1136-xuefu.patch



[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces


[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794158#action_12794158
 ] 

Pradeep Kamath commented on PIG-1090:
-

Dmitriy,
 The method in LoadMetadata that is implemented in PIG-1090-4.patch sets the
partition filter; it does not implement filter pushdown in general. Only
partition filter conditions are pushed down through LoadMetadata, as per the
redesign proposal. As you rightly pointed out, pushing down filters in general
will be done through the LoadPushDown interface, which currently only has a
pushProjection method. At a later point, when Pig is able to push down filters,
a pushFilter method can be introduced. It is not currently present because we
don't know what the argument will eventually look like when we do push down
filters. The optimization in the patch attached to this jira only extracts
conditions on partition columns, which is needed to be able to call the
LoadMetadata.setPartitionFilter() method, and hence it was added in this patch.
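
The extraction step described above can be pictured with a minimal sketch. The
Condition and Split types are hypothetical and much simpler than Pig's real
expression classes; the sketch only shows how the conjuncts of a filter might
be divided into those on partition columns, which would be handed to the
loader, and the rest, which Pig would still evaluate itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PartitionFilterSplit {

    /** One conjunct of a filter: the column it tests and its text. Hypothetical type. */
    static final class Condition {
        final String column;
        final String text;

        Condition(String column, String text) {
            this.column = column;
            this.text = text;
        }
    }

    /** Result of the split: what goes to the loader and what Pig still evaluates. */
    static final class Split {
        final List<String> pushedToLoader = new ArrayList<>();
        final List<String> keptInPig = new ArrayList<>();
    }

    /** Separates conditions on partition columns from all other conditions. */
    static Split split(List<Condition> conjuncts, Set<String> partitionKeys) {
        Split result = new Split();
        for (Condition c : conjuncts) {
            if (partitionKeys.contains(c.column)) {
                result.pushedToLoader.add(c.text);
            } else {
                result.keptInPig.add(c.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> keys = new HashSet<>(Arrays.asList("date"));
        Split s = split(Arrays.asList(
                new Condition("date", "date == '20091223'"),
                new Condition("age", "age > 30")), keys);
        System.out.println("pushed: " + String.join(" and ", s.pushedToLoader));
        System.out.println("kept:   " + String.join(" and ", s.keptInPig));
    }
}
```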

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.


[jira] Updated: (PIG-761) ERROR 2086 on simple JOIN


 [ 
https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-761:
---

Attachment: PIG-761-1.patch

The problem lies in the interaction between limit and one of the optimizations.
More specifically, the POPackageAnnotator optimization searches for the
matching POLocalRearrange in the map plan and, if it is not found there,
searches in the predecessor's reduce plan. However, a limit introduces a
map-reduce job between the original map-reduce job and its predecessor, so
POPackageAnnotator cannot find the POLocalRearrange. To fix it, we mark the
map-reduce job introduced by limit, and when POPackageAnnotator sees a limit
map-reduce job, it searches for the POLocalRearrange in the limit job's parent.
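
The lookup order described above can be sketched roughly as follows. The Job
class is a hypothetical, heavily simplified stand-in for a map-reduce operator
(a single predecessor, boolean flags instead of real plans); it is not the code
in PIG-761-1.patch.

```java
import java.util.Optional;

final class LimitAwareLookup {

    /** Hypothetical, heavily simplified stand-in for a map-reduce operator. */
    static final class Job {
        final String name;
        final boolean mapHasRearrange;
        final boolean reduceHasRearrange;
        final boolean introducedByLimit;
        final Job predecessor; // null when the job has no predecessor

        Job(String name, boolean mapHasRearrange, boolean reduceHasRearrange,
            boolean introducedByLimit, Job predecessor) {
            this.name = name;
            this.mapHasRearrange = mapHasRearrange;
            this.reduceHasRearrange = reduceHasRearrange;
            this.introducedByLimit = introducedByLimit;
            this.predecessor = predecessor;
        }
    }

    /** Returns where the rearrange operator was found, or empty if it was not. */
    static Optional<String> findRearrange(Job job) {
        // First look in the job's own map plan.
        if (job.mapHasRearrange) {
            return Optional.of(job.name + ".map");
        }
        Job pred = job.predecessor;
        // If the predecessor only exists because of a limit, look at its parent instead.
        if (pred != null && pred.introducedByLimit) {
            pred = pred.predecessor;
        }
        // Then look in the (possibly skipped-to) predecessor's reduce plan.
        if (pred != null && pred.reduceHasRearrange) {
            return Optional.of(pred.name + ".reduce");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Job load = new Job("load", false, true, false, null);
        Job limit = new Job("limit", false, false, true, load);
        Job join = new Job("join", false, false, false, limit);
        System.out.println(findRearrange(join).orElse("not found"));
    }
}
```

Without the check on introducedByLimit, the lookup for the join job in this example would stop at the limit job and find nothing, which corresponds to the situation that produces ERROR 2086.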

 ERROR 2086 on simple JOIN
 -

 Key: PIG-761
 URL: https://issues.apache.org/jira/browse/PIG-761
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
 Environment: mapreduce mode
Reporter: Vadim Zaliva
 Fix For: 0.6.0

 Attachments: PIG-761-1.patch



[jira] Updated: (PIG-761) ERROR 2086 on simple JOIN


 [ 
https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-761:
---

Fix Version/s: 0.6.0
 Assignee: Daniel Dai
   Status: Patch Available  (was: Open)

 ERROR 2086 on simple JOIN
 -

 Key: PIG-761
 URL: https://issues.apache.org/jira/browse/PIG-761
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
 Environment: mapreduce mode
Reporter: Vadim Zaliva
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-761-1.patch


 ERROR 2086: Unexpected problem during optimization. Could not find all 
 LocalRearrange operators.org.apache.pig.impl.logicalLayer.FrontendException: 
 ERROR 1002: Unable to store alias 109
 doing pretty straightforward join in one of my pig scripts. I am able to 
 'dump' both relationship involved in this join. when I try to join them I am 
 getting this error.
 Here is a full log:
 ERROR 2086: Unexpected problem during optimization. Could not find all
 LocalRearrange operators.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable
 to store alias 109
at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:319)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
 2043: Unexpected error during execution.
at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
at org.apache.pig.PigServer.execute(PigServer.java:691)
at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
... 5 more
 Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException:
 ERROR 2086: Unexpected problem during optimization. Could not find all
 LocalRearrange operators.
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at 
 org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
... 8 more
 ERROR 1002: Unable to store alias 398
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable
 to store alias 398
at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:319)
 Caused by: java.lang.NullPointerException
at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:669)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:330)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:41)
at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at

[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces


[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794179#action_12794179
 ] 

Pradeep Kamath commented on PIG-1090:
-

+1 for PIG-1090-5.patch, patch committed to load-store-redesign branch.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.


[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'

2009-12-23 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1136:
-

Attachment: (was: pig-1136-xuefu.patch)

 [zebra] Map Split of Storage info do not allow for leading underscore char '_'
 --

 Key: PIG-1136
 URL: https://issues.apache.org/jira/browse/PIG-1136
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Attachments: pig-1136-xuefu-new.patch


 Some users need map keys that start with an underscore. Pig column names do 
 not allow a leading underscore, but apparently no such restriction is placed 
 on map keys.
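
 The distinction described above can be sketched as two separate identifier 
 rules. This is only an illustration: the class name, the method names, and 
 both regular expressions are assumptions made for the example, not the actual 
 Zebra or Pig grammar.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: names and patterns below are assumptions for
// illustration, not the real Zebra storage-hint parser.
class MapKeyNameSketch {
    // Assumed column rule: must start with a letter.
    private static final Pattern COLUMN_NAME =
            Pattern.compile("[A-Za-z][A-Za-z0-9_]*");
    // Assumed map-key rule: may also start with an underscore.
    private static final Pattern MAP_KEY =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidColumnName(String name) {
        return COLUMN_NAME.matcher(name).matches();
    }

    static boolean isValidMapKey(String key) {
        return MAP_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        // The same text is rejected as a column name but accepted as a map key.
        System.out.println(isValidColumnName("_k1"));
        System.out.println(isValidMapKey("_k1"));
    }
}
```

 Keeping the two rules apart means the column rule stays as strict as before 
 while only the map-key rule is relaxed.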


[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'

2009-12-23 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1136:
-

Attachment: pig-1136-xuefu-new.patch

 [zebra] Map Split of Storage info do not allow for leading underscore char '_'
 --

 Key: PIG-1136
 URL: https://issues.apache.org/jira/browse/PIG-1136
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Attachments: pig-1136-xuefu-new.patch


 Some users need map keys that start with an underscore. Pig column names do 
 not allow a leading underscore, but apparently no such restriction is placed 
 on map keys.


[jira] Created: (PIG-1170) [zebra] end to end test and stress test

2009-12-23 Thread Jing Huang (JIRA)

[zebra] end to end test and stress test
---

 Key: PIG-1170
 URL: https://issues.apache.org/jira/browse/PIG-1170
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


Add test cases for the zebra end-to-end test, the stress test, and the stress 
test verification tool. 
No unit test is needed for this jira.


[jira] Updated: (PIG-1146) Inconsistent column pruning in LOUnion


 [ 
https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1146:


Status: Patch Available  (was: Open)

 Inconsistent column pruning in LOUnion
 --

 Key: PIG-1146
 URL: https://issues.apache.org/jira/browse/PIG-1146
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1146-1.patch, PIG-1146-2.patch


 This happens when we do a union on two relations where one column comes from 
 a loader, the matching column in the other relation comes from a constant, 
 and this column gets pruned. We prune the column from the loader but do not 
 prune the constant, which leaves the union in an inconsistent state. Here is 
 a script:
 {code}
 a = load '1.txt' as (a0, a1:chararray, a2);
 b = load '2.txt' as (b0, b2);
 c = foreach b generate b0, 'hello', b2;
 d = union a, c;
 e = foreach d generate $0, $2;
 dump e;
 {code}
 1.txt: 
 {code}
 ulysses thompson    64  1.90
 katie carson    25  3.65
 {code}
 2.txt:
 {code}
 luke king   0.73
 holly davidson  2.43
 {code}
 expected output:
 (ulysses thompson,1.90)
 (katie carson,3.65)
 (luke king,0.73)
 (holly davidson,2.43)
 real output:
 (ulysses thompson,)
 (katie carson,)
 (luke king,0.73)
 (holly davidson,2.43)
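
 One way to picture the mismatch is a small in-memory model. This is a 
 hypothetical sketch, not Pig's optimizer: plain lists stand in for tuples, and 
 the prune and project helpers are names invented for the example. It assumes 
 the projection keeps addressing columns by their original positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the bug: plain lists stand in for Pig tuples.
class UnionPruneSketch {
    // Drop the column at 'index' from a row, as a pruning pass would.
    static List<String> prune(List<String> row, int index) {
        List<String> out = new ArrayList<>(row);
        out.remove(index);
        return out;
    }

    // Project positional columns; a missing position yields null.
    static List<String> project(List<String> row, int... positions) {
        List<String> out = new ArrayList<>();
        for (int p : positions) {
            out.add(p < row.size() ? row.get(p) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fromLoader = Arrays.asList("ulysses thompson", "64", "1.90");
        List<String> fromConstant = Arrays.asList("luke king", "hello", "0.73");

        // Inconsistent: only the loader branch loses column 1, but the
        // projection still asks for positions 0 and 2 on both branches.
        System.out.println(project(prune(fromLoader, 1), 0, 2));
        System.out.println(project(fromConstant, 0, 2));

        // Consistent: prune both branches, then read positions 0 and 1.
        System.out.println(project(prune(fromLoader, 1), 0, 1));
        System.out.println(project(prune(fromConstant, 1), 0, 1));
    }
}
```

 In this model the loader rows lose their last field only in the inconsistent 
 case, which matches the shape of the real output above.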


[jira] Commented: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)


[ 
https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794204#action_12794204
 ] 

Hadoop QA commented on PIG-1166:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428831/Pig_1166.patch
  against trunk revision 893373.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 420 release audit warnings 
(more than the trunk's current 413 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/console

This message is automatically generated.

 A bit change of the interface of Tuple & DataBag (make the set and append 
 method return this)
 --

 Key: PIG-1166
 URL: https://issues.apache.org/jira/browse/PIG-1166
 Project: Pig
  Issue Type: Improvement
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1166.patch


 When people write unit tests for a UDF, they always need to build a tuple or 
 bag. If we change the interface of Tuple and DataBag so that the set and 
 append methods return this, it can decrease the code size. E.g., people 
 currently have to write the following code to build a Tuple:
 {code}
 Tuple tuple=TupleFactory.getInstance().newTuple(3);
 tuple.set(0,item_0);
 tuple.set(1,item_1);
 tuple.set(2,item_2);
 {code}
 If we change the interface so that the set and append methods return this, 
 we can rewrite the above code like this:
 {code}
 Tuple tuple=TupleFactory.getInstance().newTuple(3);
 tuple.set(0,item_0).set(1,item_1).set(2,item_2);
 {code}
 This interface change won't cause a backward compatibility problem, and I 
 think there's no performance problem either.
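
 The proposed style can be tried out on a small stand-in class. This is a 
 minimal sketch under stated assumptions: ChainableTupleSketch is a 
 hypothetical class written for this example and is not 
 org.apache.pig.data.Tuple; it only shows how returning this enables chaining.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a tuple; this is NOT org.apache.pig.data.Tuple.
class ChainableTupleSketch {
    private final List<Object> fields;

    ChainableTupleSketch(int size) {
        // Pre-size the tuple with null fields, like newTuple(size) would.
        fields = new ArrayList<>(Collections.<Object>nCopies(size, null));
    }

    // Returning 'this' is what makes the calls chainable.
    ChainableTupleSketch set(int index, Object value) {
        fields.set(index, value);
        return this;
    }

    ChainableTupleSketch append(Object value) {
        fields.add(value);
        return this;
    }

    Object get(int index) {
        return fields.get(index);
    }

    int size() {
        return fields.size();
    }

    public static void main(String[] args) {
        ChainableTupleSketch t = new ChainableTupleSketch(3)
                .set(0, "a").set(1, 2).set(2, 3.0).append("extra");
        System.out.println(t.size());
    }
}
```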


[jira] Updated: (PIG-1146) Inconsistent column pruning in LOUnion


 [ 
https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1146:


Status: Open  (was: Patch Available)

 Inconsistent column pruning in LOUnion
 --

 Key: PIG-1146
 URL: https://issues.apache.org/jira/browse/PIG-1146
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1146-1.patch, PIG-1146-2.patch


 This happens when we do a union on two relations where one column comes from 
 a loader, the matching column in the other relation comes from a constant, 
 and this column gets pruned. We prune the column from the loader but do not 
 prune the constant, which leaves the union in an inconsistent state. Here is 
 a script:
 {code}
 a = load '1.txt' as (a0, a1:chararray, a2);
 b = load '2.txt' as (b0, b2);
 c = foreach b generate b0, 'hello', b2;
 d = union a, c;
 e = foreach d generate $0, $2;
 dump e;
 {code}
 1.txt: 
 {code}
 ulysses thompson    64  1.90
 katie carson    25  3.65
 {code}
 2.txt:
 {code}
 luke king   0.73
 holly davidson  2.43
 {code}
 expected output:
 (ulysses thompson,1.90)
 (katie carson,3.65)
 (luke king,0.73)
 (holly davidson,2.43)
 real output:
 (ulysses thompson,)
 (katie carson,)
 (luke king,0.73)
 (holly davidson,2.43)


[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

[
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794212#action_12794212
]

Thejas M Nair commented on PIG-1090:

I have reviewed the changes related to partition filter extraction.
* The case where the load statement has a user-defined schema with different
column names for the partition columns needs to be handled.
* src/org/apache/pig/LoadMetadata.java - I think we should document in the
comments that the load function does not have to implement setPartitionFilter
even if it implements other parts of the LoadMetadata interface, and that it
can communicate this to Pig by returning null from getPartitionKeys.
* src/org/apache/pig/Expression.java - in BinaryExpression.toString(), we need
to add parentheses around the arguments if they are binary expressions, so that
the string represents the correct operator precedence as specified in the
filter condition. E.g., (a = 1 or b = 1) and c = 1 now gets converted to
a = 1 or b = 1 and c = 1.
* src/org/apache/pig/Expression.java - in Const.toString(), it would be better
to use single quotes instead of double quotes around string constants, as
string literals in SQL (standard) and Pig Latin are single-quoted.
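
The last two points can be illustrated with a small expression tree. This is a
hedged sketch only: ExprSketch and its nested classes are hypothetical names
that mimic the shape of an expression tree, not the classes in
src/org/apache/pig/Expression.java, and the sketch does not escape quotes
inside constants.

```java
// Hypothetical sketch; these classes only mimic the shape of an expression
// tree and are not the classes in src/org/apache/pig/Expression.java.
abstract class ExprSketch {
    static ExprSketch column(String name) {
        return new Leaf(name);
    }

    // String constants are rendered with single quotes.
    static ExprSketch constant(String value) {
        return new Leaf("'" + value + "'");
    }

    static ExprSketch binary(ExprSketch lhs, String op, ExprSketch rhs) {
        return new Binary(lhs, op, rhs);
    }

    private static final class Leaf extends ExprSketch {
        private final String text;

        Leaf(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    private static final class Binary extends ExprSketch {
        private final ExprSketch lhs;
        private final String op;
        private final ExprSketch rhs;

        Binary(ExprSketch lhs, String op, ExprSketch rhs) {
            this.lhs = lhs;
            this.op = op;
            this.rhs = rhs;
        }

        // Wrap an operand in parentheses only when it is itself binary.
        private static String wrap(ExprSketch e) {
            return (e instanceof Binary) ? "(" + e + ")" : e.toString();
        }

        @Override
        public String toString() {
            return wrap(lhs) + " " + op + " " + wrap(rhs);
        }
    }

    public static void main(String[] args) {
        ExprSketch a = binary(column("a"), "==", constant("x"));
        ExprSketch b = binary(column("b"), "==", constant("y"));
        ExprSketch c = binary(column("c"), "==", constant("z"));
        // The grouping of (a or b) survives the round trip to a string.
        System.out.println(binary(binary(a, "or", b), "and", c));
    }
}
```

Parenthesizing every binary operand is more verbose than strictly necessary,
but it avoids having to encode operator precedence in toString().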

Update sources to reflect recent changes in load-store interfaces
-

Key: PIG-1090
URL: https://issues.apache.org/jira/browse/PIG-1090
Project: Pig
Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch,
PIG-1090.patch, PIG-1190-5.patch

There have been some changes (as recorded in the Changes Section, Nov 2 2009
sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the
load/store interfaces - this jira is to track the task of making those
changes under src. Changes under test will be addressed in a different jira.


[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces


[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794213#action_12794213
 ] 

Thejas M Nair commented on PIG-1090:


My previous comment is regarding PIG-1090-4.patch.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.


[jira] Updated: (PIG-1164) [zebra]smoke test

2009-12-23 Thread Jing Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1164:


Attachment: smoke.patch

patch for zebra smoke test

 [zebra]smoke test
 -

 Key: PIG-1164
 URL: https://issues.apache.org/jira/browse/PIG-1164
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: smoke.patch


 Change the zebra build.xml file to add a smoke target, 
 and add env.sh and a run script under the zebra/src/test/smoke dir.


[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-23 Thread Sriranjan Manjunath (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794217#action_12794217
]

Sriranjan Manjunath commented on PIG-1102:
--

(3) refers to the case where we try to guess the number of records that fit
into memory and start spilling the other records. InternalCachedBag.java
addresses this case:

+if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
+/* Increment the spill count */
+incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT);
+}
}

cacheLimit holds the number of records that can be held in memory whereas
mContents is the tuple that holds all the records. Here, I do not increment the
counter for every record. Instead I count every n'th record, n being the
cacheLimit.

This, however, does not increment the counter by the buffer size. Incrementing
it by the buffer size would give us a value that is approximately equal to the
number of spilled records.
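
The counting idea can be sketched in isolation. This is a simplified,
hypothetical model and not InternalCachedBag itself: SpillCountSketch and its
members are names made up for the example, a plain long stands in for the
Hadoop counter, and the estimate assumes the buffer size equals cacheLimit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the counting idea; not InternalCachedBag itself.
class SpillCountSketch {
    private final int cacheLimit;       // records assumed to fit in memory
    private final List<Object> contents = new ArrayList<>();
    private long spillCount;            // stands in for a Hadoop counter

    SpillCountSketch(int cacheLimit) {
        this.cacheLimit = cacheLimit;
    }

    void add(Object record) {
        contents.add(record);
        // Count once per cacheLimit records rather than once per record.
        // The zero check also guards the modulo against division by zero.
        if (cacheLimit != 0 && contents.size() % cacheLimit == 0) {
            spillCount++;
        }
    }

    long spillCount() {
        return spillCount;
    }

    // Rough estimate: counter value times the assumed buffer size.
    long approximateSpilledRecords() {
        return spillCount * cacheLimit;
    }

    public static void main(String[] args) {
        SpillCountSketch bag = new SpillCountSketch(3);
        for (int i = 0; i < 10; i++) {
            bag.add(i);
        }
        System.out.println(bag.spillCount());
        System.out.println(bag.approximateSpilledRecords());
    }
}
```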

Collect number of spills per job

Key: PIG-1102
URL: https://issues.apache.org/jira/browse/PIG-1102
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
Fix For: 0.7.0

Attachments: PIG_1102.patch, PIG_1102.patch.1

Memory shortage is one of the main performance issues in Pig. Knowing when we
spill to disk is useful for understanding query performance and also for
seeing how certain changes in Pig affect that.
Other interesting stats to collect would be average CPU usage and max memory
usage, but I am not sure if this information is easily retrievable.
Using Hadoop counters for this would make sense.


[jira] Updated: (PIG-1170) [zebra] end to end test and stress test

2009-12-23 Thread Jing Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1170:


Attachment: e2eStress.patch

zebra e2e and stress test patch.
No unit test is needed.

 [zebra] end to end test and stress test
 ---

 Key: PIG-1170
 URL: https://issues.apache.org/jira/browse/PIG-1170
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: e2eStress.patch


 Add test cases for the zebra end-to-end test, the stress test, and the stress 
 test verification tool. 
 No unit test is needed for this jira.


[jira] Updated: (PIG-1164) [zebra]smoke test


 [ 
https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1164:
--

Status: Patch Available  (was: Open)

 [zebra]smoke test
 -

 Key: PIG-1164
 URL: https://issues.apache.org/jira/browse/PIG-1164
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: smoke.patch


 Change the zebra build.xml file to add a smoke target, 
 and add env.sh and a run script under the zebra/src/test/smoke dir.


[jira] Updated: (PIG-1170) [zebra] end to end test and stress test


 [ 
https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1170:
--

Status: Patch Available  (was: Open)

 [zebra] end to end test and stress test
 ---

 Key: PIG-1170
 URL: https://issues.apache.org/jira/browse/PIG-1170
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: e2eStress.patch


 Add test cases for the zebra end-to-end test, the stress test, and the stress 
 test verification tool. 
 No unit test is needed for this jira.


[jira] Commented: (PIG-1146) Inconsistent column pruning in LOUnion


[ 
https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794232#action_12794232
 ] 

Pradeep Kamath commented on PIG-1146:
-

+1

 Inconsistent column pruning in LOUnion
 --

 Key: PIG-1146
 URL: https://issues.apache.org/jira/browse/PIG-1146
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1146-1.patch, PIG-1146-2.patch


 This happens when we do a union on two relations where one column comes from 
 a loader, the matching column in the other relation comes from a constant, 
 and this column gets pruned. We prune the column from the loader but do not 
 prune the constant, which leaves the union in an inconsistent state. Here is 
 a script:
 {code}
 a = load '1.txt' as (a0, a1:chararray, a2);
 b = load '2.txt' as (b0, b2);
 c = foreach b generate b0, 'hello', b2;
 d = union a, c;
 e = foreach d generate $0, $2;
 dump e;
 {code}
 1.txt: 
 {code}
 ulysses thompson    64  1.90
 katie carson    25  3.65
 {code}
 2.txt:
 {code}
 luke king   0.73
 holly davidson  2.43
 {code}
 expected output:
 (ulysses thompson,1.90)
 (katie carson,3.65)
 (luke king,0.73)
 (holly davidson,2.43)
 real output:
 (ulysses thompson,)
 (katie carson,)
 (luke king,0.73)
 (holly davidson,2.43)


[jira] Updated: (PIG-1148) Move splitable logic from pig latin to InputFormat


 [ 
https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1148:


  Resolution: Fixed
Release Note: split by 'file' is no longer allowed as part of the load 
statement to process each input file in one map. To achieve this, users will 
have to use an InputFormat in the loader that returns one split for the whole file.
Hadoop Flags: [Incompatible change, Reviewed]
  Status: Resolved  (was: Patch Available)

+1, Thanks for the contribution Jeff - I have committed this patch on your 
behalf.

 Move splitable logic from pig latin to InputFormat
 --

 Key: PIG-1148
 URL: https://issues.apache.org/jira/browse/PIG-1148
 Project: Pig
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: PIG-1148.patch





[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces


[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794253#action_12794253
 ] 

Daniel Dai commented on PIG-1090:
-

Regarding PIG-1090-4.patch: in LOLoad.getSchema, we should remove the lines 
that set up pig.loader.signature. In the new design, UDF writers should keep 
track of the signature inside the LoadFunc rather than in the 
Configuration.

The other parts related to signature and push projection look good to me.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.


[jira] Updated: (PIG-1094) Fix unit tests corresponding to source changes so far


 [ 
https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1094:
---

Attachment: PIG-1094_6.patch

This patch replaces PIG-1094_5.patch

As per Pradeep's suggestion, keeping the check in WeightedRangePartitioner.java 
to ensure that input is empty if sample file is empty.
Also merged with latest changes in LSR branch.


 Fix unit tests corresponding to source changes so far
 -

 Key: PIG-1094
 URL: https://issues.apache.org/jira/browse/PIG-1094
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, 
 PIG-1094_4.patch, PIG-1094_5.patch, PIG-1094_6.patch


 The check-ins so far on the load-store-redesign branch have not addressed unit 
 test failures due to interface changes. This jira is to track the task of 
 making the common-case unit tests work with the new interfaces. Some aspects 
 of the new proposal, like using the LoadCaster interface for casting and making 
 local mode work, have not been completed yet. Tests which are failing for those 
 reasons will not be fixed in this jira and will be addressed in the jiras 
 corresponding to those tasks.


[jira] Assigned: (PIG-1169) Problems with some top N queries

2009-12-23 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-1169:
-

Assignee: Richard Ding

 Problems with some top N queries
 

 Key: PIG-1169
 URL: https://issues.apache.org/jira/browse/PIG-1169
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding

 Recently, a couple of problems related to the Top N queries were reported by 
 users.
 * From Chuang Liu:
 We tried to get top N results after a groupby and sort, and got different 
 results with or without storing the full sorted results. Here is a skeleton 
 of our pig script.
 {code}
 raw_data = Load 'input_files' AS (f1, f2, ..., fn);
 grouped = group raw_data by (f1, f2);
 data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
 ordered = order data by value DESC parallel 10;
 topn = limit ordered 10;
 store ordered into 'outputdir/full';
 store topn into 'outputdir/topn';
 {code}
 With the statement 'store ordered ...', top N results are incorrect, but 
 without the statement, results are correct. Has anyone seen this before? I 
 know a similar bug has been fixed in the multi-query release. We are on Pig 
 0.4 and Hadoop 0.20.1.
 * From Corry Haines:
 I am not sure if this is a bug, or something more subtle, but here is the 
 problem that I am having.
 When I LOAD a dataset, change it with an ORDER, LIMIT it, then CROSS it with 
 itself, the results are not correct. I expect to see the cross of the 
 limited, ordered dataset, but instead I see the cross of the limited dataset. 
 Effectively, it's like the LIMIT is being excluded.
 Pig Version: 0.5.0
 Hadoop Version: 0.20.1
 I would greatly appreciate some help, as this is somewhat frustrating.
 Example code (and output) follows:
 {code}
 A = load 'foo' as (f1:int, f2:int, f3:int);
 B = load 'foo' as (f1:int, f2:int, f3:int);
 a = ORDER A BY f1 DESC;
 b = ORDER B BY f1 DESC;
 aa = LIMIT a 1;
 bb = LIMIT b 1;
 C = CROSS aa, bb;
 DUMP C;
 {code}
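
 The result the second reporter expects can be modelled in memory. This is a 
 hypothetical sketch, not Pig code: int[] rows stand in for tuples, the class 
 and method names are invented for the example, and the three sample rows are 
 made up (the contents of 'foo' are not given in the report).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the script; int[] rows stand in for tuples.
class TopNCrossSketch {
    // ORDER rows BY the first field DESC, then LIMIT n.
    static List<int[]> topByFirstField(List<int[]> rows, int n) {
        return rows.stream()
                .sorted(Comparator.comparingInt((int[] r) -> r[0]).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    // CROSS: every left row paired with every right row.
    static List<int[]> cross(List<int[]> left, List<int[]> right) {
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : right) {
                int[] joined = Arrays.copyOf(l, l.length + r.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows for (f1, f2, f3).
        List<int[]> foo = Arrays.asList(
                new int[] {1, 2, 3}, new int[] {4, 2, 1}, new int[] {8, 3, 4});
        List<int[]> aa = topByFirstField(foo, 1);
        List<int[]> bb = topByFirstField(foo, 1);
        for (int[] row : cross(aa, bb)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

 With LIMIT 1 on both sides, the cross should contain exactly one row, built 
 from the row with the largest f1 on each side.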


[jira] Commented: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'

[
https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794281#action_12794281
]

Hadoop QA commented on PIG-1136:

+1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12428866/pig-1136-xuefu-new.patch
against trunk revision 893373.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/156/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/156/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/156/console

This message is automatically generated.

[zebra] Map Split of Storage info do not allow for leading underscore char '_'
--

Key: PIG-1136
URL: https://issues.apache.org/jira/browse/PIG-1136
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
Attachments: pig-1136-xuefu-new.patch

Some users need map keys that start with an underscore. Pig column names do
not allow a leading underscore, but apparently no such restriction is placed
on map keys.


[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'


 [ 
https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1136:
--

   Resolution: Fixed
Fix Version/s: 0.7.0
   Status: Resolved  (was: Patch Available)

Patch committed to Apache trunk.

 [zebra] Map Split of Storage info do not allow for leading underscore char '_'
 --

 Key: PIG-1136
 URL: https://issues.apache.org/jira/browse/PIG-1136
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Fix For: 0.7.0

 Attachments: pig-1136-xuefu-new.patch


 Some users need map keys that start with an underscore. Pig column names do 
 not allow a leading underscore, but apparently no such restriction is placed 
 on map keys.


[jira] Commented: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)


[ 
https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794301#action_12794301
 ] 

Jeff Zhang commented on PIG-1166:
-

I have hit this release audit problem several times. Could anyone tell me what 
the release audit checks, so that I can be more careful next time?

 A bit change of the interface of Tuple & DataBag (make the set and append 
 method return this)
 --

 Key: PIG-1166
 URL: https://issues.apache.org/jira/browse/PIG-1166
 Project: Pig
  Issue Type: Improvement
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1166.patch


 When people write unit tests for a UDF, they always need to build a tuple or 
 bag. If we change the interface of Tuple and DataBag so that the set and 
 append methods return this, it can decrease the code size. E.g., people 
 currently have to write the following code to build a Tuple:
 {code}
 Tuple tuple=TupleFactory.getInstance().newTuple(3);
 tuple.set(0,item_0);
 tuple.set(1,item_1);
 tuple.set(2,item_2);
 {code}
 If we change the interface so that the set and append methods return this, 
 we can rewrite the above code like this:
 {code}
 Tuple tuple=TupleFactory.getInstance().newTuple(3);
 tuple.set(0,item_0).set(1,item_1).set(2,item_2);
 {code}
 This interface change won't cause a backward compatibility problem, and I 
 think there's no performance problem either.


[jira] Commented: (PIG-761) ERROR 2086 on simple JOIN