[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-02 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917246#action_12917246
 ] 

Santhosh Srinivasan commented on PIG-1661:
--

Sure, worth a try.

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use the search-hadoop.com service to make search available across Pig sources, mailing lists, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) via 
> AVRO-626 so this issue is about enabling it for Pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771287#action_12771287
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

I am summarizing my understanding of the patch that has been submitted by hc 
busy.

Root cause: PIG-880 changed the value type of maps in PigStorage from native 
Java types to DataByteArray. As a result of this change, parsing of complex 
types as map values was disabled.

Proposed fix: Revert the changes made as part of PIG-880 so that map values are 
again interpreted as Java types. In addition, change the comparison method to check 
the object type and call the appropriate compareTo method. The latter is 
required to work around the fact that the front-end assigns the value type as 
DataByteArray whereas the backend sees the actual type (Integer, Long, Tuple, 
DataBag, etc.).

Based on this understanding I have the following review comment(s).

Index: 
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBytesRawComparator.java
===

Can you explain the checks in the if and the else branches? Specifically, 
NullableBytesWritable is a subclass of PigNullableWritable. As a result, the check 
in the if branch that both o1 and o2 are not PigNullableWritable is 
confusing: nbw1 and nbw2 are then cast to NullableBytesWritable even though o1 and 
o2 are not PigNullableWritable.

{code}
+// FindBugs is complaining about nulls. This check sequence will prevent nulls from being dereferenced.
+if (o1 != null && o2 != null) {
+
+    // In case the objects are comparable
+    if ((o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable) ||
+        !(o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable)) {
+
+        NullableBytesWritable nbw1 = (NullableBytesWritable) o1;
+        NullableBytesWritable nbw2 = (NullableBytesWritable) o2;
+
+        // If either are null, handle differently.
+        if (!nbw1.isNull() && !nbw2.isNull()) {
+            rc = ((DataByteArray) nbw1.getValueAsPigType()).compareTo((DataByteArray) nbw2.getValueAsPigType());
+        } else {
+            // For sorting purposes two nulls are equal.
+            if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+            else if (nbw1.isNull()) rc = -1;
+            else rc = 1;
+        }
+    } else {
+        // enter here only if both o1 and o2 are non-NullableBytesWritable PigNullableWritable's
+        PigNullableWritable nbw1 = (PigNullableWritable) o1;
+        PigNullableWritable nbw2 = (PigNullableWritable) o2;
+        // If either are null, handle differently.
+        if (!nbw1.isNull() && !nbw2.isNull()) {
+            rc = nbw1.compareTo(nbw2);
+        } else {
+            // For sorting purposes two nulls are equal.
+            if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+            else if (nbw1.isNull()) rc = -1;
+            else rc = 1;
+        }
+    }
+} else {
+    if (o1 == null && o2 == null) { rc = 0; }
+    else if (o1 == null) { rc = -1; }
+    else { rc = 1; }
{code}

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Fix For: 0.5.0
>
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-29 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771442#action_12771442
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

Hc Busy, thanks for taking the time to contribute the patch, explain the details, 
and especially for being patient. A few more questions and details have to be 
cleared up before we commit this patch.

IMHO, the right comparison should be along the lines of checking whether o1 and o2 
are NullableBytesWritable, then checking whether they are PigNullableWritable, and 
finally falling through to error handling code.
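
A minimal sketch of that ordering, reusing the class names and accessors that appear 
in the patch excerpt above; the compareSketch method is an illustration of the 
suggested check sequence, not the submitted patch:

{code}
// Sketch only (not the submitted patch): nulls first, then the
// NullableBytesWritable case, then the PigNullableWritable case, and an error
// for anything else instead of a blind cast. Class names and accessors are the
// ones used in the patch excerpt above.
private int compareSketch(Object o1, Object o2) {
    if (o1 == null || o2 == null) {
        if (o1 == null && o2 == null) return 0;   // two nulls sort equal
        return (o1 == null) ? -1 : 1;
    }
    if (o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable) {
        NullableBytesWritable nbw1 = (NullableBytesWritable) o1;
        NullableBytesWritable nbw2 = (NullableBytesWritable) o2;
        if (nbw1.isNull() || nbw2.isNull()) {
            return (nbw1.isNull() && nbw2.isNull()) ? 0 : (nbw1.isNull() ? -1 : 1);
        }
        return ((DataByteArray) nbw1.getValueAsPigType())
                .compareTo((DataByteArray) nbw2.getValueAsPigType());
    }
    if (o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable) {
        PigNullableWritable pnw1 = (PigNullableWritable) o1;
        PigNullableWritable pnw2 = (PigNullableWritable) o2;
        if (pnw1.isNull() || pnw2.isNull()) {
            return (pnw1.isNull() && pnw2.isNull()) ? 0 : (pnw1.isNull() ? -1 : 1);
        }
        return pnw1.compareTo(pnw2);
    }
    throw new IllegalArgumentException("unexpected key types: "
            + o1.getClass().getName() + ", " + o2.getClass().getName());
}
{code}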

Alan, can you comment on this approach?

There is a more important semantic issue. If the map values are strings and some 
of those strings are numeric, then the values in a map will end up with different 
inferred types. In that case, the load function will break. In addition, 
conversion routines might fail when the compareTo method is invoked. Here is an 
example to illustrate this issue.

Suppose the record is ['key'#1234567890124567]. PIG-880 would treat the value 
as a string and there would be no problem. Now, with the changes reverted, the 
type is inferred as integer and the parsing will fail because the value is too big 
to fit into an integer.

Secondly, assuming that the integer was small enough to be converted, the 
comparison method in DataType.java will return the wrong results when an 
integer and a string are compared. For example, if the records are:

[key#*$]
[key#123]

The first value is treated as a string and the second value is treated as an 
integer. The compareTo method will return 1 to indicate that string > integer, 
while in reality 123 > *$.

Please correct me if the last statement is incorrect or let me know if it needs 
more explanation.
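
For concreteness, a self-contained sketch of the mixed-type comparison follows; the 
MixedTypeCompareSketch class and its type ordinals are my own stand-ins and do not 
use Pig's real DataType constants:

{code}
// Stand-alone illustration; the type ordinals below are made up and are NOT
// Pig's DataType constants. The point is only that once the two values carry
// different inferred types, the result is decided by the type tag, not the value.
public class MixedTypeCompareSketch {

    private static int typeTag(Object o) {
        if (o instanceof Integer) return 10;   // assumed ordinal for int
        if (o instanceof String)  return 55;   // assumed ordinal for chararray
        throw new IllegalArgumentException("unsupported type in this sketch");
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    public static int compare(Object o1, Object o2) {
        int t1 = typeTag(o1);
        int t2 = typeTag(o2);
        if (t1 != t2) {
            // Different inferred types: ordering is decided by the type tag alone.
            return Integer.compare(t1, t2);
        }
        return ((Comparable) o1).compareTo(o2);
    }

    public static void main(String[] args) {
        // Map values parsed from [key#*$] and [key#123]: one is a String, the
        // other an Integer, so "string > integer" below reflects the tags only.
        System.out.println(compare("*$", 123) > 0);   // prints true
    }
}
{code}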

Thoughts/comments from other committers?

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Fix For: 0.5.0
>
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

2009-11-05 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774147#action_12774147
 ] 

Santhosh Srinivasan commented on PIG-1073:
--

If my memory serves me correctly, the logical plan cloning was implemented (by 
me) for cloning inner plans for foreach. As such, the top level plan cloning 
was never tested and some items are marked as TODO (see visit methods for 
LOLoad, LOStore and LOStream).

If you want to use it as you mention in your test cases, then you need to add 
code for cloning the LOLoad, LOStore, LOStream and LOJoin operators.


> LogicalPlanCloner can't clone plan containing LOJoin
> 
>
> Key: PIG-1073
> URL: https://issues.apache.org/jira/browse/PIG-1073
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Ashutosh Chauhan
>
> Add following testcase in LogicalPlanBuilder.java
> public void testLogicalPlanCloner() throws CloneNotSupportedException{
> LogicalPlan lp = buildPlan("C = join ( load 'A') by $0, (load 'B') by 
> $0;");
> LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
> cloner.getClonedPlan();
> }
> and this fails with the following stacktrace:
> java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
> at 
> org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
> at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
> at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
> at 
> org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-05 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774153#action_12774153
 ] 

Santhosh Srinivasan commented on PIG-1065:
--

Answer to Question 1: Pig 1.0 had that syntax and it was retained for backward 
compatibility. Paolo suggested that for uniformity, the 'AS' clause for the 
load statements should be extended to all relational operators. Gradually, the 
column aliasing in the foreach should be removed from the documentation and 
eventually removed from the language.

> In-determinate behaviour of Union when there are 2 non-matching schema's
> 
>
> Key: PIG-1065
> URL: https://issues.apache.org/jira/browse/PIG-1065
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
> Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does an 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> 
> Schema for u0 unknown.
> 
> (1,2)
> (2,3)
> (1)
> (2)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
> at org.apache.pig.PigServer.openIterator(PigServer.java:475)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> 
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> 
> Schema for u0 unknown
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> Here are some questions:
> 1) Why do we allow union if the schemas do not match?
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or that they may get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-10 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775968#action_12775968
 ] 

Santhosh Srinivasan commented on PIG-1065:
--

The schema will then correspond to the prefix, as it is implemented today. For 
example, if the AS clause is defined for flatten($1), $1 flattens to 10 columns, 
and the AS clause specifies 3 columns, then the prefix is used and the remaining 
columns are left undefined.

> In-determinate behaviour of Union when there are 2 non-matching schema's
> 
>
> Key: PIG-1065
> URL: https://issues.apache.org/jira/browse/PIG-1065
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
> Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does an 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> 
> Schema for u0 unknown.
> 
> (1,2)
> (2,3)
> (1)
> (2)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
> at org.apache.pig.PigServer.openIterator(PigServer.java:475)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> 
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> 
> Schema for u0 unknown
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> Here are some questions:
> 1) Why do we allow union if the schemas do not match?
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or that they may get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-10 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776098#action_12776098
 ] 

Santhosh Srinivasan commented on PIG-1065:
--

bq. Aliasing inside foreach is hugely useful for readability. Are you 
suggesting removing the ability to assign aliases inside a foreach, or just to 
change/assign schemas?

For consistency, all relational operators should support the AS clause. 
Gradually, per-column aliasing in foreach should be removed from 
the documentation, deprecated, and eventually removed. This is a long-term 
recommendation.

> In-determinate behaviour of Union when there are 2 non-matching schema's
> 
>
> Key: PIG-1065
> URL: https://issues.apache.org/jira/browse/PIG-1065
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
> Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does an 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> 
> Schema for u0 unknown.
> 
> (1,2)
> (2,3)
> (1)
> (2)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
> at org.apache.pig.PigServer.openIterator(PigServer.java:475)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> 
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> 
> Schema for u0 unknown
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> Here are some questions:
> 1) Why do we allow union if the schemas do not match?
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or that they may get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798917#action_12798917
 ] 

Santhosh Srinivasan commented on PIG-1117:
--

+1 on making it part of the main piggybank. We should not be creating a separate 
directory just to handle Hive.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Gerrit Jansen van Vuuren
>Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC 
> tables. This is needed for a project that I'm working on because all our data 
> is stored using the Hive thrift-serialized Columnar RC format. I have looked 
> at the piggybank but did not find any implementation that could do this. 
> We've been running it on our cluster for the last week and have worked out 
> most bugs.
>  
> There are still some improvements to be done that I would need, like setting 
> the number of mappers based on date partitioning. It has been optimized to 
> read only specific columns and can churn through a data set almost 8 times 
> faster with this improvement because not all column data is read.
> I would like to contribute the class to the piggybank; can you guide me on 
> what I need to do?
> I've used Hive-specific classes to implement this; is it possible to add this 
> to the piggybank ivy build for automatic download of the dependencies?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-26 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850342#action_12850342
 ] 

Santhosh Srinivasan commented on PIG-1331:
--

Jay, 

In PIG-823 there was a discussion around how Owl is different from Hive's 
metastore. Is that still true today? If not, can you elaborate on the key 
differences between the two systems?

Thanks,
Santhosh

> Owl Hadoop Table Management Service
> ---
>
> Key: PIG-1331
> URL: https://issues.apache.org/jira/browse/PIG-1331
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
>
> This JIRA is a proposal to create a Hadoop table management service: Owl. 
> Today, MapReduce and Pig applications interact directly with HDFS 
> directories and files and must deal with low-level data management issues 
> such as storage format, serialization/compression schemes, data layout, and 
> efficient data access, often with different solutions. Owl aims to 
> provide a standard way to address this issue and abstracts away the 
> complexities of reading/writing huge amounts of data from/to HDFS.
> Owl has a data access API that is modeled after the traditional Hadoop 
> InputFormat and a management API to manipulate Owl objects.  This JIRA is 
> related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata 
> store.  Owl integrates with different storage modules like Zebra via a 
> pluggable architecture.
>  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
> time, it makes sense to move it to a Hadoop subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-26 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850355#action_12850355
 ] 

Santhosh Srinivasan commented on PIG-1331:
--

Thanks for the information. Looking at the Hive design at 
http://wiki.apache.org/hadoop/Hive/Design , it looks like there is no 
significant difference between Owl and Hive. As you indicate, I hope we 
converge to a common metastore for Hadoop.



> Owl Hadoop Table Management Service
> ---
>
> Key: PIG-1331
> URL: https://issues.apache.org/jira/browse/PIG-1331
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
>
> This JIRA is a proposal to create a Hadoop table management service: Owl. 
> Today, MapReduce and Pig applications interact directly with HDFS 
> directories and files and must deal with low-level data management issues 
> such as storage format, serialization/compression schemes, data layout, and 
> efficient data access, often with different solutions. Owl aims to 
> provide a standard way to address this issue and abstracts away the 
> complexities of reading/writing huge amounts of data from/to HDFS.
> Owl has a data access API that is modeled after the traditional Hadoop 
> InputFormat and a management API to manipulate Owl objects.  This JIRA is 
> related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata 
> store.  Owl integrates with different storage modules like Zebra via a 
> pluggable architecture.
>  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
> time, it makes sense to move it to a Hadoop subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1344) PigStorage should be able to read back complex data containing delimiters created by PigStorage

2010-03-30 Thread Santhosh Srinivasan (JIRA)
PigStorage should be able to read back complex data containing delimiters 
created by PigStorage
---

 Key: PIG-1344
 URL: https://issues.apache.org/jira/browse/PIG-1344
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Santhosh Srinivasan
Assignee: Daniel Dai
 Fix For: 0.8.0


With Pig 0.7, TextDataParser has been removed and the logic to parse 
complex data types has moved to Utf8StorageConverter. However, this does not 
handle the case where the complex data types contain delimiters ('{', 
'}', ',', '(', ')', '[', ']', '#'). Fixing this issue will make PigStorage 
self-contained and more usable.
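
As a concrete illustration of the ambiguity, here is a self-contained sketch with a 
deliberately naive, escape-free serializer and parser of my own; it does not use 
PigStorage or Utf8StorageConverter:

{code}
// Stand-alone illustration; serialize/parse here are deliberately naive and do
// NOT mimic PigStorage or Utf8StorageConverter internals. They only show that
// an escape-free text form stops round-tripping once a value contains a delimiter.
public class DelimiterRoundTripSketch {

    // Write a single tuple in a PigStorage-like text form: (f1,f2,...)
    static String serialize(String... fields) {
        return "(" + String.join(",", fields) + ")";
    }

    // Naive parse: strip the parentheses, then split on ','.
    static String[] parse(String text) {
        return text.substring(1, text.length() - 1).split(",");
    }

    public static void main(String[] args) {
        String[] original = {"a,b", "c"};      // first field itself contains ','
        String text = serialize(original);     // -> (a,b,c)
        String[] parsed = parse(text);         // -> 3 fields instead of 2
        System.out.println(text + " parses back into " + parsed.length + " fields");
    }
}
{code}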

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-520) Physical plan cloning could lead to out of order connections

2008-11-10 Thread Santhosh Srinivasan (JIRA)
Physical plan cloning could lead to out of order connections


 Key: PIG-520
 URL: https://issues.apache.org/jira/browse/PIG-520
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
 Fix For: types_branch


In the PhysicalPlan clone method, the algorithm used is as follows:

1. Create an empty plan

2. For all the operators in the plan, 
   a. clone the operator 
   b. add it to the plan

3. For all the keys (from_node) in the map mFromEdges
   a. For all the values (to_node) for this key
  i. Connect the from_node to the to_node in the plan

Since there are no guarantees on the order in which the from_nodes in mFromEdges 
are processed, we could get out-of-order connections in the graph.

Example:

If we have a UDF with two arguments, like myUDF(a, b), in a plan, the order in 
which the nodes are processed will determine the cloned plan. We could end up 
with 

myUDF(a, b)

OR 

myUDF(b, a)

depending on the order in which a and b appear in the mFromEdges lookup table.
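
A self-contained sketch of the nondeterminism, using a plain HashMap as a stand-in 
for mFromEdges; the CloneOrderSketch class below is illustrative only, not the 
actual PhysicalPlan clone code:

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only, NOT the real PhysicalPlan code. Edges live in a
// plain HashMap keyed by from_node, so step 3 of the algorithm walks the keys
// in an unspecified order and the cloned UDF node may record its inputs as
// (a, b) or (b, a).
public class CloneOrderSketch {
    public static void main(String[] args) {
        // from_node -> list of to_nodes, mirroring mFromEdges
        Map<String, List<String>> fromEdges = new HashMap<>();
        fromEdges.put("a", List.of("myUDF"));
        fromEdges.put("b", List.of("myUDF"));

        // Step 3: reconnect edges in whatever order the map hands back its keys.
        Map<String, List<String>> clonedInputs = new HashMap<>();
        for (String from : fromEdges.keySet()) {
            for (String to : fromEdges.get(from)) {
                clonedInputs.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
            }
        }

        // Depending on iteration order this prints [a, b] or [b, a], which is
        // exactly the myUDF(a, b) vs myUDF(b, a) ambiguity described above.
        System.out.println(clonedInputs.get("myUDF"));
    }
}
{code}

Keeping the edges in an insertion-ordered structure, or rebuilding the connections 
by walking each operator's own input list, would make the clone deterministic.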

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-522) Problem in using negative (-a)

2008-11-10 Thread Santhosh Srinivasan (JIRA)
Problem in using negative (-a)
--

 Key: PIG-522
 URL: https://issues.apache.org/jira/browse/PIG-522
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
 Fix For: types_branch


Using the unary negative, i.e., -a, leads to exceptions. 

{code}
grunt> a = load 'myfile' as  (name:chararray, age:int, gpa:double);
grunt> b = foreach a generate -gpa;
grunt> dump b;

2008-11-10 16:38:12,517 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2008-11-10 16:38:37,539 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Map reduce job failed
2008-11-10 16:38:37,540 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Job failed!
2008-11-10 16:38:37,542 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
message from task (map) task_200809241441_19426_m_00java.io.IOException: 
Received Error while processing the map plan.
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-449) Schemas for bags should contain tuples all the time

2008-11-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646712#action_12646712
 ] 

Santhosh Srinivasan commented on PIG-449:
-

Currently, bags in Pig are containers of tuples. Accessing elements inside a 
bag should translate to accessing elements inside the tuple contained in the 
bag. In addition, accessing tuples inside a bag should be restricted to the 
FLATTEN keyword in a FOREACH statement. A few examples shown below will 
demonstrate the point.

{code}
a = load '/user/pig/data/student.data' using PigStorage(' ') as (name, age, gpa);
b = foreach a generate {(16, 4.0e-2, 'hello')} as b:{t:(i: int, d: double, c: chararray)};
c = foreach b generate b.i; -- Here b.i should generate a bag of integers by accessing the column called 'i' inside each tuple
d = foreach b generate b.t; -- This should be outlawed as the tuple inside the bag does not have a column called 't', although the tuples inside the bag are named 't'
{code}

Summary:

1. The frontend should translate access to columns in a bag to columns inside 
the tuple in the bag
2. The frontend should prevent access to tuples inside the bag via projections 
and allow access only via the FLATTEN keyword

Thoughts/suggestions/comments are welcome.

> Schemas for bags should contain tuples all the time
> ---
>
> Key: PIG-449
> URL: https://issues.apache.org/jira/browse/PIG-449
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
>
> The front end treats relations as operators that return bags.  When the 
> schema of a load statement is specified, the bag is associated with the 
> schema specified by the user. Ideally, the schema corresponds to the tuple 
> contained in the bag. 
> With PIG-380, the schema for bag constants is computed by the front end. The 
> schema for the bag contains the tuple, which in turn contains the schema of 
> the columns. This results in errors when columns are accessed directly, just 
> as with load statements.
> The front end should then treat access to the columns as a double 
> dereference, i.e., access the tuple inside the bag and then the column inside 
> the tuple.
> {code}
> grunt> a = load '/user/sms/data/student.data' using PigStorage(' ') as (name, 
> age, gpa);
> grunt> b = foreach a generate {(16, 4.0e-2, 'hello')} as b:{t:(i: int, d: 
> double, c: chararray)};
> grunt> describe b;
> b: {b: {t: (i: integer,d: double,c: chararray)}}
> grunt> c = foreach b generate b.i;
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - 
> java.io.IOException: Invalid alias: i in {t: (i: integer,d: double,c: 
> chararray)}
> at org.apache.pig.PigServer.parseQuery(PigServer.java:293)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:258)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:432)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:242)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:93)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
> at org.apache.pig.Main.main(Main.java:282)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: i in {t: (i: integer,d: double,c: chararray)}
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:5851)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5709)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BracketedSimpleProj(QueryParser.java:5242)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4040)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3909)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3863)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3772)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3698)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3664)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3590)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3500)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3457)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2933)
> at 
> 

[jira] Updated: (PIG-512) Expressions in foreach lead to errors

2008-11-13 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-512:


Patch Info: [Patch Available]

> Expressions in foreach lead to errors
> -
>
> Key: PIG-512
> URL: https://issues.apache.org/jira/browse/PIG-512
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-512.patch
>
>
> Use of expressions that share the same sub-expressions in foreach leads to 
> translation errors. This issue is caused by sharing operators across 
> nested plans. To remedy this issue, logical operators should be cloned and 
> not shared across plans.
> {code}
> grunt> a = load 'a' as (x, y, z);
> grunt> b = foreach a {
> >> exp1 = x + y;
> >> exp2 = exp1 + x;
> >> generate exp1, exp2;
> >> }
> grunt> explain b;
> 2008-10-30 15:38:40,257 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> Logical Plan:
> Store sms-Thu Oct 30 11:27:27 PDT 2008-2609 Schema: {double,double} Type: 
> Unknown
> |
> |---ForEach sms-Thu Oct 30 11:27:27 PDT 2008-2605 Schema: {double,double} 
> Type: bag
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double Type: 
> double
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2606 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2607 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 Projections: 
> [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2603 FieldSchema: double Type: 
> double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2601 Projections:  [*]  
> Overloaded: false FieldSchema: double Type: double
> |   |   Input: Add sms-Thu Oct 30 11:27:27 PDT 2008-2600|
> |   |   |---Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 
> Projections: [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 
> Projections: [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2608 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2602 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |
> |---Load sms-Thu Oct 30 11:27:27 PDT 2008-2597 Schema: {x: bytearray,y: 
> bytearray,z: bytearray} Type: bag
> 2008-10-30 15:38:40,272 [main] ERROR org.apache.pig.impl.plan.OperatorPlan - 
> Attempt to give operator of type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
>  multiple outputs.  This operator does not support multiple outputs.
> 2008-10-30 15:38:40,272 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor
>  - Invalid physical operators in the physical planAttempt to give operator of 
> type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
>  multiple outputs.  This operator does not support multiple outputs.
> 2008-10-30 15:38:40,273 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> java.io.IOException: Unable to explain alias b 
> [org.apache.pig.impl.plan.VisitorException]
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:235)
> at org.apache.pig.PigServer.compilePp(PigServer.java:731)
> at org.apache.pig.PigServer.explain(PigServer.java:495)
> at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:155)
> at 
> org.apache.pig.too

[jira] Updated: (PIG-512) Expressions in foreach lead to errors

2008-11-13 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-512:


Attachment: PIG-512.patch

Attached patch (PIG-512.patch) includes the following:

1. Logical Plan cloning which in turn includes logical operator cloning. 
Caveat: Only logical plan cloning is allowed and LogicalPlanCloner is the 
supported mechanism for cloning logical plans. The following operators do not 
support cloning:
   i. LOLoad
   ii. LOStore
   iii. LOStream

2. A visitor to remove redundant project( * ) operators that occur between two 
relational operators or between two expression operators.

3. Unit tests for 1 and 2

All unit tests pass.

> Expressions in foreach lead to errors
> -
>
> Key: PIG-512
> URL: https://issues.apache.org/jira/browse/PIG-512
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-512.patch
>
>
> Use of expressions that share the same sub-expressions in foreach leads to 
> translation errors. This issue is caused by sharing operators across 
> nested plans. To remedy this issue, logical operators should be cloned and 
> not shared across plans.
> {code}
> grunt> a = load 'a' as (x, y, z);
> grunt> b = foreach a {
> >> exp1 = x + y;
> >> exp2 = exp1 + x;
> >> generate exp1, exp2;
> >> }
> grunt> explain b;
> 2008-10-30 15:38:40,257 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> Logical Plan:
> Store sms-Thu Oct 30 11:27:27 PDT 2008-2609 Schema: {double,double} Type: 
> Unknown
> |
> |---ForEach sms-Thu Oct 30 11:27:27 PDT 2008-2605 Schema: {double,double} 
> Type: bag
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double Type: 
> double
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2606 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2607 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 Projections: 
> [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2603 FieldSchema: double Type: 
> double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2601 Projections:  [*]  
> Overloaded: false FieldSchema: double Type: double
> |   |   Input: Add sms-Thu Oct 30 11:27:27 PDT 2008-2600|
> |   |   |---Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 
> Projections: [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 
> Projections: [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2608 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2602 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |
> |---Load sms-Thu Oct 30 11:27:27 PDT 2008-2597 Schema: {x: bytearray,y: 
> bytearray,z: bytearray} Type: bag
> 2008-10-30 15:38:40,272 [main] ERROR org.apache.pig.impl.plan.OperatorPlan - 
> Attempt to give operator of type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
>  multiple outputs.  This operator does not support multiple outputs.
> 2008-10-30 15:38:40,272 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor
>  - Invalid physical operators in the physical planAttempt to give operator of 
> type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
>  multiple outputs.  This operator does not support multiple outputs.
> 2008-10-30 15:38:40

[jira] Updated: (PIG-512) Expressions in foreach lead to errors

2008-11-13 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-512:


Attachment: PIG-512_1.patch

Updated patch with SVN changes since last patch. All unit tests pass.

> Expressions in foreach lead to errors
> -
>
> Key: PIG-512
> URL: https://issues.apache.org/jira/browse/PIG-512
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-512.patch, PIG-512_1.patch
>
>
> Use of expressions that share the same sub-expressions in foreach leads to 
> translation errors. This issue is caused by sharing operators across 
> nested plans. To remedy this issue, logical operators should be cloned and 
> not shared across plans.
> {code}
> grunt> a = load 'a' as (x, y, z);
> grunt> b = foreach a {
> >> exp1 = x + y;
> >> exp2 = exp1 + x;
> >> generate exp1, exp2;
> >> }
> grunt> explain b;
> 2008-10-30 15:38:40,257 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> Logical Plan:
> Store sms-Thu Oct 30 11:27:27 PDT 2008-2609 Schema: {double,double} Type: 
> Unknown
> |
> |---ForEach sms-Thu Oct 30 11:27:27 PDT 2008-2605 Schema: {double,double} 
> Type: bag
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double Type: 
> double
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2606 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2607 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 Projections: 
> [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2603 FieldSchema: double Type: 
> double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2601 Projections:  [*]  
> Overloaded: false FieldSchema: double Type: double
> |   |   Input: Add sms-Thu Oct 30 11:27:27 PDT 2008-2600|
> |   |   |---Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 
> Projections: [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 
> Projections: [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2608 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2602 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |
> |---Load sms-Thu Oct 30 11:27:27 PDT 2008-2597 Schema: {x: bytearray,y: 
> bytearray,z: bytearray} Type: bag
> 2008-10-30 15:38:40,272 [main] ERROR org.apache.pig.impl.plan.OperatorPlan - 
> Attempt to give operator of type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
>  multiple outputs.  This operator does not support multiple outputs.
> 2008-10-30 15:38:40,272 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor
>  - Invalid physical operators in the physical planAttempt to give operator of 
> type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
>  multiple outputs.  This operator does not support multiple outputs.
> 2008-10-30 15:38:40,273 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> java.io.IOException: Unable to explain alias b 
> [org.apache.pig.impl.plan.VisitorException]
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:235)
> at org.apache.pig.PigServer.compilePp(PigServer.java:731)
> at org.apache.pig.PigServer.explain(PigServer.java:495)
> at 
> org.apache.pig.tools.gru

[jira] Created: (PIG-527) Pig does not support storing nested data using default storage

2008-11-13 Thread Santhosh Srinivasan (JIRA)
Pig does not support storing nested data using default storage
--

 Key: PIG-527
 URL: https://issues.apache.org/jira/browse/PIG-527
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


Pig does not allow storing nested data using the default storage function 
(PigStorage)

{code}

grunt> a = load 'student_tab.data' as (name, age, gpa);
grunt> b = group a by age;
grunt> store b into '/user/sms/data/complex.data';

2008-11-13 16:21:17,711 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 50% complete

2008-11-13 16:21:52,747 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Map reduce job failed

2008-11-13 16:21:52,747 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Job failed!

2008-11-13 16:21:52,764 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
message from task (reduce) task_200809241441_21188_r_00java.io.IOException: 
Cannot store a non-flat tuple using PigStorage

at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:196)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:116)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:300)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:238)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

2008-11-13 16:21:52,764 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
message from task (reduce) task_200809241441_21188_r_00java.io.IOException: 
Cannot store a non-flat tuple using PigStorage

{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-528) Schema returned in UDF is not used by Pig

2008-11-13 Thread Santhosh Srinivasan (JIRA)
Schema returned in UDF is not used by Pig
-

 Key: PIG-528
 URL: https://issues.apache.org/jira/browse/PIG-528
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


Using an identity UDF that returns the input schema as the output schema leads 
to schema truncation in Pig.

{code}

grunt> a = load '/tudent_tab.data' as (name, age, gpa);
grunt> b = foreach a generate IdentityFunc(name, age);

grunt> describe b;
b: {name: bytearray}
--It should have been b:{(name: bytearray, age: bytearray)}
{code}

The outputSchema method in IdentityFunc is given below:

{code}
@Override
public Schema outputSchema(Schema input) {
return input;  
}
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-512) Expressions in foreach lead to errors

2008-11-14 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647728#action_12647728
 ] 

Santhosh Srinivasan commented on PIG-512:
-

Yes, the visit(LOCross cs) method can be removed from LogicalPlanCloneHelper.java. 
It's a placeholder in case we change LOCross 
to have additional member variables. For now, it's redundant.

The change in the type checker is not related to the cloning. It's a bug that I 
uncovered while I was testing unary expressions 
as part of cloning. The insertCastForUniOp method in the type checker had a bug 
where the newly created cast operator was 
not added to the plan before the cast was inserted between the unary expression 
and the unary expression's input. I fixed it by adding 
the cast operator to the plan and patching the reference in the unary 
expression to point to the cast.
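
A self-contained sketch of the ordering the fix enforces; the MiniPlan class below 
is a hypothetical stand-in and is not Pig's OperatorPlan API:

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical MiniPlan, NOT Pig's OperatorPlan API. It only illustrates the
// ordering the fix enforces: a newly created operator must be added to the plan
// before it is wired between two existing operators.
public class InsertCastSketch {

    static class MiniPlan {
        private final List<String> ops = new ArrayList<>();
        private final Map<String, String> next = new HashMap<>();   // one successor per op

        void add(String op) { ops.add(op); }

        void connect(String from, String to) {
            requireMember(from);
            requireMember(to);
            next.put(from, to);
        }

        // Splice newOp into the existing from -> to edge.
        void insertBetween(String from, String newOp, String to) {
            requireMember(newOp);   // the original bug: the cast had not been add()-ed yet
            next.put(from, newOp);
            next.put(newOp, to);
        }

        String successorOf(String op) { return next.get(op); }

        private void requireMember(String op) {
            if (!ops.contains(op)) {
                throw new IllegalStateException(op + " is not part of the plan");
            }
        }
    }

    public static void main(String[] args) {
        MiniPlan plan = new MiniPlan();
        plan.add("project(gpa)");
        plan.add("negate");
        plan.connect("project(gpa)", "negate");

        String cast = "cast(double)";
        plan.add(cast);                                       // the fix: add first ...
        plan.insertBetween("project(gpa)", cast, "negate");   // ... then splice it in
        System.out.println(plan.successorOf("project(gpa)")); // prints cast(double)
    }
}
{code}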

I would like to thank Pradeep Kamath who had done the ground work in an earlier 
attempt at cloning logical plans.

> Expressions in foreach lead to errors
> -
>
> Key: PIG-512
> URL: https://issues.apache.org/jira/browse/PIG-512
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-512.patch, PIG-512_1.patch
>
>
> Use of expressions that share the same sub-expressions in foreach leads to 
> translation errors. This issue is caused by sharing operators across 
> nested plans. To remedy this issue, logical operators should be cloned and 
> not shared across plans.
> {code}
> grunt> a = load 'a' as (x, y, z);
> grunt> b = foreach a {
> >> exp1 = x + y;
> >> exp2 = exp1 + x;
> >> generate exp1, exp2;
> >> }
> grunt> explain b;
> 2008-10-30 15:38:40,257 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> 2008-10-30 15:38:40,258 [main] WARN  org.apache.pig.PigServer - bytearray is 
> implicitly casted to double under LOAdd Operator
> Logical Plan:
> Store sms-Thu Oct 30 11:27:27 PDT 2008-2609 Schema: {double,double} Type: 
> Unknown
> |
> |---ForEach sms-Thu Oct 30 11:27:27 PDT 2008-2605 Schema: {double,double} 
> Type: bag
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double Type: 
> double
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2606 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2607 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 Projections: 
> [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   Add sms-Thu Oct 30 11:27:27 PDT 2008-2603 FieldSchema: double Type: 
> double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2601 Projections:  [*]  
> Overloaded: false FieldSchema: double Type: double
> |   |   Input: Add sms-Thu Oct 30 11:27:27 PDT 2008-2600|
> |   |   |---Add sms-Thu Oct 30 11:27:27 PDT 2008-2600 FieldSchema: double 
> Type: double
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2598 
> Projections: [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |   |
> |   |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2599 
> Projections: [1] Overloaded: false FieldSchema: y: bytearray Type: bytearray
> |   |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |   |
> |   |---Cast sms-Thu Oct 30 11:27:27 PDT 2008-2608 FieldSchema: double 
> Type: double
> |   |
> |   |---Project sms-Thu Oct 30 11:27:27 PDT 2008-2602 Projections: 
> [0] Overloaded: false FieldSchema: x: bytearray Type: bytearray
> |   Input: Load sms-Thu Oct 30 11:27:27 PDT 2008-2597
> |
> |---Load sms-Thu Oct 30 11:27:27 PDT 2008-2597 Schema: {x: bytearray,y: 
> bytearray,z: bytearray} Type: bag
> 2008-10-30 15:38:40,272 [main] ERROR org.apache.pig.impl.plan.OperatorPlan - 
> Attempt to give operator of type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
>  multiple outputs.  This operator does not support multiple outputs.
> 2008-10-30 15:38:40,272 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToP

[jira] Updated: (PIG-528) Schema returned in UDF is not used by Pig

2008-11-14 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-528:


Attachment: PIG-528.patch

Attached patch (PIG-528.patch) contains the following:

1. Fix for handling schemas returned by UDFs
2. Unit test cases for the fix

All unit test cases passed.
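
Independent of the Pig-side fix, a common way for a UDF to declare that it 
returns a tuple carrying all of its input fields is to wrap the input schema in 
a single tuple-typed field. The sketch below only illustrates that idiom; the 
IdentityFunc shown here is not the attached patch, and the Schema/FieldSchema 
usage is assumed from the current front-end classes.

{code}
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Illustrative identity-style UDF (not the PIG-528 patch): outputSchema()
// wraps the whole input schema in one tuple-typed field so that both
// projected columns, e.g. (name, age), survive in the declared output.
public class IdentityFunc extends EvalFunc<Tuple> {

    @Override
    public Tuple exec(Tuple input) {
        return input;   // pass the tuple through unchanged
    }

    @Override
    public Schema outputSchema(Schema input) {
        try {
            // One output field of type tuple, carrying the input's field schemas.
            return new Schema(new Schema.FieldSchema(null, input, DataType.TUPLE));
        } catch (Exception e) {
            // Fall back to "schema unknown" if the field schema cannot be built.
            return null;
        }
    }
}
{code}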

> Schema returned in UDF is not used by Pig
> -
>
> Key: PIG-528
> URL: https://issues.apache.org/jira/browse/PIG-528
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-528.patch
>
>
> Using an identity UDF that returns the input schema as the output schema 
> leads to schema truncation in Pig.
> {code}
> grunt> a = load '/tudent_tab.data' as (name, age, gpa);
> grunt> b = foreach a generate IdentityFunc(name, age);
> grunt> describe b;
> b: {name: bytearray}
> --It should have been b:{(name: bytearray, age: bytearray)}
> {code}
> The outputSchema method in IdentityFunc is given below:
> {code}
> @Override
> public Schema outputSchema(Schema input) {
> return input;  
> }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-528) Schema returned in UDF is not used by Pig

2008-11-14 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-528:


Patch Info: [Patch Available]

> Schema returned in UDF is not used by Pig
> -
>
> Key: PIG-528
> URL: https://issues.apache.org/jira/browse/PIG-528
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-528.patch
>
>
> Using an identity UDF that returns the input schema as the output schema 
> leads to schema truncation in Pig.
> {code}
> grunt> a = load '/tudent_tab.data' as (name, age, gpa);
> grunt> b = foreach a generate IdentityFunc(name, age);
> grunt> describe b;
> b: {name: bytearray}
> --It should have been b:{(name: bytearray, age: bytearray)}
> {code}
> The outputSchema method in IdentityFunc is given below:
> {code}
> @Override
> public Schema outputSchema(Schema input) {
> return input;  
> }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-528) Schema returned in UDF is not used by Pig

2008-11-14 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-528:


Attachment: PIG-528_1.patch

Updated patch that resolves the merge conflicts caused by the earlier patch.

> Schema returned in UDF is not used by Pig
> -
>
> Key: PIG-528
> URL: https://issues.apache.org/jira/browse/PIG-528
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-528.patch, PIG-528_1.patch
>
>
> Using an identity UDF that returns the input schema as the output schema 
> leads to schema truncation in Pig.
> {code}
> grunt> a = load '/tudent_tab.data' as (name, age, gpa);
> grunt> b = foreach a generate IdentityFunc(name, age);
> grunt> describe b;
> b: {name: bytearray}
> --It should have been b:{(name: bytearray, age: bytearray)}
> {code}
> The outputSchema method in IdentityFunc is given below:
> {code}
> @Override
> public Schema outputSchema(Schema input) {
> return input;  
> }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-527) Pig does not support storing nested data using default storage

2008-11-17 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-527:


Attachment: PIG-527.patch

Attached patch (PIG-527.patch) addresses the following:

1. PigStorage allows storage of nested data
2. Unit tests to test the same

All unit tests pass
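
For readers who want a feel for what "storing nested data" means here, the 
sketch below renders a nested field recursively the way Pig prints values -- 
tuples as (...), bags as {...}, maps as [key#value] -- instead of throwing on 
non-flat tuples. It is only an illustration of the idea, not the attached patch 
or PigStorage's actual putNext code.

{code}
import java.util.Map;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

// Illustration only: recursively render a (possibly nested) field.
public class NestedRender {
    public static String render(Object field) {
        if (field == null) {
            return "";
        }
        if (field instanceof Tuple) {
            Tuple t = (Tuple) field;
            StringBuilder sb = new StringBuilder("(");
            try {
                for (int i = 0; i < t.size(); i++) {
                    if (i > 0) sb.append(",");
                    sb.append(render(t.get(i)));
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            return sb.append(")").toString();
        }
        if (field instanceof DataBag) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Tuple t : (DataBag) field) {
                if (!first) sb.append(",");
                sb.append(render(t));
                first = false;
            }
            return sb.append("}").toString();
        }
        if (field instanceof Map) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) field).entrySet()) {
                if (!first) sb.append(",");
                sb.append(e.getKey()).append("#").append(render(e.getValue()));
                first = false;
            }
            return sb.append("]").toString();
        }
        return field.toString();   // scalars: toString as before
    }
}
{code}

With grouped data like that in the quoted script below, a row whose second 
field is a bag of tuples would then serialize on one line instead of triggering 
the "Cannot store a non-flat tuple" failure.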

> Pig does not support storing nested data using default storage
> --
>
> Key: PIG-527
> URL: https://issues.apache.org/jira/browse/PIG-527
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-527.patch
>
>
> Pig does not allow storing nested data using the default storage function 
> (PigStorage)
> {code}
> grunt> a = load 'student_tab.data' as (name, age, gpa);
> grunt> b = group a by age;
> grunt> store b into '/user/sms/data/complex.data';
> 2008-11-13 16:21:17,711 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 50% complete
> 2008-11-13 16:21:52,747 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Map reduce job failed
> 2008-11-13 16:21:52,747 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Job failed!
> 2008-11-13 16:21:52,764 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (reduce) 
> task_200809241441_21188_r_00java.io.IOException: Cannot store a non-flat 
> tuple using PigStorage
> at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:196)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:116)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
> at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:300)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:238)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> 2008-11-13 16:21:52,764 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (reduce) 
> task_200809241441_21188_r_00java.io.IOException: Cannot store a non-flat 
> tuple using PigStorage
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-527) Pig does not support storing nested data using default storage

2008-11-17 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-527:


Patch Info: [Patch Available]

> Pig does not support storing nested data using default storage
> --
>
> Key: PIG-527
> URL: https://issues.apache.org/jira/browse/PIG-527
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-527.patch
>
>
> Pig does not allow storing nested data using the default storage function 
> (PigStorage)
> {code}
> grunt> a = load 'student_tab.data' as (name, age, gpa);
> grunt> b = group a by age;
> grunt> store b into '/user/sms/data/complex.data';
> 2008-11-13 16:21:17,711 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 50% complete
> 2008-11-13 16:21:52,747 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Map reduce job failed
> 2008-11-13 16:21:52,747 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Job failed!
> 2008-11-13 16:21:52,764 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (reduce) 
> task_200809241441_21188_r_00java.io.IOException: Cannot store a non-flat 
> tuple using PigStorage
> at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:196)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:116)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
> at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:300)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:238)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> 2008-11-13 16:21:52,764 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (reduce) 
> task_200809241441_21188_r_00java.io.IOException: Cannot store a non-flat 
> tuple using PigStorage
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-385) Should support 'null' as a constant

2008-11-18 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648872#action_12648872
 ] 

Santhosh Srinivasan commented on PIG-385:
-

The NULL constant can be used in any context where other constants or 
expressions are used. The difference between constants and NULL constants lies 
in the type inference. The inferred type for NULL will be based on the context. 
For example, in the statement used in the bug report (shown below for 
reference), the type of null will be the same as the type of $0. By default, 
the type of null will be bytearray.

{code}

B = foreach A generate $0 > 0 ? $0 : null;

{code}

Casting null
-

If the user chooses to, he/she can cast the null to the appropriate type. For 
example:

{code}

B = foreach A generate $0 > 0 ? $0 : (int)null;

{code}

Use of null with complex types
---

Since complex types are composed of simple types, the same rules (as stated above) 
apply. Null constants as map keys will be disallowed.

Examples follow:

{code}

B = foreach A generate $0 > 0 ? $0 : {(null)};
-- here we have a bag with a tuple with a bytearray null constant

C = foreach A generate [2#null];
-- a map constant with key 2 and value bytearray null

D = foreach A generate [null#10];
--- error maps cannot have null keys
{code}

Open questions
--

1. When nulls are stored using PigStorage and then read back using PigStorage, 
a distinction between the various types of null cannot be made.

Thoughts/suggestions/comments welcome.

> Should support 'null' as a constant
> ---
>
> Key: PIG-385
> URL: https://issues.apache.org/jira/browse/PIG-385
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: types_branch
>Reporter: Alan Gates
>Priority: Minor
> Fix For: types_branch
>
>
> It would be nice to be able to do things like:
> B = foreach A generate $0 > 0 ? $0 : null;
> but right now null is not allowed as a constant.  This null constant should 
> be allowed anywhere an expression would be, and should be castable (that is 
> (int)null).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-544) Utf8StorageConverter.java does not always produce NULLs when data is malformed

2008-11-25 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650810#action_12650810
 ] 

Santhosh Srinivasan commented on PIG-544:
-

Another use case where scalars also generate errors:

{code}

grunt> a = load 'student_tab.data';
grunt> store a into 'student_tab.bin' using BinStorage();
grunt> a = load 'student_tab.bin' using BinStorage() as (name: int, age: int, 
gpa: float);
grunt> dump a;

2008-11-25 16:02:40,986 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
message from task (map) 
task_200809241441_24635_m_00java.lang.RuntimeException: Unexpected data type 74 found in stream.
        at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:115)
        at org.apache.pig.builtin.BinStorage.bytesToInteger(BinStorage.java:169)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:143)

{code}

> Utf8StorageConverter.java does not always produce NULLs when data is malformed
> --
>
> Key: PIG-544
> URL: https://issues.apache.org/jira/browse/PIG-544
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>
> It does so for scalar types but not for complex types and not for the fields 
> inside of the complex types.
> This is because it uses different code to parse scalar types by themselves 
> and scalar types inside of a complex type. It should really use the same (its 
> own) code to do so.
> The code it currently uses is inside of TextDataParser.jjt and is also 
> used to parse constants, so we need to be careful if we want to make changes 
> to it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-545) PERFORMANCE: Sampler for order bys does not produce a good distribution

2008-11-25 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650824#action_12650824
 ] 

Santhosh Srinivasan commented on PIG-545:
-

The current sampler uses random sampling and assumes a uniform distribution of sort 
keys. Using a Poisson distribution will enable the sampler to estimate the 
expected value of the key distribution without knowing the actual distribution. 
This will ensure a (more) even distribution of data across the reducers.
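
To make the goal concrete, here is a small standalone sketch (not Pig's actual 
sampler, and deliberately leaving out the Poisson math) of how a sample of the 
sort keys can be turned into reducer range boundaries so that each range 
carries roughly the same share of keys; the Poisson reasoning above would then 
govern how large the sample needs to be for these cut points to be trustworthy.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: derive (numReducers - 1) cut points from a sample of the
// sort keys so that the resulting key ranges carry roughly equal sample mass.
public class SampleCutPoints {

    static <K extends Comparable<K>> List<K> cutPoints(List<K> sample, int numReducers) {
        List<K> sorted = new ArrayList<K>(sample);
        Collections.sort(sorted);
        List<K> cuts = new ArrayList<K>();
        for (int r = 1; r < numReducers; r++) {
            int idx = (int) ((long) r * sorted.size() / numReducers);
            cuts.add(sorted.get(Math.min(idx, sorted.size() - 1)));
        }
        return cuts;
    }

    public static void main(String[] args) {
        // A skewed sample: one key dominates, as in the reported order-by runs.
        List<String> sample = List.of("a", "b", "b", "b", "b", "b", "c", "d");
        System.out.println(cutPoints(sample, 4));   // prints [b, b, c]
    }
}
{code}

Note that with a skewed sample several cut points can coincide, which is the 
signal that a single reducer would otherwise be overloaded.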

> PERFORMANCE: Sampler for order bys does not produce a good distribution
> ---
>
> Key: PIG-545
> URL: https://issues.apache.org/jira/browse/PIG-545
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Alan Gates
> Fix For: types_branch
>
>
> In running tests on actual data, I've noticed that the final reduce of an 
> order by has skewed partitions.  Some reduces finish in a few seconds while 
> some run for 20 minutes.  Getting a better distribution should lead to much 
> better performance for order by.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-549) type checking with order-by following user-defined function

2008-12-01 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652153#action_12652153
 ] 

Santhosh Srinivasan commented on PIG-549:
-

AFAIK, Pig does not support zero argument UDFs. In your script, UDF2() is the 
reason for the type checking error.

> type checking with order-by following user-defined function
> ---
>
> Key: PIG-549
> URL: https://issues.apache.org/jira/browse/PIG-549
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
> Environment: type checker fails here:
> A = load ...;
> B = foreach A generate UDF1(*), UDF2();
> C = order B by $1;
> where UDF2() is of type EvalFunc.
> I tried all sorts of things, including overriding outputSchema() of the UDF 
> to specify Integer, and also adding "as x : int" to the foreach command -- in 
> all cases I get the same error.
>Reporter: Christopher Olston
> Fix For: types_branch
>
>
> Exception in thread "main" java.lang.AssertionError: Unsupported root type in 
> LOForEach:LOUserFunc
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2267)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
>   at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>   at 
> org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
>   at 
> org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:684)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:655)
>   at org.apache.pig.PigServer.store(PigServer.java:433)
>   at org.apache.pig.PigServer.store(PigServer.java:421)
>   at org.apache.pig.PigServer.openIterator(PigServer.java:384)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-552) UDF defined with argument causes class instantiation exception

2008-12-03 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652915#action_12652915
 ] 

Santhosh Srinivasan commented on PIG-552:
-

Question related to the test case reported in the bug report. Can you post the 
UDF? If not, can you confirm if the UDF is missing a default constructor?

Review comments:

The patch ignores the problem and tries to proceed. This will lead to runtime 
issues as the class will not be instantiated in the backend. This is not what 
the user wants.

It's probably a bug in the parser where the user-defined alias is not getting 
picked up.
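
On the default-constructor question, here is a hedged example (not the 
reporter's UDF; the class and field names are made up) of the constructor pair 
a UDF generally needs so that define myFunc myFunc('blah'); can pass its 
argument while the class can still be instantiated without arguments:

{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF for illustration: both constructors are provided so the
// DEFINE argument ('blah') reaches the UDF and a no-arg instantiation by
// class name still works.
public class MyFunc extends EvalFunc<String> {

    private final String tag;

    // No-arg constructor: used when the class is instantiated by name alone.
    public MyFunc() {
        this("default");
    }

    // String constructor: receives the argument from the DEFINE statement.
    public MyFunc(String tag) {
        this.tag = tag;
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        return tag + ":" + String.valueOf(input.get(0));
    }
}
{code}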

> UDF defined with argument causes class instantiation exception
> --
>
> Key: PIG-552
> URL: https://issues.apache.org/jira/browse/PIG-552
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
> Attachments: pig.patch
>
>
> I'm doing:
> define myFunc myFunc('blah');
> b = foreach a generate myFunc(*);
> Pig parses it, but fails when it tries to run it on hadoop (I'm using "local" 
> mode). It tries to invoke the class loader on "myFunc('blah')" instead of on 
> "myFunc", which causes an exception.
> The bug seems to stem from this part of JobControlCompiler.getJobConf():
> if(mro.UDFs.size()==1){
> String compFuncSpec = mro.UDFs.get(0);
> Class comparator = 
> PigContext.resolveClassName(compFuncSpec);
> if(ComparisonFunc.class.isAssignableFrom(comparator)) {
> 
> jobConf.setMapperClass(PigMapReduce.MapWithComparator.class);
> 
> jobConf.setReducerClass(PigMapReduce.ReduceWithComparator.class);
> jobConf.set("pig.reduce.package", 
> ObjectSerializer.serialize(pack));
> jobConf.set("pig.usercomparator", "true");
> jobConf.setOutputKeyClass(NullableTuple.class);
> jobConf.setOutputKeyComparatorClass(comparator);
> }
> } else {
> jobConf.set("pig.sortOrder",
> ObjectSerializer.serialize(mro.getSortOrder()));
> }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-549) type checking with order-by following user-defined function

2008-12-03 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652916#action_12652916
 ] 

Santhosh Srinivasan commented on PIG-549:
-

Wrt my previous comment, Pig does not support zero argument UDFs in foreach, but 
they are allowed in other places like Filter, Order by, etc.

> type checking with order-by following user-defined function
> ---
>
> Key: PIG-549
> URL: https://issues.apache.org/jira/browse/PIG-549
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
> Environment: type checker fails here:
> A = load ...;
> B = foreach A generate UDF1(*), UDF2();
> C = order B by $1;
> where UDF2() is of type EvalFunc.
> I tried all sorts of things, including overriding outputSchema() of the UDF 
> to specify Integer, and also adding "as x : int" to the foreach command -- in 
> all cases I get the same error.
>Reporter: Christopher Olston
> Fix For: types_branch
>
>
> Exception in thread "main" java.lang.AssertionError: Unsupported root type in 
> LOForEach:LOUserFunc
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2267)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
>   at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>   at 
> org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
>   at 
> org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:684)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:655)
>   at org.apache.pig.PigServer.store(PigServer.java:433)
>   at org.apache.pig.PigServer.store(PigServer.java:421)
>   at org.apache.pig.PigServer.openIterator(PigServer.java:384)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-538) bincond can't work with flatten bags

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan reassigned PIG-538:
---

Assignee: Pradeep Kamath  (was: Santhosh Srinivasan)

> bincond can't work with flatten bags
> 
>
> Key: PIG-538
> URL: https://issues.apache.org/jira/browse/PIG-538
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Olga Natkovich
>Assignee: Pradeep Kamath
> Fix For: types_branch
>
>
> The following script is used with trunk code to simulate an outer join not 
> directly supported by pig:
> A = load '/studenttab10k' as (name: chararray, age: int, gpa: float);
> B = load 'votertab10k' as (name: chararray, age: int, registration: 
> chararray, donation: float);
> C = cogroup A by name, B by name;
> D = foreach C generate group, (IsEmpty(A) ? '' : flatten(A)), (IsEmpty(B) ? 
> 'null' : flatten(B));
> On the types branch this gives a syntax error and, even beyond that, is not 
> supported since bincond requires that both expressions be of the same type. 
> Santhosh suggested having a special NULL expression that matches any type. 
> This seems to make sense.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-294) Parse errors for boolean conditions

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-294:


Patch Info: [Patch Available]

> Parse errors for boolean conditions
> ---
>
> Key: PIG-294
> URL: https://issues.apache.org/jira/browse/PIG-294
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Attachments: boolean_test.patch
>
>
> The parser throws exceptions for pig statements that contain boolean 
> conditions with operands that use string comparators. A sample statement to 
> reproduce the test is given below:
> split a into b if name lt 'f', c if (name ge 'f' and name le 'h'), d if name 
> gt 'h';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-290) LOCross output schema is not right

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-290:


Patch Info: [Patch Available]

> LOCross output schema is not right
> --
>
> Key: PIG-290
> URL: https://issues.apache.org/jira/browse/PIG-290
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Pi Song
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: insert_between.patch
>
>
> From the schema generation code:-
> {noformat}
> List inputs = mPlan.getPredecessors(this);
> for (LogicalOperator op : inputs) {
> // Create schema here
> }
> {noformat}
> The output schema is generated based on inputs determined in the logical 
> plan. However, mPlan.getPredecessors() doesn't always preserve the right 
> order (A x B and B x A result in different schemas). I suggest maintaining the 
> mInputs variable in LOCross (as it used to be) to resolve this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-299) Filter operator not included in the main predecessor plan structure

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-299:


Patch Info: [Patch Available]

> Filter operator not included in the main predecessor plan structure
> ---
>
> Key: PIG-299
> URL: https://issues.apache.org/jira/browse/PIG-299
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
> Environment: N/A
>Reporter: Tyson Condie
>Assignee: Santhosh Srinivasan
>Priority: Blocker
> Fix For: types_branch
>
> Attachments: nested_project_as_foreach.patch
>
>
> Take the following query, which can be found in TestLogicalPlanBuilder.java 
> method testQuery80();
> a = load 'input1' as (name, age, gpa);
> b = filter a by age < '20';");
> c = group b by (name,age);
> d = foreach c {
> cf = filter b by gpa < '3.0';
> cp = cf.gpa;
> cd = distinct cp;
> co = order cd by gpa;
> generate group, flatten(co);
> };
> The filter statement 'cf = filter b by gpa < '3.0'' is not accessible via the 
> LogicalPlan::getPredecessor method. Here is the explain plan printout of the 
> inner foreach plan:
> |---SORT Test-Plan-Builder-17 Schema: {gpa: bytearray} Type: bag
> |   |
> |   Project Test-Plan-Builder-16 Projections: [0] Overloaded: false 
> FieldSchema: gpa: bytearray cn: 2 Type: bytearray
> |   Input: Distinct Test-Plan-Builder-1
> |
> |---Distinct Test-Plan-Builder-15 Schema: {gpa: bytearray} Type: bag
> |
> |---Project Test-Plan-Builder-14 Projections: [2] Overloaded: false 
> FieldSchema: gpa: bytearray cn: 2 Type: bytearray
> Input: Project Test-Plan-Builder-13 Projections:  [*]  
> Overloaded: false|
> |---Project Test-Plan-Builder-13 Projections:  [*]  Overloaded: 
> false FieldSchema: cf: tuple({name: bytearray,age: bytearray,gpa: bytearray}) 
> Type: tuple
> Input: Filter Test-Plan-Builder-12OPERATOR PROJECT SCHEMA 
> {name: bytearray,age: bytearray,gpa: bytearray}
> As you can see the filter is only accessible via the 
> LOProject::getExpression() method. It is not showing up as an input operator. 
> Focus on the projection immediately following the filter. If I remove this 
> projection then I get a correct plan. For example, let the inner foreach plan 
> be as follows:
> d = foreach c {
> cf = filter b by gpa < '3.0';
> cd = distinct cf;
> co = order cd by gpa;
> generate group, flatten(co);
> };
> Then I get the following (correct) explain plan output.
> |---SORT Test-Plan-Builder-15 Schema: {name: bytearray,age: bytearray,gpa: 
> bytearray} Type: bag
> |   |
> |   Project Test-Plan-Builder-14 Projections: [2] Overloaded: false 
> FieldSchema: gpa: bytearray cn: 2 Type: bytearray
> |   Input: Distinct Test-Plan-Builder-1
> |
> |---Distinct Test-Plan-Builder-13 Schema: {name: bytearray,age: 
> bytearray,gpa: bytearray} Type: bag
> |
> |---Filter Test-Plan-Builder-12 Schema: {name: bytearray,age: 
> bytearray,gpa: bytearray} Type: bag
> |   |
> |   LesserThan Test-Plan-Builder-11 FieldSchema: null Type: 
> Unknown
> |   |
> |   |---Project Test-Plan-Builder-9 Projections: [2] Overloaded: 
> false FieldSchema:  Type: Unknown
> |   |   Input: CoGroup Test-Plan-Builder-7
> |   |
> |   |---Const Test-Plan-Builder-10 FieldSchema: chararray Type: 
> chararray
> |
> |---Project Test-Plan-Builder-8 Projections: [1] Overloaded: 
> false FieldSchema: b: bag({name: bytearray,age: bytearray,gpa: bytearray}) 
> Type: bag
> Input: CoGroup Test-Plan-Builder-7OPERATOR PROJECT SCHEMA 
> {name: bytearray,age: bytearray,gpa: bytearray}
> Alan said that the problem is we don't generate a foreach operator for the 
> 'cp = cf.gpa' statement. Please let me know if this can be resolved.
> Thanks,
> Tyson

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-323) Remove DEFINE from QueryParser

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-323:


Patch Info: [Patch Available]

> Remove DEFINE from QueryParser
> --
>
> Key: PIG-323
> URL: https://issues.apache.org/jira/browse/PIG-323
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Minor
> Fix For: types_branch
>
> Attachments: remove_define_from_query_parser.patch
>
>
> Remove the keyword DEFINE and the associated methods from QueryParser. The 
> syntax and semantics of define as proposed in the functional specification 
> break backward compatibility. The UDFs will now provide the list of function 
> arguments that are expected.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-320) The parser/type checker should use the getSchema method of UDFs to deduce return type/schema

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-320:


Patch Info: [Patch Available]

> The parser/type checker should use the getSchema method of UDFs to deduce 
> return type/schema
> 
>
> Key: PIG-320
> URL: https://issues.apache.org/jira/browse/PIG-320
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: udf_outputSchema.patch
>
>
> Currently, the parser/type checker uses the getReturnType method to deduce the 
> return type of the user defined function (UDF). This mechanism is 
> satisfactory only for basic types (int, long, ...); for composite types 
> (tuple, bag), the schema is also required. The abstract class EvalFunc 
> exposes the outputSchema method to deduce the return type/schema of the 
> UDF. The parser/type checker should use this method to figure out the return 
> type/schema of the UDF and use it appropriately. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-421) error with complex nested plan

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-421:


Patch Info: [Patch Available]

> error with complex nested plan
> --
>
> Key: PIG-421
> URL: https://issues.apache.org/jira/browse/PIG-421
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Olga Natkovich
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-421.patch, PIG-421_1.patch
>
>
> Even after applying patch for PIG-398, the following query still fails:
> a = load 'studenttab10k' as (name, age, gpa);
> b = filter a by age < 20;
> c = group b by age;
> d = foreach c {
> cf = filter b by gpa < 3.0;
> cp = cf.gpa;
> cd = distinct cp;
> co = order cd by $0;
> generate group, flatten(co);
> }
> store d into 'output';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-400) flatten causes schema naming problems

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-400:


Patch Info: [Patch Available]

> flatten causes schema naming problems
> -
>
> Key: PIG-400
> URL: https://issues.apache.org/jira/browse/PIG-400
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Olga Natkovich
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG_400.patch
>
>
> Script:
> A = load 'data' as (name: chararray, age: chararray, gpa: float);
> B = group A by (name, age);
> C = foreach B generate flatten(group) as res, COUNT(A);
> D = foreach C generate res;
> dump D;
> Error:
> java.io.IOException: Invalid alias: res in {res::name: chararray,res::age: 
> chararray,long}
> at org.apache.pig.PigServer.registerQuery(PigServer.java:255)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:422)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:82)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:302)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: res in {res::name: chararray,res::age: chararray,long}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-158) Rework logical plan

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-158:


Patch Info: [Patch Available]
  Assignee: Santhosh Srinivasan  (was: Alan Gates)

> Rework logical plan
> ---
>
> Key: PIG-158
> URL: https://issues.apache.org/jira/browse/PIG-158
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Reporter: Alan Gates
>Assignee: Santhosh Srinivasan
> Attachments: cast_fix.patch, fully_qualified_typecast_fix.patch, 
> is_null.patch, logical_operators.patch, logical_operators_rev_1.patch, 
> logical_operators_rev_2.patch, logical_operators_rev_3.patch, 
> multiple_column_project.patch, overloaded_project_distinct.patch, 
> parser_changes.patch, parser_changes_v1.patch, parser_changes_v2.patch, 
> parser_changes_v3.patch, parser_changes_v4.patch, ParserErrors.txt, 
> udf_fix.patch, udf_funcSpec.patch, udf_return_type.patch, 
> user_func_and_store.patch, visitorWalker.patch
>
>
> Rework the logical plan in line with 
> http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-158) Rework logical plan

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-158.
-

Resolution: Fixed

All patches have been reviewed and checked in as part of the types branch work.

> Rework logical plan
> ---
>
> Key: PIG-158
> URL: https://issues.apache.org/jira/browse/PIG-158
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Reporter: Alan Gates
>Assignee: Santhosh Srinivasan
> Attachments: cast_fix.patch, fully_qualified_typecast_fix.patch, 
> is_null.patch, logical_operators.patch, logical_operators_rev_1.patch, 
> logical_operators_rev_2.patch, logical_operators_rev_3.patch, 
> multiple_column_project.patch, overloaded_project_distinct.patch, 
> parser_changes.patch, parser_changes_v1.patch, parser_changes_v2.patch, 
> parser_changes_v3.patch, parser_changes_v4.patch, ParserErrors.txt, 
> udf_fix.patch, udf_funcSpec.patch, udf_return_type.patch, 
> user_func_and_store.patch, visitorWalker.patch
>
>
> Rework the logical plan in line with 
> http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-159) Make changes to the parser to support new types functionality

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-159:


Patch Info: [Patch Available]
  Assignee: Santhosh Srinivasan  (was: Alan Gates)

> Make changes to the parser to support new types functionality
> -
>
> Key: PIG-159
> URL: https://issues.apache.org/jira/browse/PIG-159
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Reporter: Alan Gates
>Assignee: Santhosh Srinivasan
> Attachments: parser_chages_v10.patch, parser_chages_v11.patch, 
> parser_chages_v12.patch, parser_chages_v13.patch, parser_chages_v5.patch, 
> parser_chages_v6.patch, parser_chages_v7.patch, parser_chages_v8.patch, 
> parser_chages_v9.patch
>
>
> In order to support the new types functionality described in 
> http://wiki.apache.org/pig/PigTypesFunctionalSpec, the parser needs to change 
> in the following ways:
> 1) AS needs to support types in addition to aliases.  So where previously it 
> was legal to say:
> a = load 'myfile' as a, b, c;
> it will now also be legal to say
> a = load 'myfile' as a integer, b float, c chararray;
> 2) Non string constants need to be supported.  This includes non-string 
> atomic types (integer, long, float, double) and the non-atomic types bags, 
> tuples, and maps.
> 3) A cast operator needs to be added so that fields can be explicitly casted.
> 4) Changes to DEFINE, to allow users to declare arguments and return types 
> for UDFs

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-159) Make changes to the parser to support new types functionality

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-159.
-

Resolution: Fixed

All patches have been reviewed and checked in as part of the types branch 
rework.

> Make changes to the parser to support new types functionality
> -
>
> Key: PIG-159
> URL: https://issues.apache.org/jira/browse/PIG-159
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Reporter: Alan Gates
>Assignee: Santhosh Srinivasan
> Attachments: parser_chages_v10.patch, parser_chages_v11.patch, 
> parser_chages_v12.patch, parser_chages_v13.patch, parser_chages_v5.patch, 
> parser_chages_v6.patch, parser_chages_v7.patch, parser_chages_v8.patch, 
> parser_chages_v9.patch
>
>
> In order to support the new types functionality described in 
> http://wiki.apache.org/pig/PigTypesFunctionalSpec, the parser needs to change 
> in the following ways:
> 1) AS needs to support types in addition to aliases.  So where previously it 
> was legal to say:
> a = load 'myfile' as a, b, c;
> it will now also be legal to say
> a = load 'myfile' as a integer, b float, c chararray;
> 2) Non string constants need to be supported.  This includes non-string 
> atomic types (integer, long, float, double) and the non-atomic types bags, 
> tuples, and maps.
> 3) A cast operator needs to be added so that fields can be explicitly casted.
> 4) Changes to DEFINE, to allow users to declare arguments and return types 
> for UDFs

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-552) UDF defined with argument causes class instantiation exception

2008-12-03 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653027#action_12653027
 ] 

Santhosh Srinivasan commented on PIG-552:
-

Sort UDFs have to be ComparisonFunc and not EvalFunc.

> UDF defined with argument causes class instantiation exception
> --
>
> Key: PIG-552
> URL: https://issues.apache.org/jira/browse/PIG-552
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
> Attachments: pig.patch
>
>
> I'm doing:
> define myFunc myFunc('blah');
> b = foreach a generate myFunc(*);
> Pig parses it, but fails when it tries to run it on hadoop (I'm using "local" 
> mode). It tries to invoke the class loader on "myFunc('blah')" instead of on 
> "myFunc", which causes an exception.
> The bug seems to stem from this part of JobControlCompiler.getJobConf():
> if(mro.UDFs.size()==1){
> String compFuncSpec = mro.UDFs.get(0);
> Class comparator = 
> PigContext.resolveClassName(compFuncSpec);
> if(ComparisonFunc.class.isAssignableFrom(comparator)) {
> 
> jobConf.setMapperClass(PigMapReduce.MapWithComparator.class);
> 
> jobConf.setReducerClass(PigMapReduce.ReduceWithComparator.class);
> jobConf.set("pig.reduce.package", 
> ObjectSerializer.serialize(pack));
> jobConf.set("pig.usercomparator", "true");
> jobConf.setOutputKeyClass(NullableTuple.class);
> jobConf.setOutputKeyComparatorClass(comparator);
> }
> } else {
> jobConf.set("pig.sortOrder",
> ObjectSerializer.serialize(mro.getSortOrder()));
> }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-549) type checking with order-by following user-defined function

2008-12-03 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-549:


Description: 

Exception in thread "main" java.lang.AssertionError: Unsupported root type in 
LOForEach:LOUserFunc
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2267)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at 
org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
at org.apache.pig.PigServer.compileLp(PigServer.java:684)
at org.apache.pig.PigServer.compileLp(PigServer.java:655)
at org.apache.pig.PigServer.store(PigServer.java:433)
at org.apache.pig.PigServer.store(PigServer.java:421)
at org.apache.pig.PigServer.openIterator(PigServer.java:384)

  was:


Exception in thread "main" java.lang.AssertionError: Unsupported root type in 
LOForEach:LOUserFunc
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2267)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at 
org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
at org.apache.pig.PigServer.compileLp(PigServer.java:684)
at org.apache.pig.PigServer.compileLp(PigServer.java:655)
at org.apache.pig.PigServer.store(PigServer.java:433)
at org.apache.pig.PigServer.store(PigServer.java:421)
at org.apache.pig.PigServer.openIterator(PigServer.java:384)

 Issue Type: Improvement  (was: Bug)

> type checking with order-by following user-defined function
> ---
>
> Key: PIG-549
> URL: https://issues.apache.org/jira/browse/PIG-549
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: types_branch
> Environment: type checker fails here:
> A = load ...;
> B = foreach A generate UDF1(*), UDF2();
> C = order B by $1;
> where UDF2() is of type EvalFunc.
> I tried all sorts of things, including overriding outputSchema() of the UDF 
> to specify Integer, and also adding "as x : int" to the foreach command -- in 
> all cases I get the same error.
>Reporter: Christopher Olston
> Fix For: types_branch
>
>
> Exception in thread "main" java.lang.AssertionError: Unsupported root type in 
> LOForEach:LOUserFunc
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2267)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
>   at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>   at 
> org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
>   at 
> org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:684)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:655)
>   at org.apache.pig.PigServer.store(PigServer.java:433)
>   at org.apache.pig.PigServer.store(PigServer.java

[jira] Commented: (PIG-549) type checking with order-by following user-defined function

2008-12-03 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653028#action_12653028
 ] 

Santhosh Srinivasan commented on PIG-549:
-

Sure, we should allow that. I will mark this as an enhancement request.

> type checking with order-by following user-defined function
> ---
>
> Key: PIG-549
> URL: https://issues.apache.org/jira/browse/PIG-549
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
> Environment: type checker fails here:
> A = load ...;
> B = foreach A generate UDF1(*), UDF2();
> C = order B by $1;
> where UDF2() is of type EvalFunc.
> I tried all sorts of things, including overriding outputSchema() of the UDF 
> to specify Integer, and also adding "as x : int" to the foreach command -- in 
> all cases I get the same error.
>Reporter: Christopher Olston
> Fix For: types_branch
>
>
> Exception in thread "main" java.lang.AssertionError: Unsupported root type in 
> LOForEach:LOUserFunc
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2267)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
>   at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
>   at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>   at 
> org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
>   at 
> org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
>   at 
> org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:684)
>   at org.apache.pig.PigServer.compileLp(PigServer.java:655)
>   at org.apache.pig.PigServer.store(PigServer.java:433)
>   at org.apache.pig.PigServer.store(PigServer.java:421)
>   at org.apache.pig.PigServer.openIterator(PigServer.java:384)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-550) java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple

2008-12-04 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653424#action_12653424
 ] 

Santhosh Srinivasan commented on PIG-550:
-

The issue is similar to the bug reported in PIG-449.

> java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.pig.data.Tuple
> --
>
> Key: PIG-550
> URL: https://issues.apache.org/jira/browse/PIG-550
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Viraj Bhat
> Fix For: types_branch
>
>
> ==
> Map tasks resulting from the below Pig script throw the following exception. 
> Note 'one' is a dummy input containing the number 1.
> ==
> {code}
> A = load 'one' using PigStorage() as ( one );
> B = foreach A generate
> {
> (
> ('p1-t1-e1', 'p1-t1-e2'),
> ('p1-t2-e1', 'p1-t2-e2')
> ),
> (
> ('p2-t1-e1', 'p2-t1-e2'),
> ('p2-t2-e1', 'p2-t2-e2')
> )
> };
> describe B;
> C = foreach B generate
> $0 as pairbag { pair: ( t1: (e1, e2), t2: (e1, e2) ) }; describe C;
> D = foreach C generate FLATTEN(pairbag);
> describe D;
> E = foreach D generate
> pair.t1.e2  as t1e2,
> pair.t2.e1  as t2e1;
> describe E;
> dump E;
> {code}
> ==
> 2008-12-01 20:07:53,974 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) 
> task_200810152105_0207_m_00java.lang.ClassCastException: java.lang.String 
> cannot be cast to org.apache.pig.data.Tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:279)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:133)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:233)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:180)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:170)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ==

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-552) UDF defined with argument causes class instantiation exception

2008-12-04 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653429#action_12653429
 ] 

Santhosh Srinivasan commented on PIG-552:
-

Sorry about the reference to ComparisonFunc; I was looking at your other issue, 
PIG-549, and had that in mind.

I was not able to reproduce your scenario. I tried the following script and it 
worked.

{code}

grunt> define mapUdf MapUDF('world');   

grunt> RAW_LOGS = load '/user/sms/data/mydata.txt' as (url:chararray, 
numvisits:int);   
grunt> b = foreach RAW_LOGS generate mapUdf(*); 

{code}

> UDF defined with argument causes class instantiation exception
> --
>
> Key: PIG-552
> URL: https://issues.apache.org/jira/browse/PIG-552
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
> Attachments: pig.patch
>
>
> I'm doing:
> define myFunc myFunc('blah');
> b = foreach a generate myFunc(*);
> Pig parses it, but fails when it tries to run it on hadoop (I'm using "local" 
> mode). It tries to invoke the class loader on "myFunc('blah')" instead of on 
> "myFunc", which causes an exception.
> The bug seems to stem from this part of JobControlCompiler.getJobConf():
> if(mro.UDFs.size()==1){
> String compFuncSpec = mro.UDFs.get(0);
> Class comparator = 
> PigContext.resolveClassName(compFuncSpec);
> if(ComparisonFunc.class.isAssignableFrom(comparator)) {
> 
> jobConf.setMapperClass(PigMapReduce.MapWithComparator.class);
> 
> jobConf.setReducerClass(PigMapReduce.ReduceWithComparator.class);
> jobConf.set("pig.reduce.package", 
> ObjectSerializer.serialize(pack));
> jobConf.set("pig.usercomparator", "true");
> jobConf.setOutputKeyClass(NullableTuple.class);
> jobConf.setOutputKeyComparatorClass(comparator);
> }
> } else {
> jobConf.set("pig.sortOrder",
> ObjectSerializer.serialize(mro.getSortOrder()));
> }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-550) java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple

2008-12-04 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-550.
-

Resolution: Duplicate

Marking it a duplicate of Pig-449. The resolution in Pig-449 will fix the issue 
reported in this bug.

> java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.pig.data.Tuple
> --
>
> Key: PIG-550
> URL: https://issues.apache.org/jira/browse/PIG-550
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Viraj Bhat
> Fix For: types_branch
>
>
> ==
> Map tasks resulting from the below Pig script throw the following exception. 
> Note 'one' is a dummy input containing the number 1.
> ==
> {code}
> A = load 'one' using PigStorage() as ( one );
> B = foreach A generate
> {
> (
> ('p1-t1-e1', 'p1-t1-e2'),
> ('p1-t2-e1', 'p1-t2-e2')
> ),
> (
> ('p2-t1-e1', 'p2-t1-e2'),
> ('p2-t2-e1', 'p2-t2-e2')
> )
> };
> describe B;
> C = foreach B generate
> $0 as pairbag { pair: ( t1: (e1, e2), t2: (e1, e2) ) }; describe C;
> D = foreach C generate FLATTEN(pairbag);
> describe D;
> E = foreach D generate
> pair.t1.e2  as t1e2,
> pair.t2.e1  as t2e1;
> describe E;
> dump E;
> {code}
> ==
> 2008-12-01 20:07:53,974 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) 
> task_200810152105_0207_m_00java.lang.ClassCastException: java.lang.String 
> cannot be cast to org.apache.pig.data.Tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:279)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:133)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:233)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:180)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:170)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ==

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (PIG-550) java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple

2008-12-04 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653445#action_12653445
 ] 

sms edited comment on PIG-550 at 12/4/08 12:02 PM:
---

Marking it a duplicate of PIG-449. The resolution in PIG-449 will fix the issue 
reported in this bug.

  was (Author: sms):
Marking it a duplicate of Pig-449. The resolution in Pig-449 will fix the 
issue reported in this bug.
  
> java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.pig.data.Tuple
> --
>
> Key: PIG-550
> URL: https://issues.apache.org/jira/browse/PIG-550
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Viraj Bhat
> Fix For: types_branch
>
>
> ==
> Map tasks resulting from the below Pig Script throws the following exception. 
> Note 'one' is a dummy input containing, number 1.
> ==
> {code}
> A = load 'one' using PigStorage() as ( one );
> B = foreach A generate
> {
> (
> ('p1-t1-e1', 'p1-t1-e2'),
> ('p1-t2-e1', 'p1-t2-e2')
> ),
> (
> ('p2-t1-e1', 'p2-t1-e2'),
> ('p2-t2-e1', 'p2-t2-e2')
> )
> };
> describe B;
> C = foreach B generate
> $0 as pairbag { pair: ( t1: (e1, e2), t2: (e1, e2) ) }; describe C;
> D = foreach C generate FLATTEN(pairbag);
> describe D;
> E = foreach D generate
> pair.t1.e2  as t1e2,
> pair.t2.e1  as t2e1;
> describe E;
> dump E;
> {code}
> ==
> 2008-12-01 20:07:53,974 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) 
> task_200810152105_0207_m_00java.lang.ClassCastException: java.lang.String 
> cannot be cast to org.apache.pig.data.Tuple
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:279)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:133)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:233)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:180)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:170)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ==

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-546) FilterFunc calls empty constructor when it should be calling parameterized constructor

2008-12-08 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-546:


Patch Info: [Patch Available]

> FilterFunc calls empty constructor when it should be calling parameterized 
> constructor
> --
>
> Key: PIG-546
> URL: https://issues.apache.org/jira/browse/PIG-546
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Viraj Bhat
> Fix For: types_branch
>
> Attachments: FILTERFROMFILE.java, insetfilterfile, mydata.txt, 
> PIG-546.patch
>
>
> The following piece of Pig Script uses a custom UDF known as FILTERFROMFILE 
> which extends the FilterFunc. It contains two constructors, an empty 
> constructor which is mandatory and the parameterized constructor. The 
> parameterized constructor  passes the HDFS filename, which the exec function 
> uses to construct a HashMap. The HashMap is later used for filtering records 
> based on the match criteria in the HDFS file.
> {code}
> register util.jar;
> --util.jar contains the FILTERFROMFILE class
> define FILTER_CRITERION util.FILTERFROMFILE('/user/viraj/insetfilterfile');
> RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
> FILTERED_LOGS = filter RAW_LOGS by FILTER_CRITERION(numvisits);
> dump FILTERED_LOGS;
> {code}
> When you execute the above script,  it results in a single Map only job with 
> 1 Map. It seems that the empty constructor is called 5 times, and ultimately 
> results in failure of the job.
> ===
> parameterized constructor: /user/viraj/insetfilterfile
> parameterized constructor: /user/viraj/insetfilterfile
> empty constructor
> empty constructor
> empty constructor
> empty constructor
> empty constructor
> ===
> Error in the Hadoop backend
> ===
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
>   at org.apache.hadoop.fs.Path.(Path.java:90)
>   at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:199)
>   at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:130)
>   at 
> org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:164)
>   at util.FILTERFROMFILE.init(FILTERFROMFILE.java:70)
>   at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:89)
>   at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:52)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:179)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:217)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:170)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ===
> Attaching the sample data and the filter function UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-546) FilterFunc calls empty constructor when it should be calling parameterized constructor

2008-12-08 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-546:


Attachment: PIG-546.patch

The patch (PIG-546.patch) addresses the following issues:

1. Fixes the use of an alias declared via the define statement and its 
subsequent use in (see the sketch below this comment):
   i. Filter functions
   ii. Load functions
   iii. Store functions
   iv. Order by functions
   v. Streaming specifications (input and output)

2. Adds new unit test cases for the parser and end-to-end test cases for 
streaming and the filter UDF.

Note: There are no end-to-end test cases for order by using a UDF.

All unit test cases pass.
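
For reference, a filter UDF with both an empty and a parameterized constructor, 
of the general shape that a define statement with constructor arguments 
expects, is sketched below. This is a hypothetical example, not the 
FILTERFROMFILE class attached to the issue.

{code}
import java.io.IOException;

import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

// Hypothetical filter UDF: keeps records whose first field is >= a threshold
// passed through the define statement, e.g. define F ThresholdFilter('10');
public class ThresholdFilter extends FilterFunc {

    private final int threshold;

    public ThresholdFilter() {                   // mandatory empty constructor
        this("0");
    }

    public ThresholdFilter(String threshold) {   // parameterized constructor
        this.threshold = Integer.parseInt(threshold);
    }

    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return false;
        }
        return ((Integer) input.get(0)) >= threshold;
    }
}
{code}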

> FilterFunc calls empty constructor when it should be calling parameterized 
> constructor
> --
>
> Key: PIG-546
> URL: https://issues.apache.org/jira/browse/PIG-546
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Viraj Bhat
> Fix For: types_branch
>
> Attachments: FILTERFROMFILE.java, insetfilterfile, mydata.txt, 
> PIG-546.patch
>
>
> The following piece of Pig Script uses a custom UDF known as FILTERFROMFILE 
> which extends the FilterFunc. It contains two constructors, an empty 
> constructor which is mandatory and the parameterized constructor. The 
> parameterized constructor  passes the HDFS filename, which the exec function 
> uses to construct a HashMap. The HashMap is later used for filtering records 
> based on the match criteria in the HDFS file.
> {code}
> register util.jar;
> --util.jar contains the FILTERFROMFILE class
> define FILTER_CRITERION util.FILTERFROMFILE('/user/viraj/insetfilterfile');
> RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
> FILTERED_LOGS = filter RAW_LOGS by FILTER_CRITERION(numvisits);
> dump FILTERED_LOGS;
> {code}
> When you execute the above script,  it results in a single Map only job with 
> 1 Map. It seems that the empty constructor is called 5 times, and ultimately 
> results in failure of the job.
> ===
> parameterized constructor: /user/viraj/insetfilterfile
> parameterized constructor: /user/viraj/insetfilterfile
> empty constructor
> empty constructor
> empty constructor
> empty constructor
> empty constructor
> ===
> Error in the Hadoop backend
> ===
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
>   at org.apache.hadoop.fs.Path.(Path.java:90)
>   at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:199)
>   at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:130)
>   at 
> org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:164)
>   at util.FILTERFROMFILE.init(FILTERFROMFILE.java:70)
>   at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:89)
>   at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:52)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:179)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:217)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:170)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ===
> Attaching the sample data and the filter function UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-562) Command line parameters should have higher precedence than parameter files during pre-processing of parameters

2008-12-12 Thread Santhosh Srinivasan (JIRA)
Command line parameters should have higher precedence than parameter files 
during pre-processing of parameters
--

 Key: PIG-562
 URL: https://issues.apache.org/jira/browse/PIG-562
 Project: Pig
  Issue Type: Improvement
  Components: tools
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: types_branch


In parameter substitution, the processing order is stated as follows:

Processing Order
   1.  Configuration files are scanned in the order they are specified on the 
command line. Within each file, the parameters are processed in the order they 
are specified.
   2.  Command line parameters are scanned in the order they are specified on 
the command line.

The order needs to be flipped so that command line parameters can override the 
values of variables declared in parameter files.
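
As an illustration only (a sketch of the desired precedence, not Pig's actual 
preprocessor code), applying parameter files first and command line parameters 
last gives the command line the final say for duplicate names:

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Sketch: parameter files are applied first, command line parameters last,
// so a command line value overrides a file value for the same name.
public class ParamPrecedenceSketch {

    static Map<String, String> resolve(String[] paramFiles, String[] cmdLineParams)
            throws IOException {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String file : paramFiles) {                 // files, in command line order
            Properties p = new Properties();
            FileInputStream in = new FileInputStream(file);
            try {
                p.load(in);                              // name=value lines
            } finally {
                in.close();
            }
            for (String name : p.stringPropertyNames()) {
                params.put(name, p.getProperty(name));
            }
        }
        for (String kv : cmdLineParams) {                // command line, scanned last
            String[] parts = kv.split("=", 2);
            params.put(parts[0], parts[1]);              // overrides any file value
        }
        return params;
    }
}
{code}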

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-566) Dump and store outputs do not match for PigStorage

2008-12-16 Thread Santhosh Srinivasan (JIRA)
Dump and store outputs do not match for PigStorage
--

 Key: PIG-566
 URL: https://issues.apache.org/jira/browse/PIG-566
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: types_branch


The dump and store formats for PigStorage do not match for longs and floats.

{code}
grunt> y = foreach x generate {(2985671202194220139L)};
grunt> describe y;
y: {{(long)}}

grunt> dump y;
({(2985671202194220139L)})

grunt> store y into 'y';
grunt> cat y
{(2985671202194220139)}

{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-567) Handling strings and exceptions in text data parser

2008-12-16 Thread Santhosh Srinivasan (JIRA)
Handling strings and exceptions in text data parser
---

 Key: PIG-567
 URL: https://issues.apache.org/jira/browse/PIG-567
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: types_branch


The text data parser treats a sequence of numerals as an integer. If the data is 
too long to fit into an integer, a NumberFormatException is thrown and no 
attempt is made to convert the data to a higher type. A couple of questions 
arise:

1. Should strings be annotated with delimiters like quotes to distinguish them 
from numbers?
2. Should conversions to higher types or strings be attempted? The conversions 
have performance implications.
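
If conversions to higher types were attempted, one possible approach (a sketch 
only, not the actual TextDataParser implementation) is to fall back from int to 
long, and finally to chararray, when the literal does not fit the narrower type:

{code}
// Sketch of fallback parsing for an unquoted numeric-looking token. The
// performance cost mentioned above comes from the extra parse attempts.
public class AtomParseSketch {

    static Object parseAtom(String token) {
        try {
            return Integer.valueOf(token);      // e.g. "42"
        } catch (NumberFormatException e) {
            try {
                return Long.valueOf(token);     // e.g. "2985671202194220139"
            } catch (NumberFormatException e2) {
                return token;                   // fall back to chararray
            }
        }
    }
}
{code}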

{noformat}

Data file:
{(2985671202194220139L}

Pig script:
a = load 'data' as (list: bag{t: tuple(value: chararray)});
dump a

Output:
2008-12-13 09:08:24,831 [main] ERROR
org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable
to open iterator for alias: a [Unable to store for alias: a [For input string: 
"2985671202194220139"]]
at 
org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:178)
at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:647)
at org.apache.pig.PigServer.store(PigServer.java:452)
at org.apache.pig.PigServer.store(PigServer.java:421)
at org.apache.pig.PigServer.openIterator(PigServer.java:384)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
at 
org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
at org.apache.pig.Main.main(Main.java:282)
Caused by: java.io.IOException: Unable to store for alias: a [For input string: 
"2985671202194220139"]
... 10 more
Caused by: org.apache.pig.backend.executionengine.ExecException: For input 
string: "2985671202194220139"
... 10 more
Caused by: java.lang.NumberFormatException: For input string: 
"2985671202194220139"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:459)
at java.lang.Integer.parseInt(Integer.java:497)
at 
org.apache.pig.data.parser.TextDataParser.AtomDatum(TextDataParser.java:291)
at 
org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:359)
at 
org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:149)
at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:85)
at 
org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:345)
at 
org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
at 
org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:70)
at 
org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:78)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:861)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:243)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.store(POStore.java:137)
at 
org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:62)
at 
org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:166)
... 9 more

2008-12-13 09:08:24,833 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
Unable to open iterator for alias: a [Unable to store for alias: a [For input 
string: "2985671202194220139"]]
2008-12-13 09:08:24,834 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
java.io.IOException: Unable to open iterator for alias: a [Unable to store for 
alias: a [For input string: "2985671202194220139"]]

{noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-575) Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema

2008-12-22 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658625#action_12658625
 ] 

Santhosh Srinivasan commented on PIG-575:
-

The FieldSchema member variable schema is public. It can be accessed directly 
without a getSchema() method, although having the method could make the code 
cleaner.
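
For example (a sketch only, assuming the types_branch Schema API shown in the 
issue), the schema nested inside the input bag column can be reached by reading 
the public schema member of its FieldSchema:

{code}
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Sketch: read FieldSchema's public 'schema' member directly instead of a
// getSchema() accessor to recurse into the schema nested inside the bag column.
public class SchemaAccessSketch {

    static Schema nestedSchema(Schema input) {
        Schema.FieldSchema bagColumn = input.getFields().get(0); // the input bag column
        return bagColumn.schema;  // inner schema of the bag, e.g. (seq: int, value: chararray)
    }
}
{code}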

> Please extend FieldSchema class with getSchema() member function for 
> iterating over complex Schemas in Pig UDF outputSchema
> ---
>
> Key: PIG-575
> URL: https://issues.apache.org/jira/browse/PIG-575
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: David Ciemiewicz
>Priority: Minor
>
> I have discovered that it is not possible to recurse through parts of the 
> input Schema in the UDF outputSchema function.
> I have a function that operates on an input bag of tuples and then creates 
> sequential pairings of the rows.
> A = foreach One generate { 
> ( 1, a ),
> ( 2, b )
> }   as  bag { tuple ( seq: int, value: chararray ) };
> The output of the PAIRS(A) should be:
> {
> ( ( 1, a ), ( 2, b ) ),
> ( ( 2, b ), ( null, null ) )
> }
> The default output schema for the function should be:
> bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, 
> value: chararray ) ) ) }
> The problem I have is that I'm not able to recurse into the internal Schema 
> of the FieldSchema in my outputSchema function to get at the tuple within the 
> input bag.
> Here's my sample outputSchema for PAIRS:
> public Schema outputSchema(Schema input) {
> try {
> System.out.println("input: " + input.toString());
> Schema databagSchema = new Schema();
> Schema tupleSchema = new Schema();
> Schema inputDataBag = new Schema(input.getFields().get(0));
> System.out.println("inputDataBag: " + 
> input.getFields().get(0).toString());
> //
> //  RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields.get(0).getSchema
> //
> Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0);  // 
> Here's where I want to say  
> System.out.println("inputTuple: " + inputTuple.toString());
> databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
> System.out.println("databagSchema: " + databagSchema.toString());
> return new Schema(
> new Schema.FieldSchema(
> getSchemaName( this.getClass().getName().toLowerCase(), 
> input),
> databagSchema,
> DataType.BAG
> )
> );
> } catch (Exception e) {
> return null;
> }
> }
> Here's the execution output from outputSchema:
> input: {A: {seq: int,value: chararray},int,int}
> inputDataBag: A: bag({seq: int,value: chararray})
> inputTuple: A: bag({seq: int,value: chararray})<= what I want to see is ( 
> seq: int, value: chararray )
> rowSchema: A: bag({seq: int,value: chararray})
> rowSchema: A: bag({seq: int,value: chararray})

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-458) Type branch integration with hadoop 18

2008-12-23 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-458:


Affects Version/s: types_branch
Fix Version/s: types_branch

> Type branch integration with hadoop 18
> --
>
> Key: PIG-458
> URL: https://issues.apache.org/jira/browse/PIG-458
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: types_branch
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: types_branch
>
> Attachments: hadoop18.jar, PIG-458.patch, un18.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-577) outer join query looses name information

2008-12-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659016#action_12659016
 ] 

Santhosh Srinivasan commented on PIG-577:
-

The workaround is to invert the condition of the bincond and swap the two 
elements of the bincond. 

{code}
D = FOREACH C GENERATE group, flatten((not IsEmpty(A) ? A : null)), 
flatten((not IsEmpty(B) ? B : null));
{code}

The root cause for this issue is the schema computation in LOBinCond. The 
assumption is that the schemas of the LHS and RHS of the bincond match all the 
time. The type checker ensures that this assumption is true. However, after 
each statement is parsed we do not run the type checker. The type checker is 
run only when describe, explain, dump or store is encountered.

As a result, for the script reported in the bug, the type of the null constant 
is seen as bytearray and not as the schema of the RHS, which is a bag.

Either the type checker should be invoked after each statement, or the type 
checking logic for bincond should be invoked by the getFieldSchema method, to 
ensure the equivalence of the LHS and RHS schemas.

> outer join query looses name information
> 
>
> Key: PIG-577
> URL: https://issues.apache.org/jira/browse/PIG-577
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Olga Natkovich
> Fix For: types_branch
>
>
> The following query:
> A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
> B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, 
> contributions: float);
> C = COGROUP A BY name, B BY name;
> D = FOREACH C GENERATE group, flatten((IsEmpty(A) ? null : A)), 
> flatten((IsEmpty(B) ? null : B));
> describe D;
> E = FOREACH D GENERATE A::gpa, B::contributions;
> Give the following error: (Even though describe shows correct information: D: 
> {group: chararray,A::name: chararray,A::age: int,A::gpa: float,B::name: 
> chararray,B::age: int,B::registration: chararray,B::contributions: float}
> java.io.IOException: Invalid alias: A::gpa in {group: 
> chararray,bytearray,bytearray}
> at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:263)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: A::gpa in {group: chararray,bytearray,bytearray}
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:5930)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5788)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:3974)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3871)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3825)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3734)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3660)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3626)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3552)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3462)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3419)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2894)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2309)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:966)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:742)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:537)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
> ... 6 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-577) outer join query looses name information

2008-12-24 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659144#action_12659144
 ] 

Santhosh Srinivasan commented on PIG-577:
-

The correct statement to mimic the outer join semantics is:

{code}
D = FOREACH C GENERATE group, flatten((not IsEmpty(A) ? A : 
(bag{tuple(chararray, int, float)}){(null, null, null)})), flatten((not 
IsEmpty(B) ? B : (bag{tuple(chararray, int, chararray, 
float)}){(null,null,null, null)}));
{code}

However, this exposed a bug in the type checker where the schemas of the LHS 
and RHS do not match: the bag with the null constants has a tuple, whereas 
relation A (or B) has a schema without the tuple. This issue was resolved in 
PIG-449. The solution proposed in PIG-449 has to be extended to schema 
comparisons that involve bags.

{code}
2008-12-24 10:31:58,529 [main] ERROR org.apache.pig.tools.grunt.Grunt - Two 
inputs of BinCond must have compatible schemas

2008-12-24 10:31:58,529 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
org.apache.pig.impl.logicalLayer.FrontendException: Unable to describe schema 
for alias D
at org.apache.pig.PigServer.dumpSchema(PigServer.java:367)
at 
org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:153)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:188)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:71)
at org.apache.pig.Main.main(Main.java:302)
Caused by: org.apache.pig.impl.plan.PlanValidationException: An unexpected 
exception caused the validation to stop
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at 
org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:79)
at org.apache.pig.PigServer.compileLp(PigServer.java:687)
at org.apache.pig.PigServer.dumpSchema(PigServer.java:360)
... 5 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
Cannot resolve ForEach output schema.
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2731)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
... 10 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
Problem during evaluaton of BinCond output type
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1913)
at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:88)
at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:27)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2812)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2720)
... 15 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
Two inputs of BinCond must have compatible schemas
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1903)
... 21 more

{code}

> outer join query looses name information
> 
>
> Key: PIG-577
> URL: https://issues.apache.org/jira/browse/PIG-577
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Olga Natkovich
> Fix For: types_branch
>
>
> The following query:
> A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
> B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, 
> contributions: float);
> C = COGROUP A BY name, B BY name;
> D = FOREACH C GENERATE group, flatten((IsEmpty(A) ? null : A)), 
> flatten((IsEmpty(B) ? null : B));
> describe D;
> E = FOREACH D GENERATE A::gpa, B::contributions;
> Give the following error: (Even though describe shows correct informat

[jira] Updated: (PIG-578) join ... outer, ... outer semantics are a no-ops, should produce corresponding null values

2008-12-24 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-578:


Issue Type: Improvement  (was: Bug)

Marking this as an improvement since Pig does not support outer joins as a 
language construct. The keyword outer is currently ignored in the join 
statement. This should be fixed to allow outer joins (left, right and full).

> join ... outer, ... outer semantics are a no-ops, should produce 
> corresponding null values
> --
>
> Key: PIG-578
> URL: https://issues.apache.org/jira/browse/PIG-578
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: types_branch
>Reporter: David Ciemiewicz
>
> Currently using the "OUTER" modifier in the JOIN statement is a no-op.  The 
> results of JOIN are always an INNER join.  Now that the Pig types branch 
> supports null values properly, the semantics of JOIN ... OUTER, ... OUTER 
> should be corrected to do proper outer joins and populate the corresponding 
> empty values with nulls.
> Here's the example:
> A = load 'a.txt' using PigStorage() as ( comment, value );
> B = load 'b.txt' using PigStorage() as ( comment, value );
> --
> -- OUTER clause is ignored in the JOIN statement and does not populate the tuple with
> -- null values as it should. Otherwise OUTER is a meaningless no-op modifier.
> --
> ABOuterJoin = join A by ( comment ) outer, B by ( comment ) outer;
> describe ABOuterJoin;
> dump ABOuterJoin;
> The file a contains:
> a-only  1
> ab-both 2
> The file b contains:
> ab-both 2
> b-only  3
> When you execute the script today, the dump results are:
> (ab-both,2,ab-both,2)
> The expected dump results should be:
> (a-only,1,,)
> (ab-both,2,ab-both,2)
> (,,b-only,3)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-577) outer join query looses name information

2008-12-24 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659152#action_12659152
 ] 

Santhosh Srinivasan commented on PIG-577:
-

The use of the null constant in the bincond in the context of a flatten should 
handle the following cases:

Assumption: one of the columns in the bincond is a null constant.

1. If the other column is a simple type or a map, then cast the null to the 
other type.
2. If the other column is a complex type other than a map, then remove the null 
constant and supplant it with a bag, tuple or map constant with the appropriate 
elements. For example, if the other column is a bag whose tuple contains three 
columns (say int, float, chararray), then replace the null constant with a bag 
that contains a tuple with three null constants. The same reasoning applies to 
a tuple column.

Upon flattening, the complex types will yield the appropriate number of 
columns.

Handling null constants for complex types has implications when the constant is 
materialized, either via dump or store. If the null constant is replaced with 
an appropriate bag/tuple/map, then the materialized constant will look like 
{(,,)} or (,,) or []. This conflicts with our existing view of nulls being 
empty when materialized.

> outer join query looses name information
> 
>
> Key: PIG-577
> URL: https://issues.apache.org/jira/browse/PIG-577
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Olga Natkovich
> Fix For: types_branch
>
>
> The following query:
> A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
> B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, 
> contributions: float);
> C = COGROUP A BY name, B BY name;
> D = FOREACH C GENERATE group, flatten((IsEmpty(A) ? null : A)), 
> flatten((IsEmpty(B) ? null : B));
> describe D;
> E = FOREACH D GENERATE A::gpa, B::contributions;
> Give the following error: (Even though describe shows correct information: D: 
> {group: chararray,A::name: chararray,A::age: int,A::gpa: float,B::name: 
> chararray,B::age: int,B::registration: chararray,B::contributions: float}
> java.io.IOException: Invalid alias: A::gpa in {group: 
> chararray,bytearray,bytearray}
> at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:263)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: A::gpa in {group: chararray,bytearray,bytearray}
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:5930)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5788)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:3974)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3871)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3825)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3734)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3660)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3626)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3552)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3462)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3419)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2894)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2309)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:966)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:742)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:537)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
> ... 6 more

-- 
This message i

[jira] Created: (PIG-583) Bag constants used in non foreach statements cause lexical errors

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Bag constants used in non foreach statements cause lexical errors
-

 Key: PIG-583
 URL: https://issues.apache.org/jira/browse/PIG-583
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: types_branch


Use of bag constants in non-foreach statements causes lexical errors in Pig. 
The root cause is the inability of grunt to distinguish between a nested block 
and a bag constant in non-foreach statements.

{code}
grunt> a = load 'input'; 
grunt> b = filter a by ($0 eq {(1)});

2008-12-29 14:12:15,306 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
java.io.IOException: Encountered "  "eq "" at line 1, column 21.
Was expecting one of:
"*" ...
")" ...
"." ...
"+" ...
"-" ...
"/" ...
"%" ...
"#" ...
...
org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 2, 
column 29.  Encountered: ")" (41), after : ""
at 
org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2608)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:658)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:84)
at 
org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
at org.apache.pig.Main.main(Main.java:282)

{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-584) Error handling in Pig

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling in Pig
-

 Key: PIG-584
 URL: https://issues.apache.org/jira/browse/PIG-584
 Project: Pig
  Issue Type: New Feature
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


This JIRA tracks the error handling feature in Pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-585) Error handling requirements

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling requirements
---

 Key: PIG-585
 URL: https://issues.apache.org/jira/browse/PIG-585
 Project: Pig
  Issue Type: Sub-task
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


The error handling feature requirements are documented at: 
http://wiki.apache.org/pig/PigErrorHandling

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-586) Error handling functional specification

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling functional specification
---

 Key: PIG-586
 URL: https://issues.apache.org/jira/browse/PIG-586
 Project: Pig
  Issue Type: Sub-task
  Components: documentation
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


The error handling functional specification will be at: 
http://wiki.apache.org/pig/PigErrorHandlingFunctionalSpecification

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-587) Error handling design

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling design
-

 Key: PIG-587
 URL: https://issues.apache.org/jira/browse/PIG-587
 Project: Pig
  Issue Type: Sub-task
  Components: documentation
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


The error handling design will be at: 
http://wiki.apache.org/pig/PigErrorHandlingDesign

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-585) Error handling requirements

2008-12-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-585:


Component/s: documentation

> Error handling requirements
> ---
>
> Key: PIG-585
> URL: https://issues.apache.org/jira/browse/PIG-585
> Project: Pig
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
>
> The error handling feature requirements are documented at: 
> http://wiki.apache.org/pig/PigErrorHandling

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-588) Error handling phase one

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling phase one


 Key: PIG-588
 URL: https://issues.apache.org/jira/browse/PIG-588
 Project: Pig
  Issue Type: Sub-task
  Components: grunt, impl, tools
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


Phase one of error handling implementation will build the infrastructure for 
handling errors and handle errors in the parser and the type checker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-588) Error handling phase one

2008-12-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-588:


Attachment: Error_handling_phase_1.patch

The attached patch (Error_handling_phase_1.patch) includes the following:

1. Infrastructure for handling errors, i.e., logging detailed error messages to 
the client-side log file, switches to control whether detailed messages are 
shown on the user's screen, and base classes for the exception hierarchy 
(sketched below)

2. Error codes and error messages for the parser and type checker

Unit tests have been modified to accommodate the new structure. No new unit 
test cases have been added.
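
As an illustration of the exception hierarchy described in item 1 (a 
hypothetical sketch; the class and method names below are illustrative and are 
not taken from the attached patch), a base exception can carry an error code 
alongside the detailed message, so a terse code can be shown on screen while 
the full message goes to the client-side log:

{code}
// Hypothetical base exception carrying an error code; not the PigException
// class contained in the patch.
public class ErrorCodedException extends Exception {

    private final int errorCode;

    public ErrorCodedException(int errorCode, String detailedMessage) {
        super(detailedMessage);
        this.errorCode = errorCode;
    }

    public ErrorCodedException(int errorCode, String detailedMessage, Throwable cause) {
        super(detailedMessage, cause);
        this.errorCode = errorCode;
    }

    public int getErrorCode() {
        return errorCode;
    }
}
{code}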

> Error handling phase one
> 
>
> Key: PIG-588
> URL: https://issues.apache.org/jira/browse/PIG-588
> Project: Pig
>  Issue Type: Sub-task
>  Components: grunt, impl, tools
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase_1.patch
>
>
> Phase one of error handling implementation will build the infrastructure for 
> handling errors and handle errors in the parser and the type checker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-589) Error handling phase two

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling phase two


 Key: PIG-589
 URL: https://issues.apache.org/jira/browse/PIG-589
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


Phase two of the implementation will cover the remainder of the logical layer 
and the front-end, i.e., the optimizer, the translators, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-590) Error handling phase three

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling phase three
--

 Key: PIG-590
 URL: https://issues.apache.org/jira/browse/PIG-590
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


Phase three of the error handling feature will cover the backend, including 
Hadoop-specific error messages.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-591) Error handling phase four

2008-12-29 Thread Santhosh Srinivasan (JIRA)
Error handling phase four
-

 Key: PIG-591
 URL: https://issues.apache.org/jira/browse/PIG-591
 Project: Pig
  Issue Type: Sub-task
  Components: grunt, impl, tools
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: types_branch


Phase four of the error handling feature will address the warning message 
cleanup and warning message aggregation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-588) Error handling phase one

2009-01-08 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-588:


Attachment: Error_handling_phase_1_1.patch

Attached patch adds a few more error codes to exceptions inside PigContext.java.

> Error handling phase one
> 
>
> Key: PIG-588
> URL: https://issues.apache.org/jira/browse/PIG-588
> Project: Pig
>  Issue Type: Sub-task
>  Components: grunt, impl, tools
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase_1.patch, 
> Error_handling_phase_1_1.patch
>
>
> Phase one of error handling implementation will build the infrastructure for 
> handling errors and handle errors in the parser and the type checker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-588) Error handling phase one

2009-01-09 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-588:


Attachment: Error_handling_phase_1_2.patch

Attaching a new patch as the previous one did not include the newly added files 
(PigException.java and TypeCheckerException.java).

> Error handling phase one
> 
>
> Key: PIG-588
> URL: https://issues.apache.org/jira/browse/PIG-588
> Project: Pig
>  Issue Type: Sub-task
>  Components: grunt, impl, tools
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase_1.patch, 
> Error_handling_phase_1_1.patch, Error_handling_phase_1_2.patch
>
>
> Phase one of error handling implementation will build the infrastructure for 
> handling errors and handle errors in the parser and the type checker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-588) Error handling phase one

2009-01-09 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-588:


Attachment: Error_handling_phase_1_3.patch

The attached patch ensures that error messages are sent to STDERR instead of 
STDOUT.

> Error handling phase one
> 
>
> Key: PIG-588
> URL: https://issues.apache.org/jira/browse/PIG-588
> Project: Pig
>  Issue Type: Sub-task
>  Components: grunt, impl, tools
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase_1.patch, 
> Error_handling_phase_1_1.patch, Error_handling_phase_1_2.patch, 
> Error_handling_phase_1_3.patch
>
>
> Phase one of error handling implementation will build the infrastructure for 
> handling errors and handle errors in the parser and the type checker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-588) Error handling phase one

2009-01-09 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-588:


Attachment: Error_handling_phase_1_4.patch

Another patch, in sync with the types branch after PIG-599.

> Error handling phase one
> 
>
> Key: PIG-588
> URL: https://issues.apache.org/jira/browse/PIG-588
> Project: Pig
>  Issue Type: Sub-task
>  Components: grunt, impl, tools
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase_1.patch, 
> Error_handling_phase_1_1.patch, Error_handling_phase_1_2.patch, 
> Error_handling_phase_1_3.patch, Error_handling_phase_1_4.patch
>
>
> Phase one of error handling implementation will build the infrastructure for 
> handling errors and handle errors in the parser and the type checker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-616) Casts to complex types do not work as expected

2009-01-12 Thread Santhosh Srinivasan (JIRA)
Casts to complex types do not work as expected
--

 Key: PIG-616
 URL: https://issues.apache.org/jira/browse/PIG-616
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
 Fix For: types_branch


When we specify a (complex) type as a column in Pig, the TypeCastInserter 
inserts the appropriate cast for the (complex) type. However, in the 
implementation of POCast.java, when DataByteArrays are converted to the 
(complex) types, we invoke the bytesToXXX method.

For complex types, especially tuples and bags, we do not enforce the typing 
information specified by the user in the AS clause or with the explicit cast 
statement. The implementation relies solely on bytesToXXX to figure out the 
right type.

An example of a query that fails is given below. In this query, the data is a 
single column that is a bag of integers. The user specifies this bag to be a 
bag of chararray. This conversion is allowed in Pig but the implementation does 
not perform the actual cast. Instead, bytesToBag is called on the stream. 
The resulting type is a bag of integers and not a bag of chararray. In the 
subsequent statement the user (correctly) assumes that the conversion has been 
performed but in reality it has not been done. At run time, when a 
chararray-based operation is performed, we see a ClassCastException.

The notion of a schema is absent in the physical operators. The 
schema/fieldSchema in the logical layer has to be passed on to the physical 
layer. The schema can be used to perform additional operations like casting, 
etc.

{code}

grunt> cat bag.data
{(1)}

grunt> a = load 'bag.data' as (b:{t:(c:chararray)});
grunt> b = foreach a generate flatten(b);
grunt> c = foreach b generate CONCAT('Hello ', $0);
grunt> dump c;

2009-01-12 10:44:44,417 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2009-01-12 10:45:09,439 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Map reduce job failed
2009-01-12 10:45:09,440 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Job failed!
2009-01-12 10:45:09,443 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
message from task (map) 
task_200812151518_9681_m_00java.lang.ClassCastException: java.lang.Integer 
cannot be cast to java.lang.String
at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:37)
at org.apache.pig.builtin.StringConcat.exec(StringConcat.java:31)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:259)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:271)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:187)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:175)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
...

2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1066: Unable to open iterator for alias c

2009-01-12 10:45:09,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
org.apache.pig.impl.logicalLayer.FrontendException: Unable to open iterator for 
alias c
at org.apache.pig.PigServer.openIterator(PigServer.java:426)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:271)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
at org.apache.pig.Main.main(Main.java:302)

Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:420)
... 5 more

{code}
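
To make the discussion above concrete, here is a minimal sketch (not the actual 
POCast change) of what a schema-driven cast of the bag's contents could look 
like if the physical operator had access to the target field schema. The class 
and method names are hypothetical, and the sketch is simplified to the 
chararray case from the example:

{code}
import java.util.Iterator;

import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class SchemaAwareBagCast {
    // Hypothetical helper: convert every field of every tuple in the bag to
    // chararray (String) instead of trusting whatever type bytesToBag
    // produced. A real fix would walk the declared fieldSchema recursively
    // and dispatch on the target type for each field.
    public static DataBag castToBagOfChararray(DataBag input) throws Exception {
        DataBag result = BagFactory.getInstance().newDefaultBag();
        TupleFactory tf = TupleFactory.getInstance();
        for (Iterator<Tuple> it = input.iterator(); it.hasNext();) {
            Tuple t = it.next();
            Tuple casted = tf.newTuple(t.size());
            for (int i = 0; i < t.size(); i++) {
                Object field = t.get(i);
                if (field == null) {
                    casted.set(i, null);
                } else if (field instanceof DataByteArray) {
                    casted.set(i, ((DataByteArray) field).toString());
                } else {
                    // e.g. the Integer produced by bytesToBag in the example above
                    casted.set(i, field.toString());
                }
            }
            result.add(casted);
        }
        return result;
    }
}
{code}

A real implementation would dispatch on the declared field type carried over 
from the logical layer's fieldSchema rather than hard-coding chararray.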

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-617) Using SUM with basic type fails

2009-01-12 Thread Santhosh Srinivasan (JIRA)
Using SUM with basic type fails
---

 Key: PIG-617
 URL: https://issues.apache.org/jira/browse/PIG-617
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
 Fix For: types_branch


SUM is an aggregate function that expects a bag as an argument. When basic 
types are used as arguments to SUM, Pig fails at run time. The typechecker 
should catch this error and fail earlier.

An example is given below:

{code}
grunt> a = load 'one' as (i: int);
grunt> b = foreach a generate SUM(i);
grunt> dump b;

2009-01-12 14:11:47,595 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2009-01-12 14:12:12,617 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Map reduce job failed
2009-01-12 14:12:12,618 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Job failed!
2009-01-12 14:12:12,623 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
message from task (map) 
task_200812151518_9683_m_00java.lang.ClassCastException: java.lang.Integer 
cannot be cast to org.apache.pig.data.DataBag

2009-01-12 14:12:12,623 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
message from task (map) 
task_200812151518_9683_m_00java.lang.ClassCastException: java.lang.Integer 
cannot be cast to org.apache.pig.data.DataBag
at org.apache.pig.builtin.IntSum.sum(IntSum.java:141)
at org.apache.pig.builtin.IntSum.exec(IntSum.java:41)
at org.apache.pig.builtin.IntSum.exec(IntSum.java:36)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:247)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:265)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:187)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:175)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
...

2009-01-12 14:12:12,629 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1066: Unable to open iterator for alias b
2009-01-12 14:12:12,629 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
org.apache.pig.impl.logicalLayer.FrontendException: Unable to open iterator for 
alias b
at org.apache.pig.PigServer.openIterator(PigServer.java:425)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:271)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
at org.apache.pig.Main.main(Main.java:302)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:419)
... 5 more

{code}
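
For illustration only, the snippet below shows why the run-time cast fails and 
the kind of bag-type check that could be performed up front against the 
declared argument; the class name and the check itself are hypothetical, not 
the actual typechecker code:

{code}
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class SumArgCheck {
    public static void main(String[] args) throws Exception {
        // What SUM effectively does at run time: it assumes field 0 of its
        // input tuple is a bag and casts it, so an int column blows up with
        // the ClassCastException shown above.
        Tuple t = TupleFactory.getInstance().newTuple(1);
        t.set(0, Integer.valueOf(1));   // basic type, as in "SUM(i)"
        Object arg = t.get(0);
        if (!(arg instanceof DataBag)) {
            // This is the kind of mismatch the typechecker could detect at
            // compile time from the declared schema, instead of letting the
            // cast fail inside IntSum.sum().
            throw new RuntimeException(
                "SUM expects a bag, got " + arg.getClass().getName());
        }
    }
}
{code}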


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-589) Error handling phase two

2009-01-14 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-589:


Attachment: Error_handling_phase2.patch

The attached patch fulfills the requirements for phase two.

> Error handling phase two
> 
>
> Key: PIG-589
> URL: https://issues.apache.org/jira/browse/PIG-589
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase2.patch
>
>
> Phase two of the implementation will cover the remainder of the logical layer 
> and the front-end, i.e., the optimizer, the translators, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-621) Casts swallow exceptions when there are issues with conversion of bytes to Pig types

2009-01-15 Thread Santhosh Srinivasan (JIRA)
Casts swallow exceptions when there are issues with conversion of bytes to Pig 
types


 Key: PIG-621
 URL: https://issues.apache.org/jira/browse/PIG-621
 Project: Pig
  Issue Type: Bug
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
 Fix For: types_branch


In the current implementation of casts, exceptions thrown while converting 
bytes to Pig types are swallowed. Pig needs to either return NULL or rethrow 
the exception.
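
A minimal sketch of the two options, using a hypothetical bytesToInteger-style 
helper (not the actual converter code in the load function):

{code}
import java.io.IOException;

public class ByteConversion {
    // Hypothetical helper illustrating the two options discussed above; the
    // real conversion lives in the load function's bytesToXXX methods.
    public static Integer bytesToInteger(byte[] b) throws IOException {
        if (b == null) return null;
        String s = new String(b, "UTF-8");
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            // Option 1: return null for the malformed value instead of
            // silently swallowing the problem.
            return null;
            // Option 2 (alternative): rethrow with context, e.g.
            // throw new IOException("Cannot convert '" + s + "' to int");
        }
    }
}
{code}

Returning null lets bad records flow through the pipeline; rethrowing surfaces 
the problem immediately but fails the job.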

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-589) Error handling phase two

2009-01-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-589:


Attachment: Error_handling_phase2_4.patch

Updated patch with a minor fix to rethrow an exception. See related bug PIG-621.

> Error handling phase two
> 
>
> Key: PIG-589
> URL: https://issues.apache.org/jira/browse/PIG-589
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase2.patch, 
> Error_handling_phase2_4.patch
>
>
> Phase two of the implementation will cover the remainder of the logical layer 
> and the front-end, i.e., the optimizer, the translators, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (PIG-571) pigserver methods do not throw error or return error code when an error occurs

2009-01-16 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664657#action_12664657
 ] 

sms edited comment on PIG-571 at 1/16/09 12:06 PM:
---

In the current implementation, Pig displays the errors including the stack 
trace but does not throw an exception. There are two problems in the existing 
code:

1. Hadoop returns status as String instead of serialized objects
2. Pig does not throw an exception on failures with the appropriate details.

As part of the error handling feature, Pig will handle point 2 in the third 
milestone(PIG-590) and request Hadoop to support status reporting via objects 
and not just Strings.

  was (Author: sms):
In the current implementation, Pig displays the errors including the stack 
trace but do not throw an exception. There are two problems in the existing 
code:

1. Hadoop returns status as String instead of serialized objects
2. Pig does not throw an exception on failures with the appropriate details.

As part of the error handling feature, Pig will handle point 2 in the third 
milestone(PIG-590) and request Hadoop to support status reporting via objects 
and not just Strings.
  
> pigserver methods do not throw error or return error code when an error occurs
> --
>
> Key: PIG-571
> URL: https://issues.apache.org/jira/browse/PIG-571
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
>Assignee: Santhosh Srinivasan
>
> I do PigServer.registerQuery("store ..."), and the query fails. Pig prints a 
> bunch of stack traces but does not throw an error back to the caller. This is 
> a major problem because my client needs to know whether the Pig command 
> succeeded or failed.
> I saw this problem with registerQuery() ... the same problem may arise with 
> other PigServer methods as well, such as store(), copy(), etc. -- not sure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-571) pigserver methods do not throw error or return error code when an error occurs

2009-01-16 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664657#action_12664657
 ] 

Santhosh Srinivasan commented on PIG-571:
-

In the current implementation, Pig displays the errors including the stack 
trace but does not throw an exception. There are two problems in the existing 
code:

1. Hadoop returns status as String instead of serialized objects
2. Pig does not throw an exception on failures with the appropriate details.

As part of the error handling feature, Pig will handle point 2 in the third 
milestone(PIG-590) and request Hadoop to support status reporting via objects 
and not just Strings.

> pigserver methods do not throw error or return error code when an error occurs
> --
>
> Key: PIG-571
> URL: https://issues.apache.org/jira/browse/PIG-571
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
>Assignee: Santhosh Srinivasan
>
> I do PigServer.registerQuery("store ..."), and the query fails. Pig prints a 
> bunch of stack traces but does not throw an error back to the caller. This is 
> a major problem because my client needs to know whether the Pig command 
> succeeded or failed.
> I saw this problem with registerQuery() ... the same problem may arise with 
> other PigServer methods as well, such as store(), copy(), etc. -- not sure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-571) pigserver methods do not throw error or return error code when an error occurs

2009-01-16 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664680#action_12664680
 ] 

Santhosh Srinivasan commented on PIG-571:
-

As an intermediate step, Pig will parse the Hadoop status message and create 
an exception with relevant details.
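
A rough sketch of what this intermediate step could look like; the 
status-message layout assumed here is an assumption, not the format Pig or 
Hadoop actually uses:

{code}
import java.io.IOException;

public class StatusMessageParser {
    // Hypothetical illustration: pull the error text out of the status string
    // reported for a failed task and surface it as an exception instead of
    // only logging it.
    public static IOException toException(String taskStatus) {
        if (taskStatus == null || taskStatus.length() == 0) {
            return new IOException("Job terminated with anomalous status FAILED");
        }
        // Keep only the first line, which typically carries the exception text.
        int newline = taskStatus.indexOf('\n');
        String firstLine = newline >= 0 ? taskStatus.substring(0, newline) : taskStatus;
        return new IOException("Job failed: " + firstLine);
    }
}
{code}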

> pigserver methods do not throw error or return error code when an error occurs
> --
>
> Key: PIG-571
> URL: https://issues.apache.org/jira/browse/PIG-571
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
>Assignee: Santhosh Srinivasan
>
> I do PigServer.registerQuery("store ..."), and the query fails. Pig prints a 
> bunch of stack traces but does not throw an error back to the caller. This is 
> a major problem because my client needs to know whether the Pig command 
> succeeded or failed.
> I saw this problem with registerQuery() ... the same problem may arise with 
> other PigServer methods as well, such as store(), copy(), etc. -- not sure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (PIG-571) pigserver methods do not throw error or return error code when an error occurs

2009-01-16 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664680#action_12664680
 ] 

sms edited comment on PIG-571 at 1/16/09 1:18 PM:
--

As an intermediate step, Pig will parse the Hadoop status message and create an 
exception with relevant details.

  was (Author: sms):
As an intermediate step, Pig will parser the Hadoop status message and 
create an exception with relevant details.
  
> pigserver methods do not throw error or return error code when an error occurs
> --
>
> Key: PIG-571
> URL: https://issues.apache.org/jira/browse/PIG-571
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
>Assignee: Santhosh Srinivasan
>
> I do PigServer.registerQuery("store ..."), and the query fails. Pig prints a 
> bunch of stack traces but does not throw an error back to the caller. This is 
> a major problem because my client needs to know whether the Pig command 
> succeeded or failed.
> I saw this problem with registerQuery() ... the same problem may arise with 
> other PigServer methods as well, such as store(), copy(), etc. -- not sure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-623) Fix spelling errors in output messages

2009-01-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-623:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch has been committed. Thanks for your contribution Tom.

> Fix spelling errors in output messages
> --
>
> Key: PIG-623
> URL: https://issues.apache.org/jira/browse/PIG-623
> Project: Pig
>  Issue Type: Improvement
>Reporter: Tom White
>Priority: Trivial
> Attachments: pig-623.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-622) Include pig executable in distribution

2009-01-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-622:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed. Thanks for your contribution Tom.

> Include pig executable in distribution
> --
>
> Key: PIG-622
> URL: https://issues.apache.org/jira/browse/PIG-622
> Project: Pig
>  Issue Type: Bug
>Reporter: Tom White
> Attachments: pig-622.patch
>
>
> Running "ant tar" does not generate the bin directory with the pig executable 
> in it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-571) pigserver methods do not throw error or return error code when an error occurs

2009-01-16 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664787#action_12664787
 ] 

Santhosh Srinivasan commented on PIG-571:
-

Thanks for the patch, Laukik. A similar fix has been made as part of PIG-588; 
it is pending review. This bug addresses the fact that Pig does not throw an 
exception when the registerQuery() method reports a failure. This affects Java 
programs that use this API.

> pigserver methods do not throw error or return error code when an error occurs
> --
>
> Key: PIG-571
> URL: https://issues.apache.org/jira/browse/PIG-571
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Christopher Olston
>Assignee: Santhosh Srinivasan
> Attachments: ret_code.diff
>
>
> I do PigServer.registerQuery("store ..."), and the query fails. Pig prints a 
> bunch of stack traces but does not throw an error back to the caller. This is 
> a major problem because my client needs to know whether the Pig command 
> succeeded or failed.
> I saw this problem with registerQuery() ... the same problem may arise with 
> other PigServer methods as well, such as store(), copy(), etc. -- not sure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-545) PERFORMANCE: Sampler for order bys does not produce a good distribution

2009-01-22 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666234#action_12666234
 ] 

Santhosh Srinivasan commented on PIG-545:
-


Two, just getting better sampling won't resolve the issue for order by queries 
that have one or a few keys with a very high number of values, such as in a 
zipf distribution. Unfortunately for us, zipf is a very common data 
distribution. In this case our partitioner may need to be able to detect and 
split large keys by round robining them to a group of reducers.


Better sampling will not resolve the issue for order by. It will help in 
producing more evenly sized partitions for the reducers. Since it is sampling 
and not a brute-force check of the cardinality of each key, there will always 
be a non-zero probability of one reducer getting more data than the other 
reducers. The better sampling approach will minimize such occurrences.

Secondly, post sampling, we can ensure that reducers get the right partitions 
by using Hadoop's ability to pick reducers based on partition functions. I am 
not quite sure how Pig can propose a generic partition function to achieve this.
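
For illustration, here is a conceptual sketch of the split-heavy-keys idea 
mentioned above (round-robining a heavy key over a range of reducers). This is 
not Pig's sampler or partitioner; the heavyKeys map is assumed to be produced 
by the sampling pass:

{code}
import java.util.HashMap;
import java.util.Map;

public class SkewedKeyPartitioner {
    // Conceptual sketch only: keys found "heavy" by the sampler are spread
    // round-robin over a contiguous range of reducers; everything else goes
    // to a single reducer chosen by hash. The heavyKeys map
    // (key -> number of reducers to spread over) is an assumption.
    private final Map<String, Integer> heavyKeys;
    private final Map<String, Integer> nextOffset = new HashMap<String, Integer>();

    public SkewedKeyPartitioner(Map<String, Integer> heavyKeys) {
        this.heavyKeys = heavyKeys;
    }

    public int getPartition(String key, int numReducers) {
        int base = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
        Integer spread = heavyKeys.get(key);
        if (spread == null || spread.intValue() <= 1) {
            return base;
        }
        // Round-robin this key's records across 'spread' reducers.
        Integer offset = nextOffset.get(key);
        int off = (offset == null) ? 0 : offset.intValue();
        nextOffset.put(key, Integer.valueOf((off + 1) % spread.intValue()));
        return (base + off) % numReducers;
    }
}
{code}

Note that spreading a single key over several reducers breaks the contiguous 
key ranges an order by relies on, which is part of why a generic partition 
function is hard to propose.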

> PERFORMANCE: Sampler for order bys does not produce a good distribution
> ---
>
> Key: PIG-545
> URL: https://issues.apache.org/jira/browse/PIG-545
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: types_branch
>Reporter: Alan Gates
>Assignee: Amir Youssefi
> Fix For: types_branch
>
>
> In running tests on actual data, I've noticed that the final reduce of an 
> order by has skewed partitions.  Some reduces finish in a few seconds while 
> some run for 20 minutes.  Getting a better distribution should lead to much 
> better performance for order by.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-589) Error handling phase two

2009-01-22 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-589:


Attachment: Error_handling_phase2_5.patch

Attached patch is in synchrony with the latest sources.

> Error handling phase two
> 
>
> Key: PIG-589
> URL: https://issues.apache.org/jira/browse/PIG-589
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase2.patch, 
> Error_handling_phase2_4.patch, Error_handling_phase2_5.patch
>
>
> Phase two of the implementation will cover the remainder of the logical layer 
> and the front-end, i.e., the optimizer, the translators, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-589) Error handling phase two

2009-01-22 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666428#action_12666428
 ] 

Santhosh Srinivasan commented on PIG-589:
-

Will fix issue (1). 
For issue (2), the list of matching functions is internal to Pig and is 
probably something that users should not be made aware of. It should probably 
be part of the detailed message that is logged to the file.

> Error handling phase two
> 
>
> Key: PIG-589
> URL: https://issues.apache.org/jira/browse/PIG-589
> Project: Pig
>  Issue Type: Sub-task
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: Error_handling_phase2.patch, 
> Error_handling_phase2_4.patch, Error_handling_phase2_5.patch
>
>
> Phase two of the implementation will cover the remainder of the logical layer 
> and the front-end, i.e., the optimizer, the translators, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-615) Wrong number of jobs with limit

2009-01-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1254#action_1254
 ] 

Santhosh Srinivasan commented on PIG-615:
-

I will be reviewing this patch.

> Wrong number of jobs with limit
> ---
>
> Key: PIG-615
> URL: https://issues.apache.org/jira/browse/PIG-615
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Shravan Matthur Narayanamurthy
> Attachments: 615.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-553) EvalFunc.finish() not getting called

2009-01-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1252#action_1252
 ] 

Santhosh Srinivasan commented on PIG-553:
-

I will be reviewing this patch.

> EvalFunc.finish() not getting called
> 
>
> Key: PIG-553
> URL: https://issues.apache.org/jira/browse/PIG-553
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
> Environment: "local" mode
>Reporter: Christopher Olston
>Assignee: Shravan Matthur Narayanamurthy
> Attachments: 553.patch
>
>
> My EvalFunc's finish() method doesn't seem to get invoked.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-553) EvalFunc.finish() not getting called

2009-01-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1279#action_1279
 ] 

Santhosh Srinivasan commented on PIG-553:
-

Review comments:

1. The code looks fine.
2. There are no unit test cases. We need unit test cases to ensure that the 
code path is exercised in all cases (preferably at least the map-reduce case).
3. In algebraic functions, since the intermediate is called only in the 
PigCombiner and the UDF visitor is never called in the PigCombiner, users 
should be aware that finish() is never called for intermediate methods. The 
UDF documentation has to be updated to reflect this caveat (a skeleton 
illustrating the caveat is sketched below).
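
As a sketch only (not proposed documentation wording), a COUNT-like Algebraic 
skeleton with comments marking where finish() is and is not invoked per the 
caveat in item 3:

{code}
import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.Algebraic;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class MyCount extends EvalFunc<Long> implements Algebraic {
    private static final TupleFactory TF = TupleFactory.getInstance();

    public Long exec(Tuple input) throws IOException {
        return count(input);
    }

    // finish() here runs only where the full EvalFunc pipeline runs
    // (map or reduce side), never in the combiner.
    public void finish() { }

    public String getInitial() { return Initial.class.getName(); }
    public String getIntermed() { return Intermed.class.getName(); }
    public String getFinal() { return Final.class.getName(); }

    public static class Initial extends EvalFunc<Tuple> {
        public Tuple exec(Tuple input) throws IOException {
            return TF.newTuple(count(input));
        }
    }

    public static class Intermed extends EvalFunc<Tuple> {
        // Per the caveat above: this class executes inside the PigCombiner,
        // so a finish() defined here would never be invoked.
        public Tuple exec(Tuple input) throws IOException {
            return TF.newTuple(sum(input));
        }
    }

    public static class Final extends EvalFunc<Long> {
        public Long exec(Tuple input) throws IOException {
            return sum(input);
        }
    }

    private static Long count(Tuple input) throws IOException {
        try {
            DataBag bag = (DataBag) input.get(0);
            return Long.valueOf(bag.size());
        } catch (Exception e) {
            throw new IOException("count failed: " + e.getMessage());
        }
    }

    private static Long sum(Tuple input) throws IOException {
        try {
            DataBag bag = (DataBag) input.get(0);
            long total = 0;
            for (Iterator<Tuple> it = bag.iterator(); it.hasNext();) {
                total += ((Long) it.next().get(0)).longValue();
            }
            return Long.valueOf(total);
        } catch (Exception e) {
            throw new IOException("sum failed: " + e.getMessage());
        }
    }
}
{code}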

> EvalFunc.finish() not getting called
> 
>
> Key: PIG-553
> URL: https://issues.apache.org/jira/browse/PIG-553
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
> Environment: "local" mode
>Reporter: Christopher Olston
>Assignee: Shravan Matthur Narayanamurthy
> Attachments: 553.patch
>
>
> My EvalFunc's finish() method doesn't seem to get invoked.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-615) Wrong number of jobs with limit

2009-01-23 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-615:


Attachment: 615_1.patch

Attached patch includes Shravan's fix along with test cases that I added. 
Running unit test cases now.

> Wrong number of jobs with limit
> ---
>
> Key: PIG-615
> URL: https://issues.apache.org/jira/browse/PIG-615
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Shravan Matthur Narayanamurthy
> Attachments: 615.patch, 615_1.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-615) Wrong number of jobs with limit

2009-01-23 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-615:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

All unit test cases passed. Patch has been committed. Thanks for the fix 
Shravan.

> Wrong number of jobs with limit
> ---
>
> Key: PIG-615
> URL: https://issues.apache.org/jira/browse/PIG-615
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Shravan Matthur Narayanamurthy
> Attachments: 615.patch, 615_1.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-635) POCast.java has incorrect formatting

2009-01-23 Thread Santhosh Srinivasan (JIRA)
POCast.java has incorrect formatting


 Key: PIG-635
 URL: https://issues.apache.org/jira/browse/PIG-635
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
Priority: Trivial
 Fix For: types_branch


POCast.java has incorrect formatting. This crept in as part of PIG-589.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-635) POCast.java has incorrect formatting

2009-01-23 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-635:


Attachment: POCast.patch

Patch to correct formatting in POCast.java

> POCast.java has incorrect formatting
> 
>
> Key: PIG-635
> URL: https://issues.apache.org/jira/browse/PIG-635
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Trivial
> Fix For: types_branch
>
> Attachments: POCast.patch
>
>
> POCast.java has incorrect formatting. This crept in as part of PIG-589.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-635) POCast.java has incorrect formatting

2009-01-24 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-635.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Resolved. Fix has been committed.

> POCast.java has incorrect formatting
> 
>
> Key: PIG-635
> URL: https://issues.apache.org/jira/browse/PIG-635
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Trivial
> Fix For: types_branch
>
> Attachments: POCast.patch
>
>
> POCast.java has incorrect formatting. This crept in as part of PIG-589.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-635) POCast.java has incorrect formatting

2009-01-24 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-635:


Patch Info: [Patch Available]

> POCast.java has incorrect formatting
> 
>
> Key: PIG-635
> URL: https://issues.apache.org/jira/browse/PIG-635
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: types_branch
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Trivial
> Fix For: types_branch
>
> Attachments: POCast.patch
>
>
> POCast.java has incorrect formatting. This crept in as part of PIG-589.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


