date:20091105

[jira] Updated: (PIG-1060) MultiQuery optimization throws error for multi-level splits

2009-11-05 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1060:
--

Status: Open  (was: Patch Available)

 MultiQuery optimization throws error for multi-level splits
 ---

 Key: PIG-1060
 URL: https://issues.apache.org/jira/browse/PIG-1060
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur
Assignee: Richard Ding
 Attachments: PIG-1060.patch


 Consider the following scenario :-
 1. Multi-level splits in the map plan.
 2. Each split branch further progressing across a local-global rearrange.
 3. Output of each of these finally merged via a UNION.
 MultiQuery optimizer throws the following error in such a case:
 ERROR 2146: Internal Error. Inconsistency in key index found during 
 optimization.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1060) MultiQuery optimization throws error for multi-level splits

2009-11-05 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1060:
--

Status: Patch Available  (was: Open)


[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-05 Thread Pradeep Kamath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774054#action_12774054
 ] 

Pradeep Kamath commented on PIG-1065:
-

This is an instance of the problem of representing unknown schema with a null 
schema. If the schema of a relational operator is null, pig assumes the fields 
are of type bytearray which is incorrect. An unknown schema really means we 
don't know the types of the fields. In the above case, once pig determines that 
the two schemas have different sizes, it sets the schema of LOUnion to null (to 
represent unknown schema). Hence the order by expects the fields coming out of 
the union to be byte arrays but in reality the first field (which is the sort 
key above) is a chararray - this results in a runtime exception.

I propose that when either of the two inputs to a union has a schema, we should 
error out if the two are incompatible and not continue. If neither input has a 
schema, then we can proceed with a null schema. Thoughts?
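
A minimal sketch of the rule proposed above, written in Python rather than against Pig's own classes. Everything here is illustrative: `Schema`, `IncompatibleSchemaError` and `union_schema` are hypothetical names, and "compatible" is assumed to mean "same number of fields with identical types". Treating one known schema paired with one unknown schema as incompatible is also an assumption, since the comment does not settle that case.

```python
from typing import Optional, Tuple

# A schema is modelled as a tuple of (field_name, type_name) pairs;
# None stands for "schema unknown". These names are illustrative only.
Schema = Optional[Tuple[Tuple[str, str], ...]]


class IncompatibleSchemaError(Exception):
    """Raised when a declared union input schema does not match the other input."""


def union_schema(left: Schema, right: Schema) -> Schema:
    # Neither input declares a schema: proceed with an unknown (None) schema.
    if left is None and right is None:
        return None
    # Assumption: a known schema paired with an unknown one, or with a schema
    # of a different size, counts as incompatible and is rejected up front.
    if left is None or right is None or len(left) != len(right):
        raise IncompatibleSchemaError(f"cannot union {left} with {right}")
    # Assumption: field types must match exactly (no implicit casts).
    for (_, left_type), (_, right_type) in zip(left, right):
        if left_type != right_type:
            raise IncompatibleSchemaError(f"cannot union {left} with {right}")
    return left
```

With the schemas from the issue description, `(key:chararray, v:chararray)` and `(key:chararray)`, this sketch would raise at plan time instead of failing later in the ORDER BY.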

 In-determinate behaviour of Union when there are 2 non-matching schema's
 

 Key: PIG-1065
 URL: https://issues.apache.org/jira/browse/PIG-1065
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.6.0


 I have a script which first does a UNION of two relations with different 
 schemas and then does an ORDER BY on the result.
 {code}
 f1 = LOAD '1.txt' as (key:chararray, v:chararray);
 f2 = LOAD '2.txt' as (key:chararray);
 u0 = UNION f1, f2;
 describe u0;
 dump u0;
 u1 = ORDER u0 BY $0;
 dump u1;
 {code}
 When I run in Map Reduce mode I get the following result:
 $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
 
 Schema for u0 unknown.
 
 (1,2)
 (2,3)
 (1)
 (2)
 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias u1
     at org.apache.pig.PigServer.openIterator(PigServer.java:475)
     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
     at org.apache.pig.Main.main(Main.java:397)
 
 Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
 
 When I run the same script in local mode I get a different result, as we know 
 that local mode does not use any Hadoop Classes.
 $java -cp pig.jar org.apache.pig.Main -x local broken.pig
 
 Schema for u0 unknown
 
 (1,2)
 (1)
 (2,3)
 (2)
 
 (1,2)
 (1)
 (2,3)
 (2)
 
 Here are some questions:
 1) Why do we allow a union if the schemas do not match?
 2) Should we not print an error message or warning so that users know that 
 this is not allowed, or that they can get unexpected results?
 Viraj


[jira] Commented: (PIG-1071) Support comma separated file/directory names in load statements

2009-11-05 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774061#action_12774061
]

Hadoop QA commented on PIG-1071:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424056/PIG-1071.patch
against trunk revision 832804.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 5 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/140/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/140/console

This message is automatically generated.

Support comma separated file/directory names in load statements
---

Key: PIG-1071
URL: https://issues.apache.org/jira/browse/PIG-1071
Project: Pig
Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
Attachments: PIG-1071.patch

Currently Pig Latin supports the following LOAD syntax:
{code}
LOAD 'data' [USING loader function] [AS schema];
{code}
where data is the name of the file or directory, including files specified
with Hadoop-supported globbing syntax. This name is passed to the loader
function.
This feature is to support loaders that can load multiple files from
different directories, and it allows users to pass in the file names as a
comma-separated string.
For example, these will be valid load statements:
{code}
LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader();
{code}
and
{code}
LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader();
{code}
This comma-separated string is passed to the loader.
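
Because the whole string is handed to the loader, a loader has to tell a comma that separates two locations from a comma inside a `{a,c}` glob group. A rough Python sketch of one way to do that follows; `split_load_paths` is a hypothetical helper, not part of Pig or of the attached patch, and it assumes brace groups are balanced.

```python
from typing import List


def split_load_paths(location: str) -> List[str]:
    """Split a LOAD location on commas that are not inside a {...} glob group."""
    paths: List[str] = []
    current: List[str] = []
    depth = 0  # nesting depth of '{' ... '}' glob alternations
    for ch in location:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # A top-level comma ends the current path.
            paths.append("".join(current))
            current = []
        else:
            current.append(ch)
    paths.append("".join(current))
    return paths
```

Applied to the second example above, the sketch keeps `/usr/pig/test1/{a,c}` as a single location and returns `/usr/pig/test2/b` as the other.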


[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-05 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774065#action_12774065
 ] 

Thejas M Nair commented on PIG-1065:


Can this be allowed (in case of incompatible schemas as in description) - u0 = 
UNION f1, f2 as (key:chararray, v:chararray); ?



[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

2009-11-05 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-997:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

All the nightly tests now pass.  Patch checked in.

 [zebra] Sorted Table Support by Zebra
 -

 Key: PIG-997
 URL: https://issues.apache.org/jira/browse/PIG-997
 Project: Pig
  Issue Type: New Feature
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.6.0

 Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch, 
 SortedTable.patch


 This new feature is for Zebra to support sorted data in storage. As a storage 
 library, Zebra will not sort the data by itself, but it will support creation 
 and use of sorted data either through Pig or through map/reduce tasks that 
 use Zebra as the storage format.
 The sorted table keeps the data in a totally sorted manner across all 
 TFiles created by potentially all mappers or reducers.
 For sorted data creation through Pig's STORE operator, if the input data is 
 sorted through ORDER BY, the new Zebra table will be marked as sorted on 
 the sorted columns.
 For sorted data creation through Map/Reduce tasks, three new static methods 
 of the BasicTableOutput class will be provided to help the user achieve 
 the goal. setSortInfo allows the user to specify the sorted columns 
 of the input tuple to be stored; getSortKeyGenerator and getSortKey help 
 the user generate a key acceptable to Zebra as a sort key, based upon 
 the schema, the sorted columns and the input tuple.
 For sorted data read through Pig's LOAD operator, pass the string "sorted" as 
 an extra argument to the TableLoader constructor to ask for a sorted table to 
 be loaded.
 For sorted data read through Map/Reduce tasks, a new static method of the 
 TableInputFormat class, requireSortedTable, can be called to ask for a sorted 
 table to be read. Additionally, an overloaded version of the new method can 
 be called to ask for a sorted table on specified sort columns and comparator.
 For this release, sorted tables only support sorting in ascending order, not 
 in descending order. In addition, the sort keys must be of simple types, not 
 complex types such as RECORD, COLLECTION and MAP. 
 Multiple-key sorting is supported, but the ordering of the multiple sort keys 
 is significant, with the first sort column being the primary sort key, the 
 second being the secondary sort key, etc.
 In this release, the sort keys are stored along with the sort columns from 
 which the keys were originally created, resulting in some data storage 
 redundancy.
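
To illustrate the multiple-key ordering described above, here is a small Python sketch. It is not Zebra's getSortKey; `make_sort_key` and `sort_rows` are hypothetical helpers, and rows are modelled as plain dictionaries holding simple-typed values.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def make_sort_key(row: Row, sort_columns: Sequence[str]) -> Tuple[object, ...]:
    # The first listed column is the primary key, the second the secondary key, and so on.
    return tuple(row[column] for column in sort_columns)


def sort_rows(rows: List[Row], sort_columns: Sequence[str]) -> List[Row]:
    # Ascending order only, matching the restriction described above.
    return sorted(rows, key=lambda row: make_sort_key(row, sort_columns))
```

Sorting the same rows on `["a", "b"]` and on `["b", "a"]` generally gives different results, which is why the order of the sort columns matters.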


[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-05 Thread Pradeep Kamath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774086#action_12774086
 ] 

Pradeep Kamath commented on PIG-1065:
-

Currently the Pig parser does not allow specifying a schema for the union. If we 
do want to allow it, a few details arise:
1) From what I know, Pig currently allows specifying row schemas only in load 
statements. Is this restriction by design? ForEach only allows giving 
different schemas at the individual field level, and even there the type has to 
match, if I recollect right.
2) If the input schemas are unequal in size, should the specified union schema 
size strictly be the MAX of the input schema sizes, with nulls being projected 
for missing columns in the input with the smaller schema? So a specified schema 
of size < MAX(size of input schemas) will not be allowed?
3) Can the specified union schema have different (castable) types than the 
result of merging the two input schemas? For example, what if after merging the 
two input schemas a field is int and the specified schema has long? What 
about demotions, like the merged schema being long and the specified one being 
int: would those be disallowed? I suppose we just allow whatever is allowed in 
casts.
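
Point 2 can be made concrete with a short Python sketch. This only illustrates the question being asked, not Pig behaviour: `union_padded` is a hypothetical function, rows are plain tuples, and the declared schema sizes are passed in explicitly.

```python
from typing import List, Sequence, Tuple


def union_padded(
    left_rows: Sequence[tuple],
    left_width: int,
    right_rows: Sequence[tuple],
    right_width: int,
) -> List[Tuple[object, ...]]:
    # The union schema size is the MAX of the two input schema sizes.
    width = max(left_width, right_width)

    def pad(row: tuple) -> Tuple[object, ...]:
        # Project None (null) for columns missing from the narrower input.
        return tuple(row) + (None,) * (width - len(row))

    return [pad(row) for row in left_rows] + [pad(row) for row in right_rows]
```

For the data in the issue description, the rows of `f2` would come out as `("1", None)` and `("2", None)`, so every row leaving the union has the same width.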


[jira] Resolved: (PIG-958) Splitting output data on key field

2009-11-05 Thread Pradeep Kamath (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath resolved PIG-958.


   Resolution: Fixed
Fix Version/s: 0.6.0
 Hadoop Flags: [Reviewed]

Patch committed, thanks for the contribution Ankur!

 Splitting output data on key field
 --

 Key: PIG-958
 URL: https://issues.apache.org/jira/browse/PIG-958
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Ankur
 Fix For: 0.6.0

 Attachments: 958.v3.patch, 958.v4.patch


 Pig users often face the need to split the output records into a bunch of 
 files and directories depending on the type of record. Pig's SPLIT operator 
 is useful when record types are few and known in advance. In cases where type 
 is not directly known but is derived dynamically from values of a key field 
 in the output tuple, a custom store function is a better solution.


[jira] Updated: (PIG-1071) Support comma separated file/directory names in load statements

2009-11-05 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1071:
--

Attachment: PIG-1071.patch

Added two more test cases.

 Support comma separated file/directory names in load statements
 ---

 Key: PIG-1071
 URL: https://issues.apache.org/jira/browse/PIG-1071
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
 Attachments: PIG-1071.patch, PIG-1071.patch



[jira] Commented: (PIG-1060) MultiQuery optimization throws error for multi-level splits

2009-11-05 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774115#action_12774115
]

Hadoop QA commented on PIG-1060:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424143/PIG-1060.patch
against trunk revision 833126.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

-1 release audit. The applied patch generated 319 release audit warnings
(more than the trunk's current 318 warnings).

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/testReport/
Release audit warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/console

This message is automatically generated.


How to clone a logical plan ?

2009-11-05 Thread Ashutosh Chauhan

Hi,

For our cost-based optimizer, for a given query plan we need to generate
alternative query plans and evaluate them based on their estimated cost. To
do that, I want to clone a logical plan. I thought LogicalPlanCloner was
meant for that, but it doesn't seem to work. I added this simple test case
in TestLogicalPlanBuilder.java:

    public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
        // The query is passed to buildPlan as a string.
        LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
        LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
        cloner.getClonedPlan();
    }

and this fails with the following stacktrace:

java.lang.NullPointerException
    at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
    at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
    at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I am
doing something wrong?

Thanks,
Ashutosh

[jira] Created: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

2009-11-05 Thread Pradeep Kamath (JIRA)

ReversibleLoadStoreFunc interface should be removed to enable different load 
and store implementation classes to be used in a reversible manner
---

 Key: PIG-1072
 URL: https://issues.apache.org/jira/browse/PIG-1072
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath





[jira] Commented: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

2009-11-05 Thread Pradeep Kamath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774135#action_12774135
 ] 

Pradeep Kamath commented on PIG-1072:
-

The requirement that, to be reversible, the load and store func must be the 
same class seems restrictive. While we're reworking the APIs, should we add a 
call like:
{code}
Class getReversibleLoader()
{code}
to the StoreFunc interface? Then the store function can return itself if it is 
also the load function, it can return null if it has no reversible loader, or 
it can return another class (like in the Zebra case).

This will affect the Multi Query optimization and the streaming optimization, 
which are currently built around ReversibleLoadStoreFunc.
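
The three cases above can be sketched in Python as follows. This is only an illustration of the proposed contract, not Pig's API: all class names below (`TextStorage`, `TableStorer`, `TableLoader`, `WriteOnlyStorer`) are hypothetical stand-ins, and `get_reversible_loader` mirrors the suggested `getReversibleLoader()` call.

```python
from typing import Optional, Type


class LoadFunc:
    """Stand-in for a loader interface."""


class StoreFunc:
    """Stand-in for a store interface with the proposed extra call."""

    def get_reversible_loader(self) -> Optional[Type[LoadFunc]]:
        # Default: no loader can read back what this store function writes.
        return None


class TextStorage(LoadFunc, StoreFunc):
    # Case 1: the same class both loads and stores, so it returns itself.
    def get_reversible_loader(self) -> Optional[Type[LoadFunc]]:
        return type(self)


class TableLoader(LoadFunc):
    """Hypothetical loader paired with TableStorer below."""


class TableStorer(StoreFunc):
    # Case 3: a different class reads the stored data (the Zebra-like case).
    def get_reversible_loader(self) -> Optional[Type[LoadFunc]]:
        return TableLoader


class WriteOnlyStorer(StoreFunc):
    """Case 2: inherits the default and reports no reversible loader."""


def can_read_back(store: StoreFunc) -> bool:
    # An optimizer could ask this instead of checking for one shared interface.
    return store.get_reversible_loader() is not None
```

An optimization that today checks for ReversibleLoadStoreFunc could, under this sketch, call something like `can_read_back` and then instantiate whichever loader class is returned.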



RE: How to clone a logical plan ?

2009-11-05 Thread Santhosh Srinivasan

You have hit a bug. I think LOJoin has to be added to
LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh


[jira] Created: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

2009-11-05 Thread Ashutosh Chauhan (JIRA)

LogicalPlanCloner can't clone plan containing LOJoin


 Key: PIG-1073
 URL: https://issues.apache.org/jira/browse/PIG-1073
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan


Add the following test case to TestLogicalPlanBuilder.java:

public void testLogicalPlanCloner() throws CloneNotSupportedException {
    // The query is passed to buildPlan as a string.
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
    at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
    at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
    at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)


RE: How to clone a logical plan ?

2009-11-05 Thread Santhosh Srinivasan

If my memory serves me correctly, the logical plan cloning was
implemented (by me) for cloning inner plans for foreach. As such, the
top level plan cloning was never tested and some items are marked as
TODO (see visit methods for LOLoad, LOStore and LOStream).

If you want to use it as you mention in your test cases, then you need
to add code for cloning the LOLoad, LOStore, LOStream and LOJoin.

Santhosh


-Original Message-
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] 
Sent: Thursday, November 05, 2009 4:04 PM
To: pig-dev@hadoop.apache.org
Subject: RE: How to clone a logical plan ?

You have hit a bug. I think LOJoin has to be added to
LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh

-Original Message-
From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com]
Sent: Thursday, November 05, 2009 3:28 PM
To: pig-dev@hadoop.apache.org
Subject: How to clone a logical plan ?

Hi,

For our cost-based optimizer, we need to generate alternative query plans
for a given query plan and evaluate them based on their estimated cost.
To do that, I want to clone a logical plan. I thought LogicalPlanCloner
was meant for that, but it doesn't seem to work. I added this simple test
case to TestLogicalPlanBuilder.java:

public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I
am doing something wrong?

Thanks,
Ashutosh

[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

2009-11-05 Thread Santhosh Srinivasan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774147#action_12774147
 ] 

Santhosh Srinivasan commented on PIG-1073:
--

If my memory serves me correctly, the logical plan cloning was implemented (by 
me) for cloning inner plans for foreach. As such, the top level plan cloning 
was never tested and some items are marked as TODO (see visit methods for 
LOLoad, LOStore and LOStream).

If you want to use it as you mention in your test cases, then you need to add 
code for cloning the LOLoad, LOStore, LOStream and LOJoin operators.


 LogicalPlanCloner can't clone plan containing LOJoin
 

 Key: PIG-1073
 URL: https://issues.apache.org/jira/browse/PIG-1073
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan



[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

2009-11-05 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774146#action_12774146
 ] 

Ashutosh Chauhan commented on PIG-1073:
---

It seems the fix is to override the visit method for LOJoin in LogicalPlanCloneHelper.java:

@Override
protected void visit(LOJoin loJoin) throws VisitorException { .. }
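
To illustrate why a missing override can break cloning, here is a minimal
sketch using a toy visitor. All names in it (CloneSketch, Op, Load, Join,
Cloner) are hypothetical stand-ins and not Pig's classes; the actual failure
path inside LOVisitor may well differ. The idea shown is only that a cloning
visitor needs one override per operator type, and that any type left on the
no-op default never gets a clone registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model for illustration; these are NOT Pig's classes.
final class CloneSketch {
    abstract static class Op {
        final List<Op> inputs = new ArrayList<>();
        abstract void accept(Visitor v);
    }

    static final class Load extends Op {
        final String path;
        Load(String path) { this.path = path; }
        @Override void accept(Visitor v) { v.visit(this); }
    }

    static final class Join extends Op {
        Join(Op left, Op right) { inputs.add(left); inputs.add(right); }
        @Override void accept(Visitor v) { v.visit(this); }
    }

    // Base visitor: the default methods do nothing.
    static class Visitor {
        void visit(Load op) { }
        void visit(Join op) { }
    }

    static final class Cloner extends Visitor {
        private final Map<Op, Op> clones = new HashMap<>();

        // Returns the clone of op, visiting it first if needed.
        Op cloneOf(Op op) {
            if (!clones.containsKey(op)) {
                op.accept(this);
            }
            return clones.get(op);
        }

        @Override void visit(Load op) {
            clones.put(op, new Load(op.path));
        }

        // Without this override, Join would fall through to the no-op
        // default and cloneOf(join) would return null.
        @Override void visit(Join op) {
            clones.put(op, new Join(cloneOf(op.inputs.get(0)),
                                    cloneOf(op.inputs.get(1))));
        }
    }

    static Op deepClone(Op root) {
        return new Cloner().cloneOf(root);
    }

    public static void main(String[] args) {
        Op plan = new Join(new Load("A"), new Load("B"));
        Op copy = deepClone(plan);
        System.out.println(copy != plan && copy instanceof Join);
    }
}
```

Deleting the visit(Join) override in this toy makes deepClone return null
for a join plan, which any later dereference would turn into a
NullPointerException.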

 LogicalPlanCloner can't clone plan containing LOJoin
 

 Key: PIG-1073
 URL: https://issues.apache.org/jira/browse/PIG-1073
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan



[jira] Updated: (PIG-1026) [zebra] map split returns null

2009-11-05 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1026:
--

Status: Patch Available  (was: Open)

 [zebra] map split returns null
 --

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0

 Attachments: PIG_1026.patch


 Here is the test scenario:
   final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
   //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
   final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
 projection: String projection2 = new String("m1#{b}, m2#{x|z}");
 The user got a null pointer exception on reading m1#{b}.
 Yan, please refer to the test class:
 TestNonDefaultWholeMapSplit.java 


[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-05 Thread Santhosh Srinivasan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774153#action_12774153
 ] 

Santhosh Srinivasan commented on PIG-1065:
--

Answer to Question 1: Pig 1.0 had that syntax and it was retained for backward 
compatibility. Paolo suggested that for uniformity, the 'AS' clause for the 
load statements should be extended to all relational operators. Gradually, the 
column aliasing in the foreach should be removed from the documentation and 
eventually removed from the language.

 In-determinate behaviour of Union when there are 2 non-matching schema's
 

 Key: PIG-1065
 URL: https://issues.apache.org/jira/browse/PIG-1065
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.6.0


 I have a script which first does a union of two relations with non-matching
 schemas and then does an ORDER BY on the result.
 {code}
 f1 = LOAD '1.txt' as (key:chararray, v:chararray);
 f2 = LOAD '2.txt' as (key:chararray);
 u0 = UNION f1, f2;
 describe u0;
 dump u0;
 u1 = ORDER u0 BY $0;
 dump u1;
 {code}
 When I run in Map Reduce mode I get the following result:
 $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
 
 Schema for u0 unknown.
 
 (1,2)
 (2,3)
 (1)
 (2)
 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias u1
 at org.apache.pig.PigServer.openIterator(PigServer.java:475)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:397)
 
 Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
 
 When I run the same script in local mode I get a different result, as we know 
 that local mode does not use any Hadoop Classes.
 $java -cp pig.jar org.apache.pig.Main -x local broken.pig
 
 Schema for u0 unknown
 
 (1,2)
 (1)
 (2,3)
 (2)
 
 (1,2)
 (1)
 (2,3)
 (2)
 
 Here are some questions:
 1) Why do we allow a union if the schemas do not match?
 2) Should we not print an error message or warning so that the user knows that 
 this is not allowed, or that it can produce unexpected results?
 Viraj


[jira] Updated: (PIG-1001) Generate more meaningful error message when one input file does not exist

2009-11-05 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1001:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed.

 Generate more meaningful error message when one input file does not exist
 -

 Key: PIG-1001
 URL: https://issues.apache.org/jira/browse/PIG-1001
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1001-1.patch, PIG-1001-2.patch


 In the following query, if 1.txt does not exist, 
 a = load '1.txt';
 b = group a by $0;
 c = group b all;
 dump c;
 Pig throws the error message "ERROR 2100: file:/tmp/temp155054664/tmp1144108421 
 does not exist." Pig should report "Input file 1.txt not exist" instead of 
 such a confusing message.


Re: How to clone a logical plan ?

2009-11-05 Thread Ashutosh Chauhan

Thanks, Santhosh, for the quick response and explanation. It saved a few
hours of debugging :)

Ashutosh


[jira] Created: (PIG-1074) Zebra store function should allow '::' in column names in output schema

2009-11-05 Thread Pradeep Kamath (JIRA)

Zebra store function should allow '::' in column names in output schema
---

 Key: PIG-1074
 URL: https://issues.apache.org/jira/browse/PIG-1074
 Project: Pig
  Issue Type: Bug
Reporter: Pradeep Kamath


the following script fails: 

 {noformat}

a = load '/zebra/singlefile/studenttab10k' using 
org.apache.hadoop.zebra.pig.TableLoader() as (name, age, gpa);

b = load '/zebra/singlefile/votertab10k' using 
org.apache.hadoop.zebra.pig.TableLoader() as (name, age, registration, 
contributions);

c = filter a by age  20;

d = filter b by age  20;

store c into '/user/pig/out//ZebraMultiQuery_30.out.1' 
using org.apache.hadoop.zebra.pig.TableStorer('');

store d into '/user/pig/out//ZebraMultiQuery_30.out.2' 
using org.apache.hadoop.zebra.pig.TableStorer('');

e = cogroup c by name, d by name;

f = foreach e generate flatten(c), flatten(d);

store f into '/user/pig//ZebraMultiQuery_30.out.3' 
using org.apache.hadoop.zebra.pig.TableStorer('');

{noformat}
Here the schema of f has names like c::name and it looks like zebra storefunc 
does not allow '::' in column name 

The stack trace is

 

ERROR 2997: Unable to recreate exception from backend error: 
java.io.IOException: ColumnGroup.Writer constructor failed : Partition 
constructor failed :Encountered  : :  at line 1, column 3.

 




[jira] Updated: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

2009-11-05 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1073:
--

Attachment: pig-1073.patch

Draft patch with testcase.

 LogicalPlanCloner can't clone plan containing LOJoin
 

 Key: PIG-1073
 URL: https://issues.apache.org/jira/browse/PIG-1073
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
 Attachments: pig-1073.patch




[jira] Assigned: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

2009-11-05 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned PIG-1073:
-

Assignee: Ashutosh Chauhan

 LogicalPlanCloner can't clone plan containing LOJoin
 

 Key: PIG-1073
 URL: https://issues.apache.org/jira/browse/PIG-1073
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: pig-1073.patch




RequiredFields contents

2009-11-05 Thread Dmitriy Ryaboy

Hi all,

I am looking at the RequiredFields class and it has this explanation
of what getFields() returns:

/**
 * List of fields required from the input. This includes fields that are
 * transformed, and thus are no longer the same fields. Using the example 'B
 * = foreach A generate $0, $2, $3, udf($1)' would produce the list (0, 0),
 * (0, 2), (0, 3), (0, 1). Note that the order is not guaranteed.
 */


The second element of the pair is self-explanatory -- but what is the
first element in the pair?

Thanks,
-Dmitriy

[jira] Updated: (PIG-1071) Support comma separated file/directory names in load statements

2009-11-05 Thread Pradeep Kamath (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1071:


   Resolution: Fixed
Fix Version/s: 0.6.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1, patch committed, Thanks Richard!

 Support comma separated file/directory names in load statements
 ---

 Key: PIG-1071
 URL: https://issues.apache.org/jira/browse/PIG-1071
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.6.0

 Attachments: PIG-1071.patch, PIG-1071.patch


 Currently, Pig Latin supports the following LOAD syntax:
 {code}
 LOAD 'data' [USING loader function] [AS schema];  
 {code}
 where data is the name of the file or directory, including files specified 
 with Hadoop-supported globbing syntax. This name is passed to the loader 
 function.
 This feature supports loaders that can load multiple files from 
 different directories, and allows users to pass in the file names as a 
 comma-separated string.
 For example, these will be valid load statements:
 {code}
 LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader();
 {code}
 and 
 {code}
 LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader();
 {code}
 This comma separated string is passed to the loader.
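
Since the whole comma-separated string reaches the loader, a loader that
wants individual locations has to split it without breaking glob groups such
as {a,c}. The following is a minimal sketch of one way to do that; the class
and method names (LocationSplitSketch, splitLocations) are hypothetical and
not part of Pig or of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Pig's implementation.
final class LocationSplitSketch {
    // Splits on commas that are outside {...} glob groups.
    static List<String> splitLocations(String location) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // nesting level of '{' seen so far
        for (int i = 0; i < location.length(); i++) {
            char ch = location.charAt(i);
            if (ch == '{') {
                depth++;
            } else if (ch == '}' && depth > 0) {
                depth--;
            }
            if (ch == ',' && depth == 0) {
                // Top-level comma: close the current location.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitLocations("/usr/pig/test1/{a,c},/usr/pig/test2/b"));
    }
}
```

With the second example above, this yields two locations,
/usr/pig/test1/{a,c} and /usr/pig/test2/b, leaving the glob for the file
system to expand.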


[jira] Commented: (PIG-1026) [zebra] map split returns null

2009-11-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774214#action_12774214
 ] 

Hadoop QA commented on PIG-1026:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423738/PIG_1026.patch
  against trunk revision 833266.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 10 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/141/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/141/console

This message is automatically generated.

 [zebra] map split returns null
 --

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0

 Attachments: PIG_1026.patch




RE: RequiredFields contents

2009-11-05 Thread RichardGUO Fei


Hi,

I am also curious. I studied this, and my guess is that it is the input index. 
For example, in 'foreach A generate ...', A's index is 0 among the inputs of 
the foreach operator.

Let me know if I am wrong.

Thanks,
Richard

 Date: Fri, 6 Nov 2009 00:04:36 -0500
 Subject: RequiredFields contents
 From: dvrya...@gmail.com
 To: pig-dev@hadoop.apache.org
 
 Hi all,
 
 I am looking at the RequiredFields class and it has this explanation
 of what getFields() returns:
 
 /**
  * List of fields required from the input. This includes fields that are
  * transformed, and thus are no longer the same fields. Using the example 'B
  * = foreach A generate $0, $2, $3, udf($1)' would produce the list (0, 0),
  * (0, 2), (0, 3), (0, 1). Note that the order is not guaranteed.
  */
 
 
 The second element of the pair is self-explanatory -- but what is the
 first element in the pair?
 
 Thanks,
 -Dmitriy
  

Re: How to clone a logical plan ?

2009-11-05 Thread Dmitriy Ryaboy

Richard,
The Load/Store redesign proposal has an interface that defines how
stats get represented; a loader that implements ResourceLoader will
pass statistics up into Pig, which will then take care of doing
whatever it needs to do with them. The specifics of how the stats get
loaded in by the loader are up to the implementation of the loader --
they can be read in from a metadata service, sampled on the fly,
stored in a metadata file, etc.

For simplicity, we are working with serialized JSON representations of
ResourceStatistics right now.

-Dmitriy

2009/11/6 RichardGUO Fei gladiato...@hotmail.com:

 Hi


Dmitriy,

 Thanks for sharing. I look forward to seeing your work. I implemented a 
 storage and want to connect Pig to my storage.
 In order to let the optimizer fully benefit from the histogram and the 
 side-information of my storage, I am thinking of
 implementing a cost-based optimizer.

 How do you plan to pass in the statistics? Say your input file 
 is a plain-text log file: do you require users to
 compute the statistics themselves? Or do you plan to limit this to only certain 
 types of storage?

 Thanks,
 Richard

 Date: Thu, 5 Nov 2009 22:54:47 -0500
 Subject: Re: How to clone a logical plan ?
 From: dvrya...@gmail.com
 To: pig-dev@hadoop.apache.org

 At a high level, we are implementing the framework for propagating
 statistics between Pig operators, and using said statistics to make
 moderately intelligent decisions about Join types that should be used
 (unless they are specified by the user).  We do this in a fairly
 brute-force manner, by generating all alternative plans (that part is
 not working so hot right now, see subject) and costing them, choosing
 the global minimum (there is some pruning happening, but not as much
 as something like System R).  As far as relation order inside a given
 Join, we set that deterministically after choosing the join, as Pig
 has specific preferences for where the largest relation should go for
 a given join type.  Once we have join type selection working, other
 optimizations can be added -- the tricky part is making sure the
 costing functions can't produce drastically wrong results.

 All the work is happening at the logical layer, between the rule-based
 optimizer and LogToPhysTranslator.

 -D


 2009/11/5 RichardGUO Fei gladiato...@hotmail.com:
 
  Hi,
 
  I am also doing a cost-based optimizer. So I am interested in knowing some 
  of the specs that you are after.
 
  Thanks,
  Richard
 


RE: RequiredFields contents

2009-11-05 Thread Santhosh Srinivasan

The first element in the pair is the input number. It is 0 for most
operators. For multi-input operators like join and cogroup, it ranges
from 0 to (n - 1), where n is the number of inputs.
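
As a minimal sketch of that pairing, the toy code below builds
(input number, column) pairs from a per-input list of columns. The names
RequiredFieldsSketch, FieldRef and refs are hypothetical and are not the
RequiredFields API; the two-input case is an invented illustration, not the
list Pig would report for any particular join.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not Pig's RequiredFields class.
final class RequiredFieldsSketch {
    // One required field: which input it comes from and which column it is.
    static final class FieldRef {
        final int input;
        final int column;

        FieldRef(int input, int column) {
            this.input = input;
            this.column = column;
        }

        @Override public String toString() {
            return "(" + input + ", " + column + ")";
        }
    }

    // columnsPerInput[i] lists the columns read from input number i.
    static List<FieldRef> refs(int[][] columnsPerInput) {
        List<FieldRef> out = new ArrayList<>();
        for (int input = 0; input < columnsPerInput.length; input++) {
            for (int column : columnsPerInput[input]) {
                out.add(new FieldRef(input, column));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Single input, as in 'foreach A generate $0, $2, $3, udf($1)'.
        System.out.println(refs(new int[][] {{0, 2, 3, 1}})); // [(0, 0), (0, 2), (0, 3), (0, 1)]
        // Two inputs: column 0 of input 0 and column 1 of input 1.
        System.out.println(refs(new int[][] {{0}, {1}}));     // [(0, 0), (1, 1)]
    }
}
```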

Santhosh

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Thursday, November 05, 2009 9:05 PM
To: pig-dev@hadoop.apache.org
Subject: RequiredFields contents

Hi all,

I am looking at the RequiredFields class and it has this explanation of
what getFields() returns:

/**
 * List of fields required from the input. This includes fields that are
 * transformed, and thus are no longer the same fields. Using the example 'B
 * = foreach A generate $0, $2, $3, udf($1)' would produce the list (0, 0),
 * (0, 2), (0, 3), (0, 1). Note that the order is not guaranteed.
 */

The second element of the pair is self-explanatory -- but what is the
first element in the pair?

Thanks,
-Dmitriy

[jira] Created: (PIG-1075) Error in Cogroup when key fields types don't match

2009-11-05 Thread Ankur (JIRA)

Error in Cogroup when key fields types don't match
--

 Key: PIG-1075
 URL: https://issues.apache.org/jira/browse/PIG-1075
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur


When Cogrouping 2 relations on multiple key fields, pig throws an error if the 
corresponding types don't match. 
Consider the following script:-
A = LOAD 'data' USING PigStorage() as (a:chararray, b:int, c:int);
B = LOAD 'data' USING PigStorage() as (a:chararray, b:chararray, c:int);
C = CoGROUP A BY (a,b,c), B BY (a,b,c);
D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B);
describe D;
dump D;

The complete stack trace of the error thrown is

Pig Stack Trace
---
ERROR 1051: Cannot cast to Unknown

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to describe schema for alias D
at org.apache.pig.PigServer.dumpSchema(PigServer.java:436)
at org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:233)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:253)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:397)
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83)
at org.apache.pig.PigServer.compileLp(PigServer.java:821)
at org.apache.pig.PigServer.dumpSchema(PigServer.java:428)
... 6 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1060: Cannot resolve COGroup output schema
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2463)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:372)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
... 11 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1051: Cannot cast to Unknown
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForCOGroupInnerPlan(TypeCheckingVisitor.java:2552)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2451)
... 16 more

The error message does not clearly identify the problem (mismatched key field
types), which makes it hard to diagnose in a large, complex Pig script.
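
One possible way to sidestep the failure, assuming the int/chararray mismatch on key field b is what triggers it (as the report suggests) and that the values of b in B can be read as integers, is to cast that key explicitly before cogrouping. The alias B2 is introduced here only for illustration; this is a sketch, not a fix confirmed on this issue:

```pig
A = LOAD 'data' USING PigStorage() AS (a:chararray, b:int, c:int);
B = LOAD 'data' USING PigStorage() AS (a:chararray, b:chararray, c:int);
-- Hypothetical workaround: align the type of key field b with A's int
-- before the cogroup, so that corresponding key fields have matching types.
B2 = FOREACH B GENERATE a, (int)b AS b, c;
C = COGROUP A BY (a, b, c), B2 BY (a, b, c);
D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B2);
DESCRIBE D;
```

Values of b that cannot be converted to int would need separate handling, and casting in the other direction (A's b to chararray) may be the better choice when the data is not purely numeric.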



[jira] Commented: (PIG-1075) Error in Cogroup when key fields types don't match

2009-11-05 Thread Ankur (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774222#action_12774222
 ] 

Ankur commented on PIG-1075:


Pig should throw an error message that better identifies the cause of the 
problem.

 Error in Cogroup when key fields types don't match
 --

 Key: PIG-1075
 URL: https://issues.apache.org/jira/browse/PIG-1075
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur



[jira] Updated: (PIG-1060) MultiQuery optimization throws error for multi-level splits

[jira] Updated: (PIG-1060) MultiQuery optimization throws error for multi-level splits

[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

[jira] Commented: (PIG-1071) Support comma separated file/directory names in load statements

[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra

[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

[jira] Resolved: (PIG-958) Splitting output data on key field

[jira] Updated: (PIG-1071) Support comma separated file/directory names in load statements

[jira] Commented: (PIG-1060) MultiQuery optimization throws error for multi-level splits

How to clone a logical plan ?

[jira] Created: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

[jira] Commented: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

RE: How to clone a logical plan ?

[jira] Created: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

RE: How to clone a logical plan ?

[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

[jira] Updated: (PIG-1026) [zebra] map split returns null

[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

[jira] Updated: (PIG-1001) Generate more meaningful error message when one input file does not exist

Re: How to clone a logical plan ?

[jira] Created: (PIG-1074) Zebra store function should allow '::' in column names in output schema

[jira] Updated: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

[jira] Assigned: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

RequiredFields contents

[jira] Updated: (PIG-1071) Support comma separated file/directory names in load statements

[jira] Commented: (PIG-1026) [zebra] map split returns null

RE: RequiredFields contents

Re: How to clone a logical plan ?

RE: RequiredFields contents

[jira] Created: (PIG-1075) Error in Cogroup when key fields types don't match

[jira] Commented: (PIG-1075) Error in Cogroup when key fields types don't match
