[jira] Updated: (PIG-1427) Monitor and kill runaway UDFs

2010-06-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1427:
---

Status: Patch Available  (was: Open)

 Monitor and kill runaway UDFs
 -

 Key: PIG-1427
 URL: https://issues.apache.org/jira/browse/PIG-1427
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Attachments: guava-r03.jar, monitoredUdf.patch, monitoredUdf.patch, 
 PIG-1427.diff, PIG-1427.diff, PIG-1427.diff


 As a safety measure, it is sometimes useful to monitor UDFs as they execute. 
 It is often preferable to return null or some other default value instead of 
 timing out a runaway evaluation and killing a job. We have in the past seen 
 complex regular expressions lead to job failures due to just half a dozen 
 (out of millions) particularly obnoxious strings.
 It would be great to give Pig users a lightweight way of enabling UDF 
 monitoring.
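Here is a minimal sketch of the idea in plain Java. It assumes the goal is simply "run the evaluation on another thread, wait up to a deadline, and substitute a default on timeout or failure". The class and method names (`MonitoredEval`, `evalWithTimeout`) are hypothetical. This is not the committed patch: the attachments include a Guava jar, and the real implementation may work quite differently. Note that Java cannot forcibly stop a thread. `Future.cancel(true)` only interrupts it, so code that ignores interrupts (such as a backtracking regular expression) keeps using CPU in the background until it finishes on its own.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical sketch: evaluate a function with a deadline and a fallback value. */
public final class MonitoredEval {
    // Daemon threads, so a runaway evaluation cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "monitored-udf");
        t.setDaemon(true);
        return t;
    });

    private MonitoredEval() {}

    /** Applies fn to input; returns fallback if it times out or throws. */
    public static <I, O> O evalWithTimeout(Function<I, O> fn, I input,
                                           long timeout, TimeUnit unit, O fallback) {
        Future<O> future = POOL.submit(() -> fn.apply(input));
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Only interrupts the worker; it cannot be killed outright.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The evaluation itself threw: treat it like a bad record.
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> length = String::length;
        System.out.println(evalWithTimeout(length, "hello", 1, TimeUnit.SECONDS, -1)); // 5

        Function<String, Integer> slow = s -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                return -2;
            }
            return s.length();
        };
        System.out.println(evalWithTimeout(slow, "hello", 50, TimeUnit.MILLISECONDS, -1)); // -1
    }
}
```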

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1427) Monitor and kill runaway UDFs

2010-06-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1427:
---

Attachment: PIG-1427.diff

Final version of the patch.

 Monitor and kill runaway UDFs
 -

 Key: PIG-1427
 URL: https://issues.apache.org/jira/browse/PIG-1427
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: guava-r03.jar, monitoredUdf.patch, monitoredUdf.patch, 
 PIG-1427.diff, PIG-1427.diff, PIG-1427.diff





[jira] Updated: (PIG-1427) Monitor and kill runaway UDFs

2010-06-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1427:
---

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.8.0
   Resolution: Fixed

Committed. 

 Monitor and kill runaway UDFs
 -

 Key: PIG-1427
 URL: https://issues.apache.org/jira/browse/PIG-1427
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: guava-r03.jar, monitoredUdf.patch, monitoredUdf.patch, 
 PIG-1427.diff, PIG-1427.diff, PIG-1427.diff





[jira] Commented: (PIG-506) Does pig need a NATIVE keyword?

2010-06-22 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881263#action_12881263
 ] 

Aniket Mokashi commented on PIG-506:


Revised syntax, from the proposal document (with a few changes):
{code}
B = native ('mymr.jar' [, 'other.jar' ...]) A store into 'storeLocation' using 
storeFunc load 'loadLocation' using loadFunc;
{code}
mymr.jar contains the MapReduce code the user wants to run.
storeLocation is the location where the user's code expects to find its input data.
storeFunc is the store function Pig will use to store the data (from A).
loadLocation is where the user's code will write its result data.
loadFunc is the load function Pig will use to reload the data (into B).
other.jar contains additional jars to be shipped, such as a custom InputFormat or 
OutputFormat needed by the MapReduce jobs.
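The store, run, and load sequence above can be sketched as follows, under loud assumptions. The user's MapReduce code is modeled as a hypothetical `NativeJob` callback. storeFunc and loadFunc are replaced by plain one-line-per-tuple text files in a local temporary directory rather than HDFS. The sketch only shows the ordering of the three steps, not how Pig would actually submit the jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the NATIVE semantics: store, run external job, reload. */
public final class NativeStepSketch {
    private NativeStepSketch() {}

    /** Stand-in for the MapReduce code packaged in mymr.jar. */
    public interface NativeJob {
        void run(Path storeLocation, Path loadLocation) throws IOException;
    }

    /** Runs the three steps in order and returns the reloaded relation B. */
    public static List<String> runNative(List<String> a, NativeJob job, Path workDir)
            throws IOException {
        Path storeLocation = workDir.resolve("storeLocation");
        Path loadLocation = workDir.resolve("loadLocation");
        // 1. Break the pipeline: store relation A where the user's code expects it.
        Files.write(storeLocation, a);
        // 2. Invoke the user's job; it reads storeLocation and writes loadLocation.
        job.run(storeLocation, loadLocation);
        // 3. Reload the job's output so the script can continue with B.
        return Files.readAllLines(loadLocation);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("native-sketch");
        // Toy "job": upper-cases every stored line.
        NativeJob upper = (in, out) -> {
            List<String> result = new ArrayList<>();
            for (String line : Files.readAllLines(in)) {
                result.add(line.toUpperCase());
            }
            Files.write(out, result);
        };
        System.out.println(runNative(Arrays.asList("a\t1", "b\t2"), upper, dir));
    }
}
```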

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor

 Assume a user had a job that broke easily into three pieces.  Further assume 
 that pieces one and three were easily expressible in pig, but that piece two 
 needed to be written in map reduce for whatever reason (performance, 
 something that pig could not easily express, legacy job that was too 
 important to change, etc.).  Today the user would either have to use map 
 reduce for the entire job or manually handle the stitching together of pig 
 and map reduce jobs.  What if instead pig provided a NATIVE keyword that 
 would allow the script to pass off the data stream to the underlying system 
 (in this case map reduce)?  The semantics of NATIVE would vary by underlying 
 system.  In the map reduce case, we would assume that this indicated a 
 collection of one or more fully contained map reduce jobs, so that pig would 
 store the data, invoke the map reduce jobs, and then read the resulting data 
 to continue.  It might look something like this:
 {code}
 A = load 'myfile';
 X = load 'myotherfile';
 B = group A by $0;
 C = foreach B generate group, myudf(B);
 D = native (jar=mymr.jar, infile=frompig, outfile=topig);
 E = join D by $0, X by $0;
 ...
 {code}
 This differs from streaming in that it allows the user to insert an arbitrary 
 amount of native processing, whereas streaming allows the insertion of one 
 binary.  It also differs in that, for streaming, data is piped directly into 
 and out of the binary as part of the pig pipeline.  Here the pipeline would 
 be broken, data written to disk, and the native block invoked, then data read 
 back from disk.
 Another alternative is to say this is unnecessary because the user can do the 
 coordination from java, using the PigServer interface to run pig and calling 
 the map reduce job explicitly.  The advantages of the native keyword are that 
 the user need not worry about coordinating the jobs, since pig will take care 
 of it, and that the user can make use of existing java applications without 
 being a java programmer.




[jira] Assigned: (PIG-506) Does pig need a NATIVE keyword?

2010-06-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-506:
--

Assignee: Aniket Mokashi  (was: Alan Gates)

 Does pig need a NATIVE keyword?
 ---

 Key: PIG-506
 URL: https://issues.apache.org/jira/browse/PIG-506
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Alan Gates
Assignee: Aniket Mokashi
Priority: Minor




[jira] Created: (PIG-1461) support union operation that merges based on column names

2010-06-22 Thread Thejas M Nair (JIRA)
support union operation that merges based on column names
-

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
 Fix For: 0.8.0


When the data has a schema, it often makes sense to union on the column names in 
the schema rather than on the positions of the columns. 
The behavior of the existing union operator should remain backward compatible.

This feature can be supported either with a new operator or by extending union to 
support a 'using' clause. I am thinking of a new operator called either 
unionschema or merge. Does anybody have other suggestions for the syntax?

example -

L1 = load 'x' as (a,b);
L2 = load 'y' as (b,c);
U = unionschema L1, L2;

describe U;
U: {a:bytearray, b:bytearray, c:bytearray}
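The name-based merge in this example can be sketched as follows. The sketch assumes that columns keep the order in which they first appear across the inputs. It also assumes that the same name appearing with two different types is an error; the proposal does not say how type conflicts are handled. Schemas are modeled as ordered name-to-type maps. `UnionSchemaSketch`, `mergeByName` and `untyped` are hypothetical names, not Pig's `Schema` API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of merging relation schemas by column name. */
public final class UnionSchemaSketch {
    private UnionSchemaSketch() {}

    /** Merges schemas by column name, keeping first-appearance order. */
    public static LinkedHashMap<String, String> mergeByName(List<? extends Map<String, String>> inputs) {
        LinkedHashMap<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> schema : inputs) {
            for (Map.Entry<String, String> col : schema.entrySet()) {
                String previous = merged.putIfAbsent(col.getKey(), col.getValue());
                // Assumption: the same name with a different type is rejected.
                if (previous != null && !previous.equals(col.getValue())) {
                    throw new IllegalArgumentException(
                            "type conflict on column '" + col.getKey() + "': "
                            + previous + " vs " + col.getValue());
                }
            }
        }
        return merged;
    }

    /** Builds a schema of untyped columns, like "as (a,b)", whose type is bytearray. */
    public static LinkedHashMap<String, String> untyped(String... names) {
        LinkedHashMap<String, String> schema = new LinkedHashMap<>();
        for (String name : names) {
            schema.put(name, "bytearray");
        }
        return schema;
    }

    public static void main(String[] args) {
        // L1 = load 'x' as (a,b);  L2 = load 'y' as (b,c);
        System.out.println(mergeByName(Arrays.asList(untyped("a", "b"), untyped("b", "c"))));
        // prints {a=bytearray, b=bytearray, c=bytearray}
    }
}
```

With the example inputs `(a,b)` and `(b,c)`, the merged schema matches the `describe U` output above.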






[jira] Commented: (PIG-1461) support union operation that merges based on column names

2010-06-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881420#action_12881420
 ] 

Thejas M Nair commented on PIG-1461:


This operator will throw an error if the schema of any of the input relations 
is undefined.

Users often need to look up the source relation downstream after the 
'unionschema' operation. It would be convenient to project an additional pseudo 
column whose value is the name of the input relation, i.e., the schema of U in 
the description becomes: U: {a:bytearray, b:bytearray, c:bytearray, 
source_relation:chararray}

This feature does not let users do anything that was not possible earlier; it 
just makes the code easier to maintain, since you don't have to change the Pig 
query when new columns are added.
The same results can be obtained using existing Pig syntax, as in the following 
query:

L1 = load 'x' as (a,b);
L2 = load 'y' as (b,c);
F1 = foreach L1 generate a, b, null as c, 'F1' as source_relation;
F2 = foreach L2 generate null as a, b, c, 'F2' as source_relation;
U = union F1, F2;

Note that with this query, if the schema of L1 or L2 changes, you will need to 
change F1 or F2.
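The row-level work that the hand-written F1 and F2 statements do can be derived from the schemas instead of hard-coded, which is what would make the proposed operator robust to schema changes. The sketch below assumes that rows are plain Java lists and that the merged column list has already been computed. `UnionByNameRows` and `project` are hypothetical names. The appended tag uses the input relation's name, as the proposed source_relation column would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: project rows onto a merged schema by column name. */
public final class UnionByNameRows {
    private UnionByNameRows() {}

    /**
     * Projects one input row onto the merged column list by name.
     * Columns the input lacks become null; the source relation name is
     * appended as the proposed source_relation pseudo column.
     */
    public static List<Object> project(List<String> inputColumns, List<?> row,
                                       List<String> mergedColumns, String sourceRelation) {
        List<Object> out = new ArrayList<>(mergedColumns.size() + 1);
        for (String column : mergedColumns) {
            int index = inputColumns.indexOf(column);
            out.add(index >= 0 ? row.get(index) : null);
        }
        out.add(sourceRelation);
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = Arrays.asList("a", "b", "c");
        // A row of L1 (a,b) and a row of L2 (b,c), tagged with their relation names.
        System.out.println(project(Arrays.asList("a", "b"), Arrays.asList(1, 2), merged, "L1"));
        System.out.println(project(Arrays.asList("b", "c"), Arrays.asList(3, 4), merged, "L2"));
        // prints [1, 2, null, L1] then [null, 3, 4, L2]
    }
}
```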



 support union operation that merges based on column names
 -

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
 Fix For: 0.8.0





[jira] Updated: (PIG-1333) API interface to Pig

2010-06-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1333:
--

Status: Open  (was: Patch Available)

 API interface to Pig
 

 Key: PIG-1333
 URL: https://issues.apache.org/jira/browse/PIG-1333
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1333.patch, PIG-1333_1.patch, PIG-1333_2.patch


 It would be nice to make Pig friendlier for applications, such as workflow 
 systems, that execute pig scripts on the user's behalf.
 Currently, they have to use the pig command line to execute the code; 
 however, this limits the kind of output that can be delivered. 
 For instance, it is hard to produce error information that is easy to use 
 programmatically, or to collect statistics.
 The proposal is to create a class that mimics the behavior of Main but 
 gives users a status object back. The main code of pig would then look 
 something like:
 public static void main(String args[])
 {
     PigStatus ps = PigMain.exec(args);
     System.exit(ps.rc);
 }
 We need to define the following:
 - Content of PigStatus. It should at least include
* return code
* error string
* exception 
* statistics
 - A way to propagate the status class through pig code
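Here is one possible shape for the status object, sketched with hypothetical names. The four fields mirror the list above (return code, error string, exception, statistics). Statistics are assumed to be a simple name-to-count map, and the `ScriptRunner` callback stands in for whatever Main actually does. The return code of 1 on failure is an arbitrary choice for the sketch. The key design point is that `exec` returns the status instead of calling `System.exit`, so an embedding application such as a workflow engine can inspect it. Only a thin command-line wrapper would turn `rc` into a process exit code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a Main-like entry point that returns a status object. */
public final class PigStatusSketch {
    private PigStatusSketch() {}

    /** Status holding the fields listed in the proposal. */
    public static final class PigStatus {
        public final int rc;                   // return code: 0 on success
        public final String errorMessage;      // error string, or null on success
        public final Throwable exception;      // underlying exception, or null
        public final Map<String, Long> stats;  // statistics, e.g. named counters

        PigStatus(int rc, String errorMessage, Throwable exception, Map<String, Long> stats) {
            this.rc = rc;
            this.errorMessage = errorMessage;
            this.exception = exception;
            this.stats = Collections.unmodifiableMap(new LinkedHashMap<>(stats));
        }
    }

    /** Stand-in for the work Main performs; returns statistics on success. */
    public interface ScriptRunner {
        Map<String, Long> run(String[] args) throws Exception;
    }

    /** Mimics Main but hands back a status object instead of exiting. */
    public static PigStatus exec(String[] args, ScriptRunner runner) {
        try {
            return new PigStatus(0, null, null, runner.run(args));
        } catch (Exception e) {
            // Assumption: any failure maps to return code 1.
            return new PigStatus(1, String.valueOf(e.getMessage()), e,
                    Collections.<String, Long>emptyMap());
        }
    }

    public static void main(String[] args) {
        PigStatus ok = exec(args, a -> Collections.singletonMap("recordsWritten", 42L));
        PigStatus failed = exec(args, a -> { throw new IllegalStateException("parse error"); });
        System.out.println(ok.rc + " " + ok.stats);                // 0 {recordsWritten=42}
        System.out.println(failed.rc + " " + failed.errorMessage); // 1 parse error
        // A command-line wrapper would finish with System.exit(status.rc).
    }
}
```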




[jira] Updated: (PIG-1333) API interface to Pig

2010-06-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1333:
--

Attachment: PIG-1333_3.patch

New patch to address the review comments.

 API interface to Pig
 

 Key: PIG-1333
 URL: https://issues.apache.org/jira/browse/PIG-1333
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1333.patch, PIG-1333_1.patch, PIG-1333_2.patch, 
 PIG-1333_3.patch





[jira] Updated: (PIG-1333) API interface to Pig

2010-06-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1333:
--

Status: Patch Available  (was: Open)

 API interface to Pig
 

 Key: PIG-1333
 URL: https://issues.apache.org/jira/browse/PIG-1333
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1333.patch, PIG-1333_1.patch, PIG-1333_2.patch, 
 PIG-1333_3.patch





[jira] Commented: (PIG-1333) API interface to Pig

2010-06-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881470#action_12881470
 ] 

Richard Ding commented on PIG-1333:
---

A few comments about the comments:

bq. Why is MAX_SCRIPT_SIZE so short? Is it an arbitrary number? We produce 
significantly longer scripts all the time. I assume it's to save on space; 
perhaps you can make it controllable by some property?

This is a compromise. The next release of Hadoop will completely fix this problem 
(there will be no limit on the length of the scripts). Until then, we won't allow 
users to change this setting and inadvertently affect Hadoop performance. 

bq. I am not very familiar with the visitors, but it looks like normally, 
PhyPlanVisitor spawns a walker for the internal plans of Filter, 
CollectedGroup, and so on; this behavior appears to be gone from AliasVisitor. 
Should it be reproduced in AliasVisitor?

I think the top-level aliases are enough to identify the operators, so there is 
no need to use aliases in the inner plans.

bq. Just a style thing, but I prefer writing setter methods that return self 
instead of being void - that way you can chain them together.

I agree with you on this, but I also want to be consistent with the style used 
throughout Pig, so I didn't change the setters. 
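This small illustration of the two setter styles under discussion uses hypothetical classes. Void setters need one statement per call, while setters that return `this` can be chained. One trade-off worth noting is that the JavaBeans convention uses void setters, so some bean-based tools may not recognize fluent setters as property writers.

```java
/** Hypothetical classes contrasting void setters with chainable setters. */
public final class SetterStyles {
    private SetterStyles() {}

    /** Void setters: one statement per call. */
    public static final class VoidStyle {
        private String alias;
        private int parallelism;

        public void setAlias(String alias) { this.alias = alias; }
        public void setParallelism(int parallelism) { this.parallelism = parallelism; }
        public String getAlias() { return alias; }
        public int getParallelism() { return parallelism; }
    }

    /** Fluent setters: each returns this, so calls can be chained. */
    public static final class FluentStyle {
        private String alias;
        private int parallelism;

        public FluentStyle setAlias(String alias) { this.alias = alias; return this; }
        public FluentStyle setParallelism(int parallelism) { this.parallelism = parallelism; return this; }
        public String getAlias() { return alias; }
        public int getParallelism() { return parallelism; }
    }

    public static void main(String[] args) {
        VoidStyle v = new VoidStyle();
        v.setAlias("A");
        v.setParallelism(10);

        FluentStyle f = new FluentStyle().setAlias("A").setParallelism(10);
        System.out.println(v.getAlias().equals(f.getAlias())
                && v.getParallelism() == f.getParallelism()); // true
    }
}
```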



 API interface to Pig
 

 Key: PIG-1333
 URL: https://issues.apache.org/jira/browse/PIG-1333
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1333.patch, PIG-1333_1.patch, PIG-1333_2.patch, 
 PIG-1333_3.patch





[jira] Commented: (PIG-1333) API interface to Pig

2010-06-22 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881513#action_12881513
 ] 

Dmitriy V. Ryaboy commented on PIG-1333:


+1

 API interface to Pig
 

 Key: PIG-1333
 URL: https://issues.apache.org/jira/browse/PIG-1333
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1333.patch, PIG-1333_1.patch, PIG-1333_2.patch, 
 PIG-1333_3.patch





[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-06-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1405:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
Release Note: javadoc warning is fixed. Patch committed. Thanks Aniket!
  Resolution: Fixed

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, 
 StandardUDFtoPig4.patch, StandardUDFtoPigFinale.patch


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.




[jira] Created: (PIG-1462) No informative error message on parse problem

2010-06-22 Thread Ankur (JIRA)
No informative error message on parse problem
-

 Key: PIG-1462
 URL: https://issues.apache.org/jira/browse/PIG-1462
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur


Consider the following script

in = load 'data' using PigStorage() as (m:map[]);
tags = foreach in generate m#'k1' as (tagtuple: tuple(chararray));
dump tags;

This throws the following error message, which does not make clear that the 
problem is a bad declaration:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
parsing. Encountered  at line 2, column 38.
Was expecting one of:

at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:391)





[jira] Commented: (PIG-1462) No informative error message on parse problem

2010-06-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881550#action_12881550
 ] 

Ashutosh Chauhan commented on PIG-1462:
---

This has come up before. As noted on PIG-798, the correct way to achieve this is:
{code}
grunt> in = load 'data' using PigStorage() as (m:map[]);
grunt> tags = foreach in generate (tuple(chararray)) m#'k1' as tagtuple;
grunt> dump tags;
{code}
 
We should probably add a note about casting to the cookbook. We also need to 
generate a better error message.
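As a sketch of what a more helpful message could look like, the formatter below echoes the offending script line and places a caret under the column the parser reported, followed by a hint. The message wording, the hint, and the `ParseErrorFormatter` name are hypothetical. The hint text simply restates the casting advice from this comment. Columns are assumed to be 1-based.

```java
/** Hypothetical sketch of a parse error message that points at the problem. */
public final class ParseErrorFormatter {
    private ParseErrorFormatter() {}

    /** Formats an error for the given 1-based line and column of script. */
    public static String format(String script, int line, int column, String hint) {
        String[] lines = script.split("\n", -1);
        String source = (line >= 1 && line <= lines.length) ? lines[line - 1] : "";
        StringBuilder caret = new StringBuilder();
        for (int i = 1; i < column; i++) {
            // Copy tabs from the source so the caret lines up when echoed.
            boolean tab = i - 1 < source.length() && source.charAt(i - 1) == '\t';
            caret.append(tab ? '\t' : ' ');
        }
        caret.append('^');
        return "Error during parsing at line " + line + ", column " + column + ":\n"
                + source + "\n" + caret + "\n" + hint;
    }

    public static void main(String[] args) {
        String script = "in = load 'data' using PigStorage() as (m:map[]);\n"
                + "tags = foreach in generate m#'k1' as (tagtuple: tuple(chararray));\n"
                + "dump tags;";
        System.out.println(format(script, 2, 38,
                "Hint: to declare the type of a map value, cast it, "
                + "e.g. (tuple(chararray)) m#'k1' as tagtuple"));
    }
}
```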

 No informative error message on parse problem
 -

 Key: PIG-1462
 URL: https://issues.apache.org/jira/browse/PIG-1462
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur




[jira] Commented: (PIG-1462) No informative error message on parse problem

2010-06-22 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881551#action_12881551
 ] 

Ankur commented on PIG-1462:


Right, this JIRA is for adding a better error message that doesn't leave the user 
guessing.

 No informative error message on parse problem
 -

 Key: PIG-1462
 URL: https://issues.apache.org/jira/browse/PIG-1462
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur
