[jira] Created: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

2009-09-07 Thread Gandul Azul (JIRA)
Parsing Bags by PigStorage is not handled correctly if whitespace before start 
of tuple.


 Key: PIG-947
 URL: https://issues.apache.org/jira/browse/PIG-947
 Project: Pig
  Issue Type: Bug
  Components: data
 Environment: Pig on Hadoop 18
Reporter: Gandul Azul


The PigStorage parser for bags does not work correctly when a tuple in a bag is 
preceded by a space. For example, the following is parsed correctly:

{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

while this is not (note the space before the second tuple):
{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

It seems that when the parser encounters the space, it treats the rest of the 
line as a String. With a schema, this results in a cast from string to 
databag, which throws an exception. As a result, a bag written out with 
PigStorage cannot be loaded back with PigStorage because of this inconsistency.

|WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field 
being converted to type bag, caught ParseException Encountered  STRING   
 at |line 1, column 43.
|Was expecting:
|( ...
| field discarded
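The column in the warning can be checked against the failing input. The small Java sketch below assumes the input is the single line from the second example (the mail archive wrapped it after the space); it converts the parser's 1-based column to a 0-based string index and shows that column 43 is the space and column 44 is the opening parenthesis of the second tuple, which agrees with the debug output that follows. The class and method names are only for illustration.

```java
public class ColumnCheck {
    // Second example from the report, as one line; note the space after "0),".
    static final String FAILING_INPUT =
            "{(-5.243084,3.142401,0.000138,2.071200,0), "
            + "(-6.021349,0.992683,0.44,0.992683,0),"
            + "(-10.426160,20.251774,0.000892,5.691086,0)}";

    // The parser reports 1-based columns; Java string indices are 0-based.
    static char charAtColumn(String line, int column) {
        return line.charAt(column - 1);
    }

    public static void main(String[] args) {
        System.out.println("column 43: '" + charAtColumn(FAILING_INPUT, 43) + "'"); // column 43: ' '
        System.out.println("column 44: '" + charAtColumn(FAILING_INPUT, 44) + "'"); // column 44: '('
    }
}
```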


Below is the parser debug output for the failing sequence "2.071200,0), (" 
from the input above:

** FOUND A DOUBLENUMBER MATCH (2.071200) **

  Call:   AtomDatum
Consumed token: DOUBLENUMBER: 2.071200 at line 1 column 31
  Return: AtomDatum
Return: Datum
   Matched the empty string as STRING token.
Current character : , (44) at line 1 column 39
   No more string literal token matches are possible.
   Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **

Consumed token: , at line 1 column 39
Call:   Datum
   Matched the empty string as STRING token.
Current character : 0 (48) at line 1 column 40
   No string literal matches possible.
   Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : 0 (48) at line 1 column 40
   Currently matched the first 1 characters as a SIGNEDINTEGER token.
   Possible kinds of longer matches : { STRING, SIGNEDINTEGER, 
DOUBLENUMBER, LONGINTEGER, 
 FLOATNUMBER }
Current character : ) (41) at line 1 column 41
   Currently matched the first 1 characters as a SIGNEDINTEGER token.
   Putting back 1 characters into the input stream.
** FOUND A SIGNEDINTEGER MATCH (0) **

  Call:   AtomDatum
Consumed token: SIGNEDINTEGER: 0 at line 1 column 40
  Return: AtomDatum
Return: Datum
   Matched the empty string as STRING token.
Current character : ) (41) at line 1 column 41
   No more string literal token matches are possible.
   Currently matched the first 1 characters as a ) token.
** FOUND A ) MATCH ()) **

  Return: Tuple
  Consumed token: ) at line 1 column 41
   Matched the empty string as STRING token.
Current character : , (44) at line 1 column 42
   No more string literal token matches are possible.
   Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **

  Consumed token: , at line 1 column 42
   Matched the empty string as STRING token.
Current character :   (32) at line 1 column 43
   No string literal matches possible.
   Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character :   (32) at line 1 column 43
   Currently matched the first 1 characters as a STRING token.
   Possible kinds of longer matches : { STRING, SIGNEDINTEGER, 
DOUBLENUMBER }
Current character : ( (40) at line 1 column 44
   Currently matched the first 1 characters as a STRING token.
   Putting back 1 characters into the input stream.
** FOUND A STRING MATCH ( ) **

Return: Bag
  Return: Datum
Return: Parse
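The trace suggests that at bag level the grammar expects "(" right after ",", but the lexer hands it a STRING token made of the single space. The Java sketch below illustrates the kind of tolerance the report asks for by skipping whitespace between bag-level tokens. It is a hypothetical, hand-written parser (the name LenientBagParser is invented), not Pig's generated parser or a proposed patch, and it only handles flat bags of flat tuples, keeping each field as a raw string with no type conversion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's parser: flat bags of flat tuples only,
// e.g. {(1,2),(3,4)}. Fields stay raw strings; no nesting, no escaping.
public class LenientBagParser {
    private final String s;
    private int pos;

    private LenientBagParser(String s) { this.s = s; }

    public static List<List<String>> parse(String text) {
        LenientBagParser p = new LenientBagParser(text);
        List<List<String>> bag = p.parseBag();
        p.skipWhitespace();
        if (p.pos != text.length()) throw p.error("end of input");
        return bag;
    }

    private List<List<String>> parseBag() {
        List<List<String>> tuples = new ArrayList<>();
        expect('{');
        skipWhitespace();
        if (peek() == '}') { pos++; return tuples; } // empty bag
        while (true) {
            skipWhitespace(); // the point of the sketch: accept "), (" as well as "),("
            tuples.add(parseTuple());
            skipWhitespace();
            char c = peek();
            if (c == '}') { pos++; return tuples; }
            if (c != ',') throw error("',' or '}'");
            pos++; // consume the ',' between tuples
        }
    }

    private List<String> parseTuple() {
        expect('(');
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        while (true) {
            char c = peek();
            pos++;
            if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == ')') {
                fields.add(field.toString());
                return fields;
            } else {
                field.append(c);
            }
        }
    }

    private void skipWhitespace() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private char peek() {
        if (pos >= s.length()) throw error("more input");
        return s.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) throw error("'" + c + "'");
        pos++;
    }

    // Reports a 1-based column, like the parser messages above.
    private IllegalArgumentException error(String expected) {
        return new IllegalArgumentException("expected " + expected + " at column " + (pos + 1));
    }

    public static void main(String[] args) {
        String tight = "{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0)}";
        String spaced = "{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0)}";
        System.out.println(parse(tight).equals(parse(spaced))); // true
    }
}
```

Whitespace is skipped only between bag-level tokens, not inside tuples, so a chararray field with meaningful leading spaces is left untouched.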



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-948) [Usability] Relating pig script with MR jobs

2009-09-07 Thread Ashutosh Chauhan (JIRA)
[Usability] Relating pig script with MR jobs


 Key: PIG-948
 URL: https://issues.apache.org/jira/browse/PIG-948
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor


Currently it's hard to relate a Pig script to the specific MR jobs it launches. 
On a loaded cluster with many simultaneous job submissions, it's not easy to 
figure out which MR jobs were launched for a given Pig script. If Pig provided 
this information, it would help users debug and monitor the jobs resulting 
from a Pig script.

At the very least, Pig should provide the user with the following information:
1) The job ID of the launched job.
2) The complete web URL of the jobtracker running this job.





[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

2009-09-07 Thread Gandul Azul (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gandul Azul updated PIG-947:


Description: 
The PigStorage parser for bags does not work correctly when a tuple in a bag is 
preceded by a space. For example, the following is parsed correctly:

{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

while this is not (note the space before the second tuple):
{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

It seems that when the parser encounters the space, it treats the rest of the 
line as a String. With a schema, this results in a cast from string to 
databag, which throws an exception.

|WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field 
being converted to type bag, caught ParseException Encountered  STRING   
 at |line 1, column 43.
|Was expecting:
|( ...
| field discarded






[jira] Updated: (PIG-948) [Usability] Relating pig script with MR jobs

2009-09-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-948:
-

Attachment: pig-948.patch

Attached is a patch that prints the following information on the Grunt shell:

{code}
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: Submitting job: job_200908291847_0046 to execution engine.
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: More information at: http://www.jobtracker-site:50030/jobdetails.jsp?jobid=job_200908291847_0046
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: To kill this job, use: kill job_200908291847_0046
{code}
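The three messages above only need the job ID and the jobtracker's web address. A minimal Java sketch of that formatting, assuming both values are already available as strings; the class JobInfoMessages and its method are hypothetical, and this is not the code in pig-948.patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from pig-948.patch: builds the three
// message bodies shown in the log, given a job ID and the jobtracker's
// web address (host:port), both assumed to be known already.
public final class JobInfoMessages {
    private JobInfoMessages() {}

    public static List<String> forJob(String jobId, String jobTrackerHttpAddress) {
        String url = "http://" + jobTrackerHttpAddress + "/jobdetails.jsp?jobid=" + jobId;
        return Arrays.asList(
                "Submitting job: " + jobId + " to execution engine.",
                "More information at: " + url,
                "To kill this job, use: kill " + jobId);
    }

    public static void main(String[] args) {
        for (String line : forJob("job_200908291847_0046", "www.jobtracker-site:50030")) {
            System.out.println(line);
        }
    }
}
```

Running main prints the same three message bodies as the log above, without the timestamp and logger prefix.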

 [Usability] Relating pig script with MR jobs
 

 Key: PIG-948
 URL: https://issues.apache.org/jira/browse/PIG-948
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: pig-948.patch


 Currently it's hard to relate a Pig script to the specific MR jobs it 
 launches. On a loaded cluster with many simultaneous job submissions, it's 
 not easy to figure out which MR jobs were launched for a given Pig script. 
 If Pig provided this information, it would help users debug and monitor the 
 jobs resulting from a Pig script.
 At the very least, Pig should provide the user with the following 
 information:
 1) The job ID of the launched job.
 2) The complete web URL of the jobtracker running this job.




[jira] Assigned: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2009-09-07 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan reassigned PIG-939:
--

Assignee: Giridharan Kesavan

 Checkstyle pulls in junit3.7 which causes the build of test code to fail.
 -

 Key: PIG-939
 URL: https://issues.apache.org/jira/browse/PIG-939
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.3.0
Reporter: Lee Tucker
Assignee: Giridharan Kesavan
 Attachments: pig-939.patch


 Pig fails to compile if you execute: 
 ant -Dassociated flags for various components clean findbugs checkstyle test
 It fails with the following error:
 [javac] Compiling 153 source files to /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/build/test/classes
 [javac] /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/test/org/apache/pig/test/PigExecTestCase.java:31: cannot find symbol
 [javac] symbol  : constructor TestCase()
 [javac] location: class junit.framework.TestCase
 [javac] public abstract class PigExecTestCase extends TestCase {
 [javac] ^
 Once that's done, a copy of junit 3.7 cached by Ivy will continue to cause 
 the build to fail. The build succeeds if you remove that copy and then run:
 ant -Dassociated flags for various components clean findbugs test
 This shows that running checkstyle is what pulls in junit 3.7.
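To locate the cached copy mentioned above before removing it, a small Java sketch can search a cache directory for junit 3.7 files. It assumes Ivy's default cache location under the user's home directory (~/.ivy2/cache), which Pig's build may override, and that the artifact's file name starts with "junit-3.7"; FindCachedJunit is a hypothetical name. It only lists matches and deletes nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists files under a cache directory whose names
// start with "junit-3.7" (an assumed naming convention). Read-only.
public class FindCachedJunit {
    public static List<Path> find(Path cacheDir) throws IOException {
        try (Stream<Path> files = Files.walk(cacheDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("junit-3.7"))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Ivy's default cache dir unless a path is given as the first argument.
        Path cache = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".ivy2", "cache");
        if (!Files.isDirectory(cache)) {
            System.out.println("No cache directory at " + cache);
            return;
        }
        for (Path p : find(cache)) {
            System.out.println(p);
        }
    }
}
```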
