[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-947:
---

Fix Version/s: (was: 0.8.0)

I don't think anybody is signed up for this issue. If you are interested in 
working on it, please relink it to the release and assign it to yourself.

 Parsing Bags by PigStorage is not handled correctly if whitespace before 
 start of tuple.
 

 Key: PIG-947
 URL: https://issues.apache.org/jira/browse/PIG-947
 Project: Pig
  Issue Type: Bug
  Components: data
 Environment: Pig on Hadoop 18
Reporter: Gandul Azul

 The PigStorage parser for bags does not work correctly when a tuple in a bag is 
 preceded by a space. For example, the following is parsed correctly:
 {(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
 while this is not: (Note the space before the second tuple)
 {(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
 It seems that when the parser encounters the space, it treats the rest of the 
 line as a String. With a schema, this results in a cast from String to 
 DataBag, which results in an exception. 
 |WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field 
 being converted to type bag, caught ParseException Encountered  STRING   
  at |line 1, column 43.
 |Was expecting:
 |( ...
 | field discarded
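
Until the parser tolerates such whitespace, one possible workaround is to normalize the text before it reaches PigStorage. The sketch below is in Python and is not part of Pig. The function name normalize_bag_text is hypothetical, and the sketch assumes that no chararray field legitimately contains a ")" and "(" separated only by a comma and spaces.

```python
import re

# Boundary between two tuples, allowing stray spaces around the comma.
# Only literal spaces are removed, so tab field delimiters are left alone.
_TUPLE_BOUNDARY = re.compile(r"\) *, *\(")


def normalize_bag_text(line: str) -> str:
    """Remove spaces between adjacent tuples, e.g. '), (' becomes '),('."""
    return _TUPLE_BOUNDARY.sub("),(", line)


if __name__ == "__main__":
    failing = "{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0)}"
    print(normalize_bag_text(failing))
```

Input that is already in the compact form passes through unchanged, so the step can be applied to every line.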
 Below is the parser debug output for the failing portion of the input above, 
 starting at "2.071200,0), (":
 ** FOUND A DOUBLENUMBER MATCH (2.071200) **
   Call:   AtomDatum
 Consumed token: DOUBLENUMBER: 2.071200 at line 1 column 31
   Return: AtomDatum
 Return: Datum
Matched the empty string as STRING token.
 Current character : , (44) at line 1 column 39
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
 ** FOUND A , MATCH (,) **
 Consumed token: , at line 1 column 39
 Call:   Datum
Matched the empty string as STRING token.
 Current character : 0 (48) at line 1 column 40
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
 Current character : 0 (48) at line 1 column 40
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER, LONGINTEGER, FLOATNUMBER }
 Current character : ) (41) at line 1 column 41
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Putting back 1 characters into the input stream.
 ** FOUND A SIGNEDINTEGER MATCH (0) **
   Call:   AtomDatum
 Consumed token: SIGNEDINTEGER: 0 at line 1 column 40
   Return: AtomDatum
 Return: Datum
Matched the empty string as STRING token.
 Current character : ) (41) at line 1 column 41
No more string literal token matches are possible.
Currently matched the first 1 characters as a ) token.
 ** FOUND A ) MATCH ()) **
   Return: Tuple
   Consumed token: ) at line 1 column 41
Matched the empty string as STRING token.
 Current character : , (44) at line 1 column 42
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
 ** FOUND A , MATCH (,) **
   Consumed token: , at line 1 column 42
Matched the empty string as STRING token.
 Current character :   (32) at line 1 column 43
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
 Current character :   (32) at line 1 column 43
Currently matched the first 1 characters as a STRING token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
 Current character : ( (40) at line 1 column 44
Currently matched the first 1 characters as a STRING token.
Putting back 1 characters into the input stream.
 ** FOUND A STRING MATCH ( ) **
 Return: Bag
   Return: Datum
 Return: Parse
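
The trace is consistent with a longest-match lexer that has no rule for skipping whitespace, so the space at column 43 falls through to the catch-all STRING token. The following Python sketch reproduces that behaviour under assumed token patterns. The token names come from the trace; the regular expressions, the PUNCT kind, and the skip_whitespace switch are illustrative assumptions, not Pig's actual grammar.

```python
import re

# Hypothetical token table, loosely modelled on the token names in the trace.
# Earlier entries win ties, so "0" lexes as SIGNEDINTEGER rather than STRING.
TOKEN_KINDS = [
    ("DOUBLENUMBER", re.compile(r"-?\d+\.\d+")),
    ("SIGNEDINTEGER", re.compile(r"-?\d+")),
    ("PUNCT", re.compile(r"[{}(),]")),
    ("STRING", re.compile(r"[^{}(),]+")),  # catch-all: anything but a delimiter
]


def tokenize(text, skip_whitespace=False):
    """Return (kind, lexeme, column) triples using longest-match lexing.

    Columns are 1-based, as in the trace.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if skip_whitespace and text[pos].isspace():
            pos += 1  # the assumed fix: drop whitespace between tokens
            continue
        best = None
        for kind, pattern in TOKEN_KINDS:
            match = pattern.match(text, pos)
            if match and (best is None or match.end() > best[1].end()):
                best = (kind, match)
        kind, match = best  # PUNCT or STRING always matches one character
        tokens.append((kind, match.group(), pos + 1))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    line = "{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0)}"
    # Without a skip rule the space becomes a STRING token at column 43.
    print([t for t in tokenize(line) if t[0] == "STRING"])
    # With whitespace skipped, no STRING token is produced.
    print([t for t in tokenize(line, skip_whitespace=True) if t[0] == "STRING"])
```

In this sketch the fix lives in the lexer rather than the grammar, which keeps the tuple and bag rules unchanged. Whether that matches how Pig's parser would best be fixed is not established by the trace.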

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-947:
---

Fix Version/s: 0.8.0

 Parsing Bags by PigStorage is not handled correctly if whitespace before 
 start of tuple.
 

 Key: PIG-947
 URL: https://issues.apache.org/jira/browse/PIG-947
 Project: Pig
  Issue Type: Bug
  Components: data
 Environment: Pig on Hadoop 18
Reporter: Gandul Azul
 Fix For: 0.8.0






[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

2009-09-07 Thread Gandul Azul (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gandul Azul updated PIG-947:


Description: 
The PigStorage parser for bags does not work correctly when a tuple in a bag is 
preceded by a space. For example, the following is parsed correctly:

{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

while this is not: (Note the space before the second tuple)
{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

It seems that when the parser encounters the space, it treats the rest of the 
line as a String. With a schema, this results in a cast from String to 
DataBag, which results in an exception. 

|WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field 
being converted to type bag, caught ParseException Encountered  STRING   
 at |line 1, column 43.
|Was expecting:
|( ...
| field discarded


Below is the parser debug output for the failing portion of the input above, 
starting at "2.071200,0), (":

** FOUND A DOUBLENUMBER MATCH (2.071200) **

  Call:   AtomDatum
Consumed token: DOUBLENUMBER: 2.071200 at line 1 column 31
  Return: AtomDatum
Return: Datum
   Matched the empty string as STRING token.
Current character : , (44) at line 1 column 39
   No more string literal token matches are possible.
   Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **

Consumed token: , at line 1 column 39
Call:   Datum
   Matched the empty string as STRING token.
Current character : 0 (48) at line 1 column 40
   No string literal matches possible.
   Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : 0 (48) at line 1 column 40
   Currently matched the first 1 characters as a SIGNEDINTEGER token.
   Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER, LONGINTEGER, FLOATNUMBER }
Current character : ) (41) at line 1 column 41
   Currently matched the first 1 characters as a SIGNEDINTEGER token.
   Putting back 1 characters into the input stream.
** FOUND A SIGNEDINTEGER MATCH (0) **

  Call:   AtomDatum
Consumed token: SIGNEDINTEGER: 0 at line 1 column 40
  Return: AtomDatum
Return: Datum
   Matched the empty string as STRING token.
Current character : ) (41) at line 1 column 41
   No more string literal token matches are possible.
   Currently matched the first 1 characters as a ) token.
** FOUND A ) MATCH ()) **

  Return: Tuple
  Consumed token: ) at line 1 column 41
   Matched the empty string as STRING token.
Current character : , (44) at line 1 column 42
   No more string literal token matches are possible.
   Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **

  Consumed token: , at line 1 column 42
   Matched the empty string as STRING token.
Current character :   (32) at line 1 column 43
   No string literal matches possible.
   Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character :   (32) at line 1 column 43
   Currently matched the first 1 characters as a STRING token.
   Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : ( (40) at line 1 column 44
   Currently matched the first 1 characters as a STRING token.
   Putting back 1 characters into the input stream.
** FOUND A STRING MATCH ( ) **

Return: Bag
  Return: Datum
Return: Parse



  was:
The PigStorage parser for bags does not work correctly when a tuple in a bag is 
preceded by a space. For example, the following is parsed correctly:

{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

while this is not: (Note the space before the second tuple)
{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

It seems that when the parser encounters the space, it treats the rest of the 
line as a String. With a schema, this results in a cast from String to 
DataBag, which results in an exception. Because of this inconsistency, a bag 
written out with PigStorage cannot be loaded back using PigStorage.

|WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field 
being converted to type bag, caught ParseException Encountered  STRING   
 at |line 1, column 43.
|Was expecting:
|( ...
| field discarded


Below is the parser debug output for the failing portion of the input above, 
starting at "2.071200,0), (":

** FOUND A DOUBLENUMBER MATCH (2.071200) **

  Call:   AtomDatum
Consumed token: DOUBLENUMBER: 2.071200 at