Handling strings and exceptions in text data parser
---------------------------------------------------

                 Key: PIG-567
                 URL: https://issues.apache.org/jira/browse/PIG-567
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Priority: Minor
             Fix For: types_branch


The text data parser treats a sequence of numerals as integer. If the data is 
too long to fit into an integer then a number format exception is thrown and no 
attempts are made to convert the data to a higher type. A couple of questions 
arise:

1. Should strings be annotated with delimiters like quotes to distinguish them 
from numbers?
2. Should conversions to higher types or strings be attempted? The conversions 
have performance implications.

{noformat}

Data file:
{(2985671202194220139L}

Pig script:
a = load 'data' as (list: bag{t: tuple(value: chararray)});
dump a

Output:
2008-12-13 09:08:24,831 [main] ERROR
org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable
to open iterator for alias: a [Unable to store for alias: a [For input string: 
"2985671202194220139"]]
        at 
org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:178)
        at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:647)
        at org.apache.pig.PigServer.store(PigServer.java:452)
        at org.apache.pig.PigServer.store(PigServer.java:421)
        at org.apache.pig.PigServer.openIterator(PigServer.java:384)
        at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
        at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
        at 
org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
        at org.apache.pig.Main.main(Main.java:282)
Caused by: java.io.IOException: Unable to store for alias: a [For input string: 
"2985671202194220139"]
        ... 10 more
Caused by: org.apache.pig.backend.executionengine.ExecException: For input 
string: "2985671202194220139"
        ... 10 more
Caused by: java.lang.NumberFormatException: For input string: 
"2985671202194220139"
        at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:459)
        at java.lang.Integer.parseInt(Integer.java:497)
        at 
org.apache.pig.data.parser.TextDataParser.AtomDatum(TextDataParser.java:291)
        at 
org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:359)
        at 
org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:149)
        at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:85)
        at 
org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:345)
        at 
org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
        at 
org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:70)
        at 
org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:78)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:861)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:243)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.store(POStore.java:137)
        at 
org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:62)
        at 
org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:166)
        ... 9 more

2008-12-13 09:08:24,833 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
Unable to open iterator for alias: a [Unable to store for alias: a [For input 
string: "2985671202194220139"]]
2008-12-13 09:08:24,834 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
java.io.IOException: Unable to open iterator for alias: a [Unable to store for 
alias: a [For input string: "2985671202194220139"]]

{noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to