[jira] Commented: (PIG-1339) International characters in column names not supported

2010-04-29 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862415#action_12862415
 ] 

Pradeep Kamath commented on PIG-1339:
-

I looked into this more, and here are my observations:

Grunt actually parses the Unicode characters as ASCII characters (my suspicion 
is that this is due to JLine's ConsoleReader treating its input as ASCII). So 
although Grunt is able to process the script, it interprets the column name as 
whatever the equivalent ASCII representation comes out as, which means it is 
not really handling it correctly (in the above case, the column name becomes 
something like BDFGH).

ASCII column names work because the column name is matched by the IDENTIFIER 
token, which only accepts ASCII characters. This production cannot easily be 
extended to handle non-ASCII characters, nor can a new token (COLNAME?) be 
introduced to allow non-ASCII characters along the lines of QUOTEDSTRING. In 
QUOTEDSTRING, non-ASCII characters are allowed only within the context of 
enclosing single quotes. Here we need the same functionality, but only in the 
context of a schema specification, "as (colname)"; outside that context, most 
input would match such a token. Unfortunately, this is a context within the 
parser rather than in the TokenManager, which is where the concept of lexical 
states lives 
(http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-ie.htm#tth_sEc3.11). 
Switching lexical states from within the parser is considered unsafe, because 
the TokenManager may already be looking ahead of where the parser is 
(http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-ie.htm#tth_sEc3.12). So at 
this point I am not clear what the approach to this issue should be.
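
A mangled name of the "BDFGH" kind is what you would get if each character 
were cut down to its low byte. The sketch below illustrates that effect only; 
it is an assumption about the mechanism, not code taken from JLine or Pig, and 
the class and method names are hypothetical.

```java
// Hypothetical illustration: not code from Pig or JLine.
final class LowByteTruncationDemo {

    // Keeps only the low 8 bits of every UTF-16 code unit, which is what a
    // reader that treats each character as a single byte would effectively do.
    static String truncateToLowByte(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append((char) (input.charAt(i) & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Column name from the bug report, written with escapes so that the
        // encoding of this source file does not matter:
        // U+3042 U+3044 U+3046 U+3048 U+304A.
        String column = "\u3042\u3044\u3046\u3048\u304A";
        System.out.println(truncateToLowByte(column)); // prints BDFHJ
    }
}
```

The result is a plain ASCII string that an ASCII-only identifier rule accepts, 
which would explain why the script appears to run while the column name is 
silently wrong.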

If someone with better JavaCC knowledge knows how to do tokenizing based on 
parser context, please give this issue a try and update it with the results.
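
To make the ASCII-only versus Unicode-aware distinction concrete, here is a 
small sketch using java.util.regex. Both patterns are assumptions chosen for 
illustration; they are not the actual IDENTIFIER definition from Pig's grammar, 
and the class name is hypothetical. The sketch only shows which character 
classes accept the column name. It does not address the harder problem 
described above, namely restricting such a token to the schema context.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: these patterns are NOT Pig's real token rules.
final class IdentifierPatternDemo {

    // ASCII letters, digits and underscore only (assumed shape of an
    // ASCII-only identifier rule).
    static final Pattern ASCII_IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Any Unicode letter (\p{L}) to start, then letters, decimal digits
    // (\p{Nd}) or underscore.
    static final Pattern UNICODE_IDENTIFIER =
            Pattern.compile("[\\p{L}_][\\p{L}\\p{Nd}_]*");

    static boolean isAsciiIdentifier(String s) {
        return ASCII_IDENTIFIER.matcher(s).matches();
    }

    static boolean isUnicodeIdentifier(String s) {
        return UNICODE_IDENTIFIER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Column name from the bug report (U+3042 ... U+304A).
        String column = "\u3042\u3044\u3046\u3048\u304A";
        System.out.println(isAsciiIdentifier("inputdata")); // true
        System.out.println(isAsciiIdentifier(column));      // false
        System.out.println(isUnicodeIdentifier(column));    // true
    }
}
```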

> International characters in column names not supported
> --
>
> Key: PIG-1339
> URL: https://issues.apache.org/jira/browse/PIG-1339
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0, 0.7.0, 0.8.0
> Reporter: Viraj Bhat
>
> There is a particular use case in which someone specifies a column name in 
> international characters.
> {code}
> inputdata = load '/user/viraj/inputdata.txt' using PigStorage() as (あいうえお);
> describe inputdata;
> dump inputdata;
> {code}
> ==
> Pig Stack Trace
> ---
> ERROR 1000: Error during parsing. Lexical error at line 1, column 64. Encountered: "\u3042" (12354), after : ""
> org.apache.pig.impl.logicalLayer.parser.TokenMgrError: Lexical error at line 1, column 64. Encountered: "\u3042" (12354), after : ""
>     at org.apache.pig.impl.logicalLayer.parser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1791)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_scan_token(QueryParser.java:8959)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_51(QueryParser.java:7462)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_120(QueryParser.java:7769)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_106(QueryParser.java:7787)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_63(QueryParser.java:8609)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_32(QueryParser.java:8621)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3_4(QueryParser.java:8354)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_2_4(QueryParser.java:6903)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1249)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
>     at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>     at org.apache.pig.Main.main(Main.java:391)
> 

[jira] Commented: (PIG-1339) International characters in column names not supported

2010-04-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860606#action_12860606
 ] 

Ashutosh Chauhan commented on PIG-1339:
---

This works fine in Grunt:
{code}
grunt> a = load '1-3.txt' using PigStorage() as (あいうえお);
grunt> dump a;
{code}

This gives the expected result. The problem arises when it is fed to Pig as a 
script:
{code}
bin/pig myscript.pig
{code}
This gives the exception you showed above. It looks like a bug in 
PigScriptParser.jj, which should read the stream from the script file as UTF-8.
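
As a rough illustration of why the charset used to read the script file 
matters, the sketch below decodes the same UTF-8 bytes twice with 
java.io.InputStreamReader, once as UTF-8 and once as ISO-8859-1. It is not 
Pig's actual script-loading code; the class and method names are hypothetical, 
and ISO-8859-1 merely stands in for whatever single-byte default might be in 
effect.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: not code from Pig.
final class ScriptEncodingDemo {

    // Reads the first line of the stream, decoding bytes with the given charset.
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, charset))) {
            String line = reader.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The load statement from the comment above; the column name is written
        // with escapes (U+3042 ... U+304A) so this source file stays ASCII.
        String script = "a = load '1-3.txt' using PigStorage() as "
                + "(\u3042\u3044\u3046\u3048\u304A);";
        byte[] onDisk = script.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = readFirstLine(
                new ByteArrayInputStream(onDisk), StandardCharsets.UTF_8);
        String asLatin1 = readFirstLine(
                new ByteArrayInputStream(onDisk), StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals(script));   // true
        System.out.println(asLatin1.equals(script)); // false
        // Each of the 5 characters is 3 bytes in UTF-8, so the single-byte
        // decoding sees 15 characters where UTF-8 sees 5.
        System.out.println(asLatin1.length() - asUtf8.length()); // 10
    }
}
```

Passing the charset explicitly to the reader, rather than relying on a platform 
default, is the general technique; where exactly that would belong in Pig is 
not something this sketch claims to know.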


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1339) International characters in column names not supported

2010-04-23 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860445#action_12860445
 ] 

Viraj Bhat commented on PIG-1339:
-

Hi Ashutosh, this does not work in trunk. I am using the latest build:

{code}
$ java -cp ~/pig-svn/trunk/pig.jar org.apache.pig.Main -version

Apache Pig version 0.8.0-dev (r937554) 
compiled Apr 23 2010, 16:57:32

{code}

2010-04-23 17:31:41,448 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1000: Error during parsing. Lexical error at line 1, column 71.  Encountered: 
"\u3042" (12354), after : ""


This is a valid bug.

Viraj




[jira] Commented: (PIG-1339) International characters in column names not supported

2010-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859152#action_12859152
 ] 

Ashutosh Chauhan commented on PIG-1339:
---

This is not reproducible on trunk; I get the expected output. Viraj, can you 
please verify whether it works for you on trunk?
