[jira] Created: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.
Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

Key: PIG-947
URL: https://issues.apache.org/jira/browse/PIG-947
Project: Pig
Issue Type: Bug
Components: data
Environment: Pig on Hadoop 18
Reporter: Gandul Azul

The PigStorage parser for bags does not work correctly when a tuple in a bag is preceded by a space. For example, the following is parsed correctly:

{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

while this is not (note the space before the second tuple):

{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

It seems that when the parser encounters the space, it treats the rest of the line as a String. With a schema, this results in a typecast of string to databag, which results in an exception. Because of this inconsistency, a bag written out with PigStorage cannot be loaded back with PigStorage.

|WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field being converted to type bag, caught ParseException Encountered STRING at
|line 1, column 43.
|Was expecting:
|( ...
| field discarded

Below is the parser debug output for the failing sequence "2.071200,0), (" from the example above:

** FOUND A DOUBLENUMBER MATCH (2.071200) **
Call: AtomDatum
Consumed token: DOUBLENUMBER: 2.071200 at line 1 column 31
Return: AtomDatum
Return: Datum
Matched the empty string as STRING token.
Current character : , (44) at line 1 column 39
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **
Consumed token: , at line 1 column 39
Call: Datum
Matched the empty string as STRING token.
Current character : 0 (48) at line 1 column 40
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : 0 (48) at line 1 column 40
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER, LONGINTEGER, FLOATNUMBER }
Current character : ) (41) at line 1 column 41
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Putting back 1 characters into the input stream.
** FOUND A SIGNEDINTEGER MATCH (0) **
Call: AtomDatum
Consumed token: SIGNEDINTEGER: 0 at line 1 column 40
Return: AtomDatum
Return: Datum
Matched the empty string as STRING token.
Current character : ) (41) at line 1 column 41
No more string literal token matches are possible.
Currently matched the first 1 characters as a ) token.
** FOUND A ) MATCH ()) **
Return: Tuple
Consumed token: ) at line 1 column 41
Matched the empty string as STRING token.
Current character : , (44) at line 1 column 42
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **
Consumed token: , at line 1 column 42
Matched the empty string as STRING token.
Current character : (32) at line 1 column 43
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : (32) at line 1 column 43
Currently matched the first 1 characters as a STRING token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : ( (40) at line 1 column 44
Currently matched the first 1 characters as a STRING token.
Putting back 1 characters into the input stream.
** FOUND A STRING MATCH ( ) **
Return: Bag
Return: Datum
Return: Parse

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
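Until the parser tolerates the space, one possible workaround on the data side is to normalize the bag text before PigStorage loads it. The sketch below is only an illustration and not part of Pig: `normalize_bag` is a hypothetical helper, and it assumes that no tuple field itself contains the sequence `)`, comma, `(` with whitespace in between.

```python
import re

# Hypothetical helper, not part of Pig. It removes any whitespace around the
# comma that separates two tuples inside a serialized bag, e.g. "), (" -> "),(".
_TUPLE_SEPARATOR = re.compile(r"\)\s*,\s*\(")


def normalize_bag(text: str) -> str:
    """Return the bag text with tuple separators written without whitespace."""
    return _TUPLE_SEPARATOR.sub("),(", text)


# The failing input from the report, shortened to two tuples.
broken = "{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0)}"
print(normalize_bag(broken))
```

This only hides the symptom in the input data; it does not change how the bag parser handles whitespace.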
[jira] Created: (PIG-948) [Usability] Relating pig script with MR jobs
[Usability] Relating pig script with MR jobs

Key: PIG-948
URL: https://issues.apache.org/jira/browse/PIG-948
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor

Currently it is hard to relate a Pig script to the specific MR jobs it launches. On a loaded cluster with multiple simultaneous job submissions, it is not easy to figure out which MR jobs were launched for a given Pig script. If Pig provided this information, it would be useful for debugging and monitoring the jobs resulting from a Pig script. At the very least, Pig should be able to provide the user with the following information:
1) The job id of the launched job.
2) The complete web URL of the jobtracker running this job.
[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.
[ https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gandul Azul updated PIG-947:

Description:
The PigStorage parser for bags does not work correctly when a tuple in a bag is preceded by a space. For example, the following is parsed correctly:

{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

while this is not (note the space before the second tuple):

{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}

It seems that when the parser encounters the space, it treats the rest of the line as a String. With a schema, this results in a typecast of string to databag, which results in an exception.

|WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field being converted to type bag, caught ParseException Encountered STRING at
|line 1, column 43.
|Was expecting:
|( ...
| field discarded

Below is the parser debug output for the failing sequence "2.071200,0), (" from the example above:

** FOUND A DOUBLENUMBER MATCH (2.071200) **
Call: AtomDatum
Consumed token: DOUBLENUMBER: 2.071200 at line 1 column 31
Return: AtomDatum
Return: Datum
Matched the empty string as STRING token.
Current character : , (44) at line 1 column 39
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **
Consumed token: , at line 1 column 39
Call: Datum
Matched the empty string as STRING token.
Current character : 0 (48) at line 1 column 40
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : 0 (48) at line 1 column 40
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER, LONGINTEGER, FLOATNUMBER }
Current character : ) (41) at line 1 column 41
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Putting back 1 characters into the input stream.
** FOUND A SIGNEDINTEGER MATCH (0) **
Call: AtomDatum
Consumed token: SIGNEDINTEGER: 0 at line 1 column 40
Return: AtomDatum
Return: Datum
Matched the empty string as STRING token.
Current character : ) (41) at line 1 column 41
No more string literal token matches are possible.
Currently matched the first 1 characters as a ) token.
** FOUND A ) MATCH ()) **
Return: Tuple
Consumed token: ) at line 1 column 41
Matched the empty string as STRING token.
Current character : , (44) at line 1 column 42
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
** FOUND A , MATCH (,) **
Consumed token: , at line 1 column 42
Matched the empty string as STRING token.
Current character : (32) at line 1 column 43
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : (32) at line 1 column 43
Currently matched the first 1 characters as a STRING token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, DOUBLENUMBER }
Current character : ( (40) at line 1 column 44
Currently matched the first 1 characters as a STRING token.
Putting back 1 characters into the input stream.
** FOUND A STRING MATCH ( ) **
Return: Bag
Return: Datum
Return: Parse
[jira] Updated: (PIG-948) [Usability] Relating pig script with MR jobs
[ https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-948:

Attachment: pig-948.patch

Attached is a patch which prints the following information on the grunt shell:

{code}
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: Submitting job: job_200908291847_0046 to execution engine.
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: More information at: http://www.jobtracker-site:50030/jobdetails.jsp?jobid=job_200908291847_0046
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: To kill this job, use: kill job_200908291847_0046
{code}

[Usability] Relating pig script with MR jobs

Key: PIG-948
URL: https://issues.apache.org/jira/browse/PIG-948
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: pig-948.patch
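Once these lines are printed, a script's output can be scanned to collect the MR jobs it launched. The following is a minimal sketch, assuming the log lines keep exactly the wording quoted above; `extract_jobs` is a hypothetical helper and not part of the patch.

```python
import re

# Patterns for the two log lines quoted in this message. The wording is taken
# from that sample output; other Pig versions may print something different.
_SUBMIT = re.compile(r"Submitting job: (job_\d+_\d+) to execution engine")
_URL = re.compile(r"More information at: (\S+)")


def extract_jobs(log_text):
    """Return (job_ids, urls) found in Pig launcher log output."""
    return _SUBMIT.findall(log_text), _URL.findall(log_text)


sample = """\
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: Submitting job: job_200908291847_0046 to execution engine.
09/09/07 15:11:48 INFO mapReduceLayer.MapReduceLauncher: More information at: http://www.jobtracker-site:50030/jobdetails.jsp?jobid=job_200908291847_0046
"""
job_ids, urls = extract_jobs(sample)
print(job_ids, urls)
```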
[jira] Assigned: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.
[ https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giridharan Kesavan reassigned PIG-939:

Assignee: Giridharan Kesavan

Checkstyle pulls in junit3.7 which causes the build of test code to fail.

Key: PIG-939
URL: https://issues.apache.org/jira/browse/PIG-939
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.3.0
Reporter: Lee Tucker
Assignee: Giridharan Kesavan
Attachments: pig-939.patch

Pig fails to compile if you execute:

ant -Dassociated flags for various components clean findbugs checkstyle test

It gets the error:

[javac] Compiling 153 source files to /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/build/test/classes
[javac] /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/test/org/apache/pig/test/PigExecTestCase.java:31: cannot find symbol
[javac] symbol : constructor TestCase()
[javac] location: class junit.framework.TestCase
[javac] public abstract class PigExecTestCase extends TestCase {
[javac] ^

Once that has happened, a copy of junit 3.7 cached by ivy will continue to cause the build to fail. The build will succeed if you remove that copy and then run:

ant -Dassociated flags for various components clean findbugs test

This shows that running checkstyle is what pulls in junit 3.7.
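To check whether a stale junit jar is still cached, the cache directory can be scanned for junit jars and their versions. This is a rough sketch under assumptions: `find_junit_jars` is a hypothetical helper, the jars are assumed to be named `junit-<version>.jar`, and `~/.ivy2/cache` is Ivy's default cache location, which a build may override.

```python
import re
from pathlib import Path

# Assumes jars are named junit-<version>.jar, e.g. junit-3.7.jar.
_JUNIT_JAR = re.compile(r"^junit-(\d+(?:\.\d+)*)\.jar$")


def find_junit_jars(cache_root):
    """Map the path of each junit jar under cache_root to its version string."""
    root = Path(cache_root)
    if not root.is_dir():
        return {}
    found = {}
    for jar in root.rglob("junit-*.jar"):
        match = _JUNIT_JAR.match(jar.name)
        if match:
            found[str(jar)] = match.group(1)
    return found


if __name__ == "__main__":
    # Default Ivy cache location; adjust if the build configures another one.
    for path, version in sorted(find_junit_jars(Path.home() / ".ivy2" / "cache").items()):
        print(version, path)
```

Any entry reporting version 3.7 would be the cached copy described in the report.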