[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910629#action_12910629 ]

Sandesh Devaraju commented on PIG-1229:
---------------------------------------

I narrowed down the problem to org.apache.hadoop.mapred.Task.java lines 411-418.

{code:title=org.apache.hadoop.mapred.Task.java|linenumbers=true|firstline=411}
if (useNewApi) {
  LOG.debug("using new api for output committer");
  outputFormat =
    ReflectionUtils.newInstance(taskContext.getOutputFormatClass(), job);
  committer = outputFormat.getOutputCommitter(taskContext);
} else {
  committer = conf.getOutputCommitter();
}
{code}

But the DBStorage UDF assumes that the OutputFormat is in a closure.

allow pig to write output into a JDBC db
----------------------------------------

Key: PIG-1229
URL: https://issues.apache.org/jira/browse/PIG-1229
Project: Pig
Issue Type: New Feature
Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
Fix For: 0.8.0
Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch

UDF to store data into a DB

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1618) Switch to new parser generator technology
Switch to new parser generator technology
-----------------------------------------

Key: PIG-1618
URL: https://issues.apache.org/jira/browse/PIG-1618
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Fix For: 0.9.0

There are many bugs in Pig related to the parser, particularly to bad error messages. After review of JavaCC we feel these will be difficult to address using that tool. Also, the .jjt files used by JavaCC are hard to understand and maintain.

ANTLR is being reviewed as the most likely choice to move to, but other parsers will be reviewed as well.

This JIRA will act as an umbrella issue for other parser issues.
[jira] Created: (PIG-1619) Bad error message when a double constant is incorrectly specified
Bad error message when a double constant is incorrectly specified
-----------------------------------------------------------------

Key: PIG-1619
URL: https://issues.apache.org/jira/browse/PIG-1619
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
Fix For: 0.9.0

Given the following Pig Latin script (notice that the exponent of the floating-point constant is itself a floating-point number when it should be an integer):

{code}
A = load '/Users/gates/test/data/studenttab10';
B = foreach A generate $0, 3.0e10.1;
dump B;
{code}

Pig returns

{code}
ERROR 2999: Unexpected internal error. For input string: 3.0e10.1
{code}

This should be a syntax error caught by the parser.
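
The wording of the error resembles the message Java's Double.parseDouble attaches to its NumberFormatException, which suggests (an assumption, not something confirmed in this report) that the malformed constant reaches number conversion instead of being rejected by the grammar. The sketch below shows that conversion failure next to a hypothetical stricter literal check; DOUBLE_LITERAL, isDoubleLiteral and parseFailure are illustrative names, not Pig code, and the pattern is a simplification of what a real grammar would accept.

```java
import java.util.regex.Pattern;

class DoubleLiteralCheck {
    // Hypothetical pattern for a floating-point literal: digits, an optional
    // fraction, and an optional exponent that must be an integer.
    static final Pattern DOUBLE_LITERAL =
        Pattern.compile("[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?");

    // True only when the whole text matches the literal pattern.
    static boolean isDoubleLiteral(String text) {
        return DOUBLE_LITERAL.matcher(text).matches();
    }

    // Returns the message of the NumberFormatException thrown by
    // Double.parseDouble, or null if the text converts cleanly.
    static String parseFailure(String text) {
        try {
            Double.parseDouble(text);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(isDoubleLiteral("3.0e10"));
        System.out.println(isDoubleLiteral("3.0e10.1"));
        System.out.println(parseFailure("3.0e10.1"));
    }
}
```

Rejecting the token at the lexical level, as the pattern does here, is what would let a parser report a syntax error with a position instead of surfacing the conversion exception.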
[jira] Created: (PIG-1620) ARRANGE keyword should be deprecated
ARRANGE keyword should be deprecated
------------------------------------

Key: PIG-1620
URL: https://issues.apache.org/jira/browse/PIG-1620
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
Fix For: 0.9.0

ARRANGE is a synonym for ORDER in Pig Latin. As far as I know, no one uses it. Its use is not documented. And I am a strong fan of having one way to do things in programming languages.
[jira] Created: (PIG-1621) What does EVAL keyword do?
What does EVAL keyword do?
--------------------------

Key: PIG-1621
URL: https://issues.apache.org/jira/browse/PIG-1621
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
Fix For: 0.9.0

EVAL is listed as a keyword in Pig Latin, in both the documentation and the QueryParser.jjt file. However, it has no productions in the grammar and no further mention in the documentation. We need to either clarify what it does, or why we are reserving it as a keyword, or remove it.
[jira] Created: (PIG-1622) DEFINE streaming options are ill defined and not properly documented
DEFINE streaming options are ill defined and not properly documented
--------------------------------------------------------------------

Key: PIG-1622
URL: https://issues.apache.org/jira/browse/PIG-1622
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
Fix For: 0.9.0

According to the documentation (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the syntax for DEFINE when used to define a streaming command is:

DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path, ...]) CACHE (path [, path, ...])

However, the actual parser accepts something pretty different. Consider the following script:

{code}
define strm `wc -l`
    INPUT(stdin)
    CACHE('/Users/gates/.vimrc#myvim')
    OUTPUT(stdin)
    INPUT('/tmp/fred')
    OUTPUT('/tmp/bob')
    SHIP('/Users/gates/.bashrc')
    SHIP('/Users/gates/.vimrc')
    CACHE('/Users/gates/.bashrc#mybash')
    stderr('/tmp/errors' limit 10);
A = load '/Users/gates/test/data/studenttab10';
B = stream A through strm;
dump B;
{code}

The above actually parses. I see several issues here:

# What do multiple INPUT and OUTPUT statements mean in the context of streaming? These should not be allowed.
# The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not enforced by the parser. We should either enforce the order in the parser or update the documentation. Most likely the latter, to avoid breaking existing scripts.
# Why are multiple SHIP and CACHE clauses allowed when each can take multiple paths? It seems we should only allow one of each.
# The error clause is completely different than what is given in the documentation. I suspect this is a documentation error and the grammar supported by the parser here is what we want.
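
Issues 1 and 3 both come down to "each clause at most once, in any order". As a rough illustration only, the check below takes the clause keywords in the order they appear in a DEFINE and reports the ones that repeat. StreamingClauseCheck and repeatedClauses are hypothetical names; this is not how Pig's parser is structured, and a real fix would more likely live in the grammar or in a validation pass after parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StreamingClauseCheck {
    // Returns the clause keywords that appear more than once, in the order
    // their first repeat is encountered. Comparison is case-insensitive.
    static List<String> repeatedClauses(List<String> clauseKeywords) {
        Set<String> seen = new HashSet<>();
        List<String> repeated = new ArrayList<>();
        for (String keyword : clauseKeywords) {
            String normalized = keyword.toUpperCase(Locale.ROOT);
            // Set.add returns false when the keyword was already present.
            if (!seen.add(normalized) && !repeated.contains(normalized)) {
                repeated.add(normalized);
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        // Clause order taken from the script in the report.
        List<String> clauses = Arrays.asList(
            "INPUT", "CACHE", "OUTPUT", "INPUT", "OUTPUT",
            "SHIP", "SHIP", "CACHE", "stderr");
        System.out.println(repeatedClauses(clauses));
    }
}
```

Because the check does not look at clause order, it would fit the option of updating the documentation rather than enforcing INPUT, OUTPUT, SHIP, CACHE ordering in the parser.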
[jira] Updated: (PIG-1479) Embed Pig in scripting languages
[ https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1479:
------------------------------

Attachment: pig-greek-test.tar

Attached the test script, modified based on Julien's comment.

As for the command line option -g, it can also take one parameter (the script file name) and let Pig determine the script engine from the file extension.

Embed Pig in scripting languages
--------------------------------

Key: PIG-1479
URL: https://issues.apache.org/jira/browse/PIG-1479
Project: Pig
Issue Type: New Feature
Reporter: Julien Le Dem
Attachments: PIG-1479.patch, PIG-1479_2.patch, pig-greek-test.tar, pig-greek-test.tar, pig-greek.tgz

It should be possible to embed Pig calls in a scripting language and make functions defined in the same script available as UDFs.
This is a spin-off of https://issues.apache.org/jira/browse/PIG-928, which lets users define UDFs in scripting languages.
[jira] Created: (PIG-1623) Register syntax is ambiguous
Register syntax is ambiguous
----------------------------

Key: PIG-1623
URL: https://issues.apache.org/jira/browse/PIG-1623
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Xuefu Zhang
Priority: Minor
Fix For: 0.9.0

All of the following register statements parse:

{code}
register /Users/gates/tmp/pig-0.7/pig-0.7.0/./contrib/piggybank/java/piggybank.jar
register '/Users/gates/tmp/pig-0.7/pig-0.7.0/./contrib/piggybank/java/piggybank.jar'
register '/Users/gates/tmp/pig-0.7/pig-0.7.0/./contrib/piggybank/java/piggybank.jar';
{code}

As far as I know, register is the only Pig Latin command that does not require a semicolon at the end. It is also the only command that allows unquoted strings for file paths. We should align this with other similar syntax in Pig Latin. In order to avoid breaking existing scripts, we may need to warn about this behavior for a while before no longer supporting it.
[jira] Created: (PIG-1624) FOREACH AS documentation is incorrect
FOREACH AS documentation is incorrect
-------------------------------------

Key: PIG-1624
URL: https://issues.apache.org/jira/browse/PIG-1624
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
Fix For: 0.9.0

According to the Pig Latin manual (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#FOREACH) the correct usage of AS in a FOREACH clause is:

{code}
B = foreach A generate $0, $1, $2 as (user, age, gpa);
{code}

However, this is incorrect and produces a syntax error. The correct syntax for AS in FOREACH is:

{code}
B = foreach A generate $0 as user, $1 as age, $2 as gpa;
{code}
[jira] Commented: (PIG-1624) FOREACH AS documentation is incorrect
[ https://issues.apache.org/jira/browse/PIG-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910705#action_12910705 ]

Alan Gates commented on PIG-1624:
---------------------------------

I should note as well that when a flatten is involved, the proper syntax is:

{code}
A = load '/Users/gates/test/data/studenttab10' as (name:chararray, b{}, gpa:double);
B = foreach A generate name, flatten(b) as (fred, bob, joe), gpa;
dump B;
{code}

Note the use of parentheses to enclose the list of fields coming from the flattened bag.

FOREACH AS documentation is incorrect
-------------------------------------

Key: PIG-1624
URL: https://issues.apache.org/jira/browse/PIG-1624
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
Fix For: 0.9.0
[jira] Updated: (PIG-1624) FOREACH AS documentation is incorrect
[ https://issues.apache.org/jira/browse/PIG-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1624:
-------------------------------

Fix Version/s: 0.8.0 (was: 0.9.0)

We are still updating the docs, so we should be able to get this in for 0.8.

FOREACH AS documentation is incorrect
-------------------------------------

Key: PIG-1624
URL: https://issues.apache.org/jira/browse/PIG-1624
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
Fix For: 0.8.0
[jira] Created: (PIG-1625) Docs incorrectly say SAMPLE can be used in a nested FOREACH and do not mention projections in nested foreach
Docs incorrectly say SAMPLE can be used in a nested FOREACH and do not mention projections in nested foreach
------------------------------------------------------------------------------------------------------------

Key: PIG-1625
URL: https://issues.apache.org/jira/browse/PIG-1625
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
Fix For: 0.8.0

Currently the docs at http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#FOREACH say that SAMPLE can be used as an operator in a nested foreach. It cannot.

Also, they do not mention the ability to do projections inside a nested foreach, such as the following:

{code}
A = load '/Users/gates/test/data/studenttab10';
B = group A all;
C = foreach B {
    C1 = A.$0;
    C2 = distinct C1;
    generate C2;
}
dump C;
{code}
[jira] Updated: (PIG-1598) Pig gobbles up error messages - Part 2
[ https://issues.apache.org/jira/browse/PIG-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

niraj rai updated PIG-1598:
--------------------------

Attachment: PIG-1598_0.patch

Pig gobbles up error messages - Part 2
--------------------------------------

Key: PIG-1598
URL: https://issues.apache.org/jira/browse/PIG-1598
Project: Pig
Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: niraj rai
Fix For: 0.8.0
Attachments: PIG-1598_0.patch

Another case of PIG-1531.
[jira] Updated: (PIG-1598) Pig gobbles up error messages - Part 2
[ https://issues.apache.org/jira/browse/PIG-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

niraj rai updated PIG-1598:
--------------------------

Status: Patch Available (was: Open)

Pig gobbles up error messages - Part 2
--------------------------------------

Key: PIG-1598
URL: https://issues.apache.org/jira/browse/PIG-1598
Project: Pig
Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: niraj rai
Fix For: 0.8.0
Attachments: PIG-1598_0.patch

Another case of PIG-1531.
[jira] Created: (PIG-1626) Need to clarify how COUNT handles nulls
Need to clarify how COUNT handles nulls
---------------------------------------

Key: PIG-1626
URL: https://issues.apache.org/jira/browse/PIG-1626
Project: Pig
Issue Type: Bug
Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
Fix For: 0.8.0

The current documentation just states:

    The COUNT function ignores NULL values. If you want to include NULL values in the count computation, use COUNT_STAR.

The new text should be something like:

    The COUNT function follows syntax semantics and ignores nulls. What this means is that a tuple in the bag will not be counted if the first field in this tuple is NULL. If you want to include NULL values in the count computation, use COUNT_STAR.
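
To make the proposed wording concrete, here is a small model of the two behaviors, with a bag represented as a list of Object arrays. This is a sketch of the semantics as described above, not Pig's implementation; CountSemantics, count and countStar are illustrative names, and treating an empty tuple as uncounted is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

class CountSemantics {
    // Models the proposed COUNT wording: a tuple is skipped when its first
    // field is null. Skipping an empty tuple is an assumption of this sketch.
    static long count(List<Object[]> bag) {
        long n = 0;
        for (Object[] tuple : bag) {
            if (tuple.length > 0 && tuple[0] != null) {
                n++;
            }
        }
        return n;
    }

    // Models COUNT_STAR: every tuple is counted, nulls included.
    static long countStar(List<Object[]> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<Object[]> bag = Arrays.asList(
            new Object[] {"alice", 3.5},
            new Object[] {null, 2.0},     // first field null: skipped by count
            new Object[] {"bob", null});  // null in a later field: still counted
        System.out.println(count(bag) + " " + countStar(bag));
    }
}
```

The third tuple is the case the current one-line documentation leaves unclear: a null outside the first field does not affect the count.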
[jira] Created: (PIG-1627) Flattening of bags with unknown schemas produces wrong schema
Flattening of bags with unknown schemas produces wrong schema
-------------------------------------------------------------

Key: PIG-1627
URL: https://issues.apache.org/jira/browse/PIG-1627
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Alan Gates
Fix For: 0.9.0

The following should produce an unknown schema:

{code}
A = load '/Users/gates/test/data/studenttab10';
B = group A by $0;
C = foreach B generate flatten(A);
describe C;
{code}

Instead it gives

{code}
C: {bytearray}
{code}
[jira] Commented: (PIG-1627) Flattening of bags with unknown schemas produces wrong schema
[ https://issues.apache.org/jira/browse/PIG-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910757#action_12910757 ]

Alan Gates commented on PIG-1627:
---------------------------------

The problem is in the flatten, not the group. The group has the proper schema (bytearray, bag{}). Loading a bag of unknown schema and flattening it produces the same result. Flattening a tuple of unknown content has the same problem as well.

Flattening of bags with unknown schemas produces wrong schema
-------------------------------------------------------------

Key: PIG-1627
URL: https://issues.apache.org/jira/browse/PIG-1627
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Alan Gates
Fix For: 0.9.0
[jira] Created: (PIG-1628) log this message at debug level : 'Pig Internal storage in use'
log this message at debug level : 'Pig Internal storage in use'
---------------------------------------------------------------

Key: PIG-1628
URL: https://issues.apache.org/jira/browse/PIG-1628
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0

The temporary storage functions currently log at the INFO level. This should change to debug level, because these messages reduce the visibility of more useful INFO messages.
The messages include 'Pig Internal storage in use' from InterStorage and 'TFile storage in use' from TFileStorage.
[jira] Updated: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6
[ https://issues.apache.org/jira/browse/PIG-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1508:
---------------------------

Status: Resolved (was: Patch Available)
Resolution: Fixed

Patch checked in. Thanks Carl.

Make 'docs' target (forrest) work with Java 1.6
-----------------------------------------------

Key: PIG-1508
URL: https://issues.apache.org/jira/browse/PIG-1508
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Fix For: 0.8.0
Attachments: PIG-1508.patch.txt

FOR-984 covers the very inconvenient fact that Forrest 0.8 does not work with Java 1.6.
The same ticket also suggests a workaround: disabling sitemap and stylesheet validation by setting the forrest.validate.sitemap and forrest.validate.stylesheets properties to false.
[jira] Created: (PIG-1629) Need ability to limit bags produced during GROUP + LIMIT
Need ability to limit bags produced during GROUP + LIMIT
--------------------------------------------------------

Key: PIG-1629
URL: https://issues.apache.org/jira/browse/PIG-1629
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Thejas M Nair
Fix For: 0.9.0

Currently, the code below will construct the full group in memory and then trim it. This uses more memory than needed.

    A = load 'data' as (x, y, z);
    B = group A by x;
    C = foreach B {
        D = limit A 100;
        generate group, MyUDF(D);
    }
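
The requested behavior amounts to applying the limit while the bag is being built instead of after it is complete. The sketch below illustrates that idea in isolation; LimitedBag and firstN are hypothetical names, and nothing here reflects how Pig actually constructs bags during GROUP.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LimitedBag {
    // Collects at most `limit` values and stops pulling from the iterator
    // once the limit is reached, so the rest is never held in memory.
    static <T> List<T> firstN(Iterator<T> values, int limit) {
        List<T> bag = new ArrayList<>();
        while (bag.size() < limit && values.hasNext()) {
            bag.add(values.next());
        }
        return bag;
    }

    public static void main(String[] args) {
        // An endless source stands in for a very large group.
        Iterator<Integer> endless = new Iterator<Integer>() {
            private int next = 0;
            public boolean hasNext() { return true; }
            public Integer next() { return next++; }
        };
        System.out.println(firstN(endless, 100).size());
    }
}
```

Memory use in this version is bounded by the limit (100 in the script above) rather than by the size of the group.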