[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875242#action_12875242 ]

Aniket Mokashi commented on PIG-972:

Approach:
1. To implement the functionality described above, the parser keeps track of all the nested aliases it comes across and adds them to LOForEach.
2. After we parse the query (foreach), we dump the schema for all nested aliases stored in the list.
3. When "describe foreach-alias;" is issued, we dump the schema for foreach-alias along with the schema for all nested aliases stored in the list.

Data structures:
Add mDescribedAliasList to LOForEach to list all the described nested aliases. LOForEach has mForEachPlans, which creates a plan for every projection. This keeps track of the schemas for all nested aliases inside the leaves of its plans.

Issues:
- Verification of aliases in nested describe: since we do not create a map for nested aliases, it is not possible to validate up front the name of the alias used in a nested describe.
- Multiple dumps: the approach above might dump the same schema more than once.
These issues can be solved by adding more state to LOForEach.

Make describe work with nested foreach
Key: PIG-972
URL: https://issues.apache.org/jira/browse/PIG-972
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
Fix For: 0.8.0

Currently Parser can't deal with that. This is because describe is part of Grunt parser while the rest of nested foreach is handled by the QueryParser.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
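The bookkeeping proposed in the comment above can be sketched in a few lines. This is only an illustration: `ForEachSketch`, `nested_schemas`, `mark_described` and the `alias::nested` output format are hypothetical names chosen for the example, not Pig's actual LOForEach API. It also shows how the "more state" mentioned under Issues (a map of nested aliases) would address both the validation and the multiple-dump concerns.

```python
from dataclasses import dataclass, field


@dataclass
class ForEachSketch:
    """Hypothetical stand-in for LOForEach; not Pig's actual class."""
    alias: str
    schema: str
    # Extra state: schemas of nested aliases keyed by name, so names can be validated.
    nested_schemas: dict = field(default_factory=dict)
    # Mirrors the proposed mDescribedAliasList: nested aliases the user asked to describe.
    described_aliases: list = field(default_factory=list)

    def add_nested_alias(self, name, schema):
        # Called by the (imagined) parser for every nested alias it encounters.
        self.nested_schemas[name] = schema

    def mark_described(self, name):
        # Validate up front, and skip repeats so a schema is never dumped twice.
        if name not in self.nested_schemas:
            raise KeyError(f"unknown nested alias: {name}")
        if name not in self.described_aliases:
            self.described_aliases.append(name)

    def describe(self):
        # Schema of the foreach alias first, then each described nested alias.
        lines = [f"{self.alias}: {self.schema}"]
        for name in self.described_aliases:
            lines.append(f"{self.alias}::{name}: {self.nested_schemas[name]}")
        return lines


c = ForEachSketch("c", "{group: int,a: {a0: int,a1: int}}")
c.add_nested_alias("d", "{a0: int,a1: int}")
c.mark_described("d")
c.mark_described("d")  # a repeated describe is ignored
for line in c.describe():
    print(line)
```

Keeping the map and the described list separate means an alias can be known to the operator without being printed, which is the behaviour Proposal 1 style explicit describes would need.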
[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875292#action_12875292 ]

Pradeep Kamath commented on PIG-1433:

Hudson seems to be unresponsive, so I ran the unit tests locally and they completed successfully. The test-patch ant target also came back successfully, except for an HTML page change in the release audit warnings, which can be ignored. The patch is ready for review.

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
Key: PIG-1433
URL: https://issues.apache.org/jira/browse/PIG-1433
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.8.0
Attachments: PIG-1433.patch

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875319#action_12875319 ]

Ashutosh Chauhan commented on PIG-1433:

+1 for the commit. A couple of notes for the future:
* Since this is tied to a Hadoop property, we should consider removing it from the Pig codebase once MAPREDUCE-1447 and MAPREDUCE-947 are fixed.
* We have a lot of constant strings in our codebase. For the sake of clean code, we should put all of those public static final Strings in one top-level interface called Constants. This should be part of a separate clean-up jira.
[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875321#action_12875321 ]

Gianmarco De Francisci Morales commented on PIG-1433:

Just for the sake of clean code: the constant interface is an anti-pattern (http://en.wikipedia.org/wiki/Constant_interface). Public final, instance-controlled (non-instantiable) classes are better for this purpose.
[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875326#action_12875326 ]

Ashutosh Chauhan commented on PIG-1433:

My point was to have all constant strings in one place instead of each class having some of them. It could be either an interface or a class. If an interface is considered an anti-pattern, doing it in a class is fine too.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875341#action_12875341 ]

Julien Le Dem commented on PIG-928:

I like REGISTER better as well.

With Java UDFs, you REGISTER a jar. Then you can use the classes in the jar by their fully qualified class names. Optionally you can use DEFINE to alias the functions or pass extra initialization parameters.

With scripting as implemented by Arnab, you REGISTER a script file (adding the script language information, as it is not only Java anymore) and you can use all the functions in it (just like you do in Java). Then I would say you should be able to alias them using DEFINE and define a closure by passing extra parameters:
DEFINE log2 logn(2, $0);
(maybe I am asking too much here ;) )

UDFs in scripting languages
Key: PIG-928
URL: https://issues.apache.org/jira/browse/PIG-928
Project: Pig
Issue Type: New Feature
Reporter: Alan Gates
Fix For: 0.8.0
Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip

It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.
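The closure idea in the comment above (`DEFINE log2 logn(2, $0);`) amounts to partial application: fix some arguments at definition time and leave the per-record argument open. A minimal Python sketch of that idea follows. The `logn` function here is a hypothetical script UDF written for the example, and nothing below is Pig syntax or an existing Pig API.

```python
import math
from functools import partial


def logn(base, x):
    """Logarithm of x in the given base; stands in for a UDF from a registered script."""
    return math.log(x, base)


# Rough analogue of the proposed `DEFINE log2 logn(2, $0);`:
# bind the base now, leave the per-record argument ($0) open.
log2 = partial(logn, 2)

values = [1, 2, 8]
print([log2(v) for v in values])
```

The aliasing half of DEFINE would correspond to a plain assignment such as `my_log = logn`; only the parameter-binding half needs something like `partial`.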
[jira] Created: (PIG-1434) Allow casting relations to scalars
Allow casting relations to scalars
Key: PIG-1434
URL: https://issues.apache.org/jira/browse/PIG-1434
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Fix For: 0.8.0

This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach.

Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C;

A couple of additional comments:
(1) You can only cast relations that contain a single value; otherwise an error will be reported.
(2) Name resolution is needed, since relation X might have a field named C, in which case that field takes precedence.
(3) Y will look for the C closest to it.

Implementation thoughts:
The idea is to store C into a file and then convert it into a scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) store C and (2) convert the cast to the UDF.
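To make rules (1) and (2) above concrete, here is a small Python model of the intended semantics. It is a sketch only: relations are modeled as lists of tuples, and `relation_to_scalar` and `resolve_name` are hypothetical helpers written for this illustration, not the UDF mentioned in the issue or any part of Pig's logical plan.

```python
def relation_to_scalar(relation):
    """Return the single value of a one-row, one-column relation.

    Rule (1): anything other than exactly one value is an error.
    """
    if len(relation) != 1 or len(relation[0]) != 1:
        raise ValueError("only a relation holding a single value can be cast to a scalar")
    return relation[0][0]


def resolve_name(name, row_fields, relations):
    """Rule (2): a field of the current row wins over a relation of the same name."""
    if name in row_fields:
        return row_fields[name]
    return relation_to_scalar(relations[name])


# C plays the role of `foreach B generate COUNT(A)`: one row, one column.
C = [(4,)]
X = [{"x": "a", "y": 8}, {"x": "b", "y": 2}]

# Analogue of `Y = foreach X generate $1/(long) C;`
Y = [row["y"] / resolve_name("C", row, {"C": C}) for row in X]
print(Y)
```

If a row of X carried its own field named C, `resolve_name` would return that field instead of the relation, which is the precedence the issue asks for.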
[jira] Created: (PIG-1435) make sure dependent jobs fail when a job in multiquery fails
make sure dependent jobs fail when a job in multiquery fails
Key: PIG-1435
URL: https://issues.apache.org/jira/browse/PIG-1435
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Fix For: 0.8.0

Currently, if one of the MQ jobs fails, Pig tries to run all remaining jobs. As a result, if data was partially generated by the failed job, you might get incorrect results from dependent jobs.
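The behaviour requested above boils down to a reachability walk over the job dependency graph: once a job fails, everything downstream of it should be failed or skipped rather than run. The sketch below assumes a plain dictionary from a job to the jobs that read its output. The function name and the job names are made up for the example, and this is not how Pig's multi-query executor is actually structured.

```python
from collections import deque


def jobs_to_skip(dependents, failed):
    """Return every job that depends, directly or transitively, on a failed job.

    `dependents` maps a job name to the jobs that consume its output.
    """
    skipped = set()
    queue = deque(failed)
    while queue:
        job = queue.popleft()
        for child in dependents.get(job, ()):
            if child not in skipped:
                skipped.add(child)
                queue.append(child)
    return skipped


dependents = {
    "job1": ["job2", "job3"],
    "job2": ["job4"],
    "job5": ["job6"],  # independent branch, unaffected by job1
}
print(sorted(jobs_to_skip(dependents, ["job1"])))
```

Jobs on independent branches (job5 and job6 here) are left alone, so a failure only stops the work that could have consumed partial output.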
[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875390#action_12875390 ]

Aniket Mokashi commented on PIG-972:

The approach mentioned above seems to work. Here are some proposals on the semantics of nested describe:

a = load '1.txt' as (a0:int, a1:int);
b = group a by $0;

Proposal 1: Explicit describe.
c = foreach b { d = order a by $0; describe d; e = ...; generate d.$0 ...;}
(1a: Instantaneous response: describes d after parsing the above statement)
describe c;
Prints the schema for c and d (but not e).
Adv: Can select which nested alias to describe.
Disadv: Extra typing.

Proposal 2: Implicit describe (no describe statements inside the nested block).
c = foreach b { d = order a by $0; e = ...; generate d.$0 ...;}
describe c;
Describes c, d and e.
Adv: Less typing.
Disadv: Extra prints.
(2a: describe c prints for c, d and e. Also describe c-d to describe nested d.)
(2b: describe c prints for c only. describe c-d to describe nested d.)

Alan/Olga, let me know your comments on this.
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-972:

Attachment: NestedDescribeProp1.patch

Attaching patch for prop1.
[jira] Commented: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875410#action_12875410 ]

Daniel Dai commented on PIG-972:

If we have a statement:
{code}
a = load '1.txt' as (a0:int, a1:int);
b = group a by a0;
c = foreach b { d = order a by $0; generate d, *; }
{code}

Here is proposal 2.b from Aniket:
{code}
grunt> describe c;
c::d: {a0: int,a1: int}
c: {d: {a0: int,a1: int},group: int,a: {a0: int,a1: int}}
grunt> describe c::d;
c::d: {a0: int,a1: int}
{code}

I vote for this approach. Opinion?
[jira] Created: (PIG-1436) Print number of records outputted at each step of a Pig script
Print number of records outputted at each step of a Pig script
Key: PIG-1436
URL: https://issues.apache.org/jira/browse/PIG-1436
Project: Pig
Issue Type: New Feature
Components: grunt
Affects Versions: 0.7.0
Reporter: Russell Jurney
Priority: Minor
Fix For: 0.8.0

I often run a script multiple times, or have to go and look through Hadoop task logs, to figure out where I broke a long script in such a way that I get 0 records out of it. I think this is a common problem. If someone can point me in the right direction, I can make a pass at this.
[jira] Created: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
[Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
Key: PIG-1437
URL: https://issues.apache.org/jira/browse/PIG-1437
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Priority: Minor
[jira] Updated: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
[ https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1437:

Release Note: (was: It's possible to rewrite queries like this
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate group.name, group.age;
dump C;
{code}
or
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate flatten(group);
dump C;
{code}
to
{code}
A = load 'data' as (name,age);
B = distinct A;
dump B;
{code}
This can only be done if no columns within the bags are referenced subsequently in the script. Since in the Pig-Hadoop world DISTINCT is executed more efficiently than group-by, this will be a huge win.
)

Description: It's possible to rewrite queries like this
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate group.name, group.age;
dump C;
{code}
or
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate flatten(group);
dump C;
{code}
to
{code}
A = load 'data' as (name,age);
B = distinct A;
dump B;
{code}
This can only be done if no columns within the bags are referenced subsequently in the script. Since in the Pig-Hadoop world DISTINCT is executed more efficiently than group-by, this will be a huge win.
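The equivalence behind the PIG-1437 rewrite can be checked on a toy dataset. The sketch below models a relation as a list of tuples and compares "group by all columns, then emit only the group key" with "distinct". The function names are made up for the illustration, and the results are sorted purely to make the comparison order-independent. It says nothing about how Pig executes either plan.

```python
from itertools import groupby


def group_then_flatten(rows):
    """Model `group A by (name, age)` followed by `generate flatten(group)`."""
    # groupby needs sorted input; each key is emitted once and the bag is ignored.
    return [key for key, _bag in groupby(sorted(rows))]


def distinct(rows):
    """Model `distinct A` (sorted here only so the two results compare equal)."""
    return sorted(set(rows))


A = [("alice", 30), ("bob", 25), ("alice", 30), ("bob", 41)]
print(group_then_flatten(A) == distinct(A))
```

The bag built for each group is never read in `group_then_flatten`, which mirrors the stated precondition: the rewrite is only valid when no columns inside the bags are referenced later in the script.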