[jira] Commented: (PIG-1166) A bit change of the interface of Tuple & DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803366#action_12803366 ] Alan Gates commented on PIG-1166: - In testing this we found that it forces a recompile of all UDFs. We initially thought that it would not, but it does. Since that's a backward incompatibility I'm going to revert the change for now, while we discuss when we want to check this in. I'm not saying we won't check it in, but we want to decide on when to inflict this pain on the users. A bit change of the interface of Tuple & DataBag ( make the set and append method return this) -- Key: PIG-1166 URL: https://issues.apache.org/jira/browse/PIG-1166 Project: Pig Issue Type: Improvement Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Minor Attachments: Pig_1166.patch When people write unit tests for a UDF, they always need to build a tuple or bag. If we change the interface of Tuple and DataBag so that the set and append methods return this, it can decrease the code size. e.g. Now people have to write the following code to build a Tuple: {code} Tuple tuple = TupleFactory.getInstance().newTuple(3); tuple.set(0, item_0); tuple.set(1, item_1); tuple.set(2, item_2); {code} If we change the interface to make the set and append methods return this, we can rewrite the above code like this: {code} Tuple tuple = TupleFactory.getInstance().newTuple(3); tuple.set(0, item_0).set(1, item_1).set(2, item_2); {code} This interface change won't cause a backward compatibility problem, and I think there's no performance problem either. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
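For illustration, the chaining that the patch enables can be sketched with a hypothetical stand-in class (this is not Pig's actual org.apache.pig.data.Tuple; the class and field names are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Pig Tuple, showing the fluent style
// proposed in PIG-1166: set() returns 'this' so calls can be chained.
class FluentTuple {
    private final List<Object> fields;

    FluentTuple(int size) {
        fields = new ArrayList<>(size);
        for (int i = 0; i < size; i++) fields.add(null);
    }

    // Returning 'this' instead of void is the whole proposed change.
    FluentTuple set(int index, Object value) {
        fields.set(index, value);
        return this;
    }

    Object get(int index) {
        return fields.get(index);
    }

    public static void main(String[] args) {
        FluentTuple tuple = new FluentTuple(3)
                .set(0, "item_0").set(1, "item_1").set(2, "item_2");
        System.out.println(tuple.get(2)); // prints "item_2"
    }
}
```

Note that this kind of change is source-compatible for callers who ignore the return value, but as the comment above points out, changing a method's return type changes its signature, so already-compiled UDFs must be recompiled.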
[jira] Commented: (PIG-1166) A bit change of the interface of Tuple & DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803378#action_12803378 ] Dmitriy V. Ryaboy commented on PIG-1166: Aren't we looking at inflicting major pain with 0.7 anyway?
[jira] Commented: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803429#action_12803429 ] Pradeep Kamath commented on PIG-1184: - +1 PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later -- Key: PIG-1184 URL: https://issues.apache.org/jira/browse/PIG-1184 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1184-1.patch The following script: {noformat} -e a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, \$4; dump b; {noformat} gives the following result: (oiue,M,10) {noformat} cat input.txt: oiue M {(3),(4)} {(toronto),(montreal)} {noformat} If the PruneColumns optimization is disabled, we get the right result: (oiue,M,10) (oiue,M,10) (oiue,M,10) (oiue,M,10) The flatten results in 4 records - so the output should contain 4 records.
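For reference, the 4-record expectation comes from flatten semantics: flattening two bags in one foreach produces the cross product of their contents (2 x 2 = 4 here). A standalone sketch of that expansion in plain Java (illustrative only, not Pig internals):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: flattening two bags in a single FOREACH yields the
// cross product of their tuples, so a 2-element bag and a 2-element bag
// expand one input record into 4 output records.
class FlattenCrossProduct {
    static List<String> flatten(String f1, String f2,
                                List<String> bag1, List<String> bag2) {
        List<String> out = new ArrayList<>();
        for (String id : bag1) {
            for (String loc : bag2) {
                out.add(f1 + "," + f2 + "," + id + "," + loc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = flatten("oiue", "M",
                Arrays.asList("3", "4"),
                Arrays.asList("toronto", "montreal"));
        System.out.println(rows.size()); // prints 4
    }
}
```

The bug in PIG-1184 is that column pruning dropped the flattened bags because their fields were not referenced downstream, collapsing the 4-row cross product back to a single row.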
[jira] Commented: (PIG-1166) A bit change of the interface of Tuple & DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803435#action_12803435 ] Alan Gates commented on PIG-1166: - Alright, I've reverted the patch. bq. Aren't we looking at inflicting major pain with 0.7 anyway? Yes, but so far only on load and store function writers. I don't believe we've done anything to force eval func writers (who vastly outnumber load and store func writers) to recompile their code. This is tied to PIG-1017, which I'd love to check in since it will improve memory management but will also inflict pain (and more than just a recompile) on eval func writers. We need to decide whether to inflict pain on both groups in 0.7, or just on load and store writers in 0.7 and on eval func writers at some later point. I also think we need criteria for deciding when we do and don't break backwards compatibility. I'll start a thread on this on the pig-dev list.
[jira] Commented: (PIG-1199) help includes obsolete options
[ https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803449#action_12803449 ] Alan Gates commented on PIG-1199: - I think both -cluster/-c and -jar/-j don't work anymore. help includes obsolete options -- Key: PIG-1199 URL: https://issues.apache.org/jira/browse/PIG-1199 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Olga Natkovich Fix For: 0.7.0 This is confusing to users
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1189: Status: Patch Available (was: Open) StoreFunc UDF should ship to the backend automatically without register - Key: PIG-1189 URL: https://issues.apache.org/jira/browse/PIG-1189 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: multimapstore.pig, multireducestore.pig, PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig Pig should ship the store UDF to the backend even if the user does not use register. The prerequisite is that the UDF should be in the classpath on the frontend. We made that work for load UDFs in [PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we shall do the same thing for store UDFs.
[jira] Commented: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803464#action_12803464 ] Olga Natkovich commented on PIG-1189: - +1 on the code, assuming that test-patch does not generate any warnings
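As background on the classpath prerequisite mentioned in PIG-1189: one common way to locate the jar a UDF class was loaded from (so it can be shipped to the backend) is to ask the class for its code source. The sketch below is a generic illustration of that technique, not necessarily the exact mechanism Pig uses:

```java
import java.security.CodeSource;

// Generic sketch: locate the jar (or classes directory) that a class was
// loaded from, so it could be shipped to the backend. Bootstrap classes
// (e.g. java.lang.String) have no code source and yield null.
class JarLocator {
    static String jarOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return null;
        }
        return src.getLocation().getPath();
    }

    public static void main(String[] args) {
        // For a user-supplied UDF class, this would be the jar to ship.
        System.out.println(jarOf(JarLocator.class));
        System.out.println(jarOf(String.class)); // typically null (bootstrap)
    }
}
```

This only works when the class is already on the frontend classpath, which is exactly the prerequisite the issue description states.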
[jira] Updated: (PIG-1192) Pig 0.6 doc updates and changes
[ https://issues.apache.org/jira/browse/PIG-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1192: Resolution: Fixed Status: Resolved (was: Patch Available) patch committed to both the trunk and the 0.6.0 branch Pig 0.6 doc updates and changes --- Key: PIG-1192 URL: https://issues.apache.org/jira/browse/PIG-1192 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Priority: Blocker Fix For: 0.6.0 Attachments: pig-1192.patch Made the following changes. CONTENT 1. Added info about the :: operator (Pig Latin doc) 2. Updated info about DEFINE and Auto-Ship (Pig Latin doc) 3. Updated info about PARALLEL (Pig Latin and Cookbook docs) 4. Updated info about JOIN (inner, outer) (Pig Latin doc) 5. Updated info about SET (Pig Latin doc) 6. Updated info about the Modulo operator (Pig Latin doc) 7. Added link from Pig docs to Zebra/Pig doc (Pig User Guide) 8. Updated info about schemas and type casting in Zebra/Pig doc (Zebra Users and Zebra Pig docs) 9. Removed duplicate topics in Pig User Guide (parallel and performance) - these topics are discussed in the Pig Cookbook DOC FILES 1. Deletes Pig User Guide; Adds Pig Latin Ref Manual 1 2. Deletes Pig Latin Ref Manual; Adds Pig Latin Ref Manual 2 3. Updates doc links
[jira] Commented: (PIG-1192) Pig 0.6 doc updates and changes
[ https://issues.apache.org/jira/browse/PIG-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803541#action_12803541 ] Corinne Chandel commented on PIG-1192: -- Thanks Olga!
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803565#action_12803565 ] Alan Gates commented on PIG-1178: - Comments on lp.patch: 1) In LOJoin.getSchema, these lines of code: {code} for (Operator op : inputs) { LogicalSchema inputSchema = ((LogicalRelationalOperator)op).getSchema(); // the schema of one input is unknown, so the join schema is unknown, just return if (inputSchema == null) { schemaSet = true; return schema; } {code} You are assuming that schema is null. It would be better to explicitly set schema to null and then return it. 2) In SplitFilter.transform you put it in a while loop, finding each 'and' and splitting it into another filter. But there's already an outer while loop (the one in the optimizer applying the rule over and over) that will do that. One of the assertions in this design is that each rule should be as simple as possible. This rule should just split one 'and', and let the next application of the rule find the next 'and' and split it again. The same comment applies to MergeFilter.transform and to PushUpFilterTransformer.check and .transform. 3) In MergeFilter.check: IIRC implicit splits aren't inserted into the plan until the logical to physical transformation. So it's possible that a filter actually has multiple successors. So instead of: {code} if (succeds != null && succeds.size() > 0) { if (succeds.get(0) instanceof LOFilter) { return true; } } {code} it should read {code} if (succeds != null && succeds.size() == 1) { if (succeds.get(0) instanceof LOFilter) { return true; } } {code} 4) In MergeFilter.combineFilterCond: The expressions have been written in such a way that they manage their own connections when they are created. See, for example, AndExpression: in its constructor it adds itself to the expression plan and connects itself to its two operands. So there is no need to do the addPlan.add and addPlan.connect calls.
5) In PushUpFilterTransformer.check, you need to check that the join type is inner. Pushing past outer joins is much trickier, and need not be handled here. 6) In PushUpFilterTransformer.check I don't understand what findCommon is doing. In any case, it should not be paying attention to aliases. It should be using the inputNums from the projection. It should be checking that all projections in the filter are associated with the same inputNum. If so, it is pushable to that inputNum. If not, not. In the same way, transform should be using inputNum to find the right predecessor, not aliases. 7) We need a fourth rule to handle swapping filters, so each one can be tried against the join. Since this rule will always pass check (it would just be two filters in a row), we need a way to check that it doesn't run more than twice for a given pair of filters. We can accomplish this by having it 'sign' each filter in the node each time it is applied. This is what the annotate call on Operator is for. So each time the transform is applied, it would annotate both filters with info that it was applied, and to which filters. Then part of check can be to verify that this rule has been applied at most twice. LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven not to be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain.
The root cause for these issues is that a number of design decisions made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those decisions and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details.
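To make the "each rule should be as simple as possible" point from comment 2 concrete, here is a toy model using hypothetical classes (not Pig's actual rule or plan API): the rule's transform splits exactly one conjunct per application, and the optimizer's outer loop re-applies the rule until check fails.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Pig's real classes): a Filter holds a conjunction of
// predicate strings. The split rule peels off exactly one conjunct per
// application; the optimizer's outer loop re-applies the rule until
// check() fails, which is what keeps each rule simple.
class SplitOneAndDemo {
    static class Filter {
        final List<String> conjuncts;
        Filter(List<String> conjuncts) { this.conjuncts = conjuncts; }
    }

    // check: the rule applies only while the filter still contains an 'and'.
    static boolean check(Filter f) {
        return f.conjuncts.size() > 1;
    }

    // transform: split off ONE conjunct into a new downstream filter.
    static List<Filter> transform(Filter f) {
        List<Filter> chain = new ArrayList<>();
        chain.add(new Filter(f.conjuncts.subList(0, 1)));
        chain.add(new Filter(f.conjuncts.subList(1, f.conjuncts.size())));
        return chain;
    }

    // The optimizer's outer loop: keep applying the rule until check fails.
    static List<Filter> optimize(Filter f) {
        List<Filter> plan = new ArrayList<>();
        plan.add(f);
        while (check(plan.get(plan.size() - 1))) {
            Filter last = plan.remove(plan.size() - 1);
            plan.addAll(transform(last));
        }
        return plan;
    }

    public static void main(String[] args) {
        Filter f = new Filter(List.of("a > 1", "b > 2", "c > 3"));
        System.out.println(optimize(f).size()); // prints 3
    }
}
```

The point of the design is that a transform with no internal loop is easier to reason about and to test, because each application makes one small, independently correct change to the plan.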
Build failed in Hudson: Pig-trunk #668
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/668/changes Changes: [gates] PIG-1166 Reverting this change pending further discussion of when we want to break UDF interfaces. -- [...truncated 240640 lines...] [junit] 10/01/22 02:26:10 INFO datanode.DataNode: PacketResponder 2 for block blk_-659169873293012232_1015 terminating [junit] 10/01/22 02:26:10 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:47623 is added to blk_-659169873293012232_1015 size 48817 [junit] 10/01/22 02:26:10 INFO hdfs.StateChange: DIR* NameSystem.completeFile: file /tmp/hadoop-hudson/mapred/system/job_20100122022538265_0002/job.xml is closed by DFSClient_-43967928 [junit] 10/01/22 02:26:10 INFO FSNamesystem.audit: ugi=hudson,hudson ip=/127.0.0.1 cmd=open src=/tmp/hadoop-hudson/mapred/system/job_20100122022538265_0002/job.xml dst=null perm=null [junit] 10/01/22 02:26:10 INFO DataNode.clienttrace: src: /127.0.0.1:47623, dest: /127.0.0.1:41306, bytes: 49201, op: HDFS_READ, cliID: DFSClient_-43967928, srvID: DS-569646143-127.0.1.1-47623-1264127137287, blockid: blk_-659169873293012232_1015 [junit] 10/01/22 02:26:10 INFO FSNamesystem.audit: ugi=hudson,hudson ip=/127.0.0.1 cmd=open src=/tmp/hadoop-hudson/mapred/system/job_20100122022538265_0002/job.jar dst=null perm=null [junit] 10/01/22 02:26:10 INFO DataNode.clienttrace: src: /127.0.0.1:47623, dest: /127.0.0.1:41307, bytes: 2744208, op: HDFS_READ, cliID: DFSClient_-43967928, srvID: DS-569646143-127.0.1.1-47623-1264127137287, blockid: blk_-6012860814043767958_1013 [junit] 10/01/22 02:26:10 INFO mapred.JobTracker: Initializing job_20100122022538265_0002 [junit] 10/01/22 02:26:10 INFO mapred.JobInProgress: Initializing job_20100122022538265_0002 [junit] 10/01/22 02:26:10 INFO FSNamesystem.audit: ugi=hudson,hudson ip=/127.0.0.1 cmd=create src=/tmp/temp-788732582/tmp475852586/_logs/history/localhost_1264127138288_job_20100122022538265_0002_hudson_Job4287868254605455320.jar
dst=null perm=hudson:supergroup:rw-r--r-- [junit] 10/01/22 02:26:10 INFO FSNamesystem.audit: ugi=hudson,hudson ip=/127.0.0.1 cmd=create src=/tmp/temp-788732582/tmp475852586/_logs/history/localhost_1264127138288_job_20100122022538265_0002_conf.xml dst=null perm=hudson:supergroup:rw-r--r-- [junit] 10/01/22 02:26:10 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/temp-788732582/tmp475852586/_logs/history/localhost_1264127138288_job_20100122022538265_0002_conf.xml. blk_-2274710091252665892_1017 [junit] 10/01/22 02:26:10 INFO datanode.DataNode: Receiving block blk_-2274710091252665892_1017 src: /127.0.0.1:55255 dest: /127.0.0.1:53069 [junit] 10/01/22 02:26:10 INFO datanode.DataNode: Receiving block blk_-2274710091252665892_1017 src: /127.0.0.1:41309 dest: /127.0.0.1:47623 [junit] 10/01/22 02:26:10 INFO datanode.DataNode: Receiving block blk_-2274710091252665892_1017 src: /127.0.0.1:50113 dest: /127.0.0.1:5 [junit] 10/01/22 02:26:10 INFO DataNode.clienttrace: src: /127.0.0.1:50113, dest: /127.0.0.1:5, bytes: 48847, op: HDFS_WRITE, cliID: DFSClient_-43967928, srvID: DS-754302904-127.0.1.1-5-1264127137748, blockid: blk_-2274710091252665892_1017 [junit] 10/01/22 02:26:10 INFO datanode.DataNode: PacketResponder 0 for block blk_-2274710091252665892_1017 terminating [junit] 10/01/22 02:26:10 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:5 is added to blk_-2274710091252665892_1017 size 48847 [junit] 10/01/22 02:26:10 INFO DataNode.clienttrace: src: /127.0.0.1:41309, dest: /127.0.0.1:47623, bytes: 48847, op: HDFS_WRITE, cliID: DFSClient_-43967928, srvID: DS-569646143-127.0.1.1-47623-1264127137287, blockid: blk_-2274710091252665892_1017 [junit] 10/01/22 02:26:10 INFO datanode.DataNode: PacketResponder 1 for block blk_-2274710091252665892_1017 terminating [junit] 10/01/22 02:26:10 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:47623 is added to blk_-2274710091252665892_1017 size 48847 [junit]
10/01/22 02:26:10 INFO DataNode.clienttrace: src: /127.0.0.1:55255, dest: /127.0.0.1:53069, bytes: 48847, op: HDFS_WRITE, cliID: DFSClient_-43967928, srvID: DS-1215067469-127.0.1.1-53069-1264127136807, blockid: blk_-2274710091252665892_1017 [junit] 10/01/22 02:26:10 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53069 is added to blk_-2274710091252665892_1017 size 48847 [junit] 10/01/22 02:26:10 INFO datanode.DataNode: PacketResponder 2 for block blk_-2274710091252665892_1017 terminating [junit] 10/01/22 02:26:10 INFO hdfs.StateChange: DIR* NameSystem.completeFile: file /tmp/temp-788732582/tmp475852586/_logs/history/localhost_1264127138288_job_20100122022538265_0002_conf.xml is closed