[jira] Commented: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880425#action_12880425 ]

Hadoop QA commented on PIG-1453:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12447494/PIG-1453.patch
against trunk revision 955763.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 36 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning message.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/331/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/331/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/331/console

This message is automatically generated.

> [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
> -----------------------------------------------------------
>
> Key: PIG-1453
> URL: https://issues.apache.org/jira/browse/PIG-1453
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Daniel Dai
> Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1453.patch, PIG-1453.patch
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880421#action_12880421 ]

Hadoop QA commented on PIG-1405:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12447492/StandardUDFtoPig4.patch
against trunk revision 955763.

-1 @author. The patch appears to contain 2 @author tags which the Pig community has agreed to not allow in code contributions.
+1 tests included. The patch appears to include 5 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning message.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/343/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/343/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/343/console

This message is automatically generated.

> Need to move many standard functions from piggybank into Pig
> ------------------------------------------------------------
>
> Key: PIG-1405
> URL: https://issues.apache.org/jira/browse/PIG-1405
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Aniket Mokashi
> Fix For: 0.8.0
>
> Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch
>
> There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1453:
--------------------------
    Status: Patch Available  (was: Open)

> [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
> -----------------------------------------------------------
>
> Key: PIG-1453
> URL: https://issues.apache.org/jira/browse/PIG-1453
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Daniel Dai
> Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1453.patch, PIG-1453.patch
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1453:
--------------------------
    Status: Open  (was: Patch Available)

> [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
> -----------------------------------------------------------
>
> Key: PIG-1453
> URL: https://issues.apache.org/jira/browse/PIG-1453
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Daniel Dai
> Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1453.patch, PIG-1453.patch
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1453:
--------------------------
    Attachment: PIG-1453.patch

> [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
> -----------------------------------------------------------
>
> Key: PIG-1453
> URL: https://issues.apache.org/jira/browse/PIG-1453
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Reporter: Daniel Dai
> Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1453.patch, PIG-1453.patch
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-1405:
--------------------------------
    Status: Open  (was: Patch Available)

> Need to move many standard functions from piggybank into Pig
> ------------------------------------------------------------
>
> Key: PIG-1405
> URL: https://issues.apache.org/jira/browse/PIG-1405
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Aniket Mokashi
> Fix For: 0.8.0
>
> Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch
>
> There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-1405:
--------------------------------
    Status: Patch Available  (was: Open)

> Need to move many standard functions from piggybank into Pig
> ------------------------------------------------------------
>
> Key: PIG-1405
> URL: https://issues.apache.org/jira/browse/PIG-1405
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Aniket Mokashi
> Fix For: 0.8.0
>
> Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch
>
> There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-1405:
--------------------------------
    Attachment: StandardUDFtoPig4.patch

Fixed the findbugs error. The javac errors were due to having COR and COV implement Serializable; removed that since Pig doesn't need it. The test failures don't seem to be related to these code changes.

> Need to move many standard functions from piggybank into Pig
> ------------------------------------------------------------
>
> Key: PIG-1405
> URL: https://issues.apache.org/jira/browse/PIG-1405
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Aniket Mokashi
> Fix For: 0.8.0
>
> Attachments: StandardUDFtoPig.patch, StandardUDFtoPig3.patch, StandardUDFtoPig4.patch
>
> There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1460) UDF manual and javadocs should make clear how to use RequiredFieldList
[ https://issues.apache.org/jira/browse/PIG-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1460:
----------------------------
    Fix Version/s: 0.8.0

> UDF manual and javadocs should make clear how to use RequiredFieldList
> ----------------------------------------------------------------------
>
> Key: PIG-1460
> URL: https://issues.apache.org/jira/browse/PIG-1460
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.7.0
> Reporter: Alan Gates
> Priority: Minor
> Fix For: 0.8.0
>
> The UDF manual mentions that load function writers need to handle RequiredFieldList passed to LoadPushDown.pushProjection, but it does not specify how the writer should interpret the contents of that list. The javadoc is similarly vague.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1460) UDF manual and javadocs should make clear how to use RequiredFieldList
[ https://issues.apache.org/jira/browse/PIG-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880299#action_12880299 ]

Alan Gates commented on PIG-1460:
---------------------------------

From email thread on the pig-user list:

{quote}
The documentation is also poor when it comes to describing what the RequiredFieldList even is. It has a name and an index field. The code itself seems to allow for either of these to be filled. What do they mean?

Say the schema returned by the loader is (id: int, name: chararray, department: chararray) and the RequiredFieldList is [ ("department", 1), ("id", 0) ]. What does that mean? Is it:

* The name is the field name requested, and the index is the location it should be in the result, so return (id: int, department: chararray)?
* The index is the index in the source schema, and the name is for renaming, so return (department: chararray, id: int) (where the data in department is actually that from the original's name field)?
* The location in the RequiredFieldList array is the 'destination' requested, the name is optional (if the schema had one), and the index is the location in the original schema. So the above RequiredFieldList is actually impossible, since "department" is always index 2.
{quote}

The last is the correct answer.

> UDF manual and javadocs should make clear how to use RequiredFieldList
> ----------------------------------------------------------------------
>
> Key: PIG-1460
> URL: https://issues.apache.org/jira/browse/PIG-1460
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.7.0
> Reporter: Alan Gates
> Priority: Minor
>
> The UDF manual mentions that load function writers need to handle RequiredFieldList passed to LoadPushDown.pushProjection, but it does not specify how the writer should interpret the contents of that list. The javadoc is similarly vague.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
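The semantics Alan confirms above can be modeled in a few lines. The sketch below is plain Python, not the Pig Java API: the `project` helper and the `(name, index)` tuples are illustrative stand-ins for `RequiredFieldList`. The position of an entry in the list is the destination slot, and each entry's index points into the source schema.

```python
# Model of the confirmed RequiredFieldList semantics (illustrative, not Pig code):
# each entry is (optional_name, source_index); its position in the list
# is the destination slot in the projected tuple the loader should return.

def project(source_tuple, required_fields):
    """Build the projected tuple from the source row."""
    return tuple(source_tuple[index] for _name, index in required_fields)

# Source schema: (id: int, name: chararray, department: chararray)
row = (7, "alice", "sales")

# Request department (always source index 2) first, then id (source index 0).
required = [("department", 2), ("id", 0)]
assert project(row, required) == ("sales", 7)
```

Note that under these semantics `("department", 1)` can never occur, because "department" is always index 2 in the source schema.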
[jira] Created: (PIG-1460) UDF manual and javadocs should make clear how to use RequiredFieldList
UDF manual and javadocs should make clear how to use RequiredFieldList
----------------------------------------------------------------------

Key: PIG-1460
URL: https://issues.apache.org/jira/browse/PIG-1460
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Priority: Minor

The UDF manual mentions that load function writers need to handle RequiredFieldList passed to LoadPushDown.pushProjection, but it does not specify how the writer should interpret the contents of that list. The javadoc is similarly vague.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1459) Need a standard way to communicate the requested fields between front and back end for loaders
[ https://issues.apache.org/jira/browse/PIG-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880293#action_12880293 ]

Alan Gates commented on PIG-1459:
---------------------------------

From email thread on pig-user:

I'm trying to figure out how exactly to appropriately implement the LoadPushDown interface in my LoadFunc implementation. I need to take the list of column aliases passed to the LoadPushDown.pushProjection(RequiredFieldList) function and make it available in the getTuple function. I'm kind of new to this, so forgive me if this is obvious. From my readings of the mailing list it appears that the pushProjection function is called in the front end, whereas the getTuple function is called in the back end. How does a LoadFunc pass information from the front-end to the back-end instances?

regards,
Andrew

I wish there was better documentation on that too. Looking at the PigStorage code, it serializes an array of Booleans via UDFContext to the backend. It would be significantly better if Pig serialized the requested fields for us, provided that pushProjection returned a code indicating that the projection would be supported. Forcing users to do that serialization themselves is bug-prone, especially in the presence of nested schemas.

> Need a standard way to communicate the requested fields between front and back end for loaders
> ----------------------------------------------------------------------------------------------
>
> Key: PIG-1459
> URL: https://issues.apache.org/jira/browse/PIG-1459
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Alan Gates
>
> Pig currently provides no mechanism for loader writers to communicate which fields have been requested between the front and back end. Since any loader that accepts pushed projections has to deal with this issue, it would make sense for Pig to provide a standard mechanism for it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
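The pattern Alan describes (PigStorage serializing projection state through UDFContext) can be sketched abstractly. The following is a simplified Python model, not Pig code: the `properties` dict and the function names are invented stand-ins for the job-scoped Properties that UDFContext carries from the front end to the back end.

```python
# Simplified model of the front-end/back-end handoff (invented names; a real
# loader would stash this in UDFContext's per-UDF Properties, which Hadoop
# serializes into the job configuration).

properties = {}  # stands in for the serialized, job-scoped Properties

def push_projection(required_indexes):
    # Front end: record which source columns were requested, as a string,
    # because only serializable state survives the trip to the back end.
    properties["required"] = ",".join(str(i) for i in sorted(required_indexes))

def get_tuple(raw_fields):
    # Back end: read the recorded projection back and apply it per row.
    wanted = [int(i) for i in properties["required"].split(",")]
    return tuple(raw_fields[i] for i in wanted)

push_projection({2, 0})
assert get_tuple(["7", "alice", "sales"]) == ("7", "sales")
```

This is exactly the boilerplate the issue argues Pig should own: every loader that accepts pushed projections currently re-implements some variant of it.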
[jira] Created: (PIG-1459) Need a standard way to communicate the requested fields between front and back end for loaders
Need a standard way to communicate the requested fields between front and back end for loaders
----------------------------------------------------------------------------------------------

Key: PIG-1459
URL: https://issues.apache.org/jira/browse/PIG-1459
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Alan Gates

Pig currently provides no mechanism for loader writers to communicate which fields have been requested between the front and back end. Since any loader that accepts pushed projections has to deal with this issue, it would make sense for Pig to provide a standard mechanism for it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: skew join in pig
Are you asking how many reducers are used to split a hot key? If so, the answer is as many as we estimate it will take to make the records for the key fit into memory. For example, if we have a key which we estimate has 10 million records, each record being about 100 bytes, and for each reduce task we have 400M available, then we will allocate 3 reducers for that hot key. We do not need to take into account any other keys sent to this reducer because reducers process rows one key at a time.

Alan.

On Jun 16, 2010, at 11:51 AM, Gang Luo wrote:

Thanks for replying. It is much clearer now. One more thing to ask about the third question: how are reducers allocated to several hot keys? Hashing? Further, Pig doesn't divide the reducers into hot-key reducers and non-hot-key reducers, is that right?

Thanks,
-Gang

----- Original Message -----
From: Alan Gates
To: pig-dev@hadoop.apache.org
Sent: Wednesday, 2010/6/16, 12:16:13 PM
Subject: Re: skew join in pig

On Jun 16, 2010, at 8:36 AM, Gang Luo wrote:

> Hi, there is something confusing me in the skew join (http://wiki.apache.org/pig/PigSkewedJoinSpec):
>
> 1. Does the sampling job sample and build a histogram on both tables, or just one table (in this case, which one)?

Just the left one.

> 2. The join job still takes the two tables as inputs, shuffles tuples from the partitioned table to a particular reducer (one tuple to one reducer), and shuffles tuples from the streamed table to all reducers associated with one partition (one tuple to multiple reducers). Is that correct?

Keys with small enough values to fit in memory are shuffled to reducers as normal. Keys that are too large are split between reducers on the left side, and replicated to all of those reducers that have the splits (not all reducers) on the right side. Does that answer your question?

> 3. Hot keys need more than one reducer. Are these reducers dedicated to this key only? Could they also take other keys at the same time?

They take other keys at the same time.

> 4. For non-hot keys, my understanding is that they are shuffled to reducers based on the default hash partitioner. However, it could happen that all the keys shuffled to one reducer together incur skew, even if none of them is skewed individually.

This is always the case in map reduce, though a good hash function should minimize the occurrences of this.

> Can someone give me some ideas on these? Thanks.
>
> -Gang

Alan.
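Alan's hot-key example works out as a simple back-of-the-envelope calculation. The sketch below is only that, a sketch; the exact accounting inside Pig's skew join may differ, and the 400M-per-reducer figure is taken from the example above, not from Pig's defaults.

```python
# Back-of-the-envelope version of the hot-key reducer allocation example:
# allocate as many reducers as it takes for the key's records to fit in
# the memory available to each reduce task.
import math

records = 10_000_000                      # estimated rows for the hot key
bytes_per_record = 100
memory_per_reducer = 400 * 1024 * 1024    # ~400M available per reduce task

total_bytes = records * bytes_per_record  # ~1 GB for this one key
reducers = math.ceil(total_bytes / memory_per_reducer)
assert reducers == 3                      # matches the example in the thread
```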
[jira] Commented: (PIG-1221) Filter equality does not work for tuples
[ https://issues.apache.org/jira/browse/PIG-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880271#action_12880271 ]

Alan Gates commented on PIG-1221:
---------------------------------

+1

> Filter equality does not work for tuples
> ----------------------------------------
>
> Key: PIG-1221
> URL: https://issues.apache.org/jira/browse/PIG-1221
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.0
> Environment: Windows and Linux. Java 1.6, hadoop 0.20.1
> Reporter: Neil Blue
> Assignee: Jeff Zhang
> Fix For: 0.8.0
>
> Attachments: PIG_1221.patch
>
> From the documentation I understand that it should be possible to filter a relation based on the equality of tuples:
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
> http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#deref
>
> However, with this data file (indext.txt):
>
> (1,one) (1,ONE)
> (2,two) (22, twentytwo)
> (3,three) (3,three)
>
> I run this pig script:
>
> A = LOAD 'indext.txt' AS (t1:(a:int, b:chararray), t2:(a:int, b:chararray));
> B = FILTER A BY t1==t2;
> DUMP B;
>
> Expecting the output:
>
> ((3,three),(3,three))
>
> However, there is an error:
>
> 2010-02-03 09:05:20,523 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2067: EqualToExpr does not know how to handle type: tuple
>
> Pig Stack Trace
> ---------------
> ERROR 2067: EqualToExpr does not know how to handle type: tuple
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
>     at org.apache.pig.PigServer.openIterator(PigServer.java:475)
>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>     at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
>     at org.apache.pig.PigServer.store(PigServer.java:530)
>     at org.apache.pig.PigServer.openIterator(PigServer.java:458)
>     ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2067: EqualToExpr does not know how to handle type: tuple
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:108)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
>     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
>     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
>     at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
>
> Thanks
> Neil

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
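The behavior the report expects from `FILTER A BY t1==t2` can be modeled directly: compare the two tuple-valued fields element by element and keep matching rows. The sketch below is Python, not Pig; it only illustrates the expected semantics on the sample data from the report.

```python
# Model of the expected tuple-equality filter semantics from PIG-1221
# (Python illustration; Pig tuples are compared field by field).
rows = [
    ((1, "one"),   (1, "ONE")),
    ((2, "two"),   (22, "twentytwo")),
    ((3, "three"), (3, "three")),
]

# Equivalent of: B = FILTER A BY t1 == t2;
filtered = [row for row in rows if row[0] == row[1]]

# Only the third row has equal tuples, matching the expected DUMP output.
assert filtered == [((3, "three"), (3, "three"))]
```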