[jira] Commented: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734878#action_12734878 ]

Sriranjan Manjunath commented on PIG-792:

I have fixed the NullPointerException that occurred when a schema was specified as part of load. It was a bug in the rewire of LOJoin. The current patch is the latest one and has no known issues.

> PERFORMANCE: Support skewed join in pig
> ---
>
> Key: PIG-792
> URL: https://issues.apache.org/jira/browse/PIG-792
> Project: Pig
> Issue Type: Improvement
> Reporter: Sriranjan Manjunath
> Attachments: skewedjoin.patch
>
>
> Fragmented replicated join has a few limitations:
> - One of the tables needs to be loaded into memory
> - Join is limited to two tables
> Skewed join partitions the table and joins the records in the reduce phase.
> It computes a histogram of the key space to account for skewing in the input
> records. Further, it adjusts the number of reducers depending on the key
> distribution.
> We need to implement the skewed join in pig.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
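The issue description says that skewed join builds a histogram of the key space and adjusts the number of reducers to the key distribution. A minimal sketch of that idea follows. It is not the code in skewedjoin.patch: the class name, the method names and the `maxPerReducer` budget are all hypothetical, and plain strings stand in for join keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: not the code in skewedjoin.patch.
class SkewedJoinSketch {

    // Build a histogram (key -> number of records) from the join keys of one input.
    static Map<String, Integer> histogram(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Decide how many reducers each key is spread over. maxPerReducer is a
    // hypothetical budget of records that one reducer is assumed to handle.
    static Map<String, Integer> reducersPerKey(Map<String, Integer> histogram, int maxPerReducer) {
        Map<String, Integer> plan = new TreeMap<>();
        for (Map.Entry<String, Integer> e : histogram.entrySet()) {
            // Ceiling division: a key with more records gets more reducers.
            int reducers = (e.getValue() + maxPerReducer - 1) / maxPerReducer;
            plan.put(e.getKey(), reducers);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "a", "a", "a", "a", "b", "c");
        // Prints {a=3, b=1, c=1}: the skewed key "a" is split across three reducers.
        System.out.println(reducersPerKey(histogram(keys), 2));
    }
}
```

In this sketch only the heavy key is split, so the other keys still go to a single reducer each. How the actual patch samples the input and routes records is not shown in this thread.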
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-792:
    Status: Patch Available (was: Open)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-792:
    Attachment: (was: skewedjoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-792:
    Attachment: skewedjoin.patch
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-792:
    Status: Open (was: Patch Available)
[jira] Updated: (PIG-773) Empty complex constants (empty bag, empty tuple and empty map) should be supported
[ https://issues.apache.org/jira/browse/PIG-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-773:
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

Patch has been committed. Thanks for the fix Ashutosh.

> Empty complex constants (empty bag, empty tuple and empty map) should be supported
> ---
>
> Key: PIG-773
> URL: https://issues.apache.org/jira/browse/PIG-773
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.3.0
> Reporter: Pradeep Kamath
> Assignee: Ashutosh Chauhan
> Priority: Minor
> Fix For: 0.4.0
>
> Attachments: pig-773.patch, pig-773_v2.patch, pig-773_v3.patch, pig-773_v4.patch, pig-773_v5.patch
>
>
> We should be able to create empty bag constant using {}, empty tuple constant
> using (), empty map constant using [] within a pig script
[jira] Commented: (PIG-773) Empty complex constants (empty bag, empty tuple and empty map) should be supported
[ https://issues.apache.org/jira/browse/PIG-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734810#action_12734810 ]

Santhosh Srinivasan commented on PIG-773:

+1 for the changes.
[jira] Commented: (PIG-892) Make COUNT and AVG deal with nulls accordingly with SQL standar
[ https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734806#action_12734806 ]

Santhosh Srinivasan commented on PIG-892:

1. Index: src/org/apache/pig/builtin/FloatAvg.java
===
The size of 't' is not checked before t.get(0) in the method count
{code}
+if (t != null && t.get(0) != null)
+cnt++;
+}
{code}

2. Index: src/org/apache/pig/builtin/IntAvg.java
===
Same comment as FloatAvg.java

3. Index: src/org/apache/pig/builtin/DoubleAvg.java
===
Same comment as FloatAvg.java

4. Index: src/org/apache/pig/builtin/AVG.java
===
Same comment as FloatAvg.java

5. Index: src/org/apache/pig/builtin/LongAvg.java
===
Same comment as FloatAvg.java

6. Index: src/org/apache/pig/builtin/COUNT_STAR.java
===
I am not sure about the naming convention here. None of the built-in functions have a special character in the class name. COUNTSTAR would be better than COUNT_STAR.

> Make COUNT and AVG deal with nulls accordingly with SQL standar
> ---
>
> Key: PIG-892
> URL: https://issues.apache.org/jira/browse/PIG-892
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.3.0
> Reporter: Olga Natkovich
> Assignee: Olga Natkovich
> Fix For: 0.4.0
>
> Attachments: PIG-892.patch, PIG-892_v2.patch
>
>
> both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match
> COUNT(*) in SQL
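The first review point, checking the size of 't' before calling t.get(0), can be sketched as below, together with the difference between a null-ignoring COUNT and a COUNT(*)-style count that the issue asks for. This is an assumption-laden illustration, not the code from PIG-892.patch: `List<Object>` stands in for Pig's Tuple, a list of such rows stands in for a bag, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: List<Object> stands in for Pig's Tuple, and a list of
// such rows stands in for a bag. None of this is the code from PIG-892.patch.
class NullAwareCountSketch {

    // COUNT-style: skip null rows, empty rows, and rows whose first field is null.
    static long count(List<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> t : bag) {
            // The size check guards t.get(0) against an empty row.
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // COUNT(*)-style: every row counts, whatever its fields hold.
    static long countStar(List<List<Object>> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<List<Object>> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList(1));             // counted by both
        bag.add(Arrays.<Object>asList((Object) null)); // first field is null
        bag.add(Collections.<Object>emptyList());      // empty row
        bag.add(Arrays.<Object>asList(3));             // counted by both
        // Prints "2 4".
        System.out.println(count(bag) + " " + countStar(bag));
    }
}
```

Without the `t.size() > 0` guard, the empty row in this sketch would make `t.get(0)` throw an IndexOutOfBoundsException.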
[jira] Commented: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734786#action_12734786 ]

Sriranjan Manjunath commented on PIG-792:

Ashutosh has discovered a bug with the patch. Fixing it right now. I will have more details soon.
[jira] Commented: (PIG-812) COUNT(*) does not work
[ https://issues.apache.org/jira/browse/PIG-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734694#action_12734694 ]

Olga Natkovich commented on PIG-812:

Ben, thanks for updating the docs. A couple of comments/suggestions:

(1) In the Star expression section, I think it would be helpful to explain the difference between * in Pig and SQL in more detail.
(2) The Boolean, tuple, field, and general expression sections seem a little brief and I am not sure they add much to the user's understanding of the language. Perhaps examples would be helpful?
(3) The description of map dereferencing has key while the Symbol column says 'key'. I think that's confusing.
(4) The flatten description for a bag is not very clear and I think it also has a typo: ({(b,c),(d,e)}) - I think the parentheses are wrong - I think you meant to have a bag with a tuple that contains other tuples, right?
(5) Group vs. Cogroup - I think we should put all the information under COGROUP because we always sold that as the general case and GROUP as an "alias" for the 1 relation case.

> COUNT(*) does not work
> ---
>
> Key: PIG-812
> URL: https://issues.apache.org/jira/browse/PIG-812
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.2.0
> Reporter: Viraj Bhat
> Assignee: Benjamin Reed
> Fix For: 0.2.0
>
> Attachments: PIG-812.patch, PIG-812.pdf, studenttab10k
>
>
> Pig script to count the number of rows in a studenttab10k file which contains
> 10k records.
> {code}
> studenttab = LOAD 'studenttab10k' AS (name:chararray, age:int,gpa:float);
> X2 = GROUP studenttab ALL;
> describe X2;
> Y2 = FOREACH X2 GENERATE COUNT(*);
> explain Y2;
> DUMP Y2;
> {code}
> returns the following error
>
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias Y2
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1242783700970.log
>
> If you look at the log file:
>
> Caused by: java.lang.ClassCastException
> at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:76)
> at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:68)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:223)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:245)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:236)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:88)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
Re: Is it a bug ?
It looks wrong to me, but I don't have a deep understanding of that code.

Alan.

On Jul 15, 2009, at 6:03 PM, zhang jianfeng wrote:

Hi all,

Today, when I read the source code, I found a piece of suspicious code (PigServer.java, line 1047):

    graph.ignoreNumStores = processedStores; // I think here should be graph.ignoreNumStores = ignoreNumStores
    graph.processedStores = processedStores;
    graph.fileNameMap = fileNameMap;

I think this may be a typing mistake. Can anyone confirm it?

Thank you.
Jeff Zhang
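If the suspicion is right, the effect of the questioned line is that the copy's ignoreNumStores silently takes the value of processedStores. The sketch below shows the copy written the way the message proposes, with each field taken from the field of the same name. It is not the PigServer source: the class is a hypothetical stand-in, the field types are assumptions, and only the three field names come from the message.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a stand-in for the object being copied in PigServer.
// Field types are assumptions; only the field names come from the message.
class GraphCopySketch {
    int ignoreNumStores;
    int processedStores;
    Map<String, String> fileNameMap = new HashMap<>();

    // Copy each field from the field of the same name, as the message proposes.
    GraphCopySketch copy() {
        GraphCopySketch graph = new GraphCopySketch();
        graph.ignoreNumStores = ignoreNumStores;
        graph.processedStores = processedStores;
        graph.fileNameMap = fileNameMap;
        return graph;
    }

    public static void main(String[] args) {
        GraphCopySketch original = new GraphCopySketch();
        original.ignoreNumStores = 1;
        original.processedStores = 3;
        GraphCopySketch copied = original.copy();
        // Prints "1 3"; the questioned assignment would give "3 3" instead.
        System.out.println(copied.ignoreNumStores + " " + copied.processedStores);
    }
}
```

Whether the original assignment is intentional cannot be settled from this thread, so the sketch only shows what the proposed change would do.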