[jira] Commented: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734878#action_12734878
 ] 

Sriranjan Manjunath commented on PIG-792:
-

I have fixed the NullPointerException that occurred when a schema was specified 
as part of the load. It was a bug in the rewire of LOJoin. The current patch is 
the latest one and has no known issues.

> PERFORMANCE: Support skewed join in pig
> ---
>
> Key: PIG-792
> URL: https://issues.apache.org/jira/browse/PIG-792
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: skewedjoin.patch
>
>
> Fragmented replicated join has a few limitations:
>  - One of the tables needs to be loaded into memory
>  - Join is limited to two tables
> Skewed join partitions the table and joins the records in the reduce phase. 
> It computes a histogram of the key space to account for skew in the input 
> records. Further, it adjusts the number of reducers depending on the key 
> distribution.
> We need to implement skewed join in Pig.
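The approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the code in skewedjoin.patch: the class and method names are invented, and it assumes that a sample of join keys is available, that a reducer can hold at most `capacity` records of one key, and that records of a hot key from one input are spread across several reducers while the matching records of the other input are copied to each of those reducers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of histogram-driven skewed-join partitioning (not Pig's classes). */
public class SkewedPartitionSketch {

    /** Builds a histogram of key frequencies from a sample of join keys. */
    static Map<String, Integer> histogram(List<String> sample) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : sample) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Estimates how many reducers each key needs: ceil(estimated records / capacity).
     * Only skewed keys (needing more than one reducer) are recorded.
     */
    static Map<String, Integer> reducersPerKey(Map<String, Integer> hist, int sampleSize,
                                               long totalRecords, long capacity) {
        Map<String, Integer> alloc = new HashMap<>();
        for (Map.Entry<String, Integer> e : hist.entrySet()) {
            // Scale the sampled frequency up to the full input size.
            long estimated = totalRecords * e.getValue() / sampleSize;
            int needed = (int) ((estimated + capacity - 1) / capacity);
            if (needed > 1) {
                alloc.put(e.getKey(), needed);
            }
        }
        return alloc;
    }

    /** Reducer for a record of the split input; hot keys rotate over their reducer range. */
    static int partitionSplit(String key, long recordIndex,
                              Map<String, Integer> alloc, int numReducers) {
        int base = Math.floorMod(key.hashCode(), numReducers);
        int n = Math.min(alloc.getOrDefault(key, 1), numReducers);
        return (base + (int) (recordIndex % n)) % numReducers;
    }

    /** Reducers for a record of the other input; hot keys are copied to the whole range. */
    static List<Integer> partitionReplicated(String key, Map<String, Integer> alloc,
                                             int numReducers) {
        int base = Math.floorMod(key.hashCode(), numReducers);
        int n = Math.min(alloc.getOrDefault(key, 1), numReducers);
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            targets.add((base + i) % numReducers);
        }
        return targets;
    }
}
```

Every split-side record of a hot key lands on one reducer that also receives all of that key's records from the other input, so each matching pair meets exactly once. The sketch keeps the total reducer count fixed and only varies how many reducers a hot key spans; deriving the total itself from the histogram would be a further step.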

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: skewedjoin.patch)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: skewedjoin.patch




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Status: Open  (was: Patch Available)




[jira] Updated: (PIG-773) Empty complex constants (empty bag, empty tuple and empty map) should be supported

2009-07-23 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-773:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch has been committed. Thanks for the fix, Ashutosh.

> Empty complex constants (empty bag, empty tuple and empty map) should be 
> supported
> --
>
> Key: PIG-773
> URL: https://issues.apache.org/jira/browse/PIG-773
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: pig-773.patch, pig-773_v2.patch, pig-773_v3.patch, 
> pig-773_v4.patch, pig-773_v5.patch
>
>
> We should be able to create an empty bag constant using {}, an empty tuple 
> constant using (), and an empty map constant using [] within a Pig script.




[jira] Commented: (PIG-773) Empty complex constants (empty bag, empty tuple and empty map) should be supported

2009-07-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734810#action_12734810
 ] 

Santhosh Srinivasan commented on PIG-773:
-

+1 for the changes.




[jira] Commented: (PIG-892) Make COUNT and AVG deal with nulls accordingly with SQL standard

2009-07-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734806#action_12734806
 ] 

Santhosh Srinivasan commented on PIG-892:
-

1. Index: src/org/apache/pig/builtin/FloatAvg.java
===

In the method count, the size of 't' is not checked before calling t.get(0):


{code}
+if (t != null && t.get(0) != null)
+cnt++;
+}
{code}

2. Index: src/org/apache/pig/builtin/IntAvg.java
===

Same comment as FloatAvg.java

3. Index: src/org/apache/pig/builtin/DoubleAvg.java
===

Same comment as FloatAvg.java

4. Index: src/org/apache/pig/builtin/AVG.java
===

Same comment as FloatAvg.java

5. Index: src/org/apache/pig/builtin/LongAvg.java
===

Same comment as FloatAvg.java


6. Index: src/org/apache/pig/builtin/COUNT_STAR.java
===

I am not sure about the naming convention here. None of the built-in functions 
have a special character in the class name. COUNTSTAR would be better than 
COUNT_STAR.
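Items 1 to 5 ask for a size check before t.get(0). A minimal sketch of such a guard, assuming tuples are modeled as java.util.List<Object> rather than Pig's Tuple interface; the class and method names are hypothetical:

```java
import java.util.List;

/** Hypothetical sketch: a bag is an Iterable of tuples, a tuple is a List<Object>. */
public class NullSafeCount {

    /** Counts tuples that exist, have at least one field, and whose first field is non-null. */
    static long count(Iterable<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> t : bag) {
            // Check the size before get(0): in this List-based model an empty
            // tuple would otherwise throw IndexOutOfBoundsException.
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }
}
```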


> Make COUNT and AVG deal with nulls accordingly with SQL standard
> ---
>
> Key: PIG-892
> URL: https://issues.apache.org/jira/browse/PIG-892
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.4.0
>
> Attachments: PIG-892.patch, PIG-892_v2.patch
>
>
> Both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match 
> COUNT(*) in SQL.
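The SQL null semantics the issue asks for can be summarized on a single column. This is a hypothetical helper, not Pig's builtins; it assumes the column is modeled as a List<Double> in which a null element stands for a null field:

```java
import java.util.List;

/** Hypothetical sketch of SQL-style null handling over one column. */
public class SqlNullSemantics {

    /** COUNT(col): counts only non-null values. */
    static long count(List<Double> column) {
        long n = 0;
        for (Double v : column) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    /** COUNT(*): counts every row, nulls included. */
    static long countStar(List<Double> column) {
        return column.size();
    }

    /** AVG(col): sum / count over non-null values; null when no non-null value exists. */
    static Double avg(List<Double> column) {
        double sum = 0;
        long n = 0;
        for (Double v : column) {
            if (v != null) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? null : sum / n;
    }
}
```

For the column (2.0, null, 4.0), COUNT gives 2, COUNT(*) gives 3, and AVG gives 3.0, because the null row is excluded from both the sum and the divisor.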




[jira] Commented: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734786#action_12734786
 ] 

Sriranjan Manjunath commented on PIG-792:
-

Ashutosh has discovered a bug in the patch. I am fixing it now and will have 
more details soon.




[jira] Commented: (PIG-812) COUNT(*) does not work

2009-07-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734694#action_12734694
 ] 

Olga Natkovich commented on PIG-812:


Ben, thanks for updating the docs. A few comments/suggestions:

(1) In the Star expression section, I think it would be helpful to explain the 
difference between * in Pig and SQL in more detail.
(2) The Boolean, tuple, field, and general expression sections seem a little 
brief, and I am not sure they add much to the user's understanding of the 
language. Perhaps examples would be helpful?
(3) The description of map dereferencing has key while the Symbol column says 
'key'. I think that's confusing. 
(4) The flatten description for a bag is not very clear, and I also think it 
has a typo: ({(b,c),(d,e)}). I think the parentheses are wrong; I think you 
meant to have a bag with a tuple that contains other tuples, right?
(5) Group vs. Cogroup: I think we should put all the information under 
COGROUP, because we have always presented it as the general case and GROUP as 
an "alias" for the one-relation case. 
for 1 relation case. 



> COUNT(*) does not work 
> ---
>
> Key: PIG-812
> URL: https://issues.apache.org/jira/browse/PIG-812
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Viraj Bhat
>Assignee: Benjamin Reed
> Fix For: 0.2.0
>
> Attachments: PIG-812.patch, PIG-812.pdf, studenttab10k
>
>
> A Pig script to count the number of rows in the studenttab10k file, which 
> contains 10k records:
> {code}
> studenttab = LOAD 'studenttab10k' AS (name:chararray, age:int,gpa:float);
> X2 = GROUP studenttab ALL;
> describe X2;
> Y2 = FOREACH X2 GENERATE COUNT(*);
> explain Y2;
> DUMP Y2;
> {code}
> returns the following error
> 
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator 
> for alias Y2
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1242783700970.log
> 
> If you look at the log file:
> 
> Caused by: java.lang.ClassCastException
> at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:76)
> at org.apache.pig.builtin.COUNT$Initial.exec(COUNT.java:68)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:223)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:245)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:236)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:88)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 




Re: Is it a bug?

2009-07-23 Thread Alan Gates
It looks wrong to me, but I don't have a deep understanding of that code.


Alan.

On Jul 15, 2009, at 6:03 PM, zhang jianfeng wrote:


Hi all,



Today, while reading the source code, I found a piece of suspicious code 
(PigServer.java, line 1047):



   graph.ignoreNumStores = processedStores; // I think this should be graph.ignoreNumStores = ignoreNumStores;
   graph.processedStores = processedStores;
   graph.fileNameMap = fileNameMap;



I think this may be a typo. Can anyone confirm it?



Thank you.





Jeff Zhang
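The correction proposed in this thread amounts to copying each field from its same-named source field. A minimal sketch using a hypothetical stand-in class; the class name, the `duplicate()` method, and the field types are assumptions, not PigServer's actual code, and the real class carries more state:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the state copied near PigServer.java line 1047. */
public class GraphCopySketch {

    static class Graph {
        int ignoreNumStores;   // assumed type
        int processedStores;   // assumed type
        Map<String, String> fileNameMap = new HashMap<>();  // assumed type

        /** Copies each field from the same-named field of this graph. */
        Graph duplicate() {
            Graph graph = new Graph();
            graph.ignoreNumStores = ignoreNumStores;  // the suspected line assigns processedStores here
            graph.processedStores = processedStores;
            graph.fileNameMap = fileNameMap;          // shares the map, as in the quoted snippet
            return graph;
        }
    }
}
```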