[jira] Updated: (PIG-920) optimizing diamond queries

2009-10-30 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-920:
---

   Resolution: Fixed
Fix Version/s: 0.6.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1, patch committed. Thanks for the contribution, Richard!

> optimizing diamond queries
> --
>
> Key: PIG-920
> URL: https://issues.apache.org/jira/browse/PIG-920
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Assignee: Richard Ding
> Fix For: 0.6.0
>
> Attachments: PIG-920.patch, PIG-920.patch
>
>
> The following query
> A = load 'foo';
> B = filter A by $0 > 1;
> C = filter A by $1 == 'foo';
> D = COGROUP C by $0, B by $0;
> ..
> is not executed efficiently. Currently, it runs a map-only job that
> basically reads and writes the same data before doing the query processing.
> A query in which the data is loaded twice actually executes more efficiently.
> This is not an uncommon query and we should fix this issue.
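
For comparison, the "loaded twice" form mentioned in the description might look like the sketch below. This is only an illustration, not text from the issue or the patch: the alias names A1 and A2 are made up here, and the claim that this form ran faster applies to Pig as it behaved before this fix.

```
-- Hypothetical rewrite of the query above: 'foo' is loaded under two
-- aliases so that each filter has its own input instead of sharing A.
A1 = load 'foo';
A2 = load 'foo';
B = filter A1 by $0 > 1;
C = filter A2 by $1 == 'foo';
D = COGROUP C by $0, B by $0;
```

Both forms produce the same D. The difference is only in the plan: the original query branches out from a single load and rejoins at the COGROUP (the "diamond"), while this version keeps the two branches separate all the way back to the input.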

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-920) optimizing diamond queries

2009-10-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-920:
-

Attachment: PIG-920.patch

> optimizing diamond queries
> --
>
> Key: PIG-920
> URL: https://issues.apache.org/jira/browse/PIG-920
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Assignee: Richard Ding
> Attachments: PIG-920.patch, PIG-920.patch




[jira] Updated: (PIG-920) optimizing diamond queries

2009-10-27 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-920:
-

Attachment: PIG-920.patch

This patch adds an optimization for diamond queries as part of the
multi-query optimization.

> optimizing diamond queries
> --
>
> Key: PIG-920
> URL: https://issues.apache.org/jira/browse/PIG-920
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Assignee: Richard Ding
> Attachments: PIG-920.patch




[jira] Updated: (PIG-920) optimizing diamond queries

2009-10-27 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-920:
-

Status: Patch Available  (was: Open)

> optimizing diamond queries
> --
>
> Key: PIG-920
> URL: https://issues.apache.org/jira/browse/PIG-920
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Assignee: Richard Ding
> Attachments: PIG-920.patch
