Re: Pig performance

2008-12-31 Thread Alan Gates
This will definitely be done after the merge of the types branch into
trunk.  As for PIG-273, the changes we need to make are larger than just
that.  Consider, for example:


A = load ...
B = filter ...
store B into 'bla';
C = group B by $0;
...

There's no split explicitly in there, but pig should be able to tee  
the input at the 'store B' and keep going.  So PIG-273 is part of it,  
but I imagine when we start working on it there'll be another JIRA to  
track all the changes, of which PIG-273 will become a sub-task.
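To make the "tee" idea concrete, here is a minimal Python sketch. It is not Pig's implementation. In-memory lists stand in for the input file and for 'bla', and the names run_shared_scan and CountingSource are hypothetical. It shows one scan of the input feeding both the store of B and the group of B by $0, instead of re-reading the input for each output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple  # a record; field $0 is row[0]


def run_shared_scan(
    rows: Iterable[Row],
    keep: Callable[[Row], bool],
) -> Tuple[List[Row], Dict[object, List[Row]]]:
    """Hypothetical sketch: a single pass over the input tees each filtered
    row into two consumers, the 'store B' sink and the 'group B by $0' step."""
    stored: List[Row] = []  # stands in for: store B into 'bla';
    groups: Dict[object, List[Row]] = defaultdict(list)  # stands in for: C = group B by $0;
    for row in rows:  # the input is iterated exactly once
        if not keep(row):  # B = filter A ...
            continue
        stored.append(row)  # tee branch 1: write B out
        groups[row[0]].append(row)  # tee branch 2: keep going into the group
    return stored, dict(groups)


class CountingSource:
    """Hypothetical input that counts how many times it is scanned."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self.rows)


# Usage: filter on $1 > 0, store B, and group B by $0 from one scan.
src = CountingSource([("a", 1), ("b", 5), ("a", 7), ("c", 0)])
stored, groups = run_shared_scan(src, keep=lambda r: r[1] > 0)
```

Without the tee, the store and the group would each be a separate job that re-reads the input, so src.scans would be 2 rather than 1.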


Alan.

On Dec 30, 2008, at 12:48 AM, Kevin Weil wrote:


Hi Olga,

I am eagerly awaiting not having to re-read all data each time I store part
of a split!  As far as timelines go, I imagine this will be a larger fix
that will come in after the merge from types to trunk?  And is
PIG-273 (https://issues.apache.org/jira/browse/PIG-273) the proper bug
for tracking this issue?

Thanks,
Kevin

On Mon, Dec 22, 2008 at 10:22 AM, Olga Natkovich ol...@yahoo-inc.com wrote:



The reason trunk does not contain the latest code is that Pig has
undergone a complete redesign that we could not do incrementally on the
trunk without jeopardizing its stability. The decision was made to do
the work on a branch and then merge the branch code to trunk when it is
stable.

The merge will happen in early January.

The second point Alan made is that we are about to start work on
cross-query optimization: the ability to combine computations across
multiple stores.

Olga


-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Saturday, December 20, 2008 10:33 AM
To: pig-dev@hadoop.apache.org
Cc: pig-dev@hadoop.apache.org
Subject: Re: Pig performance


I think the key points that Alan brought up in his blog
comment were that trunk pig is paradoxically not the most
current and that storing intermediate results can decrease
the scope of optimizations.

On Dec 20, 2008, at 10:16, Alan Gates ga...@yahoo-inc.com wrote:


I left a comment on the blog addressing some of the issues he brought up.

Alan.

On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote:


Hey Pig team,

Did anyone check out the recent claims about Pig's poor performance
versus Cascading? Though I haven't worked extensively with either
system, I found the statements made fairly bold and am curious to
hear more about their validity from the Pig development team:
http://www.manamplified.org/archives/2008/12/cascading-and-pig-planners.html

Thanks,
Jeff










Re: Pig performance

2008-12-30 Thread Kevin Weil
Hi Olga,

I am eagerly awaiting not having to re-read all data each time I store part
of a split!  As far as timelines go, I imagine this will be a larger fix
that will come in after the merge from types to trunk?  And is
PIG-273 (https://issues.apache.org/jira/browse/PIG-273) the proper bug
for tracking this issue?

Thanks,
Kevin

On Mon, Dec 22, 2008 at 10:22 AM, Olga Natkovich ol...@yahoo-inc.com wrote:

 The reason trunk does not contain the latest code is that Pig has
 undergone a complete redesign that we could not do incrementally on the
 trunk without jeopardizing its stability. The decision was made to do
 the work on a branch and then merge the branch code to trunk when it is
 stable.

 The merge will happen in early January.

 The second point Alan made is that we are about to start work on
 cross-query optimization: the ability to combine computations across
 multiple stores.

 Olga

  -Original Message-
  From: Ted Dunning [mailto:ted.dunn...@gmail.com]
  Sent: Saturday, December 20, 2008 10:33 AM
  To: pig-dev@hadoop.apache.org
  Cc: pig-dev@hadoop.apache.org
  Subject: Re: Pig performance
 
 
  I think the key points that Alan brought up in his blog
  comment were that trunk pig is paradoxically not the most
  current and that storing intermediate results can decrease
  the scope of optimizations.
 
  On Dec 20, 2008, at 10:16, Alan Gates ga...@yahoo-inc.com wrote:
 
   I left a comment on the blog addressing some of the issues he brought up.
  
   Alan.
  
   On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote:
  
   Hey Pig team,
  
   Did anyone check out the recent claims about Pig's poor performance
   versus Cascading? Though I haven't worked extensively with either
   system, I found the statements made fairly bold and am curious to
   hear more about their validity from the Pig development team:
  
   http://www.manamplified.org/archives/2008/12/cascading-and-pig-planners.html
  
   Thanks,
   Jeff
  
 



RE: Pig performance

2008-12-22 Thread Olga Natkovich
The reason trunk does not contain the latest code is that Pig has
undergone a complete redesign that we could not do incrementally on the
trunk without jeopardizing its stability. The decision was made to do
the work on a branch and then merge the branch code to trunk when it is
stable.

The merge will happen in early January.

The second point Alan made is that we are about to start work on
cross-query optimization: the ability to combine computations across
multiple stores.

Olga 

 -Original Message-
 From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
 Sent: Saturday, December 20, 2008 10:33 AM
 To: pig-dev@hadoop.apache.org
 Cc: pig-dev@hadoop.apache.org
 Subject: Re: Pig performance
 
 
 I think the key points that Alan brought up in his blog 
 comment were that trunk pig is paradoxically not the most 
 current and that storing intermediate results can decrease 
 the scope of optimizations.
 
 On Dec 20, 2008, at 10:16, Alan Gates ga...@yahoo-inc.com wrote:
 
  I left a comment on the blog addressing some of the issues he brought up.
 
  Alan.
 
  On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote:
 
  Hey Pig team,
 
  Did anyone check out the recent claims about Pig's poor performance
  versus Cascading? Though I haven't worked extensively with either 
  system, I found the statements made fairly bold and am curious to 
  hear more about their validity from the Pig development team:
  
  http://www.manamplified.org/archives/2008/12/cascading-and-pig-planners.html
 
  Thanks,
  Jeff
 
 


Re: Pig performance

2008-12-20 Thread Alan Gates
I left a comment on the blog addressing some of the issues he brought  
up.


Alan.

On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote:


Hey Pig team,

Did anyone check out the recent claims about Pig's poor performance
versus Cascading? Though I haven't worked extensively with either system,
I found the statements made fairly bold and am curious to hear more about
their validity from the Pig development team:
http://www.manamplified.org/archives/2008/12/cascading-and-pig-planners.html

Thanks,
Jeff




Re: Pig performance

2008-12-20 Thread Ted Dunning


I think the key points that Alan brought up in his blog comment were  
that trunk pig is paradoxically not the most current and that storing  
intermediate results can decrease the scope of optimizations.


On Dec 20, 2008, at 10:16, Alan Gates ga...@yahoo-inc.com wrote:

I left a comment on the blog addressing some of the issues he  
brought up.


Alan.

On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote:


Hey Pig team,

Did anyone check out the recent claims about Pig's poor performance
versus Cascading? Though I haven't worked extensively with either system,
I found the statements made fairly bold and am curious to hear more about
their validity from the Pig development team:
http://www.manamplified.org/archives/2008/12/cascading-and-pig-planners.html

Thanks,
Jeff