Hi Daniel,
In Section 4.3.1, the example and Figure 6 show this. The last paragraph of 
Section 5.1 says that the split operator maintains a one-tuple buffer for each 
branch, and it discusses how to synchronize multiple branches. I do think that 
describes the in-memory split.

Here is the paper: http://www.vldb.org/pvldb/2/vldb09-1074.pdf


-Gang



----- Original Message ----
From: Daniel Dai <jiany...@yahoo-inc.com>
To: "pig-dev@hadoop.apache.org" <pig-dev@hadoop.apache.org>
Sent: 2010/7/26 (Mon) 2:09:25 PM
Subject: Re: split operator

Hi, Gang,
Which part of the paper are you talking about? We don't do an in-memory split. 
We dump the split result to a temporary file and start a new map-reduce job. 
Split does create a map-reduce boundary (though that is not entirely true, since 
the multiquery optimizer may combine some of these jobs).
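
To make the two strategies discussed in this thread concrete, here is a 
minimal Python sketch. It is not Pig's code: the function names, the 
(condition, transform) branch representation, and the use of pickle for the 
temporary file are all assumptions made for illustration. The first function 
mimics a pipelined in-memory split, the second mimics writing the data once 
and re-reading it once per branch.

```python
import os
import pickle
import tempfile


def split_in_memory(tuples, branches):
    """Pipelined split: each tuple is handed to every matching branch
    as soon as it is read, with no intermediate storage.
    `branches` is a list of (condition, transform) pairs (hypothetical)."""
    outputs = [[] for _ in branches]
    for t in tuples:
        for i, (cond, transform) in enumerate(branches):
            if cond(t):
                outputs[i].append(transform(t))
    return outputs


def split_via_temp_file(tuples, branches):
    """Materializing split: write the input once to a temporary file,
    then re-read that file once per branch (write once, read many)."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as f:
        pickle.dump(list(tuples), f)
        path = f.name
    try:
        outputs = []
        for cond, transform in branches:
            # Each branch stands in for a separate downstream job.
            with open(path, "rb") as f:
                stored = pickle.load(f)
            outputs.append([transform(t) for t in stored if cond(t)])
        return outputs
    finally:
        os.remove(path)
```

Both functions return the same per-branch results; they differ only in 
whether the data passes through intermediate storage between the split and 
its branches.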

Daniel

Gang Luo wrote:
> Hi all,
> According to the VLDB 09 paper, the split operator and all of its successor 
> operators reside in memory without any blocking in between. However, the 
> source code (version 0.7) shows that an MR job actually ends when it reaches 
> the split operator, and multiple new MR jobs are created, each representing 
> one branch. This write-once-read-multiple-times method is different from the 
> in-memory method described in that paper. Did Pig change its strategy for 
> split, or is there still an in-memory version of split that I didn't discover?
> 
> Thanks,
> -Gang
> 
> 
>        


