Hi Daniel,

Section 4.3.1 shows this in its example and in Figure 6. The last paragraph of Section 5.1 says the split operator maintains a one-tuple buffer for each branch, and it discusses how to synchronize multiple branches. I do think that describes an in-memory split.
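To make the difference between the two strategies concrete, here is a rough Python sketch. It is not Pig's code and does not model map-reduce jobs or the paper's buffer synchronization; the branch predicates, the function names, and the JSON-lines temporary file are all assumptions made up for illustration. The first function routes each tuple to its branches as it arrives, and the second writes the input once and re-reads it once per branch.

```python
import json
import os
import tempfile

# Hypothetical branch predicates, standing in for the conditions of a split.
BRANCHES = {
    "small": lambda t: t[1] < 10,
    "large": lambda t: t[1] >= 10,
}


def in_memory_split(tuples, branches):
    """Route each tuple to every matching branch as soon as it arrives."""
    outputs = {name: [] for name in branches}
    for t in tuples:
        for name, pred in branches.items():
            if pred(t):
                outputs[name].append(t)
    return outputs


def materialized_split(tuples, branches):
    """Write the input once to a temp file, then re-read it once per branch."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    try:
        # Write once.
        with os.fdopen(fd, "w") as f:
            for t in tuples:
                f.write(json.dumps(t) + "\n")
        # Read multiple times: one full pass over the file per branch.
        outputs = {}
        for name, pred in branches.items():
            with open(path) as f:
                rows = (tuple(json.loads(line)) for line in f)
                outputs[name] = [t for t in rows if pred(t)]
        return outputs
    finally:
        os.remove(path)


data = [("a", 3), ("b", 12), ("c", 10)]
same = in_memory_split(data, BRANCHES) == materialized_split(data, BRANCHES)
```

Both functions produce the same branch contents; the difference is only in whether the data is held in the pipeline or written out and scanned again for each branch.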
Here is the paper: http://www.vldb.org/pvldb/2/vldb09-1074.pdf

-Gang

----- Original Message -----
From: Daniel Dai <jiany...@yahoo-inc.com>
To: "pig-dev@hadoop.apache.org" <pig-dev@hadoop.apache.org>
Sent: 2010/7/26 (Mon) 2:09:25 PM
Subject: Re: split operator

Hi Gang,

Which part of the paper are you talking about? We don't do an in-memory split. We dump the split result to a temporary file and start a new map-reduce job. Split does create a map-reduce boundary (though that is not entirely true: the multiquery optimizer may combine some of these jobs).

Daniel

Gang Luo wrote:
> Hi all,
>
> According to the VLDB 09 paper, the split operator and all its successive
> operators reside in memory without any blocking in between. However, the
> source code (version 0.7) shows that an MR job is actually ended when it
> meets the split operator, and multiple new MR jobs are created, each
> representing one branch. This write-once-read-multiple-times method is
> different from the in-memory method mentioned in that paper. Did Pig change
> the strategy for split, or is there still an in-memory version of split that
> I didn't discover?
>
> Thanks,
> -Gang