When I use the distributed cache, I find that when the file is larger than 100MB or has more than 10 million records, the file cannot be cached in memory. I tried setting io.sort.mb to 200MB, but it still does not work. Any suggestions would be appreciated. Thank you!
2012-11-16

From: yingnan.ma
Sent: 2012-11-15 11:48:04
To: user
Cc:
Subject: Re: Re: distributed cache

Thank you so much! Both the replicated join and a UDF that uses the distributed cache are useful for me; I have already got it working. Thank you again.

2012-11-15
yingnan.ma

From: Prashant Kommireddi
Sent: 2012-11-15 03:52:09
To: [email protected]
Cc:
Subject: Re: distributed cache

If it's for purposes other than a join, you could write a UDF to use the distributed cache. Look at the section "Loading the Distributed Cache":
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html

On Wed, Nov 14, 2012 at 11:44 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> Maybe this is what you are looking for:
> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
> see "Replicated join"
>
> On Tue, Nov 13, 2012 at 11:46 AM, yingnan.ma <[email protected]> wrote:
> >
> > Hi,
> >
> > I used the distributed cache in Hadoop through the "setup" method and a
> > "static" field to store a HashSet in memory.
> >
> > I tried to use the distributed cache in Pig, but I don't know how to
> > store a HashSet in memory; I can only cache the file itself.
> >
> > Any advice would be appreciated. Thank you so much!
> >
> > Best Regards
> >
> > Malone
> >
> > 2012-11-13
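For anyone reading this thread later, here is a minimal sketch of the UDF approach Prashant points to: a Pig EvalFunc that registers a lookup file with the distributed cache via getCacheFiles() and loads it into a HashSet the first time exec() runs. The class name InLookup, the HDFS path /user/malone/lookup.txt, and the symlink name "lookup" are placeholders of my own, not anything taken from the thread.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Returns true if the input key appears in a lookup file shipped
// through the distributed cache.
public class InLookup extends EvalFunc<Boolean> {

    private Set<String> keys;

    // Pig ships the HDFS file named here via the distributed cache and
    // creates a symlink called "lookup" in the task's working directory.
    // The path below is only an example.
    @Override
    public List<String> getCacheFiles() {
        return Arrays.asList("/user/malone/lookup.txt#lookup");
    }

    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return false;
        }
        if (keys == null) {
            // Load once per task, then reuse for every record.
            keys = loadKeys("lookup");
        }
        return keys.contains(input.get(0).toString());
    }

    private Set<String> loadKeys(String localFile) throws IOException {
        Set<String> set = new HashSet<String>();
        BufferedReader reader = new BufferedReader(new FileReader(localFile));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                set.add(line.trim());
            }
        } finally {
            reader.close();
        }
        return set;
    }
}

On the Pig side you would REGISTER the jar, DEFINE the function, and call it inside a FILTER or FOREACH like any other boolean UDF. Note that the whole lookup file is held in the task's heap, so a very large file still needs enough task memory (or the replicated-join route mentioned above).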
