When I use the distributed cache, I found that when the file is more than
100MB or the number of records is more than 10 million, the file cannot be
cached in memory. I also tried setting io.sort.mb to 200MB, but it still does
not work. Any suggestions would be welcome. Thank you!
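In case it matters, the property can be set programmatically, e.g. (a minimal
sketch, assuming the job configuration is built in Java; the class name is
just for illustration):

import org.apache.hadoop.conf.Configuration;

public class SortBufferExample {
    public static void main(String[] args) {
        // minimal sketch: io.sort.mb is the map-side sort buffer size, in MB
        Configuration conf = new Configuration();
        conf.setInt("io.sort.mb", 200);
        System.out.println(conf.get("io.sort.mb"));
    }
}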
 


2012-11-16 




From: yingnan.ma 
Sent: 2012-11-15 11:48:04 
To: user 
Cc: 
Subject: Re: Re: distributed cache 
 
Thank you so much! Both the replicated join and a UDF that uses the
distributed cache are useful for me; I have already done it. Thank you again.
2012-11-15 
yingnan.ma 
From: Prashant Kommireddi 
Sent: 2012-11-15 03:52:09 
To: [email protected] 
Cc: 
Subject: Re: distributed cache 

If it's for purposes other than a join, you could write a UDF that uses the
distributed cache. Look at the section "Loading the Distributed Cache":
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html
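Something along these lines, for example (a minimal sketch only; the lookup
path, symlink name, class name, and field layout are made up for illustration,
and EvalFunc.getCacheFiles() is the hook described in that section):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative only: membership test against a lookup file shipped via the
// distributed cache. Path, symlink, and class name are placeholders.
public class InLookupSet extends EvalFunc<Boolean> {

    private Set<String> keys = null;

    @Override
    public List<String> getCacheFiles() {
        // "#lookup" exposes the HDFS file in the task working dir as ./lookup
        return Collections.singletonList("/user/me/lookup.txt#lookup");
    }

    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        if (keys == null) {              // load once per task, like setup()
            keys = new HashSet<String>();
            BufferedReader reader = new BufferedReader(new FileReader("./lookup"));
            String line;
            while ((line = reader.readLine()) != null) {
                keys.add(line.trim());
            }
            reader.close();
        }
        return keys.contains((String) input.get(0));
    }
}

The "path#lookup" form asks Pig to ship the HDFS file to each task and make it
available locally as ./lookup, so the UDF can load it into a HashSet once and
reuse it for every call. From the Pig script side, the class is registered and
invoked like any other UDF.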
On Wed, Nov 14, 2012 at 11:44 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> Maybe this is what you are looking for:
> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
> see "Replicated join"
>
>
> On Tue, Nov 13, 2012 at 11:46 AM, yingnan.ma <[email protected]>
> wrote:
>
> > Hi ,
> >
> > I used the distributed cache in Hadoop through the "setup" method and a
> > "static" field to store a HashSet in memory.
> >
> > Now I am trying to use the distributed cache in Pig, but I don't know how
> > to store a HashSet in memory; I can only cache the file itself.
> >
> > Any advice would be appreciated. Thank you so much!
> >
> > Best Regards
> >
> > Malone
> >
> > 2012-11-13
> >
> >
> >
>
