Thanks Eric,
I was afraid that would be the case.  If I understand you correctly, putting a 
GB file into Accumulo would be a bad idea.  Given that, are there any 
strategies for ensuring that a given file in HDFS is co-located with the 
index info for that file in Accumulo? (I would assume not.)  In my case, I 
could use Accumulo to store my indexes for fast query and have queries 
return a URL/URI to the actual file.  However, I have to process each of those 
files further to get to my final result, and I was hoping to do that second 
stage of processing without having to return intermediate results.  Am I 
correct in assuming that this can't be done?
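(For anyone following the thread later: the pointer pattern described above can be sketched in plain Java. This is a toy illustration only, no Accumulo client calls; the class name, the buildIndexEntries method, and the term/doc key layout are made up for illustration and are not the wikisearch schema.)

```java
import java.util.ArrayList;
import java.util.List;

public class PointerIndexDemo {
    // One flattened index entry: row = term, colFam = "doc",
    // colQual = docId, value = HDFS URI of the actual file.
    // Storing only a pointer keeps each entry tiny, instead of
    // pushing the multi-GB document itself into Accumulo.
    record Entry(String row, String cf, String cq, String value) {}

    static List<Entry> buildIndexEntries(String docId, String hdfsUri,
                                         List<String> terms) {
        List<Entry> entries = new ArrayList<>();
        for (String term : terms) {
            entries.add(new Entry(term, "doc", docId, hdfsUri));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<Entry> e = buildIndexEntries("doc42",
            "hdfs://namenode/data/doc42.txt",
            List.of("accumulo", "hbase"));
        System.out.println(e.size());                       // 2
        System.out.println(e.get(0).row() + " -> " + e.get(0).value());
    }
}
```

A query would scan these entries, collect the URIs, and the second processing stage would then read the files from HDFS directly.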

Thanks,
Tejay

From: Eric Newton [mailto:[email protected]]
Sent: Thursday, August 23, 2012 3:06 PM
To: [email protected]
Subject: Re: EXTERNAL: Re: Large files in Accumulo

An entire mutation needs to fit in memory several times over, so you should not 
attempt to push a single mutation larger than 100MB unless you have a lot 
of memory on your tserver/logger.

And while I'm at it, large keys will create large indexes, so try to keep your 
(row,cf,cq,cv) under 100K.
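(One common way to stay under both limits is to chunk a large value into fixed-size pieces, each written as its own small mutation under its own column qualifier. A minimal sketch of that key scheme in plain Java follows; it is not Accumulo client code, and the names ChunkDemo, chunkKeys, and CHUNK_SIZE are assumptions for illustration.)

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
    // Illustrative chunk size; a real deployment would tune this well
    // below the ~100MB-per-mutation guidance above.
    static final int CHUNK_SIZE = 64 * 1024;

    // Split a large value into (row, colQual, chunkLength) triples, one
    // per mutation, so no single mutation holds the whole file and
    // every key stays small.
    static List<String[]> chunkKeys(String docId, byte[] value) {
        List<String[]> chunks = new ArrayList<>();
        for (int off = 0, i = 0; off < value.length; off += CHUNK_SIZE, i++) {
            int end = Math.min(off + CHUNK_SIZE, value.length);
            // Zero-padded chunk index keeps chunks in scan order.
            String cq = String.format("chunk_%08d", i);
            chunks.add(new String[] {docId, cq, Integer.toString(end - off)});
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] fakeFile = new byte[200 * 1024];  // pretend 200KB document
        List<String[]> chunks = chunkKeys("doc123", fakeFile);
        System.out.println(chunks.size());       // 4 chunks of <= 64KB
        System.out.println(chunks.get(0)[1]);    // chunk_00000000
    }
}
```

A reader reassembles the value by scanning the row and concatenating chunks in qualifier order.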

-Eric
On Thu, Aug 23, 2012 at 4:37 PM, Cardon, Tejay E 
<[email protected]> wrote:
In my case I'll be doing a document based index store (like the wikisearch 
example), but my documents may be as large as several GB.  I just wanted to 
pick the collective brain of the group to see if I'm walking into a major 
headache.  If it's never been tried before, then I'll give it a shot and report 
back.

Tejay

From: William Slacum [mailto:[email protected]]
Sent: Thursday, August 23, 2012 2:07 PM
To: [email protected]<mailto:[email protected]>
Subject: EXTERNAL: Re: Large files in Accumulo

Are these RFiles as a whole? I know at some point HBase needed to have entire 
rows fit into memory; Accumulo does not have this restriction.
On Thu, Aug 23, 2012 at 12:55 PM, Cardon, Tejay E 
<[email protected]> wrote:
Alright, this one's a quick question.  I've been told that HBase does not 
perform well if large (> 100MB) files are stored in it.  Does Accumulo have 
similar trouble?  If so, can it be overcome by storing the large files in their 
own locality group?

Thanks,
Tejay
