John, let me make sure I understand.  When a tserver is running on the same 
physical box as a datanode, it will write to the local disk.  HDFS will then 
replicate that write across the network.  When I read from that tserver, it 
will not need a network read (assuming no failures).  Is that correct?

Thanks,
Tejay

From: John Vines [mailto:[email protected]]
Sent: Monday, June 25, 2012 11:46 AM
To: [email protected]
Subject: EXTERNAL: Re: Accumulo and file locality

When a tserver writes, it writes out to HDFS. When you use the HDFS API, the
first replica of the data is written to the local datanode, if one is running
on the same host. So it does go to local disk, but it goes to local disk as
well as to other nodes via their datanodes. Each tserver should therefore run
alongside a datanode so you actually get locality. If there is no datanode
where the tserver is, then all reads and writes go over the network, which is
suboptimal.
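
To make the point concrete, here is a rough toy model of that placement
behavior. It is only a sketch: place_replicas, read_is_local, and the host
names are made up for illustration and are not part of the HDFS or Accumulo
APIs, and the model ignores rack awareness entirely.

```python
import random


def place_replicas(writer_host, datanodes, replication=3, rng=None):
    """Toy placement: the first replica goes to the writer's own datanode
    when one exists; the remaining replicas go to other datanodes.
    (Hypothetical helper; real HDFS also takes racks into account.)"""
    rng = rng or random.Random(0)
    nodes = sorted(datanodes)
    if writer_host in datanodes:
        first = writer_host
    else:
        # No datanode on the writer's host, so every replica is remote.
        first = rng.choice(nodes)
    others = [d for d in nodes if d != first]
    rest = rng.sample(others, min(replication - 1, len(others)))
    return [first] + rest


def read_is_local(reader_host, replicas):
    """A read avoids the network only if a replica lives on the reader's host."""
    return reader_host in replicas


datanodes = {"node1", "node2", "node3", "node4"}

# tserver co-located with a datanode: one replica is always local.
colocated = place_replicas("node1", datanodes)
print(read_is_local("node1", colocated))

# tserver on a host without a datanode: every read crosses the network.
remote = place_replicas("tserver-only", datanodes)
print(read_is_local("tserver-only", remote))
```

In this model the co-located tserver always finds a replica on its own host,
while the tserver without a datanode never does.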

John

On Mon, Jun 25, 2012 at 1:21 PM, William Slacum
<[email protected]> wrote:
The loggers will write to local disk; however, the TabletServer will
write out files to HDFS during major and minor compactions.

I don't know how complex the tablet assignment algorithm is, but it's
safe to assume that if your tablet spans multiple HDFS blocks, a
TServer will, in all likelihood, only be hosting 1 HDFS block of a
given tablet at any given time, and do fetches for other HDFS (and
RFile) blocks as the need arises. There is a caching mechanism for
holding on to RFile blocks.

On Mon, Jun 25, 2012 at 10:13 AM, Cardon, Tejay E
<[email protected]> wrote:
> All,
>
> If I understand things correctly, when an Accumulo tablet
> server writes data, things are organized such that those writes go to the
> local disk (i.e., each tablet server writes and reads data to/from the disk
> local to that server).  Is this correct?  And if so, then is it correct to
> assume that every tablet server should run on an HDFS data node?  Or am I
> completely off base here?
>
>
>
> Thanks,
>
> Tejay Cardon
