Yep, thanks for the correction, Christopher.

On Fri, Sep 10, 2021 at 9:15 AM Christopher <ctubb...@apache.org> wrote:

> One correction to what Mike said. The last location column doesn't store
> where it was last hosted. The current location column does that. Rather,
> the last location column stores where it was hosted when it last wrote to
> HDFS. The goal is what Mike said: it tries to provide a mechanism for
> preserving locality during reboots. However, it may not work very well,
> especially since the last written file may be very small, and only a tiny
> fraction of the tablet's overall data.
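
To put rough numbers on that caveat, here is a minimal sketch. Everything in it is hypothetical: the host names, file sizes, and replica placements are made up, and local_fraction is an illustrative helper, not an Accumulo or HDFS API. It only shows the arithmetic of why the host recorded at the last write may hold very little of the tablet's data.

```python
# Hypothetical illustration: how much of a tablet's data is local to a
# candidate tablet server? Host names, sizes, and placements are made up.

def local_fraction(files, host):
    """Return the fraction of bytes in `files` that have a replica on `host`.

    `files` is a list of (size_in_bytes, set_of_replica_hosts) tuples.
    """
    total = sum(size for size, _ in files)
    if total == 0:
        return 0.0
    local = sum(size for size, hosts in files if host in hosts)
    return local / total


# One large file written by a compaction while the tablet was hosted on ts1,
# plus one small file written after the tablet moved to ts2.
tablet_files = [
    (1_000_000_000, {"ts1", "dn7", "dn9"}),  # large rfile, one replica on ts1
    (5_000_000, {"ts2", "dn3", "dn4"}),      # small, most recently written file
]

# In this scenario the last location column would point at ts2, the host
# at the time of the last write to HDFS.
for candidate in ("ts1", "ts2"):
    print(candidate, round(local_fraction(tablet_files, candidate), 4))
```

With these made-up numbers, ts2 has about 0.5% of the tablet's bytes local while ts1 has about 99.5%, so assigning the tablet back to its last location would recover almost no locality.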
>
> On Fri, Sep 10, 2021, 09:03 Michael Wall <mjw...@apache.org> wrote:
>
>> If a tablet moves, the data files in HDFS do not go with it.  However,
>> during the next compaction, one copy of the new rfile should be written
>> locally.
>>
>> Note, the metadata has a last column for each tablet, to record where the
>> tablet was last hosted.  On startup, Accumulo will try to assign a tablet to
>> that last location, if possible, to take advantage of the locality.
>>
>> To really take advantage of the data locality, you should configure Short
>> Circuit reads in HDFS. See
>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
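
For reference, a minimal hdfs-site.xml fragment along the lines of the linked page might look like the following. The two property names come from the Hadoop short-circuit documentation; the socket path is only an example value and is an assumption about your layout. The setting has to be visible to both the DataNodes and the HDFS client (here, the tablet servers), and it relies on the native libhadoop library being available.

```xml
<configuration>
  <!-- Enable short-circuit local reads. -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- UNIX domain socket shared by the DataNode and local clients.
       Example path only; adjust for your installation. -->
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```
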
>>
>> On Fri, Sep 10, 2021 at 8:47 AM Shailesh Ligade <slig...@fbi.gov> wrote:
>>
>>> Thank you,
>>>
>>>
>>>
>>> Is there a way to maintain that data locality? I mean, over time, with table
>>> splitting, HDFS rebalancing, etc., we may not have data locality…
>>>
>>>
>>>
>>> Thanks again
>>>
>>>
>>>
>>> -S
>>>
>>>
>>>
>>> *From:* Christopher <ctubb...@apache.org>
>>> *Sent:* Friday, September 10, 2021 8:40 AM
>>> *To:* accumulo-user <user@accumulo.apache.org>
>>> *Subject:* [EXTERNAL EMAIL] - Re: accumulo and hdfs data locality
>>>
>>>
>>>
>>> Data locality and simplified deployments are the only reasons I can
>>> think of. Accumulo doesn't do anything particularly special for data
>>> locality, but typically, an HDFS client (like Accumulo servers) will (or
>>> can be configured to) write one copy of any new blocks locally, which
>>> should permit efficient reads later. This works well with Accumulo's
>>> hosting behavior, where each tablet is hosted on a single server solely
>>> responsible for its reads and writes.
>>>
>>>
>>>
>>> On Fri, Sep 10, 2021, 07:22 Shailesh Ligade <slig...@fbi.gov> wrote:
>>>
>>> Hello, I am using Hadoop 3.3 and Accumulo 1.10. Does Accumulo take
>>> advantage of Hadoop data locality? What are the other benefits of having
>>> tserver and datanode processes on the same instance?
>>>
>>>
>>>
>>> -S
>>>
>>>
>>>
>>>
>>>
>>>
