Yep, thanks for the correction Christopher.

On Fri, Sep 10, 2021 at 9:15 AM Christopher <ctubb...@apache.org> wrote:
> One correction to what Mike said. The last location column doesn't store
> where it was last hosted. The current location column does that. Rather,
> the last location column stores where it was hosted when it last wrote to
> HDFS. The goal is what Mike said: it tries to provide a mechanism for
> preserving locality during reboots. However, it may not work very well,
> especially since the last written file may be very small, and only a tiny
> fraction of the tablet's overall data.
>
> On Fri, Sep 10, 2021, 09:03 Michael Wall <mjw...@apache.org> wrote:
>
>> If a tablet moves, the data files in HDFS do not go with it. However,
>> during the next compaction one copy of the rfile should be written locally.
>>
>> Note, the metadata has a last column for each tablet, to record where the
>> tablet was last hosted. On startup, Accumulo will try to assign a tablet to
>> that last location if possible to hopefully take advantage of the locality.
>>
>> To really take advantage of the data locality, you should configure Short
>> Circuit reads in HDFS. See
>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
>>
>> On Fri, Sep 10, 2021 at 8:47 AM Shailesh Ligade <slig...@fbi.gov> wrote:
>>
>>> Thank you,
>>>
>>> Is there a way to maintain that data locality? I mean, over time, with table
>>> splitting, HDFS rebalancing, etc., we may not have data locality…
>>>
>>> Thanks again
>>>
>>> -S
>>>
>>> *From:* Christopher <ctubb...@apache.org>
>>> *Sent:* Friday, September 10, 2021 8:40 AM
>>> *To:* accumulo-user <user@accumulo.apache.org>
>>> *Subject:* [EXTERNAL EMAIL] - Re: accumulo and hdfs data locality
>>>
>>> Data locality and simplified deployments are the only reasons I can
>>> think of. Accumulo doesn't do anything particularly special for data
>>> locality, but typically, an HDFS client (like Accumulo servers) will (or
>>> can be configured to) write one copy of any new blocks locally, which
>>> should permit efficient reads later. This works well with Accumulo's
>>> hosting behavior, where each tablet is hosted on a single server solely
>>> responsible for its reads and writes.
>>>
>>> On Fri, Sep 10, 2021, 07:22 Shailesh Ligade <slig...@fbi.gov> wrote:
>>>
>>> Hello, I am using Hadoop 3.3 and Accumulo 1.10. Does Accumulo take
>>> advantage of Hadoop data locality? What are the other benefits of having
>>> tserver and datanode processes on the same instance?
>>>
>>> -S
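
For reference, a minimal sketch of the short-circuit read settings mentioned in the thread, written as an hdfs-site.xml fragment. The two property names are the ones described on the Hadoop ShortCircuitLocalReads page linked above; the socket path is only an example value and is an assumption, so adjust it to your installation. The settings need to be visible to both the DataNodes and the HDFS clients (here, the tablet servers).

```xml
<!-- hdfs-site.xml fragment: a sketch only; the socket path is an assumed example -->
<configuration>
  <property>
    <!-- Let a client read local block files directly instead of going through the DataNode -->
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <!-- UNIX domain socket shared by the DataNode and local clients (example path) -->
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```

Short-circuit reads also depend on the native libhadoop library being available, and they only help when the tablet server and the DataNode holding the block are on the same host, which is the locality discussed in this thread.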