One correction to what Mike said. The last location column doesn't store where the tablet was last hosted; the current location column does that. Rather, the last location column stores where the tablet was hosted when it last wrote to HDFS. The goal is what Mike said: it tries to provide a mechanism for preserving locality across reboots. However, it may not preserve much locality in practice, especially since the last written file may be very small and represent only a tiny fraction of the tablet's overall data.
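To make the difference between the current and last location columns concrete, here is a simplified Java model. It is a hypothetical sketch, not Accumulo's actual metadata or assignment code: the names `Tablet`, `host`, `writeFile`, `unload`, `chooseServer` and `localFraction` are invented for illustration. It also assumes that the writing server's co-located DataNode keeps one replica of every file it writes, and it ignores HDFS rebalancing and replica loss.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LastLocationSketch {

    // A data file (e.g. an RFile) and the tablet server that wrote it.
    // Assumption: the writer's co-located DataNode holds one replica of it.
    record DataFile(String writer, long bytes) {}

    // Hypothetical, simplified stand-in for a tablet's metadata entry.
    static final class Tablet {
        String currentLocation; // server hosting the tablet right now (null if unhosted)
        String lastLocation;    // server hosting it when it last wrote a file to HDFS
        final List<DataFile> files = new ArrayList<>();

        // Assigning the tablet changes only the current location.
        void host(String server) {
            currentLocation = server;
        }

        // Writing a file (e.g. during a compaction) also updates the last location.
        void writeFile(long bytes) {
            files.add(new DataFile(currentLocation, bytes));
            lastLocation = currentLocation;
        }

        // Unhosting (e.g. at shutdown) clears the current location; last location persists.
        void unload() {
            currentLocation = null;
        }

        // Fraction of the tablet's bytes that `server` wrote, i.e. bytes with a local replica there.
        double localFraction(String server) {
            long total = 0;
            long local = 0;
            for (DataFile f : files) {
                total += f.bytes();
                if (server.equals(f.writer())) {
                    local += f.bytes();
                }
            }
            return total == 0 ? 0.0 : (double) local / total;
        }
    }

    // Hypothetical startup assignment: prefer the last location if that server is alive.
    static String chooseServer(Tablet t, Set<String> liveServers, String fallback) {
        if (t.lastLocation != null && liveServers.contains(t.lastLocation)) {
            return t.lastLocation;
        }
        return fallback;
    }

    public static void main(String[] args) {
        Tablet t = new Tablet();
        t.host("tserver1");
        t.writeFile(10_000); // a large file written while hosted on tserver1
        t.host("tserver2");  // the tablet moves; its existing files stay put
        t.writeFile(100);    // a small file written on tserver2
        t.host("tserver3");  // moves again but writes nothing before shutdown
        t.unload();

        String chosen = chooseServer(t, Set.of("tserver1", "tserver2", "tserver3"), "tserver3");
        System.out.println("last location: " + t.lastLocation);
        System.out.println("assigned to:   " + chosen);
        System.out.printf("local fraction on %s: %.3f%n", chosen, t.localFraction(chosen));
    }
}
```

In this run the tablet was most recently hosted on `tserver3`, but its last location is `tserver2`, and only the 100-byte file out of 10,100 bytes (roughly 1%) was written there. Reassigning to the last location therefore recovers very little locality, which is the limitation described in the correction.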
On Fri, Sep 10, 2021, 09:03 Michael Wall <mjw...@apache.org> wrote:

> If a tablet moves, the data files in HDFS do not go with it. However,
> during the next compaction one copy of the rfile should be written locally.
>
> Note, the metadata has a last column for each tablet, to record where the
> tablet was last hosted. On startup, Accumulo will try to assign a tablet to
> that last location if possible to hopefully take advantage of the locality.
>
> To really take advantage of the data locality, you should configure Short
> Circuit reads in HDFS. See
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
>
> On Fri, Sep 10, 2021 at 8:47 AM Shailesh Ligade <slig...@fbi.gov> wrote:
>
>> Thank you,
>>
>> Is there a way to maintain that data locality? I mean, over time, with table
>> splitting, HDFS rebalancing, etc., we may not have data locality…
>>
>> Thanks again
>>
>> -S
>>
>> *From:* Christopher <ctubb...@apache.org>
>> *Sent:* Friday, September 10, 2021 8:40 AM
>> *To:* accumulo-user <user@accumulo.apache.org>
>> *Subject:* [EXTERNAL EMAIL] - Re: accumulo and hdfs data locality
>>
>> Data locality and simplified deployments are the only reasons I can think
>> of. Accumulo doesn't do anything particularly special for data locality,
>> but typically, an HDFS client (like Accumulo servers) will (or can be
>> configured to) write one copy of any new blocks locally, which should
>> permit efficient reads later. This works well with Accumulo's hosting
>> behavior, where each tablet is hosted on a single server solely responsible
>> for its reads and writes.
>>
>> On Fri, Sep 10, 2021, 07:22 Shailesh Ligade <slig...@fbi.gov> wrote:
>>
>> Hello, I am using Hadoop 3.3 and Accumulo 1.10. Does Accumulo take
>> advantage of Hadoop data locality? What are the other benefits of having
>> the tserver and datanode processes on the same instance?
>>
>> -S