See InputFormatBase#setScanOffline. Clone a table, take the clone offline, and then use the clone as the input table for your map/reduce job. This gives the job a consistent view of the underlying files and reads them directly from HDFS, without going through the tablet servers.
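The clone/offline/scan workflow might look roughly like the sketch below, assuming the Accumulo 1.4 Java client and Hadoop's `org.apache.hadoop.mapreduce` API. It has not been run against a cluster. The class and helper names (`OfflineScanSetup`, `snapshotOffline`, `configureInput`) are hypothetical, and so are all instance, table, and credential values. The Accumulo calls are the 1.4 API; later releases renamed and reshaped these input-format setters.

```java
import java.util.Collections;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;

public class OfflineScanSetup {

  // Hypothetical helper: clone `table` into `snapshot` and take the clone offline.
  public static void snapshotOffline(String instanceName, String zooKeepers, String user,
      byte[] password, String table, String snapshot) throws Exception {
    Connector conn = new ZooKeeperInstance(instanceName, zooKeepers).getConnector(user, password);
    // flush=true flushes the source table's in-memory entries to files before cloning,
    // so the clone references a complete set of files; no table properties are changed.
    conn.tableOperations().clone(table, snapshot, true,
        Collections.<String, String>emptyMap(), Collections.<String>emptySet());
    // Only the clone goes offline; the source table keeps serving reads and writes.
    conn.tableOperations().offline(snapshot);
  }

  // Hypothetical helper: point AccumuloInputFormat at the offline clone.
  public static void configureInput(Configuration conf, String instanceName, String zooKeepers,
      String user, byte[] password, String snapshot) {
    AccumuloInputFormat.setZooKeeperInstance(conf, instanceName, zooKeepers);
    AccumuloInputFormat.setInputInfo(conf, user, password, snapshot, new Authorizations());
    // Map tasks read the clone's files directly from HDFS instead of asking tablet servers.
    AccumuloInputFormat.setScanOffline(conf, true);
  }
}
```

A job would then call `job.setInputFormatClass(AccumuloInputFormat.class)` and pass `job.getConfiguration()` to `configureInput`. Because the mappers open the files themselves, they will likely need read access to Accumulo's directory in HDFS.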
-Eric

On Wed, Oct 17, 2012 at 9:46 AM, Denis wrote:
> Hi.
>
> I am thinking about creating a Direct Reader for Accumulo.
>
> A library which has an API compatible with the Accumulo client but
> reads .rf files directly from HDFS, bypassing tservers.
>
> Motivation is:
>
> 1. To have a possibility to quickly read stalled data when the
> tserver is busy (with re-balancing, reading logs, etc.) or just went
> down and its tablets are not redistributed yet.
>
> 2. If the table is read-only or can afford eventual consistency,
> many readers can work in parallel with no tserver bottleneck. Also,
> the table's data becomes local on three servers (the number of HDFS
> replicas) instead of one.
>
> 3. Distribution of data: analysts can download .rf files (even to
> a laptop) and run their software locally.
>
> Any suggestions?
>
> Thanks.
