On Wed, Oct 17, 2012 at 10:57 AM, Eric Newton <[email protected]> wrote:
> See InputFormatBase#setScanOffline.
This uses o.a.a.c.client.impl.OfflineScanner. OfflineScanner scans an offline table by going directly to the files. It does exactly what the tablet server does when reading a tablet's files. I was thinking of making OfflineScanner available through Connector somehow when adding setScanOffline to the M/R code, but did not for some reason. If there is interest, we could revisit this.

>
> Clone a table, take it offline and then use it as your map/reduce
> input format. This will preserve a consistent view of the underlying
> files, without going through the tablet servers.
>
> -Eric
>
> On Wed, Oct 17, 2012 at 9:46 AM, Denis <[email protected]> wrote:
>> Hi.
>>
>> I am thinking about creating a Direct Reader for Accumulo.
>>
>> A library which has an API compatible with the Accumulo client but
>> reads .rf-files directly from HDFS, bypassing tservers.
>>
>> Motivation is:
>>
>> 1. To have a possibility to quickly read stalled data when the
>> tserver is busy (with re-balancing, reading logs, etc.) or just went
>> down and its tablets are not redistributed yet.
>>
>> 2. If the table is read-only or can afford eventual consistency,
>> many readers can work in parallel without the tserver as a bottleneck. Also,
>> the table's data becomes local on three (number of HDFS replicas)
>> servers instead of one.
>>
>> 3. Distribution of data: analytics can download .rf-files (even to
>> a laptop) and run their software locally.
>>
>> Any suggestions?
>>
>> Thanks.
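For anyone wondering what "what the tablet server does when reading a tablet's files" involves at its core: a tablet's data is usually spread over several sorted files, and a reader (OfflineScanner, or a direct reader like the one Denis proposes) has to merge them into one sorted stream. Below is a minimal plain-Java sketch of that k-way merge, assuming each file is an in-memory list already sorted by row and then newest timestamp first, and that only the newest version of each row should be returned. The names `MergedFileReader`, `Entry` and `scan` are hypothetical; this is not Accumulo's RFile or iterator API, and a real scan also has to handle delete markers, column visibilities and the table's configured iterators.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedFileReader {

    /** Hypothetical stand-in for one key/value entry in a sorted file. */
    record Entry(String row, long timestamp, String value) {}

    /** Rows ascending, then newest timestamp first. */
    static final Comparator<Entry> KEY_ORDER =
            Comparator.comparing(Entry::row)
                      .thenComparing(Comparator.comparingLong(Entry::timestamp).reversed());

    /** Read position within one sorted file. */
    private static final class Cursor {
        final Iterator<Entry> it;
        Entry head;

        Cursor(Iterator<Entry> it) {
            this.it = it;
            this.head = it.next();
        }

        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }
    }

    /** K-way merge of sorted files, keeping only the newest version of each row. */
    static List<Entry> scan(List<List<Entry>> files) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> KEY_ORDER.compare(a.head, b.head));
        for (List<Entry> file : files) {
            if (!file.isEmpty()) heap.add(new Cursor(file.iterator()));
        }
        List<Entry> out = new ArrayList<>();
        String lastRow = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Entry e = c.head;
            // Entries leave the heap in global key order, so the first one seen for a row is the newest.
            if (!e.row().equals(lastRow)) {
                out.add(e);
                lastRow = e.row();
            }
            // Re-insert the cursor only after moving it to its next entry.
            if (c.advance()) heap.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> older = List.of(new Entry("a", 1, "a1"), new Entry("c", 1, "c1"));
        List<Entry> newer = List.of(new Entry("a", 5, "a5"), new Entry("b", 5, "b5"));
        for (Entry e : scan(List.of(older, newer))) {
            System.out.println(e.row() + " -> " + e.value());
        }
    }
}
```

Running `main` prints `a -> a5`, `b -> b5`, `c -> c1`: the newer file's version of row `a` shadows the older one. The priority queue keeps each step at O(log k) for k files, which matters when a tablet has accumulated many files.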

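Eric's clone-then-offline suggestion works because a clone does not copy data: its metadata initially points at the same immutable files as the source table. Later flushes and compactions on the source write new files, and a file is only garbage-collected once no table references it, so the clone keeps a fixed, consistent set of files. The toy model below illustrates that reasoning in plain Java. It is a hypothetical sketch: the class, method and file names are invented, and the metadata is modeled as a simple map rather than through Accumulo's TableOperations API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CloneSnapshotModel {
    // Hypothetical stand-in for the metadata table: table name -> files it references.
    private final Map<String, List<String>> tableFiles = new HashMap<>();
    private int nextFileId = 0;

    void createTable(String table) {
        tableFiles.put(table, new ArrayList<>());
    }

    // A flush writes buffered mutations to a brand-new immutable file.
    String flush(String table) {
        String file = "F" + nextFileId++ + ".rf";
        tableFiles.get(table).add(file);
        return file;
    }

    // A full compaction replaces all of a table's files with one new file.
    String compact(String table) {
        String file = "A" + nextFileId++ + ".rf";
        List<String> files = tableFiles.get(table);
        files.clear();
        files.add(file);
        return file;
    }

    // Cloning copies the file references only; no data is rewritten.
    void cloneTable(String source, String clone) {
        tableFiles.put(clone, new ArrayList<>(tableFiles.get(source)));
    }

    List<String> files(String table) {
        return List.copyOf(tableFiles.get(table));
    }

    // A file can be garbage-collected only when no table references it any more.
    boolean deletable(String file) {
        return tableFiles.values().stream().noneMatch(refs -> refs.contains(file));
    }

    public static void main(String[] args) {
        CloneSnapshotModel m = new CloneSnapshotModel();
        m.createTable("events");
        m.flush("events");                      // F0.rf
        m.flush("events");                      // F1.rf
        m.cloneTable("events", "events_snap");  // clone references F0.rf and F1.rf
        m.flush("events");                      // F2.rf, source only
        m.compact("events");                    // A3.rf replaces all source files
        System.out.println("events:      " + m.files("events"));
        System.out.println("events_snap: " + m.files("events_snap"));
        System.out.println("F0.rf deletable? " + m.deletable("F0.rf"));
        System.out.println("F2.rf deletable? " + m.deletable("F2.rf"));
    }
}
```

In the model, the source ends up with only `A3.rf` while the clone still sees `F0.rf` and `F1.rf`, and `F0.rf` cannot be deleted because the clone references it. Taking the clone offline adds the other half of the guarantee: no tablet server hosts the clone, so nothing flushes or compacts its files while a map/reduce job reads them directly.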