That is correct. There is no effort expended to ensure locality of bulk-imported files.
-Eric

On Wed, Oct 7, 2015 at 9:50 AM, Jeff Kubina <[email protected]> wrote:

> So if the HDFS has a replication factor of m and an r-file has a range
> that intersects n tablets, then data-locality will never be achieved for
> max(0,n-m) of the r-files, that is, they will never be on the same node as
> their tablet server until compaction, correct?
>
> --
> Jeff Kubina
> 410-988-4436
>
> On Wed, Oct 7, 2015 at 9:35 AM, Josh Elser <[email protected]> wrote:
>
>> On Oct 7, 2015 8:47 AM, "Jeff Kubina" <[email protected]> wrote:
>> >
>> > How does Accumulo process an r-file for bulk ingesting when the key
>> > range of an r-file is within one tablet's key range and when the key
>> > range of an r-file spans two or more tablets?
>> >
>> > If the r-file is within one tablet's range I thought the file was
>> > "just renamed" and added to the tablet's list of r-files. Is that
>> > correct?
>>
>> Bingo
>>
>> > If the key range of the r-file spans two or more files is the r-file
>> > partitioned into separate r-files for each appropriate tablet server
>> > or are the records "batch-written" to each appropriate tablet in
>> > memory?
>>
>> They're logically partitioned if memory serves (the files are not
>> rewritten). So you would see multiple entries in the metadata table for a
>> single file with certain offsets. No replaying of mutations by batch
>> writers.
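To make the behavior described in this thread concrete, here is a toy Python model. It is a sketch under stated assumptions, not Accumulo's bulk-import code. It assumes tablets are defined by sorted split rows, each tablet covering rows greater than the previous split and up to and including its own split, and it reduces an r-file to its first and last row. The names `MetadataEntry`, `tablet_index`, `assign_bulk_file`, `min_nonlocal_tablets` and the `/bulk/...` paths are hypothetical, chosen only for illustration. The locality bound further assumes every tablet is hosted on a different tablet server and every HDFS replica sits on a different node.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    """Hypothetical stand-in for one metadata-table row: tablet -> file."""
    tablet_index: int
    file_path: str


def tablet_index(splits, row):
    """Return the index of the tablet that holds `row`.

    Assumes tablet i covers (splits[i-1], splits[i]]: the end row is
    inclusive and the previous end row exclusive, so the index equals the
    number of split rows strictly less than `row`.
    """
    return bisect_left(splits, row)


def assign_bulk_file(splits, file_path, first_row, last_row):
    """Map one bulk file onto every tablet its key range overlaps.

    The file is never rewritten in this model: a file inside one tablet
    yields a single entry (the "just renamed" case), and a file spanning
    n tablets yields n entries that all reference the same path.
    """
    lo = tablet_index(splits, first_row)
    hi = tablet_index(splits, last_row)
    return [MetadataEntry(i, file_path) for i in range(lo, hi + 1)]


def min_nonlocal_tablets(n_tablets, replication):
    """Lower bound on tablets that cannot read the file from a local replica.

    At most `replication` distinct nodes hold a copy, so if every tablet is
    on a different server (an assumption), at least n - m of them are remote.
    """
    return max(0, n_tablets - replication)


if __name__ == "__main__":
    splits = ["g", "n", "t"]  # four tablets: <=g, (g,n], (n,t], >t
    one = assign_bulk_file(splits, "/bulk/a.rf", "h", "k")
    many = assign_bulk_file(splits, "/bulk/b.rf", "a", "z")
    print(len(one), len(many))  # 1 4
    print(min_nonlocal_tablets(len(many), replication=3))  # 1
```

With a replication factor of 3 and a file spanning four tablets, at least one of those tablets reads the file remotely until a compaction rewrites its data, which matches the max(0,n-m) arithmetic confirmed above.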
