Are you working to ingest a large number of files into Accumulo?
On Thu, Dec 5, 2013 at 11:30 PM, David Medinets <[email protected]> wrote:

> After ingesting a few million files using the method in the Accumulo File
> System Archive (http://accumulo.apache.org/1.4/examples/dirlist.html), we
> ran into a problem reading the information back out of Accumulo. I forget
> the error, but I resolved it by using DigestUtils.md5Hex instead of
> DigestUtils.md5, which stored the MD5 as a hex string instead of a binary
> value. We did not dig into what caused the error; we just side-stepped it.
>
> On Wed, Dec 4, 2013 at 11:37 PM, Chris Carrino <[email protected]> wrote:
>
>> The org.apache.accumulo.examples.simple.filedata.FileDataIngest class
>> generates LOWERCASE hash keys via the hexString() method, and uses them as
>> row IDs for storing file chunks in Accumulo. Note that NIST uses
>> UPPERCASE hash keys in the Reference Data Set (RDS). See
>> http://www.nsrl.nist.gov/ for the RDS. Both approaches are valid since
>> the hexadecimal representation of the key is not case sensitive - but make
>> sure you normalize to one case if you are comparing the keys generated in
>> the FileDataIngest class to the RDS keys.
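For reference, here is a minimal sketch of the two points raised in the quoted messages: storing the MD5 as a hex string rather than raw bytes, and normalizing case before comparing against uppercase keys such as those in the RDS. It uses only the JDK's MessageDigest so that it runs without extra jars. The class and method names (HashKeyExample, toHex, md5Hex, sameKey) are made up for illustration and are not taken from FileDataIngest or from Commons Codec.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public class HashKeyExample {

    // Hex-encode raw digest bytes as a lowercase string (two characters per byte).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // MD5 of the input as a lowercase hex string, i.e. a printable value
    // instead of the 16 raw bytes that MessageDigest.digest() returns.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        return toHex(MessageDigest.getInstance("MD5").digest(data));
    }

    // Compare two hex keys after normalizing both to lowercase, so an
    // uppercase key and a lowercase key for the same hash are treated as equal.
    static boolean sameKey(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String lower = md5Hex("hello".getBytes(StandardCharsets.UTF_8));
        String upper = lower.toUpperCase(Locale.ROOT);

        System.out.println(lower);                 // lowercase hex key
        System.out.println(upper);                 // same hash, uppercase
        System.out.println(lower.equals(upper));   // plain equals is case sensitive
        System.out.println(sameKey(lower, upper)); // equal once normalized
    }
}
```

Whichever case you pick, the important part is to apply the same normalization on both sides of the comparison, and to do it before the key is used as a row ID, since row lookups are exact byte matches.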
