Are the Hadoop nodes handling your MapReduce job also running tservers? Do the Accumulo log files show the exception? If so, can you post it?
On Wed, Dec 23, 2015 at 9:12 AM, Jeff Kubina <jeff.kub...@gmail.com> wrote:
> I have a MapReduce job that reads RFiles as Accumulo key/value pairs using
> FileSKVIterator within a RecordReader, partitions/shuffles them based on
> the byte string of the key, and writes them out as new RFiles using
> AccumuloFileOutputFormat. The objective is to create larger RFiles for bulk
> ingesting and to minimize the number of tservers each RFile is assigned to
> after it is bulk ingested.
>
> For tables with a simple schema it works fine, but for tables with a complex
> schema the new RFiles cause the tservers to throw a NullPointerException
> during a compaction.
>
> Is there more to an RFile than just the key/value pairs that I am missing?
>
> If I compute an order-independent checksum of the bytes of the key/value
> pairs in the original RFiles and the new RFiles, shouldn't they be the same?
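
For anyone following along, here is a minimal sketch of the write side of the kind of job described above. It is not the poster's actual code: the input format, RecordReader, and partitioner (RFileInputFormat, KeyBytesPartitioner) are hypothetical placeholders for the FileSKVIterator-based pieces mentioned in the email; only the reducer and the AccumuloFileOutputFormat wiring are shown concretely.

import java.io.IOException;

import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class RewriteRFilesJob {

  // Identity reduce: Key/Value pairs arrive grouped and sorted by Key and
  // are written straight into the new, larger RFiles.
  public static class WriteReducer extends Reducer<Key, Value, Key, Value> {
    @Override
    protected void reduce(Key key, Iterable<Value> values, Context ctx)
        throws IOException, InterruptedException {
      for (Value v : values) {
        ctx.write(key, v);
      }
    }
  }

  public static Job createJob(Configuration conf, Path output) throws IOException {
    Job job = Job.getInstance(conf, "rewrite rfiles");
    job.setJarByClass(RewriteRFilesJob.class);

    // Input side: a custom InputFormat whose RecordReader uses FileSKVIterator
    // to stream Key/Value pairs out of the existing RFiles (hypothetical class).
    // job.setInputFormatClass(RFileInputFormat.class);

    // Shuffle on the serialized bytes of the Key (hypothetical partitioner).
    // job.setPartitionerClass(KeyBytesPartitioner.class);
    job.setMapOutputKeyClass(Key.class);
    job.setMapOutputValueClass(Value.class);

    job.setReducerClass(WriteReducer.class);
    job.setOutputKeyClass(Key.class);
    job.setOutputValueClass(Value.class);

    // Output side: AccumuloFileOutputFormat writes RFiles suitable for bulk import.
    job.setOutputFormatClass(AccumuloFileOutputFormat.class);
    AccumuloFileOutputFormat.setOutputPath(job, output);
    return job;
  }
}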