Sven,

You might consider using a combination of AccumuloInputFormat and AccumuloFileOutputFormat in a map/reduce job. The job will run in parallel, speeding up your transformation; the map/reduce framework should help with the hiccups; and the bulk load at the end provides an atomic, eventually consistent commit. These input/output formats can also be used with other job frameworks such as Spark. See for example:
examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/TableToFile.java
examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/bulk/BulkIngestExample.java

(Rough sketches of such a job and the bulk-load step follow after the quoted message below.)

Cheers,
Adam

On Wed, Jun 21, 2017 at 1:49 AM, Sven Hodapp <[email protected]> wrote:
> Hi there,
>
> I would like to select a subset of an Accumulo table and refactor the keys
> to create a new table.
> There are about 30M records with a value size of about 5-20KB each.
> I'm using Accumulo 1.8.0 and the Java accumulo-core client library 1.8.0.
>
> I've written client code like this:
>
> * create a scanner fetching a specific column in a specific range
> * transform the key into the new schema
> * use a batch writer to write the newly generated mutations into the new
>   table
>
> scan = createScanner(FROM, auths)
> // range, fetchColumn
> writer = createBatchWriter(TO, configWriter)
> iter = scan.iterator()
> while (iter.hasNext()) {
>   entry = iter.next()
>   // create mutation with new key schema, but unaltered value
>   writer.addMutation(mutation)
> }
> writer.close()
>
> But this is slow and error prone (hiccups, ...).
> Is it possible to use the Accumulo shell for such a task?
> Are there other solutions or tricks I can use?
>
> Thank you very much for any advice!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> [email protected]
> www.scai.fraunhofer.de
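To make Adam's suggestion more concrete, here is a minimal sketch of such a job against the Accumulo 1.8 map/reduce API, loosely modeled on the TableToFile and BulkIngestExample examples above. The class name, instance/ZooKeeper settings, credentials, table names, output path, and the key transformation are placeholders to replace with your own; treat it as an outline under those assumptions, not a tested implementation.

import java.io.IOException;

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RekeyTableJob extends Configured implements Tool {

  // Reads (Key, Value) pairs from the source table and emits them under the
  // new key schema; values pass through unaltered.
  public static class RekeyMapper extends Mapper<Key, Value, Key, Value> {
    @Override
    protected void map(Key key, Value value, Context context)
        throws IOException, InterruptedException {
      context.write(transform(key), value);
    }

    private Key transform(Key key) {
      // Placeholder: build the new key from the old one here.
      return key;
    }
  }

  // Identity reduce: the shuffle sorts the transformed keys, so each reducer
  // writes a sorted RFile, which is what bulk import requires. For large
  // tables a range partitioner (as in BulkIngestExample) keeps each file
  // aligned with tablet boundaries.
  public static class SortedWriteReducer extends Reducer<Key, Value, Key, Value> {
    @Override
    protected void reduce(Key key, Iterable<Value> values, Context context)
        throws IOException, InterruptedException {
      for (Value value : values) {
        context.write(key, value);
      }
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), RekeyTableJob.class.getSimpleName());
    job.setJarByClass(RekeyTableJob.class);

    // Source: scan the existing table; restrict with setRanges()/fetchColumns()
    // to select only the subset you want to copy.
    job.setInputFormatClass(AccumuloInputFormat.class);
    AccumuloInputFormat.setZooKeeperInstance(job, ClientConfiguration.loadDefault()
        .withInstance("myinstance").withZkHosts("zkhost:2181"));
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"));
    AccumuloInputFormat.setInputTableName(job, "source_table");
    AccumuloInputFormat.setScanAuthorizations(job, Authorizations.EMPTY);

    job.setMapperClass(RekeyMapper.class);
    job.setMapOutputKeyClass(Key.class);
    job.setMapOutputValueClass(Value.class);
    job.setReducerClass(SortedWriteReducer.class);

    // Sink: write RFiles to HDFS for a later bulk import into the new table.
    job.setOutputFormatClass(AccumuloFileOutputFormat.class);
    job.setOutputKeyClass(Key.class);
    job.setOutputValueClass(Value.class);
    AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/rekey/files"));

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new RekeyTableJob(), args));
  }
}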
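Once the job finishes, a bulk import commits the generated files into the target table in one step. Again a sketch with placeholder names and paths: the target table must already exist, and the failures directory must exist and be empty in HDFS.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class BulkLoadRekeyedFiles {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
        .getConnector("user", new PasswordToken("secret"));

    // Import the RFiles produced by the job into the pre-created target table.
    // setTime=false keeps the timestamps already present in the files rather
    // than assigning new ones at import time.
    conn.tableOperations().importDirectory("target_table",
        "/tmp/rekey/files", "/tmp/rekey/failures", false);
  }
}

To Sven's question about the shell: the same final step can be run from the Accumulo shell with the importdirectory command on the target table, but the transformation itself still needs the job above (or equivalent client code).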
