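Keys are compared lexicographically, not numerically, so 'col16' sorts before 'col3' and the numeric order of your column names does not match their sort order. You'll either need to encode your numerics or zero-pad them.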
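To illustrate the zero-padding option, here is a minimal sketch in plain Java with no Accumulo dependency. The class name `PaddedQualifiers`, the helpers `pad` and `sorted`, and the width of 4 are all made up for this example; pick a width that covers your largest column number. It relies on Java's `String` ordering, which matches byte order for ASCII names like these.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class PaddedQualifiers {
    // Hypothetical helper (not part of Accumulo): zero-pads the numeric
    // suffix so that lexicographic order agrees with numeric order.
    static String pad(String prefix, int n, int width) {
        return String.format(Locale.ROOT, "%s%0" + width + "d", prefix, n);
    }

    // Returns a lexicographically sorted copy of the given names.
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("col3", "col16", "col1");

        List<String> padded = new ArrayList<>();
        for (int n : new int[] {3, 16, 1}) {
            padded.add(pad("col", n, 4)); // width 4 is an assumption
        }

        System.out.println(sorted(raw));    // [col1, col16, col3]
        System.out.println(sorted(padded)); // [col0001, col0003, col0016]
    }
}
```

Note that padding only makes the sort order match the numeric order. The values for a row reach the reducer in no guaranteed order, so you would still need to sort them (for example by collecting them into a TreeMap) before writing, since the RFile writer expects keys in sorted order.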

On Thu, Dec 6, 2012 at 9:03 AM, Andrew Catterall <[email protected]> wrote:

> Hi,
>
> I am trying to run a bulk ingest to import data into Accumulo but it is
> failing at the reduce task with the below error:
>
> java.lang.IllegalStateException: Keys appended out-of-order. New key
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:col3 [myVis]
> 9223372036854775807 false, previous key
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a
> foo:col16 [myVis] 9223372036854775807 false
>     at org.apache.accumulo.core.file.rfile.RFile$Writer.append(RFile.java:378)
>
> Could this be caused by the order at which the writes are being done?
>
> -- Background
>
> The input file is a tab separated file. A sample row would look like:
>
> Data1 Data2 Data3 Data4 Data5 … DataN
>
> The map parses the data, for each row, into a Map<String, String>. This
> will contain the following:
>
> Col1 Data1
> Col2 Data2
> Col3 Data3
> …
> ColN DataN
>
> An outputKey is then generated for this row in the format
> client@timeStamp@randomUUID
>
> Then for each entry in Map<String, String> a outputValue is generated in
> the format ColN|DataN
>
> The outputKey and outputValue are written to Context.
>
> This completes successfully, however, the reduce task fails.
>
> My ReduceClass is as follows:
>
> public static class ReduceClass extends Reducer<Text,Text,Key,Value> {
>
>     public void reduce(Text key, Iterable<Text> keyValues,
>             Context output) throws IOException, InterruptedException {
>
>         // for each value belonging to the key
>         for (Text keyValue : keyValues) {
>
>             // split the keyValue into Col and Data
>             String[] values = keyValue.toString().split("\\|");
>
>             // Generate key
>             Key outputKey = new Key(key, new Text("foo"),
>                     new Text(values[0]), new Text("myVis"));
>
>             // Generate value
>             Value outputValue = new Value(values[1].getBytes(),
>                     0, values[1].length());
>
>             // Write to context
>             output.write(outputKey, outputValue);
>         }
>     }
> }
>
> -- Expected output
>
> I am expecting the contents of the Accumulo table to be as follows:
>
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col1 [myVis] Data1
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col2 [myVis] Data2
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col3 [myVis] Data3
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col4 [myVis] Data4
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col5 [myVis] Data5
> …
> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:ColN [myVis] DataN
>
> Thanks,
> Andrew
