Ah, it's AccumuloFileOutputFormat, not AccumuloOutputFormat. I know where to find it.
Thanks,

Jianshi


On Wed, Jun 18, 2014 at 10:32 PM, William Slacum <[email protected]> wrote:

> It extends FileOutputFormat, which provides that method (I haven't fully
> investigated Java 7 or 8; has their handling of parent class static
> methods changed?):
>
> http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapred/FileOutputFormat.html
>
> Data written to RFiles has to be sorted.
>
> With partitioning, as you grow in scale, it's a good idea to use a table's
> splits (see TableOperations#listSplits) to determine how to partition the
> output among files.
>
> Note that you don't write a <Text, Mutation> pair, but a <Key, Value> pair
> to the files.
>
>
> On Wed, Jun 18, 2014 at 4:21 AM, Jianshi Huang <[email protected]> wrote:
>
>> Hi all,
>>
>> I saw this line in
>> accumulo-1.6.0/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/bulk/BulkIngestExample.java:
>>
>> AccumuloFileOutputFormat.setOutputPath(job, new Path(opts.workDir + "/files"));
>>
>> However, it seems setOutputPath is not in 1.6.0:
>>
>> http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/mapreduce/AccumuloOutputFormat.html
>>
>> So how can I write the mutations to HDFS files? I think it might be
>> faster to import using the importdirectory command.
>>
>> BTW, I think importing is fastest if the mutation files are already
>> (partially) sorted and partitioned; does that make sense?
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>
>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
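
[Editor's note: the sketch below pulls together the points from the thread, loosely modeled on the 1.6.0 BulkIngestExample: setOutputPath is called through AccumuloFileOutputFormat but is actually inherited from FileOutputFormat; the reducer emits <Key, Value> pairs (not <Text, Mutation>) in sorted order, since AccumuloFileOutputFormat writes RFiles; and the table's split points from TableOperations#listSplits feed a RangePartitioner so each output file lines up with roughly one tablet. Names such as BulkFileJobSketch and SortedKeyValueReducer are placeholders, the mapper/input setup is omitted, and the code is an untested illustration, not the example program itself.]

    // Sketch only: mapper and input-format setup omitted; class names are
    // made up for illustration and are not part of the Accumulo examples.
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.util.Base64;
    import java.util.Collection;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
    import org.apache.accumulo.core.client.mapreduce.lib.partition.RangePartitioner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;

    public class BulkFileJobSketch {

      // Output pairs are <Key, Value>, not <Text, Mutation>, and they must be
      // emitted in sorted order because AccumuloFileOutputFormat produces RFiles.
      public static class SortedKeyValueReducer extends Reducer<Text, Text, Key, Value> {
        @Override
        protected void reduce(Text row, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
          for (Text v : values) {
            context.write(new Key(row), new Value(v.copyBytes()));
          }
        }
      }

      public static Job configure(Configuration conf, Connector connector,
          String tableName, String workDir) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(BulkFileJobSketch.class);
        job.setReducerClass(SortedKeyValueReducer.class);
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Value.class);
        job.setOutputFormatClass(AccumuloFileOutputFormat.class);

        // setOutputPath is inherited from FileOutputFormat, which is why it does
        // not show up in the AccumuloFileOutputFormat javadoc.
        AccumuloFileOutputFormat.setOutputPath(job, new Path(workDir + "/files"));

        // Partition reducer output by the table's current split points so each
        // output file covers roughly one tablet's range.
        Collection<Text> splits = connector.tableOperations().listSplits(tableName);
        Path splitsFile = new Path(workDir + "/splits.txt");
        FileSystem fs = FileSystem.get(conf);
        try (BufferedWriter out =
            new BufferedWriter(new OutputStreamWriter(fs.create(splitsFile)))) {
          for (Text split : splits) {
            // RangePartitioner reads base64-encoded split points, one per line
            // (java.util.Base64 needs Java 8; commons-codec works on Java 7).
            out.write(Base64.getEncoder().encodeToString(split.copyBytes()));
            out.newLine();
          }
        }
        job.setPartitionerClass(RangePartitioner.class);
        RangePartitioner.setSplitFile(job, splitsFile.toString());
        job.setNumReduceTasks(splits.size() + 1);

        return job;
      }
    }

Once the job finishes, the directory under workDir + "/files" can be bulk-loaded with the shell's importdirectory command (or TableOperations#importDirectory) together with an empty failures directory, which is the same flow the BulkIngestExample follows.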
