One aspect of creating RFiles for bulk import into Accumulo that I don't recall being mentioned before is the ability to archive them for future use.
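Since bulk import moves the staged files into the table's directory, archiving is just a matter of copying them aside first. A minimal sketch of that idea, assuming the 1.x importDirectory API; every path below is a made-up placeholder:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class ArchiveThenImport {
        // Copy the staged RFiles aside, then bulk-import the originals.
        public static void run(Connector conn, String table) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path staged   = new Path("/ingest/rfiles");           // output of the RFile job (hypothetical)
            Path archive  = new Path("/archive/rfiles-20151006"); // kept for future re-import (hypothetical)
            Path failures = new Path("/ingest/failures");         // must exist and be empty

            // Copy, don't move: importDirectory consumes the source files.
            FileUtil.copy(fs, staged, fs, archive, false, conf);
            fs.mkdirs(failures);

            conn.tableOperations().importDirectory(table, staged.toString(),
                    failures.toString(), false);
        }
    }

Re-ingesting later is then just another importDirectory call pointed at a fresh copy of the archived files; since bulk import consumes its input, keep the archive itself read-only.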
On Tue, Oct 6, 2015 at 10:25 PM, Russ Weeks <[email protected]> wrote:
> Hi, Dylan,
>
> Yeah, writing RFiles instead of using BatchWriters
> (AccumuloFileOutputFormat vs. AccumuloOutputFormat) for efficiency and
> atomicity of ingest ("improved" atomicity if that even makes sense).
>
> I'm thinking about the NFS gateway just because the system that's
> producing the CSV is kind of a black box to me. It doesn't speak Hadoop, as
> Christopher alluded to, and I can't control its output format, but I can
> direct its output to a filesystem that it perceives to be local.
>
> My options are either an NFS write direct to HDFS via the gateway, or an
> NFS write to a conventional filesystem that I control, followed by some
> sort of inotify-driven migration from that server to HDFS.
>
> -Russ
>
> On Tue, Oct 6, 2015 at 6:12 PM Dylan Hutchison <[email protected]> wrote:
>
>> Hi Russ,
>> I'm curious what you have in mind. Are you looking for a solution more
>> efficient than running clients that read the CSV files and open
>> BatchWriters?
>>
>> Regards, Dylan
>>
>> On Tue, Oct 6, 2015 at 4:56 PM, Christopher <[email protected]> wrote:
>>
>>> I haven't tried it, but it sounds like a cool use case. Might be a good
>>> alternative to distcp, more interoperable with tools which don't speak
>>> hadoop.
>>>
>>> On Tue, Oct 6, 2015, 18:41 Russ Weeks <[email protected]> wrote:
>>>
>>>> I hope this isn't too off-topic. Any opinions re. its
>>>> completeness/quality/reliability?
>>>>
>>>> (The use case is, CSV files -> NFS -> HDFS -> Spark -> RFiles ->
>>>> Accumulo. Relevance established!)
>>>>
>>>> Thanks,
>>>> -Russ
>>>
>>
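Re: the Spark -> RFiles step in the quoted thread, here's roughly what I'd expect that job to look like. This is only a sketch: it assumes the 1.x mapreduce AccumuloFileOutputFormat, a made-up three-field CSV (row, qualifier, value), and a constant column family; it sorts on plain strings before building Keys because Key isn't java.io.Serializable and would otherwise break the shuffle.

    import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class CsvToRFiles {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext();

            // Sort on "row\0qualifier" strings; with a constant column family
            // this matches Key order, and strings shuffle cleanly.
            JavaPairRDD<Key, Value> kvs = sc.textFile("hdfs:///ingest/csv")
                .mapToPair(line -> {
                    String[] f = line.split(",", 3); // row,qualifier,value (made up)
                    return new Tuple2<>(f[0] + "\0" + f[1], f[2]);
                })
                .sortByKey()
                .mapToPair(t -> {
                    String[] k = t._1().split("\0", 2);
                    Key key = new Key(new Text(k[0]), new Text("cf"), new Text(k[1]));
                    return new Tuple2<>(key, new Value(t._2().getBytes()));
                });

            // One sorted RFile per partition, covering disjoint key ranges.
            kvs.saveAsNewAPIHadoopFile("hdfs:///ingest/rfiles",
                    Key.class, Value.class, AccumuloFileOutputFormat.class);
            sc.stop();
        }
    }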

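And for the inotify-driven migration option: on the JVM, NIO's WatchService (backed by inotify on Linux) plus copyFromLocalFile gets most of the way there. A sketch under the assumption that the producer renames finished files into the landing directory, so ENTRY_CREATE only ever fires on complete CSVs; both directories are hypothetical:

    import java.nio.file.FileSystems;
    import java.nio.file.Paths;
    import java.nio.file.StandardWatchEventKinds;
    import java.nio.file.WatchEvent;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CsvMigrator {
        public static void main(String[] args) throws Exception {
            java.nio.file.Path landing = Paths.get("/srv/nfs/csv"); // local NFS export (made up)
            FileSystem hdfs = FileSystem.get(new Configuration());

            WatchService watcher = FileSystems.getDefault().newWatchService();
            // Fires when a file is created or renamed into the landing dir.
            landing.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

            while (true) {
                WatchKey key = watcher.take(); // blocks until something lands
                for (WatchEvent<?> ev : key.pollEvents()) {
                    java.nio.file.Path name = (java.nio.file.Path) ev.context();
                    hdfs.copyFromLocalFile(
                            new Path(landing.resolve(name).toString()),
                            new Path("/ingest/csv/" + name)); // HDFS staging dir (made up)
                }
                key.reset();
            }
        }
    }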