That's a pretty big gap. A missing source/sink matters a lot when
transitioning from Dataproc to Dataflow with GCS as the storage buffer
instead of a traditional HDFS.

From what I've been able to tell from the source code and documentation, the
Java SDK can read sequence files but the Python SDK cannot?

Thanks,
Shannon

On Mon, Jul 1, 2019 at 5:29 PM Chamikara Jayalath <[email protected]>
wrote:

> I don't think we have a source/sink for reading Hadoop sequence files.
> Your best bet currently will probably be to use the FileSystems abstraction
> to open a file from a ParDo and read directly from there using a library
> that can read sequence files.
>
> Thanks,
> Cham
>
> On Mon, Jul 1, 2019 at 8:42 AM Shannon Duncan <[email protected]>
> wrote:
>
>> I want to read a Hadoop Sequence/Map file stored on Google Cloud Storage,
>> at a path like "gs://bucket/link/SequenceFile-*", using the Python SDK.
>>
>> I cannot locate any good adapters for this, and the one Hadoop filesystem
>> reader seems to read only from an "hdfs://" URL.
>>
>> I want to use Dataflow and GCS exclusively so we can start mixing Beam
>> pipelines in with our current Hadoop pipelines.
>>
>> Is this a feature that is supported or will be supported in the future?
>> Does anyone have any good suggestions for this that are performant?
>>
>> I'd also like to be able to write back out to a SequenceFile if possible.
>>
>> Thanks!
>>
>>
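
For illustration, the workaround suggested in this thread (open the file
through the FileSystems abstraction inside a ParDo, then parse it with a
sequence-file reader) might start from something like the sketch below. It
is not an existing Beam source. It uses only the Python standard library and
reads just the start of a SequenceFile header from any binary file-like
object. The function names are hypothetical, and the header layout is an
assumption: "SEQ" magic, a version byte of 6, then the key and value class
names as Hadoop Text strings short enough for a one-byte length.

```python
import io
import struct

MAGIC = b"SEQ"
ASSUMED_VERSION = 6  # assumption: only the version 6 header layout is handled


def _read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of stream")
    return data


def _read_short_text(stream):
    """Read a Hadoop Text string whose length fits in a one-byte vint."""
    (length,) = struct.unpack("b", _read_exact(stream, 1))
    if length < 0:
        # Negative first bytes mark multi-byte vints, which this sketch skips.
        raise ValueError("multi-byte vint lengths are not handled here")
    return _read_exact(stream, length).decode("utf-8")


def read_sequence_file_header(stream):
    """Return the version and key/value class names from a binary stream."""
    if _read_exact(stream, 3) != MAGIC:
        raise ValueError("not a SequenceFile: bad magic bytes")
    version = _read_exact(stream, 1)[0]
    if version != ASSUMED_VERSION:
        raise ValueError("unsupported SequenceFile version: %d" % version)
    key_class = _read_short_text(stream)
    value_class = _read_short_text(stream)
    return {"version": version, "key_class": key_class,
            "value_class": value_class}


# Inside a Beam DoFn, the stream could come from the FileSystems abstraction
# instead of a local file, for example:
#   from apache_beam.io.filesystems import FileSystems
#   stream = FileSystems.open(path)  # path is one file matched by the glob
#   header = read_sequence_file_header(stream)
```

Decoding the records themselves (compression flags, metadata, sync markers,
key/value serialization) is left out here; that is the part a dedicated
sequence-file library would have to cover.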
