I don't think we have a source/sink for reading Hadoop sequence files. Your
best bet for now is probably to use the FileSystems abstraction to open
each file from a ParDo and read it directly there with a library that can
read sequence files.
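
To make that concrete, here is a rough sketch rather than a tested recipe.
The function names (read_sequence_file_header, read_headers) are made up for
illustration. The parser only reads the file header, assumes the version 4+
layout ("SEQ" magic, a version byte, then the key and value class names as
length-prefixed UTF-8 strings), and only handles class names shorter than
128 bytes. For actual records you would swap in a real sequence file library.

```python
import io


def _read_text(stream):
    """Read one length-prefixed UTF-8 string (single-byte length only)."""
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError("unexpected end of stream while reading a length")
    length = raw[0]
    if length > 127:
        # Simplification in this sketch: longer (multi-byte) lengths are skipped.
        raise NotImplementedError("multi-byte lengths are not handled here")
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of stream while reading a string")
    return data.decode("utf-8")


def read_sequence_file_header(stream):
    """Parse the magic bytes, version, and key/value class names."""
    magic = stream.read(3)
    if magic != b"SEQ":
        raise ValueError("not a sequence file, bad magic: %r" % (magic,))
    version_byte = stream.read(1)
    if len(version_byte) != 1:
        raise EOFError("unexpected end of stream while reading the version")
    version = version_byte[0]
    if version < 4:
        # Older versions store class names differently; out of scope here.
        raise NotImplementedError("header versions below 4 are not handled")
    return {
        "version": version,
        "key_class": _read_text(stream),
        "value_class": _read_text(stream),
    }


def read_headers(pipeline, file_pattern):
    """Hypothetical wiring: match files, open each one, parse its header."""
    # Imported lazily so the parser above can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    return (
        pipeline
        | "Match" >> fileio.MatchFiles(file_pattern)
        | "Open" >> fileio.ReadMatches()
        | "Header" >> beam.Map(
            lambda f: (f.metadata.path, read_sequence_file_header(f.open())))
    )
```

With a pattern such as gs://bucket/link/SequenceFile-* each worker opens its
files through Beam's filesystem layer, so the same code should work for any
scheme Beam has a filesystem registered for. Note that each file is read by
a single worker, so very large files will not be split.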

Thanks,
Cham

On Mon, Jul 1, 2019 at 8:42 AM Shannon Duncan <[email protected]>
wrote:

> I want to read a Hadoop Sequence/Map file stored on Google Cloud Storage,
> addressed as "gs://bucket/link/SequenceFile-*", via the Python SDK.
>
> I cannot locate any good adapters for this, and the one Hadoop Filesystem
> reader seems to only read from an "hdfs://" URL.
>
> I want to use Dataflow and GCS exclusively so that we can start mixing
> Beam pipelines in with our current Hadoop pipelines.
>
> Is this a feature that is supported or will be supported in the future?
> Does anyone have any good suggestions for this that are performant?
>
> I'd also like to be able to write back out to a SequenceFile if possible.
>
> Thanks!
>
>
