partition columns with StreamingFileSink

Yitzchak Lieberman Wed, 19 Jun 2019 05:37:04 -0700

Hi.

I'm using StreamingFileSink to write partitioned data to S3.
The code is below:


StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(
                new Path("s3a://test-bucket/test"),
                ParquetAvroFactory.getParquetWriter(schema, "GZIP"))
        .withBucketAssigner(new PartitionBucketAssigner(partitionColumns))
        .build();

How can I remove the partition columns from the data (or avoid populating
them in the GenericRecord)?
My problem is with the AWS Glue crawler, which creates duplicate columns in
the table.
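
To make the desired effect concrete, here is a minimal plain-Java sketch. It is not Flink or Avro code: a Map stands in for the GenericRecord, and the class name, method names, and the key=value path layout are all assumptions for illustration only. The idea is that the partition values appear in the bucket path, and the same columns are dropped from the record before it is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; a Map stands in for an Avro GenericRecord.
public class PartitionProjection {

    // Builds a key=value bucket path (for example "year=2019/month=06")
    // from the partition columns, in the order they are listed.
    static String bucketPath(Map<String, Object> record, List<String> partitionColumns) {
        List<String> parts = new ArrayList<>();
        for (String column : partitionColumns) {
            parts.add(column + "=" + record.get(column));
        }
        return String.join("/", parts);
    }

    // Returns a copy of the record without the partition columns.
    // The input record is left unmodified.
    static Map<String, Object> withoutPartitionColumns(Map<String, Object> record,
                                                       List<String> partitionColumns) {
        Map<String, Object> projected = new LinkedHashMap<>(record);
        projected.keySet().removeAll(partitionColumns);
        return projected;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("year", "2019");
        record.put("month", "06");
        record.put("user_id", 42);

        List<String> partitionColumns = Arrays.asList("year", "month");

        System.out.println(bucketPath(record, partitionColumns));              // year=2019/month=06
        System.out.println(withoutPartitionColumns(record, partitionColumns)); // {user_id=42}
    }
}
```

With a real GenericRecord the equivalent step would presumably mean writing against a schema that omits the partition fields, which is the part I am unsure how to do with the sink.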

Thanks,
Yitzchak.
