Hi.

I'm using StreamingFileSink to write partitioned data to S3.
The code is below:

StreamingFileSink<GenericRecord> sink =
    StreamingFileSink.forBulkFormat(
            new Path("s3a://test-bucket/test"),
            ParquetAvroFactory.getParquetWriter(schema, "GZIP"))
        .withBucketAssigner(new PartitionBucketAssigner(partitionColumns))
        .build();

How can I remove the partition columns from the written data (or avoid
populating them in the GenericRecord)?
My problem is with the AWS Glue crawler, which creates duplicate columns in
the table.

Thanks,
Yitzchak.
