Hi,
I am writing a program that reads a CSV and writes to parquet.
PCollection<String> input =
pipeline.apply(TextIO.read().from("/tmp/beam/input.csv"));
input
.apply("Produce Avro records", ParDo.of(new
DeterministicallyConstructAvroRecordsFn()))
.setCoder(AvroCoder.of(SCHEMA))
.apply(
"Write Parquet files",
FileIO.<GenericRecord>write().via(ParquetIO.sink(SCHEMA)).to("/tmp/parquet/"));
pipeline.run();
}
Q1 ? how can i disable sharding in the above code and write to single parquet ?
Q2? how can i use a custom file naming with FileIO.write, say i want to use the
schema name GenericRecord.getSchema().getName() as prefix
Do i have to use FileIO.writeDynamic()
thanks
Sri