Hi,

I am writing a program that reads a CSV file and writes it to Parquet:

    PCollection<String> input =
        pipeline.apply(TextIO.read().from("/tmp/beam/input.csv"));

    input
        .apply("Produce Avro records", ParDo.of(new DeterministicallyConstructAvroRecordsFn()))
        .setCoder(AvroCoder.of(SCHEMA))
        .apply(
            "Write Parquet files",
            FileIO.<GenericRecord>write().via(ParquetIO.sink(SCHEMA)).to("/tmp/parquet/"));

    pipeline.run();
  }

Q1: How can I disable sharding in the above code so that it writes a single Parquet file?
Q2: How can I use custom file naming with FileIO.write()? For example, I want to use the
schema name, GenericRecord.getSchema().getName(), as the file name prefix.
Do I have to use FileIO.writeDynamic() for that?

Thanks,
Sri
