Hi Reuven,
The final PCollection<Row> is derived from values in one of the tables, as shown
below. I then convert it to a PCollection<String> and write it to GCS
using TextIO. No encoding is required; I just have to write the
PCollection<String> to a GCS file with the extension ".json". I am able to
do this with the code below. Is this an efficient way
of writing to GCS? Please let me know if there is a better way
of doing this.

// Builds one JSON object per row. The final "}}}" closes ids, source,
// and the outer object.
String JSON_SQL_QUERY =
    "SELECT\n"
        + "  CONCAT('{\"source\" : {\"name\": \"bulkUpload\", "
        + "\"payloadType\" : {\"version\" : \"1.0\", \"type\" : \"testtype\"}, "
        + "\"ids\" : {\"testid1\" : \"', id, '\"}}}')\n"
        + "FROM\n"
        + "  tempTable\n";
PCollection<Row> rowOutput =
    tempTuple.apply(SqlTransform.query(JSON_SQL_QUERY));
PCollection<String> finalJsonString =
    rowOutput.apply(
        "create string from row",
        MapElements.via(
            new SimpleFunction<Row, String>() {
              @Override
              public String apply(Row input) {
                return input.getValue(0);
              }
            }));
finalJsonString.apply(
    "Write results",
    TextIO.write()
        .to("gs://bucket/folder/result")
        .withoutSharding()
        .withSuffix(".json"));
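
For reference, the line that the CONCAT expression is meant to produce can be sketched in plain Java. This is only an illustration under assumptions: the class JsonLineSketch and its methods are hypothetical names, not part of Beam, and the escaping of quotes and backslashes is an extra step that the SQL query itself does not perform. Note that the object needs three closing braces after the id value to be valid JSON.

```java
// Minimal sketch: builds the same JSON line as the CONCAT(...) expression, in plain Java.
// The class and method names are hypothetical; the escaping step is an assumption
// that is not present in the SQL query.
final class JsonLineSketch {

  // Escapes backslashes and double quotes so the id cannot break out of the JSON string.
  // Control characters are not handled in this sketch.
  static String escape(String value) {
    StringBuilder out = new StringBuilder();
    for (char c : value.toCharArray()) {
      if (c == '\\' || c == '"') {
        out.append('\\');
      }
      out.append(c);
    }
    return out.toString();
  }

  // Returns one JSON object; the trailing "}}}" closes ids, source, and the outer object.
  static String toJsonLine(String id) {
    return "{\"source\" : {\"name\": \"bulkUpload\", "
        + "\"payloadType\" : {\"version\" : \"1.0\", \"type\" : \"testtype\"}, "
        + "\"ids\" : {\"testid1\" : \"" + escape(id) + "\"}}}";
  }

  public static void main(String[] args) {
    System.out.println(toJsonLine("abc-123"));
  }
}
```

The same logic could sit inside the SimpleFunction if the Row carried the raw id instead of a pre-built string, but that would be a design choice rather than a requirement.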
Regards,
Kayal
On Sun, Feb 27, 2022 at 10:16 PM Reuven Lax <[email protected]> wrote:
> How is your data encoded? What do you want the files in GCS to look like?
>
> Reuven
>
> On Sun, Feb 27, 2022 at 4:59 PM Kayal P <[email protected]> wrote:
>
>> Hi Team,
>>
>> I have a few GB of data in a PCollection as a result of Dataflow
>> transforms. My objective is to write the data from this PCollection to a
>> file in GCS using the Java SDK. Could you suggest an efficient way of doing
>> this, along with sample code?
>>
>> Thanks in advance
>>
>> Regards,
>> Kayal
>>
>