Thank you. But if I do not use withoutSharding, the results are written to a varying number of files, depending on the number of workers. The objective is to create one file per run.

Regards,
Kayal

> On 27-Feb-2022, at 11:09 PM, Reuven Lax <[email protected]> wrote:
>
> This works, however withoutSharding forces the entire write to be done by a
> single worker. If your data grows large, you may find that this starts
> performing poorly.
>
>> On Sun, Feb 27, 2022 at 7:41 PM Kayal P <[email protected]> wrote:
>> Hi Reuven,
>>
>> Final Pcollection<Row> is derived from values in one of the tables as shown
>> below. Then I am converting it to Pcollection<String> and writing to gcs
>> using TextIO. No encoding is required. I just have to write the
>> Pcollection<String> to the gcs file with extension ".json". I am able to
>> perform the functionality using the below code. Is this the efficient way of
>> performing write to gcs. Please let me know if there are any better ways of
>> performing this.
>>
>> String JSON_SQL_QUERY = "SELECT\n" +
>>     " CONCAT( '{\"source\" : {\"name\": \"bulkUpload\", \"payloadType\" : {\"version\" : \"1.0\", \"type\" : \"testtype\"}, \"ids\" : {\"testid1\" : \"', id, '\"}}' ) \n" +
>>     " FROM \n" +
>>     " tempTable\n";
>> PCollection<Row> rowOutput =
>>     tempTuple.apply(SqlTransform.query(JSON_SQL_QUERY));
>> PCollection<String> finalJsonString = rowOutput
>>     .apply(
>>         "create string from row",
>>         MapElements.via(
>>             new SimpleFunction<Row, String>() {
>>               @Override
>>               public String apply(Row input) {
>>                 return input.getValue(0);
>>               }
>>             }));
>>
>> finalJsonString.apply("Write results", TextIO.write()
>>     .to("gs://bucket/folder/result")
>>     .withoutSharding()
>>     .withSuffix(".json"));
>>
>> Regards,
>> Kayal
>>
>>> On Sun, Feb 27, 2022 at 10:16 PM Reuven Lax <[email protected]> wrote:
>>> How is your data encoded? What do you want the files in GCS to look like?
>>>
>>> Reuven
>>>
>>>> On Sun, Feb 27, 2022 at 4:59 PM Kayal P <[email protected]> wrote:
>>>> Hi Team,
>>>>
>>>> I have a few GBs of Data derived in a Pcollection as a result of dataflow
>>>> transforms. My objective is to write the data from this Pcollection to a
>>>> file in GCS using Java SDK. Could you suggest me efficient way of doing
>>>> this, along with sample code?
>>>>
>>>> Thanks in advance
>>>>
>>>> Regards,
>>>> Kayal
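
On the one-file-per-run goal discussed in this thread: one possible alternative to withoutSharding is to let the pipeline write its shards in parallel and concatenate them into a single file once the run has finished. The sketch below only illustrates that idea on a local directory using java.nio; it is not a GCS implementation, and on GCS the same step would need a storage client or an object-compose operation instead. The class name ShardMerger and the method mergeShards are hypothetical, and the code assumes the shards follow a naming pattern like result-00000-of-00003.json (prefix, a hyphen, zero-padded shard numbers, then the suffix), which should be checked against the actual output of the run.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: merges sharded output files into one file.
final class ShardMerger {
    private ShardMerger() {}

    // Concatenates every regular file in dir whose name starts with
    // prefix + "-" and ends with suffix into target, in file-name order.
    // Returns the number of shard files that were merged.
    static int mergeShards(Path dir, String prefix, String suffix, Path target)
            throws IOException {
        List<Path> shards;
        try (Stream<Path> files = Files.list(dir)) {
            shards = files
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        // Requiring the hyphen keeps a target such as
                        // "result.json" from being treated as a shard.
                        return name.startsWith(prefix + "-") && name.endsWith(suffix);
                    })
                    .sorted() // zero-padded shard numbers sort in shard order
                    .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(
                target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (Path shard : shards) {
                Files.copy(shard, out); // append this shard's bytes to the target
            }
        }
        return shards.size();
    }
}
```

A call such as ShardMerger.mergeShards(dir, "result", ".json", dir.resolve("result.json")) would then produce one result.json per run. The trade-off is an extra step after the pipeline completes, in exchange for keeping the write itself parallel.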
