Sure. Thank you for the assistance.
> On 28-Feb-2022, at 12:16 AM, Reuven Lax <[email protected]> wrote:
>
> Yes, writing to multiple files is the only way to achieve parallelism, as
> multiple workers cannot safely append to the same file. If you need data
> written to only one file, then it will process no faster than that single
> worker can handle.
>
>> On Sun, Feb 27, 2022 at 8:39 PM Kayal P <[email protected]> wrote:
>> Thank you. But if I don't use withoutSharding, the results are written to
>> a varying number of files depending on the number of workers. The
>> objective is to create one file per run.
>>
>> Regards,
>> Kayal
>>
>>> On 27-Feb-2022, at 11:09 PM, Reuven Lax <[email protected]> wrote:
>>>
>>> This works; however, withoutSharding forces the entire write to be done
>>> by a single worker. If your data grows large, you may find that this
>>> starts to perform poorly.
>>>
>>>> On Sun, Feb 27, 2022 at 7:41 PM Kayal P <[email protected]> wrote:
>>>> Hi Reuven,
>>>>
>>>> The final PCollection<Row> is derived from values in one of the tables,
>>>> as shown below. I then convert it to a PCollection<String> and write it
>>>> to GCS using TextIO. No encoding is required; I just have to write the
>>>> PCollection<String> to a GCS file with the extension ".json". I am able
>>>> to do this with the code below. Is this an efficient way to write to
>>>> GCS? Please let me know if there are better ways of doing this.
>>>>
>>>> // The literal must close all four JSON objects (outer, source,
>>>> // payloadType, ids), hence the trailing }}}.
>>>> String JSON_SQL_QUERY = "SELECT\n"
>>>>     + "  CONCAT('{\"source\" : {\"name\": \"bulkUpload\", "
>>>>     + "\"payloadType\" : {\"version\" : \"1.0\", \"type\" : \"testtype\"}, "
>>>>     + "\"ids\" : {\"testid1\" : \"', id, '\"}}}')\n"
>>>>     + "FROM\n"
>>>>     + "  tempTable\n";
>>>>
>>>> PCollection<Row> rowOutput =
>>>>     tempTuple.apply(SqlTransform.query(JSON_SQL_QUERY));
>>>>
>>>> PCollection<String> finalJsonString = rowOutput
>>>>     .apply(
>>>>         "create string from row",
>>>>         MapElements.via(
>>>>             new SimpleFunction<Row, String>() {
>>>>               @Override
>>>>               public String apply(Row input) {
>>>>                 // The query returns a single string column.
>>>>                 return input.getString(0);
>>>>               }
>>>>             }));
>>>>
>>>> // withoutSharding() yields exactly one file, written by a single worker.
>>>> finalJsonString.apply("Write results", TextIO.write()
>>>>     .to("gs://bucket/folder/result")
>>>>     .withoutSharding()
>>>>     .withSuffix(".json"));
>>>>
>>>> Regards,
>>>> Kayal
>>>>
>>>>> On Sun, Feb 27, 2022 at 10:16 PM Reuven Lax <[email protected]> wrote:
>>>>> How is your data encoded? What do you want the files in GCS to look
>>>>> like?
>>>>>
>>>>> Reuven
>>>>>
>>>>>> On Sun, Feb 27, 2022 at 4:59 PM Kayal P <[email protected]> wrote:
>>>>>> Hi Team,
>>>>>>
>>>>>> I have a few GB of data in a PCollection, produced by Dataflow
>>>>>> transforms. My objective is to write the data from this PCollection
>>>>>> to a file in GCS using the Java SDK. Could you suggest an efficient
>>>>>> way of doing this, along with sample code?
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Regards,
>>>>>> Kayal
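Reuven's point about parallelism (one file means one writer; several files allow several writers) can be illustrated outside Beam. A common way to get both parallel writes and a single final file is to let each worker write its own shard and then run a separate, sequential merge step. The sketch below is a local-filesystem analogy using only the JDK, not Beam or Cloud Storage code; the class `ShardThenMerge` and its methods are hypothetical, and the shard file names merely imitate TextIO's default `-SSSSS-of-NNNNN` template.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical local-filesystem analogy (not Beam or GCS code): each "worker"
 * writes its own shard file in parallel, then one sequential step concatenates
 * the shards into a single output file.
 */
public class ShardThenMerge {

  /** Writes each partition to its own shard file in parallel; returns shard paths in order. */
  public static List<Path> writeShards(
      List<List<String>> partitions, Path dir, String prefix, String suffix) throws Exception {
    int n = partitions.size();
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, n));
    try {
      List<Future<Path>> futures = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        final int shard = i;
        futures.add(pool.submit(() -> {
          // One file per worker, so no two threads ever write to the same file.
          Path p = dir.resolve(String.format("%s-%05d-of-%05d%s", prefix, shard, n, suffix));
          Files.write(p, partitions.get(shard), StandardCharsets.UTF_8);
          return p;
        }));
      }
      List<Path> shards = new ArrayList<>();
      for (Future<Path> f : futures) {
        shards.add(f.get()); // rethrows any write failure (wrapped)
      }
      return shards;
    } finally {
      pool.shutdown();
    }
  }

  /** Appends the shards, in order, into one file; this step is sequential. */
  public static Path merge(List<Path> shards, Path target) throws IOException {
    Files.deleteIfExists(target);
    Files.createFile(target);
    for (Path shard : shards) {
      Files.write(target, Files.readAllBytes(shard), StandardOpenOption.APPEND);
    }
    return target;
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("shards");
    List<List<String>> partitions = List.of(
        List.of("{\"id\":\"a\"}", "{\"id\":\"b\"}"),
        List.of("{\"id\":\"c\"}"),
        List.of("{\"id\":\"d\"}", "{\"id\":\"e\"}"));
    List<Path> shards = writeShards(partitions, dir, "result", ".json");
    Path merged = merge(shards, dir.resolve("result.json"));
    System.out.println(Files.readAllLines(merged, StandardCharsets.UTF_8));
  }
}
```

Running `main` prints the merged lines in shard order. The merge step is still sequential, so this only helps when merging is much cheaper than producing the data. On GCS, the object compose operation, which concatenates up to 32 source objects server-side per request, could be a candidate for that step.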

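The quoted query assembles each JSON document by string concatenation in SQL, which yields invalid JSON if an `id` value ever contains a double quote, a backslash, or a control character. One alternative, sketched here under the assumption that `id` is a plain string column, is to select `id` in SQL and build the line in Java inside the `MapElements` step, escaping the value. `JsonLine` is a hypothetical JDK-only helper, not part of Beam; it emits the same fields as the quoted query.

```java
/** Hypothetical helper: builds the bulk-upload JSON line with the id value escaped. */
public class JsonLine {

  /** Escapes a string for use inside a JSON string literal (quote, backslash, control chars). */
  public static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length() + 8);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            // Other control characters use the four-hex-digit escape form.
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  /** Same field layout as the SQL CONCAT output, with every object closed. */
  public static String toJsonLine(String id) {
    return "{\"source\" : {\"name\": \"bulkUpload\", "
        + "\"payloadType\" : {\"version\" : \"1.0\", \"type\" : \"testtype\"}, "
        + "\"ids\" : {\"testid1\" : \"" + escape(id) + "\"}}}";
  }

  public static void main(String[] args) {
    System.out.println(toJsonLine("abc"));
    System.out.println(toJsonLine("a\"b")); // the embedded quote is escaped
  }
}
```

In the pipeline, the `SimpleFunction` body could then become something like `return JsonLine.toJsonLine(input.getString("id"));`, assuming the query selects a column named `id`.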