Thank you. But if I do not use withoutSharding, the results are written to a 
varying number of files, depending on the number of workers. The objective is 
to create one file per run. 
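
To make the file-count behaviour concrete, here is a minimal sketch in plain Java of how sharded and unsharded output names differ. It assumes TextIO's default shard template is "-SSSSS-of-NNNNN" (shard index and shard count, zero-padded to five digits) and that withoutSharding() drops that part of the name. The class ShardNames and both helper methods are hypothetical names made up for illustration; this is not Beam's own implementation.

```java
import java.util.Locale;

public class ShardNames {
    // Hypothetical helper: mimics the assumed "-SSSSS-of-NNNNN" naming pattern.
    // Illustration only, not Beam's actual filename policy code.
    static String shardedName(String prefix, int shard, int numShards, String suffix) {
        return String.format(Locale.ROOT, "%s-%05d-of-%05d%s", prefix, shard, numShards, suffix);
    }

    // Hypothetical helper: with an empty shard template, only prefix and suffix remain.
    static String unshardedName(String prefix, String suffix) {
        return prefix + suffix;
    }

    public static void main(String[] args) {
        String prefix = "gs://bucket/folder/result";
        // Three shards produce three separate files.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println(shardedName(prefix, shard, 3, ".json"));
        }
        // A single unsharded write produces one file.
        System.out.println(unshardedName(prefix, ".json"));
    }
}
```

Under these assumptions, a three-shard write would produce names such as gs://bucket/folder/result-00000-of-00003.json, while the unsharded write produces the single file gs://bucket/folder/result.json.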

Regards,
Kayal

> 
> On 27-Feb-2022, at 11:09 PM, Reuven Lax <[email protected]> wrote:
> 
> 
> This works, however withoutSharding forces the entire write to be done by a 
> single worker. If your data grows large, you may find that this starts 
> performing poorly.
> 
>> On Sun, Feb 27, 2022 at 7:41 PM Kayal P <[email protected]> wrote:
>> Hi Reuven,
>> 
>> The final PCollection<Row> is derived from values in one of the tables, as 
>> shown below. I then convert it to a PCollection<String> and write it to GCS 
>> using TextIO. No encoding is required; I just have to write the 
>> PCollection<String> to a GCS file with the extension ".json". I am able to 
>> do this using the code below. Is this an efficient way of writing to GCS? 
>> Please let me know if there are any better ways of doing this.
>> 
>> String JSON_SQL_QUERY = "SELECT\n" +
>>         "  CONCAT( '{\"source\" : {\"name\": \"bulkUpload\", " +
>>         "\"payloadType\" : {\"version\" : \"1.0\", \"type\" : \"testtype\"}, " +
>>         "\"ids\" : {\"testid1\" : \"', id, '\"}}' ) \n" +
>>         " FROM \n" +
>>         "     tempTable\n";
>> PCollection<Row> rowOutput = 
>> tempTuple.apply(SqlTransform.query(JSON_SQL_QUERY));
>> PCollection<String> finalJsonString = rowOutput
>>         .apply(
>>                 "create string from row",
>>                 MapElements.via(
>>                         new SimpleFunction<Row, String>() {
>>                             @Override
>>                             public String apply(Row input) {
>>                                 return input.getValue(0);
>>                             }
>>                         }));
>> 
>> finalJsonString.apply("Write results", TextIO.write()
>>         .to("gs://bucket/folder/result")
>>         .withoutSharding()
>>         .withSuffix(".json"));
>> 
>> Regards,
>> Kayal
>> 
>>> On Sun, Feb 27, 2022 at 10:16 PM Reuven Lax <[email protected]> wrote:
>>> How is your data encoded? What do you want the files in GCS to look like?
>>> 
>>> Reuven
>>> 
>>>> On Sun, Feb 27, 2022 at 4:59 PM Kayal P <[email protected]> wrote:
>>>> Hi Team,
>>>> 
>>>> I have a few GB of data in a PCollection, produced by Dataflow 
>>>> transforms. My objective is to write the data from this PCollection to a 
>>>> file in GCS using the Java SDK. Could you suggest an efficient way of 
>>>> doing this, along with sample code?
>>>> 
>>>> Thanks in advance
>>>> 
>>>> Regards,
>>>> Kayal
