Sure. Thank you for the assistance.

> 
> On 28-Feb-2022, at 12:16 AM, Reuven Lax <[email protected]> wrote:
> 
> 
> Yes - writing to multiple files is the only way to achieve parallelism, as 
> multiple workers cannot safely append to the same file. If you need data 
> written to only one file, then it will process no faster than that single 
> worker can handle.
> 
>> On Sun, Feb 27, 2022 at 8:39 PM Kayal P <[email protected]> wrote:
>> Thank you. But if I do not use withoutSharding, the results are 
>> written to a varying number of files depending on the number of workers. The 
>> objective is to create one file per run. 
>> 
>> Regards,
>> Kayal
>> 
>>> 
>>>> On 27-Feb-2022, at 11:09 PM, Reuven Lax <[email protected]> wrote:
>>>> 
>>> 
>>> This works, however withoutSharding forces the entire write to be done by a 
>>> single worker. If your data grows large, you may find that this starts 
>>> performing poorly.
>>> 
>>>> On Sun, Feb 27, 2022 at 7:41 PM Kayal P <[email protected]> wrote:
>>>> Hi Reuven,
>>>> 
>>>> The final PCollection<Row> is derived from values in one of the tables as 
>>>> shown below. I then convert it to a PCollection<String> and write it to 
>>>> GCS using TextIO. No encoding is required; I just have to write the 
>>>> PCollection<String> to a GCS file with the extension ".json". I am able to 
>>>> do this using the code below. Is this an efficient way of writing to GCS? 
>>>> Please let me know if there are any better ways of doing this.
>>>> 
>>>> String JSON_SQL_QUERY = "SELECT\n" +
>>>>         "  CONCAT( '{\"source\" : {\"name\": \"bulkUpload\", " +
>>>>         "\"payloadType\" : {\"version\" : \"1.0\", \"type\" : \"testtype\"}, " +
>>>>         "\"ids\" : {\"testid1\" : \"', id, '\"}}' ) \n" +
>>>>         " FROM \n" +
>>>>         "     tempTable\n";
>>>> PCollection<Row> rowOutput = 
>>>> tempTuple.apply(SqlTransform.query(JSON_SQL_QUERY));
>>>> PCollection<String> finalJsonString = rowOutput
>>>>         .apply(
>>>>                 "create string from row",
>>>>                 MapElements.via(
>>>>                         new SimpleFunction<Row, String>() {
>>>>                             @Override
>>>>                             public String apply(Row input) {
>>>>                                 return input.getValue(0);
>>>>                             }
>>>>                         }));
>>>> 
>>>> finalJsonString.apply("Write results", TextIO.write()
>>>>         .to("gs://bucket/folder/result")
>>>>         .withoutSharding()
>>>>         .withSuffix(".json"));
>>>> 
>>>> Regards,
>>>> Kayal
>>>> 
>>>>> On Sun, Feb 27, 2022 at 10:16 PM Reuven Lax <[email protected]> wrote:
>>>>> How is your data encoded? What do you want the files in GCS to look like?
>>>>> 
>>>>> Reuven
>>>>> 
>>>>>> On Sun, Feb 27, 2022 at 4:59 PM Kayal P <[email protected]> 
>>>>>> wrote:
>>>>>> Hi Team,
>>>>>> 
>>>>>> I have a few GB of data in a PCollection as the result of 
>>>>>> Dataflow transforms. My objective is to write the data from this 
>>>>>> PCollection to a file in GCS using the Java SDK. Could you suggest an 
>>>>>> efficient way of doing this, along with sample code?
>>>>>> 
>>>>>> Thanks in advance
>>>>>> 
>>>>>> Regards,
>>>>>> Kayal
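
As a footnote to the trade-off discussed above (parallel shards versus a single output file), here is a minimal plain-Java sketch of the "write shards in parallel, then concatenate" pattern. It is only an analogy on the local filesystem: it does not use Beam or GCS APIs, and the class name ShardThenMergeSketch, its methods, and the shard file names are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain-Java analogy (no Beam or GCS calls): each "worker" writes its own shard
// in parallel, then one sequential step concatenates the shards into one file.
class ShardThenMergeSketch {

    // Hypothetical shard naming, loosely modelled on "result-00000-of-00002.json".
    static Path shardPath(Path dir, int index, int total) {
        return dir.resolve(String.format("result-%05d-of-%05d.json", index, total));
    }

    // Parallel phase: shard i receives lines i, i + total, i + 2 * total, ...
    static List<Path> writeShards(List<String> lines, Path dir, int total) {
        return IntStream.range(0, total).parallel().mapToObj(i -> {
            List<String> part = new ArrayList<>();
            for (int j = i; j < lines.size(); j += total) {
                part.add(lines.get(j));
            }
            Path shard = shardPath(dir, i, total);
            try {
                Files.write(shard, part, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return shard;
        }).collect(Collectors.toList()); // encounter order (shard 0, 1, ...) is kept
    }

    // Sequential phase: a single writer appends every shard to the target file.
    static Path merge(List<Path> shards, Path target) {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path shard : shards) {
                Files.copy(shard, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shards");
        List<String> lines = Arrays.asList(
                "{\"id\":1}", "{\"id\":2}", "{\"id\":3}", "{\"id\":4}", "{\"id\":5}");
        Path single = merge(writeShards(lines, dir, 2), dir.resolve("result.json"));
        System.out.println(Files.readAllLines(single, StandardCharsets.UTF_8).size()
                + " lines in " + single.getFileName());
    }
}
```

The merge step is still sequential, so it is bounded by one writer, just as Reuven describes; the sketch only shows that producing the lines can stay parallel. Whether a post-processing merge is worthwhile for a real pipeline is a judgment call.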
