A ParDo's DoFn should always return an iterable of output elements, not a
bare string. So if you want to output a single string, use either
"return [str]" or "yield str".
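
To illustrate, here is a plain-Python sketch (not Beam itself). The function names are hypothetical, and list(...) stands in for the runner iterating over whatever the DoFn returns. It also assumes the record arrives as a flat sequence of values, and uses the standard csv module to join them into one line, as suggested earlier in the thread:

```python
import csv
import io


def to_csv_line(values):
    """Join one record's values into a single CSV-formatted line (no newline)."""
    buf = io.StringIO()
    # None is written as an empty field; fields containing commas get quoted.
    csv.writer(buf, lineterminator="").writerow(
        "" if v is None else v for v in values)
    return buf.getvalue()


def process_bare_string(values):
    # Discouraged: the caller iterates the string, so each character
    # becomes a separate output element.
    return to_csv_line(values)


def process_as_list(values):
    # One output element: the whole line, wrapped in a list.
    return [to_csv_line(values)]


def process_as_generator(values):
    # Equivalent: yield the whole line as a single element.
    yield to_csv_line(values)


record = [None, 30, "Primary Tissue", "a, b"]
print(list(process_bare_string(record))[:4])  # individual characters
print(list(process_as_list(record)))          # a single line
print(list(process_as_generator(record)))     # a single line
```

In a pipeline, the body of process_as_generator would presumably become the process method of a second DoFn applied between CreateColForSampleFn and WriteToText.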

On Mon, Jun 18, 2018 at 1:39 PM OrielResearch Eila Arich-Landkof <
[email protected]> wrote:

> Thanks for the response.
> I tried this within the current ParDo, CreateColForSampleFn. Apache Beam
> returns a warning recommending not to return a string.
>
> So, my questions are:
> - Is it essential to separate this transformation in a different ParDo?
> - Should I ignore that message? When is this message relevant?
>
> Many thanks,
> Eila
>
> On Mon, Jun 18, 2018 at 2:52 PM Lukasz Cwik <[email protected]> wrote:
>
>> User is the correct mailing list.
>>
>> beam.io.WriteToText takes 'strings' which means that you have to format
>> the whole line yourself. You'll want to apply another ParDo
>> after CreateColForSampleFn which takes the 1x164 record and concatenates
>> each value with ',' in between.
>>
>> On Mon, Jun 18, 2018 at 9:00 AM OrielResearch Eila Arich-Landkof <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> Is anyone listening on the user@ mailing list? or should I use a
>>> different mailing list?
>>>
>>> I have made some progress.
>>> - The ParDo returns a list now.
>>> - Added a header to WriteToText.
>>>
>>> The pipeline looks like this:
>>> ExploreData = (p
>>>     | "Extract the rows from dataframe" >> beam.io.Read(
>>>         beam.io.BigQuerySource('archs4.Debug_annotation'))
>>>     | "create more columns" >> beam.ParDo(
>>>         CreateColForSampleFn(colListSubset, outputPath)))
>>>
>>> (ExploreData
>>>     | 'writing to CSV files' >> beam.io.WriteToText(
>>>         'gs://dataExploration.txt', file_name_suffix='.csv',
>>>         num_shards=1, append_trailing_newlines=True,
>>>         header=colListStr))
>>>
>>>
>>> The remaining issue is that the output has a new line after each value:
>>>
>>> *None
>>> None
>>> None
>>> None
>>> None
>>>  30
>>>  Primary Tissue
>>> None
>>> None
>>> None*
>>>
>>> Please let me know how to get rid of these new lines. I hope to be able
>>> to open the output file with Google Sheets.
>>>
>>>
>>> Thanks,
>>>
>>> Eila
>>>
>>>
>>>
>>> On Fri, Jun 15, 2018 at 2:45 PM, OrielResearch Eila Arich-Landkof <
>>> [email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am running a pipeline where a table from BQ is processed line
>>>> by line using a ParDo function.
>>>> CreateColForSampleFn generates a data frame, with headers and values
>>>> (shape: 1x164), that I want to pass to WriteToText.
>>>> See the following:
>>>>
>>>> ExploreData = (p
>>>>     | "Extract the rows from dataframe" >> beam.io.Read(
>>>>         beam.io.BigQuerySource('archs4.Debug_annotation'))
>>>>     | "create more columns" >> beam.ParDo(
>>>>         CreateColForSampleFn(colListSubset, outputPath)))
>>>>
>>>> (ExploreData
>>>>     | 'writing to CSV files' >> beam.io.WriteToText(
>>>>         'gs://dataExploration.txt', num_shards=1))
>>>>
>>>> My questions are related to the returned DF and WriteToText:
>>>> 1. when I pass DF from the CreateColForSampleFn to WriteToText , I get
>>>> only the headers:
>>>>
>>>> Sample_contact_phone
>>>> Sample_extract_protocol_ch1
>>>> Sample_platform_id
>>>> Sick
>>>> Sample_title
>>>> index
>>>> Sample_last_update_date
>>>> Sample_contact_country
>>>> Sample_channel_count
>>>> Sample_library_source
>>>> Sample_taxid_ch1
>>>>
>>>>
>>>> 2. When I return the df in a list [df], I get the following txt for
>>>> each row (including the dimensions)
>>>>
>>>>  Sample_contact_phone                        Sample_extract_protocol_ch1 
>>>> Sample_platform_id  Sick
>>>>
>>>> 0                       Library construction protocol: Four µg of tota... 
>>>>           GPL11154  None
>>>>
>>>> [1 rows x 168 columns]
>>>>
>>>>
>>>>
>>>> I want to generate a text file that includes:
>>>> - One header (if needed, I will add it after the pipeline completed)
>>>> - All the values from each row that was processed into a DF
>>>> - Full cell values, without ... in the middle
>>>>
>>>> What am I missing? Any advice?
>>>>
>>>> Thanks,
>>>> --
>>>> Eila
>>>> www.orielresearch.org
>>>> https://www.meetup.com/Deep-Learning-In-Production/
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Eila
>>> www.orielresearch.org
>>> https://www.meetup.com/Deep-Learning-In-Production/
>>>
>>>
>>> --
> Eila
> www.orielresearch.org
> https://www.meetup.com/Deep-Learning-In-Production/
>
>
>
