A ParDo (more precisely, the DoFn's process method) should always return an iterable, not a string. So if you want to output a single string, it should be either "return [str]" or "yield str".
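Here is a minimal pure-Python sketch of the difference. It does not use Beam itself. emulate_pardo is a hypothetical helper that mimics how a runner consumes whatever process() returns: it iterates over the result and treats each item as a separate output element. A Python str iterates character by character, so returning a bare string would split it into characters. Returning the list of values gives one element per value, which WriteToText writes one per line. "return [line]" or "yield line" gives exactly one element per record.

```python
from typing import Any, Callable, Iterable, List


def emulate_pardo(process: Callable[[Any], Iterable[Any]],
                  elements: Iterable[Any]) -> List[Any]:
    """Hypothetical stand-in for a runner: flatten every process() result."""
    outputs: List[Any] = []
    for element in elements:
        # Each item of the returned iterable becomes a separate output element.
        outputs.extend(process(element))
    return outputs


def join_values(record):
    # Concatenate every value with ',' in between (None becomes "None").
    return ",".join(str(value) for value in record)


def process_returns_values(record):
    # One output element per value, i.e. one line per value in the text file.
    return list(record)


def process_returns_str(record):
    # Discouraged: a str is iterable, so it gets split into characters.
    return join_values(record)


def process_returns_list(record):
    # Preferred: wrap the single output string in a list.
    return [join_values(record)]


def process_yields(record):
    # Also fine: yield the single output string.
    yield join_values(record)


record = [None, 30, "Primary Tissue"]
print(emulate_pardo(process_returns_values, [record]))  # [None, 30, 'Primary Tissue']
print(emulate_pardo(process_returns_str, [record])[:4])  # ['N', 'o', 'n', 'e']
print(emulate_pardo(process_returns_list, [record]))  # ['None,30,Primary Tissue']
print(emulate_pardo(process_yields, [record]))  # ['None,30,Primary Tissue']
```

The join here is deliberately naive. Values that themselves contain commas would need CSV quoting.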
On Mon, Jun 18, 2018 at 1:39 PM OrielResearch Eila Arich-Landkof wrote:
> Thanks for the response.
> I tried this within the current ParDo, CreateColForSampleFn, and Apache Beam
> returns a warning recommending not to return a string.
>
> So, my questions are:
> - Is it essential to separate this transformation into a different ParDo?
> - Should I ignore that message? When is this message relevant?
>
> Many thanks,
> Eila
>
> On Mon, Jun 18, 2018 at 2:52 PM Lukasz Cwik wrote:
>
>> User is the correct mailing list.
>>
>> beam.io.WriteToText takes strings, which means that you have to format
>> the whole line yourself. You'll want to apply another ParDo
>> after CreateColForSampleFn which takes the 1x164 record and concatenates
>> each value with ',' in between.
>>
>> On Mon, Jun 18, 2018 at 9:00 AM OrielResearch Eila Arich-Landkof wrote:
>>
>>> Hi,
>>>
>>> Is anyone listening on the user@ mailing list, or should I use a
>>> different mailing list?
>>>
>>> I have made some progress:
>>> - ParDo returns a list now
>>> - added a header to WriteToText
>>>
>>> The pipeline looks like this:
>>> ExploreData = (p | "Extract the rows from dataframe" >>
>>> beam.io.Read(beam.io.BigQuerySource('archs4.Debug_annotation'))
>>> | "create more columns" >>
>>> beam.ParDo(CreateColForSampleFn(colListSubset,outputPath)))
>>>
>>> (ExploreData | 'writing to CSV files' >>
>>> beam.io.WriteToText('gs://dataExploration.txt',file_name_suffix='.csv',num_shards=1,append_trailing_newlines=True,header=colListStr))
>>>
>>> The remaining issue is that the output has a newline after each value:
>>>
>>> None
>>> None
>>> None
>>> None
>>> None
>>> 30
>>> Primary Tissue
>>> None
>>> None
>>> None
>>>
>>> Please let me know how I can get rid of these newlines. I hope to be able
>>> to open the output file with Google Sheets.
>>>
>>> Thanks,
>>>
>>> Eila
>>>
>>> On Fri, Jun 15, 2018 at 2:45 PM, OrielResearch Eila Arich-Landkof wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am running a pipeline in which a table from BQ is processed row
>>>> by row using a ParDo function.
>>>> CreateColForSampleFn generates a DataFrame with headers and values
>>>> (shape: 1x164) that I want to pass to WriteToText.
>>>> See the following:
>>>>
>>>> ExploreData = (p | "Extract the rows from dataframe" >> beam.io.Read(
>>>> beam.io.BigQuerySource('archs4.Debug_annotation'))
>>>> | "create more columns" >>
>>>> beam.ParDo(CreateColForSampleFn(colListSubset,outputPath)))
>>>>
>>>> (ExploreData | 'writing to CSV files' >>
>>>> beam.io.WriteToText('gs://dataExploration.txt',num_shards=1))
>>>>
>>>> My questions are related to the returned DF and WriteToText:
>>>> 1. When I pass the DF from CreateColForSampleFn to WriteToText, I get
>>>> only the headers:
>>>>
>>>> Sample_contact_phone
>>>> Sample_extract_protocol_ch1
>>>> Sample_platform_id
>>>> Sick
>>>> Sample_title
>>>> index
>>>> Sample_last_update_date
>>>> Sample_contact_country
>>>> Sample_channel_count
>>>> Sample_library_source
>>>> Sample_taxid_ch1
>>>>
>>>> 2. When I return the DF in a list ([df]), I get the following text for
>>>> each row (including the dimensions):
>>>>
>>>> Sample_contact_phone Sample_extract_protocol_ch1
>>>> Sample_platform_id Sick
>>>>
>>>> 0 Library construction protocol: Four µg of tota...
>>>> GPL11154 None
>>>>
>>>> [1 rows x 168 columns]
>>>>
>>>> I want to generate a text file that includes:
>>>> - One header (if needed, I will add it after the pipeline completes)
>>>> - All the values from each row that was processed into a DF
>>>> - Full cell values, without "..." in the middle
>>>>
>>>> What am I missing? Any advice?
>>>>
>>>> Thanks,
>>>> --
>>>> Eila
>>>> www.orielresearch.org
>>>> https://www.meetup.com/Deep-Learning-In-Production/
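To make the extra formatting step concrete, here is a minimal sketch of a function that turns one record into a single comma-separated line, plus a matching header. It makes two hypothetical assumptions:
- Each record reaching this step is a dict mapping column names to values.
- The column order comes from a list such as colListSubset.

If CreateColForSampleFn emits a list instead, adapt the lookup. The sketch uses Python's csv module rather than a plain ',' join, so values containing commas or quotes are quoted correctly, and None is written as an empty cell. The names to_csv_line and csv_header are hypothetical.

```python
import csv
import io
from typing import Any, Mapping, Sequence


def _csv_row(values: Sequence[Any]) -> str:
    """Render one row of values as a CSV line without a trailing newline."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # The csv module quotes fields containing ',' or '"' and writes None as "".
    writer.writerow(values)
    # Drop the "\n" added by writerow; WriteToText appends its own newline.
    return buf.getvalue()[:-1]


def to_csv_line(record: Mapping[str, Any], columns: Sequence[str]) -> str:
    """Hypothetical formatting step: one dict record -> one CSV line."""
    # Missing columns are treated like None and become empty cells.
    return _csv_row([record.get(col) for col in columns])


def csv_header(columns: Sequence[str]) -> str:
    """Header line with the same column order, e.g. for WriteToText(header=...)."""
    return _csv_row(columns)


columns = ["index", "Sample_title", "Sample_extract_protocol_ch1"]
row = {"index": 0, "Sample_title": None,
       "Sample_extract_protocol_ch1": "hypothetical text, with a comma"}
print(csv_header(columns))       # index,Sample_title,Sample_extract_protocol_ch1
print(to_csv_line(row, columns))  # 0,,"hypothetical text, with a comma"
```

In the pipeline, this could be wired in as | "format as CSV" >> beam.Map(to_csv_line, colListSubset), placed before WriteToText(..., header=csv_header(colListSubset)). This wiring is an assumption about this particular pipeline. beam.Map emits exactly one output per input, so returning a plain string from the mapped function is fine there. The return-an-iterable rule applies to ParDo and FlatMap.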
