Thanks for the response. I tried this within the current ParDo, CreateColForSampleFn, and Apache Beam returns a warning recommending not to return a string.
So, my questions are:
- Is it essential to move this transformation into a separate ParDo?
- Should I ignore that message? When is this message relevant?

Many thanks,
Eila

On Mon, Jun 18, 2018 at 2:52 PM Lukasz Cwik <[email protected]> wrote:

> User is the correct mailing list.
>
> beam.io.WriteToText takes 'strings' which means that you have to format
> the whole line yourself. You'll want to apply another ParDo
> after CreateColForSampleFn which takes the 1x164 record and concatenates
> each value with ',' in between.
>
> On Mon, Jun 18, 2018 at 9:00 AM OrielResearch Eila Arich-Landkof <
> [email protected]> wrote:
>
>> Hi,
>>
>> Is anyone listening on the user@ mailing list? Or should I use a
>> different mailing list?
>>
>> I have made some progress:
>> - ParDo returns a list now
>> - added a header to the WriteToText.
>>
>> The pipeline looks like this:
>> ExploreData = (p | "Extract the rows from dataframe" >>
>> beam.io.Read(beam.io.BigQuerySource('archs4.Debug_annotation'))
>> | "create more columns" >>
>> beam.ParDo(CreateColForSampleFn(colListSubset,outputPath)))
>>
>> (ExploreData | 'writing to CSV files' >>
>> beam.io.WriteToText('gs://dataExploration.txt',file_name_suffix='.csv',num_shards=1,append_trailing_newlines=True,header=colListStr))
>>
>>
>> The remaining issue is that the output has a new line after each value:
>>
>> None
>> None
>> None
>> None
>> None
>> 30
>> Primary Tissue
>> None
>> None
>> None
>>
>> Please let me know how I can get rid of these new lines. I hope to be able
>> to open the output file with Google Sheets.
>>
>>
>> Thanks,
>>
>> Eila
>>
>>
>>
>> On Fri, Jun 15, 2018 at 2:45 PM, OrielResearch Eila Arich-Landkof <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I am running a pipeline where a table from BQ is being processed line
>>> by line using a ParDo function.
>>> CreateColForSampleFn generates a data frame, with headers and values
>>> (shape: 1x164) that I want to pass to WriteToText.
>>> See the following:
>>>
>>> ExploreData = (p | "Extract the rows from dataframe" >> beam.io.Read(
>>> beam.io.BigQuerySource('archs4.Debug_annotation'))
>>> | "create more columns" >>
>>> beam.ParDo(CreateColForSampleFn(colListSubset,outputPath)))
>>>
>>> (ExploreData | 'writing to CSV files' >>
>>> beam.io.WriteToText('gs://dataExploration.txt',num_shards=1))
>>>
>>> My questions are related to the returned DF and WriteToText:
>>> 1. When I pass the DF from CreateColForSampleFn to WriteToText, I get
>>> only the headers:
>>>
>>> Sample_contact_phone
>>> Sample_extract_protocol_ch1
>>> Sample_platform_id
>>> Sick
>>> Sample_title
>>> index
>>> Sample_last_update_date
>>> Sample_contact_country
>>> Sample_channel_count
>>> Sample_library_source
>>> Sample_taxid_ch1
>>>
>>>
>>> 2. When I return the df in a list [df], I get the following text for each
>>> row (including the dimensions):
>>>
>>> Sample_contact_phone Sample_extract_protocol_ch1
>>> Sample_platform_id Sick
>>>
>>> 0 Library construction protocol: Four µg of tota...
>>> GPL11154 None
>>>
>>> [1 rows x 168 columns]
>>>
>>>
>>>
>>> I want to generate a text file that includes:
>>> - One header (if needed, I will add it after the pipeline completes)
>>> - All the values from each row that was processed and generated a DF
>>> - Full cell values, without ... in the middle
>>>
>>> What am I missing? Any advice?
>>>
>>> Thanks,
>>> --
>>> Eila
>>> www.orielresearch.org
>>> https://www.meetup.com/Deep-Learning-In-Production/
>>>

--
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/
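
For reference, the formatting step suggested in the quoted reply (concatenate the values of one record with ',' before WriteToText) could look roughly like the sketch below. `format_csv_line` is a hypothetical helper name, not part of Beam, and the sketch assumes each element reaching it is one whole record, that is, a list of cell values rather than a single value or a DataFrame. Writing None as an empty cell is also an assumption; the thread does not say how missing values should appear. It uses Python's standard `csv` module so that values containing commas or quotes are escaped.

```python
import csv
import io


def format_csv_line(values):
    """Join the values of one record into a single CSV line.

    Hypothetical helper for illustration. Assumes `values` is a list
    holding every cell of one row.
    """
    buffer = io.StringIO()
    # lineterminator="" because WriteToText appends the newline itself.
    writer = csv.writer(buffer, lineterminator="")
    # Assumption: None is written as an empty cell.
    writer.writerow(["" if v is None else v for v in values])
    return buffer.getvalue()


# One record becomes one string, so it ends up on one output line.
print(format_csv_line([None, 30, "Primary Tissue", None]))
# prints: ,30,Primary Tissue,

# In the pipeline this could be applied between the two existing steps,
# for example (not executed here):
#   lines = ExploreData | "format as CSV" >> beam.Map(format_csv_line)
#   lines | "writing to CSV files" >> beam.io.WriteToText(...)
```

`beam.Map` emits the function's return value as a single element, so the string is not split further. A DoFn's `process` method, by contrast, is expected to return an iterable of output elements; if the formatting is kept inside CreateColForSampleFn, the line would have to be wrapped (for example `yield line` or `return [line]`) so that it is emitted as one element.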
