Hi,

Is anyone listening on the user@ mailing list, or should I use a different
mailing list?

I have made some progress:
- The ParDo now returns a list.
- I added a header to WriteToText.

The pipeline now looks like this:
ExploreData = (p
    | "Extract the rows from dataframe" >> beam.io.Read(
        beam.io.BigQuerySource('archs4.Debug_annotation'))
    | "create more columns" >> beam.ParDo(
        CreateColForSampleFn(colListSubset, outputPath)))

(ExploreData
    | 'writing to CSV files' >> beam.io.WriteToText(
        'gs://dataExploration.txt', file_name_suffix='.csv', num_shards=1,
        append_trailing_newlines=True, header=colListStr))


The remaining issue is that the output has a newline after each value:

*None
None
None
None
None
 30
 Primary Tissue
None
None
None*

Please let me know how I can get rid of these newlines. I hope to be
able to open the output file in Google Sheets.
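
In case it helps whoever replies, here is a minimal sketch of what I think is
happening. It assumes that a list returned from the DoFn is unpacked into one
element per item, and that WriteToText writes one line per element, so every
value ends up on its own line. The helper names row_to_csv_line and
emit_one_row are hypothetical (not part of Beam), and the sketch uses only the
Python standard library so it can be run without a pipeline.

```python
import csv
import io


def row_to_csv_line(values):
    """Format one row of values as a single CSV line without a trailing newline."""
    buf = io.StringIO()
    # csv.writer writes None as an empty cell and quotes values that
    # contain commas, quotes, or line breaks.
    csv.writer(buf).writerow(values)
    # Drop the writer's line terminator; the text sink is assumed to add its own.
    return buf.getvalue().rstrip("\r\n")


def emit_one_row(values):
    """Hypothetical stand-in for the return value of a DoFn's process method.

    Returning a one-item list means a single element (the whole row) is
    emitted, instead of one element per value.
    """
    return [row_to_csv_line(values)]


if __name__ == "__main__":
    sample = [None, None, 30, "Primary Tissue", None]
    for line in emit_one_row(sample):
        print(line)
```

Inside CreateColForSampleFn.process, the idea would be to return
emit_one_row(values) rather than the list of values itself. This is untested
against the real table.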


Thanks,

Eila



On Fri, Jun 15, 2018 at 2:45 PM, OrielResearch Eila Arich-Landkof <
[email protected]> wrote:

> Hi all,
>
> I am running a pipeline in which a table from BQ is processed line by
> line using a ParDo function.
> CreateColForSampleFn generates a data frame with headers and values
> (shape: 1x164) that I want to pass to WriteToText.
> See the following:
>
> ExploreData = (p
>     | "Extract the rows from dataframe" >> beam.io.Read(
>         beam.io.BigQuerySource('archs4.Debug_annotation'))
>     | "create more columns" >> beam.ParDo(
>         CreateColForSampleFn(colListSubset, outputPath)))
>
> (ExploreData
>     | 'writing to CSV files' >> beam.io.WriteToText(
>         'gs://dataExploration.txt', num_shards=1))
>
> My questions are related to the returned DF and WriteToText:
> 1. When I pass the DF from CreateColForSampleFn to WriteToText, I get
> only the headers:
>
> Sample_contact_phone
> Sample_extract_protocol_ch1
> Sample_platform_id
> Sick
> Sample_title
> index
> Sample_last_update_date
> Sample_contact_country
> Sample_channel_count
> Sample_library_source
> Sample_taxid_ch1
>
>
> 2. When I return the DF in a list, [df], I get the following text for each
> row (including the dimensions):
>
>  Sample_contact_phone                        Sample_extract_protocol_ch1 
> Sample_platform_id  Sick
>
> 0                       Library construction protocol: Four µg of tota...    
>        GPL11154  None
>
> [1 rows x 168 columns]
>
>
>
> I want to generate a text file that includes:
> - One header (if needed, I will add it after the pipeline has completed)
> - All the values from each row that was processed into a DF
> - Full cell values, without ... in the middle
>
> What am I missing? Any advice?
>
> Thanks,
> --
> Eila
> www.orielresearch.org
> https://www.meetup.com/Deep-Learning-In-Production/
>
>
>


-- 
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/
