Returning dataframe from parDo and printing its value - advice?

OrielResearch Eila Arich-Landkof Fri, 15 Jun 2018 11:46:29 -0700

Hi all,

I am running a pipeline in which a table from BigQuery is processed row by
row using a ParDo function.
CreateColForSampleFn generates a data frame with headers and values
(shape: 1x164) that I want to pass to WriteToText.
See the following:


ExploreData = (
    p
    | "Extract the rows from dataframe" >> beam.io.Read(
        beam.io.BigQuerySource('archs4.Debug_annotation'))
    | "create more columns" >> beam.ParDo(
        CreateColForSampleFn(colListSubset, outputPath))
)

(ExploreData
 | 'writing to CSV files' >> beam.io.WriteToText(
     'gs://archs4/output/dataExploration.txt', num_shards=1))

My questions are about the returned DF and WriteToText:
1. When I pass the DF from CreateColForSampleFn to WriteToText, I get only
the headers:

Sample_contact_phone
Sample_extract_protocol_ch1
Sample_platform_id
Sick
Sample_title
index
Sample_last_update_date
Sample_contact_country
Sample_channel_count
Sample_library_source
Sample_taxid_ch1


2. When I return the DF in a list, [df], I get the following text for each
row (including the dimensions):

 Sample_contact_phone
Sample_extract_protocol_ch1 Sample_platform_id  Sick

0                       Library construction protocol: Four µg of tota...           GPL11154  None

[1 rows x 168 columns]



I want to generate a text file that includes:
- One header (if needed, I will add it after the pipeline has completed)
- All the values from each row that was processed into a DF
- Full cell values, without the ... truncation in the middle
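
To make the target format concrete, here is a minimal sketch of the single line I would like each processed row to produce. It uses only the standard csv module. The helper name row_to_csv_line and the sample values are hypothetical, and it assumes the 1xN data frame's cells are available as a plain list (for example via df.iloc[0].tolist()).

```python
import csv
import io


def row_to_csv_line(values):
    """Format one row of cell values as a single, untruncated CSV line."""
    buffer = io.StringIO()
    # Empty line terminator: WriteToText appends its own newline per element
    # by default, so the element itself should not end with one.
    csv.writer(buffer, lineterminator="").writerow(values)
    return buffer.getvalue()


# Hypothetical values standing in for the cells of one data frame row.
sample_row = ["a long protocol description, with a comma", "GPL11154", None]
line = row_to_csv_line(sample_row)
print(line)
```

If the ParDo emitted a string like this for each input row (rather than the data frame object itself), the output file would hold one full-width line per row and no header.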

What am I missing? Any advice?

Thanks,
-- 
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/
