Hello everybody,
I am facing a problem with a pipeline that runs perfectly on the DirectRunner,
but on Dataflow it turns into a mess: the values of the element and the
side input (access) get changed.
The side input reads a single line containing credentials.
Any thoughts on how this is done are more than welcome. How do you manage
sensitive information in templated pipelines?
It is something like this:
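import apache_beam as beam

class GetPrediction(beam.DoFn):
    def __init__(self, input1, input2):
        self.input1 = input1
        self.input2 = input2

    def process(self, element, access):
        # access is the side input: a single line, "user<TAB>token"
        user, token = access.split('\t')
        # element is one line of the input file, also tab-separated
        thing1, thing2 = element.split('\t')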
credentials_pipe = (
    p
    | 'Get credentials' >> beam.io.ReadFromText(user_options.credentials)
)
main_pipe = (
    p
    | 'Get information' >> beam.io.ReadFromText(user_options.input_file)
    | 'Get prediction from severity' >> beam.ParDo(GetPrediction(
        user_options.input1,
        user_options.input2,
    ), beam.pvalue.AsSingleton(credentials_pipe))
)
p.run()
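
To narrow down which of the two values arrives in an unexpected shape on Dataflow, the two split calls in process could be wrapped in a small check. This is only a sketch in plain Python, not part of the pipeline above: split_pair and parse_inputs are hypothetical helper names, and it assumes both lines hold exactly two tab-separated fields. The error message reports only the field count and length, never the value, since access carries the token.

```python
def split_pair(line, name):
    """Split a tab-separated line into exactly two fields.

    Raises ValueError naming the offending argument, without echoing
    its content (it may be the credentials line).
    """
    parts = line.rstrip('\n').split('\t')
    if len(parts) != 2:
        raise ValueError(
            '%s: expected 2 tab-separated fields, got %d (length %d)'
            % (name, len(parts), len(line))
        )
    return parts[0], parts[1]


def parse_inputs(element, access):
    """Mirror of the two split calls in process(), with validation."""
    user, token = split_pair(access, 'access')
    thing1, thing2 = split_pair(element, 'element')
    return user, token, thing1, thing2
```

Calling parse_inputs(element, access) at the top of process would make a malformed value fail with a message that says whether it was the element or the side input.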
--
ANDRÉ ROCHA SILVA
DATA ENGINEER