Hello! I’m looking for some help writing a custom transform that reads from
an unbound source. A basic version of this would look something like this:

import apache_beam as beam
import myeventlibrary
class _ReadEventsFn(beam.DoFn):
  def process(self, unused_element):
    subscriber = myeventlibrary.Subscriber()
    while True:
      # blocks until an event is received
      event = subscriber.get()
      yield event

with beam.Pipeline() as pipe:
  (
    pipe
    | beam.Impulse()
    | beam.ParDo(_ReadEventsFn())
    | beam.Map(print)
  )

I have a few questions:

   - When executing this in the DirectRunner with the default number of
   workers (1), is it correct to assume the process would be constantly
   blocking and rarely processing other parts of the pipeline?
   - I noticed there is a decorator DoFn.unbounded_per_element which seems
   useful, but it’s a bit unclear what it does. I also noticed in the java
   docs it is an error to apply this decorator if the DoFn is not a SDF.
   - I took a stab at writing this as a SDF. The incoming events are not
   really something that can be read in parallel and they trickle in slowly.
   In this scenario is there anything to gain by using a SDF?

Thanks!
-Sam

Reply via email to