Dataflow currently supports custom containers only for pipelines that set --experiments=beam_fn_api. All Python streaming pipelines set it implicitly. Batch pipelines have to set the flag manually, and some functionality may not be available with it [1]. I'd suggest trying custom containers with a simple pipeline (e.g. wordcount) first, and then with your own pipeline. You can also run your pipeline on the Dataflow runner without custom containers but with --experiments=beam_fn_api, to isolate any issues you might hit.
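To make the two-step approach above concrete, here is a minimal sketch of how the flags could be assembled for each run. The project, bucket, and image names are placeholders, and build_dataflow_args is a hypothetical helper, not part of Beam. I'm also assuming --worker_harness_container_image is the option behind the "worker harness container image" in the error message; please check it against your SDK version.

```python
def build_dataflow_args(project, temp_location, custom_image=None):
    """Return command-line flags for a Python batch pipeline on Dataflow.

    Hypothetical helper for illustration only; it just builds a list of strings.
    """
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--temp_location={temp_location}",
        # Batch pipelines need this set explicitly to use custom containers.
        "--experiments=beam_fn_api",
    ]
    if custom_image:
        # Assumed flag name; verify against the SDK version you are running.
        args.append(f"--worker_harness_container_image={custom_image}")
    return args


# Step 1: default container with beam_fn_api, to isolate issues with the flag itself.
baseline_args = build_dataflow_args("my-project", "gs://my-bucket/tmp")

# Step 2: same pipeline, now with the custom image (placeholder name).
custom_args = build_dataflow_args(
    "my-project",
    "gs://my-bucket/tmp",
    custom_image="gcr.io/my-project/my-image:latest",
)

# Either list can then be handed to the pipeline, e.g.:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(baseline_args)
```

If step 1 fails, the problem is with beam_fn_api and your pipeline rather than with the container; if only step 2 fails, look at the image.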
[1] https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0

On Mon, Dec 2, 2019 at 10:35 AM Kyle Weaver <[email protected]> wrote:

> Sorry Carl, I think the page I sent you might be either incorrect or
> incomplete. I filed https://issues.apache.org/jira/browse/BEAM-8863 for
> that.
>
> In the meantime, the instructions on the page Luke linked should work.
>
> On Thu, Nov 28, 2019 at 9:38 AM Carl Thomé <[email protected]> wrote:
>
>> Thanks!
>>
>> I tried following
>> https://beam.apache.org/documentation/runtime/environments/ but get a
>> "Custom images are not yet supported" error message from DataFlow.
>> Perhaps I did something wrong?
>>
>> "error": {
>>   "code": 400,
>>   "message": "(24f8c9b6e647d55d): The workflow could not be created.
>>   Causes: (24f8c9b6e647de48): Invalid worker harness container image:
>>   my_image. Custom images are not yet supported.",
>>   "status": "INVALID_ARGUMENT"
>> }
>>
>> On Wed, 27 Nov 2019 at 18:34, Kyle Weaver <[email protected]> wrote:
>>
>>> You can also configure your own Docker images if you like, instructions
>>> here: https://beam.apache.org/documentation/runtime/environments/
>>>
>>> On Wed, Nov 27, 2019 at 12:38 AM Carl Thomé <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a Beam pipeline written in the Python SDK that decodes audio
>>>> files into TFRecord:s. I'd like to run it on DataFlow but I'm missing
>>>> libsndfile1 in the workers.
>>>>
>>>> Is there any way of configuring the base image for the DataFlow workers
>>>> (e.g. Dockerfile + apt install) to get audio decoding working?
>>>>
>>>> On a similar note, when it comes to Python dependencies in the DataFlow
>>>> runtime (like librosa), is there a wish list somewhere on which we can
>>>> upvote missing Python libraries?
>>>>
>>>> Cheers,
>>>> Carl Thomé
