Filed https://issues.apache.org/jira/browse/BEAM-9192 per offline
convo w/ Chamikara's.

On Mon, Jan 13, 2020 at 10:47 PM Bo Shi <[email protected]> wrote:
>
> Hi Chamikara,
>
> I've tried both with and without "--experiment=use_beam_bq_sink" and
> it doesn't seem to affect the outcome.
>
> Regarding "--experiment=beam_fn_api", we're using the portability
> framework [1] so that we can ship our python runtime environment as an
> image via "--worker_harness_container_image" instead of maintaining a
> setup.py file.  It's worked very well for us so far (with the
> exception of this latest issue).
>
>
> [1] Am I using the term right?
> https://beam.apache.org/documentation/runtime/environments/
>
> On Mon, Jan 13, 2020 at 4:40 PM Chamikara Jayalath <[email protected]> 
> wrote:
> >
> >
> >
> > On Mon, Jan 13, 2020 at 12:45 PM Bo Shi <[email protected]> wrote:
> >>
> >> Python 3.7.5
> >> Beam 2.17
> >>
> >> I've used both WriteToBigquery and BigQueryBatchFileLoads successfully
> >> using DirectRunner.  I've boiled the issue down to a small
> >> reproducible case (attached).  The following works great:
> >>
> >> $ pipenv run python repro-direct.py \
> >>   --project=CHANGEME \
> >>   --runner=DirectRunner \
> >>   --temp_location=gs://beam-to-bq--tmp/bshi/tmp \
> >>   --staging_location=gs://beam-to-bq--tmp/bshi/stg \
> >>   --experiment=beam_fn_api \   # <<<< NB! (I assume this is a no-op w/ 
> >> Direct
> >>   --save_main_function
> >>
> >> $ bq --project=CHANGEME query 'select * from bshi_test.direct'
> >> Waiting on bqjob_r53488a6b9ed6cb20_0000016fa05fb570_1 ... (0s) Current
> >> status: DONE
> >> +----------+----------+
> >> | some_str | some_int |
> >> +----------+----------+
> >> | hello    |        1 |
> >> | world    |        2 |
> >> +----------+----------+
> >>
> >> When changing to --runner=DataflowRunner, I get a cryptic Java error
> >> with no other useful Python feedback (see class_cast_exception.txt)
> >>
> >> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException:
> >> Dataflow pipeline failed. State: FAILED, Error:
> >> java.util.concurrent.ExecutionException: java.lang.ClassCastException:
> >> [B cannot be cast to org.apache.beam.sdk.values.KV
> >>
> >> On a hunch, I removed --experiment=beam_fn_api and DataflowRunner
> >> works fine.  Am I doing something against the intent of the SDK?
> >
> >
> > Can you try adding "--experiment=use_beam_bq_sink"  ?
> > Also, out of curiosity, why are you setting "--experiment=beam_fn_api" ?
> >
> >
> >>
> >> Happy to share access to the GCP project if that would assist in debugging.

Reply via email to