The test runner intentionally does some ugly things in order to expose problems that might otherwise be missed. In particular, I believe the test runner enforces a coder round-trip between each transform and scrambles the order of elements, whereas production pipelines will often have many transforms fused together without ever serializing the data.
> I've tried TableRowJsonCoder, but it seems like it converts all objects inside TableRow to LinkedHashMaps

This is likely intended. I would expect only the top-level container to be a TableRow, and that nested maps would be some other map type.

On Wed, Jul 8, 2020 at 10:52 AM Kirill Zhdanovich <[email protected]> wrote:

> Hi Jeff,
> It's a simple pipeline that takes a PCollection of TableRow which is
> selected from the Google Analytics export to BigQuery, so each TableRow
> follows this schema: https://support.google.com/analytics/answer/3437719?hl=en
> Part of the code casts to TableRow like this:
>
> Boolean isMobile = (Boolean) (((TableRow) row.get("device")).get("isMobile"));
>
> or
>
> List<Hit> hits = ((List<TableRow>) row.get("hits")).stream()
>     .map(Hit::new).collect(Collectors.toList());
>
> I don't have issues running this pipeline in production; I only hit this
> issue when I tried to write an end-to-end test.
> Do you know if there are existing coders for TableRow that I can use? I've
> tried TableRowJsonCoder, but it seems like it converts all objects inside a
> TableRow to LinkedHashMaps.
>
> On Wed, 8 Jul 2020 at 17:30, Jeff Klukas <[email protected]> wrote:
>
>> Kirill - Can you tell us more about what Job.runJob is doing? I would not
>> expect the Beam SDK itself to do any casting to TableRow, so is there a
>> line in your code where you're explicitly casting to TableRow? There may be
>> a point where you need to explicitly set the coder on a PCollection to
>> deserialize back to TableRow objects.
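Since nested records come back from a JSON coder round-trip as plain java.util.Map implementations (LinkedHashMap in practice) rather than TableRow, one workaround for the casts shown above is to cast nested values to Map<String, Object> instead of TableRow. TableRow itself implements Map<String, Object>, so this cast works both in production and in the test. A minimal sketch, using a LinkedHashMap to stand in for a decoded row (field names taken from the example above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedRowAccess {
    public static void main(String[] args) {
        // Simulate a row after a coder round-trip: nested records
        // come back as LinkedHashMap, not TableRow.
        Map<String, Object> device = new LinkedHashMap<>();
        device.put("isMobile", true);

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("device", device);

        // Casting to Map<String, Object> works whether the nested value
        // is a TableRow or a LinkedHashMap, since TableRow also
        // implements Map<String, Object>.
        @SuppressWarnings("unchecked")
        Map<String, Object> d = (Map<String, Object>) row.get("device");
        Boolean isMobile = (Boolean) d.get("isMobile");
        System.out.println(isMobile); // prints: true
    }
}
```

The same pattern applies to the list case: cast to List<Map<String, Object>> rather than List<TableRow> before mapping.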
>>
>> On Wed, Jul 8, 2020 at 10:11 AM Kirill Zhdanovich <[email protected]> wrote:
>>
>>> Here is a code example:
>>>
>>> List<TableRow> ss = Arrays.asList(session1, session2);
>>> PCollection<TableRow> sessions = p.apply(Create.of(ss));
>>> PCollection<MetricsWithDimension> res =
>>>     Job.runJob(sessions, "20200614", false, new ProductCatalog());
>>> p.run();
>>>
>>> On Wed, 8 Jul 2020 at 17:07, Kirill Zhdanovich <[email protected]> wrote:
>>>
>>>> Hi,
>>>> I want to test a pipeline whose input is a PCollection of TableRows.
>>>> I've created a test, and when I run it I get an error:
>>>>
>>>> java.lang.ClassCastException: class java.util.LinkedHashMap cannot be
>>>> cast to class com.google.api.services.bigquery.model.TableRow
>>>>
>>>> Is it a known issue? Thank you in advance.
>>>>
>>>> --
>>>> Best Regards,
>>>> Kirill
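Following Jeff's suggestion to set the coder explicitly, the test input above could pin TableRowJsonCoder when creating the PCollection, so the test's coder round-trip produces TableRow at the top level rather than whatever Create infers. This is a sketch only, not a runnable unit, since it assumes the Beam GCP IO module on the classpath and reuses the names (p, session1, session2) from the thread; note that per the discussion above, nested records will still decode as LinkedHashMap:

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

// Pin the coder so the TestPipeline serializes and deserializes the
// top-level elements as TableRow instead of a generic map type.
List<TableRow> ss = Arrays.asList(session1, session2);
PCollection<TableRow> sessions =
    p.apply(Create.of(ss).withCoder(TableRowJsonCoder.of()));
```

Combined with Map-based access for nested fields inside Job.runJob, this avoids the ClassCastException while keeping the test's serialization behavior close to production.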
