[ https://issues.apache.org/jira/browse/FLINK-28591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568488#comment-17568488 ]
Krzysztof Chmielewski edited comment on FLINK-28591 at 7/19/22 11:51 AM:
-------------------------------------------------------------------------

The potential issue might be in _CopyingChainingOutput.class_ line 82, where we call input.processElement(copy); The type of input is "StreamExecCalc", but I do not see a processElement method on this type, and when I try to step inside with the IntelliJ debugger I actually don't see anything.

Anyway, for the bigint case,
{code:java}
input.processElement(copy);{code}
leads us to {_}GenericArrayData{_}, whereas for the int case this object is not created. Unfortunately I don't know what is happening inside
{code:java}
input.processElement(copy);{code}
Any hint on how to debug this place would help. Currently I see this:

!image-2022-07-19-13-51-45-254.png!


> Array<Row<...>> is not serialized correctly when BigInt is present
> ------------------------------------------------------------------
>
>                 Key: FLINK-28591
>                 URL: https://issues.apache.org/jira/browse/FLINK-28591
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API, Table SQL / Planner
>    Affects Versions: 1.15.0
>            Reporter: Andrzej Swatowski
>            Priority: Major
>         Attachments: image-2022-07-19-13-51-45-254.png
>
>
> When using the Table API to insert data into an array of rows, the data is apparently serialized incorrectly internally, which leads to incorrect serialization at the connectors.
> It happens when one of the table fields is a BIGINT (and does not happen when it is INT).
> E.g., the following table:
> {code:java}
> CREATE TABLE wrongArray (
>     foo bigint,
>     bar ARRAY<ROW<`foo1` STRING, `foo2` STRING>>
> ) WITH (
>     'connector' = 'filesystem',
>     'path' = 'file://path/to/somewhere',
>     'format' = 'json'
> ) {code}
> along with the following insert:
> {code:java}
> insert into wrongArray (
>     SELECT
>         1,
>         array[
>             ('Field1', 'Value1'),
>             ('Field2', 'Value2')
>         ]
>     FROM (VALUES(1))
> ) {code}
> gets serialized into:
> {code:java}
> {
>     "foo":1,
>     "bar":[
>         { "foo1":"Field2", "foo2":"Value2" },
>         { "foo1":"Field2", "foo2":"Value2" }
>     ]
> }{code}
> It is easy to spot that `bar` (an array of rows with two strings) consists of duplicates of the last row in the array.
> On the other hand, when `foo` is of type `int` instead of `bigint`:
> {code:java}
> CREATE TABLE wrongArray (
>     foo int,
>     bar ARRAY<ROW<`foo1` STRING, `foo2` STRING>>
> ) WITH (
>     'connector' = 'filesystem',
>     'path' = 'file://path/to/somewhere',
>     'format' = 'json'
> ) {code}
> the previous insert yields the correct value:
> {code:java}
> {
>     "foo":1,
>     "bar":[
>         { "foo1":"Field1", "foo2":"Value1" },
>         { "foo1":"Field2", "foo2":"Value2" }
>     ]
> }{code}
> Bug reproduced in the Flink project:
> [https://github.com/swtwsk/flink-array-row-bug]
> ----
> It is not an error connected with either a specific connector or format. I have done a bit of debugging while trying to implement my own format, and it seems that the `BinaryArrayData` holding the row values has wrong data saved in its `MemorySegment`, i.e. calling:
> {code:java}
> for (var i = 0; i < array.size(); i++) {
>     Object element = arrayDataElementGetter.getElementOrNull(array, i);
> }{code}
> correctly calculates the offsets but yields the same result for every element, as the data is malformed in the array's `MemorySegment`.
> Such a call can be found, e.g., in `flink-json`, more specifically in {color:#e8912d}org.apache.flink.formats.json.RowDataToJsonConverters::createArrayConverter{color} (line 241 in version 1.15.0).


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
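A note on the symptom described above (every element of the array serializing as a copy of the last row): one classic way this pattern arises is object reuse without a defensive copy, i.e. a writer fills the same mutable row buffer for each element and stores the reference instead of a copy, so every slot aliases one instance. Whether that is the actual mechanism in Flink's code-generated converters is exactly what needs debugging; the sketch below is plain Java with invented class names, purely to illustrate the aliasing pattern, not Flink code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the aliasing pattern that matches the symptom
// in this report: reusing one mutable row buffer for every array element
// makes all slots reference the same object, which then serializes as
// N copies of the last row written.
public class RowReuseDemo {

    // Minimal stand-in for a mutable internal row (invented for this sketch).
    static final class MutableRow {
        String foo1;
        String foo2;
    }

    // Buggy variant: every list slot holds the same reused instance.
    static List<MutableRow> buildWithReuse(String[][] rows) {
        MutableRow reused = new MutableRow();
        List<MutableRow> out = new ArrayList<>();
        for (String[] r : rows) {
            reused.foo1 = r[0];
            reused.foo2 = r[1];
            out.add(reused); // bug: no copy, all elements alias 'reused'
        }
        return out;
    }

    // Fixed variant: copy the buffer before storing it.
    static List<MutableRow> buildWithCopy(String[][] rows) {
        MutableRow reused = new MutableRow();
        List<MutableRow> out = new ArrayList<>();
        for (String[] r : rows) {
            reused.foo1 = r[0];
            reused.foo2 = r[1];
            MutableRow copy = new MutableRow();
            copy.foo1 = reused.foo1;
            copy.foo2 = reused.foo2;
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] data = {{"Field1", "Value1"}, {"Field2", "Value2"}};

        List<MutableRow> buggy = buildWithReuse(data);
        // Both elements show the last row written, as in the bug report.
        System.out.println(buggy.get(0).foo1 + " / " + buggy.get(1).foo1); // Field2 / Field2

        List<MutableRow> fixed = buildWithCopy(data);
        System.out.println(fixed.get(0).foo1 + " / " + fixed.get(1).foo1); // Field1 / Field2
    }
}
```

If this hypothesis holds, a breakpoint on wherever the element buffer is written back into the array (rather than inside the generated `processElement`) would show whether each element gets its own copy or shares one buffer.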