[jira] [Commented] (FLINK-31686) Filesystem connector should replace the shallow copy with deep copy
[ https://issues.apache.org/jira/browse/FLINK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710020#comment-17710020 ]

luoyuxia commented on FLINK-31686:
----------------------------------

+1 for adding a new API. +1 for a public discussion.

> Filesystem connector should replace the shallow copy with deep copy
> -------------------------------------------------------------------
>
>                 Key: FLINK-31686
>                 URL: https://issues.apache.org/jira/browse/FLINK-31686
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 1.16.1
>            Reporter: tanjialiang
>            Priority: Major
>         Attachments: image-2023-04-01-16-18-48-762.png, image-2023-04-01-16-18-56-075.png
>
>
> Hi team, when I run the following SQL
> {code:java}
> CREATE TABLE student (
>     `id` STRING,
>     `name` STRING,
>     `age` INT
> ) WITH (
>     'connector' = 'filesystem',
>     'path' = '...',
>     'format' = 'orc'
> );
>
> select
>     t1.total,
>     t2.total
> from
> (
>     select
>         count(*) as total,
>         1 as join_key
>     from student
>     where name = 'tanjialiang'
> ) t1
> LEFT JOIN (
>     select
>         count(*) as total,
>         1 as join_key
>     from student
> ) t2
> ON t1.join_key = t2.join_key; {code}
>
> it throws an error:
> !image-2023-04-01-16-18-48-762.png!
>
> I tried to solve it and found that the filesystem connector's copy function uses a
> shallow copy instead of a deep copy. As a result, all queries on the same
> table source reuse the same bulkWriterFormat. My query has a filter
> condition, which is pushed down into the bulkWriterFormat, so the filter
> condition may be reused by the other scan of the same table.
> The comments on the copy functions of DynamicTableSource and DynamicTableSink
> ask implementers to use a deep copy, but I found that every connector
> implements it with a shallow copy. So I think the filesystem connector is
> not the only one with this problem.
> !image-2023-04-01-16-18-56-075.png!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
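The aliasing described in the report above can be illustrated with a small stand-alone sketch. It is not Flink code: SketchFormat, SketchSource, and shallowCopy() are hypothetical names chosen only to show how a filter pushed into one copy of a source becomes visible to another copy when both share the same format object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a format that accepts pushed-down filters.
class SketchFormat {
    final List<String> pushedFilters = new ArrayList<>();

    void applyFilter(String predicate) {
        pushedFilters.add(predicate);
    }
}

// Hypothetical stand-in for a table source that holds a format.
class SketchSource {
    final SketchFormat format;

    SketchSource(SketchFormat format) {
        this.format = format;
    }

    // Shallow copy: the new source shares the very same format instance.
    SketchSource shallowCopy() {
        return new SketchSource(format);
    }
}

class ShallowCopyDemo {
    public static void main(String[] args) {
        SketchSource original = new SketchSource(new SketchFormat());
        SketchSource t1 = original.shallowCopy(); // scan with a WHERE clause
        SketchSource t2 = original.shallowCopy(); // scan without a filter

        // The planner pushes the filter into t1 only.
        t1.format.applyFilter("name = 'tanjialiang'");

        // t2 never received a filter, yet it observes the one pushed into t1.
        System.out.println(t2.format.pushedFilters); // [name = 'tanjialiang']
    }
}
```

In this sketch the unfiltered scan t2 ends up with the filter of t1, which mirrors the behavior the reporter suspects for the two scans of student.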
[jira] [Commented] (FLINK-31686) Filesystem connector should replace the shallow copy with deep copy
[ https://issues.apache.org/jira/browse/FLINK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709887#comment-17709887 ]

Jark Wu commented on FLINK-31686:
---------------------------------

I think you are right. The current implementation has some problems. The root cause is that {{DecodingFormat}} doesn't support {{copy()}}, so the same DecodingFormat instance is reused after a filter or projection is pushed down. Therefore, we first need to come up with a new API for {{DecodingFormat#copy()}}, which may need a public discussion.

What do you think [~luoyuxia] [~lincoln.86xy] [~twalthr]?
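As a rough illustration of the direction proposed in this comment, the sketch below gives a format its own copy() method so that a source can copy it deeply. It does not show Flink's API: CopyableFormat, ListBackedFormat, and DeepCopySource are hypothetical names, and the shape of an actual {{DecodingFormat#copy()}} would be decided in the public discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; it only mimics the idea of a format with copy().
interface CopyableFormat {
    void applyFilter(String predicate);

    List<String> filters();

    CopyableFormat copy();
}

// Hypothetical format that keeps its pushed-down filters in a list.
class ListBackedFormat implements CopyableFormat {
    private final List<String> filters;

    ListBackedFormat() {
        this(new ArrayList<>());
    }

    private ListBackedFormat(List<String> filters) {
        this.filters = filters;
    }

    @Override
    public void applyFilter(String predicate) {
        filters.add(predicate);
    }

    @Override
    public List<String> filters() {
        // Return a snapshot so callers cannot mutate internal state.
        return new ArrayList<>(filters);
    }

    @Override
    public CopyableFormat copy() {
        // Copy the mutable state so the new instance is independent.
        return new ListBackedFormat(new ArrayList<>(filters));
    }
}

// Hypothetical source whose copy() also copies the format it holds.
class DeepCopySource {
    final CopyableFormat format;

    DeepCopySource(CopyableFormat format) {
        this.format = format;
    }

    DeepCopySource copy() {
        return new DeepCopySource(format.copy());
    }
}

class DeepCopyDemo {
    public static void main(String[] args) {
        DeepCopySource original = new DeepCopySource(new ListBackedFormat());
        DeepCopySource t1 = original.copy();
        DeepCopySource t2 = original.copy();

        t1.format.applyFilter("name = 'tanjialiang'");

        System.out.println(t1.format.filters()); // [name = 'tanjialiang']
        System.out.println(t2.format.filters()); // []
    }
}
```

With the format copied alongside the source, the filter pushed into t1 stays local to t1 in this sketch.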
[jira] [Commented] (FLINK-31686) Filesystem connector should replace the shallow copy with deep copy
[ https://issues.apache.org/jira/browse/FLINK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707522#comment-17707522 ]

tanjialiang commented on FLINK-31686:
-------------------------------------

Maybe I can take this ticket; I would like to try it.