Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Thanks everyone for the positive feedback. I updated the FLIP with the proposed minimal solution: https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Make+expanding+behavior+of+virtual+metadata+columns+configurable I also changed the title to not cause any confusion with the concept of system columns. Now the FLIP only introduces a new ConfigOption and does not specify constraints on the naming of metadata columns anymore. If there are no objections, I would start a voting thread by tomorrow. Thanks, Timo On 29.08.23 00:21, Alexey Leonov-Vendrovskiy wrote: SGTM For future reference, responding here: 5) Any operator can introduce system (psedo) columns. This is clearly out of scope for this FLIP. The implementation effort would be huge and could introduce a lot of bugs. I didn't imply any specific implementation or feature coverage in the currently proposed FLIP, but rather as a way to describe the semantics of system columns that those could be (but don't have to) introduced by any operator. Thanks, Alexey On Tue, Aug 22, 2023 at 2:18 PM Jing Ge wrote: Hi Timo, Your last suggestion sounds good. Best regards, Jing On Mon, Aug 21, 2023 at 4:21 AM Benchao Li wrote: It sounds good to me too, that we avoid introducing the concept of "system columns" for now. Timo Walther 于2023年8月18日周五 22:38写道: Great, I also like my last suggestion as it is even more elegant. I will update the FLIP until Monday. Regards, Timo On 17.08.23 13:55, Jark Wu wrote: Hi Timo, I'm fine with your latest suggestion that introducing a flag to control expanding behavior of metadata virtual columns, but not introducing any concept of system/pseudo columns for now. Best, Jark On Tue, 15 Aug 2023 at 23:25, Timo Walther wrote: Hi everyone, I would like to bump this thread up again. Esp. I would like to hear opinions on my latest suggestions to simply use METADATA VIRTUAL as system columns and only introduce a config option for the SELECT * behavior. Implementation-wise this means minimal effort and less new concepts. Looking forward to any kind of feedback. Thanks, Timo On 07.08.23 12:07, Timo Walther wrote: Hi everyone, thanks for the various feedback and lively discussion. Sorry, for the late reply as I was on vacation. Let me answer to some of the topics: 1) Systems columns should not be shown with DESCRIBE statements This sounds fine to me. I will update the FLIP in the next iteration. 2) Do you know why most SQL systems do not need any prefix with their pseudo column? Because most systems do not have external catalogs or connectors. And also the number of system columns is limited to a handful of columns. Flink is more generic and thus more complex. And we have already the concepts of metadata columns. We need to be careful with not overloading our language. 3) Implementation details > how to you plan to implement the "system columns", do we need to add it to `RelNode` level? Or we just need to do it in the parsing/validating phase? > I'm not sure that Calcite's "system column" feature is fully ready My plan would be to only modify the parsing/validating phase. I would like to avoid additional complexity in planner rules and connector/catalog interfaces. Metadata columns already support projection push down and are passed through the stack (via Schema, ResolvedSchema, SupportsReadableMetadata). Calcite's "system column" feature is not fully ready yet and it would be a large effort potentially introducing bugs in supporting it. Thus, I'm proposing to leverage what we already have. The only part that needs to be modified is the "expand star" method in SqlValidator and Table API. Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would show $rowtime as the expand star has only a special case when in the scope for `FROM t`. All further subqueries treat it as a regular column. 4) Built-in defined pseudo-column "$rowtime" > Did you consider making it as a built-in defined pseudo-column "$rowtime" which returns the time attribute value (if exists) or null (if non-exists) for every table/query, and pseudo-column "$proctime" always returns PROCTIME() value for each table/query Built-in pseudo-columns mean that connector or catalog providers need consensus in Flink which pseudo-columns should be built-in. We should keep the concept generic and let platform providers decide which pseudo columns to expose. $rowtime might be obvious but others such as $partition or $offset are tricky to get consensus as every external connector works differently. Also a connector might want to expose different time semantics (such as ingestion time). 5) Any operator can introduce system (psedo) columns. This is clearly out of scope for this FLIP. The implementation effort would be huge and could introduce a lot of bugs. 6) "Metadata Key Prefix Constraint" which is still a little complex Another option could be to
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
SGTM For future reference, responding here: 5) Any operator can introduce system (psedo) columns. > > This is clearly out of scope for this FLIP. The implementation effort > would be huge and could introduce a lot of bugs. > I didn't imply any specific implementation or feature coverage in the currently proposed FLIP, but rather as a way to describe the semantics of system columns that those could be (but don't have to) introduced by any operator. Thanks, Alexey On Tue, Aug 22, 2023 at 2:18 PM Jing Ge wrote: > Hi Timo, > > Your last suggestion sounds good. > > Best regards, > Jing > > On Mon, Aug 21, 2023 at 4:21 AM Benchao Li wrote: > > > It sounds good to me too, that we avoid introducing the concept of > "system > > columns" for now. > > > > Timo Walther 于2023年8月18日周五 22:38写道: > > > > > Great, I also like my last suggestion as it is even more elegant. I > will > > > update the FLIP until Monday. > > > > > > Regards, > > > Timo > > > > > > On 17.08.23 13:55, Jark Wu wrote: > > > > Hi Timo, > > > > > > > > I'm fine with your latest suggestion that introducing a flag to > control > > > > expanding behavior of metadata virtual columns, but not introducing > > > > any concept of system/pseudo columns for now. > > > > > > > > Best, > > > > Jark > > > > > > > > On Tue, 15 Aug 2023 at 23:25, Timo Walther > wrote: > > > > > > > >> Hi everyone, > > > >> > > > >> I would like to bump this thread up again. > > > >> > > > >> Esp. I would like to hear opinions on my latest suggestions to > simply > > > >> use METADATA VIRTUAL as system columns and only introduce a config > > > >> option for the SELECT * behavior. Implementation-wise this means > > minimal > > > >> effort and less new concepts. > > > >> > > > >> Looking forward to any kind of feedback. > > > >> > > > >> Thanks, > > > >> Timo > > > >> > > > >> On 07.08.23 12:07, Timo Walther wrote: > > > >>> Hi everyone, > > > >>> > > > >>> thanks for the various feedback and lively discussion. Sorry, for > the > > > >>> late reply as I was on vacation. Let me answer to some of the > topics: > > > >>> > > > >>> 1) Systems columns should not be shown with DESCRIBE statements > > > >>> > > > >>> This sounds fine to me. I will update the FLIP in the next > iteration. > > > >>> > > > >>> 2) Do you know why most SQL systems do not need any prefix with > their > > > >>> pseudo column? > > > >>> > > > >>> Because most systems do not have external catalogs or connectors. > And > > > >>> also the number of system columns is limited to a handful of > columns. > > > >>> Flink is more generic and thus more complex. And we have already > the > > > >>> concepts of metadata columns. We need to be careful with not > > > overloading > > > >>> our language. > > > >>> > > > >>> 3) Implementation details > > > >>> > > > >>> > how to you plan to implement the "system columns", do we need > to > > > add > > > >>> it to `RelNode` level? Or we just need to do it in the > > > >>> parsing/validating phase? > > > >>> > I'm not sure that Calcite's "system column" feature is fully > > ready > > > >>> > > > >>> My plan would be to only modify the parsing/validating phase. I > would > > > >>> like to avoid additional complexity in planner rules and > > > >>> connector/catalog interfaces. Metadata columns already support > > > >>> projection push down and are passed through the stack (via Schema, > > > >>> ResolvedSchema, SupportsReadableMetadata). Calcite's "system > column" > > > >>> feature is not fully ready yet and it would be a large effort > > > >>> potentially introducing bugs in supporting it. Thus, I'm proposing > to > > > >>> leverage what we already have. The only part that needs to be > > modified > > > >>> is the "expand star" method in SqlValidator and Table API. > > > >>> > > > >>> Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would > > show > > > >>> $rowtime as the expand star has only a special case when in the > scope > > > >>> for `FROM t`. All further subqueries treat it as a regular column. > > > >>> > > > >>> 4) Built-in defined pseudo-column "$rowtime" > > > >>> > > > >>> > Did you consider making it as a built-in defined pseudo-column > > > >>> "$rowtime" which returns the time attribute value (if exists) or > null > > > >>> (if non-exists) for every table/query, and pseudo-column > "$proctime" > > > >>> always returns PROCTIME() value for each table/query > > > >>> > > > >>> Built-in pseudo-columns mean that connector or catalog providers > need > > > >>> consensus in Flink which pseudo-columns should be built-in. We > should > > > >>> keep the concept generic and let platform providers decide which > > pseudo > > > >>> columns to expose. $rowtime might be obvious but others such as > > > >>> $partition or $offset are tricky to get consensus as every external > > > >>> connector works differently. Also a connector might want to expose > > > >>> different time semantics (such as ingestion time). > > > >>> > > > >>> 5) Any operator can
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, Your last suggestion sounds good. Best regards, Jing On Mon, Aug 21, 2023 at 4:21 AM Benchao Li wrote: > It sounds good to me too, that we avoid introducing the concept of "system > columns" for now. > > Timo Walther 于2023年8月18日周五 22:38写道: > > > Great, I also like my last suggestion as it is even more elegant. I will > > update the FLIP until Monday. > > > > Regards, > > Timo > > > > On 17.08.23 13:55, Jark Wu wrote: > > > Hi Timo, > > > > > > I'm fine with your latest suggestion that introducing a flag to control > > > expanding behavior of metadata virtual columns, but not introducing > > > any concept of system/pseudo columns for now. > > > > > > Best, > > > Jark > > > > > > On Tue, 15 Aug 2023 at 23:25, Timo Walther wrote: > > > > > >> Hi everyone, > > >> > > >> I would like to bump this thread up again. > > >> > > >> Esp. I would like to hear opinions on my latest suggestions to simply > > >> use METADATA VIRTUAL as system columns and only introduce a config > > >> option for the SELECT * behavior. Implementation-wise this means > minimal > > >> effort and less new concepts. > > >> > > >> Looking forward to any kind of feedback. > > >> > > >> Thanks, > > >> Timo > > >> > > >> On 07.08.23 12:07, Timo Walther wrote: > > >>> Hi everyone, > > >>> > > >>> thanks for the various feedback and lively discussion. Sorry, for the > > >>> late reply as I was on vacation. Let me answer to some of the topics: > > >>> > > >>> 1) Systems columns should not be shown with DESCRIBE statements > > >>> > > >>> This sounds fine to me. I will update the FLIP in the next iteration. > > >>> > > >>> 2) Do you know why most SQL systems do not need any prefix with their > > >>> pseudo column? > > >>> > > >>> Because most systems do not have external catalogs or connectors. And > > >>> also the number of system columns is limited to a handful of columns. > > >>> Flink is more generic and thus more complex. And we have already the > > >>> concepts of metadata columns. We need to be careful with not > > overloading > > >>> our language. > > >>> > > >>> 3) Implementation details > > >>> > > >>> > how to you plan to implement the "system columns", do we need to > > add > > >>> it to `RelNode` level? Or we just need to do it in the > > >>> parsing/validating phase? > > >>> > I'm not sure that Calcite's "system column" feature is fully > ready > > >>> > > >>> My plan would be to only modify the parsing/validating phase. I would > > >>> like to avoid additional complexity in planner rules and > > >>> connector/catalog interfaces. Metadata columns already support > > >>> projection push down and are passed through the stack (via Schema, > > >>> ResolvedSchema, SupportsReadableMetadata). Calcite's "system column" > > >>> feature is not fully ready yet and it would be a large effort > > >>> potentially introducing bugs in supporting it. Thus, I'm proposing to > > >>> leverage what we already have. The only part that needs to be > modified > > >>> is the "expand star" method in SqlValidator and Table API. > > >>> > > >>> Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would > show > > >>> $rowtime as the expand star has only a special case when in the scope > > >>> for `FROM t`. All further subqueries treat it as a regular column. > > >>> > > >>> 4) Built-in defined pseudo-column "$rowtime" > > >>> > > >>> > Did you consider making it as a built-in defined pseudo-column > > >>> "$rowtime" which returns the time attribute value (if exists) or null > > >>> (if non-exists) for every table/query, and pseudo-column "$proctime" > > >>> always returns PROCTIME() value for each table/query > > >>> > > >>> Built-in pseudo-columns mean that connector or catalog providers need > > >>> consensus in Flink which pseudo-columns should be built-in. We should > > >>> keep the concept generic and let platform providers decide which > pseudo > > >>> columns to expose. $rowtime might be obvious but others such as > > >>> $partition or $offset are tricky to get consensus as every external > > >>> connector works differently. Also a connector might want to expose > > >>> different time semantics (such as ingestion time). > > >>> > > >>> 5) Any operator can introduce system (psedo) columns. > > >>> > > >>> This is clearly out of scope for this FLIP. The implementation effort > > >>> would be huge and could introduce a lot of bugs. > > >>> > > >>> 6) "Metadata Key Prefix Constraint" which is still a little complex > > >>> > > >>> Another option could be to drop the naming pattern constraint. We > could > > >>> make it configurable that METADATA VIRTUAL columns are never selected > > by > > >>> default in SELECT * or visible in DESCRIBE. This would further > simplify > > >>> the FLIP and esp lower the impact on the planner and all interfaces. > > >>> > > >>> What do you think about this? We could introduce a flag: > > >>> > > >>> table.expand-metadata-columns (better name to be defined) > > >>> > > >>> This way we don't need to
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
It sounds good to me too, that we avoid introducing the concept of "system columns" for now. Timo Walther 于2023年8月18日周五 22:38写道: > Great, I also like my last suggestion as it is even more elegant. I will > update the FLIP until Monday. > > Regards, > Timo > > On 17.08.23 13:55, Jark Wu wrote: > > Hi Timo, > > > > I'm fine with your latest suggestion that introducing a flag to control > > expanding behavior of metadata virtual columns, but not introducing > > any concept of system/pseudo columns for now. > > > > Best, > > Jark > > > > On Tue, 15 Aug 2023 at 23:25, Timo Walther wrote: > > > >> Hi everyone, > >> > >> I would like to bump this thread up again. > >> > >> Esp. I would like to hear opinions on my latest suggestions to simply > >> use METADATA VIRTUAL as system columns and only introduce a config > >> option for the SELECT * behavior. Implementation-wise this means minimal > >> effort and less new concepts. > >> > >> Looking forward to any kind of feedback. > >> > >> Thanks, > >> Timo > >> > >> On 07.08.23 12:07, Timo Walther wrote: > >>> Hi everyone, > >>> > >>> thanks for the various feedback and lively discussion. Sorry, for the > >>> late reply as I was on vacation. Let me answer to some of the topics: > >>> > >>> 1) Systems columns should not be shown with DESCRIBE statements > >>> > >>> This sounds fine to me. I will update the FLIP in the next iteration. > >>> > >>> 2) Do you know why most SQL systems do not need any prefix with their > >>> pseudo column? > >>> > >>> Because most systems do not have external catalogs or connectors. And > >>> also the number of system columns is limited to a handful of columns. > >>> Flink is more generic and thus more complex. And we have already the > >>> concepts of metadata columns. We need to be careful with not > overloading > >>> our language. > >>> > >>> 3) Implementation details > >>> > >>> > how to you plan to implement the "system columns", do we need to > add > >>> it to `RelNode` level? Or we just need to do it in the > >>> parsing/validating phase? > >>> > I'm not sure that Calcite's "system column" feature is fully ready > >>> > >>> My plan would be to only modify the parsing/validating phase. I would > >>> like to avoid additional complexity in planner rules and > >>> connector/catalog interfaces. Metadata columns already support > >>> projection push down and are passed through the stack (via Schema, > >>> ResolvedSchema, SupportsReadableMetadata). Calcite's "system column" > >>> feature is not fully ready yet and it would be a large effort > >>> potentially introducing bugs in supporting it. Thus, I'm proposing to > >>> leverage what we already have. The only part that needs to be modified > >>> is the "expand star" method in SqlValidator and Table API. > >>> > >>> Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would show > >>> $rowtime as the expand star has only a special case when in the scope > >>> for `FROM t`. All further subqueries treat it as a regular column. > >>> > >>> 4) Built-in defined pseudo-column "$rowtime" > >>> > >>> > Did you consider making it as a built-in defined pseudo-column > >>> "$rowtime" which returns the time attribute value (if exists) or null > >>> (if non-exists) for every table/query, and pseudo-column "$proctime" > >>> always returns PROCTIME() value for each table/query > >>> > >>> Built-in pseudo-columns mean that connector or catalog providers need > >>> consensus in Flink which pseudo-columns should be built-in. We should > >>> keep the concept generic and let platform providers decide which pseudo > >>> columns to expose. $rowtime might be obvious but others such as > >>> $partition or $offset are tricky to get consensus as every external > >>> connector works differently. Also a connector might want to expose > >>> different time semantics (such as ingestion time). > >>> > >>> 5) Any operator can introduce system (psedo) columns. > >>> > >>> This is clearly out of scope for this FLIP. The implementation effort > >>> would be huge and could introduce a lot of bugs. > >>> > >>> 6) "Metadata Key Prefix Constraint" which is still a little complex > >>> > >>> Another option could be to drop the naming pattern constraint. We could > >>> make it configurable that METADATA VIRTUAL columns are never selected > by > >>> default in SELECT * or visible in DESCRIBE. This would further simplify > >>> the FLIP and esp lower the impact on the planner and all interfaces. > >>> > >>> What do you think about this? We could introduce a flag: > >>> > >>> table.expand-metadata-columns (better name to be defined) > >>> > >>> This way we don't need to introduce the concept of system columns yet, > >>> but can still offer similar functionality with minimal overhead in the > >>> code base. > >>> > >>> Regards, > >>> Timo > >>> > >>> > >>> > >>> > >>> On 04.08.23 23:06, Alexey Leonov-Vendrovskiy wrote: > Looks like both kinds of system columns can converge. > We can say that any operator can
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Great, I also like my last suggestion as it is even more elegant. I will update the FLIP until Monday. Regards, Timo On 17.08.23 13:55, Jark Wu wrote: Hi Timo, I'm fine with your latest suggestion that introducing a flag to control expanding behavior of metadata virtual columns, but not introducing any concept of system/pseudo columns for now. Best, Jark On Tue, 15 Aug 2023 at 23:25, Timo Walther wrote: Hi everyone, I would like to bump this thread up again. Esp. I would like to hear opinions on my latest suggestions to simply use METADATA VIRTUAL as system columns and only introduce a config option for the SELECT * behavior. Implementation-wise this means minimal effort and less new concepts. Looking forward to any kind of feedback. Thanks, Timo On 07.08.23 12:07, Timo Walther wrote: Hi everyone, thanks for the various feedback and lively discussion. Sorry, for the late reply as I was on vacation. Let me answer to some of the topics: 1) Systems columns should not be shown with DESCRIBE statements This sounds fine to me. I will update the FLIP in the next iteration. 2) Do you know why most SQL systems do not need any prefix with their pseudo column? Because most systems do not have external catalogs or connectors. And also the number of system columns is limited to a handful of columns. Flink is more generic and thus more complex. And we have already the concepts of metadata columns. We need to be careful with not overloading our language. 3) Implementation details > how to you plan to implement the "system columns", do we need to add it to `RelNode` level? Or we just need to do it in the parsing/validating phase? > I'm not sure that Calcite's "system column" feature is fully ready My plan would be to only modify the parsing/validating phase. I would like to avoid additional complexity in planner rules and connector/catalog interfaces. Metadata columns already support projection push down and are passed through the stack (via Schema, ResolvedSchema, SupportsReadableMetadata). Calcite's "system column" feature is not fully ready yet and it would be a large effort potentially introducing bugs in supporting it. Thus, I'm proposing to leverage what we already have. The only part that needs to be modified is the "expand star" method in SqlValidator and Table API. Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would show $rowtime as the expand star has only a special case when in the scope for `FROM t`. All further subqueries treat it as a regular column. 4) Built-in defined pseudo-column "$rowtime" > Did you consider making it as a built-in defined pseudo-column "$rowtime" which returns the time attribute value (if exists) or null (if non-exists) for every table/query, and pseudo-column "$proctime" always returns PROCTIME() value for each table/query Built-in pseudo-columns mean that connector or catalog providers need consensus in Flink which pseudo-columns should be built-in. We should keep the concept generic and let platform providers decide which pseudo columns to expose. $rowtime might be obvious but others such as $partition or $offset are tricky to get consensus as every external connector works differently. Also a connector might want to expose different time semantics (such as ingestion time). 5) Any operator can introduce system (psedo) columns. This is clearly out of scope for this FLIP. The implementation effort would be huge and could introduce a lot of bugs. 6) "Metadata Key Prefix Constraint" which is still a little complex Another option could be to drop the naming pattern constraint. We could make it configurable that METADATA VIRTUAL columns are never selected by default in SELECT * or visible in DESCRIBE. This would further simplify the FLIP and esp lower the impact on the planner and all interfaces. What do you think about this? We could introduce a flag: table.expand-metadata-columns (better name to be defined) This way we don't need to introduce the concept of system columns yet, but can still offer similar functionality with minimal overhead in the code base. Regards, Timo On 04.08.23 23:06, Alexey Leonov-Vendrovskiy wrote: Looks like both kinds of system columns can converge. We can say that any operator can introduce system (psedo) columns. cc Eugene who is also interested in the subject. On Wed, Aug 2, 2023 at 1:03 AM Paul Lam wrote: Hi Timo, Thanks for starting the discussion! System columns are no doubt a good boost on Flink SQL’s usability, and I see the feedbacks are mainly concerns about the accessibility of system columns. I think most of the concerns could be solved by clarifying the ownership of the system columns. Different from databases like Oracle/BigQuery/PG who owns the data/metadata, Flink uses the data/metadata from external systems. That means Flink could have 2 kinds of system columns (take ROWID for example): 1. system columns provided by external systems via catalogs, such as ROWID from
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, I'm fine with your latest suggestion that introducing a flag to control expanding behavior of metadata virtual columns, but not introducing any concept of system/pseudo columns for now. Best, Jark On Tue, 15 Aug 2023 at 23:25, Timo Walther wrote: > Hi everyone, > > I would like to bump this thread up again. > > Esp. I would like to hear opinions on my latest suggestions to simply > use METADATA VIRTUAL as system columns and only introduce a config > option for the SELECT * behavior. Implementation-wise this means minimal > effort and less new concepts. > > Looking forward to any kind of feedback. > > Thanks, > Timo > > On 07.08.23 12:07, Timo Walther wrote: > > Hi everyone, > > > > thanks for the various feedback and lively discussion. Sorry, for the > > late reply as I was on vacation. Let me answer to some of the topics: > > > > 1) Systems columns should not be shown with DESCRIBE statements > > > > This sounds fine to me. I will update the FLIP in the next iteration. > > > > 2) Do you know why most SQL systems do not need any prefix with their > > pseudo column? > > > > Because most systems do not have external catalogs or connectors. And > > also the number of system columns is limited to a handful of columns. > > Flink is more generic and thus more complex. And we have already the > > concepts of metadata columns. We need to be careful with not overloading > > our language. > > > > 3) Implementation details > > > > > how to you plan to implement the "system columns", do we need to add > > it to `RelNode` level? Or we just need to do it in the > > parsing/validating phase? > > > I'm not sure that Calcite's "system column" feature is fully ready > > > > My plan would be to only modify the parsing/validating phase. I would > > like to avoid additional complexity in planner rules and > > connector/catalog interfaces. Metadata columns already support > > projection push down and are passed through the stack (via Schema, > > ResolvedSchema, SupportsReadableMetadata). Calcite's "system column" > > feature is not fully ready yet and it would be a large effort > > potentially introducing bugs in supporting it. Thus, I'm proposing to > > leverage what we already have. The only part that needs to be modified > > is the "expand star" method in SqlValidator and Table API. > > > > Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would show > > $rowtime as the expand star has only a special case when in the scope > > for `FROM t`. All further subqueries treat it as a regular column. > > > > 4) Built-in defined pseudo-column "$rowtime" > > > > > Did you consider making it as a built-in defined pseudo-column > > "$rowtime" which returns the time attribute value (if exists) or null > > (if non-exists) for every table/query, and pseudo-column "$proctime" > > always returns PROCTIME() value for each table/query > > > > Built-in pseudo-columns mean that connector or catalog providers need > > consensus in Flink which pseudo-columns should be built-in. We should > > keep the concept generic and let platform providers decide which pseudo > > columns to expose. $rowtime might be obvious but others such as > > $partition or $offset are tricky to get consensus as every external > > connector works differently. Also a connector might want to expose > > different time semantics (such as ingestion time). > > > > 5) Any operator can introduce system (psedo) columns. > > > > This is clearly out of scope for this FLIP. The implementation effort > > would be huge and could introduce a lot of bugs. > > > > 6) "Metadata Key Prefix Constraint" which is still a little complex > > > > Another option could be to drop the naming pattern constraint. We could > > make it configurable that METADATA VIRTUAL columns are never selected by > > default in SELECT * or visible in DESCRIBE. This would further simplify > > the FLIP and esp lower the impact on the planner and all interfaces. > > > > What do you think about this? We could introduce a flag: > > > > table.expand-metadata-columns (better name to be defined) > > > > This way we don't need to introduce the concept of system columns yet, > > but can still offer similar functionality with minimal overhead in the > > code base. > > > > Regards, > > Timo > > > > > > > > > > On 04.08.23 23:06, Alexey Leonov-Vendrovskiy wrote: > >> Looks like both kinds of system columns can converge. > >> We can say that any operator can introduce system (psedo) columns. > >> > >> cc Eugene who is also interested in the subject. > >> > >> On Wed, Aug 2, 2023 at 1:03 AM Paul Lam wrote: > >> > >>> Hi Timo, > >>> > >>> Thanks for starting the discussion! System columns are no doubt a > >>> good boost on Flink SQL’s usability, and I see the feedbacks are > >>> mainly concerns about the accessibility of system columns. > >>> > >>> I think most of the concerns could be solved by clarifying the > >>> ownership of the system columns. Different from databases like > >>> Oracle/BigQuery/PG who
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi everyone, I would like to bump this thread up again. Esp. I would like to hear opinions on my latest suggestions to simply use METADATA VIRTUAL as system columns and only introduce a config option for the SELECT * behavior. Implementation-wise this means minimal effort and less new concepts. Looking forward to any kind of feedback. Thanks, Timo On 07.08.23 12:07, Timo Walther wrote: Hi everyone, thanks for the various feedback and lively discussion. Sorry, for the late reply as I was on vacation. Let me answer to some of the topics: 1) Systems columns should not be shown with DESCRIBE statements This sounds fine to me. I will update the FLIP in the next iteration. 2) Do you know why most SQL systems do not need any prefix with their pseudo column? Because most systems do not have external catalogs or connectors. And also the number of system columns is limited to a handful of columns. Flink is more generic and thus more complex. And we have already the concepts of metadata columns. We need to be careful with not overloading our language. 3) Implementation details > how to you plan to implement the "system columns", do we need to add it to `RelNode` level? Or we just need to do it in the parsing/validating phase? > I'm not sure that Calcite's "system column" feature is fully ready My plan would be to only modify the parsing/validating phase. I would like to avoid additional complexity in planner rules and connector/catalog interfaces. Metadata columns already support projection push down and are passed through the stack (via Schema, ResolvedSchema, SupportsReadableMetadata). Calcite's "system column" feature is not fully ready yet and it would be a large effort potentially introducing bugs in supporting it. Thus, I'm proposing to leverage what we already have. The only part that needs to be modified is the "expand star" method in SqlValidator and Table API. Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would show $rowtime as the expand star has only a special case when in the scope for `FROM t`. All further subqueries treat it as a regular column. 4) Built-in defined pseudo-column "$rowtime" > Did you consider making it as a built-in defined pseudo-column "$rowtime" which returns the time attribute value (if exists) or null (if non-exists) for every table/query, and pseudo-column "$proctime" always returns PROCTIME() value for each table/query Built-in pseudo-columns mean that connector or catalog providers need consensus in Flink which pseudo-columns should be built-in. We should keep the concept generic and let platform providers decide which pseudo columns to expose. $rowtime might be obvious but others such as $partition or $offset are tricky to get consensus as every external connector works differently. Also a connector might want to expose different time semantics (such as ingestion time). 5) Any operator can introduce system (psedo) columns. This is clearly out of scope for this FLIP. The implementation effort would be huge and could introduce a lot of bugs. 6) "Metadata Key Prefix Constraint" which is still a little complex Another option could be to drop the naming pattern constraint. We could make it configurable that METADATA VIRTUAL columns are never selected by default in SELECT * or visible in DESCRIBE. This would further simplify the FLIP and esp lower the impact on the planner and all interfaces. What do you think about this? We could introduce a flag: table.expand-metadata-columns (better name to be defined) This way we don't need to introduce the concept of system columns yet, but can still offer similar functionality with minimal overhead in the code base. Regards, Timo On 04.08.23 23:06, Alexey Leonov-Vendrovskiy wrote: Looks like both kinds of system columns can converge. We can say that any operator can introduce system (psedo) columns. cc Eugene who is also interested in the subject. On Wed, Aug 2, 2023 at 1:03 AM Paul Lam wrote: Hi Timo, Thanks for starting the discussion! System columns are no doubt a good boost on Flink SQL’s usability, and I see the feedbacks are mainly concerns about the accessibility of system columns. I think most of the concerns could be solved by clarifying the ownership of the system columns. Different from databases like Oracle/BigQuery/PG who owns the data/metadata, Flink uses the data/metadata from external systems. That means Flink could have 2 kinds of system columns (take ROWID for example): 1. system columns provided by external systems via catalogs, such as ROWID from the original system. 2. system columns generated by Flink, such as ROWID generated by Flink itself. IIUC, the FLIP is proposing the 1st approach: the catalog defines what system columns to provide, and Flink treats them as normal columns with a special naming pattern. On the other hand, Jark is proposing the 2nd one: the system columns are defined and owned by
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi everyone, thanks for the various feedback and lively discussion. Sorry, for the late reply as I was on vacation. Let me answer to some of the topics: 1) Systems columns should not be shown with DESCRIBE statements This sounds fine to me. I will update the FLIP in the next iteration. 2) Do you know why most SQL systems do not need any prefix with their pseudo column? Because most systems do not have external catalogs or connectors. And also the number of system columns is limited to a handful of columns. Flink is more generic and thus more complex. And we have already the concepts of metadata columns. We need to be careful with not overloading our language. 3) Implementation details > how to you plan to implement the "system columns", do we need to add it to `RelNode` level? Or we just need to do it in the parsing/validating phase? > I'm not sure that Calcite's "system column" feature is fully ready My plan would be to only modify the parsing/validating phase. I would like to avoid additional complexity in planner rules and connector/catalog interfaces. Metadata columns already support projection push down and are passed through the stack (via Schema, ResolvedSchema, SupportsReadableMetadata). Calcite's "system column" feature is not fully ready yet and it would be a large effort potentially introducing bugs in supporting it. Thus, I'm proposing to leverage what we already have. The only part that needs to be modified is the "expand star" method in SqlValidator and Table API. Queries such as `SELECT * FROM (SELECT $rowtime, * FROM t);` would show $rowtime as the expand star has only a special case when in the scope for `FROM t`. All further subqueries treat it as a regular column. 4) Built-in defined pseudo-column "$rowtime" > Did you consider making it as a built-in defined pseudo-column "$rowtime" which returns the time attribute value (if exists) or null (if non-exists) for every table/query, and pseudo-column "$proctime" always returns PROCTIME() value for each table/query Built-in pseudo-columns mean that connector or catalog providers need consensus in Flink which pseudo-columns should be built-in. We should keep the concept generic and let platform providers decide which pseudo columns to expose. $rowtime might be obvious but others such as $partition or $offset are tricky to get consensus as every external connector works differently. Also a connector might want to expose different time semantics (such as ingestion time). 5) Any operator can introduce system (psedo) columns. This is clearly out of scope for this FLIP. The implementation effort would be huge and could introduce a lot of bugs. 6) "Metadata Key Prefix Constraint" which is still a little complex Another option could be to drop the naming pattern constraint. We could make it configurable that METADATA VIRTUAL columns are never selected by default in SELECT * or visible in DESCRIBE. This would further simplify the FLIP and esp lower the impact on the planner and all interfaces. What do you think about this? We could introduce a flag: table.expand-metadata-columns (better name to be defined) This way we don't need to introduce the concept of system columns yet, but can still offer similar functionality with minimal overhead in the code base. Regards, Timo On 04.08.23 23:06, Alexey Leonov-Vendrovskiy wrote: Looks like both kinds of system columns can converge. We can say that any operator can introduce system (psedo) columns. cc Eugene who is also interested in the subject. On Wed, Aug 2, 2023 at 1:03 AM Paul Lam wrote: Hi Timo, Thanks for starting the discussion! System columns are no doubt a good boost on Flink SQL’s usability, and I see the feedbacks are mainly concerns about the accessibility of system columns. I think most of the concerns could be solved by clarifying the ownership of the system columns. Different from databases like Oracle/BigQuery/PG who owns the data/metadata, Flink uses the data/metadata from external systems. That means Flink could have 2 kinds of system columns (take ROWID for example): 1. system columns provided by external systems via catalogs, such as ROWID from the original system. 2. system columns generated by Flink, such as ROWID generated by Flink itself. IIUC, the FLIP is proposing the 1st approach: the catalog defines what system columns to provide, and Flink treats them as normal columns with a special naming pattern. On the other hand, Jark is proposing the 2nd one: the system columns are defined and owned by Flink, and can be inferred from external systems. Therefore, system columns should be predefined by Flink, and optionally implemented by the catalogs. Personally, I’m in favor of the 2nd approach, because it makes the system columns very accessible and more aligned across the catalogs. BTW, I second Alexey that systems columns should not be shown with DESCRIBE statements. WDYT? Thanks! Best, Paul Lam
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Looks like both kinds of system columns can converge. We can say that any operator can introduce system (psedo) columns. cc Eugene who is also interested in the subject. On Wed, Aug 2, 2023 at 1:03 AM Paul Lam wrote: > Hi Timo, > > Thanks for starting the discussion! System columns are no doubt a > good boost on Flink SQL’s usability, and I see the feedbacks are > mainly concerns about the accessibility of system columns. > > I think most of the concerns could be solved by clarifying the > ownership of the system columns. Different from databases like > Oracle/BigQuery/PG who owns the data/metadata, Flink uses the > data/metadata from external systems. That means Flink could > have 2 kinds of system columns (take ROWID for example): > > 1. system columns provided by external systems via catalogs, such > as ROWID from the original system. > 2. system columns generated by Flink, such as ROWID generated by > Flink itself. > > IIUC, the FLIP is proposing the 1st approach: the catalog defines what > system columns to provide, and Flink treats them as normal columns > with a special naming pattern. > > On the other hand, Jark is proposing the 2nd one: the system columns > are defined and owned by Flink, and can be inferred from external > systems. Therefore, system columns should be predefined by Flink, > and optionally implemented by the catalogs. > > Personally, I’m in favor of the 2nd approach, because it makes the > system columns very accessible and more aligned across the catalogs. > > BTW, I second Alexey that systems columns should not be shown with > DESCRIBE statements. > > WDYT? Thanks! > > Best, > Paul Lam > > > 2023年7月31日 23:54,Jark Wu 写道: > > > > Hi Timo, > > > > Thanks for your proposal. I think this is a nice feature for users and I > > prefer option 3. > > > > I only have one concern about the concept of pseudo-column or > > system-column, > > because this is the first time we introduce it in Flink SQL. The > > confusion is similar to the > > question of Benchao and Sergey about the propagation of pseudo-column. > > > > From my understanding, a pseudo-column can be get from an arbitrary > query, > > just similar to > > ROWNUM in Oracle[1], such as : > > > > SELECT * > > FROM (SELECT * FROM employees ORDER BY employee_id) > > WHERE ROWNUM < 11; > > > > However, IIUC, the proposed "$rowtime" pseudo-column can only be got from > > the physical table > > and can't be got from queries even if the query propagates the rowtime > > attribute. There was also > > a discussion about adding a pseudo-column "_proctime" [2] to make lookup > > join easier to use > > which can be got from arbitrary queries. That "_proctime" may conflict > with > > the proposed > > pseudo-column concept. > > > > Did you consider making it as a built-in defined pseudo-column "$rowtime" > > which returns the > > time attribute value (if exists) or null (if non-exists) for every > > table/query, and pseudo-column > > "$proctime" always returns PROCTIME() value for each table/query. In this > > way, catalogs only need > > to provide a default rowtime attribute and users can get it in the same > > way. And we don't need > > to introduce the contract interface of "Metadata Key Prefix Constraint" > > which is still a little complex > > for users and devs to understand. > > > > Best, > > Jark > > > > [1]: > > > https://docs.oracle.com/cd/E11882_01/server.112/e41084/pseudocolumns009.htm#SQLRF00255 > > [2]: https://lists.apache.org/thread/7ln106qxyw8sp7ljq40hs2p1lb1gdwj5 > > > > > > > > > > On Fri, 28 Jul 2023 at 06:18, Alexey Leonov-Vendrovskiy < > > vendrov...@gmail.com> wrote: > > > >>> > >>> `SELECT * FROM (SELECT $rowtime, * FROM t);` > >>> Am I right that it will show `$rowtime` in output ? > >> > >> > >> Yes, all explicitly selected columns become a part of the result (and > >> intermediate) schema, and hence propagate. > >> > >> On Thu, Jul 27, 2023 at 2:40 PM Alexey Leonov-Vendrovskiy < > >> vendrov...@gmail.com> wrote: > >> > >>> Thank you, Timo, for starting this FLIP! > >>> > >>> I propose the following change: > >>> > >>> Remove the requirement that DESCRIBE need to show system columns. > >>> > >>> > >>> Some concrete vendor specific catalog implementations might prefer this > >>> approach. > >>> Usually the same system columns are available on all (or family) of > >>> tables, and it can be easily captured in the documentation. > >>> > >>> For example, BigQuery does exactly this: there, pseudo-columns do not > >> show > >>> up in the table schema in any place, but can be accessed via reference. > >>> > >>> So I propose we: > >>> a) Either we say that DESCRIBE doesn't show system columns, > >>> b) Or leave this vendor-specific / or configurable via flag (if > needed). > >>> > >>> Regards, > >>> Alexey > >>> > >>> On Thu, Jul 27, 2023 at 3:27 AM Sergey Nuyanzin > >>> wrote: > >>> > Hi Timo, > > Thanks for the FLIP. > I also tend to think that Option 3 is better. > > I would be also
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, Thanks for starting the discussion! System columns are no doubt a good boost on Flink SQL’s usability, and I see the feedbacks are mainly concerns about the accessibility of system columns. I think most of the concerns could be solved by clarifying the ownership of the system columns. Different from databases like Oracle/BigQuery/PG who owns the data/metadata, Flink uses the data/metadata from external systems. That means Flink could have 2 kinds of system columns (take ROWID for example): 1. system columns provided by external systems via catalogs, such as ROWID from the original system. 2. system columns generated by Flink, such as ROWID generated by Flink itself. IIUC, the FLIP is proposing the 1st approach: the catalog defines what system columns to provide, and Flink treats them as normal columns with a special naming pattern. On the other hand, Jark is proposing the 2nd one: the system columns are defined and owned by Flink, and can be inferred from external systems. Therefore, system columns should be predefined by Flink, and optionally implemented by the catalogs. Personally, I’m in favor of the 2nd approach, because it makes the system columns very accessible and more aligned across the catalogs. BTW, I second Alexey that systems columns should not be shown with DESCRIBE statements. WDYT? Thanks! Best, Paul Lam > 2023年7月31日 23:54,Jark Wu 写道: > > Hi Timo, > > Thanks for your proposal. I think this is a nice feature for users and I > prefer option 3. > > I only have one concern about the concept of pseudo-column or > system-column, > because this is the first time we introduce it in Flink SQL. The > confusion is similar to the > question of Benchao and Sergey about the propagation of pseudo-column. > > From my understanding, a pseudo-column can be get from an arbitrary query, > just similar to > ROWNUM in Oracle[1], such as : > > SELECT * > FROM (SELECT * FROM employees ORDER BY employee_id) > WHERE ROWNUM < 11; > > However, IIUC, the proposed "$rowtime" pseudo-column can only be got from > the physical table > and can't be got from queries even if the query propagates the rowtime > attribute. There was also > a discussion about adding a pseudo-column "_proctime" [2] to make lookup > join easier to use > which can be got from arbitrary queries. That "_proctime" may conflict with > the proposed > pseudo-column concept. > > Did you consider making it as a built-in defined pseudo-column "$rowtime" > which returns the > time attribute value (if exists) or null (if non-exists) for every > table/query, and pseudo-column > "$proctime" always returns PROCTIME() value for each table/query. In this > way, catalogs only need > to provide a default rowtime attribute and users can get it in the same > way. And we don't need > to introduce the contract interface of "Metadata Key Prefix Constraint" > which is still a little complex > for users and devs to understand. > > Best, > Jark > > [1]: > https://docs.oracle.com/cd/E11882_01/server.112/e41084/pseudocolumns009.htm#SQLRF00255 > [2]: https://lists.apache.org/thread/7ln106qxyw8sp7ljq40hs2p1lb1gdwj5 > > > > > On Fri, 28 Jul 2023 at 06:18, Alexey Leonov-Vendrovskiy < > vendrov...@gmail.com> wrote: > >>> >>> `SELECT * FROM (SELECT $rowtime, * FROM t);` >>> Am I right that it will show `$rowtime` in output ? >> >> >> Yes, all explicitly selected columns become a part of the result (and >> intermediate) schema, and hence propagate. >> >> On Thu, Jul 27, 2023 at 2:40 PM Alexey Leonov-Vendrovskiy < >> vendrov...@gmail.com> wrote: >> >>> Thank you, Timo, for starting this FLIP! >>> >>> I propose the following change: >>> >>> Remove the requirement that DESCRIBE need to show system columns. >>> >>> >>> Some concrete vendor specific catalog implementations might prefer this >>> approach. >>> Usually the same system columns are available on all (or family) of >>> tables, and it can be easily captured in the documentation. >>> >>> For example, BigQuery does exactly this: there, pseudo-columns do not >> show >>> up in the table schema in any place, but can be accessed via reference. >>> >>> So I propose we: >>> a) Either we say that DESCRIBE doesn't show system columns, >>> b) Or leave this vendor-specific / or configurable via flag (if needed). >>> >>> Regards, >>> Alexey >>> >>> On Thu, Jul 27, 2023 at 3:27 AM Sergey Nuyanzin >>> wrote: >>> Hi Timo, Thanks for the FLIP. I also tend to think that Option 3 is better. I would be also interested in a question mentioned by Benchao Li. And a similar question about nested queries like `SELECT * FROM (SELECT $rowtime, * FROM t);` Am I right that it will show `$rowtime` in output ? On Thu, Jul 27, 2023 at 6:58 AM Benchao Li >> wrote: > Hi Timo, > > Thanks for the FLIP, I also like the idea and option 3 sounds good to me. > > I would like to discuss a case which is not
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, Thanks for your proposal. I think this is a nice feature for users and I prefer option 3. I only have one concern about the concept of pseudo-column or system-column, because this is the first time we introduce it in Flink SQL. The confusion is similar to the question of Benchao and Sergey about the propagation of pseudo-column. >From my understanding, a pseudo-column can be get from an arbitrary query, just similar to ROWNUM in Oracle[1], such as : SELECT * FROM (SELECT * FROM employees ORDER BY employee_id) WHERE ROWNUM < 11; However, IIUC, the proposed "$rowtime" pseudo-column can only be got from the physical table and can't be got from queries even if the query propagates the rowtime attribute. There was also a discussion about adding a pseudo-column "_proctime" [2] to make lookup join easier to use which can be got from arbitrary queries. That "_proctime" may conflict with the proposed pseudo-column concept. Did you consider making it as a built-in defined pseudo-column "$rowtime" which returns the time attribute value (if exists) or null (if non-exists) for every table/query, and pseudo-column "$proctime" always returns PROCTIME() value for each table/query. In this way, catalogs only need to provide a default rowtime attribute and users can get it in the same way. And we don't need to introduce the contract interface of "Metadata Key Prefix Constraint" which is still a little complex for users and devs to understand. Best, Jark [1]: https://docs.oracle.com/cd/E11882_01/server.112/e41084/pseudocolumns009.htm#SQLRF00255 [2]: https://lists.apache.org/thread/7ln106qxyw8sp7ljq40hs2p1lb1gdwj5 On Fri, 28 Jul 2023 at 06:18, Alexey Leonov-Vendrovskiy < vendrov...@gmail.com> wrote: > > > > `SELECT * FROM (SELECT $rowtime, * FROM t);` > > Am I right that it will show `$rowtime` in output ? > > > Yes, all explicitly selected columns become a part of the result (and > intermediate) schema, and hence propagate. > > On Thu, Jul 27, 2023 at 2:40 PM Alexey Leonov-Vendrovskiy < > vendrov...@gmail.com> wrote: > > > Thank you, Timo, for starting this FLIP! > > > > I propose the following change: > > > > Remove the requirement that DESCRIBE need to show system columns. > > > > > > Some concrete vendor specific catalog implementations might prefer this > > approach. > > Usually the same system columns are available on all (or family) of > > tables, and it can be easily captured in the documentation. > > > > For example, BigQuery does exactly this: there, pseudo-columns do not > show > > up in the table schema in any place, but can be accessed via reference. > > > > So I propose we: > > a) Either we say that DESCRIBE doesn't show system columns, > > b) Or leave this vendor-specific / or configurable via flag (if needed). > > > > Regards, > > Alexey > > > > On Thu, Jul 27, 2023 at 3:27 AM Sergey Nuyanzin > > wrote: > > > >> Hi Timo, > >> > >> Thanks for the FLIP. > >> I also tend to think that Option 3 is better. > >> > >> I would be also interested in a question mentioned by Benchao Li. > >> And a similar question about nested queries like > >> `SELECT * FROM (SELECT $rowtime, * FROM t);` > >> Am I right that it will show `$rowtime` in output ? > >> > >> > >> On Thu, Jul 27, 2023 at 6:58 AM Benchao Li > wrote: > >> > >> > Hi Timo, > >> > > >> > Thanks for the FLIP, I also like the idea and option 3 sounds good to > >> me. > >> > > >> > I would like to discuss a case which is not mentioned in the current > >> FLIP. > >> > How are the "System column"s expressed in intermediate result, e.g. > >> Join? > >> > E.g. `SELECT * FROM t1 JOIN t2`, I guess it should not include "system > >> > columns" from t1 and t2 as you proposed, and for `SELECT t1.$rowtime, > * > >> > FROM t1 JOIN t2`, it should also be valid. > >> > Then the question is how to you plan to implement the "system > columns", > >> do > >> > we need to add it to `RelNode` level? Or we just need to do it in the > >> > parsing/validating phase? > >> > I'm not sure that Calcite's "system column" feature is fully ready for > >> this > >> > since the code about this part is imported from the earlier project > >> before > >> > it gets into Apache, and has not been considered much in the past > >> > development. > >> > > >> > > >> > Jing Ge 于2023年7月26日周三 00:01写道: > >> > > >> > > Hi Timo, > >> > > > >> > > Thanks for your proposal. It is a very pragmatic feature. Among all > >> > options > >> > > in the FLIP, option 3 is one I prefer too and I'd like to ask some > >> > > questions to understand your thoughts. > >> > > > >> > > 1. I did some research on pseudo columns, just out of curiosity, do > >> you > >> > > know why most SQL systems do not need any prefix with their pseudo > >> > column? > >> > > 2. Some platform providers will use ${variable_name} to define their > >> own > >> > > configurations and allow them to be embedded into SQL scripts. Will > >> there > >> > > be any conflict with option 3? > >> > > > >> > > Best regards, > >> > >
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
> > `SELECT * FROM (SELECT $rowtime, * FROM t);` > Am I right that it will show `$rowtime` in output ? Yes, all explicitly selected columns become a part of the result (and intermediate) schema, and hence propagate. On Thu, Jul 27, 2023 at 2:40 PM Alexey Leonov-Vendrovskiy < vendrov...@gmail.com> wrote: > Thank you, Timo, for starting this FLIP! > > I propose the following change: > > Remove the requirement that DESCRIBE need to show system columns. > > > Some concrete vendor specific catalog implementations might prefer this > approach. > Usually the same system columns are available on all (or family) of > tables, and it can be easily captured in the documentation. > > For example, BigQuery does exactly this: there, pseudo-columns do not show > up in the table schema in any place, but can be accessed via reference. > > So I propose we: > a) Either we say that DESCRIBE doesn't show system columns, > b) Or leave this vendor-specific / or configurable via flag (if needed). > > Regards, > Alexey > > On Thu, Jul 27, 2023 at 3:27 AM Sergey Nuyanzin > wrote: > >> Hi Timo, >> >> Thanks for the FLIP. >> I also tend to think that Option 3 is better. >> >> I would be also interested in a question mentioned by Benchao Li. >> And a similar question about nested queries like >> `SELECT * FROM (SELECT $rowtime, * FROM t);` >> Am I right that it will show `$rowtime` in output ? >> >> >> On Thu, Jul 27, 2023 at 6:58 AM Benchao Li wrote: >> >> > Hi Timo, >> > >> > Thanks for the FLIP, I also like the idea and option 3 sounds good to >> me. >> > >> > I would like to discuss a case which is not mentioned in the current >> FLIP. >> > How are the "System column"s expressed in intermediate result, e.g. >> Join? >> > E.g. `SELECT * FROM t1 JOIN t2`, I guess it should not include "system >> > columns" from t1 and t2 as you proposed, and for `SELECT t1.$rowtime, * >> > FROM t1 JOIN t2`, it should also be valid. >> > Then the question is how to you plan to implement the "system columns", >> do >> > we need to add it to `RelNode` level? Or we just need to do it in the >> > parsing/validating phase? >> > I'm not sure that Calcite's "system column" feature is fully ready for >> this >> > since the code about this part is imported from the earlier project >> before >> > it gets into Apache, and has not been considered much in the past >> > development. >> > >> > >> > Jing Ge 于2023年7月26日周三 00:01写道: >> > >> > > Hi Timo, >> > > >> > > Thanks for your proposal. It is a very pragmatic feature. Among all >> > options >> > > in the FLIP, option 3 is one I prefer too and I'd like to ask some >> > > questions to understand your thoughts. >> > > >> > > 1. I did some research on pseudo columns, just out of curiosity, do >> you >> > > know why most SQL systems do not need any prefix with their pseudo >> > column? >> > > 2. Some platform providers will use ${variable_name} to define their >> own >> > > configurations and allow them to be embedded into SQL scripts. Will >> there >> > > be any conflict with option 3? >> > > >> > > Best regards, >> > > Jing >> > > >> > > On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf >> > > wrote: >> > > >> > > > Hi Timo, >> > > > >> > > > this makes sense to me. Option 3 seems reasonable, too. >> > > > >> > > > Cheers, >> > > > >> > > > Konstantin >> > > > >> > > > Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther < >> > > > twal...@apache.org >> > > > >: >> > > > >> > > > > Hi everyone, >> > > > > >> > > > > I would like to start a discussion about introducing the concept >> of >> > > > > "System Columns" in SQL and Table API. >> > > > > >> > > > > The subject sounds bigger than it actually is. Luckily, Flink SQL >> > > > > already exposes the concept of metadata columns. And this >> proposal is >> > > > > just a slight adjustment for how metadata columns can be used as >> > system >> > > > > columns. >> > > > > >> > > > > The biggest problem of metadata columns currently is that a >> catalog >> > > > > implementation can't provide them by default because they would >> > affect >> > > > > `SELECT *` when adding another one. >> > > > > >> > > > > Looking forward to your feedback on FLIP-348: >> > > > > >> > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API >> > > > > >> > > > > Thanks, >> > > > > Timo >> > > > > >> > > > >> > > > >> > > > -- >> > > > https://twitter.com/snntrable >> > > > https://github.com/knaufk >> > > > >> > > >> > >> > >> > -- >> > >> > Best, >> > Benchao Li >> > >> >> >> -- >> Best regards, >> Sergey >> >
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Thank you, Timo, for starting this FLIP! I propose the following change: Remove the requirement that DESCRIBE need to show system columns. Some concrete vendor specific catalog implementations might prefer this approach. Usually the same system columns are available on all (or family) of tables, and it can be easily captured in the documentation. For example, BigQuery does exactly this: there, pseudo-columns do not show up in the table schema in any place, but can be accessed via reference. So I propose we: a) Either we say that DESCRIBE doesn't show system columns, b) Or leave this vendor-specific / or configurable via flag (if needed). Regards, Alexey On Thu, Jul 27, 2023 at 3:27 AM Sergey Nuyanzin wrote: > Hi Timo, > > Thanks for the FLIP. > I also tend to think that Option 3 is better. > > I would be also interested in a question mentioned by Benchao Li. > And a similar question about nested queries like > `SELECT * FROM (SELECT $rowtime, * FROM t);` > Am I right that it will show `$rowtime` in output ? > > > On Thu, Jul 27, 2023 at 6:58 AM Benchao Li wrote: > > > Hi Timo, > > > > Thanks for the FLIP, I also like the idea and option 3 sounds good to me. > > > > I would like to discuss a case which is not mentioned in the current > FLIP. > > How are the "System column"s expressed in intermediate result, e.g. Join? > > E.g. `SELECT * FROM t1 JOIN t2`, I guess it should not include "system > > columns" from t1 and t2 as you proposed, and for `SELECT t1.$rowtime, * > > FROM t1 JOIN t2`, it should also be valid. > > Then the question is how to you plan to implement the "system columns", > do > > we need to add it to `RelNode` level? Or we just need to do it in the > > parsing/validating phase? > > I'm not sure that Calcite's "system column" feature is fully ready for > this > > since the code about this part is imported from the earlier project > before > > it gets into Apache, and has not been considered much in the past > > development. > > > > > > Jing Ge 于2023年7月26日周三 00:01写道: > > > > > Hi Timo, > > > > > > Thanks for your proposal. It is a very pragmatic feature. Among all > > options > > > in the FLIP, option 3 is one I prefer too and I'd like to ask some > > > questions to understand your thoughts. > > > > > > 1. I did some research on pseudo columns, just out of curiosity, do you > > > know why most SQL systems do not need any prefix with their pseudo > > column? > > > 2. Some platform providers will use ${variable_name} to define their > own > > > configurations and allow them to be embedded into SQL scripts. Will > there > > > be any conflict with option 3? > > > > > > Best regards, > > > Jing > > > > > > On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf > > > wrote: > > > > > > > Hi Timo, > > > > > > > > this makes sense to me. Option 3 seems reasonable, too. > > > > > > > > Cheers, > > > > > > > > Konstantin > > > > > > > > Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther < > > > > twal...@apache.org > > > > >: > > > > > > > > > Hi everyone, > > > > > > > > > > I would like to start a discussion about introducing the concept of > > > > > "System Columns" in SQL and Table API. > > > > > > > > > > The subject sounds bigger than it actually is. Luckily, Flink SQL > > > > > already exposes the concept of metadata columns. And this proposal > is > > > > > just a slight adjustment for how metadata columns can be used as > > system > > > > > columns. > > > > > > > > > > The biggest problem of metadata columns currently is that a catalog > > > > > implementation can't provide them by default because they would > > affect > > > > > `SELECT *` when adding another one. > > > > > > > > > > Looking forward to your feedback on FLIP-348: > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API > > > > > > > > > > Thanks, > > > > > Timo > > > > > > > > > > > > > > > > > -- > > > > https://twitter.com/snntrable > > > > https://github.com/knaufk > > > > > > > > > > > > > -- > > > > Best, > > Benchao Li > > > > > -- > Best regards, > Sergey >
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, Thanks for the FLIP. I also tend to think that Option 3 is better. I would be also interested in a question mentioned by Benchao Li. And a similar question about nested queries like `SELECT * FROM (SELECT $rowtime, * FROM t);` Am I right that it will show `$rowtime` in output ? On Thu, Jul 27, 2023 at 6:58 AM Benchao Li wrote: > Hi Timo, > > Thanks for the FLIP, I also like the idea and option 3 sounds good to me. > > I would like to discuss a case which is not mentioned in the current FLIP. > How are the "System column"s expressed in intermediate result, e.g. Join? > E.g. `SELECT * FROM t1 JOIN t2`, I guess it should not include "system > columns" from t1 and t2 as you proposed, and for `SELECT t1.$rowtime, * > FROM t1 JOIN t2`, it should also be valid. > Then the question is how to you plan to implement the "system columns", do > we need to add it to `RelNode` level? Or we just need to do it in the > parsing/validating phase? > I'm not sure that Calcite's "system column" feature is fully ready for this > since the code about this part is imported from the earlier project before > it gets into Apache, and has not been considered much in the past > development. > > > Jing Ge 于2023年7月26日周三 00:01写道: > > > Hi Timo, > > > > Thanks for your proposal. It is a very pragmatic feature. Among all > options > > in the FLIP, option 3 is one I prefer too and I'd like to ask some > > questions to understand your thoughts. > > > > 1. I did some research on pseudo columns, just out of curiosity, do you > > know why most SQL systems do not need any prefix with their pseudo > column? > > 2. Some platform providers will use ${variable_name} to define their own > > configurations and allow them to be embedded into SQL scripts. Will there > > be any conflict with option 3? > > > > Best regards, > > Jing > > > > On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf > > wrote: > > > > > Hi Timo, > > > > > > this makes sense to me. Option 3 seems reasonable, too. > > > > > > Cheers, > > > > > > Konstantin > > > > > > Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther < > > > twal...@apache.org > > > >: > > > > > > > Hi everyone, > > > > > > > > I would like to start a discussion about introducing the concept of > > > > "System Columns" in SQL and Table API. > > > > > > > > The subject sounds bigger than it actually is. Luckily, Flink SQL > > > > already exposes the concept of metadata columns. And this proposal is > > > > just a slight adjustment for how metadata columns can be used as > system > > > > columns. > > > > > > > > The biggest problem of metadata columns currently is that a catalog > > > > implementation can't provide them by default because they would > affect > > > > `SELECT *` when adding another one. > > > > > > > > Looking forward to your feedback on FLIP-348: > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API > > > > > > > > Thanks, > > > > Timo > > > > > > > > > > > > > -- > > > https://twitter.com/snntrable > > > https://github.com/knaufk > > > > > > > > -- > > Best, > Benchao Li > -- Best regards, Sergey
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, Thanks for the FLIP, I also like the idea and option 3 sounds good to me. I would like to discuss a case which is not mentioned in the current FLIP. How are the "System column"s expressed in intermediate result, e.g. Join? E.g. `SELECT * FROM t1 JOIN t2`, I guess it should not include "system columns" from t1 and t2 as you proposed, and for `SELECT t1.$rowtime, * FROM t1 JOIN t2`, it should also be valid. Then the question is how to you plan to implement the "system columns", do we need to add it to `RelNode` level? Or we just need to do it in the parsing/validating phase? I'm not sure that Calcite's "system column" feature is fully ready for this since the code about this part is imported from the earlier project before it gets into Apache, and has not been considered much in the past development. Jing Ge 于2023年7月26日周三 00:01写道: > Hi Timo, > > Thanks for your proposal. It is a very pragmatic feature. Among all options > in the FLIP, option 3 is one I prefer too and I'd like to ask some > questions to understand your thoughts. > > 1. I did some research on pseudo columns, just out of curiosity, do you > know why most SQL systems do not need any prefix with their pseudo column? > 2. Some platform providers will use ${variable_name} to define their own > configurations and allow them to be embedded into SQL scripts. Will there > be any conflict with option 3? > > Best regards, > Jing > > On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf > wrote: > > > Hi Timo, > > > > this makes sense to me. Option 3 seems reasonable, too. > > > > Cheers, > > > > Konstantin > > > > Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther < > > twal...@apache.org > > >: > > > > > Hi everyone, > > > > > > I would like to start a discussion about introducing the concept of > > > "System Columns" in SQL and Table API. > > > > > > The subject sounds bigger than it actually is. Luckily, Flink SQL > > > already exposes the concept of metadata columns. And this proposal is > > > just a slight adjustment for how metadata columns can be used as system > > > columns. > > > > > > The biggest problem of metadata columns currently is that a catalog > > > implementation can't provide them by default because they would affect > > > `SELECT *` when adding another one. > > > > > > Looking forward to your feedback on FLIP-348: > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API > > > > > > Thanks, > > > Timo > > > > > > > > > -- > > https://twitter.com/snntrable > > https://github.com/knaufk > > > -- Best, Benchao Li
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, Thanks for your proposal. It is a very pragmatic feature. Among all options in the FLIP, option 3 is one I prefer too and I'd like to ask some questions to understand your thoughts. 1. I did some research on pseudo columns, just out of curiosity, do you know why most SQL systems do not need any prefix with their pseudo column? 2. Some platform providers will use ${variable_name} to define their own configurations and allow them to be embedded into SQL scripts. Will there be any conflict with option 3? Best regards, Jing On Tue, Jul 25, 2023 at 7:00 PM Konstantin Knauf wrote: > Hi Timo, > > this makes sense to me. Option 3 seems reasonable, too. > > Cheers, > > Konstantin > > Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther < > twal...@apache.org > >: > > > Hi everyone, > > > > I would like to start a discussion about introducing the concept of > > "System Columns" in SQL and Table API. > > > > The subject sounds bigger than it actually is. Luckily, Flink SQL > > already exposes the concept of metadata columns. And this proposal is > > just a slight adjustment for how metadata columns can be used as system > > columns. > > > > The biggest problem of metadata columns currently is that a catalog > > implementation can't provide them by default because they would affect > > `SELECT *` when adding another one. > > > > Looking forward to your feedback on FLIP-348: > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API > > > > Thanks, > > Timo > > > > > -- > https://twitter.com/snntrable > https://github.com/knaufk >
Re: [DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi Timo, this makes sense to me. Option 3 seems reasonable, too. Cheers, Konstantin Am Di., 25. Juli 2023 um 12:53 Uhr schrieb Timo Walther : > Hi everyone, > > I would like to start a discussion about introducing the concept of > "System Columns" in SQL and Table API. > > The subject sounds bigger than it actually is. Luckily, Flink SQL > already exposes the concept of metadata columns. And this proposal is > just a slight adjustment for how metadata columns can be used as system > columns. > > The biggest problem of metadata columns currently is that a catalog > implementation can't provide them by default because they would affect > `SELECT *` when adding another one. > > Looking forward to your feedback on FLIP-348: > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API > > Thanks, > Timo > -- https://twitter.com/snntrable https://github.com/knaufk
[DISCUSS] FLIP-348: Support System Columns in SQL and Table API
Hi everyone, I would like to start a discussion about introducing the concept of "System Columns" in SQL and Table API. The subject sounds bigger than it actually is. Luckily, Flink SQL already exposes the concept of metadata columns. And this proposal is just a slight adjustment for how metadata columns can be used as system columns. The biggest problem of metadata columns currently is that a catalog implementation can't provide them by default because they would affect `SELECT *` when adding another one. Looking forward to your feedback on FLIP-348: https://cwiki.apache.org/confluence/display/FLINK/FLIP-348%3A+Support+System+Columns+in+SQL+and+Table+API Thanks, Timo