Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
panbingkun commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-1958579779 Just to record SQL command using `StringUtils.filterPattern` in Spark: |SQL Command|Example| |---|---| ||| -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
cloud-fan commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-1954053118 Yea let's add a legacy config. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
panbingkun commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-1953931188 > I think it's more natural to follow the same behavior of the LIKE operator here. It seems all databases follow it (except for Hive before 4.0). Spark followed Hive at the beginning and that's probably why Spark has this special and weird behavior for the LIKE pattern in SHOW TABLES. > > In fact, this is out of Spark's control, as it's the external catalog that applies the pattern string. We should follow the industry standard for defining the v2 catalog API. We should also update the SHOW TABLES doc page to mention the ideal behavior of the pattern string, as well as the legacy Hive behavior. Okay, let me handle it in this PR and update the document. Additionally, do we need to add `a legacy configuration` (default is `new` behavior) to determine whether it is using the `legacy` behavior or the `new` behavior? (PS: Yes, from the first PR, we can see that the original author's intention was to respect the legacy hive behavior. https://github.com/apache/spark/assets/15246973/6c753a76-f134-41be-953d-ba9ade46bf6d;> ) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
panbingkun commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-1953619718 Just for making investigation records: 1.snowflake: https://docs.snowflake.com/en/sql-reference/sql/show-tables 2.clickhouse: https://clickhouse.com/docs/en/sql-reference/statements/show 3.mysql: https://dev.mysql.com/doc/refman/8.0/en/show-tables.html https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
panbingkun commented on code in PR #43751: URL: https://github.com/apache/spark/pull/43751#discussion_r1495333432 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -97,6 +102,28 @@ public interface TableCatalog extends CatalogPlugin { */ Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException; + /** + * List the tables in a namespace from the catalog by pattern string. + * + * If the catalog supports views, this must return identifiers for only tables and not views. + * + * @param namespace a multi-part namespace + * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will Review Comment: Okay, let me investigate it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
cloud-fan commented on code in PR #43751: URL: https://github.com/apache/spark/pull/43751#discussion_r1495320778 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -97,6 +102,28 @@ public interface TableCatalog extends CatalogPlugin { */ Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException; + /** + * List the tables in a namespace from the catalog by pattern string. + * + * If the catalog supports views, this must return identifiers for only tables and not views. + * + * @param namespace a multi-part namespace + * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will Review Comment: not related to this PR, but the existing doc is a bit vague. `|` is not a wildcard, right? And `|` is also a valid syntax in regex. Can we take a look at other databases and see how they document it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
panbingkun commented on code in PR #43751: URL: https://github.com/apache/spark/pull/43751#discussion_r1495149980 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin { */ Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException; + /** + * List the tables in a namespace from the catalog by pattern string. + * + * If the catalog supports views, this must return identifiers for only tables and not views. + * + * @param namespace a multi-part namespace + * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will + *follow regular expression convention, case-insensitive match and white spaces + *on both ends will be ignored Review Comment: I have looked at the document `https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html#parameters`(SHOW TABLES doc page) and found that the parameter `regex_pattern` in it explains the `pattern`. https://github.com/apache/spark/assets/15246973/6349db2d-825e-4031-8f3e-4c984673f962;> Thank you very much for your reminder, Let's refer to it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
cloud-fan commented on code in PR #43751: URL: https://github.com/apache/spark/pull/43751#discussion_r1494597611 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin { */ Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException; + /** + * List the tables in a namespace from the catalog by pattern string. + * + * If the catalog supports views, this must return identifiers for only tables and not views. + * + * @param namespace a multi-part namespace + * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will + *follow regular expression convention, case-insensitive match and white spaces + *on both ends will be ignored Review Comment: yea if they use the same implementation. The LIKE pattern doc does not even mention the `*`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
cloud-fan commented on code in PR #43751: URL: https://github.com/apache/spark/pull/43751#discussion_r1494598375 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin { */ Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException; + /** + * List the tables in a namespace from the catalog by pattern string. + * + * If the catalog supports views, this must return identifiers for only tables and not views. + * + * @param namespace a multi-part namespace + * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will + *follow regular expression convention, case-insensitive match and white spaces + *on both ends will be ignored Review Comment: Another option is to document it in the SHOW TABLES doc page. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
panbingkun commented on code in PR #43751: URL: https://github.com/apache/spark/pull/43751#discussion_r1494414935 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin { */ Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException; + /** + * List the tables in a namespace from the catalog by pattern string. + * + * If the catalog supports views, this must return identifiers for only tables and not views. + * + * @param namespace a multi-part namespace + * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will + *follow regular expression convention, case-insensitive match and white spaces + *on both ends will be ignored Review Comment: I searched the document and the `only possible relationship` is this one: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-like.html#parameters https://github.com/apache/spark/assets/15246973/357c77e8-ecdb-404e-b479-b7b7459fd06b;> Perhaps we should explain it in detail here? (PS: The first pr that introduces `StringUtils.filterPattern` is: https://github.com/apache/spark/pull/12206) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
cloud-fan commented on code in PR #43751: URL: https://github.com/apache/spark/pull/43751#discussion_r1494067449 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java: ## @@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin { */ Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException; + /** + * List the tables in a namespace from the catalog by pattern string. + * + * If the catalog supports views, this must return identifiers for only tables and not views. + * + * @param namespace a multi-part namespace + * @param pattern the filter pattern, only '*' and '|' are allowed as wildcards, others will + *follow regular expression convention, case-insensitive match and white spaces + *on both ends will be ignored Review Comment: do we have a doc page for the pattern string semantic? If we do we should reference it here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
github-actions[bot] commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-1951500516 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]
panbingkun commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-1805361784 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org