Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-21 Thread via GitHub


panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1958579779

   Just to record SQL command using `StringUtils.filterPattern` in Spark:
   |SQL Command|Example|
   |---|---|
   |||


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-20 Thread via GitHub


cloud-fan commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1954053118

   Yea let's add a legacy config.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-20 Thread via GitHub


panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1953931188

   > I think it's more natural to follow the same behavior of the LIKE operator 
here. It seems all databases follow it (except for Hive before 4.0). Spark 
followed Hive at the beginning and that's probably why Spark has this special 
and weird behavior for the LIKE pattern in SHOW TABLES.
   > 
   > In fact, this is out of Spark's control, as it's the external catalog that 
applies the pattern string. We should follow the industry standard for defining 
the v2 catalog API. We should also update the SHOW TABLES doc page to mention 
the ideal behavior of the pattern string, as well as the legacy Hive behavior.
   
   Okay, let me handle it in this PR and update the document.
   Additionally, do we need to add `a legacy configuration` (default is `new` 
behavior) to determine whether it is using the `legacy` behavior or the `new` 
behavior?
   (PS: Yes, from the first PR, we can see that the original author's intention 
was to respect the legacy hive behavior.
   https://github.com/apache/spark/assets/15246973/6c753a76-f134-41be-953d-ba9ade46bf6d;>
   
   )


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-19 Thread via GitHub


panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1953619718

   Just for making investigation records:
   1.snowflake: https://docs.snowflake.com/en/sql-reference/sql/show-tables
   2.clickhouse: https://clickhouse.com/docs/en/sql-reference/statements/show
   3.mysql:
   https://dev.mysql.com/doc/refman/8.0/en/show-tables.html
   https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-19 Thread via GitHub


panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1495333432


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##
@@ -97,6 +102,28 @@ public interface TableCatalog extends CatalogPlugin {
*/
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * 
+   * If the catalog supports views, this must return identifiers for only 
tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as 
wildcards, others will

Review Comment:
   Okay, let me investigate it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-19 Thread via GitHub


cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1495320778


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##
@@ -97,6 +102,28 @@ public interface TableCatalog extends CatalogPlugin {
*/
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * 
+   * If the catalog supports views, this must return identifiers for only 
tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as 
wildcards, others will

Review Comment:
   not related to this PR, but the existing doc is a bit vague. `|` is not a 
wildcard, right? And `|` is also a valid syntax in regex. Can we take a look at 
other databases and see how they document it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-19 Thread via GitHub


panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1495149980


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
*/
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * 
+   * If the catalog supports views, this must return identifiers for only 
tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as 
wildcards, others will
+   *follow regular expression convention, case-insensitive 
match and white spaces
+   *on both ends will be ignored

Review Comment:
   I have looked at the document 
`https://spark.apache.org/docs/latest/sql-ref-syntax-aux-show-tables.html#parameters`(SHOW
 TABLES doc page) and found that the parameter `regex_pattern` in it explains 
the `pattern`.
   https://github.com/apache/spark/assets/15246973/6349db2d-825e-4031-8f3e-4c984673f962;>
   Thank you very much for your reminder, Let's refer to it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-19 Thread via GitHub


cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494597611


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
*/
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * 
+   * If the catalog supports views, this must return identifiers for only 
tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as 
wildcards, others will
+   *follow regular expression convention, case-insensitive 
match and white spaces
+   *on both ends will be ignored

Review Comment:
   yea if they use the same implementation. The LIKE pattern doc does not even 
mention the `*`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-19 Thread via GitHub


cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494598375


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
*/
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * 
+   * If the catalog supports views, this must return identifiers for only 
tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as 
wildcards, others will
+   *follow regular expression convention, case-insensitive 
match and white spaces
+   *on both ends will be ignored

Review Comment:
   Another option is to document it in the SHOW TABLES doc page.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-19 Thread via GitHub


panbingkun commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494414935


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
*/
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * 
+   * If the catalog supports views, this must return identifiers for only 
tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as 
wildcards, others will
+   *follow regular expression convention, case-insensitive 
match and white spaces
+   *on both ends will be ignored

Review Comment:
   I searched the document and the `only possible relationship` is this one:
   
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-like.html#parameters
   https://github.com/apache/spark/assets/15246973/357c77e8-ecdb-404e-b479-b7b7459fd06b;>
   
   Perhaps we should explain it in detail here?
   (PS: The first pr that introduces `StringUtils.filterPattern` is: 
https://github.com/apache/spark/pull/12206)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-18 Thread via GitHub


cloud-fan commented on code in PR #43751:
URL: https://github.com/apache/spark/pull/43751#discussion_r1494067449


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java:
##
@@ -97,6 +102,26 @@ public interface TableCatalog extends CatalogPlugin {
*/
   Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
 
+  /**
+   * List the tables in a namespace from the catalog by pattern string.
+   * 
+   * If the catalog supports views, this must return identifiers for only 
tables and not views.
+   *
+   * @param namespace a multi-part namespace
+   * @param pattern the filter pattern, only '*' and '|' are allowed as 
wildcards, others will
+   *follow regular expression convention, case-insensitive 
match and white spaces
+   *on both ends will be ignored

Review Comment:
   do we have a doc page for the pattern string semantic? If we do we should 
reference it here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2024-02-18 Thread via GitHub


github-actions[bot] commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1951500516

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45880][SQL] Introduce a new TableCatalog.listTable overload th… [spark]

2023-11-10 Thread via GitHub


panbingkun commented on PR #43751:
URL: https://github.com/apache/spark/pull/43751#issuecomment-1805361784

   cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org