[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143481973

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Applied the comments. Thanks!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143463571

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Ah, I got it.
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143211271

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Oh, it is supposed to be used to create the default index of the pandas API on Spark in the follow-up PR. Testing this function by applying it to the pandas API on Spark code requires modifying several other files as well, so I split the current work out for review convenience.

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Or I could add the other fixes to the current PR if the current fix looks good so far. If so, this PR would be more like "initial support for pandas API on Spark" rather than "Add `_distributed_sequence_id` for distributed-sequence index".
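For readers unfamiliar with the "distributed-sequence" default index mentioned above: it gives every row a consecutive id 0, 1, 2, ... even though the rows live in different partitions. The following is a minimal pure-Python sketch, assuming a two-pass approach in the spirit of `RDD.zipWithIndex` (count the rows in each partition, then offset each partition's local numbering by the rows in all earlier partitions). Partitions are modeled as plain lists, and `distributed_sequence_ids` is a hypothetical helper for illustration only, not the Spark or Spark Connect implementation of `_distributed_sequence_id`.

```python
from itertools import accumulate
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def distributed_sequence_ids(partitions: List[List[T]]) -> List[List[Tuple[int, T]]]:
    """Hypothetical sketch: attach a 0-based, gap-free id to every row.

    Assumes a zipWithIndex-style two-pass scheme; not Spark's actual code.
    """
    # Pass 1: count the rows in each partition.
    counts = [len(part) for part in partitions]
    # Each partition's starting id is the exclusive prefix sum of the counts.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: every partition numbers its own rows, shifted by its offset.
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


parts = [["a", "b"], [], ["c"], ["d", "e"]]
print(distributed_sequence_ids(parts))
```

The first pass is what makes such an index costlier than a merely increasing (but non-consecutive) id: each partition's offset depends on the row counts of every partition before it, so the counts must be known before any ids can be assigned.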
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143207159

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Or I could add the other fixes to the current PR if the current fix looks good so far. If so, this PR would be more like "initial support for pandas API on Spark" rather than "Add `_distributed_sequence_id` for distributed-sequence index". :-)
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143198700

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Oh, it is supposed to be used to create the default index of the pandas API on Spark in the follow-up PR. Testing this function by applying it to the pandas API on Spark code requires modifying several other files as well, so I split the current work out for review convenience.
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143207159

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Or I could add the other fixes to the current PR if the current fix looks good so far.
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143198700

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Oh, it is supposed to be used to create the default index of the pandas API on Spark in the follow-up PR. Testing this function by applying it to the pandas API on Spark code requires modifying several other files as well, so I split the current work out for review convenience.
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143200759

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
On second thought, it would be good to have at least one example test in the current PR. Let me address it!
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143200759

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
On second thought, it would be good to have at least one example test in the current PR. Thanks!
[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.
itholic commented on code in PR #40507:
URL: https://github.com/apache/spark/pull/40507#discussion_r1143198700

## python/pyspark/sql/connect/functions.py: ##
@@ -2471,6 +2472,13 @@ def udf(
udf.__doc__ = pysparkfuncs.udf.__doc__
+def _distributed_sequence_id() -> Column:

Review Comment:
Oh, it is supposed to be used to create the default index of the pandas API on Spark in the follow-up PR. Since the follow-up PR will include many code fixes and tests, I split the current work out for review convenience.