HyukjinKwon commented on a change in pull request #32566:
URL: https://github.com/apache/spark/pull/32566#discussion_r633216739
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2681,6 +2681,30 @@ def overlay(src, replace, pos, len=-1):
))
+def sentences(str, lang="", country=""):
Review comment:
lang -> language to be consistent
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2681,6 +2681,30 @@ def overlay(src, replace, pos, len=-1):
))
+def sentences(str, lang="", country=""):
+ """
+ Splits a string into arrays of sentences, where each sentence is an array of words.
+ The 'lang' and 'country' arguments are optional, and if omitted, the default locale is used.
+
+ .. versionadded:: 3.2.0
+
Review comment:
Can we also add:
```
Parameters
----------
str : ...
```
(see also https://numpydoc.readthedocs.io/en/latest/format.html)
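A sketch of how the docstring could look with a numpydoc `Parameters` section, assuming the `string`/`language` renames suggested in the other comments on this hunk. The type descriptions and parameter wording are assumptions, not the merged code, and the function body is deliberately left out so the snippet runs without Spark.

```python
import inspect


def sentences(string, language="", country=""):
    """
    Splits a string into arrays of sentences, where each sentence is an array of words.
    The `language` and `country` arguments are optional, and if omitted, the default
    locale is used.

    .. versionadded:: 3.2.0

    Parameters
    ----------
    string : :class:`~pyspark.sql.Column` or str
        a string to be split
    language : :class:`~pyspark.sql.Column` or str, optional
        a language of the locale
    country : :class:`~pyspark.sql.Column` or str, optional
        a country of the locale
    """
    # Body omitted on purpose: this sketch only illustrates the docstring layout.


# Print the dedented docstring to see the section header and its dashed underline.
print(inspect.getdoc(sentences))
```

The dashed line under `Parameters` is what lets numpydoc recognize it as a section header.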
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##########
@@ -2867,6 +2867,25 @@ object functions {
new Overlay(src.expr, replace.expr, pos.expr)
}
+ /**
+ * Splits a string into arrays of sentences, where each sentence is an array of words.
+ * @group string_funcs
+ * @since 3.2.0
+ */
+ def sentences(str: Column, language: String, country: String): Column = withExpr {
Review comment:
I think we should make both `language` and `country` a `Column` in case we make them accept non-literal values later (this convention is also documented at the top of this file).
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
##########
@@ -589,6 +596,6 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession {
df2.selectExpr("str_to_map(a)"),
Seq(Row(Map("a" -> "1", "b" -> "2", "c" -> "3")))
)
-
}
+
Review comment:
No biggie, but I would remove these newline changes.
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2681,6 +2681,30 @@ def overlay(src, replace, pos, len=-1):
))
+def sentences(str, lang="", country=""):
Review comment:
No biggie, but I would avoid `str` as it shadows the built-in `str` in Python. Maybe just `string`.
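A small pure-Python sketch of why the shadowing matters, using hypothetical function names: once a parameter is named `str`, an `isinstance(..., str)` check inside the function (a common way to tell column names apart from other values) sees the caller's argument instead of the built-in type.

```python
def check_shadowed(str, country=""):
    # Here `str` names the first argument, so the built-in type is unreachable.
    return isinstance(country, str)


def check_renamed(string, country=""):
    # With the parameter renamed, `str` still refers to the built-in type.
    return isinstance(country, str)


print(check_renamed("Hello world.", "US"))  # True
try:
    check_shadowed("Hello world.", "US")
except TypeError as exc:
    # isinstance() rejects a str instance as its second argument.
    print("TypeError:", exc)
```

Renaming the parameter to `string` keeps the built-in usable everywhere in the function body.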
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2681,6 +2681,30 @@ def overlay(src, replace, pos, len=-1):
))
+def sentences(str, lang="", country=""):
+ """
+ Splits a string into arrays of sentences, where each sentence is an array of words.
+ The 'lang' and 'country' arguments are optional, and if omitted, the default locale is used.
+
+ .. versionadded:: 3.2.0
+
+ Examples
Review comment:
```suggestion
Examples
--------
```
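The underline matters because numpydoc recognizes a section header by the underline on the line that follows it. The hypothetical checker below uses a simplified rule (a line of hyphens at least as long as the header), not numpydoc's actual parser, and runs it on a trimmed version of this docstring with and without the suggested change.

```python
def headers_missing_underline(docstring, sections=("Parameters", "Returns", "Examples")):
    """Return numpydoc-style section headers that lack a hyphen underline."""
    lines = [line.strip() for line in docstring.splitlines()]
    missing = []
    for i, line in enumerate(lines):
        if line not in sections:
            continue
        underline = lines[i + 1] if i + 1 < len(lines) else ""
        # Simplified rule: next line must be only hyphens, at least as long as the header.
        if set(underline) != {"-"} or len(underline) < len(line):
            missing.append(line)
    return missing


before = """
    .. versionadded:: 3.2.0

    Examples
    >>> ...
"""
after = """
    .. versionadded:: 3.2.0

    Examples
    --------
    >>> ...
"""
print(headers_missing_underline(before))  # ['Examples']
print(headers_missing_underline(after))   # []
```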
--
This is an automated message from the Apache Git Service.