Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/20495#discussion_r165819296
--- Diff: python/pyspark/sql/functions.py ---
@@ -1705,10 +1705,12 @@ def unhex(col):
@ignore_unicode_prefix
@since(1.5)
def length(col):
- """Calculates the length of a string or binary expression.
+ """Computes the character length of a given string or number of bytes or a binary string.
+ The length of character strings include the trailing spaces. The length of binary strings
+ includes binary zeros.
- >>> spark.createDataFrame([('ABC',)], ['a']).select(length('a').alias('length')).collect()
- [Row(length=3)]
+ >>> spark.createDataFrame([('ABC ',)], ['a']).select(length('a').alias('length')).collect()
--- End diff --
Actually, this PR does more than update the `description`: it also improves
the test coverage and refactors the code.
Could you update the PR title and description to reflect that?
Otherwise, we had better split this PR according to @rdblue's
recommendations in our dev thread.
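
For reference, the semantics that the revised docstring describes (trailing spaces count toward a string's length, and zero bytes count toward a binary value's length) can be sketched in plain Python. This is only an analogy using Python's built-in `len`, not Spark's implementation; the helper names `char_length` and `byte_length` are hypothetical and do not exist in PySpark.

```python
# Plain-Python analogy of the documented behavior; NOT the Spark implementation.
# `char_length` and `byte_length` are hypothetical names used for illustration.

def char_length(s: str) -> int:
    """Number of characters in a string, trailing spaces included."""
    return len(s)


def byte_length(b: bytes) -> int:
    """Number of bytes in a binary value, zero bytes included."""
    return len(b)


if __name__ == "__main__":
    # The trailing space in 'ABC ' is counted.
    print(char_length("ABC "))
    # The zero byte at the end of b'AB\x00' is counted.
    print(byte_length(b"AB\x00"))
```

The string `'ABC '` mirrors the input used in the updated doctest above, which appears to have been chosen so that the trailing space is exercised.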