[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-18 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266337288
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -1437,10 +1453,6 @@ def hash(*cols):
 'ascii': 'Computes the numeric value of the first character of the string 
column.',
 'base64': 'Computes the BASE64 encoding of a binary column and returns it 
as a string column.',
 'unbase64': 'Decodes a BASE64 encoded string column and returns it as a 
binary column.',
-'initcap': 'Returns a new string column by converting the first letter of 
each word to ' +
-   'uppercase. Words are delimited by whitespace.',
-'lower': 'Converts a string column to lower case.',
-'upper': 'Converts a string column to upper case.',
 
 Review comment:
   Also, if it was an exception, it had to describe it specifically.
   
   ```
   # name functions take a column name as their argument
   'lit': _lit_doc,
   ```
   
   This doesn't look making sense if you read the codes from scratch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-18 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266336898
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -85,13 +96,16 @@ def _():
 >>> df.select(lit(5).alias('height')).withColumn('spark_user', 
lit(True)).take(1)
 [Row(height=5, spark_user=True)]
 """
-_functions = {
+_name_functions = {
+# name functions take a column name as their argument
 'lit': _lit_doc,
 
 Review comment:
   One (lit) of five (col, column, asc, desc, lit) doesn't sound like a special 
case tho. It had to have a better category if 20% of items doesn't fit to the 
category.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-18 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266336008
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -1437,10 +1453,6 @@ def hash(*cols):
 'ascii': 'Computes the numeric value of the first character of the string 
column.',
 'base64': 'Computes the BASE64 encoding of a binary column and returns it 
as a string column.',
 'unbase64': 'Decodes a BASE64 encoded string column and returns it as a 
binary column.',
-'initcap': 'Returns a new string column by converting the first letter of 
each word to ' +
-   'uppercase. Words are delimited by whitespace.',
-'lower': 'Converts a string column to lower case.',
-'upper': 'Converts a string column to upper case.',
 
 Review comment:
   I was saying `lower` and `upper`. Looks it has been overwritten by this, so 
if we keep here, it keeps previous definition. Did you check which PR and JIRAs 
added `lower` and `upper`?
   
   Also, strictly this should have not been removed in this PR as it doesn't 
target to remove overwritten functions. As you said, we should avoid such 
function definition way later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-18 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266334279
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -1437,10 +1453,6 @@ def hash(*cols):
 'ascii': 'Computes the numeric value of the first character of the string 
column.',
 'base64': 'Computes the BASE64 encoding of a binary column and returns it 
as a string column.',
 'unbase64': 'Decodes a BASE64 encoded string column and returns it as a 
binary column.',
-'initcap': 'Returns a new string column by converting the first letter of 
each word to ' +
-   'uppercase. Words are delimited by whitespace.',
-'lower': 'Converts a string column to lower case.',
-'upper': 'Converts a string column to upper case.',
 'ltrim': 'Trim the spaces from left end for the specified string value.',
 'rtrim': 'Trim the spaces from right end for the specified string value.',
 'trim': 'Trim the spaces from both ends for the specified string column.',
 
 Review comment:
   I was saying this because I didn't get why you excluded string functions.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-18 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266287361
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -1437,10 +1453,6 @@ def hash(*cols):
 'ascii': 'Computes the numeric value of the first character of the string 
column.',
 'base64': 'Computes the BASE64 encoding of a binary column and returns it 
as a string column.',
 'unbase64': 'Decodes a BASE64 encoded string column and returns it as a 
binary column.',
-'initcap': 'Returns a new string column by converting the first letter of 
each word to ' +
-   'uppercase. Words are delimited by whitespace.',
-'lower': 'Converts a string column to lower case.',
-'upper': 'Converts a string column to upper case.',
 'ltrim': 'Trim the spaces from left end for the specified string value.',
 'rtrim': 'Trim the spaces from right end for the specified string value.',
 'trim': 'Trim the spaces from both ends for the specified string column.',
 
 Review comment:
   This is what I initially expected when I asked to whitelist them, @asmello


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-17 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266287361
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -1437,10 +1453,6 @@ def hash(*cols):
 'ascii': 'Computes the numeric value of the first character of the string 
column.',
 'base64': 'Computes the BASE64 encoding of a binary column and returns it 
as a string column.',
 'unbase64': 'Decodes a BASE64 encoded string column and returns it as a 
binary column.',
-'initcap': 'Returns a new string column by converting the first letter of 
each word to ' +
-   'uppercase. Words are delimited by whitespace.',
-'lower': 'Converts a string column to lower case.',
-'upper': 'Converts a string column to upper case.',
 'ltrim': 'Trim the spaces from left end for the specified string value.',
 'rtrim': 'Trim the spaces from right end for the specified string value.',
 'trim': 'Trim the spaces from both ends for the specified string column.',
 
 Review comment:
   This is what I initially expected when I asked to whitelist them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-17 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266287274
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -1437,10 +1453,6 @@ def hash(*cols):
 'ascii': 'Computes the numeric value of the first character of the string 
column.',
 'base64': 'Computes the BASE64 encoding of a binary column and returns it 
as a string column.',
 'unbase64': 'Decodes a BASE64 encoded string column and returns it as a 
binary column.',
-'initcap': 'Returns a new string column by converting the first letter of 
each word to ' +
-   'uppercase. Words are delimited by whitespace.',
-'lower': 'Converts a string column to lower case.',
-'upper': 'Converts a string column to upper case.',
 'ltrim': 'Trim the spaces from left end for the specified string value.',
 'rtrim': 'Trim the spaces from right end for the specified string value.',
 'trim': 'Trim the spaces from both ends for the specified string column.',
 
 Review comment:
   Did you not add the column name support because Scala side has this 
signautre below?:
   
   ```scala
 def trim(e: Column): Column
 def trim(e: Column, trimString: String): Column
   ```
   
   That's not allowed in Python
   
   ```python
   >>> from pyspark.sql.functions import trim, lit
   >>> spark.range(1).select(lit('a').alias("value")).select(trim("value", "a"))
   Traceback (most recent call last):
 File "", line 1, in 
   TypeError: _() takes exactly 1 argument (2 given)
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-17 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266284370
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -1437,10 +1453,6 @@ def hash(*cols):
 'ascii': 'Computes the numeric value of the first character of the string 
column.',
 'base64': 'Computes the BASE64 encoding of a binary column and returns it 
as a string column.',
 'unbase64': 'Decodes a BASE64 encoded string column and returns it as a 
binary column.',
-'initcap': 'Returns a new string column by converting the first letter of 
each word to ' +
-   'uppercase. Words are delimited by whitespace.',
-'lower': 'Converts a string column to lower case.',
-'upper': 'Converts a string column to upper case.',
 
 Review comment:
   This had to stay in string functions!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-17 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r266272011
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -85,13 +96,16 @@ def _():
 >>> df.select(lit(5).alias('height')).withColumn('spark_user', 
lit(True)).take(1)
 [Row(height=5, spark_user=True)]
 """
-_functions = {
+_name_functions = {
+# name functions take a column name as their argument
 'lit': _lit_doc,
 
 Review comment:
   `lit` doesn't take a column name as their argument. Why did we use such 
`name function` category for `lit`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-14 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r265823143
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -85,13 +96,16 @@ def _():
 >>> df.select(lit(5).alias('height')).withColumn('spark_user', 
lit(True)).take(1)
 [Row(height=5, spark_user=True)]
 """
-_functions = {
+_name_functions = {
+# name functions take a column name as their argument
 'lit': _lit_doc,
 
 Review comment:
   and .. what's really "name function" ... ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions

2019-03-14 Thread GitBox
HyukjinKwon commented on a change in pull request #23882: [SPARK-26979][PYTHON] 
Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#discussion_r265816244
 
 

 ##
 File path: python/pyspark/sql/functions.py
 ##
 @@ -85,13 +96,16 @@ def _():
 >>> df.select(lit(5).alias('height')).withColumn('spark_user', 
lit(True)).take(1)
 [Row(height=5, spark_user=True)]
 """
-_functions = {
+_name_functions = {
+# name functions take a column name as their argument
 'lit': _lit_doc,
 
 Review comment:
   Does `lit` takes the string as column names? how do we create string literal?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org