[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17469

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r110800502

--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +342,27 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
                  " descending order of the given column name.")
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = """
+True if the current expression is null. Often combined with
+:func:`DataFrame.filter` to select rows with null values.
+
+>>> df2.collect()
--- End diff --

Also, it seems the doctest somehow can't find `df2` here, just as the error message says.
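The `df2` failure is a doctest-namespace issue: names referenced inside a doctest exist only if the test harness injects them into the doctest globals. A minimal, self-contained sketch of that mechanism, with the hypothetical `shared_value` standing in for `df2`:

```python
import doctest

# A doctest that refers to a name (`shared_value`) not defined in the
# module itself -- analogous to `df2` in pyspark's column.py doctests.
_doc_with_example = """
>>> shared_value * 2
84
"""

# The name must be supplied via the globals mapping; a missing entry
# surfaces as a NameError, like the `df2` failure reported above.
parser = doctest.DocTestParser()
test = parser.get_doctest(_doc_with_example, {"shared_value": 42},
                          "example", "example.py", 0)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
print(runner.failures)  # 0: the name was resolved from the supplied globs
```

PySpark's own `_test()` helpers follow this idea by building a globals dict containing the example DataFrames before running the module's doctests.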
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r110800343

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,50 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """
+Return a Boolean :class:`Column` based on a regex match.
+
+:param other: an extended regex expression
+
+>>> df.filter(df.name.rlike('ice$')).collect()
+[Row(name=u'Alice', age=1)]
--- End diff --

It sounds like the test failure is here. If you click the link, the full logs can be checked (e.g., https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75622/console). Up to my knowledge, `Row` in Python sorts the field names, and therefore I guess it should be `[Row(age=1, name=u'Alice')]`.
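For the field-ordering point, a toy stand-in for the Python 2-era pyspark `Row` (an assumption for illustration, not the real class) shows why the expected doctest output must list `age` before `name`:

```python
# Toy stand-in for the pyspark Row of that era: keyword fields are
# stored in alphabetical order, which is what the doctest must match.
class Row(tuple):
    def __new__(cls, **kwargs):
        names = sorted(kwargs)  # field names sorted alphabetically
        row = tuple.__new__(cls, (kwargs[n] for n in names))
        row._fields = names
        return row

    def __repr__(self):
        return "Row(%s)" % ", ".join(
            "%s=%r" % (n, v) for n, v in zip(self._fields, self))

print(repr(Row(name='Alice', age=1)))  # Row(age=1, name='Alice')
```

So a doctest written as `Row(name=u'Alice', age=1)` fails the literal string comparison even though the data is identical.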
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109989329

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
--- End diff --

That fixed it!
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109575589

--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
                  " descending order of the given column name.")
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+ :func:`DataFrame.filter` to select rows with null values.
+
+ >>> df2.collect()
+ [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+ >>> df2.filter( df2.height.isNull ).collect()
+ [Row(name=u'Alice', height=None)]
+ '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

^ cc @holdenk
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109575023

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
--- End diff --

Could you maybe give this patch a shot: https://github.com/map222/spark/compare/patterson-documentation...HyukjinKwon:rlike-docstring.patch? I double-checked that it produces

![2017-04-04 1 23 30](https://cloud.githubusercontent.com/assets/6477701/24641412/84765e9c-193a-11e7-85d5-9745ea151c12.png)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109574284

--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
                  " descending order of the given column name.")
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

I just found a good reference in PEP 8:

> For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257

https://www.python.org/dev/peps/pep-0008/#string-quotes
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109574278

--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
                  " descending order of the given column name.")
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+ :func:`DataFrame.filter` to select rows with null values.
+
+ >>> df2.collect()
+ [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+ >>> df2.filter( df2.height.isNull ).collect()
+ [Row(name=u'Alice', height=None)]
+ '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

Up to my knowledge, both docstring layouts comply with PEP 8,

```
""" ...
"""
```

or

```
"""
...
"""
```

but in this case it is a separate variable. Personally, I prefer

```python
_isNull_doc = """
True if the current expression is null. Often combined with
:func:`DataFrame.filter` to select rows with null values.

>>> df2.collect()
[Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
>>> df2.filter( df2.height.isNull ).collect()
[Row(name=u'Alice', height=None)]
"""
```

but I could not find a formal reference to support this idea (given that it is a separate variable) and I am not supposed to decide this. So, I am fine.
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109479266

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
--- End diff --

I made a branch without the newlines in `rlike` and `like`. This is the HTML it produces: https://raw.githubusercontent.com/map222/spark/50fc4f4a4a19a95eb5eaae76840d63f540bd45e0/python/docs/_build/html/pyspark.sql.html

In the HTML, the `param` line is concatenated to the first one. Could there be a setting on my local machine that is different?
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109477286

--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
                  " descending order of the given column name.")
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+ :func:`DataFrame.filter` to select rows with null values.
+
+ >>> df2.collect()
+ [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+ >>> df2.filter( df2.height.isNull ).collect()
+ [Row(name=u'Alice', height=None)]
+ '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

Do you mean formatting it more like https://github.com/map222/spark/blob/90d1150867184d86b029c1f6397dcd855b1f5961/python/pyspark/sql/column.py#L482-L493?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109309888

--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
                  " descending order of the given column name.")
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+ :func:`DataFrame.filter` to select rows with null values.
+
+ >>> df2.collect()
+ [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+ >>> df2.filter( df2.height.isNull ).collect()
+ [Row(name=u'Alice', height=None)]
+ '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

BTW, just a question: do we need the leading space here in the documentation? I think we should remove it if it is unnecessary.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109309540

--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
                  " descending order of the given column name.")
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

It seems the key is consistency in this case. We could use `"""` here if there is no specific reason otherwise.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109309750

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
--- End diff --

I just manually tried replacing `\n` with a newline and running `make clean` and `make html`. It seems fine to use a newline (not `\n`). Could you double-check, and replace it if I was not wrong?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109309786

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,41 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+ :param other: an extended regex expression
+
+ >>> df.filter( df.name.rlike('ice$') ).collect()
--- End diff --

Let's make sure there is no extra space inside `(...)`.
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109225499

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
--- End diff --

Without the newlines, the description and the params lines are combined into one line. I don't totally understand why.
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109225302

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
+ :param other: a SQL LIKE pattern\n
+ See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+ >>> df.filter( df.name.like('Al%') ).collect()
+ [Row(name=u'Alice', age=1)]
+"""
+_startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
--- End diff --

Are [these the Scala docs](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column@rlike(literal:String):org.apache.spark.sql.Column) you're referring to? It looks like the Scala `Column` docs are missing examples for a couple more functions that are documented in Python (`substr`, `isin`). I could expand the ticket / PR to include Scala examples for all of these functions. I also noticed `isNull` and `isNotNull` are not documented either.
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109217816

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
+ :param other: a SQL LIKE pattern\n
+ See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+ >>> df.filter( df.name.like('Al%') ).collect()
+ [Row(name=u'Alice', age=1)]
+"""
--- End diff --

I made the indentation more consistent and changed all the block quotes to `"""` to match the rest of the file in the most recent commit.
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109050696

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
--- End diff --

Why do we have the newlines here?
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109050842

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
+ :param other: a SQL LIKE pattern\n
+ See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+ >>> df.filter( df.name.like('Al%') ).collect()
+ [Row(name=u'Alice', age=1)]
+"""
+_startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
+ :param other: string at end of line (do not use a regex `^`)\n
+ >>> df.filter(df.name.startswith('Al')).collect()
+ [Row(name=u'Alice', age=1)]
+ >>> df.filter(df.name.startswith('^Al')).collect()
+ []
+ '''
+_endswith_doc = ''' Return a Boolean :class:`Column` based on matching end of string.\n
--- End diff --

Is there a reason we switched from `"""` to `'''` between the docstrings?
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109049873

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
+ :param other: a SQL LIKE pattern\n
+ See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+ >>> df.filter( df.name.like('Al%') ).collect()
+ [Row(name=u'Alice', age=1)]
+"""
--- End diff --

Inconsistent indentation
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109085179

--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")

 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
+ :param other: a SQL LIKE pattern\n
+ See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+ >>> df.filter( df.name.like('Al%') ).collect()
+ [Row(name=u'Alice', age=1)]
+"""
+_startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
--- End diff --

This documentation is different from the Scala docs; we generally try to keep them the same in other places, so it might make sense to update both.
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109036171

--- Diff: python/pyspark/sql/column.py ---
@@ -124,6 +124,35 @@ def _(self, other):
 return _

+like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
+ :param other: a SQL LIKE pattern\n
+ See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+ >>> df.filter( df.name.like('Al%') ).collect()
+ [Row(name=u'Alice', age=1)]
+"""
+rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+endswith_doc = ''' Return a Boolean :class:`Column` based on matching end of string.\n
+ :param other: string at end of line (do not use a regex `$`)\n
+ >>> df.filter(df.name.endswith('ice')).collect()
+ [Row(name=u'Alice', age=1)]
+ >>> df.filter(df.name.endswith('ice$')).collect()
+ []
+ '''
+startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
--- End diff --

Ah, thank you! I couldn't figure out how to keep the docs from showing up as separate functions, but this solved it. I added `_` to the front of the docstrings and moved them into the `Column` class, immediately before the `_bin_op` definitions.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r108835449

--- Diff: python/pyspark/sql/column.py ---
@@ -124,6 +124,35 @@ def _(self, other):
 return _

+like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE match.\n
+ :param other: a SQL LIKE pattern\n
+ See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+ >>> df.filter( df.name.like('Al%') ).collect()
+ [Row(name=u'Alice', age=1)]
+"""
+rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+endswith_doc = ''' Return a Boolean :class:`Column` based on matching end of string.\n
+ :param other: string at end of line (do not use a regex `$`)\n
+ >>> df.filter(df.name.endswith('ice')).collect()
+ [Row(name=u'Alice', age=1)]
+ >>> df.filter(df.name.endswith('ice$')).collect()
+ []
+ '''
+startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
--- End diff --

Mind adding `_` as a prefix to this variable to indicate that it is a private one?