[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17469





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r110800502
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +342,27 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = """
+True if the current expression is null. Often combined with
+:func:`DataFrame.filter` to select rows with null values.
+
+>>> df2.collect()
--- End diff --

Also, it seems the doctest somehow can't find `df2` here, just as the error message says.
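For context, the doctests only see names registered in the `globs` that `_test()` passes to `doctest.testmod` at the bottom of `column.py`. A minimal sketch of the kind of registration that makes `df2` resolvable (fixture values assumed from the doctest output above):

```python
# Sketch only: how pyspark.sql.column's _test() might expose a df2
# fixture to the module's doctests. Only names placed in `globs` are
# visible to the >>> examples.
import doctest

import pyspark.sql.column
from pyspark.sql import Row, SparkSession


def _test():
    globs = pyspark.sql.column.__dict__.copy()
    spark = SparkSession.builder \
        .master("local[4]") \
        .appName("sql.column tests") \
        .getOrCreate()
    globs['spark'] = spark
    globs['df'] = spark.createDataFrame([Row(age=1, name='Alice'),
                                         Row(age=2, name='Bob')])
    # Without this registration, the isNull/isNotNull doctests fail with
    # "NameError: name 'df2' is not defined":
    globs['df2'] = spark.createDataFrame([Row(height=80, name='Tom'),
                                          Row(height=None, name='Alice')])
    (failure_count, test_count) = doctest.testmod(
        pyspark.sql.column, globs=globs,
        optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE)
    spark.stop()
    if failure_count:
        exit(-1)
```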





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r110800343
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,50 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """
+Return a Boolean :class:`Column` based on a regex match.
+
+:param other: an extended regex expression
+
+>>> df.filter(df.name.rlike('ice$')).collect()
+[Row(name=u'Alice', age=1)]
--- End diff --

It sounds like the test failure is here. If you click the link, the full logs can be checked (e.g., 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75622/console).

To my knowledge, `Row` in Python sorts the field names, so I guess it should be `[Row(age=1, name=u'Alice')]`.
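For example (PySpark 2.x behavior; a quick check, not from this PR):

```python
from pyspark.sql import Row

# A Row built from keyword arguments sorts its field names alphabetically,
# so the doctest output must list age before name.
r = Row(name='Alice', age=1)
print(r)      # Row(age=1, name='Alice')
print(r.age)  # 1 -- attribute access is unaffected by the reordering
```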





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-05 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109989329
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
--- End diff --

That fixed it!





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109575589
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+  :func:`DataFrame.filter` to select rows with null values.
+
+  >>> df2.collect()
+  [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+  >>> df2.filter( df2.height.isNull ).collect()
+  [Row(name=u'Alice', height=None)]
+  '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

^ cc @holdenk





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109575023
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
--- End diff --

Could you maybe give this patch a shot? 
https://github.com/map222/spark/compare/patterson-documentation...HyukjinKwon:rlike-docstring.patch

I double-checked that it produces:

![2017-04-04 1 23 30](https://cloud.githubusercontent.com/assets/6477701/24641412/84765e9c-193a-11e7-85d5-9745ea151c12.png)






[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109574284
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

I just found a good reference in PEP 8:

> For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257.

https://www.python.org/dev/peps/pep-0008/#string-quotes
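A minimal illustration of that convention (docstring text shortened):

```python
# PEP 8 / PEP 257: use double quotes for triple-quoted strings
_isNull_doc = """
True if the current expression is null.
"""
```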





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109574278
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+  :func:`DataFrame.filter` to select rows with null values.
+
+  >>> df2.collect()
+  [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+  >>> df2.filter( df2.height.isNull ).collect()
+  [Row(name=u'Alice', height=None)]
+  '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

To my knowledge, both docstring styles comply with PEP 8:
```
""" ...
"""
```

or 

```
"""
...
"""
```

but in this case it is a separate variable. Personally, I prefer

```python
_isNull_doc = """
True if the current expression is null. Often combined with
:func:`DataFrame.filter` to select rows with null values.

>>> df2.collect()
[Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
>>> df2.filter( df2.height.isNull ).collect()
[Row(name=u'Alice', height=None)]
"""
```

but I could not find a formal reference to support this idea (for the case of a separate variable), and it is not my call to decide. So, I am fine either way.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109479266
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
--- End diff --

I made a branch without the newlines in `rlike` and `like`. This is the HTML it produces: 
https://raw.githubusercontent.com/map222/spark/50fc4f4a4a19a95eb5eaae76840d63f540bd45e0/python/docs/_build/html/pyspark.sql.html
In that HTML, the `:param` line is concatenated onto the first line.
Could there be a setting on my local machine that is different?
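A likely explanation (general reST behavior, not specific to this PR): docutils only treats a `:param ...:` field list as its own block when a blank line separates it from the preceding paragraph, so without real blank lines the `\n` escapes are doing that job. A sketch of a layout that renders correctly without any `\n`:

```python
_rlike_doc = """
Return a Boolean :class:`Column` based on a regex match.

:param other: an extended regex expression

>>> df.filter(df.name.rlike('ice$')).collect()
[Row(age=1, name=u'Alice')]
"""
```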





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-03 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109477286
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+  :func:`DataFrame.filter` to select rows with null values.
+
+  >>> df2.collect()
+  [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+  >>> df2.filter( df2.height.isNull ).collect()
+  [Row(name=u'Alice', height=None)]
+  '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

Do you mean formatting it more like 
https://github.com/map222/spark/blob/90d1150867184d86b029c1f6397dcd855b1f5961/python/pyspark/sql/column.py#L482-L493?





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109309888
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
+  :func:`DataFrame.filter` to select rows with null values.
+
+  >>> df2.collect()
+  [Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]
+  >>> df2.filter( df2.height.isNull ).collect()
+  [Row(name=u'Alice', height=None)]
+  '''
+_isNotNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

BTW, just a question: do we need the leading space here in the documentation? I think we should remove it if it is unnecessary.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109309540
  
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
 desc = _unary_op("desc", "Returns a sort expression based on the"
  " descending order of the given column name.")
 
-isNull = _unary_op("isNull", "True if the current expression is null.")
-isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
+_isNull_doc = ''' True if the current expression is null. Often combined with
--- End diff --

It seems the key is consistency in this case. We could use `"""` here if there is no specific reason to do otherwise.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109309750
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
--- End diff --

I just tried manually replacing `\n` with an actual newline and running `make clean` and `make html`. It seems fine to use a newline (not `\n`). Could you double-check, and replace it if I am not wrong?





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109309786
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,41 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
+ :param other: an extended regex expression
+
+ >>> df.filter( df.name.rlike('ice$') ).collect()
--- End diff --

Let's make sure there is no extra space inside the `(...)`.
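That is, the doctest should read `df.filter(df.name.rlike('ice$')).collect()`.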





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-31 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109225499
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
--- End diff --

Without the newlines, the description and the `:param` lines are combined into one line. I don't totally understand why.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-31 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109225302
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
+_startswith_doc = ''' Return a Boolean :class:`Column` based on a 
string match.\n
--- End diff --

Are [these the Scala docs](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column@rlike(literal:String):org.apache.spark.sql.Column) you're referring to? It looks like the Scala Column docs are missing examples for a couple more functions that are documented in Python (substr, isin). I could expand the ticket / PR to include Scala examples for all of these functions. I also noticed isNull and isNotNull are not documented either.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-31 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109217816
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
--- End diff --

I made the indentation more consistent and changed all of the docstring quotes to `"""` to match the rest of the file in the most recent commit.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-30 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109050696
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
--- End diff --

Why do we have the newlines here?





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-30 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109050842
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
+_startswith_doc = ''' Return a Boolean :class:`Column` based on a 
string match.\n
+ :param other: string at end of line (do not use a 
regex `^`)\n
+ >>> df.filter(df.name.startswith('Al')).collect()
+ [Row(name=u'Alice', age=1)]
+ >>> df.filter(df.name.startswith('^Al')).collect()
+ []
+ '''
+_endswith_doc = ''' Return a Boolean :class:`Column` based on matching 
end of string.\n
--- End diff --

Is there a reason we switched from """ to ''' between the docstrings?





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-30 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109049873
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
--- End diff --

Inconsistent indentation





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-30 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109085179
  
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
 raise TypeError("Column is not iterable")
 
 # string methods
+_rlike_doc = """ Return a Boolean :class:`Column` based on a regex 
match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+_like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
+_startswith_doc = ''' Return a Boolean :class:`Column` based on a 
string match.\n
--- End diff --

This documentation is different from the Scala docs; we generally try to keep them the same in other places, so it might make sense to update both.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-30 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r109036171
  
--- Diff: python/pyspark/sql/column.py ---
@@ -124,6 +124,35 @@ def _(self, other):
 return _
 
 
+like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
+rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+endswith_doc = ''' Return a Boolean :class:`Column` based on matching end of string.\n
+   :param other: string at end of line (do not use a regex `$`)\n
+   >>> df.filter(df.name.endswith('ice')).collect()
+   [Row(name=u'Alice', age=1)]
+   >>> df.filter(df.name.endswith('ice$')).collect()
+   []
+   '''
+startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
--- End diff --

Ah, thank you! I couldn't figure out how to keep the docs from showing up as separate functions, but this solved it. I added `_` to the front of the docstring names and moved them into the `Column` class, immediately before the `_bin_op` definitions.





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-03-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17469#discussion_r108835449
  
--- Diff: python/pyspark/sql/column.py ---
@@ -124,6 +124,35 @@ def _(self, other):
 return _
 
 
+like_doc = """ Return a Boolean :class:`Column` based on a SQL LIKE 
match.\n
+   :param other: a SQL LIKE pattern\n
+   See :func:`pyspark.sql.Column.rlike` for a regex version\n
+
+   >>> df.filter( df.name.like('Al%') ).collect()
+   [Row(name=u'Alice', age=1)]
+"""
+rlike_doc = """ Return a Boolean :class:`Column` based on a regex match.\n
+:param other: an extended regex expression\n
+
+>>> df.filter( df.name.rlike('ice$') ).collect()
+[Row(name=u'Alice', age=1)]
+"""
+endswith_doc = ''' Return a Boolean :class:`Column` based on matching end of string.\n
+   :param other: string at end of line (do not use a regex `$`)\n
+   >>> df.filter(df.name.endswith('ice')).collect()
+   [Row(name=u'Alice', age=1)]
+   >>> df.filter(df.name.endswith('ice$')).collect()
+   []
+   '''
+startswith_doc = ''' Return a Boolean :class:`Column` based on a string match.\n
--- End diff --

Mind adding `_` as a prefix to this variable to indicate that it is a private one?

