[jira] [Commented] (SPARK-14834) Force adding doc for new api in pyspark with @since annotation

2018-06-28 Thread Alexander Gorokhov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526102#comment-16526102
 ] 

Alexander Gorokhov commented on SPARK-14834:


So, basically, this is about making the "since" decorator require docs to be 
written for the underlying function?
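
To make the question concrete, here is a minimal sketch of what such a check could look like. It is not the actual PySpark implementation: the `require_doc` parameter and the error message are assumptions made for illustration, and the opt-out exists only to mirror the issue's remark that docs may not be mandatory for modules migrated from the Scala API.

```python
import re


def since(version, require_doc=True):
    """Hypothetical `since` variant that can insist on a docstring."""
    indent_p = re.compile(r"\n( +)")

    def deco(f):
        doc = f.__doc__
        if not doc or not doc.strip():
            if require_doc:
                # Fail at import time, when the decorator is applied.
                raise ValueError(
                    "%s is marked @since(%s) but has no docstring"
                    % (f.__name__, version)
                )
            doc = ""
        # Match the docstring's own indentation for the appended directive.
        indents = indent_p.findall(doc)
        indent = " " * (min(len(m) for m in indents) if indents else 0)
        f.__doc__ = doc.rstrip() + "\n\n%s.. versionadded:: %s" % (indent, version)
        return f

    return deco


@since("2.4")
def documented(x):
    """Return x unchanged."""
    return x


try:
    @since("2.4")
    def undocumented(x):
        return x
except ValueError as exc:
    print(exc)
```

Raising when the decorator is applied means a missing docstring would surface as soon as the module is imported, for example in the test suite, rather than at call time.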


> Force adding doc for new api in pyspark with @since annotation
> --
>
> Key: SPARK-14834
> URL: https://issues.apache.org/jira/browse/SPARK-14834
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Jeff Zhang
>Priority: Minor
>
> This is for enforcing that users add Python docs when adding a new Python API 
> with the @since annotation. But thinking about it again, this is only suitable 
> when adding a new API to an existing Python module. If it is a new Python 
> module migrating from the Scala API, Python docs are not mandatory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17333) Make pyspark interface friendly with static analysis

2018-06-20 Thread Alexander Gorokhov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518126#comment-16518126
 ] 

Alexander Gorokhov commented on SPARK-17333:


Hi everyone,

It has been almost a year since the last comment on this issue.

Are there any updates on this? 

I am asking because I would like to see static typing support in pyspark, 
and I am ready to implement it and provide a pull request. After some analysis, 
I think this should be implemented as .pyi stub files, since they are supported 
by type checking tools such as the great mypy and PyCharm, whereas the docstring 
type annotation syntax is not even going to be supported by mypy, as Guido van 
Rossum mentioned on a similar ticket in mypy: 
[https://github.com/python/mypy/issues/612#issuecomment-223467302] 
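
The difference between the two styles can be sketched in plain Python. The `Column` class below is a stand-in so the example runs without Spark, and both function names are hypothetical. A .pyi stub would carry the annotated signature with `...` as its body.

```python
from typing import get_type_hints


class Column:
    """Stand-in for pyspark.sql.Column (assumption, for illustration only)."""


def max_docstring(col):
    """Return the maximum value.

    :type col: Column
    :rtype: Column
    """
    return col


def max_annotated(col: Column) -> Column:
    """Return the maximum value."""
    return col


# Docstring types are plain text; nothing structured is attached to the function.
print(get_type_hints(max_docstring))

# Annotations are stored on the function object and are machine-readable.
print(get_type_hints(max_annotated))
```

The docstring version yields an empty mapping, while the annotated version maps both `col` and `return` to `Column`. A type checker reads the same annotations from source or from a stub.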

> Make pyspark interface friendly with static analysis
> 
>
> Key: SPARK-17333
> URL: https://issues.apache.org/jira/browse/SPARK-17333
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Assaf Mendelson
>Priority: Trivial
>
> Static analysis tools, such as those commonly used by IDEs for auto-completion 
> and error marking, tend to give poor results with pyspark.
> This is caused by two separate issues:
> The first is that many elements are created programmatically, such as the max 
> function in pyspark.sql.functions.
> The second is that we tend to use pyspark in a functional manner, meaning 
> that we chain many actions (e.g. df.filter().groupby().agg()), and since 
> Python has no type information this can become difficult to understand.
> I would suggest changing the interface to improve this. 
> The way I see it, we can either change the interface or provide interface 
> enhancements.
> Changing the interface means defining (when possible) all functions directly, 
> i.e. instead of having a __functions__ dictionary in pyspark.sql.functions.py 
> and then generating the functions programmatically by using _create_function, 
> create the function directly: 
> def max(col):
>     """
>     docstring
>     """
>     _create_function(max, "docstring")
> Second, we can add type indications to all functions as defined in PEP 484 or 
> PyCharm's legacy type hinting 
> (https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy).
> So, for example, max might look like this:
> def max(col):
>     """
>     Does a max.
>
>     :type col: Column
>     :rtype: Column
>     """
> This would provide a wide range of support, as these types of hints, while 
> old, are pretty common.
> A second option is to use PEP 3107 annotations to define interfaces in stub 
> files (.pyi files, as specified in PEP 484).
> In this case we might have a functions.pyi file which would contain something 
> like:
> def max(col: Column) -> Column:
>     """
>     Aggregate function: returns the maximum value of the expression in a
>     group.
>     """
>     ...
> This has the advantage of easier-to-understand types and of not touching the 
> code (only supported code), but has the disadvantage of being separately 
> managed (i.e. a greater chance of making a mistake), and some configuration 
> would be needed in the IDE/static analysis tool instead of it working out of 
> the box.
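
The first issue described above, functions that exist only at runtime, can be reproduced with a small self-contained sketch. The module source is a simplified imitation, not the real pyspark.sql.functions. The `ast` scan is a rough stand-in for what an IDE does; real tools are more sophisticated but face the same basic limitation.

```python
import ast

# Simplified, hypothetical module source imitating the generation pattern.
SOURCE = '''
_functions = {"max": "Aggregate function: returns the maximum value.",
              "min": "Aggregate function: returns the minimum value."}

def _create_function(name, doc=""):
    def _(col):
        return (name, col)  # placeholder body, no JVM call in this sketch
    _.__name__ = name
    _.__doc__ = doc
    return _

for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

def explicit_max(col):
    """Aggregate function: returns the maximum value."""
    return ("max", col)
'''

# What a static tool can see: names bound by `def` at module top level.
tree = ast.parse(SOURCE)
static_names = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}

# What actually exists once the module body has executed.
namespace = {}
exec(SOURCE, namespace)
runtime_names = {
    k for k, v in namespace.items() if callable(v) and not k.startswith("__")
}

missing = sorted(runtime_names - static_names)
print(missing)  # ['max', 'min']
```

Only `explicit_max` and `_create_function` appear in the syntax tree. The generated `max` and `min` are callable at runtime but invisible in the source, which is why auto-completion and error marking miss them.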


