Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/16537
  
    Putting this particular PR and the scalability of the improvement process 
aside, Spark is heavily underdocumented. This hits Python and R users far 
harder than everyone else: in the worst case, when working with Scala, you can 
at least follow the types. It wouldn't be a problem if the Python API used 
consistent conventions and idiomatic Python, and didn't make hidden 
assumptions once in a while :) Take things like `DataFrame.replace`, or the 
parts of the `DStream` API that expect a `function` rather than a `Callable`, 
for example. I am not really a stakeholder here, but I really believe that 
small things like this are crucial to creating a decent user experience.
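
    As a concrete illustration of the `DataFrame.replace` point, here is a 
minimal sketch (the local session and sample data are my own, hypothetical 
example); one assumption the docs leave largely implicit is that `to_replace` 
and `value` are expected to have matching types:

    ```python
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("replace-example")
             .getOrCreate())

    df = spark.createDataFrame([("Alice", 1), ("Bob", 2)], ["name", "id"])

    # replace(to_replace, value, subset): here both sides are strings;
    # mixing types (e.g. replacing a string with a number) is not allowed,
    # which is exactly the kind of hidden assumption mentioned above.
    replaced = df.replace("Alice", "Carol", subset=["name"])
    names = [row["name"] for row in replaced.collect()]
    print(names)
    ```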
    
    In hindsight, I overdid the number of tasks, but in my defense I am 
pretty sure that at least some of these won't get merged. Moreover, the 
problem is not imaginary. For many users it is not obvious how to use UDFs 
(http://stackoverflow.com/q/35546576, http://stackoverflow.com/q/35375255, 
http://stackoverflow.com/q/39254503), and the docstring of `udf` is actually 
the only documented example I found.
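
    For completeness, a minimal UDF example of the kind those questions ask 
for (a hedged sketch; the local session and sample data are my own):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("udf-example")
             .getOrCreate())

    def str_length(s):
        # Handle nulls explicitly: Spark passes None through to the UDF.
        return len(s) if s is not None else None

    # The return type must be declared; it defaults to StringType otherwise.
    length_udf = udf(str_length, IntegerType())

    df = spark.createDataFrame([("spark",), ("pyspark",), (None,)], ["word"])
    lengths = [row["length"]
               for row in df.select(length_udf("word").alias("length")).collect()]
    print(lengths)
    ```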


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
