GitHub user zjffdu commented on the pull request:

    https://github.com/apache/spark/pull/10062#issuecomment-161158196
  
    @davies The inconsistency between slice and startPos exists because SQL
users conventionally use 1-based indexing, while programmers usually use 0-based
indexing. Column#substr (Scala) is already exposed for two usages: one explicit,
as part of the DataFrame API, and one implicit, in SQL. I think Scala
programmers will also be confused to find that substr is currently 1-based.
Besides, slicing is a standard operation for Python users. If we don't support
it, users are forced to call substr directly, and they may also be confused by
its 1-based indexing. I suppose more people use the DataFrame API directly than
use SQL, so we should make the API comfortable for them. So here's my
suggestion:
    
    * Add documentation to substr highlighting that it is 1-based
    * Deprecate substr and replace it with a new function, substring, that is
0-based, so that people using the DataFrame API feel comfortable. SQL users
would then use substr, which is 1-based, while programmers would use substring,
which is 0-based.
    * Use substring to support Python slicing
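The off-by-one between the two conventions, and how a 0-based Python slice could be routed onto a 1-based substr, can be illustrated in plain Python. This is a minimal sketch, not Spark code: `sql_substr` and `slice_to_substr_args` are hypothetical helpers written only for illustration, and the sketch assumes non-negative indices, a given end index, and no step.

```python
from typing import Tuple


def sql_substr(s: str, pos: int, length: int) -> str:
    """Emulate a 1-based, SQL-style SUBSTR(s, pos, length).

    Hypothetical helper: handles only pos >= 1 and length >= 0; real SQL
    engines differ in how they treat pos <= 0 and negative lengths.
    """
    if pos < 1 or length < 0:
        raise ValueError("sketch only handles pos >= 1 and length >= 0")
    start = pos - 1  # convert 1-based position to 0-based index
    return s[start:start + length]


def slice_to_substr_args(sl: slice) -> Tuple[int, int]:
    """Map a 0-based Python slice [start:stop] to 1-based (pos, length).

    Hypothetical helper: rejects steps, open-ended stops and negative indices.
    """
    if sl.step not in (None, 1):
        raise ValueError("slices with a step are not supported")
    if sl.stop is None:
        raise ValueError("open-ended slices would need the string length")
    start = 0 if sl.start is None else sl.start
    if start < 0 or sl.stop < 0:
        raise ValueError("negative indices are not handled in this sketch")
    # Shift the start by one and turn the end index into a length.
    return start + 1, max(sl.stop - start, 0)


word = "Spark"
pos, length = slice_to_substr_args(slice(1, 3))
print(pos, length)                    # 2 2
print(word[1:3])                      # pa
print(sql_substr(word, pos, length))  # pa
```

The translation itself is only a shifted start position plus converting an end index into a length; negative indices and open-ended slices are the cases this sketch deliberately leaves out.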
    
    Anyway, I have to admit there's no perfect solution for now. If necessary, I
can start a thread on the Spark user mailing list to get more feedback from
users.
