GitHub user feynmanliang commented on the pull request:

    https://github.com/apache/spark/pull/6742#issuecomment-118166449
  
    1. Maybe I should use OpenHashSet. Is it recommended?
    In my opinion, a plain Scala `Set` should suffice (see the sketch below).
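
    For illustration, a minimal sketch of filtering tokens against a plain
    Scala `Set` (the `stopWords` and `tokens` names are illustrative, not the
    PR's actual fields):

    ```scala
    // Membership tests against an immutable Set are O(1) on average,
    // which should be fast enough for a stop word list of this size.
    val stopWords: Set[String] = Set("a", "an", "the", "of")
    val tokens: Seq[String] = Seq("the", "quick", "brown", "fox")
    val filtered = tokens.filterNot(stopWords.contains)
    // filtered == Seq("quick", "brown", "fox")
    ```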
    
    2. Currently I leave nulls in the input array untouched, i.e.
    `Array(null, null) => Array(null, null)` (sketch below).
    I imagine this will be used downstream of `Tokenizer`, in which case you
    should not need to worry about `null`s. Perhaps document the behavior in
    the ScalaDoc.
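
    A minimal sketch of that pass-through behavior (illustrative names, not
    the PR's actual code):

    ```scala
    val stopWords: Set[String] = Set("the")
    val input: Array[String] = Array(null, "the", "fox", null)
    // nulls are passed through untouched; only non-null stop words are dropped
    val output = input.filter(t => t == null || !stopWords.contains(t))
    // output: Array(null, "fox", null)
    ```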
    
    3. If the current stop word set looks too limited, any suggestions for a
    replacement? We could use something similar to the list in scikit-learn.
    See the in-line comments (and the sketch below).
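
    For illustration, one way to ship a larger list would be to bundle it as
    a resource file and load it once at initialization; the resource path
    below is hypothetical:

    ```scala
    import scala.io.Source

    // Hypothetical resource path; the actual file name/location is TBD.
    val stopWords: Set[String] =
      Source.fromInputStream(
          getClass.getResourceAsStream("/stopwords/english.txt"))
        .getLines()
        .map(_.trim)
        .filter(_.nonEmpty)
        .toSet
    ```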

