GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10272#issuecomment-199535165
@hhbyyh Joseph's comment was about carefully introducing new dependencies.
If we pick a library that doesn't depend on many others, it should be safe for
us. Several packages contain a Porter stemmer, e.g., Lucene, CoreNLP, and
chalk. Lucene is lightweight, but many other systems depend on it, so adding
it risks version conflicts and it is not a safe choice. CoreNLP is licensed
under the GPL, so it is not an option here. Judging by its dependencies,
chalk seems okay to me.
I'm a little worried about the cost of maintaining our own implementation
in MLlib: we would not benefit from improvements made in other NLP projects
(where the experts are). So could you take a look at chalk?
@jasonbaldridge To add chalk as a dependency, we need chalk releases for
both Scala 2.10 and 2.11, but I only see 2.10 releases on Maven Central. Do
you plan to publish new releases for both 2.10 and 2.11?