GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/6330
[SPARK-7794] [MLLIB] update RegexTokenizer default settings
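The previous default is `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`,
which matches the tokens themselves: runs of letters, or runs of characters
that are neither letters nor whitespace. That pattern is hard to understand.
This PR changes the default to `{gaps: true, pattern: "\\s+"}`, which treats
the pattern as a delimiter and splits on whitespace. @jkbradley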
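To illustrate how the two defaults differ, here is a minimal Python sketch. It is not the Spark implementation: `regex_tokenize` is a hypothetical helper, and because Python's `re` module does not support `\p{L}`, the old pattern is approximated with `[^\W\d_]` as a stand-in for "a letter". The sketch also assumes empty strings are dropped from the result.

```python
import re

# Approximation of "\\p{L}+|[^\\p{L}\\s]+" for Python's `re`, which lacks \p{L}.
# [^\W\d_] stands in for "a letter"; (?:[^\w\s]|[\d_]) stands in for
# "neither a letter nor whitespace". This is an assumption of the sketch.
OLD_PATTERN = r"[^\W\d_]+|(?:[^\w\s]|[\d_])+"

# The new default pattern carries over unchanged.
NEW_PATTERN = r"\s+"


def regex_tokenize(text, pattern, gaps):
    """Hypothetical helper mirroring the meaning of the `gaps` flag.

    gaps=True:  the pattern matches the separators, so split on it.
    gaps=False: the pattern matches the tokens, so collect every match.
    """
    if gaps:
        tokens = re.split(pattern, text)
    else:
        tokens = re.findall(pattern, text)
    # Drop empty strings, e.g. those produced by leading whitespace when splitting.
    return [t for t in tokens if t]


text = "Hello, world! It's 2015."

# Old default: punctuation and digits become separate tokens.
print(regex_tokenize(text, OLD_PATTERN, gaps=False))
# ['Hello', ',', 'world', '!', 'It', "'", 's', '2015.']

# New default: plain whitespace splitting, punctuation stays attached.
print(regex_tokenize(text, NEW_PATTERN, gaps=True))
# ['Hello,', 'world!', "It's", '2015.']
```

With the new default, punctuation stays attached to neighboring words, so callers who want it separated would need to set the pattern and `gaps` explicitly.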
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-7794
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6330.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6330
----
commit 5ee7cdea2a759dd2fdc35ea4e5552a46a731862b
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-21T20:38:13Z
update RegexTokenizer default settings
----