GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/6330

    [SPARK-7794] [MLLIB] update RegexTokenizer default settings

    The previous default was `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`, 
whose pattern is hard to understand. This PR changes the default to 
`{gaps: true, pattern: "\\s+"}`, which splits the input on whitespace. @jkbradley
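
To illustrate the difference between the two settings, here is a minimal sketch in plain Java using `java.util.regex`. It is not Spark's implementation: the class `TokenizerDefaults` and the helpers `splitOnGaps` and `matchTokens` are hypothetical names chosen for this example, and dropping empty tokens is an assumption about the desired behavior. With `gaps: true` the pattern describes the separators, and with `gaps: false` it describes the tokens themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not part of Spark).
class TokenizerDefaults {

    // gaps = true: the pattern matches the separators, so split on it.
    // Empty strings (e.g. from leading whitespace) are dropped.
    static List<String> splitOnGaps(String pattern, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : Pattern.compile(pattern).split(text)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // gaps = false: the pattern matches the tokens, so collect every match.
    static List<String> matchTokens(String pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world! It's 2015.";
        // Old default: runs of letters, or runs of non-letter, non-space characters.
        for (String t : matchTokens("\\p{L}+|[^\\p{L}\\s]+", text)) {
            System.out.println("old: <" + t + ">");
        }
        // New default: split on runs of whitespace.
        for (String t : splitOnGaps("\\s+", text)) {
            System.out.println("new: <" + t + ">");
        }
    }
}
```

On the sample sentence, the old pattern separates punctuation from words (so `It's` becomes three tokens), while the new default keeps each whitespace-delimited chunk intact.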

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-7794

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6330
    
----
commit 5ee7cdea2a759dd2fdc35ea4e5552a46a731862b
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-21T20:38:13Z

    update RegexTokenizer default settings

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
