GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/12843

    [SPARK-14050] [ML] Add multiple languages support and additional methods for Stop Words Remover

    ## What changes were proposed in this pull request?
    
    This PR continues the work from #11871 with the following changes:
    * load English stopwords as default
    * convert stopwords to a list in Python
    * update some tests and docs
    
    ## How was this patch tested?
    
    Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-14050

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12843.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12843
    
----
commit c126c87818eb06aa5c2ac23b362d504f342c72b0
Author: Burak Köse <[email protected]>
Date:   2016-03-14T22:22:02Z

    add language files

commit 8248579ec27a40de98fe1f3020d947c478981ebc
Author: Burak Köse <[email protected]>
Date:   2016-03-14T22:23:32Z

    add multi-language support for stop words

commit 2c7b73df14d2d292eff88d7f3c358d29f82f6122
Author: Burak Köse <[email protected]>
Date:   2016-03-14T22:24:41Z

    add new tests for StopWordsRemover

commit 43e5cf54d4f9583f8b90291b3c7603ac4e7fab2a
Author: Burak Köse <[email protected]>
Date:   2016-03-21T23:41:47Z

    adjust resource files

commit a43039223a28b308ae1c14d33be5e5a1df382ed6
Author: Burak Köse <[email protected]>
Date:   2016-03-21T23:43:15Z

    adjust resource files

commit 28ee249f676971371d11d16c2912bbf81e045269
Author: Burak Köse <[email protected]>
Date:   2016-03-21T23:46:42Z

    fix stopwords bug

commit 6d215b31a205c4a79e8cc0ef6963d239941e80ff
Author: Burak Köse <[email protected]>
Date:   2016-03-21T23:53:06Z

    update comment lines

commit 6deceecf88c66b3293698aca5d7306c2aa02e2e0
Author: Burak Köse <[email protected]>
Date:   2016-03-22T16:24:38Z

    update stop words list

commit 41cd25815af3baa8fe9ed9336812f436d7ed7bd5
Author: Burak Köse <[email protected]>
Date:   2016-03-22T16:25:36Z

    update stopwordsremover

commit 4d1812aae64b0b15312940b1a6c42e19f9686480
Author: Burak KOSE <[email protected]>
Date:   2016-03-22T17:35:37Z

    fix test case bug
    
    After updating English stop words list, "d" is a stop word.

commit a30862231c3944c55c96cc94e162f61614aee6d5
Author: Burak Köse <[email protected]>
Date:   2016-03-22T21:45:48Z

    fix encoding

commit 2e7c54e5c17e7c5672a43ffc28acb207e94bf28a
Author: Burak Köse <[email protected]>
Date:   2016-03-23T01:42:36Z

    fix pyspark test

commit 7efda40e39663deef0b0884a7bfca13b5d10d706
Author: Burak Köse <[email protected]>
Date:   2016-03-23T16:51:48Z

    add licence for stop words list

commit a066e8b34ec4824fa26a1e306e197b66400f5ccb
Author: Burak Köse <[email protected]>
Date:   2016-03-24T17:12:20Z

    change licence to license

commit d0f43ace892332dfb3ad25d0ef1d0c0451540e5c
Author: Burak Köse <[email protected]>
Date:   2016-03-25T16:23:37Z

    add readme for stopwords list

commit c017ee235287554e28281d1691d0188e358b7ad8
Author: Burak Köse <[email protected]>
Date:   2016-03-25T16:26:23Z

    merge StopWords into StopWordsRemover

commit 55191ce1f449bed55884a4481071b0fc5ee776a9
Author: Burak Köse <[email protected]>
Date:   2016-03-25T16:27:59Z

    add python stopwords support for language selection

commit 789342f2d26759db180868a9f59b02c8f85cc835
Author: Burak Köse <[email protected]>
Date:   2016-03-25T16:28:48Z

    add new tests for stopwords

commit 4f97c8d5a088595a23f7ec848c793d05fc052d79
Author: Xiangrui Meng <[email protected]>
Date:   2016-05-02T15:26:29Z

    Merge remote-tracking branch 'apache/master' into SPARK-14050

commit 713d4d5e81b2194efa640ec46fa16c56049c00f5
Author: Xiangrui Meng <[email protected]>
Date:   2016-05-02T15:51:31Z

    minor updates

commit 1bd69af46f43d25518f6c5e01e2ee7fc5c279a03
Author: Xiangrui Meng <[email protected]>
Date:   2016-05-02T16:05:52Z

    fix python tests and add a TODO

----


