GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/11890
[SPARK-13449][MLLIB][R] Naive Bayes wrapper in SparkR
## What changes were proposed in this pull request?
This PR continues the work in #11486 from @yinxusen with some code
refactoring. In the R package e1071, `naiveBayes` supports both categorical
(Bernoulli) and continuous (Gaussian) features, while MLlib supports
Bernoulli and multinomial. This PR implements the common subset: Bernoulli.
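For readers unfamiliar with the Bernoulli variant, here is a minimal, self-contained sketch of what such a model computes on 0/1 features. It is an illustration only, not the code in this PR and not MLlib's or e1071's implementation: the function names `fit_bernoulli_nb` and `predict` are hypothetical, and the smoothing choices (additive smoothing on the conditionals, unsmoothed class priors) are assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_bernoulli_nb(rows, labels, smoothing=1.0):
    """Fit a Bernoulli naive Bayes model on 0/1 feature vectors.

    Hypothetical helper for illustration; not part of SparkR or MLlib.
    """
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(rows, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            feature_counts[y][j] += v  # counts how often feature j is 1 in class y

    total = len(rows)
    # Assumption: class priors are plain relative frequencies (no smoothing).
    priors = {c: n / total for c, n in class_counts.items()}
    # P(feature j = 1 | class c), smoothed over the two outcomes 0 and 1.
    cond = {
        c: [
            (feature_counts[c][j] + smoothing) / (class_counts[c] + 2 * smoothing)
            for j in range(n_features)
        ]
        for c in class_counts
    }
    return priors, cond


def predict(model, x):
    """Return the class with the highest log-posterior for a 0/1 vector x."""
    priors, cond = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for j, v in enumerate(x):
            p = cond[c][j]
            # A Bernoulli model also scores the absence of a feature.
            score += math.log(p if v == 1 else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
labels = ["yes", "yes", "no", "no"]
model = fit_bernoulli_nb(rows, labels)
print(predict(model, [1, 0]))  # prints "yes"
print(predict(model, [0, 1]))  # prints "no"
```

The point of the sketch is that absent features contribute `log(1 - p)` to the score, which is what distinguishes the Bernoulli model from the multinomial one.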
I moved the implementation out of SparkRWrappers into NaiveBayesWrapper to
make it easier to read. Argument names, default values, and summary now match
e1071's naiveBayes.
I removed the preprocessing step that omits NA values because we don't know
which columns to process.
## How was this patch tested?
Tested against the output of `naiveBayes` from the R package e1071.
cc: @jkbradley @yinxusen
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-13449
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11890.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11890
----
commit fb1bca43fc13fa5509539e4a6d4fe20cd26d1dd5
Author: Xusen Yin <[email protected]>
Date: 2016-03-01T05:15:21Z
runable draft
commit 787f25f5a84330632dff5c8d9fd8d7c0de02de8c
Author: Xusen Yin <[email protected]>
Date: 2016-03-01T06:50:17Z
refine test and na handler
commit b66d3e5ef0803dad949a53d4210a455856c8a400
Author: Xusen Yin <[email protected]>
Date: 2016-03-01T17:55:36Z
refine getModelName
commit a5ab2e678c660fbee957cbb20dced0b5f5a4a256
Author: Xusen Yin <[email protected]>
Date: 2016-03-01T22:26:39Z
remove default interface
commit 9215fafd3295f488ba6d827b49eddb91c3032438
Author: Xusen Yin <[email protected]>
Date: 2016-03-01T23:42:51Z
refine code
commit 388e85dbf41faeea74f5aaa084664d9d52cce184
Author: Xusen Yin <[email protected]>
Date: 2016-03-03T05:31:32Z
add summary for NaiveBayes
commit 26d38e1baa0574221fc8cca104dfeeb1e057755f
Author: Xusen Yin <[email protected]>
Date: 2016-03-03T06:06:49Z
refine
commit a07beb2a26a4650b12b2fa72a8b802125b6b5560
Author: Xusen Yin <[email protected]>
Date: 2016-03-03T07:19:47Z
fix bugs
commit afaba4a22b40bafe5f9fb5c2796f7a72deff8a61
Author: Xusen Yin <[email protected]>
Date: 2016-03-07T17:58:39Z
revert NaiveBayes labels
commit 1a685e1d345f53ae9f7cfb270f110052df818f4c
Author: Xusen Yin <[email protected]>
Date: 2016-03-07T18:22:39Z
refine extracing labels
commit 30e9c372207ed206a7dc294b5726ad008a18ed12
Author: Xusen Yin <[email protected]>
Date: 2016-03-09T04:37:37Z
fix error
commit 390f8e62ed1eccaf22b5d4da1123a6f98080e4ba
Author: Xusen Yin <[email protected]>
Date: 2016-03-09T06:04:42Z
fix typos
commit dbaf4e622dd20d646e0cc26d5df1ba3aec02f827
Author: Xusen Yin <[email protected]>
Date: 2016-03-09T19:53:20Z
resolve dependency issue
commit 9991e7993d425acf54471ddf4380d4c106138501
Author: Xusen Yin <[email protected]>
Date: 2016-03-13T03:11:11Z
fix nit
commit 6c97cefdba5686704d31555ee71423d4afb888f4
Author: Xusen Yin <[email protected]>
Date: 2016-03-16T23:29:52Z
fix nits
commit 721a8b75abcff2970b4f74817e754dcff047c810
Author: Xusen Yin <[email protected]>
Date: 2016-03-17T00:19:06Z
remove NaiveBayesModelSummary
commit 8e2139379313f2b7094e750fba816e5a701a413a
Author: Xusen Yin <[email protected]>
Date: 2016-03-17T02:22:48Z
add raw label prediction
commit 90b6ad9ebd91d8cdfe9680c9c89355eaf3936b12
Author: Xusen Yin <[email protected]>
Date: 2016-03-17T02:42:25Z
fix r style
commit b4ee1aab70008919ba17cf02c8470f1a75c23ef8
Author: Xusen Yin <[email protected]>
Date: 2016-03-19T22:29:50Z
merge with master
commit 87fa0aa25f897ffef755557d2a9320eda86e74ed
Author: Xusen Yin <[email protected]>
Date: 2016-03-20T08:06:34Z
add IndexToString to extract labels
commit 3d291de561bc9155e32a0c286309e8b7ddde48c4
Author: Xusen Yin <[email protected]>
Date: 2016-03-20T08:22:15Z
remove useless imports
commit 49f36f304fd92130d55509ac0309f5f7d74d0e5c
Author: Xiangrui Meng <[email protected]>
Date: 2016-03-22T06:05:14Z
refactor with NaiveBayesWrapper
commit ce77e8811c008f90de41881348ae722df601ecb6
Author: Xiangrui Meng <[email protected]>
Date: 2016-03-22T15:55:41Z
fix tests
----