Hi,
I'm new to Mahout, and I'm trying to use it to make predictions on a
series of log files. I'm running it on a Windows Azure HDInsight cluster
(Hadoop-based). I'm using Mahout 0.5 because that is the version I could get
to work with the samples (I'm fine with upgrading to 0.8 if I can get the
samples to work).
I'm following the same idea as the spam classification example found
here<http://searchhub.org/2011/05/04/an-introductory-how-to-build-a-spam-filter-server-with-mahout/>
using Naïve Bayes (which I can make work without problems), but when I try to
use my own data (which is obviously not emails), I end up with a prediction
model that characterizes everything as unknown. I can see that the computed
normalizing factors are NaN:
13/08/22 12:13:57 INFO bayes.BayesDriver: Calculating the weight Normalisation
factor for each class...
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: {spam=NaN, ham=NaN}
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j for
each Label and for each Features
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: NaN
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: 182316.0
But I'm not sure what that means or why it happens. Could this be related to
my input documents? The spam filter is based on emails of roughly a couple of
KB each, whereas my input is a series of log files of roughly a couple of MB
each. Also, the training is done on a small dataset of only 100-120 samples
(I'm working on gathering more data to run on a larger sample).
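
To illustrate why I suspect the input: as far as I understand it, a normalizing factor like Sigma_k is a sum over per-feature weights, so a single non-finite weight would be enough to turn the whole sum into NaN. The sketch below is plain Python, not Mahout code; `sigma_k`, `first_non_finite`, and the sample weights are hypothetical names and values I made up for the illustration.

```python
import math

def sigma_k(weights):
    """Sum the per-feature weights of one label.

    Illustrative stand-in only; this is not how Mahout is implemented.
    """
    return sum(weights)

def first_non_finite(weights):
    """Return the index of the first NaN or infinite weight, or None."""
    for index, weight in enumerate(weights):
        if not math.isfinite(weight):
            return index
    return None

# Made-up weights: one bad value among otherwise normal ones.
weights = [0.5, 1.25, float("nan"), 2.0]

total = sigma_k(weights)
print(math.isnan(total))          # the single NaN poisons the whole sum
print(first_non_finite(weights))  # position of the offending weight
```

If something like this is what happens, the thing to look for would be the first feature whose weight is not finite, rather than the final sums.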
Attached is the script I use to train and test the model as well as the output
from executing the script on the cluster.
Any help is appreciated!
-Simon Ejsing
unzip: cannot find any matches for wildcard specification "*.zip".
No zipfiles found.
Could Not Find
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\train\ham\*.zip
unzip: cannot find any matches for wildcard specification "*.zip".
No zipfiles found.
Could Not Find
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\train\spam\*.zip
unzip: cannot find any matches for wildcard specification "*.zip".
No zipfiles found.
Could Not Find
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\test\ham\*.zip
unzip: cannot find any matches for wildcard specification "*.zip".
No zipfiles found.
Could Not Find
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\test\spam\*.zip
rmr: cannot remove prepared-train: No such file or directory.
13/08/22 12:08:19 WARN driver.MahoutDriver: No prepare20newsgroups.props found
on classpath, will use command-line arguments only
13/08/22 12:08:49 INFO driver.MahoutDriver: Program took 30591 ms
13/08/22 12:08:58 WARN driver.MahoutDriver: No prepare20newsgroups.props found
on classpath, will use command-line arguments only
13/08/22 12:09:04 INFO driver.MahoutDriver: Program took 6773 ms
13/08/22 12:09:31 WARN driver.MahoutDriver: No trainclassifier.props found on
classpath, will use command-line arguments only
13/08/22 12:09:31 INFO bayes.TrainClassifier: Training Bayes Classifier
13/08/22 12:09:32 INFO bayes.BayesDriver: Reading features...
13/08/22 12:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.
13/08/22 12:09:33 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/22 12:09:33 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/22 12:09:33 INFO mapred.FileInputFormat: Total input paths to process : 2
13/08/22 12:09:33 INFO mapred.JobClient: Running job: job_201308211228_0020
13/08/22 12:09:34 INFO mapred.JobClient: map 0% reduce 0%
13/08/22 12:10:04 INFO mapred.JobClient: map 18% reduce 0%
13/08/22 12:10:08 INFO mapred.JobClient: map 24% reduce 0%
13/08/22 12:10:11 INFO mapred.JobClient: map 35% reduce 0%
13/08/22 12:10:20 INFO mapred.JobClient: map 36% reduce 0%
13/08/22 12:10:24 INFO mapred.JobClient: map 52% reduce 0%
13/08/22 12:10:27 INFO mapred.JobClient: map 66% reduce 0%
13/08/22 12:10:36 INFO mapred.JobClient: map 68% reduce 0%
13/08/22 12:10:39 INFO mapred.JobClient: map 78% reduce 0%
13/08/22 12:10:43 INFO mapred.JobClient: map 79% reduce 0%
13/08/22 12:10:49 INFO mapred.JobClient: map 79% reduce 11%
13/08/22 12:10:52 INFO mapred.JobClient: map 87% reduce 11%
13/08/22 12:10:55 INFO mapred.JobClient: map 90% reduce 11%
13/08/22 12:11:04 INFO mapred.JobClient: map 91% reduce 11%
13/08/22 12:11:07 INFO mapred.JobClient: map 93% reduce 11%
13/08/22 12:11:10 INFO mapred.JobClient: map 95% reduce 11%
13/08/22 12:11:20 INFO mapred.JobClient: map 97% reduce 22%
13/08/22 12:11:23 INFO mapred.JobClient: map 99% reduce 22%
13/08/22 12:11:35 INFO mapred.JobClient: map 100% reduce 22%
13/08/22 12:11:50 INFO mapred.JobClient: map 100% reduce 68%
13/08/22 12:11:53 INFO mapred.JobClient: map 100% reduce 87%
13/08/22 12:11:59 INFO mapred.JobClient: map 100% reduce 100%
13/08/22 12:12:02 INFO mapred.JobClient: Job complete: job_201308211228_0020
13/08/22 12:12:02 INFO mapred.JobClient: Counters: 31
13/08/22 12:12:02 INFO mapred.JobClient: Job Counters
13/08/22 12:12:02 INFO mapred.JobClient: Launched reduce tasks=1
13/08/22 12:12:02 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=216449
13/08/22 12:12:02 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
13/08/22 12:12:02 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
13/08/22 12:12:02 INFO mapred.JobClient: Rack-local map tasks=3
13/08/22 12:12:02 INFO mapred.JobClient: Launched map tasks=4
13/08/22 12:12:02 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=78124
13/08/22 12:12:02 INFO mapred.JobClient: File Input Format Counters
13/08/22 12:12:02 INFO mapred.JobClient: Bytes Read=449768986
13/08/22 12:12:02 INFO mapred.JobClient: File Output Format Counters
13/08/22 12:12:02 INFO mapred.JobClient: Bytes Written=23183044
13/08/22 12:12:02 INFO mapred.JobClient: FileSystemCounters
13/08/22 12:12:02 INFO mapred.JobClient: FILE_BYTES_READ=77463058
13/08/22 12:12:02 INFO mapred.JobClient: HDFS_BYTES_READ=418
13/08/22 12:12:02 INFO mapred.JobClient: ASV_BYTES_READ=449768986
13/08/22 12:12:02 INFO mapred.JobClient: ASV_BYTES_WRITTEN=23183044
13/08/22 12:12:02 INFO mapred.JobClient: FILE_BYTES_WRITTEN=106615423
13/08/22 12:12:02 INFO mapred.JobClient: Map-Reduce Framework
13/08/22 12:12:02 INFO mapred.JobClient: Map output materialized
bytes=29047784
13/08/22 12:12:02 INFO mapred.JobClient: Map input records=105
13/08/22 12:12:02 INFO mapred.JobClient: Reduce shuffle bytes=29047784
13/08/22 12:12:02 INFO mapred.JobClient: Spilled Records=3108573
13/08/22 12:12:02 INFO mapred.JobClient: Map output bytes=109598321
13/08/22 12:12:02 INFO mapred.JobClient: Total committed heap usage
(bytes)=3516727296
13/08/22 12:12:02 INFO mapred.JobClient: CPU time spent (ms)=227980
13/08/22 12:12:02 INFO mapred.JobClient: Map input bytes=446815282
13/08/22 12:12:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=418
13/08/22 12:12:02 INFO mapred.JobClient: Combine input records=4516952
13/08/22 12:12:02 INFO mapred.JobClient: Reduce input records=871311
13/08/22 12:12:02 INFO mapred.JobClient: Reduce input groups=762742
13/08/22 12:12:02 INFO mapred.JobClient: Combine output records=2237262
13/08/22 12:12:02 INFO mapred.JobClient: Physical memory (bytes)
snapshot=3286016000
13/08/22 12:12:02 INFO mapred.JobClient: Reduce output records=580426
13/08/22 12:12:02 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=4132474880
13/08/22 12:12:02 INFO mapred.JobClient: Map output records=3151001
13/08/22 12:12:02 INFO bayes.BayesDriver: Calculating Tf-Idf...
13/08/22 12:12:02 INFO common.BayesTfIdfDriver: Counts of documents in Each
Label
13/08/22 12:12:02 INFO common.BayesTfIdfDriver: {spam=12.0, ham=93.0}
13/08/22 12:12:02 INFO common.BayesTfIdfDriver: {dataSource=hdfs, alpha_i=1.0,
minDf=1, gramSize=1}
13/08/22 12:12:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.
13/08/22 12:12:02 INFO mapred.FileInputFormat: Total input paths to process : 3
13/08/22 12:12:02 INFO mapred.JobClient: Running job: job_201308211228_0021
13/08/22 12:12:03 INFO mapred.JobClient: map 0% reduce 0%
13/08/22 12:12:29 INFO mapred.JobClient: map 33% reduce 0%
13/08/22 12:12:36 INFO mapred.JobClient: map 66% reduce 0%
13/08/22 12:12:38 INFO mapred.JobClient: map 100% reduce 0%
13/08/22 12:12:45 INFO mapred.JobClient: map 100% reduce 22%
13/08/22 12:13:04 INFO mapred.JobClient: map 100% reduce 100%
13/08/22 12:13:06 INFO mapred.JobClient: Job complete: job_201308211228_0021
13/08/22 12:13:06 INFO mapred.JobClient: Counters: 31
13/08/22 12:13:06 INFO mapred.JobClient: Job Counters
13/08/22 12:13:06 INFO mapred.JobClient: Launched reduce tasks=1
13/08/22 12:13:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=51263
13/08/22 12:13:06 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
13/08/22 12:13:06 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
13/08/22 12:13:06 INFO mapred.JobClient: Rack-local map tasks=1
13/08/22 12:13:06 INFO mapred.JobClient: Launched map tasks=3
13/08/22 12:13:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24999
13/08/22 12:13:06 INFO mapred.JobClient: File Input Format Counters
13/08/22 12:13:06 INFO mapred.JobClient: Bytes Read=23182886
13/08/22 12:13:06 INFO mapred.JobClient: File Output Format Counters
13/08/22 12:13:06 INFO mapred.JobClient: Bytes Written=8248604
13/08/22 12:13:06 INFO mapred.JobClient: FileSystemCounters
13/08/22 12:13:06 INFO mapred.JobClient: FILE_BYTES_READ=13947865
13/08/22 12:13:06 INFO mapred.JobClient: HDFS_BYTES_READ=476
13/08/22 12:13:06 INFO mapred.JobClient: ASV_BYTES_READ=23182886
13/08/22 12:13:06 INFO mapred.JobClient: ASV_BYTES_WRITTEN=8248604
13/08/22 12:13:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=27999743
13/08/22 12:13:06 INFO mapred.JobClient: Map-Reduce Framework
13/08/22 12:13:06 INFO mapred.JobClient: Map output materialized
bytes=13946357
13/08/22 12:13:06 INFO mapred.JobClient: Map input records=580424
13/08/22 12:13:06 INFO mapred.JobClient: Reduce shuffle bytes=13946357
13/08/22 12:13:06 INFO mapred.JobClient: Spilled Records=796218
13/08/22 12:13:06 INFO mapred.JobClient: Map output bytes=16249470
13/08/22 12:13:06 INFO mapred.JobClient: Total committed heap usage
(bytes)=2715418624
13/08/22 12:13:06 INFO mapred.JobClient: CPU time spent (ms)=54867
13/08/22 12:13:06 INFO mapred.JobClient: Map input bytes=23182589
13/08/22 12:13:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=476
13/08/22 12:13:06 INFO mapred.JobClient: Combine input records=580424
13/08/22 12:13:06 INFO mapred.JobClient: Reduce input records=398109
13/08/22 12:13:06 INFO mapred.JobClient: Reduce input groups=199055
13/08/22 12:13:06 INFO mapred.JobClient: Combine output records=398109
13/08/22 12:13:06 INFO mapred.JobClient: Physical memory (bytes)
snapshot=1899667456
13/08/22 12:13:06 INFO mapred.JobClient: Reduce output records=199055
13/08/22 12:13:06 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=3492704256
13/08/22 12:13:06 INFO mapred.JobClient: Map output records=580424
13/08/22 12:13:06 INFO bayes.BayesDriver: Calculating weight sums for labels
and features...
13/08/22 12:13:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.
13/08/22 12:13:06 INFO mapred.FileInputFormat: Total input paths to process : 1
13/08/22 12:13:06 INFO mapred.JobClient: Running job: job_201308211228_0022
13/08/22 12:13:07 INFO mapred.JobClient: map 0% reduce 0%
13/08/22 12:13:34 INFO mapred.JobClient: map 100% reduce 0%
13/08/22 12:13:46 INFO mapred.JobClient: map 100% reduce 33%
13/08/22 12:13:49 INFO mapred.JobClient: map 100% reduce 91%
13/08/22 12:13:55 INFO mapred.JobClient: map 100% reduce 100%
13/08/22 12:13:57 INFO mapred.JobClient: Job complete: job_201308211228_0022
13/08/22 12:13:57 INFO mapred.JobClient: Counters: 31
13/08/22 12:13:57 INFO mapred.JobClient: Job Counters
13/08/22 12:13:57 INFO mapred.JobClient: Launched reduce tasks=1
13/08/22 12:13:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=26983
13/08/22 12:13:57 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
13/08/22 12:13:57 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
13/08/22 12:13:57 INFO mapred.JobClient: Rack-local map tasks=2
13/08/22 12:13:57 INFO mapred.JobClient: Launched map tasks=2
13/08/22 12:13:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=16187
13/08/22 12:13:57 INFO mapred.JobClient: File Input Format Counters
13/08/22 12:13:57 INFO mapred.JobClient: Bytes Read=8449612
13/08/22 12:13:57 INFO mapred.JobClient: File Output Format Counters
13/08/22 12:13:57 INFO mapred.JobClient: Bytes Written=6686249
13/08/22 12:13:57 INFO mapred.JobClient: FileSystemCounters
13/08/22 12:13:57 INFO mapred.JobClient: FILE_BYTES_READ=11661732
13/08/22 12:13:57 INFO mapred.JobClient: HDFS_BYTES_READ=334
13/08/22 12:13:57 INFO mapred.JobClient: ASV_BYTES_READ=8449612
13/08/22 12:13:57 INFO mapred.JobClient: ASV_BYTES_WRITTEN=6686249
13/08/22 12:13:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=17570077
13/08/22 12:13:57 INFO mapred.JobClient: Map-Reduce Framework
13/08/22 12:13:57 INFO mapred.JobClient: Map output materialized
bytes=5830559
13/08/22 12:13:57 INFO mapred.JobClient: Map input records=199054
13/08/22 12:13:57 INFO mapred.JobClient: Reduce shuffle bytes=5830559
13/08/22 12:13:57 INFO mapred.JobClient: Spilled Records=576780
13/08/22 12:13:57 INFO mapred.JobClient: Map output bytes=13541939
13/08/22 12:13:57 INFO mapred.JobClient: Total committed heap usage
(bytes)=2301886464
13/08/22 12:13:57 INFO mapred.JobClient: CPU time spent (ms)=28294
13/08/22 12:13:57 INFO mapred.JobClient: Map input bytes=8248381
13/08/22 12:13:57 INFO mapred.JobClient: SPLIT_RAW_BYTES=334
13/08/22 12:13:57 INFO mapred.JobClient: Combine input records=597162
13/08/22 12:13:57 INFO mapred.JobClient: Reduce input records=192260
13/08/22 12:13:57 INFO mapred.JobClient: Reduce input groups=182319
13/08/22 12:13:57 INFO mapred.JobClient: Combine output records=192260
13/08/22 12:13:57 INFO mapred.JobClient: Physical memory (bytes)
snapshot=1415127040
13/08/22 12:13:57 INFO mapred.JobClient: Reduce output records=182319
13/08/22 12:13:57 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=2786963456
13/08/22 12:13:57 INFO mapred.JobClient: Map output records=597162
13/08/22 12:13:57 INFO bayes.BayesDriver: Calculating the weight Normalisation
factor for each class...
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: {spam=NaN, ham=NaN}
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j for
each Label and for each Features
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: NaN
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: 182316.0
13/08/22 12:13:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.
13/08/22 12:13:58 INFO mapred.FileInputFormat: Total input paths to process : 1
13/08/22 12:13:58 INFO mapred.JobClient: Running job: job_201308211228_0023
13/08/22 12:13:59 INFO mapred.JobClient: map 0% reduce 0%
13/08/22 12:14:26 INFO mapred.JobClient: map 50% reduce 0%
13/08/22 12:14:35 INFO mapred.JobClient: map 50% reduce 16%
13/08/22 12:14:37 INFO mapred.JobClient: map 100% reduce 16%
13/08/22 12:14:50 INFO mapred.JobClient: map 100% reduce 100%
13/08/22 12:14:52 INFO mapred.JobClient: Job complete: job_201308211228_0023
13/08/22 12:14:52 INFO mapred.JobClient: Counters: 30
13/08/22 12:14:52 INFO mapred.JobClient: Job Counters
13/08/22 12:14:52 INFO mapred.JobClient: Launched reduce tasks=1
13/08/22 12:14:52 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=14813
13/08/22 12:14:52 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
13/08/22 12:14:52 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
13/08/22 12:14:52 INFO mapred.JobClient: Launched map tasks=2
13/08/22 12:14:52 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21890
13/08/22 12:14:52 INFO mapred.JobClient: File Input Format Counters
13/08/22 12:14:52 INFO mapred.JobClient: Bytes Read=8449612
13/08/22 12:14:52 INFO mapred.JobClient: File Output Format Counters
13/08/22 12:14:52 INFO mapred.JobClient: Bytes Written=158
13/08/22 12:14:52 INFO mapred.JobClient: FileSystemCounters
13/08/22 12:14:52 INFO mapred.JobClient: FILE_BYTES_READ=532
13/08/22 12:14:52 INFO mapred.JobClient: HDFS_BYTES_READ=334
13/08/22 12:14:52 INFO mapred.JobClient: ASV_BYTES_READ=8449612
13/08/22 12:14:52 INFO mapred.JobClient: ASV_BYTES_WRITTEN=158
13/08/22 12:14:52 INFO mapred.JobClient: FILE_BYTES_WRITTEN=80569
13/08/22 12:14:52 INFO mapred.JobClient: Map-Reduce Framework
13/08/22 12:14:52 INFO mapred.JobClient: Map output materialized bytes=82
13/08/22 12:14:52 INFO mapred.JobClient: Map input records=199054
13/08/22 12:14:52 INFO mapred.JobClient: Reduce shuffle bytes=82
13/08/22 12:14:52 INFO mapred.JobClient: Spilled Records=6
13/08/22 12:14:52 INFO mapred.JobClient: Map output bytes=4242761
13/08/22 12:14:52 INFO mapred.JobClient: Total committed heap usage
(bytes)=1543569408
13/08/22 12:14:52 INFO mapred.JobClient: CPU time spent (ms)=10655
13/08/22 12:14:52 INFO mapred.JobClient: Map input bytes=8248381
13/08/22 12:14:52 INFO mapred.JobClient: SPLIT_RAW_BYTES=334
13/08/22 12:14:52 INFO mapred.JobClient: Combine input records=199054
13/08/22 12:14:52 INFO mapred.JobClient: Reduce input records=3
13/08/22 12:14:52 INFO mapred.JobClient: Reduce input groups=2
13/08/22 12:14:52 INFO mapred.JobClient: Combine output records=3
13/08/22 12:14:52 INFO mapred.JobClient: Physical memory (bytes)
snapshot=842493952
13/08/22 12:14:52 INFO mapred.JobClient: Reduce output records=2
13/08/22 12:14:52 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=2315640832
13/08/22 12:14:52 INFO mapred.JobClient: Map output records=199054
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting bayes-model/trainer-docCount
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting
bayes-model/trainer-termDocCount
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting
bayes-model/trainer-featureCount
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting bayes-model/trainer-wordFreq
13/08/22 12:14:53 INFO common.HadoopUtil: Deleting
bayes-model/trainer-tfIdf/trainer-vocabCount
13/08/22 12:14:53 INFO driver.MahoutDriver: Program took 321794 ms
13/08/22 12:15:02 WARN driver.MahoutDriver: No testclassifier.props found on
classpath, will use command-line arguments only
13/08/22 12:15:02 INFO bayes.TestClassifier: Loading model from:
{basePath=bayes-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs,
gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown,
testDirPath=prepared-test}
13/08/22 12:15:02 INFO bayes.TestClassifier: Testing Bayes Classifier
13/08/22 12:15:04 INFO io.SequenceFileModelReader: Read 50000 feature weights
13/08/22 12:15:04 INFO io.SequenceFileModelReader: Read 100000 feature weights
13/08/22 12:15:04 INFO io.SequenceFileModelReader: Read 150000 feature weights
13/08/22 12:15:05 INFO io.SequenceFileModelReader: NaN
13/08/22 12:15:06 INFO datastore.InMemoryBayesDatastore: spam NaN NaN NaN
13/08/22 12:15:06 INFO datastore.InMemoryBayesDatastore: ham NaN NaN NaN
13/08/22 12:15:19 INFO bayes.TestClassifier: Classified instances from ham.txt
13/08/22 12:15:21 INFO bayes.TestClassifier: Classified instances from spam.txt
13/08/22 12:15:21 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 0 0%
Incorrectly Classified Instances : 23 100%
Total Classified Instances : 23
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c <--Classified as
0 0 3 | 3 a = spam
0 0 20 | 20 b = ham
0 0 0 | 0 c = unknown
Default Category: unknown: 2
13/08/22 12:15:21 INFO driver.MahoutDriver: Program took 19842 ms
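
To narrow down where the NaN first appears in output like the above, a small scan of the log can help. This is a minimal sketch, assuming the `yy/mm/dd hh:mm:ss LEVEL logger: message` layout seen in this output; `first_nan_line` and the sample list are hypothetical, and wrapped continuation lines are simply skipped because they do not start with a timestamp.

```python
import re

# Matches lines such as:
# 13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: {spam=NaN, ham=NaN}
LOG_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+) ([\w.]+): (.*)$"
)

def first_nan_line(lines):
    """Return (line number, logger, message) of the first NaN, or None."""
    for number, line in enumerate(lines, start=1):
        match = LOG_LINE.match(line)
        if match and re.search(r"\bNaN\b", match.group(4)):
            return number, match.group(3), match.group(4)
    return None

# Three lines copied from the output above.
sample = [
    "13/08/22 12:12:02 INFO common.BayesTfIdfDriver: {spam=12.0, ham=93.0}",
    "13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label",
    "13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: {spam=NaN, ham=NaN}",
]

print(first_nan_line(sample))
```

In this run the first match comes from `bayes.BayesThetaNormalizerDriver`, which only says that the normalization step is where the NaN becomes visible in the log, not necessarily where it originates.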