Hi,

I'm new to Mahout, and I'm trying to use it to make predictions on a 
series of log files. I'm running it on a Windows Azure HDInsight cluster 
(Hadoop-based). I'm using Mahout 0.5 because that is the version I could get 
to work with the samples (I'm fine with upgrading to 0.8 if I can get the 
samples to work).

I'm following the same approach as the spam classification example found here:
<http://searchhub.org/2011/05/04/an-introductory-how-to-build-a-spam-filter-server-with-mahout/>
It uses Naïve Bayes, and I can make the example itself work without problems. 
But when I use my own data (which is obviously not emails), I end up with a 
prediction model that classifies everything as unknown. I can see that the 
computed normalizing factors are NaN:

13/08/22 12:13:57 INFO bayes.BayesDriver: Calculating the weight Normalisation factor for each class...
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: {spam=NaN, ham=NaN}
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j for each Label and for each Features
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: NaN
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: 182316.0

I'm not sure what that means or why it happens. Could it be related to my 
input documents? The spam filter example is based on emails of roughly a 
couple of KB each, whereas my inputs are log files of roughly a couple of MB 
each. Also, the training is done on a small dataset of only 100-120 samples 
(I'm working on gathering more data so I can run on a larger sample).
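
To make concrete what a NaN normalizing factor implies, here is a minimal 
Python sketch of generic floating-point behaviour. It is not Mahout code, and 
the weights and the function name sum_weights are made up for illustration. It 
only shows that one non-finite term is enough to turn a whole sum into NaN; 
whether that is what happens inside BayesThetaNormalizerDriver is not something 
the log confirms.

```python
import math

def sum_weights(weights):
    """Plain floating-point sum, standing in for a per-label normalizer."""
    total = 0.0
    for w in weights:
        total += w
    return total

# Hypothetical per-feature weights; the second list contains one NaN.
clean = [0.5, 1.25, 2.0]
poisoned = [0.5, float("nan"), 2.0]

print(sum_weights(clean))                 # finite result
print(math.isnan(sum_weights(poisoned)))  # one NaN term makes the sum NaN

# inf - inf is one common way a NaN appears in the first place.
print(math.isnan(float("inf") - float("inf")))
```

If the sums behave like this, a single bad weight for one feature would be 
enough to produce {spam=NaN, ham=NaN}, regardless of how many other features 
are fine.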

Attached is the script I use to train and test the model as well as the output 
from executing the script on the cluster.
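
As a rough way to narrow down which training stage first produces NaN, the 
attached output can be scanned for the first line that contains it. The helper 
below is a hypothetical sketch (first_nan_line is not part of Mahout or 
Hadoop), and the sample lines are copied from the output in this message.

```python
import re

def first_nan_line(lines):
    """Return (index, line) of the first line containing the token NaN, or None."""
    pattern = re.compile(r"\bNaN\b")
    for i, line in enumerate(lines):
        if pattern.search(line):
            return i, line.rstrip("\n")
    return None

# A few lines taken from the training output.
sample = [
    "13/08/22 12:12:02 INFO common.BayesTfIdfDriver: {spam=12.0, ham=93.0}",
    "13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label",
    "13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: {spam=NaN, ham=NaN}",
]

print(first_nan_line(sample))
```

On the full output, the same scan could be run with 
first_nan_line(open(path)), where path is wherever the log was saved.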

Any help is appreciated!

-Simon Ejsing
unzip:  cannot find any matches for wildcard specification "*.zip".

No zipfiles found.
Could Not Find 
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\train\ham\*.zip
unzip:  cannot find any matches for wildcard specification "*.zip".

No zipfiles found.
Could Not Find 
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\train\spam\*.zip
unzip:  cannot find any matches for wildcard specification "*.zip".

No zipfiles found.
Could Not Find 
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\test\ham\*.zip
unzip:  cannot find any matches for wildcard specification "*.zip".

No zipfiles found.
Could Not Find 
c:\apps\dist\mahout-0.5\examples\bin\work\falserejection\test\spam\*.zip
rmr: cannot remove prepared-train: No such file or directory.
13/08/22 12:08:19 WARN driver.MahoutDriver: No prepare20newsgroups.props found 
on classpath, will use command-line arguments only
13/08/22 12:08:49 INFO driver.MahoutDriver: Program took 30591 ms
13/08/22 12:08:58 WARN driver.MahoutDriver: No prepare20newsgroups.props found 
on classpath, will use command-line arguments only
13/08/22 12:09:04 INFO driver.MahoutDriver: Program took 6773 ms
13/08/22 12:09:31 WARN driver.MahoutDriver: No trainclassifier.props found on 
classpath, will use command-line arguments only
13/08/22 12:09:31 INFO bayes.TrainClassifier: Training Bayes Classifier
13/08/22 12:09:32 INFO bayes.BayesDriver: Reading features...
13/08/22 12:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
13/08/22 12:09:33 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/22 12:09:33 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/22 12:09:33 INFO mapred.FileInputFormat: Total input paths to process : 2
13/08/22 12:09:33 INFO mapred.JobClient: Running job: job_201308211228_0020
13/08/22 12:09:34 INFO mapred.JobClient:  map 0% reduce 0%
13/08/22 12:10:04 INFO mapred.JobClient:  map 18% reduce 0%
13/08/22 12:10:08 INFO mapred.JobClient:  map 24% reduce 0%
13/08/22 12:10:11 INFO mapred.JobClient:  map 35% reduce 0%
13/08/22 12:10:20 INFO mapred.JobClient:  map 36% reduce 0%
13/08/22 12:10:24 INFO mapred.JobClient:  map 52% reduce 0%
13/08/22 12:10:27 INFO mapred.JobClient:  map 66% reduce 0%
13/08/22 12:10:36 INFO mapred.JobClient:  map 68% reduce 0%
13/08/22 12:10:39 INFO mapred.JobClient:  map 78% reduce 0%
13/08/22 12:10:43 INFO mapred.JobClient:  map 79% reduce 0%
13/08/22 12:10:49 INFO mapred.JobClient:  map 79% reduce 11%
13/08/22 12:10:52 INFO mapred.JobClient:  map 87% reduce 11%
13/08/22 12:10:55 INFO mapred.JobClient:  map 90% reduce 11%
13/08/22 12:11:04 INFO mapred.JobClient:  map 91% reduce 11%
13/08/22 12:11:07 INFO mapred.JobClient:  map 93% reduce 11%
13/08/22 12:11:10 INFO mapred.JobClient:  map 95% reduce 11%
13/08/22 12:11:20 INFO mapred.JobClient:  map 97% reduce 22%
13/08/22 12:11:23 INFO mapred.JobClient:  map 99% reduce 22%
13/08/22 12:11:35 INFO mapred.JobClient:  map 100% reduce 22%
13/08/22 12:11:50 INFO mapred.JobClient:  map 100% reduce 68%
13/08/22 12:11:53 INFO mapred.JobClient:  map 100% reduce 87%
13/08/22 12:11:59 INFO mapred.JobClient:  map 100% reduce 100%
13/08/22 12:12:02 INFO mapred.JobClient: Job complete: job_201308211228_0020
13/08/22 12:12:02 INFO mapred.JobClient: Counters: 31
13/08/22 12:12:02 INFO mapred.JobClient:   Job Counters 
13/08/22 12:12:02 INFO mapred.JobClient:     Launched reduce tasks=1
13/08/22 12:12:02 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=216449
13/08/22 12:12:02 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
13/08/22 12:12:02 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
13/08/22 12:12:02 INFO mapred.JobClient:     Rack-local map tasks=3
13/08/22 12:12:02 INFO mapred.JobClient:     Launched map tasks=4
13/08/22 12:12:02 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=78124
13/08/22 12:12:02 INFO mapred.JobClient:   File Input Format Counters 
13/08/22 12:12:02 INFO mapred.JobClient:     Bytes Read=449768986
13/08/22 12:12:02 INFO mapred.JobClient:   File Output Format Counters 
13/08/22 12:12:02 INFO mapred.JobClient:     Bytes Written=23183044
13/08/22 12:12:02 INFO mapred.JobClient:   FileSystemCounters
13/08/22 12:12:02 INFO mapred.JobClient:     FILE_BYTES_READ=77463058
13/08/22 12:12:02 INFO mapred.JobClient:     HDFS_BYTES_READ=418
13/08/22 12:12:02 INFO mapred.JobClient:     ASV_BYTES_READ=449768986
13/08/22 12:12:02 INFO mapred.JobClient:     ASV_BYTES_WRITTEN=23183044
13/08/22 12:12:02 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=106615423
13/08/22 12:12:02 INFO mapred.JobClient:   Map-Reduce Framework
13/08/22 12:12:02 INFO mapred.JobClient:     Map output materialized 
bytes=29047784
13/08/22 12:12:02 INFO mapred.JobClient:     Map input records=105
13/08/22 12:12:02 INFO mapred.JobClient:     Reduce shuffle bytes=29047784
13/08/22 12:12:02 INFO mapred.JobClient:     Spilled Records=3108573
13/08/22 12:12:02 INFO mapred.JobClient:     Map output bytes=109598321
13/08/22 12:12:02 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=3516727296
13/08/22 12:12:02 INFO mapred.JobClient:     CPU time spent (ms)=227980
13/08/22 12:12:02 INFO mapred.JobClient:     Map input bytes=446815282
13/08/22 12:12:02 INFO mapred.JobClient:     SPLIT_RAW_BYTES=418
13/08/22 12:12:02 INFO mapred.JobClient:     Combine input records=4516952
13/08/22 12:12:02 INFO mapred.JobClient:     Reduce input records=871311
13/08/22 12:12:02 INFO mapred.JobClient:     Reduce input groups=762742
13/08/22 12:12:02 INFO mapred.JobClient:     Combine output records=2237262
13/08/22 12:12:02 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=3286016000
13/08/22 12:12:02 INFO mapred.JobClient:     Reduce output records=580426
13/08/22 12:12:02 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=4132474880
13/08/22 12:12:02 INFO mapred.JobClient:     Map output records=3151001
13/08/22 12:12:02 INFO bayes.BayesDriver: Calculating Tf-Idf...
13/08/22 12:12:02 INFO common.BayesTfIdfDriver: Counts of documents in Each 
Label
13/08/22 12:12:02 INFO common.BayesTfIdfDriver: {spam=12.0, ham=93.0}
13/08/22 12:12:02 INFO common.BayesTfIdfDriver: {dataSource=hdfs, alpha_i=1.0, 
minDf=1, gramSize=1}
13/08/22 12:12:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
13/08/22 12:12:02 INFO mapred.FileInputFormat: Total input paths to process : 3
13/08/22 12:12:02 INFO mapred.JobClient: Running job: job_201308211228_0021
13/08/22 12:12:03 INFO mapred.JobClient:  map 0% reduce 0%
13/08/22 12:12:29 INFO mapred.JobClient:  map 33% reduce 0%
13/08/22 12:12:36 INFO mapred.JobClient:  map 66% reduce 0%
13/08/22 12:12:38 INFO mapred.JobClient:  map 100% reduce 0%
13/08/22 12:12:45 INFO mapred.JobClient:  map 100% reduce 22%
13/08/22 12:13:04 INFO mapred.JobClient:  map 100% reduce 100%
13/08/22 12:13:06 INFO mapred.JobClient: Job complete: job_201308211228_0021
13/08/22 12:13:06 INFO mapred.JobClient: Counters: 31
13/08/22 12:13:06 INFO mapred.JobClient:   Job Counters 
13/08/22 12:13:06 INFO mapred.JobClient:     Launched reduce tasks=1
13/08/22 12:13:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=51263
13/08/22 12:13:06 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
13/08/22 12:13:06 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
13/08/22 12:13:06 INFO mapred.JobClient:     Rack-local map tasks=1
13/08/22 12:13:06 INFO mapred.JobClient:     Launched map tasks=3
13/08/22 12:13:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24999
13/08/22 12:13:06 INFO mapred.JobClient:   File Input Format Counters 
13/08/22 12:13:06 INFO mapred.JobClient:     Bytes Read=23182886
13/08/22 12:13:06 INFO mapred.JobClient:   File Output Format Counters 
13/08/22 12:13:06 INFO mapred.JobClient:     Bytes Written=8248604
13/08/22 12:13:06 INFO mapred.JobClient:   FileSystemCounters
13/08/22 12:13:06 INFO mapred.JobClient:     FILE_BYTES_READ=13947865
13/08/22 12:13:06 INFO mapred.JobClient:     HDFS_BYTES_READ=476
13/08/22 12:13:06 INFO mapred.JobClient:     ASV_BYTES_READ=23182886
13/08/22 12:13:06 INFO mapred.JobClient:     ASV_BYTES_WRITTEN=8248604
13/08/22 12:13:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27999743
13/08/22 12:13:06 INFO mapred.JobClient:   Map-Reduce Framework
13/08/22 12:13:06 INFO mapred.JobClient:     Map output materialized 
bytes=13946357
13/08/22 12:13:06 INFO mapred.JobClient:     Map input records=580424
13/08/22 12:13:06 INFO mapred.JobClient:     Reduce shuffle bytes=13946357
13/08/22 12:13:06 INFO mapred.JobClient:     Spilled Records=796218
13/08/22 12:13:06 INFO mapred.JobClient:     Map output bytes=16249470
13/08/22 12:13:06 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=2715418624
13/08/22 12:13:06 INFO mapred.JobClient:     CPU time spent (ms)=54867
13/08/22 12:13:06 INFO mapred.JobClient:     Map input bytes=23182589
13/08/22 12:13:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=476
13/08/22 12:13:06 INFO mapred.JobClient:     Combine input records=580424
13/08/22 12:13:06 INFO mapred.JobClient:     Reduce input records=398109
13/08/22 12:13:06 INFO mapred.JobClient:     Reduce input groups=199055
13/08/22 12:13:06 INFO mapred.JobClient:     Combine output records=398109
13/08/22 12:13:06 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=1899667456
13/08/22 12:13:06 INFO mapred.JobClient:     Reduce output records=199055
13/08/22 12:13:06 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=3492704256
13/08/22 12:13:06 INFO mapred.JobClient:     Map output records=580424
13/08/22 12:13:06 INFO bayes.BayesDriver: Calculating weight sums for labels 
and features...
13/08/22 12:13:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
13/08/22 12:13:06 INFO mapred.FileInputFormat: Total input paths to process : 1
13/08/22 12:13:06 INFO mapred.JobClient: Running job: job_201308211228_0022
13/08/22 12:13:07 INFO mapred.JobClient:  map 0% reduce 0%
13/08/22 12:13:34 INFO mapred.JobClient:  map 100% reduce 0%
13/08/22 12:13:46 INFO mapred.JobClient:  map 100% reduce 33%
13/08/22 12:13:49 INFO mapred.JobClient:  map 100% reduce 91%
13/08/22 12:13:55 INFO mapred.JobClient:  map 100% reduce 100%
13/08/22 12:13:57 INFO mapred.JobClient: Job complete: job_201308211228_0022
13/08/22 12:13:57 INFO mapred.JobClient: Counters: 31
13/08/22 12:13:57 INFO mapred.JobClient:   Job Counters 
13/08/22 12:13:57 INFO mapred.JobClient:     Launched reduce tasks=1
13/08/22 12:13:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=26983
13/08/22 12:13:57 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
13/08/22 12:13:57 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
13/08/22 12:13:57 INFO mapred.JobClient:     Rack-local map tasks=2
13/08/22 12:13:57 INFO mapred.JobClient:     Launched map tasks=2
13/08/22 12:13:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=16187
13/08/22 12:13:57 INFO mapred.JobClient:   File Input Format Counters 
13/08/22 12:13:57 INFO mapred.JobClient:     Bytes Read=8449612
13/08/22 12:13:57 INFO mapred.JobClient:   File Output Format Counters 
13/08/22 12:13:57 INFO mapred.JobClient:     Bytes Written=6686249
13/08/22 12:13:57 INFO mapred.JobClient:   FileSystemCounters
13/08/22 12:13:57 INFO mapred.JobClient:     FILE_BYTES_READ=11661732
13/08/22 12:13:57 INFO mapred.JobClient:     HDFS_BYTES_READ=334
13/08/22 12:13:57 INFO mapred.JobClient:     ASV_BYTES_READ=8449612
13/08/22 12:13:57 INFO mapred.JobClient:     ASV_BYTES_WRITTEN=6686249
13/08/22 12:13:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=17570077
13/08/22 12:13:57 INFO mapred.JobClient:   Map-Reduce Framework
13/08/22 12:13:57 INFO mapred.JobClient:     Map output materialized 
bytes=5830559
13/08/22 12:13:57 INFO mapred.JobClient:     Map input records=199054
13/08/22 12:13:57 INFO mapred.JobClient:     Reduce shuffle bytes=5830559
13/08/22 12:13:57 INFO mapred.JobClient:     Spilled Records=576780
13/08/22 12:13:57 INFO mapred.JobClient:     Map output bytes=13541939
13/08/22 12:13:57 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=2301886464
13/08/22 12:13:57 INFO mapred.JobClient:     CPU time spent (ms)=28294
13/08/22 12:13:57 INFO mapred.JobClient:     Map input bytes=8248381
13/08/22 12:13:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=334
13/08/22 12:13:57 INFO mapred.JobClient:     Combine input records=597162
13/08/22 12:13:57 INFO mapred.JobClient:     Reduce input records=192260
13/08/22 12:13:57 INFO mapred.JobClient:     Reduce input groups=182319
13/08/22 12:13:57 INFO mapred.JobClient:     Combine output records=192260
13/08/22 12:13:57 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=1415127040
13/08/22 12:13:57 INFO mapred.JobClient:     Reduce output records=182319
13/08/22 12:13:57 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=2786963456
13/08/22 12:13:57 INFO mapred.JobClient:     Map output records=597162
13/08/22 12:13:57 INFO bayes.BayesDriver: Calculating the weight Normalisation 
factor for each class...
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each Label
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: {spam=NaN, ham=NaN}
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j for 
each Label and for each Features
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: NaN
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count
13/08/22 12:13:57 INFO bayes.BayesThetaNormalizerDriver: 182316.0
13/08/22 12:13:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
13/08/22 12:13:58 INFO mapred.FileInputFormat: Total input paths to process : 1
13/08/22 12:13:58 INFO mapred.JobClient: Running job: job_201308211228_0023
13/08/22 12:13:59 INFO mapred.JobClient:  map 0% reduce 0%
13/08/22 12:14:26 INFO mapred.JobClient:  map 50% reduce 0%
13/08/22 12:14:35 INFO mapred.JobClient:  map 50% reduce 16%
13/08/22 12:14:37 INFO mapred.JobClient:  map 100% reduce 16%
13/08/22 12:14:50 INFO mapred.JobClient:  map 100% reduce 100%
13/08/22 12:14:52 INFO mapred.JobClient: Job complete: job_201308211228_0023
13/08/22 12:14:52 INFO mapred.JobClient: Counters: 30
13/08/22 12:14:52 INFO mapred.JobClient:   Job Counters 
13/08/22 12:14:52 INFO mapred.JobClient:     Launched reduce tasks=1
13/08/22 12:14:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=14813
13/08/22 12:14:52 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0
13/08/22 12:14:52 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0
13/08/22 12:14:52 INFO mapred.JobClient:     Launched map tasks=2
13/08/22 12:14:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=21890
13/08/22 12:14:52 INFO mapred.JobClient:   File Input Format Counters 
13/08/22 12:14:52 INFO mapred.JobClient:     Bytes Read=8449612
13/08/22 12:14:52 INFO mapred.JobClient:   File Output Format Counters 
13/08/22 12:14:52 INFO mapred.JobClient:     Bytes Written=158
13/08/22 12:14:52 INFO mapred.JobClient:   FileSystemCounters
13/08/22 12:14:52 INFO mapred.JobClient:     FILE_BYTES_READ=532
13/08/22 12:14:52 INFO mapred.JobClient:     HDFS_BYTES_READ=334
13/08/22 12:14:52 INFO mapred.JobClient:     ASV_BYTES_READ=8449612
13/08/22 12:14:52 INFO mapred.JobClient:     ASV_BYTES_WRITTEN=158
13/08/22 12:14:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=80569
13/08/22 12:14:52 INFO mapred.JobClient:   Map-Reduce Framework
13/08/22 12:14:52 INFO mapred.JobClient:     Map output materialized bytes=82
13/08/22 12:14:52 INFO mapred.JobClient:     Map input records=199054
13/08/22 12:14:52 INFO mapred.JobClient:     Reduce shuffle bytes=82
13/08/22 12:14:52 INFO mapred.JobClient:     Spilled Records=6
13/08/22 12:14:52 INFO mapred.JobClient:     Map output bytes=4242761
13/08/22 12:14:52 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=1543569408
13/08/22 12:14:52 INFO mapred.JobClient:     CPU time spent (ms)=10655
13/08/22 12:14:52 INFO mapred.JobClient:     Map input bytes=8248381
13/08/22 12:14:52 INFO mapred.JobClient:     SPLIT_RAW_BYTES=334
13/08/22 12:14:52 INFO mapred.JobClient:     Combine input records=199054
13/08/22 12:14:52 INFO mapred.JobClient:     Reduce input records=3
13/08/22 12:14:52 INFO mapred.JobClient:     Reduce input groups=2
13/08/22 12:14:52 INFO mapred.JobClient:     Combine output records=3
13/08/22 12:14:52 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=842493952
13/08/22 12:14:52 INFO mapred.JobClient:     Reduce output records=2
13/08/22 12:14:52 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=2315640832
13/08/22 12:14:52 INFO mapred.JobClient:     Map output records=199054
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting bayes-model/trainer-docCount
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting 
bayes-model/trainer-termDocCount
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting 
bayes-model/trainer-featureCount
13/08/22 12:14:52 INFO common.HadoopUtil: Deleting bayes-model/trainer-wordFreq
13/08/22 12:14:53 INFO common.HadoopUtil: Deleting 
bayes-model/trainer-tfIdf/trainer-vocabCount
13/08/22 12:14:53 INFO driver.MahoutDriver: Program took 321794 ms
13/08/22 12:15:02 WARN driver.MahoutDriver: No testclassifier.props found on 
classpath, will use command-line arguments only
13/08/22 12:15:02 INFO bayes.TestClassifier: Loading model from: 
{basePath=bayes-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, 
gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, 
testDirPath=prepared-test}
13/08/22 12:15:02 INFO bayes.TestClassifier: Testing Bayes Classifier
13/08/22 12:15:04 INFO io.SequenceFileModelReader: Read 50000 feature weights
13/08/22 12:15:04 INFO io.SequenceFileModelReader: Read 100000 feature weights
13/08/22 12:15:04 INFO io.SequenceFileModelReader: Read 150000 feature weights
13/08/22 12:15:05 INFO io.SequenceFileModelReader: NaN
13/08/22 12:15:06 INFO datastore.InMemoryBayesDatastore: spam NaN NaN NaN
13/08/22 12:15:06 INFO datastore.InMemoryBayesDatastore: ham NaN NaN NaN
13/08/22 12:15:19 INFO bayes.TestClassifier: Classified instances from ham.txt
13/08/22 12:15:21 INFO bayes.TestClassifier: Classified instances from spam.txt
13/08/22 12:15:21 INFO bayes.TestClassifier: 
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          0             0%
Incorrectly Classified Instances        :         23           100%
Total Classified Instances              :         23

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       <--Classified as
0       0       3        |  3           a     = spam
0       0       20       |  20          b     = ham
0       0       0        |  0           c     = unknown
Default Category: unknown: 2


13/08/22 12:15:21 INFO driver.MahoutDriver: Program took 19842 ms
