Re: Bayes classifier can't get model when running on Hadoop

Wangda.Tan Mon, 17 Oct 2011 01:50:57 -0700

It seems that I can't send an attachment to the mailing list directly, so I've
pasted the log here:


--------------------
Running on hadoop, using
HADOOP_HOME=/Users/hadoop/project/private/Release-1_1_0_0-branch/hadoop/hadoop-0.20.205/
HADOOP_CONF_DIR=/Users/hadoop/project/private/171_hadoop_conf
Warning: $HADOOP_HOME is deprecated.

11/10/17 16:21:42 INFO bayes.TrainClassifier: Training Bayes Classifier
11/10/17 16:21:43 INFO common.HadoopUtil: Deleting bayes-model
11/10/17 16:21:43 INFO bayes.BayesDriver: Reading features...
11/10/17 16:21:43 DEBUG mapred.JobClient: adding the following namenodes'
delegation tokens:null
11/10/17 16:21:43 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/10/17 16:21:43 DEBUG mapred.JobClient: default FileSystem:
hdfs://hdsh171.lss.emc.com
11/10/17 16:21:50 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token
354 for hadoop on 10.37.7.171:8020
11/10/17 16:21:50 INFO security.TokenCache: Got dt for
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0041;uri=10.37.7.171:8020;t.service=10.37.7.171:8020
11/10/17 16:21:50 DEBUG mapred.JobClient: Creating splits at
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0041
11/10/17 16:21:50 INFO mapred.FileInputFormat: Total input paths to
process : 1
11/10/17 16:21:50 DEBUG mapred.FileInputFormat: Total # of splits: 2
11/10/17 16:21:50 DEBUG mapred.JobClient: Printing tokens for job:
job_201110162043_0041
11/10/17 16:21:50 DEBUG mapred.JobClient: Submitting with
HDFS_DELEGATION_TOKEN token 354 for hadoop on 10.37.7.171:8020
11/10/17 16:21:50 INFO mapred.JobClient: Running job: job_201110162043_0041
11/10/17 16:21:51 INFO mapred.JobClient:  map 0% reduce 0%
11/10/17 16:22:07 INFO mapred.JobClient:  map 50% reduce 0%
11/10/17 16:22:10 INFO mapred.JobClient:  map 100% reduce 0%
11/10/17 16:22:16 INFO mapred.JobClient:  map 100% reduce 33%
11/10/17 16:22:21 INFO mapred.JobClient:  map 100% reduce 100%
11/10/17 16:22:26 INFO mapred.JobClient: Job complete:
job_201110162043_0041
11/10/17 16:22:27 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileInputFormat$Counter with bundle
11/10/17 16:22:27 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.JobInProgress$Counter with bundle
11/10/17 16:22:27 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileOutputFormat$Counter with bundle
11/10/17 16:22:27 DEBUG mapred.Counters: Creating group FileSystemCounters
with nothing
11/10/17 16:22:27 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.Task$Counter with bundle
11/10/17 16:22:27 INFO mapred.JobClient: Counters: 27
11/10/17 16:22:27 INFO mapred.JobClient:   Job Counters
11/10/17 16:22:27 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/17 16:22:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19918
11/10/17 16:22:27 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
11/10/17 16:22:27 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
11/10/17 16:22:27 INFO mapred.JobClient:     Rack-local map tasks=1
11/10/17 16:22:27 INFO mapred.JobClient:     Launched map tasks=2
11/10/17 16:22:27 INFO mapred.JobClient:     Data-local map tasks=1
11/10/17 16:22:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14116
11/10/17 16:22:27 INFO mapred.JobClient:   File Input Format Counters
11/10/17 16:22:27 INFO mapred.JobClient:     Bytes Read=6006
11/10/17 16:22:27 INFO mapred.JobClient:   File Output Format Counters
11/10/17 16:22:27 INFO mapred.JobClient:     Bytes Written=47021
11/10/17 16:22:27 INFO mapred.JobClient:   FileSystemCounters
11/10/17 16:22:27 INFO mapred.JobClient:     FILE_BYTES_READ=51923
11/10/17 16:22:27 INFO mapred.JobClient:     HDFS_BYTES_READ=6234
11/10/17 16:22:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=180219
11/10/17 16:22:27 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=47021
11/10/17 16:22:27 INFO mapred.JobClient:   Map-Reduce Framework
11/10/17 16:22:27 INFO mapred.JobClient:     Map output materialized
bytes=51929
11/10/17 16:22:27 INFO mapred.JobClient:     Map input records=12
11/10/17 16:22:27 INFO mapred.JobClient:     Reduce shuffle bytes=51929
11/10/17 16:22:27 INFO mapred.JobClient:     Spilled Records=3384
11/10/17 16:22:27 INFO mapred.JobClient:     Map output bytes=57532
11/10/17 16:22:27 INFO mapred.JobClient:     Map input bytes=4003
11/10/17 16:22:27 INFO mapred.JobClient:     Combine input records=2048
11/10/17 16:22:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=228
11/10/17 16:22:27 INFO mapred.JobClient:     Reduce input records=1692
11/10/17 16:22:27 INFO mapred.JobClient:     Reduce input groups=1569
11/10/17 16:22:27 INFO mapred.JobClient:     Combine output records=1692
11/10/17 16:22:27 INFO mapred.JobClient:     Reduce output records=1205
11/10/17 16:22:27 INFO mapred.JobClient:     Map output records=2048
11/10/17 16:22:27 INFO bayes.BayesDriver: Calculating Tf-Idf...
11/10/17 16:22:27 INFO common.BayesTfIdfDriver: Counts of documents in
Each Label
11/10/17 16:22:27 INFO common.BayesTfIdfDriver: {lucene=4.0, mahout=4.0,
spamassasin=4.0}
11/10/17 16:22:27 INFO common.BayesTfIdfDriver: {dataSource=hdfs,
alpha_i=1.0, minDf=1, gramSize=1}
11/10/17 16:22:27 DEBUG mapred.JobClient: adding the following namenodes'
delegation tokens:null
11/10/17 16:22:27 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/10/17 16:22:27 DEBUG mapred.JobClient: default FileSystem:
hdfs://hdsh171.lss.emc.com
11/10/17 16:22:33 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token
355 for hadoop on 10.37.7.171:8020
11/10/17 16:22:33 INFO security.TokenCache: Got dt for
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0042;uri=10.37.7.171:8020;t.service=10.37.7.171:8020
11/10/17 16:22:33 DEBUG mapred.JobClient: Creating splits at
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0042
11/10/17 16:22:33 INFO mapred.FileInputFormat: Total input paths to
process : 3
11/10/17 16:22:33 DEBUG mapred.FileInputFormat: Total # of splits: 3
11/10/17 16:22:33 DEBUG mapred.JobClient: Printing tokens for job:
job_201110162043_0042
11/10/17 16:22:33 DEBUG mapred.JobClient: Submitting with
HDFS_DELEGATION_TOKEN token 355 for hadoop on 10.37.7.171:8020
11/10/17 16:22:33 INFO mapred.JobClient: Running job: job_201110162043_0042
11/10/17 16:22:34 INFO mapred.JobClient:  map 0% reduce 0%
11/10/17 16:22:49 INFO mapred.JobClient:  map 33% reduce 0%
11/10/17 16:22:54 INFO mapred.JobClient:  map 100% reduce 0%
11/10/17 16:23:04 INFO mapred.JobClient:  map 100% reduce 100%
11/10/17 16:23:09 INFO mapred.JobClient: Job complete:
job_201110162043_0042
11/10/17 16:23:09 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileInputFormat$Counter with bundle
11/10/17 16:23:09 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.JobInProgress$Counter with bundle
11/10/17 16:23:09 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileOutputFormat$Counter with bundle
11/10/17 16:23:09 DEBUG mapred.Counters: Creating group FileSystemCounters
with nothing
11/10/17 16:23:09 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.Task$Counter with bundle
11/10/17 16:23:09 INFO mapred.JobClient: Counters: 27
11/10/17 16:23:09 INFO mapred.JobClient:   Job Counters
11/10/17 16:23:09 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/17 16:23:09 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=25272
11/10/17 16:23:09 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
11/10/17 16:23:09 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
11/10/17 16:23:09 INFO mapred.JobClient:     Rack-local map tasks=2
11/10/17 16:23:09 INFO mapred.JobClient:     Launched map tasks=3
11/10/17 16:23:09 INFO mapred.JobClient:     Data-local map tasks=1
11/10/17 16:23:09 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14395
11/10/17 16:23:09 INFO mapred.JobClient:   File Input Format Counters
11/10/17 16:23:09 INFO mapred.JobClient:     Bytes Read=46821
11/10/17 16:23:09 INFO mapred.JobClient:   File Output Format Counters
11/10/17 16:23:09 INFO mapred.JobClient:     Bytes Written=17470
11/10/17 16:23:09 INFO mapred.JobClient:   FileSystemCounters
11/10/17 16:23:09 INFO mapred.JobClient:     FILE_BYTES_READ=29171
11/10/17 16:23:09 INFO mapred.JobClient:     HDFS_BYTES_READ=47222
11/10/17 16:23:09 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=161103
11/10/17 16:23:09 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=17470
11/10/17 16:23:09 INFO mapred.JobClient:   Map-Reduce Framework
11/10/17 16:23:09 INFO mapred.JobClient:     Map output materialized
bytes=29183
11/10/17 16:23:09 INFO mapred.JobClient:     Map input records=1202
11/10/17 16:23:09 INFO mapred.JobClient:     Reduce shuffle bytes=29183
11/10/17 16:23:09 INFO mapred.JobClient:     Spilled Records=1678
11/10/17 16:23:09 INFO mapred.JobClient:     Map output bytes=33658
11/10/17 16:23:09 INFO mapred.JobClient:     Map input bytes=46524
11/10/17 16:23:09 INFO mapred.JobClient:     Combine input records=1202
11/10/17 16:23:09 INFO mapred.JobClient:     SPLIT_RAW_BYTES=401
11/10/17 16:23:09 INFO mapred.JobClient:     Reduce input records=839
11/10/17 16:23:09 INFO mapred.JobClient:     Reduce input groups=420
11/10/17 16:23:09 INFO mapred.JobClient:     Combine output records=839
11/10/17 16:23:09 INFO mapred.JobClient:     Reduce output records=420
11/10/17 16:23:09 INFO mapred.JobClient:     Map output records=1202
11/10/17 16:23:09 INFO bayes.BayesDriver: Calculating weight sums for
labels and features...
11/10/17 16:23:09 DEBUG mapred.JobClient: adding the following namenodes'
delegation tokens:null
11/10/17 16:23:09 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/10/17 16:23:09 DEBUG mapred.JobClient: default FileSystem:
hdfs://hdsh171.lss.emc.com
11/10/17 16:23:16 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token
356 for hadoop on 10.37.7.171:8020
11/10/17 16:23:16 INFO security.TokenCache: Got dt for
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0043;uri=10.37.7.171:8020;t.service=10.37.7.171:8020
11/10/17 16:23:16 DEBUG mapred.JobClient: Creating splits at
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0043
11/10/17 16:23:16 INFO mapred.FileInputFormat: Total input paths to
process : 1
11/10/17 16:23:16 DEBUG mapred.FileInputFormat: Total # of splits: 2
11/10/17 16:23:16 DEBUG mapred.JobClient: Printing tokens for job:
job_201110162043_0043
11/10/17 16:23:16 DEBUG mapred.JobClient: Submitting with
HDFS_DELEGATION_TOKEN token 356 for hadoop on 10.37.7.171:8020
11/10/17 16:23:16 INFO mapred.JobClient: Running job: job_201110162043_0043
11/10/17 16:23:17 INFO mapred.JobClient:  map 0% reduce 0%
11/10/17 16:23:33 INFO mapred.JobClient:  map 100% reduce 0%
11/10/17 16:23:42 INFO mapred.JobClient:  map 100% reduce 33%
11/10/17 16:23:48 INFO mapred.JobClient:  map 100% reduce 100%
11/10/17 16:23:53 INFO mapred.JobClient: Job complete:
job_201110162043_0043
11/10/17 16:23:53 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileInputFormat$Counter with bundle
11/10/17 16:23:53 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.JobInProgress$Counter with bundle
11/10/17 16:23:53 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileOutputFormat$Counter with bundle
11/10/17 16:23:53 DEBUG mapred.Counters: Creating group FileSystemCounters
with nothing
11/10/17 16:23:53 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.Task$Counter with bundle
11/10/17 16:23:53 INFO mapred.JobClient: Counters: 27
11/10/17 16:23:53 INFO mapred.JobClient:   Job Counters
11/10/17 16:23:53 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/17 16:23:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=20189
11/10/17 16:23:53 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
11/10/17 16:23:53 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
11/10/17 16:23:53 INFO mapred.JobClient:     Rack-local map tasks=1
11/10/17 16:23:53 INFO mapred.JobClient:     Launched map tasks=2
11/10/17 16:23:53 INFO mapred.JobClient:     Data-local map tasks=1
11/10/17 16:23:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14172
11/10/17 16:23:53 INFO mapred.JobClient:   File Input Format Counters
11/10/17 16:23:53 INFO mapred.JobClient:     Bytes Read=18934
11/10/17 16:23:53 INFO mapred.JobClient:   File Output Format Counters
11/10/17 16:23:53 INFO mapred.JobClient:     Bytes Written=12454
11/10/17 16:23:53 INFO mapred.JobClient:   FileSystemCounters
11/10/17 16:23:53 INFO mapred.JobClient:     FILE_BYTES_READ=10684
11/10/17 16:23:53 INFO mapred.JobClient:     HDFS_BYTES_READ=19274
11/10/17 16:23:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=97099
11/10/17 16:23:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=12454
11/10/17 16:23:53 INFO mapred.JobClient:   Map-Reduce Framework
11/10/17 16:23:53 INFO mapred.JobClient:     Map output materialized
bytes=10690
11/10/17 16:23:53 INFO mapred.JobClient:     Map input records=419
11/10/17 16:23:53 INFO mapred.JobClient:     Reduce shuffle bytes=10690
11/10/17 16:23:53 INFO mapred.JobClient:     Spilled Records=806
11/10/17 16:23:53 INFO mapred.JobClient:     Map output bytes=28400
11/10/17 16:23:53 INFO mapred.JobClient:     Map input bytes=17247
11/10/17 16:23:53 INFO mapred.JobClient:     Combine input records=1257
11/10/17 16:23:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=284
11/10/17 16:23:53 INFO mapred.JobClient:     Reduce input records=403
11/10/17 16:23:53 INFO mapred.JobClient:     Reduce input groups=368
11/10/17 16:23:53 INFO mapred.JobClient:     Combine output records=403
11/10/17 16:23:53 INFO mapred.JobClient:     Reduce output records=368
11/10/17 16:23:53 INFO mapred.JobClient:     Map output records=1257
11/10/17 16:23:53 INFO bayes.BayesDriver: Calculating the weight
Normalisation factor for each class...
11/10/17 16:23:53 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each
Label
11/10/17 16:23:53 INFO bayes.BayesThetaNormalizerDriver:
{lucene=16.413062914189613, mahout=17.411160024749904,
spamassasin=16.14911438451097}
11/10/17 16:23:53 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j
for each Label and for each Features
11/10/17 16:23:53 INFO bayes.BayesThetaNormalizerDriver: 49.97333732345051
11/10/17 16:23:53 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count
11/10/17 16:23:53 INFO bayes.BayesThetaNormalizerDriver: 364.0
11/10/17 16:23:54 DEBUG mapred.JobClient: adding the following namenodes'
delegation tokens:null
11/10/17 16:23:54 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/10/17 16:23:54 DEBUG mapred.JobClient: default FileSystem:
hdfs://hdsh171.lss.emc.com
11/10/17 16:24:00 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token
357 for hadoop on 10.37.7.171:8020
11/10/17 16:24:00 INFO security.TokenCache: Got dt for
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0044;uri=10.37.7.171:8020;t.service=10.37.7.171:8020
11/10/17 16:24:00 DEBUG mapred.JobClient: Creating splits at
hdfs://hdsh171.lss.emc.com/tmp/hadoop-mapred/mapred/staging/hadoop/.staging
/job_201110162043_0044
11/10/17 16:24:00 INFO mapred.FileInputFormat: Total input paths to
process : 1
11/10/17 16:24:00 DEBUG mapred.FileInputFormat: Total # of splits: 2
11/10/17 16:24:00 DEBUG mapred.JobClient: Printing tokens for job:
job_201110162043_0044
11/10/17 16:24:00 DEBUG mapred.JobClient: Submitting with
HDFS_DELEGATION_TOKEN token 357 for hadoop on 10.37.7.171:8020
11/10/17 16:24:00 INFO mapred.JobClient: Running job: job_201110162043_0044
11/10/17 16:24:01 INFO mapred.JobClient:  map 0% reduce 0%
11/10/17 16:24:16 INFO mapred.JobClient:  map 50% reduce 0%
11/10/17 16:24:19 INFO mapred.JobClient:  map 100% reduce 0%
11/10/17 16:24:31 INFO mapred.JobClient:  map 100% reduce 100%
11/10/17 16:24:42 INFO mapred.JobClient: Job complete:
job_201110162043_0044
11/10/17 16:24:42 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileInputFormat$Counter with bundle
11/10/17 16:24:42 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.JobInProgress$Counter with bundle
11/10/17 16:24:42 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.FileOutputFormat$Counter with bundle
11/10/17 16:24:42 DEBUG mapred.Counters: Creating group FileSystemCounters
with nothing
11/10/17 16:24:42 DEBUG mapred.Counters: Creating group
org.apache.hadoop.mapred.Task$Counter with bundle
11/10/17 16:24:42 INFO mapred.JobClient: Counters: 27
11/10/17 16:24:42 INFO mapred.JobClient:   Job Counters
11/10/17 16:24:42 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/17 16:24:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=25508
11/10/17 16:24:42 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
11/10/17 16:24:42 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
11/10/17 16:24:42 INFO mapred.JobClient:     Rack-local map tasks=1
11/10/17 16:24:42 INFO mapred.JobClient:     Launched map tasks=2
11/10/17 16:24:42 INFO mapred.JobClient:     Data-local map tasks=1
11/10/17 16:24:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14156
11/10/17 16:24:42 INFO mapred.JobClient:   File Input Format Counters
11/10/17 16:24:42 INFO mapred.JobClient:     Bytes Read=18934
11/10/17 16:24:42 INFO mapred.JobClient:   File Output Format Counters
11/10/17 16:24:42 INFO mapred.JobClient:     Bytes Written=200
11/10/17 16:24:42 INFO mapred.JobClient:   FileSystemCounters
11/10/17 16:24:42 INFO mapred.JobClient:     FILE_BYTES_READ=115
11/10/17 16:24:42 INFO mapred.JobClient:     HDFS_BYTES_READ=19274
11/10/17 16:24:42 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=78265
11/10/17 16:24:42 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=200
11/10/17 16:24:42 INFO mapred.JobClient:   Map-Reduce Framework
11/10/17 16:24:42 INFO mapred.JobClient:     Map output materialized
bytes=121
11/10/17 16:24:42 INFO mapred.JobClient:     Map input records=419
11/10/17 16:24:42 INFO mapred.JobClient:     Reduce shuffle bytes=121
11/10/17 16:24:42 INFO mapred.JobClient:     Spilled Records=8
11/10/17 16:24:42 INFO mapred.JobClient:     Map output bytes=10661
11/10/17 16:24:42 INFO mapred.JobClient:     Map input bytes=17247
11/10/17 16:24:42 INFO mapred.JobClient:     Combine input records=419
11/10/17 16:24:42 INFO mapred.JobClient:     SPLIT_RAW_BYTES=284
11/10/17 16:24:42 INFO mapred.JobClient:     Reduce input records=4
11/10/17 16:24:42 INFO mapred.JobClient:     Reduce input groups=3
11/10/17 16:24:42 INFO mapred.JobClient:     Combine output records=4
11/10/17 16:24:42 INFO mapred.JobClient:     Reduce output records=3
11/10/17 16:24:42 INFO mapred.JobClient:     Map output records=419
11/10/17 16:24:42 INFO common.HadoopUtil: Deleting
bayes-model/trainer-docCount
11/10/17 16:24:42 INFO common.HadoopUtil: Deleting
bayes-model/trainer-termDocCount
11/10/17 16:24:42 INFO common.HadoopUtil: Deleting
bayes-model/trainer-featureCount
11/10/17 16:24:42 INFO common.HadoopUtil: Deleting
bayes-model/trainer-wordFreq
11/10/17 16:24:42 INFO common.HadoopUtil: Deleting
bayes-model/trainer-tfIdf/trainer-vocabCount
11/10/17 16:24:43 INFO driver.MahoutDriver: Program took 180367 ms

--------------------------
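
The counters in the log above can be checked mechanically. What follows is a
minimal Python sketch, not part of Mahout or Hadoop, that pulls
HDFS_BYTES_WRITTEN and "Reduce output records" for each job out of a JobClient
log. The function name summarize_jobs and the shortened sample text are
assumptions made for illustration, and the sketch assumes each counter sits on
a single line, as it does in the paste above.

```python
import re


def summarize_jobs(log_text):
    """Return a list of (job_id, counters) in the order jobs appear in the log."""
    jobs = []
    current = None
    for line in log_text.splitlines():
        # A "Running job:" line starts a new job section.
        started = re.search(r"Running job: (job_\d+_\d+)", line)
        if started:
            current = {}
            jobs.append((started.group(1), current))
            continue
        # Record only the two counters that show whether output was produced.
        counter = re.search(r"(HDFS_BYTES_WRITTEN|Reduce output records)=(\d+)", line)
        if counter and current is not None:
            current[counter.group(1)] = int(counter.group(2))
    return jobs


# Shortened sample: a few lines copied from the first and last jobs in the log.
sample = """\
11/10/17 16:21:50 INFO mapred.JobClient: Running job: job_201110162043_0041
11/10/17 16:22:27 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=47021
11/10/17 16:22:27 INFO mapred.JobClient:     Reduce output records=1205
11/10/17 16:24:00 INFO mapred.JobClient: Running job: job_201110162043_0044
11/10/17 16:24:42 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=200
11/10/17 16:24:42 INFO mapred.JobClient:     Reduce output records=3
"""

for job_id, counters in summarize_jobs(sample):
    print(job_id, counters)
```

In the full log, jobs 0041 through 0044 report 47021, 17470, 12454 and 200
bytes written to HDFS, so each job did produce some output.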
Thanks




On 10/17/11 4:08 PM, "Grant Ingersoll" <[email protected]> wrote:

>Hi Wangda,
>
>Can you include the logs that were spit out by Mahout?
>
>On Oct 16, 2011, at 10:46 PM, <[email protected]> wrote:
>
>> Hi All,
>> I use a very simple input file as the Bayes input (I also tried the
>>20newsgroups example and got the same result):
>> ------
>> mahout  Mahout's goal is to build scalable machine learning libraries.
>>With scalable we mean: Scalable to reasonably large data sets. Our core
>>algorithms for clustering, classfication and batch based collaborative
>>filtering are implemented on top of Apache Hadoop using the map/reduce
>>paradigm. However we do not restrict contributions to Hadoop based
>>implementations: Contributions that run on
>> lucene  All deprecations targeted to be removed in version 3.0 were
>>removed. If you are upgrading from version 2.9.1 of Lucene, you have to
>>fix all deprecation warnings in your code base to be able to recompile
>>against this version. This is the first Lucene
>> spamassasin SpamAssassin is a mail filter to identify spam. It is an
>>intelligent email filter which uses a diverse range of tests to identify
>>unsolicited bulk email, more commonly known as Spam. These tests are
>>applied to email headers and content to classify email using advanced
>>statistical methods. In addition,
>> ------
>>
>> I put the input in a directory named bayes-input and ran this
>>command line:
>>    bin/mahout trainclassifier -i bayes-input -o bayes-model
>>--classifierType bayes -ng 1 -source hdfs
>> ----
>> After training finished, every entry under the bayes-model path has size 0:
>>
>> bin/hadoop fs -ls bayes-model
>> Found 5 items
>> -rw-r--r--   3 hadoop supergroup          0 2011-10-17 10:16
>>/user/hadoop/bayes-model/_SUCCESS
>> drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:16
>>/user/hadoop/bayes-model/_logs
>> drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:19
>>/user/hadoop/bayes-model/trainer-tfIdf
>> drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:19
>>/user/hadoop/bayes-model/trainer-thetaNormalizer
>> drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:18
>>/user/hadoop/bayes-model/trainer-weights
>> ----
>> When I use this model to classify new data, every sample is
>>classified as "unknown".
>>
>> My Environment:
>>
>> 1.  OS     : CentOS 5
>> 2.  Mahout : 0.5
>> 3.  Hadoop : 0.20.205
>>
>> Thanks,
>> Wangda
>>
>
>--------------------------------------------
>Grant Ingersoll
>http://www.lucidimagination.com
>Lucene Eurocon 2011: http://www.lucene-eurocon.com
>
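
A note on reading the quoted `hadoop fs -ls` output: four of the five entries
are directories, for which the size column is always 0, and _SUCCESS is an
empty marker file. The top-level listing therefore may not show on its own
whether the model is empty; a recursive listing (for example
`bin/hadoop fs -lsr bayes-model`) would show the files inside each directory.
The Python sketch below separates files from directories in such a listing.
The function name parse_ls is hypothetical, and the sample is the quoted
listing with each entry rejoined onto one line.

```python
def parse_ls(listing):
    """Parse `hadoop fs -ls` output into dicts with path, is_dir and size."""
    entries = []
    for line in listing.splitlines():
        fields = line.split()
        # Expected fields: mode, replication, owner, group, size, date, time, path.
        if len(fields) != 8 or fields[0][0] not in "-d":
            continue  # skips "Found N items" and any other unexpected line
        entries.append({
            "path": fields[7],
            "is_dir": fields[0].startswith("d"),
            "size": int(fields[4]),
        })
    return entries


# The listing from the quoted message, one entry per line.
sample = """\
Found 5 items
-rw-r--r--   3 hadoop supergroup          0 2011-10-17 10:16 /user/hadoop/bayes-model/_SUCCESS
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:16 /user/hadoop/bayes-model/_logs
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-tfIdf
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:19 /user/hadoop/bayes-model/trainer-thetaNormalizer
drwxrwxrwx   - hadoop supergroup          0 2011-10-17 10:18 /user/hadoop/bayes-model/trainer-weights
"""

entries = parse_ls(sample)
for entry in entries:
    kind = "dir " if entry["is_dir"] else "file"
    print(kind, entry["size"], entry["path"])
```

Only the entries marked "file" carry a meaningful size here; the directories
need to be listed individually before concluding that nothing was written.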
