Hi,

This request is in reference to the 20-newsgroups classification example at the link below: https://mahout.apache.org/users/classification/twenty-newsgroups.html
I am able to run the example and get the results described on that page. However, when I run the example without the split command, the results are not the same. Also, when I run other test data against the same model, the results are not accurate. Can this example be run without the split command?

Basically, this is what I am trying to do. I took two separate datasets, one for training and one for testing, and ran the following commands on both sets:

1. seqdirectory
2. seq2sparse

Now I have vectors generated for both datasets. Then I:

- Run the trainnb command on the first dataset's vectors output. So instead of training a model on 80% of the data, I am using the whole dataset.
- Run the testnb command on the second dataset's vectors output. This is not the 20% of the data; it is a completely new dataset, used solely for testing.

So instead of using mahout split, I have specified a separate dataset for testing the model. The results of this exercise are totally different from what I get when I use the split command to split the data.

Thanks & Regards,
Alok R. Tanna