Hi, looking at the org.apache.mahout.classifier.sgd.TrainNewsGroups
class in the examples module, it seems that, because SGD logistic
regression is an online algorithm, the trained model will always depend
on the order in which the training examples are presented.
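To illustrate the order dependence, here is a minimal sketch in plain Java
(not Mahout's own classes; the class name, data, and learning rate are
hypothetical). It runs one pass of per-example SGD for logistic regression
over the same three examples in two different orders and compares the
resulting weight vectors.

```java
import java.util.Arrays;

public class OrderDependence {
    // One pass of per-example SGD for logistic regression, visiting the
    // examples in the given order. Hypothetical helper, not Mahout code.
    static double[] train(double[][] xs, int[] ys, int[] order, double rate) {
        double[] w = new double[xs[0].length];
        for (int i : order) {
            double dot = 0;
            for (int j = 0; j < w.length; j++) dot += w[j] * xs[i][j];
            double p = 1.0 / (1.0 + Math.exp(-dot)); // predicted P(y = 1)
            double err = ys[i] - p;                  // log-likelihood gradient w.r.t. dot
            for (int j = 0; j < w.length; j++) w[j] += rate * err * xs[i][j];
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] xs = {{1, 0}, {0, 1}, {1, 1}};
        int[] ys = {1, 0, 1};
        double[] a = train(xs, ys, new int[]{0, 1, 2}, 0.5);
        double[] b = train(xs, ys, new int[]{2, 1, 0}, 0.5);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.toString(b));
        System.out.println("same weights? " + Arrays.equals(a, b));
    }
}
```

Each update uses the weights left behind by the previous example, so
changing the visiting order changes every later gradient, and the final
weights differ even though the data set is identical.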

There is a call that randomizes the order in which the newsgroup files are
read (the Collections.shuffle(files); call on line 112 of
TrainNewsGroups). This means that the output of the TrainNewsGroups main
method is non-deterministic: it can differ from one run to the next.
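For reference, java.util.Collections also has a shuffle(List, Random)
overload, so passing a Random with a fixed seed makes the file order
reproducible across runs. A minimal sketch, assuming hypothetical file names
and an arbitrary seed of 42:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffle {
    // Shuffle a copy of the list with a fixed seed so that every run
    // produces the same order (on the same JDK).
    static List<String> shuffled(List<String> files, long seed) {
        List<String> copy = new ArrayList<>(files);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical names standing in for the newsgroup files.
        List<String> files = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt", "e.txt");
        List<String> run1 = shuffled(files, 42L);
        List<String> run2 = shuffled(files, 42L);
        System.out.println(run1);
        System.out.println("reproducible? " + run1.equals(run2));
    }
}
```

This only makes repeated runs see the same order; it does not make the
weights independent of the order, and any other random sources used during
training would need to be seeded separately.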

I am specifically looking at the feature weights passed to the
org.apache.mahout.classifier.sgd.ModelDissector class in core.

Is there a way to make the feature weights deterministic, regardless of
the order of the input training vectors?
