Hi, while looking at the org.apache.mahout.classifier.sgd.TrainNewsGroups example class, I noticed that, because SGD logistic regression is trained online, the resulting model will always depend on the order in which the training examples are presented to the classifier.
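To illustrate the order dependence, here is a minimal sketch using a hypothetical toy class, `OrderDependenceDemo`. It is not Mahout's OnlineLogisticRegression: it does a single pass of plain SGD on the log-likelihood with a fixed learning rate and no regularization or annealing. Training on the same three examples in two different orders gives two different weight vectors.

```java
import java.util.Arrays;

/** Hypothetical toy online logistic regression (plain SGD, fixed rate, no regularization). */
public class OrderDependenceDemo {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            s += a[j] * b[j];
        }
        return s;
    }

    /** One SGD pass over the examples in the given order; returns the learned weights. */
    static double[] train(double[][] x, int[] y, int[] order, double rate) {
        double[] w = new double[x[0].length];
        for (int i : order) {
            double p = sigmoid(dot(w, x[i]));
            double gradient = y[i] - p; // d(log-likelihood)/d(score) for this example
            for (int j = 0; j < w.length; j++) {
                w[j] += rate * gradient * x[i][j];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] x = { {1, 0}, {0, 1}, {1, 1} };
        int[] y = {1, 0, 1};
        double[] forward = train(x, y, new int[] {0, 1, 2}, 0.5);
        double[] reversed = train(x, y, new int[] {2, 1, 0}, 0.5);
        System.out.println(Arrays.toString(forward));  // weights after order 0,1,2
        System.out.println(Arrays.toString(reversed)); // weights after order 2,1,0 (different)
    }
}
```

Each update is computed from weights that earlier examples have already changed, so permuting the examples changes every later gradient.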
TrainNewsGroups randomizes the order in which the newsgroup files are read (the Collections.shuffle(files); call on line 112), so the output of its main method is non-deterministic. I am specifically interested in the weights passed to the org.apache.mahout.classifier.sgd.ModelDissector core class. Is there a way to make the feature weights deterministic regardless of the order of the input training vectors?
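One way to at least make runs reproducible, sketched below under the assumption that the file list can be shuffled with an explicit seed, is the two-argument `Collections.shuffle(List, Random)` overload with a fixed-seed `java.util.Random`. The class `SeededShuffleDemo`, the `shuffled` helper, and the file names are hypothetical. Note that this only fixes the order between runs; it does not make the weights independent of the order itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper showing a reproducible shuffle of the training files. */
public class SeededShuffleDemo {

    /** Returns a shuffled copy of the list; the same seed always yields the same order. */
    static List<String> shuffled(List<String> files, long seed) {
        List<String> copy = new ArrayList<>(files);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical file names standing in for the newsgroup files.
        List<String> files = List.of("alt.atheism/1", "comp.graphics/2", "sci.space/3", "rec.autos/4");
        System.out.println(shuffled(files, 42L));
        System.out.println(shuffled(files, 42L)); // same order as the line above
    }
}
```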
