This is a resource for other discussions about the SGD implementation in Mahout:
http://find.searchhub.org/?q=mahout+sgd

One important point is that this SGD implementation requires the input records to be in heavily randomized order.

----- Original Message -----
| From: "Alex Goldstein" <[email protected]>
| To: [email protected]
| Sent: Monday, October 29, 2012 10:18:27 AM
| Subject: SGD in Mahout
|
|
| Hi, hope anyone can help me out.
| In the company I work at we are running SGD algorithms using STATA
| and recently testing out Mahout and R as we need to run the model on
| a lot of data.
| STATA has been the preference of the analytics group, and they are
| comfortable with the results.
| An initial test in R gave similar results, but processing times were
| really slow in comparison.
| Now trying out Mahout, and using trainlogistic with the input
| file, correct target and predictive variables, the speed is great,
| but the results are way off from what we expected.
| The coefficients of the function are nothing even close.
|
| Can anyone point me in the right direction on how to write our own
| code to run the SGD algorithm in Mahout? I haven't found much
| documentation regarding this; even in the book Mahout in Action the
| documentation seems scarce.
|
| In STATA the options for running are very few. Simply run the
| logistic regression with the target variable and the predictive
| variables and that's it.
|
| I'm sure I'll need to write my own code for this, but just wanted some
| pointers if anyone has worked with the SGD algorithm extensively.
|
| Thanks
|
| Alex
|
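
To illustrate the randomization point, here is a minimal sketch of shuffling a CSV training file before handing it to trainlogistic. It is plain Python and not part of Mahout. The function names and the file names in the usage note are hypothetical. It assumes the file has a single header line, fits in memory, and has no quoted fields containing embedded newlines.

```python
import random


def shuffle_csv_lines(lines, seed=None):
    """Return the header line followed by the data lines in random order.

    `lines` is a list of strings without trailing newlines; the first
    entry is assumed to be the CSV header and is kept in place.
    """
    if not lines:
        return []
    header, rows = lines[0], list(lines[1:])
    # A dedicated Random instance keeps the shuffle reproducible for a
    # given seed without touching the global random state.
    random.Random(seed).shuffle(rows)
    return [header] + rows


def shuffle_csv_file(src_path, dst_path, seed=None):
    """Write a row-shuffled copy of the CSV at src_path to dst_path."""
    with open(src_path, newline="") as src:
        # Strip line endings so a final row without a trailing newline
        # cannot be glued to another row after shuffling.
        lines = [line.rstrip("\r\n") for line in src]
    with open(dst_path, "w", newline="") as dst:
        for line in shuffle_csv_lines(lines, seed):
            dst.write(line + "\n")
```

For example, shuffle_csv_file("train.csv", "train_shuffled.csv", seed=42) would write a shuffled copy that could then be used as the training input. If the original file is sorted by target or by date, the shuffled copy is the one to train on.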
