Re: Interpretation of cluster output

2014-06-16 Thread Kamesh
Thanks for the response, Andrew. I am using Mahout 0.9. I also tried the trunk version, but I am still getting output in the following format: C-55{n=1 c=[15993058.000] r=[]} C-56{n=2 c=[15993061.167] r=[]} C-57{n=1 c=[15993062.000] r=[]} C-97{n=1 c=[15993103.000] r=[]} C-98{n=2

Re: Mahout SingularValueDecomposition gives wrong answer

2014-06-16 Thread Han Fan
Thanks Chris and Ted. I see where the problem is now. And thanks to Chris for the reminder about the transposes. if (arg.numRows() < arg.numCols()) { transpositionNeeded = true; } ... public Matrix getU() { if (transpositionNeeded) { // case numRows() < numCols() return new
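
A small check of the identity behind that transposition handling (a sketch assuming Mahout's math module on the classpath; the class name and test matrix are made up, not from the thread):

  import org.apache.mahout.math.DenseMatrix;
  import org.apache.mahout.math.Matrix;
  import org.apache.mahout.math.SingularValueDecomposition;

  // If A = U * S * V', then A' = V * S * U', so the U of the transposed
  // problem plays the role of V for the original matrix, and vice versa.
  public class TransposeSvdCheck {
    public static void main(String[] args) {
      Matrix a = new DenseMatrix(new double[][] {{1, 2, 3}, {4, 5, 6}});  // 2 x 3: fewer rows than columns
      SingularValueDecomposition svdOfTranspose =
          new SingularValueDecomposition(a.transpose());                  // 3 x 2: rows >= columns
      Matrix u = svdOfTranspose.getV();  // acts as U for the original A
      Matrix v = svdOfTranspose.getU();  // acts as V for the original A
      Matrix reconstructed = u.times(svdOfTranspose.getS()).times(v.transpose());
      System.out.println(reconstructed); // should match A up to numerical error
    }
  }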

divide a vector (sum) by a double, error

2014-06-16 Thread Patrice Seyed
Hi all, I have attempted to write a method centroid() that 1) sums a HashSet of org.apache.mahout.math.Vector (the vectors are DenseVectors), and 2) divides the summed vector (via org.apache.mahout.math.Vector.divide) by the set's size, as a double. I get an error: Exception in thread main
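
A minimal sketch of such a centroid() method, assuming the input vectors all share the same cardinality (names are illustrative, not Patrice's actual code):

  import java.util.Collection;
  import org.apache.mahout.math.DenseVector;
  import org.apache.mahout.math.Vector;

  public class CentroidExample {
    // Sum the vectors element-wise, then divide by the number of vectors.
    public static Vector centroid(Collection<Vector> vectors) {
      Vector sum = new DenseVector(vectors.iterator().next().size());
      for (Vector v : vectors) {
        sum = sum.plus(v);                 // plus() returns a new vector
      }
      return sum.divide(vectors.size());   // Vector.divide takes a double
    }
  }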

Re: divide a vector (sum) by a double, error

2014-06-16 Thread Ted Dunning
Patrice, This sounds like a classpath problem more than a code error. Are you sure that you can run any program that uses Mahout? Do you perhaps have two versions of Mahout floating around? Regarding the code, this is a more compact idiom for the same thing: Matrix m = ...; Vector
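
Ted's exact idiom is cut off above; one compact alternative (an illustrative guess, not necessarily what he had in mind) is to accumulate the sum in place with assign() and Functions.PLUS instead of allocating a new vector on every addition:

  import java.util.Collection;
  import org.apache.mahout.math.DenseVector;
  import org.apache.mahout.math.Vector;
  import org.apache.mahout.math.function.Functions;

  public class MeanVectorExample {
    public static Vector mean(Collection<Vector> vectors, int cardinality) {
      Vector sum = new DenseVector(cardinality);
      for (Vector v : vectors) {
        sum.assign(v, Functions.PLUS);     // in-place element-wise add
      }
      return sum.divide(vectors.size());
    }
  }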

Re: divide a vector (sum) by a double, error

2014-06-16 Thread Sebastian Schelter
It's also not a good idea to put the vectors into a HashSet; I don't think we have equals() and hashCode() correctly implemented for that. On 16.06.2014 18:21, Ted Dunning ted.dunn...@gmail.com wrote: Patrice, This sounds like a classpath problem more than a code error. Are you sure that you can
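
A small illustration of the safe alternative: if Vector's equals()/hashCode() are not value-based, a HashSet can behave surprisingly, so collecting the vectors in a List sidesteps the question entirely (snippet is illustrative):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.mahout.math.DenseVector;
  import org.apache.mahout.math.Vector;

  public class VectorListExample {
    public static void main(String[] args) {
      // Instead of new HashSet<Vector>(), which relies on equals()/hashCode():
      List<Vector> vectors = new ArrayList<Vector>();
      vectors.add(new DenseVector(new double[] {1.0, 2.0}));
      vectors.add(new DenseVector(new double[] {3.0, 4.0}));
      System.out.println(vectors.size());  // 2, regardless of Vector equality semantics
    }
  }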

Re: divide a vector (sum) by a double, error

2014-06-16 Thread Ted Dunning
On Mon, Jun 16, 2014 at 9:51 AM, Sebastian Schelter ssc.o...@googlemail.com wrote: It's also not a good idea to put the vectors into a HashSet; I don't think we have equals() and hashCode() correctly implemented for that. Woof. You are right. This is a very important point and one that I should

ALS, weighted vs. non-weighted regularization paper

2014-06-16 Thread Dmitriy Lyubimov
Probably a question for Sebastian. As we know, the two papers (Hu-Koren-Volinsky and Zhou et al.) use slightly different loss functions. Zhou et al. are fairly unique in that they additionally multiply the norms of the U, V vectors by the number of observed interactions. The paper doesn't explain why
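
For reference, the two objectives as I read the papers (notation mine, not taken from the thread; \Omega is the set of observed interactions, n_{u_i} and n_{v_j} are the interaction counts for user i and item j):

  % Hu-Koren-Volinsky (implicit feedback): confidence weights c_{ij} on the
  % error term, plain lambda regularization
  \min_{U,V} \sum_{i,j} c_{ij}\,(p_{ij} - u_i^\top v_j)^2
    + \lambda \Big( \sum_i \|u_i\|^2 + \sum_j \|v_j\|^2 \Big)

  % Zhou et al. (ALS-WR): unweighted error over observed cells, but the
  % regularizer is scaled by the number of observed interactions
  \min_{U,V} \sum_{(i,j) \in \Omega} (r_{ij} - u_i^\top v_j)^2
    + \lambda \Big( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{v_j} \|v_j\|^2 \Big)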

Re: ALS, weighted vs. non-weighted regularization paper

2014-06-16 Thread Dmitriy Lyubimov
PS it might even be the case that hyperactive users simply are not as informative as users that buy fewer items, and vice versa, which may have some explanation based on the information entropy each such observation set has; but if this is really why it works, it has nothing to do with the desire to

Re: ALS, weighted vs. non-weighted regularization paper

2014-06-16 Thread Sean Owen
Yeah I've turned that over in my head. I am not sure I have a great answer. But I interpret the net effect to be that the model prefers simple explanations for active users, at the cost of more error in the approximation. One would rather pick a basis that more naturally explains the data observed

Re: ALS, weighted vs. non-weighted regularization paper

2014-06-16 Thread Ted Dunning
It may actually be that they weren't solving the problem they thought. By regularizing prolific users more vigorously, they may just have been down-weighting them. We effectively do the same in ISJ by down-sampling the data. It is very important to do so, but not because of

Re: ALS, weighted vs. non-weighted regularization paper

2014-06-16 Thread Dmitriy Lyubimov
Yeah, so that was my best guess as well: nothing to do with regularization, just importance weighting. The reason I was asking is that I was traditionally including do WR / do not do WR as a training parameter, but wasn't sure if it made much sense. Now I was revisiting this for M-1365 again. I guess

a seemingly benign test that fails MahoutTestCase

2014-06-16 Thread Wei Zhang
Hello, I am writing a simple unit test, which extends MahoutTestCase. I am using Mahout 0.9 and Hadoop 1.0.3. The test is quite simple; it just creates an empty file on HDFS. The entire code is the following: @Test public void simpleTest() throws IOException{ Configuration
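
The code is cut off above; a minimal sketch of a test along those lines (paths and class names are illustrative, and the exact configuration Wei used is not shown in the thread):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.mahout.common.MahoutTestCase;  // adjust to wherever MahoutTestCase lives in your build
  import org.junit.Test;

  public class SimpleHdfsTest extends MahoutTestCase {
    @Test
    public void simpleTest() throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);                // note: never closed here
      Path filenamePath = new Path("testdata/empty.txt");  // illustrative path
      fs.createNewFile(filenamePath);                      // create an empty file on the configured FileSystem
    }
  }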

Re: a seemingly benign test that fails MahoutTestCase

2014-06-16 Thread Ted Dunning
Attachments are stripped by the mailing list. Can you throw the stack trace onto a GitHub gist or something? On Mon, Jun 16, 2014 at 3:27 PM, Wei Zhang w...@us.ibm.com wrote: Hello, I am writing a simple unit test, which extends MahoutTestCase. I am using Mahout 0.9 and Hadoop 1.0.3.

Re: a seemingly benign test that fails MahoutTestCase

2014-06-16 Thread Wei Zhang
Hello Ted, Thank you very much for the reply! I have created a GitHub gist that records the error message at https://gist.github.com/anonymous/21a2f36c995c1d39717b I am sorry that I forgot to attach the error log; it looks something like this (just in case the GitHub gist is displaying

Re: a seemingly benign test that fails MahoutTestCase

2014-06-16 Thread Ted Dunning
Close the FileSystem object (fs). On Mon, Jun 16, 2014 at 6:22 PM, Wei Zhang w...@us.ibm.com wrote: Hello Ted, Thank you very much for the reply! I have created a GitHub gist that records the error message at https://gist.github.com/anonymous/21a2f36c995c1d39717b I am sorry that I

Re: a seemingly benign test that fails MahoutTestCase

2014-06-16 Thread Wei Zhang
Thanks, Ted. I did try fs.close(). The code looks like this: @Test public void simpleTest() throws IOException{ Configuration conf = new Configuration(); FileSystem fs = FileSystem.get(conf); Path filenamePath = new
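
For completeness, the shape of the test with the close Ted suggested (a sketch using the same imports and class as the earlier sketch; Wei's actual code is cut off above):

  @Test
  public void simpleTest() throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try {
      Path filenamePath = new Path("testdata/empty.txt");  // illustrative path
      fs.createNewFile(filenamePath);
    } finally {
      fs.close();  // release the FileSystem client and any threads it started
    }
  }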

Re: a seemingly benign test that fails MahoutTestCase

2014-06-16 Thread Ted Dunning
Well, the problem is that there is a thread leak. It is not clear whether the file creation succeeded. It isn't that surprising that there is a thread leak, since Hadoop code is pretty cavalier about this and the Hadoop test cases don't check for it. I think that there might be a way to signal
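
If closing the single instance is not enough (FileSystem.get() hands back a cached, shared instance), one blunt option in a teardown method is FileSystem.closeAll(); whether that satisfies the thread-leak check in this particular case is an assumption on my part, not something verified in the thread:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.junit.After;

  // Close every cached FileSystem so no client threads outlive the test.
  @After
  public void closeFileSystems() throws IOException {
    FileSystem.closeAll();
  }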