Re: distributed SVD

2010-05-07 Thread Jake Mannix
On Thu, May 6, 2010 at 2:18 PM, Ted Dunning wrote: > > The only issue with the current distributed Lanczos solver is storage for > the auxiliary matrices as they are produced. Jake intimated that he had a > solution for that that wasn't prime-time yet. > Measuring exactly how much this can aff

Re: distributed SVD

2010-05-07 Thread Jake Mannix
On Thu, May 6, 2010 at 2:18 PM, Ted Dunning wrote: > > The only issue with the current distributed Lanczos solver is storage for > the auxiliary matrices as they are produced. Jake intimated that he had a > solution for that that wasn't prime-time yet. > I think what you were referring to, act
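The storage issue Ted and Jake are discussing concerns the orthonormal basis vectors that Lanczos emits one per step: every iteration produces another auxiliary vector that has to live somewhere. A toy single-machine sketch in Python (pure stdlib, dense lists; this is an illustration of the iteration itself, not the distributed Lanczos solver discussed above, and the full reorthogonalization pass is exactly the part that gets expensive at scale):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def axpy(alpha, x, y):
    """Return alpha*x + y, element-wise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def scale(alpha, x):
    return [alpha * xi for xi in x]

def norm(x):
    return dot(x, x) ** 0.5

def lanczos(A, k):
    """Run up to k Lanczos steps on symmetric A.

    Returns the orthonormal basis Q (the "auxiliary" vectors that must be
    stored) and the tridiagonal coefficients (alphas, betas)."""
    n = len(A)
    q = [0.0] * n
    q[0] = 1.0                       # unit start vector e1
    Q, alphas, betas = [q], [], []
    q_prev, beta = [0.0] * n, 0.0
    for _ in range(k):
        w = matvec(A, q)
        alpha = dot(w, q)
        alphas.append(alpha)
        w = axpy(-alpha, q, w)       # w <- w - alpha*q
        w = axpy(-beta, q_prev, w)   # w <- w - beta*q_prev
        # Full reorthogonalization against every stored basis vector:
        # cheap here, but the storage/IO bottleneck at scale.
        for qi in Q:
            w = axpy(-dot(w, qi), qi, w)
        beta = norm(w)
        if beta < 1e-10:
            break                    # Krylov space exhausted
        betas.append(beta)
        q_prev, q = q, scale(1.0 / beta, w)
        Q.append(q)
    return Q, alphas, betas
```

Each pass over `Q` inside the loop is why keeping the basis vectors accessible (and not just writing them out and forgetting them) matters for the solver.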

Re: Replacing the Netflix data set

2010-05-07 Thread Grant Ingersoll
What about the Last.fm data that is released under non-commercial usage restrictions? On May 6, 2010, at 10:07 AM, Sean Owen wrote: > Looks like the Netflix data set is no longer officially available, and > has been disappearing from the internet. Too bad, since I wrote up > Chapter 6 of the boo

Re: Replacing the Netflix data set

2010-05-07 Thread Sean Owen
I like it, though it looks like it's only half a million ratings. I was shooting for 100M or more. On Fri, May 7, 2010 at 2:27 PM, Grant Ingersoll wrote: > What about the Last.fm data that is released under non-commercial usage > restrictions?

Installation problem in utils

2010-05-07 Thread Saul Moncada
Hi, I'm having an issue installing mahout to run the taste-web example. I'm following these instructions (http://lucene.apache.org/mahout/taste.html#demo): To build and run the demo, follow the instructions below, which are written for Unix-like operating systems: 1. Obtain a copy of the Mah

Re: Installation problem in utils

2010-05-07 Thread Sean Owen
To be honest I am also having the same issues, and I do not understand how. My build fails in utils/ with *no* error message. While it seems like it must be related to my last set of changes, I don't see how so, since these are Maven-level dependency problems. Is anyone else seeing any such thing?

Re: Replacing the Netflix data set

2010-05-07 Thread Pedro Oliveira
This dataset seems to have a few million triples from last.fm: http://mtg.upf.edu/node/1671 Cheers, Pedro On Fri, May 7, 2010 at 9:29 AM, Sean Owen wrote: > I like it, though looks like it's only half a million ratings. I was > shooting for 100M or more. > > On Fri, May 7, 2010 at 2:27 PM, Gra

Re: Replacing the Netflix data set

2010-05-07 Thread Sean Owen
Cool, yeah I'm looking for something even larger, since this is small enough that processing it easily fits on one computer. The chapter in question is about distributing via Hadoop. My current next-best option, if it can be used, is the LiveJournal network data here: http://snap.stanford.edu/data

Re: Installation problem in utils

2010-05-07 Thread Sebastian Schelter
I'm seeing a very strange compilation problem: Somehow LuceneIterableTest tries to extend org.apache.mahout.math.MahoutTestCase which does not exist, but it correctly imports org.apache.mahout.common.MahoutTestCase...

mvn -e -DskipTests=true clean install
[ERROR] BUILD FAILURE
[INFO] --

Re: Installation problem in utils

2010-05-07 Thread Sean Owen
It exists, in math/. core/ depends on math/, and utils/ depends on core/, so all should be well. I'm confused since I don't see this, but I do see other problems. I'm going to delete everything and start over again. On Fri, May 7, 2010 at 5:57 PM, Sebastian Schelter wrote: > I'm seeing a very s

Re: Installation problem in utils

2010-05-07 Thread Sebastian Schelter
I got mvn -DskipTests=true clean install to build successfully, I did not execute the tests though. I had to add

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
      <version>${project.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>

in utils/pom.xml and examples/pom.xml (seems like this is not transitively resol

Re: Installation problem in utils

2010-05-07 Thread Saul Moncada
Are these changes in the repository? Regards, SM On Fri, May 7, 2010 at 1:16 PM, Sebastian Schelter wrote: > I got mvn -DskipTests=true clean install to build successfully, I did > not execute the tests though. > > I had to add > >     >      org.apache.mahout >      mahout-math >      ${projec

mahout in action

2010-05-07 Thread Tamas Jambor
I am looking into the distributed part of the recommendation. Sean mentioned that there is a book Mahout in Action coming out. I had a look at the table of contents. I've been working with recommender systems (and Mahout) for quite a long time now, I assume that chapters 2-5 wouldn't really give m

Re: mahout in action

2010-05-07 Thread Sean Owen
It more or less walks through org.apache.mahout.cf.taste.hadoop.item, how it works, and the issues involved. Later I'd like for it to cover the SVD-based implementation that will be written this summer too. If you're pretty into the code already, it may not add much value. But hey there's the whol

Re: Installation problem in utils

2010-05-07 Thread Sean Owen
No I'm still working on this -- Sebastian is correct but that fix unleashes even more mess... On Fri, May 7, 2010 at 7:13 PM, Saul Moncada wrote: > Are these changes in the repository?

Re: Installation problem in utils

2010-05-07 Thread Sean Owen
OK, think I fixed it. Sebastian was substantially right, but I don't understand why examples and utils need to depend directly on math when core does already? ... and then that uncovered some more problems that actually should have been addressed in the first patch, but somehow I was not seeing th

Re: Replacing the Netflix data set

2010-05-07 Thread Jake Mannix
If you're willing to live with a social network graph for this example, then even bigger than LJ, but still public, is the twitter social graph, available as a torrent, which I've also put on S3 and just need to make public at some point. It has 1.47 bil

Re: Replacing the Netflix data set

2010-05-07 Thread Sean Owen
Now we're talking, that's perfect I think. I will set about getting a copy and if it seems to work out, will ask about using the data set as a topic for the chapter. On Fri, May 7, 2010 at 9:25 PM, Jake Mannix wrote: > If you're willing to live with a social network graph for this example, then >

Re: Installation problem in utils

2010-05-07 Thread Tamas Jambor
it still fails here: :(

testCreateTermFrequencyVectors(org.apache.mahout.utils.vectors.text.DictionaryVectorizerTest)

and the log:
---
Test set: org.apache.mahout.utils.vectors.text.DictionaryVectorizerTest
-

Re: Installation problem in utils

2010-05-07 Thread Sean Owen
Ah, one straggler. This happens to not fail for me since "/" is writable on a Mac. I fixed it and after running all tests, verified that nothing showed up in "/". On Fri, May 7, 2010 at 10:42 PM, Tamas Jambor wrote: > it still fails here: :( > > testCreateTermFrequencyVectors(org.apache.mahout.ut
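The failing test was writing into "/", which only happened to succeed on machines where "/" is writable. The portable pattern is a throwaway scratch directory per test, removed unconditionally afterwards. A minimal sketch of the idea in Python (Mahout's own tests are JUnit, not Python; the function names here are illustrative only):

```python
import os
import tempfile

def run_with_scratch_dir(job):
    """Call job(path) with a fresh scratch directory; remove it afterwards."""
    with tempfile.TemporaryDirectory(prefix="mahout-test-") as scratch:
        return job(scratch)

def write_marker(path):
    """Example job: write one output file and return its location."""
    marker = os.path.join(path, "part-00000")
    with open(marker, "w") as f:
        f.write("ok")
    return marker
```

The context manager guarantees cleanup even if the job raises, so nothing accumulates in a shared location between test runs.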

Re: Installation problem in utils

2010-05-07 Thread Tamas Jambor
thanks. On 07/05/2010 22:57, Sean Owen wrote: Ah, one straggler. This happens to not fail for me since "/" is writable on a Mac. I fixed it and after running all tests, verified that nothing showed up in "/".

Creating Vectors for KMeans

2010-05-07 Thread david.stu...@progressivealliance.co.uk
Hi All, I am trying to create a vector file to go into the KMeans clustering algorithm. The data I have is in Solr and I have followed this tutorial https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text and used this command bin/mahout lucene.vector --dir /solr/data/in
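For intuition about the kind of data the vectorization step produces, here is a toy dictionary-based term-frequency vectorizer in Python. The term-to-index dictionary and sparse {index: count} layout loosely mirror what a dictionary-driven vectorizer builds before clustering, but the names and the real on-disk format are assumptions for illustration, not Mahout's actual output:

```python
from collections import Counter

def build_dictionary(docs):
    """Assign each distinct term a stable integer index."""
    terms = sorted({t for doc in docs for t in doc.lower().split()})
    return {t: i for i, t in enumerate(terms)}

def vectorize(doc, dictionary):
    """Represent one document as a sparse term-frequency vector {index: count}."""
    counts = Counter(t for t in doc.lower().split() if t in dictionary)
    return {dictionary[t]: c for t, c in counts.items()}
```

K-means then operates on these vectors; terms missing from the dictionary are simply dropped, which is also why the dictionary must be built over the whole corpus first.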

Re: Installation problem in utils

2010-05-07 Thread Rob Marano
THANK YOU! -- Rob Marano (E) robmar...@gmail.com (Skype) robmarano (O) 646-461-1732 Arnold H. Glasow - "Success is simple. Do what's right, the right way, at the right time." On Fri, May 7, 2010 at 5:57 PM, Sean Owen wrote: > Ah, one straggler. This happens to not fail for me since "/" is > wri

Re: Replacing the Netflix data set

2010-05-07 Thread Ted Dunning
Along those lines, there is the wikipedia link graph: http://users.on.net/~henry/home/wikipedia.htm On Fri, May 7, 2010 at 1:31 PM, Sean Owen wrote: > Now we're talking, that's perfect I think. I will set about getting a > copy and if it seems to work out, will ask about using the data set as >