Re: Circular dependency issue between examples and integration?

2012-04-10 Thread Sean Owen
/mahout-integration/RecommenderServlet?userID=1&debug=true Sean, do you have any link which specifies correct steps to run recommendation demo? Thanks, Yugang On Tue, Jan 24, 2012 at 2:31 PM, Sean Owen sro...@gmail.com wrote: Backing up a sec -- I looked, and there is not a circular

Re: Evaluation of recommenders

2012-04-10 Thread Sean Owen
You're talking about recommendations now... are we talking about a clustering, classification or recommender system? In general I don't know if it makes sense for business users to be deciding aspects of the internal model. At most someone should input the tradeoffs -- how important is accuracy

Re: Evaluation of recommenders

2012-04-10 Thread Sean Owen
You are making recommendations, and you want to do this via clustering. OK, that's fine. How you implement it isn't so important -- it's that you have some parameters to change and want to know how any given process does. You just want to use some standard recommender metrics, to start, I'd

Re: Clustering question

2012-04-09 Thread Sean Owen
I think you would cluster these like any other text document. The centroid of each cluster tells you where the cluster is in feature-space, but the features are just words. If you find the features (words) with largest absolute value, those ought to be the words that appear frequently in the

Re: Combining CF content-based recommendations

2012-04-08 Thread Sean Owen
In theory this is what the system is learning for you, that there is some pattern to the preferences and so someone who likes Julia Roberts's movies would tend to be recommended more of them. So I suppose I'd advise against making a pseudo-item out of a feature unless you have specific, new

Re: citing mahout

2012-04-08 Thread Sean Owen
I don't know if there's any particular preferred format. I think you'd generally cite the web site, and follow any standard citation format for that. Sean On Sun, Apr 8, 2012 at 8:24 PM, Ahmed Abdeen Hamed ahmed.elma...@gmail.com wrote: Hello, Is there a specific format the Mahout developers

Re: Combining CF content-based recommendations

2012-04-06 Thread Sean Owen
(Hmm, I don't know why it doesn't post to the mailing list. We get a message about moderating everything. I'll copy it to the list now.) In #1, you describe the usual user-item preference matrix. Yes it's sparse. I guess you could make up pseudo-items like genre in the matrix, yes, if you had

Re: Error: ... overrides final method tokenStream

2012-04-06 Thread Sean Owen
This means you have an incompatible version of Lucene in your app at runtime. Use the same one Mahout uses. On Fri, Apr 6, 2012 at 2:21 PM, Tristan Slominski tristan.slomin...@gmail.com wrote: Hello group, I managed to get Mahout running.. awesome! But I keep on running into issues that break

Re: Error: ... overrides final method tokenStream

2012-04-06 Thread Sean Owen
It's coming from somewhere else then. I think you'd want to examine the rest of the classpaths. You do not need to put Lucene jars in the classpath yourself. It will just cause issues. You will need to make sure you're looking at the remote cluster's classpath, if it's remote. On Fri, Apr 6,

Re: Mahout beginner questions...

2012-04-05 Thread Sean Owen
It might or might not be interesting to comment on this discussion in light of the new product/project I mentioned last night, Myrrix. It's definitely an example of precisely this two-layered architecture we've been discussing on this thread. http://myrrix.com/design/ The nice thing about a

Re: recommend ads using mahout?

2012-04-04 Thread Sean Owen
I would recommend you use (only) the ad data. These are boolean data points in the recommender engine speak. You can 'recommend' ads this way. I understand your question is a bit more than that. First you want to use the *not*-clicked data. My first question is, is this meaningful? I am served

Re: Cancel running distributed RecommenderJob

2012-04-04 Thread Sean Owen
also cancel mahout sub tasks. Do you think it could work that way? On 02/04/12 19:05, Sean Owen wrote: You can use the Hadoop interface itself (like, the command-line hadoop tool) to kill a job by its ID. If you kill one MapReduce job the entire process should halt after that. On Mon

Commercializing Mahout: the Myrrix recommender platform

2012-04-04 Thread Sean Owen
Dear all -- I've long promised (threatened?) to begin efforts to commercialize Apache Mahout. Given my line of work in VC, I see evidence for positive symbiosis between open source and commercial enterprise. We have evidence from the growth in user base and mailing list, as well as the Mahout in

Re: What equation do NDCG used ?

2012-04-04 Thread Sean Owen
It's the same formula, what do you think is different? On Wed, Apr 4, 2012 at 9:35 PM, ziad kamel ziad.kame...@gmail.com wrote: Hi , I checked the code for NDCG and it seems not same as http://en.wikipedia.org/wiki/Discounted_cumulative_gain How that formula was derived ? Thanks

Re: What equation do NDCG used ?

2012-04-04 Thread Sean Owen
;        } Second thing is that it have a relevance number rel which the formula don't use. On Wed, Apr 4, 2012 at 3:54 PM, Sean Owen sro...@gmail.com wrote: It's the same formula, what do you think is different? On Wed, Apr 4, 2012 at 9:35 PM, ziad kamel ziad.kame...@gmail.com wrote: Hi , I checked

Re: What equation do NDCG used ?

2012-04-04 Thread Sean Owen
No, re-read my last message. The ordering matters, since the discount changes at each position. On Wed, Apr 4, 2012 at 10:36 PM, ziad kamel ziad.kame...@gmail.com wrote: It seems that having a recommended list that is for example 9, 23, 8 or 8 , 9 , 23 will give same NDGC , since it just
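To see why position can change the score, here is a minimal sketch of DCG and NDCG with the usual log2 position discount. The function names and relevance values are made up for illustration; this is not Mahout's evaluator code, which may treat relevance differently.

```python
import math

def dcg(relevances):
    # Position i (0-based) is discounted by log2(i + 2), so the first slot has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance scores for the same three items in two different orders.
best_first = [3, 2, 0]
worst_first = [0, 2, 3]
print(ndcg(best_first), ndcg(worst_first))
```

The same items score lower when the most relevant one is pushed down the list, because each position has its own discount. If every item in the list has identical relevance, reordering makes no difference.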

Re: Commercializing Mahout: the Myrrix recommender platform

2012-04-04 Thread Sean Owen
Not with the Apache license... it's not copyleft. The GNU license might require this. On Wed, Apr 4, 2012 at 11:43 PM, Darren Govoni dar...@ontrenet.com wrote: The short answer is that they have to open their source. So anything they do to the original code is readily available to all.

Re: Pre-configured Mahout on the cloud

2012-04-03 Thread Sean Owen
This is lightly covered in Mahout in Action but yes there is really little more to know. You upload the job jar and run it like anything else in AWS. On Apr 3, 2012 10:24 AM, Sebastian Schelter s...@apache.org wrote: None that I'm aware of. But its supereasy to use Mahout in EMR: You need to

Re: Cancel running distributed RecommenderJob

2012-04-02 Thread Sean Owen
You can use the Hadoop interface itself (like, the command-line hadoop tool) to kill a job by its ID. If you kill one MapReduce job the entire process should halt after that. On Mon, Apr 2, 2012 at 6:44 PM, Sören Brunk soren.br...@deri.org wrote: Hi, I'm using the distributed RecommenderJob

Re: User Similarity and neighborhoods

2012-04-01 Thread Sean Owen
(Why not read the code first? We kinda reserve the mailing list for more specific questions from after you've tried the basics.) On Sun, Apr 1, 2012 at 3:13 PM, ziad kamel ziad.kame...@gmail.com wrote: Do Mahout compute the similarity between every pair of users to determine their

Re: How to customize A-B Similarity, not default A-B similarity?

2012-03-30 Thread Sean Owen
You don't want to do this. Similarity only makes sense if it's symmetric. Instead, you probably want to weight at the point that the similarity is used. Compute it normally, then weight depending on which item is what. On Fri, Mar 30, 2012 at 8:02 AM, tianwild tianwild...@hotmail.com wrote: Hi

Re: Getting InMemBuilder to use more mappers

2012-03-30 Thread Sean Owen
L Shaw jls...@uw.edu wrote: Suggestion, indeed. I passed that option, but still only 2 mappers were created. On Thu, Mar 29, 2012 at 5:23 PM, Sean Owen sro...@gmail.com wrote: Hadoop is what chooses the number of mappers, and it bases it on input size. Generally

Re: Getting InMemBuilder to use more mappers

2012-03-30 Thread Sean Owen
I think the real cause is perhaps that the implementation is not fully fleshed out. I haven't looked at it, but I'm sure that if you find additions and improvements you could post them and get them committed. I am probably missing something basic, but you seemed to say at the outset that you

Re: CityBlockSimilarity details

2012-03-29 Thread Sean Owen
Nope it's the sum of the absolute values of differences in ratings, for your purposes. On Thu, Mar 29, 2012 at 7:29 PM, ziad kamel ziad.kame...@gmail.com wrote: City block distance or Manhattan distance Wikipedia define it for points as http://en.wikipedia.org/wiki/Taxicab_geometry So how
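As a rough illustration of "sum of the absolute values of differences in ratings", here is a small sketch. The function name, the sample ratings, and the choice to compare only items both users rated are assumptions of this example, not a description of CityBlockSimilarity's internals.

```python
def city_block_distance(ratings_a, ratings_b):
    # Only items rated by both users contribute (an assumption of this sketch).
    shared = ratings_a.keys() & ratings_b.keys()
    return sum(abs(ratings_a[item] - ratings_b[item]) for item in shared)

# Hypothetical ratings: item ID -> rating.
alice = {101: 5.0, 102: 3.0, 103: 1.0}
bob = {101: 4.0, 102: 1.0, 104: 2.0}
print(city_block_distance(alice, bob))
```

Here the two users share items 101 and 102, so the distance is |5 - 4| + |3 - 1|.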

Re: CityBlockSimilarity details

2012-03-29 Thread Sean Owen
Like I think we've said, it depends on your data. I expect that some similarity metrics will work better than others. Why is hard to say without knowing anything about your data. I don't understand your previous question about representation. I just gave you the definition of city-block distance.

Re: Getting InMemBuilder to use more mappers

2012-03-29 Thread Sean Owen
Hadoop is what chooses the number of mappers, and it bases it on input size. Generally it will not assign less than one worker per chunk and a chunk is usually 64MB (still, I believe). You can override this directly (well, at least, register a suggestion to Hadoop). I would tell you the exact flag

Re: CityBlockSimilarity details

2012-03-29 Thread Sean Owen
What top items? I am not sure what you're referring to here, but, no I do not expect things to be identical when changing metrics in general. I've already answered your other question. On Thu, Mar 29, 2012 at 10:52 PM, ziad kamel ziad.kame...@gmail.com wrote: OK, things become more clear .

Re: Getting InMemBuilder to use more mappers

2012-03-29 Thread Sean Owen
(If you're using a modern version of Hadoop, the flag is something different, so make sure you check what the real value is.) There's another option concerning minimum split size that you could reduce from its default too. On Thu, Mar 29, 2012 at 11:05 PM, Jason L Shaw jls...@uw.edu wrote:

Re: Evaluation - score vs precision

2012-03-28 Thread Sean Owen
There is not necessarily a relation, but, a good recommender ought to be good at predicting ratings, and ought to return good recommendations. So yes you would generally expect a low error when you get a high precision, but there is not a direct connection. On Wed, Mar 28, 2012 at 5:52 PM, ziad

Re: How AveragingPreferenceInferrer works?

2012-03-28 Thread Sean Owen
It pretends that any non-existent preference actually exists and is equal to the user's average preference. It is only done for purposes of computing similarity. It does not actually set a value in the model. On Wed, Mar 28, 2012 at 10:45 PM, ziad kamel ziad.kame...@gmail.com wrote: Hi , It
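The idea can be sketched in a few lines. The names and data below are hypothetical and this is not the AveragingPreferenceInferrer source; it only shows the "missing preference equals the user's average" rule and that the stored data is left alone.

```python
def inferred_preference(user_ratings, item):
    # Use the real rating if present; otherwise fall back to the user's mean rating.
    if item in user_ratings:
        return user_ratings[item]
    return sum(user_ratings.values()) / len(user_ratings)

ratings = {101: 4.0, 102: 2.0}   # hypothetical user with two ratings
filled = {item: inferred_preference(ratings, item) for item in (101, 102, 103)}
print(filled)    # the stored ratings dict itself is left unchanged
```

The filled-in values would be used only while comparing two users; nothing is written back to the model.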

Re: java.lang.NullPointerException when using mysql

2012-03-27 Thread Sean Owen
Make sure you use the latest MySQL driver. Are you sure that one of your columns is not NULL? On Tue, Mar 27, 2012 at 6:42 AM, 344911009 mudaom...@vip.qq.com wrote: windows XP 2G  runing MovieSite ,use mysql-connector-java-5.1.13-bin.jar,  userID and movieID are INTEGER ,but still has errors

Re: Why ignoring preferences return higher precision ? Questions on Boolean preferences

2012-03-27 Thread Sean Owen
Any similarity metric works to the extent that its assumptions match the data's reality. Pearson's key assumption is that ratings scale proportionally with our degree of like or dislike for a thing. That is only sort of how people rate things. Really a 1 or 2 (on a scale of 5) means I am sort of

Re: Mahout beginner questions...

2012-03-26 Thread Sean Owen
I'm sure he's referring to the off-line model-building bit, not an online component. On Mon, Mar 26, 2012 at 9:27 AM, Razon, Oren oren.ra...@intel.com wrote: By saying: At Veoh, we built our models from several billion interactions on a tiny cluster you meant that you used the distributed

Re: Mahout beginner questions...

2012-03-26 Thread Sean Owen
necessarily need to load the entire intermediate file (similarity results) into the memory?! -Original Message- From: Sean Owen [mailto:sro...@gmail.com] Sent: Monday, March 26, 2012 11:48 To: user@mahout.apache.org Subject: Re: Mahout beginner questions... I'm sure he's referring

Re: Mahout beginner questions...

2012-03-26 Thread Sean Owen
An SQL database doesn't have much role to play in this kind of system, and that's no criticism of RDBMSes. The algorithms operate on very simple, nearly unstructured data and are essentially read-only. So the complexity of keys and transactions is just overhead. The simple, non-distributed

Re: cluster-based recommendation algorithm

2012-03-26 Thread Sean Owen
Can it be implemented? sure, but what you see is what is available. If you want a different clustering approach you would have to implement it. The algorithm there is not k-means. On Mon, Mar 26, 2012 at 8:49 PM, Ahmed Abdeen Hamed ahmed.elma...@gmail.com wrote: Hello, This might sound trivial

Re: Why I am getting different precision using 32 vs 64 bit

2012-03-26 Thread Sean Owen
This is no useful detail at all. What algorithm are you even running?? On Mon, Mar 26, 2012 at 11:29 PM, ziad kamel ziad.kame...@gmail.com wrote:  Dear developers , I run some recommendations on mahout of 32 and 64 bit machines (Ubuntu) . I found out that on 32 bit I am getting higher

Re: Significant - serendipity in recommending

2012-03-25 Thread Sean Owen
Au contraire, you can do exactly this with an IDRescorer. Divide by (the log of) an item's occurrences for example to penalize popular items. I don't recommend this. Stuff like the log-likelihood metric is already in a sense accounting for things that are just generally popular and normalizing
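For anyone who wants to experiment with the popularity penalty anyway, the arithmetic looks roughly like this. It is a standalone sketch, not an IDRescorer implementation; the item names, scores, counts, and the use of log(1 + n) rather than log(n) are assumptions made here.

```python
import math

def penalize_popularity(score, occurrences):
    # log(1 + n) instead of log(n) so an item seen once does not divide by zero.
    return score / math.log(1 + occurrences)

# Hypothetical (estimated score, number of occurrences) per item.
candidates = {"niche": (4.0, 3), "blockbuster": (4.5, 5000)}
rescored = {name: penalize_popularity(s, n) for name, (s, n) in candidates.items()}
print(sorted(rescored, key=rescored.get, reverse=True))
```

With these numbers the rarely seen item outranks the popular one after rescoring, which is the effect being asked about and also the reason to be careful with it.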

Re: Mahout beginner questions...

2012-03-25 Thread Sean Owen
but a good way to boost up speed could be to use caching recommender, meaning computing the recommendations in advance (refresh it every X min\hours) and always recommend using the most updated recommendations, right?! -Original Message- From: Sean Owen [mailto:sro...@gmail.com] Sent: Sunday

Re: Mahout beginner questions...

2012-03-25 Thread Sean Owen
me if I'm wrong but a good way to boost up speed could be to use caching recommender, meaning computing the recommendations in advance (refresh it every X min\hours) and always recommend using the most updated recommendations, right?! -Original Message- From: Sean Owen

Re: HadoopUtil

2012-03-24 Thread Sean Owen
Why are you posting to Mahout lists, 3 times, if you are asking about Hadoop? Etiquette foul. On Mar 24, 2012 10:41 AM, Bahadır Yılmaz bahadiryi...@gmail.com wrote: Hi everyone, I have a problem with HadoopUtil.overwriteOutput(outPath). In IntelliJ IDEA, I am using a Maven project and

Re: Significant - serendipity in recommending

2012-03-24 Thread Sean Owen
Define significant? On Sat, Mar 24, 2012 at 1:38 PM, ziad kamel ziad.kame...@gmail.com wrote: Dear developers, How can I know that the recommendations I get from Mahout is significant ? Is there a way to know that there is serendipity in recommending using certain recommender than other ?

Re: Mahout beginner questions...

2012-03-22 Thread Sean Owen
1. These are the JDBC-related classes. For example see MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/ 2. The distributed and non-distributed code are quite separate. At this scale I don't think you can use the non-distributed code to a meaningful degree. For example you could

Re: Mahout beginner questions...

2012-03-22 Thread Sean Owen
will be to use model based recommenders. Saying this, I wonder why there is such few model based recommenders, especially considering the fact that Mahout contain several data mining models implemented already? -Original Message- From: Sean Owen [mailto:sro...@gmail.com] Sent: Thursday

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread Sean Owen
That pretty much means what it says = delete temp. On Thu, Mar 22, 2012 at 6:06 PM, jeanbabyxu jessica...@aexp.com wrote: Thanks so much tianwild for pointing out the typo. Now it's running but I got a different error msg: Exception in thread main

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread Sean Owen
Yes. This prevents accidental overwrite, and mimics how Hadoop/HDFS generally act. On Thu, Mar 22, 2012 at 6:58 PM, jeanbabyxu jessica...@aexp.com wrote: I was able to manually clear out the output directory by using bin/hadoop dfs -rmr output. But do we have to remove all content in the

Re: How to add classes into mahout-score-0.5-job.jar?

2012-03-22 Thread Sean Owen
It is wherever you compiled your own classes -- it's up to you. SIMILARITY_EUCLEDEAN_DISTANCE is not a class. You should use 0.6 anyway. While you may find you have to make minor modifications if following the book, it's 99% compatible. On Thu, Mar 22, 2012 at 8:07 PM, jeanbabyxu

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
What do you mean that you have a user-item association from a log-likelihood metric? Combining two values is easy in the sense that you can average them or something, but only if they are in the same units. Log likelihood may be viewed as a probability. The distance function you derive from it --

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
now. Thanks very much, -Ahmed On Thu, Mar 22, 2012 at 5:26 PM, Sean Owen sro...@gmail.com wrote: What do you mean that you have a user-item association from a log-likelihood metric? Combining two values is easy in the sense that you can average them or something, but only

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
Yes, but you can't use it as both things at once. I meant that you swap them at the broadest level -- at your original input. So all items are really users and vice versa. At the least you need two separate implementations, encapsulating two different notions of similarity. Similarity is

Re: Error Running mahout-core-0.5-job.jar

2012-03-21 Thread Sean Owen
It's -Dmapred.output.dir=output not --Dmapred.output.dir=output (one dash), but, that's not even the problem. I don't think you can specify -D options this way, as they are JVM arguments. You need to configure these in Hadoop's config files. This is not specific to Mahout. On Wed, Mar 21, 2012 at

Re: MongoDBDataModel in memory ?

2012-03-20 Thread Sean Owen
If you don't need Hadoop then this is pretty simple. You can just write a nested loop that computes all pairs off an ItemSimilarity implementation. If I recall rightly GenericItemSimilarity will do that for you off an existing ItemSimilarity and then has the results in memory as a new
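The nested loop over all pairs can be sketched as below. This is plain Python rather than Mahout code; the Jaccard-style similarity and the sample data are placeholders for whatever ItemSimilarity you would actually call.

```python
from itertools import combinations

def precompute_similarities(item_ids, similarity):
    # Similarity is symmetric, so each unordered pair is computed once.
    return {(a, b): similarity(a, b) for a, b in combinations(sorted(item_ids), 2)}

# Hypothetical item -> set of users who interacted with it, scored with a Jaccard ratio.
users_by_item = {1: {"u1", "u2"}, 2: {"u2", "u3"}, 3: {"u4"}}

def jaccard(a, b):
    ua, ub = users_by_item[a], users_by_item[b]
    return len(ua & ub) / len(ua | ub)

table = precompute_similarities(users_by_item, jaccard)
print(table)
```

The resulting table is what you would keep in memory; note that it grows quadratically with the number of items.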

Re: multiple Database-based data with Mahout

2012-03-20 Thread Sean Owen
No there is not such support right now. The most useful piece of code would be a DataModel implementation that combines the data in several other DataModels. That would easily let you read from several databases. The hard part there is merging data sets (what if two DBs have data for one

Re: Edit Distance

2012-03-19 Thread Sean Owen
No I don't think that really comes into play in any of the ML algorithms here. At least I do not recall seeing it. On Mon, Mar 19, 2012 at 3:44 PM, Ahmed Abdeen Hamed ahmed.elma...@gmail.com wrote: Hello, Does Mahout have support for Edit Distance between two Strings? I looked on the web

Re: MongoDBDataModel in memory ?

2012-03-18 Thread Sean Owen
Yep it's all in memory -- it would be too slow to access it out of Mongo. The purpose is just making it easy to read and re-read data into Mongo, and facilitate updates. If the data is too big to fit in memory you should look first at pruning your data -- can sampling 10% of it still give you

Re: Export to MongoDB

2012-03-17 Thread Sean Owen
What do you mean by indexed here? On Sat, Mar 17, 2012 at 10:56 PM, Pat Ferrel p...@occamsmachete.com wrote: I need to digest some mahout files and merge them into a MongoDB database. Since digesting would be a lot easier if the mahout keys were indexed I wonder if a seqdumper --format json

Re: ClassNotFoundException while using RecommenderJob

2012-03-15 Thread Sean Owen
You shouldn't have to add anything to your jar, if you use the supplied 'job' file which contains all transitive dependencies. If you do add your own jars, I think you need to unpack and repack them, not put them into the overall jar as a jar file, even with a MANIFEST.MF entry. I am not sure that

Re: ClassNotFoundException while using RecommenderJob

2012-03-15 Thread Sean Owen
that only the clustering and classification parts of mahout are really able to be distributed on a hadoop cluster. 2012/3/15 Sean Owen sro...@gmail.com You shouldn't have to add anything to your jar, if you use the supplied 'job' file which contains all transitive dependencies. If you do add

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
is: is there a way to compute these similarities offline? Thanks very much, -Ahmed On Tue, Mar 6, 2012 at 5:14 PM, Sean Owen sro...@gmail.com wrote: Sure, you just write your own ItemSimilarity implementation based on the content, whatever that may be. what you do there is mostly up to you; there's

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
Before I answer, I want to make sure we're on the same page. You are definitely describing a search problem. Was my guess at how you are also adding in something recommender-related accurate? Otherwise we may be talking past each other again. On Tue, Mar 13, 2012 at 5:35 PM, Ahmed Abdeen Hamed

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
OK, you have some users. You have some items, and those items have attributes. Nothing here connects users to items though, so how can any process estimate any additional user-item connections? You could compute item-item similarities, but that doesn't resolve this. Sorry I am really confused

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
genre, director, actor, and year of release. Using such an implementation within a traditional item This is the part that I am trying to understand and have a solution for. Thanks, -Ahmed On Tue, Mar 13, 2012 at 2:08 PM, Sean Owen sro...@gmail.com wrote: OK, you have some users

Re: The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

2012-03-13 Thread Sean Owen
Yes it's item-based only. --similarityClassname chooses the metric but it is item-based. On Tue, Mar 13, 2012 at 11:53 PM, Rich cchuang...@gmail.com wrote: Hi, I have been digging into Mahout on Hadoop for the past few days. I was wondering the recommendation algorithm that is used in

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
You can implement your own custom ItemSimilarity that computes this metric, or anything else you can imagine. In fact there is already a bit of API in DataModel for storing and retrieving timestamps too, so this should be easy. It's probably a bit easier said than done given the exact logic

Re: Cluster-based recommenders

2012-03-12 Thread Sean Owen
Sure -- to do this, you simply flip your items and users. Feed item IDs as user IDs and vice versa. Then you have a system that recommends users to items, really. And you can use clustering if you like, to do that. In fact you can use any algorithm. Sean On Mon, Mar 12, 2012 at 1:56 PM, Ahmed
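Flipping is just a column swap on the input. A minimal sketch, assuming preferences arrive as (user ID, item ID, value) records; the function name and data are made up for illustration.

```python
def flip(preferences):
    # Each record is (user_id, item_id, value); swap the first two fields.
    return [(item_id, user_id, value) for user_id, item_id, value in preferences]

prefs = [(1, 101, 5.0), (1, 102, 3.0), (2, 101, 4.0)]   # hypothetical data
print(flip(prefs))
```

Feeding the flipped records to any recommender makes it treat items as "users", so what it returns are users recommended to items.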

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
Similarity computations need to be very fast. I don't know if you can pre-compute them since they're time-dependent and I assume need to use up-to-the-second information. You'll need to store something in memory to make this fast enough. That can make scale a problem, but, I am also guessing you

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
OK if that's the case, put the pre-computed values in a GenericItemSimilarity and you're done. Hadoop most certainly does not help you compute anything 'on the fly'. It might help you precompute. Don't worry about distribution until you're sure you have a big scale problem, and that usually takes

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
(It's out there as TanimotoCoefficientSimilarity -- not named JaccardSimilarity or anything.) On Mon, Mar 12, 2012 at 10:59 PM, Ted Dunning ted.dunn...@gmail.com wrote: I would generally recommend using the LLR similarity. But if you have an itch, scratch it.  I do think we have a tanimoto

Re: Trouble with deriving popular items from mahout

2012-03-11 Thread Sean Owen
This isn't a recommender problem -- it's simpler. It sounds like you just want to count the most frequently occurring items, and pairs of items. That's just a question of counting. On Sun, Mar 11, 2012 at 12:32 PM, mahout user mahoutu...@gmail.com wrote: Hello group, I am new to mahout..I am

Re: Trouble with deriving popular items from mahout

2012-03-11 Thread Sean Owen
No, it's so easy you can do it in about 20 lines of code so I don't think it really warrants a software component. On Sun, Mar 11, 2012 at 12:39 PM, mahout user mahoutu...@gmail.com wrote: Thanks Sean Owen,   is it any class available with mahout for doing this stuff?
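A sketch of that counting, assuming the input is a list of per-user (or per-order) item lists. The function name and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_items_and_pairs(baskets):
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts

baskets = [["milk", "bread"], ["milk", "eggs", "bread"], ["eggs"]]  # hypothetical
items, pairs = count_items_and_pairs(baskets)
print(items.most_common(2), pairs.most_common(1))
```

Sorting each basket first means a pair is always stored in the same order, so ("bread", "milk") and ("milk", "bread") are counted together.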

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
If by #3 you mean you have preferences for many users, this is of course the standard input for a recommender, yes. If you also have some user-user similarity info beyond that, you can implement UserSimilarity and use GenericUserBasedRecommender to incorporate that. If you want to boost items

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
, 2012 at 6:25 PM, Sean Owen sro...@gmail.com wrote: If by #3 you mean you have preferences for many users, this is of course the standard input for a recommender, yes. If you also have some user-user similarity info beyond that, you can implement UserSimilarity and use GenericUserBasedRecommender

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
Recommender implementation which blends both item-based and user-based recommendations? On Sat, Mar 10, 2012 at 9:06 PM, Sean Owen sro...@gmail.com wrote: It really depends on what you mean by based on time, as it could mean many things. I'm assuming you mean that an item's seasonality should

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
, 2012 at 9:38 PM, Sean Owen sro...@gmail.com wrote: It sounds like you have substantially a search problem. You know the user's attributes, you know the items' attributes, and are just finding the closest match. That by itself doesn't need a recommender at all; it would just be extra complexity

Re: How/where to run DisplayKMeans example

2012-03-09 Thread Sean Owen
This means you are running on a headless machine without a monitor. The program needs to show a window with graphics but can't. On Mar 9, 2012 6:48 AM, rahul raghavendhra rahulraghavendh...@gmail.com wrote: hi Lance, i tried as u said, but now i got a new exception Exception in thread main

Re: R: Using recommenders with String identifiers

2012-03-09 Thread Sean Owen
In this case, the code in question is the non-distributed code rather than Hadoop. But yes I agree it will make a perhaps bigger difference on Hadoop. All of the Hadoop stuff uses integer keys. On Fri, Mar 9, 2012 at 2:10 AM, Paritosh Ranjan pran...@xebia.com wrote: Are these identifiers used as

Re: why log-likelihood similarity is faster than Tanimoto coefficient

2012-03-08 Thread Sean Owen
I don't expect they are different in speed. Both do about exactly the same thing and finish with a simple computation. On Thu, Mar 8, 2012 at 9:52 AM, Ayad Al-Qershi alqer...@gmail.com wrote: Dear All, can anyone tell me why running the recommender job with log-likelihood similarity performs

Re: Using recommenders with String identifiers

2012-03-08 Thread Sean Owen
No. It used to work this way, but was removed just because you get much better memory and performance using longs. It would be a lot of surgery to undo this. The best answer is to use longs. If you must use strings, IDMigrator does the trick quite well. On Thu, Mar 8, 2012 at 1:27 PM, Claudia
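The general idea of mapping string identifiers to longs and back can be sketched as follows. This is an illustration in the spirit of IDMigrator, not its code: the class name is invented, and the choice of MD5 with the first 8 bytes of the digest is an assumption of this example.

```python
import hashlib
import struct

class StringIdMapper:
    def __init__(self):
        self._reverse = {}          # long ID -> original string

    def to_long(self, string_id):
        digest = hashlib.md5(string_id.encode("utf-8")).digest()
        (long_id,) = struct.unpack(">q", digest[:8])   # first 8 bytes as signed 64-bit
        self._reverse[long_id] = string_id
        return long_id

    def to_string(self, long_id):
        return self._reverse.get(long_id)

mapper = StringIdMapper()
uid = mapper.to_long("user-abc")
print(uid, mapper.to_string(uid))
```

The recommender then works purely on the long values, and the reverse map is consulted only when results are shown to a person.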

Re: packaging a recommender as a war file

2012-03-07 Thread Sean Owen
Yes this doesn't exist as a push-button solution anymore. There is no target that builds a .war. However it's pretty easy to resurrect the script from 0.5, or, simply configure your IDE to build a .war with the Mahout .jar, your .jar, and a one-liner web.xml that configures RecommenderServlet.

Re: override mapreduce compression?

2012-03-07 Thread Sean Owen
The client can override cluster defaults unless the cluster marks them final. On Wed, Mar 7, 2012 at 9:02 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Aren't hadoop site.xml settings on the driver's client usually overshadow whatever it is on the cluster? Or you don't have the privs to change

Re: DistributedRowMatrix - FileNotFoundException

2012-03-07 Thread Sean Owen
DistributedRowMatrix operates on IntWritable,VectorWritable in a sequence file, and it looks like you're feeding text. No, it doesn't accept some text-based format. On Wed, Mar 7, 2012 at 8:41 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ pmjimenez1...@hotmail.com wrote: Sorry but I can't understand how

Re: packaging a recommender as a war file

2012-03-07 Thread Sean Owen
RecommenderService.jws is a JWS file, which is one standard for making SOAP-based web services. RecommenderServlet is a 'raw' servlet wrapper. Both are just wrappers around a Recommender that expose it over HTTP. Neither is quite REST-ful; both are JavaEE, yes. You can do anything you want here

Re: DistributedRowMatrix - FileNotFoundException

2012-03-06 Thread Sean Owen
Your input is still text though, and I assume you're trying to use TextInputFormat. You can't do this as it expects an IntWritable, and that means it expects input as a sequence file, via SequenceFileInputFormat. On Tue, Mar 6, 2012 at 7:21 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
Mapper compression? -Dmapreduce.map.output.compress=false. I think the key was mapred.output.compress in Hadoop 0.20.0. I am not sure if there is reducer compression built-in, but, I could have missed it. On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote:

Re: Injecting content into item-item CF

2012-03-06 Thread Sean Owen
Sure, you just write your own ItemSimilarity implementation based on the content, whatever that may be. what you do there is mostly up to you; there's not a framework for this. On Tue, Mar 6, 2012 at 10:09 PM, Ahmed Abdeen Hamed ahmed.elma...@gmail.com wrote: Hello friends, Is there an example

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
which is why I'm trying to override this param).  Passing -Dkey=value on the mahout command line does not seem to have any effect on the mapreduce job configuration from what I can tell.  Any ideas? -Luke On 3/6/12 3:48 PM, Sean Owen sro...@gmail.com wrote: Mapper compression

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
and in the longterm probably come up with a cleaner way to do this. Thanks! -Luke On 3/6/12 6:24 PM, Sean Owen sro...@gmail.com wrote: -D arguments are to the JVM so need to be set in HADOOP_OPTS (as I recall). Or you configure this in your Hadoop config files. It has no meaning to the driver script. Why

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
in the first place? Here is the header of one of the reducer parts that was written into /mahout/kmeans/clusters-5-final SEQ org.apache.hadoop.io.Text+org.apache.mahout.clustering.kmeans.Cluster )org.apache.hadoop.io.compress.SnappyCodec On 3/6/12 6:33 PM, Sean Owen sro...@gmail.com wrote

Re: Washing machines - Mahout algorithm advice

2012-03-03 Thread Sean Owen
I answered on SO: The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simulation. Each page has some influence (color) which flows via its links (socks it's washed with) and at some point the page influence reaches a steady state (final
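The iterative simulation described here can be sketched as a plain power iteration. This is a minimal Python sketch of standard PageRank, not Mahout code and not necessarily what the SO answer contained; the function name, the damping factor of 0.85, the tolerance and the handling of pages without outgoing links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """Iterate PageRank until the ranks stop changing (steady state).

    links maps each node to the list of nodes it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Influence held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes an equal share of its influence along every link.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```

Calling pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}) returns ranks that sum to 1, with "a" ranked highest because both other nodes link to it.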

Re: Cassandra Data Model

2012-02-29 Thread Sean Owen
CassandraDataModel is not related to HMM. Maybe you could be more specific here. On Feb 29, 2012 4:43 AM, Srinivas Krishnan shrin.krish...@gmail.com wrote: I am currently designing my Data Model for a small cassandra cluster and wanted to incorporate the HMM model from Mahout. I could not find

Re: Cassandra Data Model

2012-02-29 Thread Sean Owen
Data model, which I am guessing is support for mapping on to Columns, SuperColumns etc or am I mistaken ? -srinivas On Wed, Feb 29, 2012 at 9:23 AM, Sean Owen sro...@gmail.com wrote: CassandraDataModel is not related to HMM. Maybe you could be more specific here. On Feb 29, 2012 4:43

Re: Cassandra Data Model

2012-02-29 Thread Sean Owen
to Hadoop. -srinivas On Wed, Feb 29, 2012 at 10:30 AM, Sean Owen sro...@gmail.com wrote: That is for non-distributed recommenders, not using Hadoop. For anything else using Hadoop you use Cassandra by using it as an input to Hadoop. It is not specific to Mahout. On Feb 29, 2012 3:23 PM

Re: Item Recommender Does not read Filedatamodel

2012-02-29 Thread Sean Owen
Caused by: java.lang.IllegalArgumentException: Bad line: 444,25414 This is your problem. On Wed, Feb 29, 2012 at 12:21 PM, VIGNESH PRAJAPATI vignesh2...@gmail.com wrote: Hello Mahout Group, When I run my item-based recommender on the dataset structure given below, it gives me this

Re: problem:while running RecommenderJob over Hadoop

2012-02-28 Thread Sean Owen
Your job file is corrupt or missing. Verify it's there and try rebuilding. On Feb 28, 2012 7:54 AM, manish dunani manishd...@gmail.com wrote: I am a newbie to Mahout. Can anybody help me solve the following error? Whenever I try to run RecommenderJob over Apache Hadoop I get the

Re: problem:while running RecommenderJob over Hadoop

2012-02-28 Thread Sean Owen
this I didn't get any idea. sean owen: Your job file is corrupt or missing. Verify it's there and try rebuilding. I am a newbie to Mahout. Can anybody help me solve the following error? Whenever I try to run RecommenderJob over Apache Hadoop I get the following error:(Reference

Re: Mahout sample datasets for Recommender, classifier and clustering

2012-02-28 Thread Sean Owen
Oh it's very easy: tr ; , in.csv | tr \ out.csv Or something close. On Feb 28, 2012 7:31 PM, VIGNESH PRAJAPATI vignesh2...@gmail.com wrote: Hello Daniel Glauser, Thanks for your suggestion, but I have 2,00,000 rows in my CSV file, so it requires a great deal of modification. For a solution, I want
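The tr command above appears to have lost its quoting and redirection characters in the archive, so its exact form is not recoverable here. As a rough equivalent, the following Python sketch assumes the goal was to turn a semicolon-delimited file into a comma-delimited one and to drop unnecessary quotes around fields; the function name and that reading of the intent are both assumptions.

```python
import csv
import io


def semicolons_to_commas(src, dst):
    """Copy rows from a semicolon-delimited stream to a comma-delimited one."""
    reader = csv.reader(src, delimiter=";")
    # The writer re-quotes a field only when its content requires it.
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Small in-memory demonstration; real use would pass open file objects.
source = io.StringIO('1;10;"4.5"\n2;20;3.0\n')
target = io.StringIO()
semicolons_to_commas(source, target)
print(target.getvalue(), end="")
```

For files on disk, open both with newline="" as the csv module documentation recommends. Rows are streamed one at a time, so a file with a few hundred thousand rows does not need to fit in memory.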

Re: Documentation error for GenericUserPreferenceArray?

2012-02-27 Thread Sean Owen
Definitely a typo in the second passage. I'll fix when I get home unless someone beats me to it. On Feb 27, 2012 3:35 PM, Don Smith dsm...@likewise.com wrote: The documentation for GenericUserPreferenceArray says Like {@link GenericItemPreferenceArray} but stores preferences for one user (all

Re: Support of HBase

2012-02-16 Thread Sean Owen
I think this thread is talking about at least 4 different things. 1. There is no HBaseDataModel for non-distributed code, which would presumably use the HBase driver, but there could be one, just as there is a CassandraDataModel. That's what I was talking about. 2. You could use a JDBC driver for HBase with

Re: Mahout Hosting Provider

2012-02-16 Thread Sean Owen
No, it's a library that you run where you like. There's no hosting for it per se but yeah you could run on Amazon. On Thu, Feb 16, 2012 at 8:30 AM, VIGNESH PRAJAPATI vignesh2...@gmail.com wrote: Hi Folks,  I am new to mahout.I want to know that is there any mahout hosting provider for Apache

Re: Update Mahout Wiki with latest Mahout Versions

2012-02-16 Thread Sean Owen
Hmm. I updated it in SVN and thought our fancy new svnpubsub system was supposed to push that for us. I'll ask if there's something else we need to do. On Thu, Feb 16, 2012 at 5:17 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Could someone update the Mahout wiki - http://mahout.apache.org
