Re: [ANNOUNCE] Andrew Musselman, New Mahout PMC Chair

2018-07-19 Thread Sebastian Schelter
Congrats! 2018-07-19 9:31 GMT+02:00 Peng Zhang : > Congrats Andrew! > > On Thu, Jul 19, 2018 at 04:01 Andrew Musselman > > wrote: > > > Thanks Andy, looking forward to it! Thank you too for your support and > > dedication the past two years; here's to continued progress! > > > > Best > > Andrew

Re: UnsatisfiedLinkError: jniViennaCL

2017-05-10 Thread Sebastian Lehrig
to provide a non-viennacl variant as a fall-back for environments currently unsupported? Thanks and regards, Sebastian What I did: 1) installed viennacl brew install viennacl 2) installed gcc brew install gcc --without-multilib 3) modified pom.xml changed: linux-x86_64-viennacl.properties to: macosx

UnsatisfiedLinkError: jniViennaCL

2017-05-09 Thread Sebastian Lehrig
la version 2.10.5 - macOS Sierra 10.12.4 (MacBook Pro, Retina, 13-inch, Early 2015) - graphics card: Intel Iris Graphics 6100 1536 MB Thanks and regards, Sebastian

Re: New site and logo

2017-04-24 Thread Sebastian Feher
unsubscribe On Monday, April 24, 2017 9:08 AM, LuckyBoy wrote: unsubscribe On Sun, Apr 23, 2017 at 10:59 AM, Pat Ferrel wrote: > The Mahout site is moving to Jekyll with a bit of a new look and so it > might be nice to get an update of

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-30 Thread Sebastian
in a distributed setting (task scheduling, serialization...) totally dominate the computation. Best, Sebastian On 30.09.2016 11:11, Arnau Sanchez wrote: Hi! Here you go: "ratings-clean" contains only pairs of (user, product) for those products with 4 or more user interactions (770k ->

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-29 Thread Sebastian
Hi Arnau, I had a look at your ratings file and it's kind of strange. It's pretty tiny (770k ratings, 8MB), but it has more than 250k distinct items. Out of these, only 50k have more than 3 interactions. So I think the first thing that you should do is throw out all the items with so few
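
The filtering step suggested above could look roughly like the following. This is a minimal plain-Python sketch, not Mahout code; the function name drop_rare_items and the toy data are made up for illustration, and the threshold of 4 interactions is taken from the "more than 3 interactions" remark in the thread.

```python
from collections import Counter


def drop_rare_items(interactions, min_interactions=4):
    """Keep only (user, item) pairs whose item occurs at least min_interactions times.

    Hypothetical helper for illustration; interactions is a list of (user, item) pairs.
    """
    counts = Counter(item for _, item in interactions)
    return [(user, item) for user, item in interactions
            if counts[item] >= min_interactions]


# Toy data: item "a" has 4 interactions, items "b" and "c" have 1 each.
pairs = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u4", "a"),
         ("u1", "b"), ("u2", "c")]
filtered = drop_rare_items(pairs, min_interactions=4)
```

Only the pairs for item "a" survive here; on a real ratings file the same pass would run once before the similarity computation.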

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-29 Thread Sebastian
Hi Arnau, The links to your logfiles don't work for me unfortunately. Are you sure you set up Spark correctly? That can be a bit tricky in YARN settings, sometimes one machine idles around... Best, Sebastian On 25.09.2016 18:01, Pat Ferrel wrote: AWS EMR is usually not very well suited

Re: Scaling up spark Iitem similarity on big data data sets

2016-06-23 Thread Sebastian
to increase the amount of downsampling. Another thing you can do is to remove the items with the highest amount of interactions from the dataset as they are not very interesting usually (everybody knows the topsellers already) and heavily impact the computation. Best, Sebastian On 23.06.2016 15:47

Re: Recommendation Engine based on Content Filtering

2016-05-06 Thread Sebastian
Please don't post data from real users on this list without their consent. On 06.05.2016 13:26, Sree Eedupuganti wrote: From the below sample data, i have to recommend user based on email any suggestions please. leticia9j...@gmail.com861102Associationshrutn...@aol.com

Re: Does mahout 0.5 fit hadoop-0.20.2?

2014-06-25 Thread Sebastian Schelter
Please use a recent version of mahout. 0.4 and 0.5 are totally outdated. -s On 06/25/2014 09:05 AM, seabiscuit08 wrote: Hi everyone, i am new in mahout. Our hadoop cluster is hadoop-0.20.2 ,i try out mahout-distribution-0.4 lda function, and it works well. But It can't inference new document

Re: divide a vector (sum) by a double, error

2014-06-16 Thread Sebastian Schelter
It's also not a good idea to put the vectors into a hashset, I don't think we have equals and hashcode correctly implemented for that On 16.06.2014 18:21, Ted Dunning ted.dunn...@gmail.com wrote: Patrice, This sounds like a classpath problem more than code error. Are you sure that you can

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
You should not use Hadoop for such a tiny dataset. Use the GenericItemBasedRecommender on a single machine in Java. --sebastian On 06/06/2014 11:10 AM, Warunika Ranaweera wrote: Hi, I am using Mahout's recommenditembased algorithm on a data set with nearly 10,000 (implicit) user ratings

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
1M ratings take up something like 20 megabytes. This is a datasize where it does not make any sense to use Hadoop. Just try the single machine implementation. --sebastian On 06/06/2014 12:01 PM, Warunika Ranaweera wrote: Hi Sebastian, Thanks for your prompt response. It's just a sample
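
As a back-of-the-envelope check of the "something like 20 megabytes" figure, here is a small sketch. The per-rating layout (an 8-byte user ID, an 8-byte item ID and a 4-byte preference value) is an assumption made for this estimate, not something stated in the thread, and it ignores any container overhead.

```python
# Assumed layout of one rating: 64-bit user ID + 64-bit item ID + 32-bit value.
BYTES_PER_RATING = 8 + 8 + 4


def approx_size_mb(num_ratings):
    """Rough raw size of num_ratings ratings in megabytes (10**6 bytes)."""
    return num_ratings * BYTES_PER_RATING / 1_000_000


size_mb = approx_size_mb(1_000_000)
```

Under these assumptions one million ratings come to 20 MB, which fits comfortably in the memory of a single machine.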

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
thought of moving to Mahout. However, it seems like, for now, it's better to go with the single machine implementation. Thanks for your suggestions, Warunika On Fri, Jun 6, 2014 at 3:36 PM, Sebastian Schelter s...@apache.org wrote: 1M ratings take up something like 20 megabytes

Re: Indicator Matrix and Mahout + Solr recommender

2014-05-27 Thread Sebastian Schelter
containing an item with less than n interactions can be ignored. IIRC similar techniques are implemented for cosine and jaccard. Best, Sebastian On 05/27/2014 07:08 PM, Pat Ferrel wrote: On May 27, 2014, at 8:15 AM, Ted Dunning ted.dunn...@gmail.com wrote: The threshold should not normally

Re: Theory behind LogisticRegression in Mahout

2014-05-23 Thread Sebastian Schelter
We should add these links to the LR page on the website. --s On 05/23/2014 03:20 PM, Ted Dunning wrote: Ahh... my error then. Happily, Dmitriy and others have provided the requisite links. On Thu, May 22, 2014 at 11:50 PM, namit maheshwari namitmaheshwa...@gmail.com wrote: No I didnt find

Re: Setting mahout heapsize for rowsimilarity job

2014-05-23 Thread Sebastian Schelter
I don't think you should use RowSimilarity job for that case, if you only have 6 columns. Can you tell us a little bit about the data and what problem you are trying to solve? --sebastian On 05/23/2014 09:03 PM, Suneel Marthi wrote: I had seen this issue too with RSJ until 0.8. Switch

Re: Mahout recommendation in implicit feedback situation

2014-05-05 Thread Sebastian Schelter
0 or 1. --sebastian On 05/03/2014 05:00 PM, Alessandro Suglia wrote: Sorry Sebastian, maybe you haven't the possibility to read the post on SO, so I'll report the code here. I've already used the GenericBooleanPrefUserBasedRecommender in order to generate the recommendation and the results

Re: Fwd: Mahout Naive Bayes CSV Classification

2014-05-04 Thread Sebastian Schelter
/classification/bayesian.html --sebastian On 05/03/2014 07:40 PM, Jossef Harush wrote: I have these 2 CSV files: 1. train-set.csv 2. test-set.csv Both of them are in the same structure (with different content) and similar to this example (http://i.stack.imgur.com/jsckr.png) : [image: enter

Re: Mahout recommendation in implicit feedback situation

2014-05-03 Thread Sebastian Schelter
Hi Alessandro, what result do you expect and what do you get? Can you give a concrete example? --sebastian On 05/03/2014 12:11 PM, Alessandro Suglia wrote: Good morning, I've tried to create a recommender system using Mahout in an implicit feedback situation. What I'm trying to do

Re: Mahout recommendation in implicit feedback situation

2014-05-03 Thread Sebastian Schelter
You should try the org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender which has been built to handle such data. Best, Sebastian On 05/03/2014 04:34 PM, Alessandro Suglia wrote: I have described it in the SO's post: When I execute this code, the result

Re: Future of Frequent Pattern Mining

2014-05-01 Thread Sebastian Schelter
what we already aimed for in the 0.9 release and remove it. I'll prepare a patch. --sebastian On 04/28/2014 10:52 AM, Ted Dunning wrote: One thought is to extract the code, publish on github with warnings about no support. Then if there are requests, we can point them to the GH archive

Future of Frequent Pattern Mining

2014-04-28 Thread Sebastian Schelter
. I'd like to ask our users here for their opinion: is anybody opposed to removing the frequent pattern mining code from Mahout? Please shout out. --sebastian

Re: Future of Frequent Pattern Mining

2014-04-28 Thread Sebastian Schelter
questions and providing documentation. If someone opposes here who has that code in production, that could be a reason to retain it however. People wanting to use the code in the future can always download Mahout 0.9 which has the current implementation. --sebastian On 04/28/2014 08:23 AM, Michael

Re: Reading the wiki

2014-04-28 Thread Sebastian Schelter
Would someone be willing to open a jira ticket for this issue and fix the problem? --sebastian On 04/28/2014 01:05 AM, Ted Dunning wrote: Mathjax is both static content and server. There is an FAQ about this https problem. I think that part of the issue is that they don't use the same URL

Re: Future of Frequent Pattern Mining

2014-04-28 Thread Sebastian Schelter
. On Mon, Apr 28, 2014 at 2:19 AM, Sebastian Schelter s...@apache.org wrote: Hi, I'm resending this mail to also include the users list. To wrap up: We currently have a discussion whether our frequent pattern mining package should stay in the codebase. The original author suggested to remove

Re: Reading the wiki

2014-04-27 Thread Sebastian Schelter
What if we store a copy of the js file on our site and also serve it via https? On 04/27/2014 05:34 AM, Pat Ferrel wrote: Often CMSs have a way to configure https access to be used only for password or other secure areas of the site. No idea if the Apache CMS does this but worth asking. If

Welcome Pat Ferrel as new committer on Mahout

2014-04-24 Thread Sebastian Schelter
Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Pat Ferrel to become committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA it

Re: Spark Mahout with a CLI?

2014-04-20 Thread Sebastian Schelter
).columnIds()) RecommendationExamplesHelper.saveIndicatorMatrix(hashedCrossIndicatorMatrix, "hdfs://some/path/for/output") On Apr 16, 2014, at 10:00 AM, Pat Ferrel p...@occamsmachete.com wrote: Great, and an excellent example is at hand. In it I will play the user and contributor role, Sebastian

Re: org.apache.mahout.math.IndexException

2014-04-20 Thread Sebastian Schelter
Yes, it should give you the necessary information. The important part is this: Apply the patch with patch -p 0 -i <path to patch>. Throw a --dry-run on there if you want to see what happens w/o screwing up your checkout. On 04/20/2014 09:47 PM, Mario Levitin wrote: Thanks Sebastian, I have

Re: simple idea for improving mahout docs over the next month?

2014-04-18 Thread Sebastian Schelter
Hm, I'm not so sure whether introducing another source for documentation than the webpage would be so helpful (there is still lots of work to do on the website...), how do others see this? --sebastian On 04/17/2014 05:06 PM, Jay Vyas wrote: Hi sebastian: theoretically, one could extract all

Re: Performance Issue using item-based approach!

2014-04-18 Thread Sebastian Schelter
the recommender via a search engine. That makes a lot of code simply vanish. This is also a well-trod production path. On Thu, Apr 17, 2014 at 3:57 AM, Najum Ali naju...@googlemail.com wrote: @Sebastian wow … you are right. The original csv file is about 21mb and the corresponding precomputed

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
Which version do you use, it shouldn't be a problem with oracle java. --sebastian On 04/18/2014 09:39 PM, Christopher Eugene wrote: Hello, I want to install mahout on Ubuntu 14.04. I had previously tried in vain to install on 13.10. Could the version of Java be the problem? I am compiling

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
That is wrong, but you could use a server such as PredictionIO (which uses Mahout internally) with PHP. --sebastian On 04/18/2014 09:49 PM, Christopher Eugene wrote: @sebastian I have version 1.7. @Andrew I plan on using mahout with php since I heard that there is a new API or am I wrong

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
You can, but I'm not sure how much we can help you. Give it a try :) On 04/18/2014 10:11 PM, Christopher Eugene wrote: sorry I thought I replied to it :). I can ask predictionio related questions on the list too? On Fri, Apr 18, 2014 at 11:06 PM, Sebastian Schelter s...@apache.org wrote

Re: org.apache.mahout.math.IndexException

2014-04-18 Thread Sebastian Schelter
Hi Mario, this is indeed a bug. The problem is that the CF code (taste) uses long ids, while our math library internally uses int keys. I'll open a jira and post patch that will hopefully help you. --sebastian On 04/18/2014 11:03 PM, Mario Levitin wrote: In my dataset ID's are strings so I

Re: org.apache.mahout.math.IndexException

2014-04-18 Thread Sebastian Schelter
Mario, could you check whether the patch from https://issues.apache.org/jira/browse/MAHOUT-1517 fixes your problem? Best, Sebastian On 04/18/2014 11:03 PM, Mario Levitin wrote: In my dataset ID's are strings so I use MemoryIDMigrator. This migrator produces large longs. I'm not doing any

Re: simple idea for improving mahout docs over the next month?

2014-04-17 Thread Sebastian Schelter
used, how you did the precomputation and how you exactly measure the response time? --sebastian On 04/17/2014 10:49 AM, Najum Ali wrote: Hi guys, I´m pretty much new to mahout and I´m working with this problem here: I have created a precomputed item-item-similarity collection

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
Could you take the output of the precomputation, feed it into a standalone recommender and test it there? On 04/17/2014 11:37 AM, Najum Ali wrote: @sebastian Are you sure that the precomputation is done only once and not in every request? Yes, a @Bean annotated Object is in Spring per

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
Yes, just to make sure the problem is in the mahout code and not in the surrounding environment. On 04/17/2014 11:43 AM, Najum Ali wrote: @Sebastian What do u mean with a standalone recommender? A simple offline java main program? On 17.04.2014 at 11:41, Sebastian Schelter s

Re: Is there any website documentation repository or tool for Apache Mahout?

2014-04-17 Thread Sebastian Schelter
The templates for the individual pages are in the svn under site/ in markdown format. You can use an online markdown editor to see approximately what they look like. We don't have a better solution yet, unfortunately. --sebastian On 17.04.2014 20:09, Andrew Musselman andrew.mussel

Re: simple idea for improving mahout docs over the next month?

2014-04-16 Thread Sebastian Schelter
Hi Jay, I'm not sure what the benefit of this approach is, people can already post their questions to the mailing list and get answers here, why would a google doc be helpful? --sebastian On 04/16/2014 09:31 PM, Jay Vyas wrote: hi mahout... i finally thought of a really easy way of ad-hoc

Documentation, Documentation, Documentation

2014-04-13 Thread Sebastian Schelter
related jira's for 1.0: https://issues.apache.org/jira/browse/MAHOUT-1441?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Documentation%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC Best, Sebastian

Re: PreferenceArray userID uniqeness?

2014-04-11 Thread Sebastian Schelter
Yes, it's a unique identifier for a user. --sebastian On 04/11/2014 04:41 PM, Mike Summers wrote: Does the userId of a preferenceArray need to be unique across all entries in a FastByIDMap? I'm comparing two types of objects that contain the same set of traits however it's possible

Re: Best practice for partial cartesian product

2014-04-08 Thread Sebastian Schelter
I don't know a good name for that. The problem is that a quadratic number of pairs needs to be emitted here. In our collaborative filtering code, we solve this through downsampling. --sebastian On 04/08/2014 10:08 AM, Reinis Vicups wrote: Hi, this is not mahout question directly, but I
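
To illustrate how downsampling bounds the quadratic blow-up, here is a minimal plain-Python sketch. It is not the sampleDown method from RowSimilarityJob; the function names, the per-user cap and the toy data are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def downsample(items, max_items, rng):
    """Return at most max_items elements of items, chosen uniformly at random."""
    if len(items) <= max_items:
        return list(items)
    return rng.sample(list(items), max_items)


def cooccurrence_pairs(items_by_user, max_items_per_user, seed=42):
    """Yield item pairs per user after capping each user's history.

    With a cap of m, a user contributes at most m * (m - 1) / 2 pairs,
    no matter how long the original history is.
    """
    rng = random.Random(seed)
    for items in items_by_user.values():
        kept = sorted(downsample(items, max_items_per_user, rng))
        yield from combinations(kept, 2)


# One heavy user with 1000 items and one light user with 3 items.
history = {"u1": list(range(1000)), "u2": [1, 2, 3]}
pairs = list(cooccurrence_pairs(history, max_items_per_user=10))
```

Without the cap the heavy user alone would emit 499,500 pairs; with a cap of 10 the whole toy data set emits 48.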

Re: Best practice for partial cartesian product

2014-04-08 Thread Sebastian Schelter
Have a look at the sampleDown method in RowSimilarityJob: https://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java?view=markup On 04/08/2014 10:33 AM, Reinis Vicups wrote: Sebastian, thank your very much for your

Re: Can any one help

2014-04-08 Thread Sebastian Schelter
It seems there is a problem with your hdfs, how did you configure that? --sebastian On 04/08/2014 07:23 PM, Neetha wrote: Hi, I am trying to run Mahout -kmeans clustering on hadoop, but I am getting this error, hduser3@ubuntu:/usr/local/hadoop-1.0.1/mahout3$ bin/mahout seqdirectory \-i

Re: Solr+Mahout Recommender Demo Site

2014-04-06 Thread Sebastian Schelter
The top 3 recommendations based on videos you liked are very good! Nice job. On 04/06/2014 07:26 PM, Pat Ferrel wrote: After having integrated several versions of the Mahout and Myrrix recommenders at fairly large scale. I was interested in solving three problems that these did not directly

Re: Number of features for ALS

2014-03-30 Thread Sebastian Schelter
Use k-fold cross-validation or hold-out tests for estimating the quality of different parameter combinations. --sebastian On 03/30/2014 11:53 AM, Niklas Ekvall wrote: Hi, My name is Niklas Ekvall and I have a implementation of the recommender algorithm Large-scale Parallel Collaborative
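
A rough sketch of k-fold cross-validation for comparing parameter values follows. It is plain Python, independent of Mahout; k_fold_splits, pick_best and the evaluate callback are hypothetical names, and the toy error function merely stands in for training a model and measuring its error on the held-out fold.

```python
def k_fold_splits(data, k):
    """Yield (train, test) pairs; each element lands in exactly one test fold.

    Assumes data has already been shuffled.
    """
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def pick_best(candidates, data, k, evaluate):
    """Return the candidate with the lowest mean held-out error.

    evaluate(params, train, test) is a caller-supplied function returning an error.
    """
    def mean_error(params):
        errors = [evaluate(params, train, test)
                  for train, test in k_fold_splits(data, k)]
        return sum(errors) / len(errors)

    return min(candidates, key=mean_error)


# Toy stand-in: pretend the error is smallest at 20 features.
def toy_error(num_features, train, test):
    return abs(num_features - 20)


data = list(range(10))
splits = list(k_fold_splits(data, 5))
best = pick_best([5, 10, 20, 50], data, 5, toy_error)
```

In a real setting the candidates would be combinations of parameters such as the number of features and the regularization strength, and the error would be something like RMSE on the held-out ratings.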

Re: (help!) Can someone scan this

2014-03-29 Thread Sebastian Schelter
Jay, which version of Mahout are you using? Have you tried to explicitly set the temp path? --sebastian On 03/29/2014 01:52 AM, Jay Vyas wrote: Hi again mahout: Im wrapping a distributed recommender like this: https://raw.githubusercontent.com/jayunit100/bigpetstore/master/src/main/java

Re: The 3 distributed recommenders

2014-03-28 Thread Sebastian Schelter
Hi Jay, there's not much documentation unfortunately. We're in the process of creating that however. We removed the pseudo-distributed recommender, mainly because nobody ever used it. There are two research papers that could help you with understanding the other two distributed recommenders:

Number of features for ALS

2014-03-27 Thread Sebastian Schelter
Hi, does anyone know of a principled approach to choosing the number of features for ALS (other than cross-validation)? --sebastian

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
Hi Bhargav, you are right, the content on the page is outdated and contains some errors. I've created a jira ticket to fix this [1]. Thank you for reporting the problem! [1] https://issues.apache.org/jira/browse/MAHOUT-1485 On 03/24/2014 04:41 AM, Bhargav Golla wrote: Hi I was wondering

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
The webapp in Mahout does not offer much functionality. If you'd like to use Mahout via a webinterface, I suggest you either use predictionIO [1] or kornakapi [2]. Best, Sebastian [1] http://prediction.io [2] http://ssc.io/a-recommendation-webservice-in-10-minutes/ On 03/24/2014 02:29 PM

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
: It was removed in 0.9 and am not sure if it was there in 0.8. I vaguely remember removing it in 0.9 based on a conversation with Manuel on user@. Manuel, if u could chime in here. On Monday, March 24, 2014 9:44 AM, Sebastian Schelter s...@apache.org wrote: The webapp in Mahout does not offer much

Re: Problem with K-Means clustering on Amazon EMR

2014-03-23 Thread Sebastian Schelter
Hi Konstantin, Great to see that you located the error. Could you open a jira issue and submit a patch that contains an updated error message? Thank you, Sebastian On 03/23/2014 02:57 PM, Konstantin Slisenko wrote: Hi! I investigated the situation. RandomSeedGenerator ( http://grepcode.com

Documentation, documentation, documentation

2014-03-22 Thread Sebastian Schelter
we have, so we can move on to new and exciting developments in Mahout! --sebastian

Re: Documentation, documentation, documentation

2014-03-22 Thread Sebastian Schelter
, Sebastian Schelter s...@apache.org wrote: Hi, It's great to see a lot of work being spent on cleaning up the website. I think we have already done a great job here, but there are still a few more pages that need work. I created a jira issue for every single page that needs some work, would

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Sebastian Schelter
I've also encountered a similar error once. It's really just the FileSystem.get call that needs to be modified. I think it's a good idea to walk through the codebase and refactor this where necessary. --sebastian On 03/16/2014 05:16 PM, Andrew Musselman wrote: Another wild guess, I've had

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter
Sebastian Schelter ssc.o...@googlemail.com: Those are autogenerated. On 03/13/2014 09:05 AM, Kevin Moulart wrote: Ok it does compile with maven in eclipse as well, but still, many imports are not recognized in the sources : - import org.apache.mahout.math.function.IntObjectProcedure; - import

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter
Are you executing maven in the topmost directory? On 03/13/2014 10:09 AM, Kevin Moulart wrote: I did, but then it fails because of these missing files : https://gist.github.com/kmoulart/9524828 Kévin Moulart 2014-03-13 9:57 GMT+01:00 Sebastian Schelter s...@apache.org: Maven should generate

Re: verbose output

2014-03-13 Thread Sebastian Schelter
To my knowledge, there is no such flag for mahout. You can check hadoop's logs for further information however. On 03/13/2014 10:21 AM, Mahmood Naderan wrote: Hi, Is there any verbosity flag for hadoop and mahout commands? I can not find such thing in the command line. Regards, Mahmood

Re: Website, urgent help needed

2014-03-13 Thread Sebastian Schelter
Hi Scott, Create a jira ticket and attach your scripts and a text version of the page there. Best, Sebastian On 03/12/2014 03:27 PM, Scott C. Cote wrote: I took the tour of the text analysis and pushed through despite the problems on the page. Commiters helped me over the hump where

Re: Problem with FileSystem in Kmeans

2014-03-12 Thread Sebastian Schelter
Hi Bikash, Have you tried adding hdfs:// to your input path? Maybe that helps. --sebastian On 03/11/2014 11:22 AM, Bikash Gupta wrote: Hi, I am running Kmeans in cluster where I am setting the configuration of fs.hdfs.impl and fs.file.impl before hand as mentioned below conf.set

Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
will walk away frustrated, because the website does not help them as it should. Best, Sebastian PS: I will make my standpoint on whether Mahout should do a 1.0 release depend on whether we manage to clean up and maintain our documentation.

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
We don't exactly have that page, but we have pages that touch parts of it, such as https://mahout.apache.org/users/basics/creating-vectors-from-text.html It would be great if you could create a jira ticket which lists the errors. I'll fix them then. Best, Sebastian On 03/12/2014 08:42 AM

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
along the lines of Cleaning up the documentation for k-Means on the website. Put a list of errors and corrections into the jira and I (or some other committer) will make sure to fix the website. Thanks, Sebastian On 03/12/2014 08:48 AM, Pavan Kumar N wrote: i ll help with clustering algorithms

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
doc), please create a jira ticket in our issue tracker with a title along the lines of Cleaning up the documentation for Naive Bayes on the website. Put a list of errors and corrections into the jira and I (or some other committer) will make sure to fix the website. Best, Sebastian On 03/12

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
improving the javadoc, it's totally fine if your English is not perfect, we can always ask a native speaker to read over it. If you start working on the javadoc, please create a jira issue for that work before you start. Best, Sebastian On 03/12/2014 09:30 AM, Kevin Moulart wrote: I can confirm

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
that there is no ticket existing for that. If it isn't, create a jira ticket with the name of the page in the title. --sebastian On 03/12/2014 11:20 AM, pramit choudhary wrote: Hi All, I would also like to participate in cleaning up the documentation. Since, I am fairly new to the Mahout infrastructure

Re: Few questions about SVM configuration in Mahout

2014-03-10 Thread Sebastian Schelter
Hi Quentin, Mahout does not have SVMs. Best, Sebastian On 03/10/2014 10:38 AM, Quentin-Gabriel Thurier wrote: Hi all, Just few questions about the configuration of an SVM in Mahout : - Is it possible to do a multi-class classification ? - Which kernels are already available (linear

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-09 Thread Sebastian Schelter
Hi Koji, I've added a link to your article to our website: https://mahout.apache.org/general/books-tutorials-and-talks.html On 03/07/2014 03:29 AM, Koji Sekiguchi wrote: Hello, I just posted an article on Comparing Document Classification Functions of Lucene and Mahout.

Re: Heap space

2014-03-09 Thread Sebastian Schelter
I usually do trial and error. Start with some very large value and do a binary search :) --sebastian On 03/09/2014 01:30 PM, Mahmood Naderan wrote: Excuse me, I added the -Xmx option and restarted the hadoop services using sbin/stop-all.sh sbin/start-all.sh however still I get heap size error
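
The "start large and binary search" approach can be sketched as below. This is plain Python for illustration only: job_succeeds is a hypothetical callback standing in for actually running the job with a given -Xmx value, and the sketch assumes that if the job succeeds with some heap size it also succeeds with any larger one.

```python
def smallest_working_heap_mb(job_succeeds, low_mb, high_mb):
    """Return the smallest heap size in [low_mb, high_mb] for which job_succeeds is True.

    Assumes success is monotonic in the heap size and that high_mb is large
    enough to succeed.
    """
    if not job_succeeds(high_mb):
        raise ValueError("even the upper bound fails; start with a larger value")
    while low_mb < high_mb:
        mid = (low_mb + high_mb) // 2
        if job_succeeds(mid):
            high_mb = mid      # mid works, so the answer is mid or smaller
        else:
            low_mb = mid + 1   # mid fails, so the answer is larger than mid
    return high_mb


# Stand-in for real runs: pretend the job needs at least 3000 MB.
needed = smallest_working_heap_mb(lambda mb: mb >= 3000, 256, 16384)
```

Each probe corresponds to one full run of the job, so in practice one would stop as soon as the remaining interval is small enough rather than narrowing it down to a single megabyte.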

Re: Welcome Andrew Musselman as new comitter

2014-03-08 Thread Sebastian Schelter
. --sebastian On 03/07/2014 06:41 PM, Pavan Kumar N wrote: Congratulations to Andrew. Would be nice to have some information/background on how PMC evaluated Andrew to become committer. Also would be nice what future aspects/algorithms of machine learning is mahout is going to focus on. I have been

Welcome Andrew Musselman as new comitter

2014-03-07 Thread Sebastian Schelter
words :) Sebastian

Re: Rework our website

2014-03-06 Thread Sebastian Schelter
Thank you very much! Could you create a jira ticket and post the links there? That would be awesome, then we can track that this stuff gets fixed. Best, Sebastian On 03/06/2014 02:58 PM, Kevin Moulart wrote: Hi I also prefer the second one. While I'm at it, there are several links that point

Re: Rework our website

2014-03-06 Thread Sebastian Schelter
? On Thursday, March 6, 2014 9:07 AM, Sebastian Schelter s...@apache.org wrote: Thank you very much! Could you create a jira ticket and post the links there? That would be awesome, then we can track that this stuff gets fixed. Best, Sebastian On 03/06/2014 02:58 PM, Kevin Moulart wrote: Hi I

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
Hi Juan, that is a good catch. CandidateItemsStrategy is the right place to implement this. Maybe we should simply extend its interface to add a parameter that says whether to keep or remove the current users items? We could even do this in the abstract base class then. --sebastian On 03

Rework our website

2014-03-05 Thread Sebastian Schelter
://people.apache.org/~ssc/mahout/mahout2.jpg Let me know what you think! Best, Sebastian

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
On 03/05/2014 01:23 PM, Juan José Ramos wrote: Thanks for the reply, Sebastian. I am not sure if that should be implemented in the Abstract base class though because for instance PreferredItemsNeighborhoodCandidateItemsStrategy, by definition, it returns the item not rated by the user and rated

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
item-based recommenders will not recommend any item that is not similar to at least one of the items the user interacted with, so AllSimilarItemsStrategy already selects the maximum set of items that could be potentially recommended to the user. --sebastian On 03/05/2014 05:38 PM, Tevfik
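
The candidate set described above can be sketched in a few lines. This is a plain-Python illustration, not the code of AllSimilarItemsStrategy; the function name, the exclude_known flag and the toy similarity table are assumptions made for the example.

```python
def candidate_items(user_items, similar_items, exclude_known=True):
    """Collect every item similar to at least one item the user interacted with.

    similar_items maps an item to the set of items it is similar to.
    exclude_known controls whether the user's own items are dropped.
    """
    candidates = set()
    for item in user_items:
        candidates.update(similar_items.get(item, ()))
    if exclude_known:
        candidates -= set(user_items)
    return candidates


# Toy similarity table; item "f" is only similar to "e", which the user never touched.
similar = {"a": {"b", "c"}, "b": {"a", "d"}, "e": {"f"}}
result = candidate_items({"a", "b"}, similar)
with_known = candidate_items({"a", "b"}, similar, exclude_known=False)
```

Item "f" never becomes a candidate because it is not similar to anything in the user's history, which is the point made in the message: no larger candidate set could change the final recommendations.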

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
there. --sebastian On 03/05/2014 06:01 PM, Tevfik Aytekin wrote: It can even make things worse in SVD-based algorithms for which preference estimation is very fast. On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Hi Sebastian, But in order not to select items

Re: Rework our website

2014-03-05 Thread Sebastian Schelter
At the moment, only committers can change the website unfortunately. If you have a text to add, I'm happy to work it in and add your name to our contributors list in the CHANGELOG. Best, Sebastian On 03/05/2014 04:58 PM, Scott C. Cote wrote: I had recently taken the text tour of mahout

Re: Mahout-232-0.8.patch using

2014-03-04 Thread Sebastian Schelter
I think you should rather choose a different library that already offers an SVM than trying to revive a 4 year old patch. --sebastian On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am new user of Mahout and want to run sample SVM algorithm with Mahout. Can you please list me steps to use

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already known items should be recommended or not. What do you think? Best, Sebastian On 03/04/2014 05:32 PM, Pat Ferrel wrote: I’d suggest a command line option if you want

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
That's fine, I was talking about the non-distributed part only. This page has instructions on how to create patches: https://mahout.apache.org/developers/how-to-contribute.html Let me know if you need more infos! Best, Sebastian On 03/05/2014 12:27 AM, Mario Levitin wrote: I have created

Re: Issue updating a FileDataModel

2014-03-03 Thread Sebastian Schelter
your data to a database. --sebastian On 03/02/2014 11:11 PM, Juan José Ramos wrote: I am having issues refreshing my recommender, in particular with the DataModel. I am using a FileDataModel and a GenericItemBasedRecommender that also has a CachingItemSimilarity wrapping a FileItemSimilarity

Re: classification in standalone application in Apache Mahout 0.9

2014-03-03 Thread Sebastian Schelter
If you don't want to call a shell, I assume you don't want to use a Hadoop cluster, right? In that case, you should rather try Mahout's logistic regression classifier, which is tuned for usage on a single machine. --sebastian On 03/03/2014 03:07 PM, Hollow Quincy wrote: I am looking

Re: classification in standalone application in Apache Mahout 0.9

2014-03-03 Thread Sebastian Schelter
It's certainly possible to run Hadoop on a single machine, but it will give you terrible performance. We don't have a single machine implementation of naive bayes, so I'd really suggest you use the logistic regression code. --sebastian On 03/03/2014 03:15 PM, Hollow Quincy wrote: You

Re: Issue updating a FileDataModel

2014-03-03 Thread Sebastian Schelter
I think it depends on the difference between the time of the call to refresh() and the last modified time of the file. --sebastian On 03/03/2014 04:45 PM, Juan José Ramos wrote: Thanks for the reply, Sebastian. I do not have concurrent updates, but they actually may happen very, very close

Re: Mahout-232-0.8.patch using

2014-03-03 Thread Sebastian Schelter
Hi Amol, SVMs are not integrated in Mahout. I'd suggest you try our logistic regression classifier instead. Best, Sebastian On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am new user of Mahout and want to run sample SVM algorithm with Mahout. Can you please list me steps to use Mahout-232

Re: parallelALS and RMSE TEST

2014-03-01 Thread Sebastian Schelter
. --sebastian On 02/27/2014 06:30 PM, AJ Rader wrote: Sean Owen srowen at gmail.com writes: Parallel ALS is exactly an example of where you can use matrix factorization for 0/1 data. On Mon, May 6, 2013 at 9:22 PM, Tevfik Aytekin tevfik.aytekin at gmail.com wrote: Hi Sean, Isn't boolean

Re: Load output of rowsimilarity to memory

2014-02-25 Thread Sebastian Schelter
Hi Juan, It would definitely be nice to have that in the API! It would be great if you could submit a patch after you implemented this. Best, Sebastian On 02/25/2014 10:52 AM, Juan José Ramos wrote: Thanks for the answer. That was the approach I had in mind in the first place the only

Re: Load output of rowsimilarity to memory

2014-02-25 Thread Sebastian Schelter
If you iterate over the vector, you will get Vector.Element objects. elem.index() gives you the id of the similar thing, elem.get() gives you the similarity value. --sebastian On 02/25/2014 11:58 AM, Juan José Ramos wrote: Regarding the parsing of a VectorWriteble object, what

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Sebastian Schelter
NaiveBayes expects a SequenceFile as input. The key is the class label as Text, the value is the feature vector as VectorWritable. --sebastian On 02/24/2014 11:51 AM, Kevin Moulart wrote: Hi again, I finally set my mind on going through java to make a sequence file for the naive bayes, but I still

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
The output of RowSimilarityJob can be loaded by the FileItemSimilarity. --sebastian On 02/24/2014 08:31 PM, Juan José Ramos wrote: Is there a way to reproduce this process: https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line inside Java

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
You're right, my bad. If you don't use RowSimilarityJob directly, but org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob (which calls RowSimilarityJob under the covers), your output will be a textfile that is directly usable with FileItemSimilarity. --sebastian On 02/24/2014

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
a GenericItemSimilarity from the list of ItemItemSimilarities, which is the in-memory item similarity you asked for. Hope that helps, Sebastian On 02/24/2014 10:04 PM, Juan José Ramos wrote: Correct me if I'm wrong, but is it not the ItemSimilarityJob mean to be for item-based CF? In particular

Re: Mahout on Spark?

2014-02-19 Thread Sebastian Schelter
. And be very careful with concepts. Something that i so far don't see happening with MLlib. MLlib seems to be old-style Mahout-like rush to become a collection of basic algorithms rather than coherent foundation. Admittedly, i haven't looked very closely. On Tue, Feb 18, 2014 at 11:41 PM, Sebastian

Re: Mahout on Spark?

2014-02-18 Thread Sebastian Schelter
participating in the discussions. What are the ideas on what a fruitful cooperation could look like? Best, Sebastian PS: I ported LLR-based cooccurrence analysis (aka item-based recommendation) to Spark some time ago, but I haven't had time to test my code on a large dataset yet. I'd be happy to see
