Re: Reading the wiki

2014-04-27 Thread Ted Dunning
Mathjax is both static content and server. There is an FAQ about this https problem. I think that part of the issue is that they don't use the same URL for both http and https connections. http://www.mathjax.org/resources/faqs/#problem-https The URL that they suggest to use for getting

Re: Understanding LogLikelihood Similarity

2014-04-27 Thread Ted Dunning
Excellent. Look forward to hearing your reactions. On Mon, Apr 28, 2014 at 1:14 AM, Mario Levitin mariolevi...@gmail.com wrote: Not yet, but I will. Have you read my original paper on the topic of LLR? It explains the connection with chi^2 measures of association.

Re: Question Regarding Entropy calculation in Mahout

2014-04-22 Thread Ted Dunning
On Tue, Apr 22, 2014 at 12:11 AM, Darshan Sonagara darshan.sonag...@gmail.com wrote: But the problem is that I want to check whether my clustering is good or bad, so I need to calculate the entropy value. I do not have any idea how to calculate entropy in Mahout or by other

Re: Setting up a recommender

2014-04-21 Thread Ted Dunning
probably at least an alternative to using docs and CSVs to import the data from Mahout. On Aug 12, 2013, at 2:32 PM, Ted Dunning ted.dunn...@gmail.com wrote: Yes. That would be interesting. On Mon, Aug 12, 2013 at 1:25 PM, Gokhan Capan gkhn...@gmail.com wrote: A little

Re: org.apache.mahout.math.IndexException

2014-04-18 Thread Ted Dunning
Are you translating the IDs down into a range that will fit into ints? On Thu, Apr 17, 2014 at 3:02 PM, Mario Levitin mariolevi...@gmail.com wrote: Hi, I'm trying to run the ALS algorithm. However, I get the following error: Exception in thread pool-1-thread-3
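
A minimal sketch of the ID translation asked about above, in Python rather than Mahout's Java. It assumes the raw IDs are large longs and that a dense 0-based int index is what the vector code needs; `build_index` is a hypothetical helper, not a Mahout API.

```python
def build_index(raw_ids):
    """Map arbitrary hashable IDs to consecutive ints starting at 0."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


# Hypothetical ratings: (user ID, item, value) with user IDs too large for an int
ratings = [
    (9000000000123, "a", 4.0),
    (9000000000456, "b", 3.0),
    (9000000000123, "c", 5.0),
]
user_index = build_index(user for user, _, _ in ratings)
dense = [(user_index[user], item, value) for user, item, value in ratings]
```

Keeping the dictionary around allows results to be mapped back to the original IDs afterwards.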

Re: Performance Issue using item-based approach!

2014-04-18 Thread Ted Dunning
Ted Dunning ted.dunn...@gmail.com: Najum, You should also be able to use the ItemSimilarityJob to compute a limited indicator set. This is stepping off of the path you have been on, but it would allow you to deploy the recommender via a search engine. That makes a lot of code

Re: Performance Issue using item-based approach!

2014-04-18 Thread Ted Dunning
Shouldn't, yes. But for a toy dataset, it might work out. On Fri, Apr 18, 2014 at 10:25 AM, Sebastian Schelter ssc.o...@googlemail.com wrote: You can, but you shouldn't :) On 04/18/2014 07:23 PM, Ted Dunning wrote: You can always run Hadoop in a local mode. Nothing prevents a single

Re: Number of features for ALS

2014-04-08 Thread Ted Dunning
On Tue, Apr 8, 2014 at 9:40 AM, Niklas Ekvall niklas.ekv...@gmail.com wrote: Do you plan to do any talks in Sweden soon? Is last week soon enough? :-(

Re: Number of features for ALS

2014-04-07 Thread Ted Dunning
, 2014 at 8:40 PM, Niklas Ekvall niklas.ekv...@gmail.com wrote: Thanks Pat! I did find a book by Ted Dunning and Ellen Friedman (Practical Machine Learning: Innovations in Recommendations). I guess I can use it to read more about co-occurrence recommenders or co-occurrence analysis. Best, Niklas

Re: Solr+Mahout Recommender Demo Site

2014-04-06 Thread Ted Dunning
This can actually be simplified a bit by using ItemSimilarityJob to call RowSimilarityJob. Nice work overall. On Sun, Apr 6, 2014 at 10:21 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Pat, do you still want help putting this into a new mahout/examples, or work out how to do the

Re: Solr+Mahout Recommender Demo Site

2014-04-06 Thread Ted Dunning
On Mon, Apr 7, 2014 at 5:18 AM, Pat Ferrel p...@occamsmachete.com wrote: Combining this kind of metadata with CF data has been important to the big guys but elusive to the rest of us. And a recommender that seamlessly integrates the different methods is rare. Solr + Mahout does it better than

Re: Solr+Mahout Recommender Demo Site

2014-04-06 Thread Ted Dunning
On Mon, Apr 7, 2014 at 2:04 AM, Pat Ferrel p...@occamsmachete.com wrote: As I said below RSJ is actually all that is needed. But with the entire recommender also integrated we can compare the two in the demo framework. For instance one of the lines of recs on a video detail page (the top one)

Re: Solr+Mahout Recommender Demo Site

2014-04-06 Thread Ted Dunning
It looks like it works well. And it is gorgeous as well. Nice work. Very nice. On Sun, Apr 6, 2014 at 8:59 PM, SriSatish Ambati srisat...@0xdata.com wrote: It's quite good. Sri On Sun, Apr 6, 2014 at 10:26 AM, Pat Ferrel p...@occamsmachete.com wrote: After having integrated several

Re: Number of features for ALS

2014-03-30 Thread Ted Dunning
. Then these user+session type models. We can then combine these at another level to give recommendations based on what you like throughout time versus what you have been doing recently. -b On Thu, Mar 27, 2014 at 1:59 PM, Ted Dunning ted.dunn...@gmail.com wrote: For the poly

Re: Number of features for ALS

2014-03-30 Thread Ted Dunning
Yeah... what Pat said. Off-line evaluations are difficult. At most, they provide directional guidance to be refined using live A/B testing. Of course, A/B testing of recommenders comes with a new set of tricky issues like different recommenders learning from each other. On Sun, Mar 30, 2014 at

Re: (help!) Can someone scan this

2014-03-28 Thread Ted Dunning
Have you run the component jobs by hand successfully? On Fri, Mar 28, 2014 at 5:52 PM, Jay Vyas jayunit...@gmail.com wrote: Hi again mahout: I'm wrapping a distributed recommender like this:

Re: Number of features for ALS

2014-03-27 Thread Ted Dunning
How can there be any other practical method? Essentially all of the mathematical assumptions under-pinning ALS are violated by the real world. Why would any mathematical consideration of the number of features be much more than heuristic? That said, you can make an information content argument.

Re: Number of features for ALS

2014-03-27 Thread Ted Dunning
iPhone On Mar 27, 2014, at 7:18, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Interesting topic, Ted, can you give examples of those mathematical assumptions under-pinning ALS which are violated by the real world? On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning ted.dunn...@gmail.com wrote: How

Re: Number of features for ALS

2014-03-27 Thread Ted Dunning
mathematical assumptions under-pinning ALS which are violated by the real world? On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning ted.dunn...@gmail.com wrote: How can there be any other practical method? Essentially all of the mathematical assumptions under-pinning ALS are violated

Re: market basket analysis of low sales volume products

2014-03-24 Thread Ted Dunning
On Mon, Mar 24, 2014 at 4:46 PM, Si Chen sic...@opensourcestrategies.com wrote: Thanks everybody for your feedback! I thought more about it, and basically our issue is that we have a lot of SKUs per brand, so there's not a lot of repeat sales of the same SKUs to make SKU to SKU market basket

Re: Introducing PredictionIO: A developer-friendly Mahout stack for production

2014-03-21 Thread Ted Dunning
On Fri, Mar 21, 2014 at 7:36 AM, Pat Ferrel p...@occamsmachete.com wrote: Read the AGPL carefully before deciding. It is widely avoided by OSS projects. It’s interpreted to infect your derived works with obligations you may not want to live with. There is probably a question about whether

Re: Using SSVD for dimensionality reduction on Mahout

2014-03-21 Thread Ted Dunning
Vijay, SSVD is not really appropriate with 12 columns. You aren't going to see any savings at all. It would be much better if you were to look at extraction of the 7 most interesting columns out of 1000. The problem is not that SSVD will fail, but rather that you will have to include all the

Re: Text clustering with hashing vector encoders

2014-03-20 Thread Ted Dunning
On Thu, Mar 20, 2014 at 12:39 PM, Johannes Schulte johannes.schu...@gmail.com wrote: For representing the cluster we have a separate job that assigns users (documents) to clusters and shows the most discriminating words for the cluster via the LogLikelihood class. The results are then

Re: market basket analysis of low sales volume products

2014-03-20 Thread Ted Dunning
I have done the equivalent thing with music (moving up from track to album to artist) with very good results. On Thu, Mar 20, 2014 at 5:58 PM, Martin, Nick nimar...@pssd.com wrote: I can tell you my experience is that it's absolutely informative to take a look at running the recommendation

Re: Text clustering with hashing vector encoders

2014-03-19 Thread Ted Dunning
...@yahoo.com wrote: +1 to this. We could then use Hamming Distance to compute the distances between Hashed Vectors. We have the code for HashedVector.java based on Moses Charikar's SimHash paper. On Tuesday, March 18, 2014 7:14 PM, Ted Dunning ted.dunn...@gmail.com wrote: Yes

Re: Text clustering with hashing vector encoders

2014-03-19 Thread Ted Dunning
On Wed, Mar 19, 2014 at 11:34 AM, Frank Scholten fr...@frankscholten.nl wrote: On Wed, Mar 19, 2014 at 12:13 AM, Ted Dunning ted.dunn...@gmail.com wrote: Yes. Hashing vector encoders will preserve distances when used with multiple probes. So if a token occurs two times in a document

Re: Introducing PredictionIO: A developer-friendly Mahout stack for production

2014-03-19 Thread Ted Dunning
AGPL is a complete show-stopper for contributions even for dependencies. Apache software can't critically depend on GPL components of any sort. As such, it doesn't make any sense to have components of Mahout designed to run only on a server that is AGPL. On Wed, Mar 19, 2014 at 11:53 AM,

Re: Text clustering with hashing vector encoders

2014-03-18 Thread Ted Dunning
Yes. Hashing vector encoders will preserve distances when used with multiple probes. Interpretation becomes somewhat difficult, but there is code available to reverse engineer labels on hashed vectors. IDF weighting is slightly tricky, but quite doable if you keep a dictionary of, say, the most
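
A rough sketch of hashed encoding with multiple probes, in Python. This is not Mahout's encoder: MD5 from `hashlib` stands in for whatever hash the real encoders use, and the vector size and probe count are arbitrary assumptions. It only illustrates that each token adds weight at several hashed slots, so a single collision does not wipe out a token's contribution.

```python
import hashlib


def hashed_vector(tokens, size=32, probes=2):
    """Encode tokens into a fixed-size vector; each token adds 1.0 at `probes` hashed slots."""
    vec = [0.0] * size
    for token in tokens:
        for probe in range(probes):
            # Salt the token with the probe number to get independent slots
            digest = hashlib.md5(f"{probe}:{token}".encode("utf-8")).digest()
            slot = int.from_bytes(digest[:4], "big") % size
            vec[slot] += 1.0
    return vec


doc = hashed_vector(["mahout", "hashing", "mahout"])
```

A token that occurs twice contributes twice at each of its slots, so term counts carry through to the encoded vector.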

Re: Mahout parallel K-Means - algorithms analysis

2014-03-15 Thread Ted Dunning
We would love to help. Can you say which program and which classes you are looking at? On Sat, Mar 15, 2014 at 12:58 PM, hiroshi leon hiroshi_8...@hotmail.com wrote: To whom it may concern, Hello, I have been checking the algorithm of Mahout 0.9 version k-means using MapReduce and I

Re: bug report

2014-03-13 Thread Ted Dunning
You have to be logged in to JIRA to do this. To log in, you may need to create an account. On Thu, Mar 13, 2014 at 11:33 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: https://issues.apache.org/jira/browse/MAHOUT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

Re: ClusteringUtils for Kmeans output

2014-03-09 Thread Ted Dunning
Can you file a JIRA and attach your patch? On Sun, Mar 9, 2014 at 8:03 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Info for everyone I have successfully forced Mahout to build with Guava 11.0.2. Error and fixes as mentioned below 1. Class: org.apache.mahout.math.stats.GroupTree -

Re: Mahout with Storm/Spark

2014-03-06 Thread Ted Dunning
Which version are you using? On Thu, Mar 6, 2014 at 5:47 AM, vineet yadav vineet.yadav.i...@gmail.com wrote: Hi, I am using the Mahout LDA algorithm for topic modeling on a huge number of documents (500k or more). Mahout is taking a lot of time, so I am looking at other alternatives. I found the link(

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Ted Dunning
On Thu, Mar 6, 2014 at 7:46 AM, Kevin Moulart kevinmoul...@gmail.com wrote: [ERROR] /home/myCompny/Downloads/mahout9/math/src/main/java/org/apache/mahout/math/stats/GroupTree.java:[171,31] cannot find symbol Replace that line with: stack = new ArrayDeque<GroupTree>();

Re: Rework our website

2014-03-05 Thread Ted Dunning
Both are nice. I think you are right that the second is calmer. On Wed, Mar 5, 2014 at 4:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm.

Re: how to recommend users already consumed items

2014-03-02 Thread Ted Dunning
On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com wrote: You are not the only one to see this so I’d recommend creating an option for the Job, which will be checked before executing that line of code then submit it as a patch to the Jira you need to create in any case. That

Re: sparsification of a Mahout vector

2014-03-02 Thread Ted Dunning
Chirag, There isn't a fully baked answer to your needs, but there are components that can help you. For instance, the OnlineSummarizer can help you find a particular quantile. Iterating over the vector to fill that is easy enough: For example: Vector v; // original data

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-03-02 Thread Ted Dunning
] = (initial_rate/Math.sqrt(sum_of_squares[i][j])) *beta[i][j]*;* } } *beta *in the base class is rightly ( num_categories -1 * num_of features ) matrix. On Fri, Feb 28, 2014 at 11:57 PM, Ted Dunning ted.dunn...@gmail.com wrote: I have been swamped

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-02-28 Thread Ted Dunning
. Please ignore. On Sun, Dec 29, 2013 at 8:45 PM, Ted Dunning ted.dunn...@gmail.com wrote: :-) Many leaks are *very* subtle. One leak that had me going for weeks was in a news wire corpus. I couldn't figure out why the cross validation was so good and running the classifier on new

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Ted Dunning
Kevin, While this is fresh in your mind can you prepare a javadoc patch that would have helped you out? And suggest other doc patches as well? On Mon, Feb 24, 2014 at 3:00 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Thanks, that's about the clearest answer I got so far :) 2014-02-24

Re: Adapters for mahout inputs .... anyone working on this?

2014-02-22 Thread Ted Dunning
will take a look. On Fri, Feb 21, 2014 at 7:51 PM, Ted Dunning ted.dunn...@gmail.com wrote: Great idea. Hard to do well. Would it be possible for you to try to build a picture of all the pieces that need to be connected before you start building connectors and converters

Re: Adapters for mahout inputs .... anyone working on this?

2014-02-21 Thread Ted Dunning
Great idea. Hard to do well. Would it be possible for you to try to build a picture of all the pieces that need to be connected before you start building connectors and converters? On Fri, Feb 21, 2014 at 8:01 AM, Jay Vyas jayunit...@gmail.com wrote: Hi mahout. Was thinking about building

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Ted Dunning
of that vector from the cluster center. Correct me if I am wrong :) On Tue, Feb 18, 2014 at 10:53 AM, Ted Dunning ted.dunn...@gmail.com wrote: Bikash, Peter is just right. Yes, you can cluster on these few variables that you have. Probably you should translate location to x,y,z

Re: Mahout on Spark?

2014-02-18 Thread Ted Dunning
On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath nick.pentre...@gmail.com wrote: My (admittedly heavily biased) view is Spark is a superior platform overall for ML. If the two communities can work together to leverage the strengths of Spark, and the large amount of good stuff in Mahout (as well

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning
Think about the question in terms of whether this will define a reasonable kind of distance between items or users. Can you first define what you want to do? Are you clustering users? Are you clustering items? If users, how could the data you provide give any kind of idea about which users are

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning
On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Let say I am clustering users, I am providing their profile data to discover similarity between two user. So my input would be [UserId, Location, Age, Gender, Time Created ] Now if my UserId length is of minimum 10

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning
). On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Let say I am clustering users, I am providing their profile data to discover similarity between two user. So my input would

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning
uniquely I need to provide their CustomerId Is my assumption correct? If yes then, will customerId affect the clustering output If no then how can I identify customer uniquely On Tue, Feb 18, 2014 at 2:55 AM, Ted Dunning ted.dunn...@gmail.com wrote: That really depends on what

Re: Dithering and Thompson Sampling

2014-02-15 Thread Ted Dunning
in conjunction with a feedback loop, which make the examples seem more like Multi-armed Bandit examples. Are you suggesting some feedback in recommender ranking or just using the same distribution assumptions used in TS? On Feb 8, 2014, at 12:13 PM, Ted Dunning ted.dunn...@gmail.com wrote: Thompson

Re: streaming kmeans vs incremental canopy/solr/kmeans

2014-02-14 Thread Ted Dunning
Scott, How much data do you have? How much do you plan to have? On Fri, Feb 14, 2014 at 8:04 AM, Scott C. Cote scottcc...@gmail.com wrote: Hello All, I have two questions (Q1, Q2). Q1: Am digging in to Text Analysis and am wrestling with competing analyzed data maintenance strategies.

Re: streaming kmeans vs incremental canopy/solr/kmeans

2014-02-14 Thread Ted Dunning
) where a doc is usually no longer than 20 or 30 words. SCott On 2/14/14 12:46 PM, Ted Dunning ted.dunn...@gmail.com wrote: Scott, How much data do you have? How much do you plan to have? On Fri, Feb 14, 2014 at 8:04 AM, Scott C. Cote scottcc...@gmail.com wrote: Hello All, I

Re: Popularity of recommender items

2014-02-14 Thread Ted Dunning
. On Feb 8, 2014, at 7:19 PM, Ted Dunning ted.dunn...@gmail.com wrote: I have different opinions about each piece. I think that cross recommendation is as core as RowSimilarityJob and should be a parallel implementation or integrated. Parallel is probably easier. It is even plausible

Re: Integrate two Navie Bayes models

2014-02-14 Thread Ted Dunning
. Do we have any option to put both models at a time? My expectation is classify the url even if it falls on model one or two. Thanks in advance. Reg, Venkat On Thu, Feb 13, 2014 at 1:16 PM, Ted Dunning ted.dunn...@gmail.com wrote: What do you mean by combine? On Wed, Feb 12

Re: Integrate two Navie Bayes models

2014-02-12 Thread Ted Dunning
What do you mean by combine? On Wed, Feb 12, 2014 at 9:38 PM, venkata ramana venkat.ecosyst...@gmail.com wrote: Hi All, I have developed two mahout naive bayes models. Two models are built based on the similar data. Let suppose I have 10 categories data. I split the data where row count

Re: Popularity of recommender items

2014-02-08 Thread Ted Dunning
and the art of recommendations moves on. If we add temporal data to preference data a bunch of new features come to mind, like hot lists or asymmetric train/query preference history. On Feb 6, 2014, at 9:43 PM, Ted Dunning ted.dunn...@gmail.com wrote: One way to deal with that is to build a model

Re: Popularity of recommender items

2014-02-08 Thread Ted Dunning
processing or core. On Feb 8, 2014, at 12:13 PM, Ted Dunning ted.dunn...@gmail.com wrote: … The reason that we aren't adding this like cross-rec and other things is that we have full-time jobs, mostly. Suneel is full-time on Mahout, but the rest are not. You seem more active than most.

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Ted Dunning
I can't comment on the specific question that you ask, but it should not necessarily be expected that LDA will reconstruct the categories that you have in mind. It will develop categories that explain the data as well as it can, but that won't necessarily match the categories you intend. It is

Re: Popularity of recommender items

2014-02-06 Thread Ted Dunning
If you look at the indicator matrix (cooccurrence reduced by LLR), you will usually have asymmetry due to limitations on the number of indicators per row. This will give you some interesting results when you look at the column sums. I wouldn't call it popularity, but it is an interesting

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Ted Dunning
(which are meaningful by the way) but the fact that a certain document is not assigned to the proper (LDA generated) category. The document to topics assignment is really bad... On Thu, Feb 6, 2014 at 5:08 PM, Ted Dunning ted.dunn...@gmail.com wrote: I can't comment on the specific

Re: Popularity of recommender items

2014-02-06 Thread Ted Dunning
One way to deal with that is to build a model that predicts the ultimate number of views/plays/purchases for the item based on history so far. If this model can be made Bayesian enough to sample from the posterior distribution of total popularity, then you can use the Thompson sampling trick
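
To make the sampling idea concrete, here is a small Python sketch. It is a simplification, not the popularity model described above: it assumes each item has a success count out of a number of trials and uses a Beta posterior under a uniform prior as the stand-in Bayesian model. The item names and counts are made up.

```python
import random


def thompson_rank(stats, rng):
    """stats maps item -> (successes, trials). Rank items by one draw from each Beta posterior."""
    draws = {
        item: rng.betavariate(1 + wins, 1 + trials - wins)
        for item, (wins, trials) in stats.items()
    }
    return sorted(draws, key=draws.get, reverse=True)


# Hypothetical history: a proven hit, a barely seen new item, a proven dud
stats = {"old_hit": (900, 1000), "new_item": (2, 3), "dud": (1, 1000)}
ranking = thompson_rank(stats, random.Random(42))
```

Because the new item has a wide posterior, it sometimes draws high and gets shown, which is how the method collects the data needed to settle its true popularity.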

Re: Naive Bayes Classifier for Marketing data

2014-02-04 Thread Ted Dunning
Mandeep, I just worked through a similar example using the same data set but using the logistic regression learner. In order to use Naive bayes, you would need to convert the continuous variables to categorical variables by binning. On Mon, Feb 3, 2014 at 11:03 PM, mandeep singh
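
A minimal sketch of the binning step mentioned above, in Python. The cut points are hypothetical and would normally be chosen from the data (for example, from quantiles); `bin_value` is an illustrative helper, not part of Mahout.

```python
import bisect


def bin_value(value, edges):
    """Return a categorical label for `value` given sorted bin edges."""
    return f"bin_{bisect.bisect_right(edges, value)}"


age_edges = [25, 40, 60]  # hypothetical cut points
labels = [bin_value(age, age_edges) for age in (18, 25, 39, 61)]
```

Each continuous column gets its own set of edges, and the resulting labels can then be treated like any other categorical feature.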

Re: SGD classifier demo app

2014-02-04 Thread Ted Dunning
Yes. On Tue, Feb 4, 2014 at 1:31 AM, Sebastian Schelter s...@apache.org wrote: Would be great to add this as an example to Mahout's codebase. On 02/04/2014 10:27 AM, Ted Dunning wrote: Frank, I just munched on your code and sent a pull request. In doing this, I made a bunch of changes

Re: Naive Bayes Classifier for Marketing data

2014-02-04 Thread Ted Dunning
, Feb 4, 2014 at 2:59 PM, Ted Dunning ted.dunn...@gmail.com wrote: Mandeep, I just worked through a similar example using the same data set but using the logistic regression learner. In order to use Naive bayes, you would need to convert the continuous variables to categorical variables

Re: Naive Bayes Classifier for Marketing data

2014-02-04 Thread Ted Dunning
, 2014 at 5:51 AM, unmesha sreeveni unmeshab...@gmail.com wrote: Sorry. But in the definitive guide I saw binning tags. I don't know whether numerical binning is possible in MapReduce. On Tue, Feb 4, 2014 at 7:09 PM, Ted Dunning ted.dunn...@gmail.com wrote: Not to be rude, but how would you

Re: Annotation based vectorizer

2014-02-03 Thread Ted Dunning
Looks nice. Where is the dictionary injected? Would type inferencing of the sort used in Guava Lists.newArrayList() help the verbosity? What is the type reference used for? What if the POJO has a Vector in it? Is there way to deal with that? How can I vectorize a second (test) data set

Re: Mahout 0.9 Release

2014-02-02 Thread Ted Dunning
The confusion is that the site uses the Apache CMS system. See here: http://www.apache.org/dev/cmsref.html On Sun, Feb 2, 2014 at 1:42 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Someone's got to update the web site to the latest release, I don't see a login or edit link to make the

Re: Mahout 0.9 Release

2014-02-02 Thread Ted Dunning
I just checked and the release has propagated to French mirrors. On Sun, Feb 2, 2014 at 1:22 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Mahout 0.9 has been pushed to the mirrors and is available for download at http://www.apache.org/dyn/closer.cgi/mahout/ On Friday, January 31, 2014

Re: generic latent variable recommender question

2014-01-26 Thread Ted Dunning
On Sun, Jan 26, 2014 at 9:36 AM, Pat Ferrel p...@occamsmachete.com wrote: I think I’ll leave dithering out until it goes live because it would seem to make the eyeball test easier. I doubt all these experiments will survive. With anti-flood if you turn the epsilon parameter to 1 (makes

Re: generic latent variable recommender question

2014-01-25 Thread Ted Dunning
Dithering is commonly done by re-ranking results using a noisy score. Take r to be the original rank (starting with 1). Then compute a score as s = log r + N(0,log \epsilon) and sort by this new score in ascending order. Items will be shuffled by this method in such a way that the
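
The re-ranking described above can be sketched in a few lines of Python. One assumption is made explicit in the code: log(epsilon) is treated as the standard deviation of the noise, since the notation N(0, log epsilon) does not say whether it is a standard deviation or a variance.

```python
import math
import random


def dither(items, epsilon, rng):
    """Re-rank items (best first) by s = log(rank) + N(0, log(epsilon)), ascending."""
    sigma = math.log(epsilon)  # assumption: log(epsilon) is the standard deviation
    scored = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(items, start=1)
    ]
    return [item for _, item in sorted(scored, key=lambda pair: pair[0])]


original = ["a", "b", "c", "d", "e"]
unchanged = dither(original, 1.0, random.Random(0))  # log(1) = 0, so no noise
shuffled = dither(original, 4.0, random.Random(0))
```

Because the score uses log rank, adjacent items deep in the list are much closer together than items at the top, so the same noise reorders the tail more freely than the head.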

Re: generic latent variable recommender question

2014-01-25 Thread Ted Dunning
On Sat, Jan 25, 2014 at 4:33 PM, Pat Ferrel p...@occamsmachete.com wrote: BTW can you explain your notation? s = log r + N(0,log \epsilon) N?, \epsilon? r is rank N is normal distribution \epsilon is an arbitrary constant that drives the amount of mixing. Typical values are =4.

Re: The maintainer of FPG algorithm

2014-01-24 Thread Ted Dunning
. See https://issues.apache.org/jira/browse/MAHOUT-1355. On Wednesday, January 22, 2014 10:15 PM, Ted Dunning ted.dunn...@gmail.com wrote: There is no assignment of these things. Anybody can contribute. If you contribute regularly, then the component will survive

Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Ted Dunning
Dang. This community stuff is awesome. Kudos to all you guys for jumping on this. My only nit is whether this should move to the dev list. On Fri, Jan 24, 2014 at 2:30 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Thanks guys, I will look at it this weekend too. On Fri, Jan

Re: generic latent variable recommender question

2014-01-24 Thread Ted Dunning
On Fri, Jan 24, 2014 at 7:08 PM, Koobas koo...@gmail.com wrote: I eliminate the ones that the user already has, and find the largest value among the others, right? yeah... Unless you are selling razor blades in which case, you don't eliminate repeats. Also, you may want to pass the results

Re: The maintainer of FPG algorithm

2014-01-22 Thread Ted Dunning
There is no assignment of these things. Anybody can contribute. If you contribute regularly, then the component will survive. The first things to do to help the PFGP component survive are to 1) do a quick scan of the history of the component both in JIRA and in the mailing list archives 2) do

Re: About Parallel Frequent Growth algorithm

2014-01-20 Thread Ted Dunning
On Mon, Jan 20, 2014 at 5:44 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: I was asked this question too and I had no clear answer. Maybe it wasn't right to remove FP from the codebase. The major problem was that we had no maintainers for the code.

Re: MAHOUT 0.9 Release - New URL

2014-01-17 Thread Ted Dunning
On Thu, Jan 16, 2014 at 7:35 AM, Sotiris Salloumis i...@eprice.gr wrote: c) Run through the unit tests: mvn clean test [ Passed: 370 milliseconds] ?! Was that seconds? Or really milliseconds?

Re: Classification of books

2014-01-16 Thread Ted Dunning
You generally want to do linguistic pre-processing (finding phrases, synonymizing certain forms such as abbreviations, tokenizing, dropping stop words, removing boilerplate, removing tables) before doing vectorization. Altogether, these form pre-processing. To classify books, you need to

Re: Logistic Regression cost function

2014-01-14 Thread Ted Dunning
) * instance.get(j) * gradientBase; Cheers, Frank On Mon, Jan 13, 2014 at 10:54 PM, Frank Scholten fr...@frankscholten.nl wrote: Thanks guys, I have some reading to do :-) On Mon, Jan 13, 2014 at 10:45 PM, Ted Dunning ted.dunn...@gmail.com wrote: The reference is to the web site

Re: Integrating browser with apache mahout !

2014-01-14 Thread Ted Dunning
. Thanks in advance. Regards On Mon, Jan 13, 2014 at 1:17 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Sat, Jan 11, 2014 at 11:44 PM, Abhishek Kumar abhishek.kumar.cs...@iitbhu.ac.in wrote: For this I need to somehow integrate apache mahout to a browser. I also need

Re: travelling salesman on Mahout

2014-01-13 Thread Ted Dunning
On Mon, Jan 13, 2014 at 8:42 AM, Pavan K Narayanan pavan.naraya...@gmail.com wrote: Please may I ask why TSP has been removed from Mahout. It was the Genetic Algorithms that were removed. The implementation was unmaintained and not scalable and thus not appropriate for Mahout. Its just

Re: Logistic Regression cost function

2014-01-13 Thread Ted Dunning
I think that this is the link in the code: http://leon.bottou.org/research/stochastic On Mon, Jan 13, 2014 at 11:58 AM, Frank Scholten fr...@frankscholten.nl wrote: Do you know which paper it is? He has quite a few publications. I don't see any mention of one of his papers in the code. I

Re: Logistic Regression cost function

2014-01-13 Thread Ted Dunning
The reference is to the web site in general. If anything, this blog is closest: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.177.3514&rep=rep1&type=pdf On Mon, Jan 13, 2014 at 1:14 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: I think this is the one. Yes, I don't see this paper

Re: Integrating browser with apache mahout !

2014-01-12 Thread Ted Dunning
On Sat, Jan 11, 2014 at 11:44 PM, Abhishek Kumar abhishek.kumar.cs...@iitbhu.ac.in wrote: For this I need to somehow integrate apache mahout to a browser. I also need to train my model on some server or database and then pipeline it to the client. Please help if you have any suggestions.

Re: travelling salesman on Mahout

2014-01-12 Thread Ted Dunning
:46, Ted Dunning ted.dunn...@gmail.com wrote: TSP is generally solved using a number of heuristics guiding a randomized search. Mahout has essentially no provision for helping with this. If you want a quick and dirty solution, I would recommend something like an evolutionary algorithm

Re: travelling salesman on Mahout

2014-01-12 Thread Ted Dunning
arc incidence matrix has been split for purpose of mapreduce during run time. (I am not concerned about the obtaining optimal solutions from Mahout) On 11 January 2014 00:46, Ted Dunning ted.dunn...@gmail.com wrote: TSP is generally solved using a number of heuristics guiding a randomized

Re: Item recommendation w/o users or preferences

2014-01-11 Thread Ted Dunning
Yes. Since each transaction contains several items, you might as well call that a row in the history matrix and go from there to cooccurrence analysis or matrix factorization (cooccurrence is easier and just as accurate if you have enough data). As Rachel mentions, you also can sometimes string
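
Treating each transaction as a row makes the cooccurrence counts easy to compute. A minimal Python sketch follows, with made-up baskets; at scale this would be a distributed job rather than an in-memory counter.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(transactions):
    """Count how often each unordered item pair appears in the same transaction."""
    counts = Counter()
    for basket in transactions:
        # De-duplicate and sort so each pair has one canonical key
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["eggs", "milk"]]
pairs = cooccurrence(baskets)
```

Raw counts like these are dominated by popular items, which is why a significance test is usually applied before they are used as indicators.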

Re: Item recommendation w/o users or preferences

2014-01-11 Thread Ted Dunning
Tim, Can you be more specific about which bits are missing? Is it about the rationale for the log-likelihood ratio test? If so, that rationale is simply that the algorithm is simple and empirically has been shown to produce excellent results across a large array of applications (over a thousand

Re: travelling salesman on Mahout

2014-01-10 Thread Ted Dunning
TSP is generally solved using a number of heuristics guiding a randomized search. Mahout has essentially no provision for helping with this. If you want a quick and dirty solution, I would recommend something like an evolutionary algorithm in which you have segments that self-assemble or split

Re: Item recommendation w/o users or preferences

2014-01-10 Thread Ted Dunning
How is it that you have many transactions and have no user information? I thought that transactions were user information? On Fri, Jan 10, 2014 at 5:27 PM, Tim Smith timsmit...@hotmail.com wrote: Say I have a retail organization that doesn't sell a diverse set of products, eg 2000, but has

Re: Item recommendation w/o users or preferences

2014-01-10 Thread Ted Dunning
This talk of support and overlap smacks of very poor cooccurrence analysis. See http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html for a better option. On Fri, Jan 10, 2014 at 8:05 PM, Tim Smith timsmit...@hotmail.com wrote: Very awesome, thank you! I am twisting the support
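
For reference, a short Python sketch of the log-likelihood ratio score for a 2x2 cooccurrence table, written in the entropy form commonly used for the G^2 statistic. The argument names are assumptions: k11 counts transactions with both items, k12 and k21 those with only one, and k22 those with neither.

```python
import math


def x_log_x(x):
    """x * log(x), with the usual convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy: N log N - sum(k log k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


independent = llr(10, 10, 10, 10)
associated = llr(100, 0, 0, 100)
```

A table whose rows and columns are independent scores zero, while a strongly associated pair scores high even if neither item is especially popular.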

Re: eigenvectors and eigenvalues of a matrix

2014-01-08 Thread Ted Dunning
, Jan 7, 2014 at 12:27 PM, Ted Dunning ted.dunn...@gmail.com wrote: The order of the singular values and vectors should tell you. For others who might be curious, the singular value decomposition breaks a matrix A into three factors A = U S V' Both U and V

Re: Question on OnlineLogisticRegression.iris() test case

2014-01-06 Thread Ted Dunning
This is an offset element which allows the model to have an intercept term in addition to terms for the predictor variables. On Mon, Jan 6, 2014 at 8:31 AM, Frank Scholten fr...@frankscholten.nl wrote: Hi, I am studying the LR / SGD code and I was wondering why in the iris test case the
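
A tiny Python illustration of the offset element: a constant 1.0 is prepended to every feature vector, so the weight in that slot acts as the intercept. The weights here are hypothetical, and the position of the constant (first slot) is an arbitrary choice for the sketch.

```python
import math


def with_offset(raw):
    """Prepend a constant 1.0 so the first weight acts as the intercept."""
    return [1.0] + list(raw)


def predict(weights, features):
    """Logistic prediction from a weight vector and an offset-augmented feature vector."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


weights = [-1.0, 2.0]  # hypothetical: intercept -1, slope 2
p_zero = predict(weights, with_offset([0.0]))
```

Without the constant element, an all-zero input would always be forced to a prediction of exactly 0.5.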

Re: eigenvectors and eigenvalues of a matrix

2014-01-06 Thread Ted Dunning
The order of the singular values and vectors should tell you. For others who might be curious, the singular value decomposition breaks a matrix A into three factors A = U S V' Both U and V are orthonormal so that U' U = I and V' V = I. S is diagonal. An eigenvalue decomposition decomposes
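
One consequence of A = U S V' with orthonormal U and V is that A'A = V S^2 V', so the eigenvalues of A'A are the squared singular values of A. The Python sketch below checks this on a 2x2 example using the closed-form eigenvalues of a symmetric 2x2 matrix; it is an illustration only, not how a large decomposition would be computed.

```python
import math


def singular_values_2x2(a):
    """Singular values of a 2x2 matrix as square roots of the eigenvalues of A'A."""
    (p, q), (r, s) = a
    # Entries of the symmetric matrix A'A
    m11 = p * p + r * r
    m12 = p * q + r * s
    m22 = q * q + s * s
    # Closed-form eigenvalues of a symmetric 2x2 matrix: mean +/- spread
    mean = (m11 + m22) / 2.0
    spread = math.sqrt(((m11 - m22) / 2.0) ** 2 + m12 * m12)
    return math.sqrt(mean + spread), math.sqrt(max(0.0, mean - spread))


s1, s2 = singular_values_2x2([[3.0, 0.0], [4.0, 5.0]])
```

The values come out in descending order, and their product equals the absolute value of the determinant.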

Re: Equality of two DenseMatrix objects

2013-12-29 Thread Ted Dunning
On Sun, Dec 29, 2013 at 9:17 AM, Tharindu Rusira tharindurus...@gmail.com wrote: Thanks Chameera and Sebastian for sharing your expertise :) Just wanted to know the reason behind the absence of an equality check for Matrices. For what it is worth, here is the one liner that does this

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-12-29 Thread Ted Dunning
before you target them. This sort of time machine leak can be enormously more subtle than this. On Mon, Dec 2, 2013 at 1:50 PM, Gokhan Capan gkhn...@gmail.com wrote: Gokhan On Thu, Nov 28, 2013 at 3:18 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Nov 27, 2013 at 7:07 AM, Vishal

Re: Equality of two DenseMatrix objects

2013-12-29 Thread Ted Dunning
On Sun, Dec 29, 2013 at 7:30 PM, Tharindu Rusira tharindurus...@gmail.com wrote: Hi Ted, Thanks for bringing this discussion back to life. It's true, as Sebastian mentioned, equality checking for matrices is an expensive task and Ted has come up with a smart one liner here (even though a

Re: SVM in Mahout

2013-12-24 Thread Ted Dunning
You might try logistic regression with regularization for a very similar result. On Mon, Dec 23, 2013 at 11:57 PM, Sebastian Schelter ssc.o...@googlemail.com wrote: Hi Tharindu, There is no SVM implementation in an official release. --sebastian On 24.12.2013 08:02, Tharindu Rusira

Re: SVM in Mahout

2013-12-24 Thread Ted Dunning
to go far, go with others. Remember, happiness is a way of travel not a destination A good traveller has no fixed plans, and is not intent on arriving. On 24 December 2013 11:11, Ted Dunning ted.dunn...@gmail.com wrote: You might try logistic regression with regularization for a very

Re: Random Forest possible error

2013-12-15 Thread Ted Dunning
of the ways to load data. And I found problem there. I am going to compare with other approach (partial, Breiman) to see what's the difference. My bad, well It's Saturday ! Sam On Sat, Dec 14, 2013 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote: Can you file a JIRA at https

Re: Random Forest possible error

2013-12-14 Thread Ted Dunning
Can you file a JIRA at https://issues.apache.org/jira/browse/MAHOUT ? It sounds like you have a test case in mind along with your fix. If you could package that work up as a patch file, then it would be much appreciated. On Sat, Dec 14, 2013 at 9:24 AM, sam wu swu5...@gmail.com wrote: Hi,

Re: Desicion Tree in Mahout

2013-12-12 Thread Ted Dunning
You should move forward to version 0.8. On Thu, Dec 12, 2013 at 5:17 AM, unmesha sreeveni unmeshab...@gmail.com wrote: Thanks Sigbjørn Dybdahl, I was waiting for the answer. Yes, I downloaded the mahout-distribution-0.6 source and went through *
