Re: Dictionary file format in Lucene-Mahout integration

2013-06-05 Thread Suneel Marthi
Stuti, seq2sparse is not the right tool when the input is a Lucene index; for that input you have to use lucene.vectors. From: Stuti Awasthi stutiawas...@hcl.com To: user@mahout.apache.org user@mahout.apache.org; James Forth

Database connection pooling for a recommendation engine

2013-06-05 Thread Mike W.
Hello, I am considering implementing a recommendation engine for a small website. The website will use a LAMP stack, and for various reasons the recommendation engine must be written in C++. It consists of an online component and an offline component, both of which need to connect to MySQL. The

Re: Database connection pooling for a recommendation engine

2013-06-05 Thread Sean Owen
Not sure, is this really related to Mahout? I don't know of an equivalent of J2EE / Tomcat for C++, but there must be something. As a general principle, you will have to load your data into memory if you want to perform the computations on the fly in real time. So how you access the data isn't
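The general principle in the reply above (load the data into memory once, then compute against that copy in real time) can be sketched as follows. This is only an illustration under stated assumptions: the class `InMemoryPreferences` and its methods are hypothetical names, not part of Mahout, and the rows are assumed to come from a one-time bulk read of the database that is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: not a Mahout class.
public class InMemoryPreferences {
    // userId -> (itemId -> rating), filled once at startup
    private final Map<Long, Map<Long, Double>> ratingsByUser = new HashMap<>();

    // Called once per row during the assumed bulk load from the database.
    public void add(long userId, long itemId, double rating) {
        ratingsByUser.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, rating);
    }

    // Answered entirely from memory at request time: how many items
    // both users have rated. Unknown users simply share nothing.
    public int sharedItems(long userA, long userB) {
        Map<Long, Double> a = ratingsByUser.get(userA);
        Map<Long, Double> b = ratingsByUser.get(userB);
        if (a == null || b == null) {
            return 0;
        }
        int count = 0;
        for (Long itemId : a.keySet()) {
            if (b.containsKey(itemId)) {
                count++;
            }
        }
        return count;
    }
}
```

With this layout, the per-request path never touches the database, so the choice of connection pool mainly affects the bulk load and any periodic refresh.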

Re: Database connection pooling for a recommendation engine

2013-06-05 Thread Manuel Blechschmidt
Hi Mike, the following paper contains some comparisons between different database stacks. I can also give you the QtSQL code if you are interested in it. http://www.manuel-blechschmidt.de/data/MMRPG2.pdf /Manuel On 05.06.2013 at 13:44, Mike W. wrote: Hello, I am considering to implement

Re: Dictionary file format in Lucene-Mahout integration

2013-06-05 Thread Grant Ingersoll
{code}
File dictOutFile = new File(dictOut);
log.info("Dictionary Output file: {}", dictOutFile);
Writer writer = Files.newWriter(dictOutFile, Charsets.UTF_8);
DelimitedTermInfoWriter tiWriter = new DelimitedTermInfoWriter(writer, delimiter, field);
try {

Why are clustering emails not clustering similar stuff?

2013-06-05 Thread Jesvin Jose
I tried to cluster 1000 of one person's emails using k-means, but the clusters are not forming well. For example, if Facebook sends notifications about James Doe and 5 other people, I get 5 clusters like: :VL-858{n=7 Top Terms: doe = 10.066998481750488