Upgrading from 0.6 and ClassifierContext

2014-01-23 Thread Grant Ingersoll
Hi, I'm upgrading some classification code from 0.6 to 0.8 and am wondering what the replacement is for the ClassifierContext? Thanks, Grant

[OT] Use Cases for Taming Text, 2nd ed.

2014-01-20 Thread Grant Ingersoll
Hi Mahout Users, Drew Farris, Tom Morton and I are currently working on the 2nd Edition of Taming Text (http://www.manning.com/ingersoll for first ed.) and are soliciting interested parties who would be willing to contribute to a chapter on practical use cases (i.e. you have something in

Re: Question about clusterdump

2013-08-22 Thread Grant Ingersoll
question but I find it hard to find details on these specifics. Many thanks, Will Grant Ingersoll | @gsingers http://www.lucidworks.com

Apache Mahout 0.8 Released

2013-07-25 Thread Grant Ingersoll
The Apache Mahout PMC is pleased to announce the release of Mahout 0.8. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the

Mahout 0.8 Release Candidate

2013-07-08 Thread Grant Ingersoll
A _preview_ of release artifacts for 0.8 is at https://repository.apache.org/content/repositories/orgapachemahout-113/org/apache/mahout/. This is not an official release. I will call a vote in a day or two, pending feedback on this thread, so please review/test. A _preview_ of the release

Re: Applying clustering technique

2013-06-13 Thread Grant Ingersoll
, 2013 at 1:06 PM, Grant Ingersoll gsing...@apache.org wrote: The CSVVectorIterator in the Integration package will take in a CSV file and produce vectors. It assumes that each row is the equivalent of a DenseVector (does MovieLens fit that?) If you need otherwise, I'd suggest starting
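
As a rough illustration of the "each row is the equivalent of a DenseVector" assumption mentioned above, the following plain-Java sketch turns one CSV line into a dense array of doubles. It does not use Mahout; the class name `CsvRowToDenseSketch` and the method `parseRow` are hypothetical names chosen for this example, and the real CSVVectorIterator may handle quoting, headers, and empty cells differently.

```java
import java.util.Arrays;

public class CsvRowToDenseSketch {

    /**
     * Parses one comma-separated line into a dense array, one slot per column.
     * Every cell must be numeric; an empty or non-numeric cell throws
     * NumberFormatException, which is the "dense row" assumption made explicit.
     */
    public static double[] parseRow(String line) {
        String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
        double[] values = new double[cells.length];
        for (int i = 0; i < cells.length; i++) {
            values[i] = Double.parseDouble(cells[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // A made-up row with three numeric columns.
        System.out.println(Arrays.toString(parseRow("1.0, 2.5,3")));
    }
}
```

Under this reading, a file that stores one rating per row would yield one short vector per rating rather than one vector per user, so such data would likely need reshaping before clustering.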

Re: Applying clustering technique

2013-06-12 Thread Grant Ingersoll
doubt is, Is there any need to convert the movielens rating.csv file into a sequence file. If needed what are the commands for applying clustering technique using mahout and the hadoop. Thanking you, Neetha Suan Thampi Grant Ingersoll | @gsingers

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

2013-06-08 Thread Grant Ingersoll
committers, this is a biased first proposal, please shout, if you see things different and want to have things kept. Best, Sebastian On 08.06.2013 16:42, Grant Ingersoll wrote: More tests are always welcome. On Jun 8, 2013, at 10:29 AM, Ravi Mummulla ravi.mummu...@gmail.com wrote: Hi Grant

Re: Dictionary file format in Lucene-Mahout integration

2013-06-06 Thread Grant Ingersoll
- rowid - cvb. lucene.vector will still give you higher performance at the cost of extra storage (and the fact that it doesn't work in M/R and can't handle multiple directories). I'd say we keep it for now. From: Grant Ingersoll gsing...@apache.org

Re: Dictionary file format in Lucene-Mahout integration

2013-06-05 Thread Grant Ingersoll
Grant Ingersoll | @gsingers http://www.lucidworks.com

Re: FP Growth

2013-06-04 Thread Grant Ingersoll
On Jun 2, 2013, at 10:42 AM, Sebastian Schelter s...@apache.org wrote: I don't think unmaintained code should stay in our codebase. +1 This will only create frustration amongst our users, as they will not get questions answered and bugs fixed. It would also be an obstacle for a 1.0

FP Growth

2013-06-01 Thread Grant Ingersoll
FP Growth seems to not have a lot of dev support. Are there users out there using it? Should it live on or get the axe prior to 1.0? -Grant

Re: seq2sparse in 0.8 throwing class not found for analyzers

2013-04-24 Thread Grant Ingersoll
. Grant Ingersoll | @gsingers http://www.lucidworks.com

[OT] Internships at LucidWorks

2013-02-13 Thread Grant Ingersoll
Hi, I'm looking for interns for the summer for those interested in Mahout and Machine Learning: Research Engineer Internship DESCRIPTION LucidWorks, the leading commercial company for Apache Lucene and Solr, is looking for interns to work on building next generation search, analytics and

Re: Clustering using Solr Index vs Lucene Index : Different Results

2013-01-30 Thread Grant Ingersoll
-- View this message in context: http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-tp4037198.html Sent from the Mahout User List mailing list archive at Nabble.com. Grant Ingersoll http

Re: If you're at Hadoop World this year

2012-10-21 Thread Grant Ingersoll
@ Cloudera hadoop: http://www.cloudera.com Grant Ingersoll http://www.lucidworks.com

Re: TFIDFPartialVectorReducer minDf

2012-09-22 Thread Grant Ingersoll
a solicitation of an offer to buy, any financial product. Grant Ingersoll http://www.lucidworks.com

SGD model sizes

2012-09-04 Thread Grant Ingersoll
Hi, I'm wondering if anyone has any rules of thumb around model size and memory usage for SGD? I'm doing some testing of it myself, but thought I would ask to see how it compares. Thanks, Grant

Re: clusterdump lucene document ID

2012-06-11 Thread Grant Ingersoll
Grant Ingersoll http://www.lucidimagination.com

Re: Commercializing Mahout: the Myrrix recommender platform

2012-04-21 Thread Grant Ingersoll
On Apr 20, 2012, at 12:05 PM, Hector Yee wrote: On a related note, wish i could share the data i have to see how these algorithms stack up to the ones we use for large scale learning. That certainly would be interesting. Are there other examples of large data sets people use? I know

Re: Commercializing Mahout: the Myrrix recommender platform

2012-04-06 Thread Grant Ingersoll
, as it's key to the long-term project health. It's most certainly going to be the year of the application layer (analytics, machine learning) for Big Data. Thank you! Sean Grant Ingersoll http://www.lucidimagination.com

[Job] Research Internships

2012-02-27 Thread Grant Ingersoll
Hi, I have internships open for this summer for students interested in working on search and machine learning. Description is below. -Grant Research Engineer Internship DESCRIPTION Lucid Imagination, the leading commercial company for Apache Lucene and Solr, is looking for interns to work

Re: Goals for Mahout 0.7

2012-02-24 Thread Grant Ingersoll
accompanied by some plan to address the contributions already in line in JIRA. It's not OK to be implicitly rejecting so much from the community by not planning to fix that first and foremost. Grant Ingersoll http://www.lucidimagination.com

Lucene Revolution in Boston in May (with a side of Mahout)

2012-02-24 Thread Grant Ingersoll
Hi Mahout's, Thought some here might be interested as search and machine learning often go together. -- Lucene Revolution will be here May 9-10 in Boston. Reserve your spot today with Early Bird pricing of $575. Committers and accepted speakers are entitled to free admission. Our CFP is

Re: 0.7 Priorities

2012-02-22 Thread Grant Ingersoll
On Feb 22, 2012, at 7:24 AM, Jake Mannix wrote: On recent threads on the dev@ list, and discussions off-list, it's pretty clear that we need to have cleanup be a priority for the next release. How about this for a formal proposal: - The 0.7 release will have issues (both new and on

[Job] Research Engineer at Lucid Imagination

2012-02-01 Thread Grant Ingersoll
, California TRAVEL Minimal Grant Ingersoll http://www.lucidimagination.com

Re: status of hadoop hidden markov model in mahout

2012-02-01 Thread Grant Ingersoll
On Jan 31, 2012, at 2:14 PM, Keary Cavin wrote: Dhruv, I downloaded the MAHOUT-627 patch and applied the files to the current mahout release. I'll let you know when I have questions. Note, the plan is to put this patch into 0.7 once the remaining test issue is fixed. -Grant

Re: term vectors not created in SparseVectorsFromSequenceFiles using tf weighting and maxDFSigma filtering

2012-01-24 Thread Grant Ingersoll
3 -seq Thanks, John On Sun, Jan 22, 2012 at 3:00 PM, Grant Ingersoll gsing...@apache.org wrote: What were the command/options you were passing in? On Jan 18, 2012, at 4:26 PM, John Conwell wrote: I got latest from Trunk and built it, and when running SparseVectorsFromSequenceFiles

Re: term vectors not created in SparseVectorsFromSequenceFiles using tf weighting and maxDFSigma filtering

2012-01-22 Thread Grant Ingersoll
, minLLRValue, -1.0f, false, reduceTasks, chunkSize, sequentialAccessOutput, namedVectors); } -- Thanks, John C -- -- John C Grant Ingersoll http://www.lucidimagination.com

Re: Help needed on TF IDF.

2012-01-09 Thread Grant Ingersoll
please help? -- Regards Junaid Grant Ingersoll http://www.lucidimagination.com

Re: Mahout on EMR

2012-01-04 Thread Grant Ingersoll
EMR including clusterdumper following the instructions on: https://cwiki.apache.org/MAHOUT/mahout-on-elastic-mapreduce.html Thanks once again, Ipshita Grant Ingersoll http://www.lucidimagination.com

Re: Help regarding Apache Mahout.

2012-01-04 Thread Grant Ingersoll
the TF IDF from the documents present in a directory. Can you please help me with the Steps to go about it using Apache Mahout? Thank you. -- Regards Junaid Grant Ingersoll http://www.lucidimagination.com

Re: SGD and memory

2012-01-03 Thread Grant Ingersoll
task is to try and predict what project an email belongs to based on its content. Are these textual features? Or what? On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll gsing...@apache.org wrote: I'm trying to run the full ASF email SGD classifier problem and am facing heap size issues

Re: how to download data for example asf-email-examples.sh?

2012-01-02 Thread Grant Ingersoll
, there are some issues w/ this example and the SGD code that are still being worked through. See https://issues.apache.org/jira/browse/MAHOUT-904 for more info. Grant Ingersoll http://www.lucidimagination.com

Re: Will mahout arff.vector correctly convert string attributes?

2011-12-28 Thread Grant Ingersoll
a compressed binary format would be useful for representing such attributes, unless you also needed a count. Thanks, Don --- On Wed, 12/21/11, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: Re: Will mahout arff.vector correctly convert

Re: all keys going to one reducer in subgram step of CollocDriver (?)

2011-12-28 Thread Grant Ingersoll
to poke around does anyone agree this looks wrong? I'm running a 0.6-SNAPSHOT I cloned today from github. Was considering trying 0.5 but a quick look at recent changes doesn't seem to suggest this code has changed in awhile... Cheers, Mat Grant

Re: SequenceFile cast problems

2011-12-19 Thread Grant Ingersoll
Grant that was the point of my first question.. Now I'll take a look at the vector implementation. Thanks again Daniele On 14 December 2011 23:44, Grant Ingersoll gsing...@apache.org wrote: While Ted answered the Dissector question, your original issue, I believe, is that Mahout currently

Re: SequenceFile cast problems

2011-12-14 Thread Grant Ingersoll
:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) On 13 December 2011 19:52, Grant Ingersoll gsing...@apache.org wrote: What steps have you done? On Dec 13, 2011, at 12:29 PM, Daniele Volpi wrote: Hi everyone, I'm trying to implement the Naive Bayes

Re: SequenceFile cast problems

2011-12-13 Thread Grant Ingersoll
get this error: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable Do you have some hints on the right usage of this class? Thanks, Daniele Volpi Grant Ingersoll http

Re: mahout exception (lucene.vector)

2011-12-09 Thread Grant Ingersoll
happened? please help me, thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/mahout-exception-lucene-vector-tp3569144p3569144.html Sent from the Mahout User List mailing list archive at Nabble.com. Grant Ingersoll

Re: 20newsgroups example does not print verbose output

2011-12-04 Thread Grant Ingersoll
testclassifier \ -m ${WORK_DIR}/myproj-bydate/bayes-model \ -d ${WORK_DIR}/myproj-bydate/bayes-test-input \ -type bayes \ -ng 1 \ -source hdfs \ -v \ -method mapreduce Any suggestions? Thanks Grant Ingersoll http://www.lucidimagination.com

Re: DisplayKMean

2011-12-02 Thread Grant Ingersoll
Grant Ingersoll http://www.lucidimagination.com

Re: ASF archives?

2011-12-01 Thread Grant Ingersoll
I launched a micro instance and mounted the volume and downloaded it. That's the only way to get that exact data set that I am aware of. I've got a smaller sample up on the Lucid website. Otherwise, if you just want something like it, you can use your ASF credentials to get it. I can point

Re: Clustering graph coloring and layout

2011-12-01 Thread Grant Ingersoll
attached it, but those get stripped. I didn't realize that this was going to the list. Try here: http://dl.dropbox.com/u/36863361/cluster-viz.r And here for the image: http://dl.dropbox.com/u/36863361/xyz.png On Wed, Nov 30, 2011 at 4:04 PM, Grant Ingersoll gsing...@apache.org wrote: Can you

Re: Clustering graph coloring and layout

2011-11-30 Thread Grant Ingersoll
are near. xyz.png On Tue, Nov 29, 2011 at 8:03 AM, Grant Ingersoll gsing...@apache.org wrote: I'm still learning R, do you have code handy you could share? On Nov 29, 2011, at 6:25 AM, Ted Dunning wrote: Coloring is pretty easy in R, which is what I use. I just build a color map with the right

Clustering graph coloring and layout

2011-11-29 Thread Grant Ingersoll
://issues.apache.org/jira/browse/MAHOUT-899) but would really like to be able to produce much prettier visualizations out of the box. Grant Ingersoll http://www.lucidimagination.com

Re: Clustering graph coloring and layout

2011-11-29 Thread Grant Ingersoll
the transparency according to how seriously down-sampled the cluster is. That lets me get a good visual feel for the actual cluster size. On Tue, Nov 29, 2011 at 5:03 AM, Grant Ingersoll gsing...@apache.org wrote: Anyone have an easy algorithm for coloring clusters in a nice way? That is, given k
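
One simple way to approach the "given k" coloring question, offered only as a sketch: space the k cluster hues evenly around the color wheel so that no two clusters share a hue. This is not the color map described in the thread (that code is not shown here); the class `ClusterHueSketch` and the method `hues` are hypothetical, and the transparency adjustment mentioned above is left out.

```java
import java.util.Arrays;

public class ClusterHueSketch {

    /**
     * Returns k hue angles in degrees, evenly spaced over [0, 360).
     * Cluster i gets hue 360 * i / k.
     */
    public static double[] hues(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        double[] result = new double[k];
        for (int i = 0; i < k; i++) {
            result[i] = 360.0 * i / k;
        }
        return result;
    }

    public static void main(String[] args) {
        // Four clusters: hues at 0, 90, 180 and 270 degrees.
        System.out.println(Arrays.toString(hues(4)));
    }
}
```

Evenly spaced hues stay easy to tell apart for small k; for large k neighbouring hues become hard to distinguish, so varying saturation or brightness as well may be worth considering.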

Re: MinHash Clustering in Mahout

2011-11-28 Thread Grant Ingersoll
that the NGram attribute was set to the default value of 1 when creating the tf-idf vectors from sequence files. Suneel From: Grant Ingersoll gsing...@apache.org To: user@mahout.apache.org Sent: Tuesday, October 25, 2011 5:55 AM Subject: Re: MinHash

Re: mahout command problems

2011-11-27 Thread Grant Ingersoll
Grant Ingersoll http://www.lucidimagination.com

Re: Facing problem while fetching the document id from cluster

2011-11-25 Thread Grant Ingersoll
(); But it is returning null . Please help me to move further . Thanks and Regards, S SYED ABDUL KATHER Grant Ingersoll http://www.lucidimagination.com

Reminder: SF Mahout User Meeting

2011-11-25 Thread Grant Ingersoll
For those in the San Francisco area, there will be a Mahout User Meeting on Nov. 29th at Lucid Imagination's offices. Details and RSVP are at http://sf-mahout-11-11.eventbrite.com/ For those not in the SF area, I _believe_ we will be recording it and posting it.

Re: MinHash Clustering in Mahout

2011-11-23 Thread Grant Ingersoll
, such that I wonder if they are more or less empty. Running now to check. I am assuming that the NGram attribute was set to the default value of 1 when creating the tf-idf vectors from sequence files. Suneel From: Grant Ingersoll gsing...@apache.org

Re: MinHash Clustering in Mahout

2011-11-23 Thread Grant Ingersoll
From: Grant Ingersoll gsing...@apache.org To: user@mahout.apache.org Sent: Tuesday, October 25, 2011 5:55 AM Subject: Re: MinHash Clustering in Mahout On Oct 19, 2011, at 11:38 AM, Varun Thacker wrote: I was trying to run the MinHash algorithm

Re: clustering hardware requirements

2011-11-22 Thread Grant Ingersoll
in Action. If they do what I think they do, I will definitely try them, and probably complain on the list (Ted) if I can't interpret them right :). Thanks for the reply, -- Ioan Eugen Stan Grant Ingersoll http://www.lucidimagination.com

Re: Trouble understanding how to use the FP_Growth algorithm

2011-11-21 Thread Grant Ingersoll
, k, null, //returnableFeatures output, updater) Grant Ingersoll http://www.lucidimagination.com

Large Scale Clustering

2011-11-18 Thread Grant Ingersoll
Might be of interest: Clustering Very Large Multi-dimensional Datasets with MapReduce http://www.cs.cmu.edu/~jclopez/ref/kdd2011-mr-clustering.pdf Grant Ingersoll http://www.lucidimagination.com

Re: lsi

2011-11-17 Thread Grant Ingersoll
I've never implemented LSI. Is there a way to incrementally build the model (by simply indexing documents) or is it something that one only runs after the fact once one has built up the much bigger matrix? If it's the former, I bet it wouldn't be that hard to just implement the appropriate

Re: lsi

2011-11-14 Thread Grant Ingersoll
Might be useful: https://github.com/algoriffic/lsa4solr Looks like it hasn't been kept up to date. On Nov 13, 2011, at 1:47 PM, Sebastian Schelter wrote: Is there some documentation/tutorial available on how to build a LSI pipeline with mahout and lucene? --sebastian

Re: inconsistent output while using clusterdumper

2011-11-11 Thread Grant Ingersoll
:0.011,2:0.032,..etc As seen above in MSV-441 there is no presence of : in the output whereas MSV-770 has ):-0.025. Can anyone throw some light as to what is the difference and why is it present there..?? Thanks. Grant Ingersoll http

Re: SGD TrainNewsGroups interim output

2011-11-09 Thread Grant Ingersoll
Cool, how about adding it to the Wiki? On Nov 9, 2011, at 8:15 AM, Suneel Marthi wrote: I can put together a doc if we don't already have one, know the SGD code pretty well. Regards, Suneel From: Grant Ingersoll grant.ingers...@gmail.com To: user

Re: NewsKMeansClustering - the result most people want seems to be missing

2011-11-09 Thread Grant Ingersoll
this far. Any help would be gratefully received. R Grant Ingersoll http://www.lucidimagination.com

Re: Minhash key groups

2011-11-08 Thread Grant Ingersoll
. -Grant On Nov 7, 2011, at 8:54 PM, Suneel Marthi wrote: Do we have an answer for this? Sent from my iPhone On Nov 2, 2011, at 7:20 AM, Grant Ingersoll gsing...@apache.org wrote: What's the Minhash key groups value used for in the MinhashDriver? I mean, I see it is used for building up

Watchmaker framework usage

2011-11-04 Thread Grant Ingersoll
We've been debating removing/archiving the Watchmaker integration in Mahout due to seeming lack of maintenance and interest. Is anybody actually using it? -Grant

Re: Can anybody explain the distance method in SquaredEuclideanDistanceMeasure?

2011-11-04 Thread Grant Ingersoll
, thanks guys. That would be a great addition! Also, javadoc would be helpful, so patches would be great there. Grant Ingersoll http://www.lucidimagination.com

Re: creating vectors from lucene index which does NOT store vectors

2011-11-04 Thread Grant Ingersoll
field. Also, I assume at some point this could be a map-reduce job in hadoop. I'm just asking for sanity check, or if there are any better ideas out there. Thanks Bob -- Grant Ingersoll http://www.lucidimagination.com

SF Apache Mahout User Meeting (MUM) Nov 29th @ Lucid Imagination HQ

2011-11-04 Thread Grant Ingersoll
two speakers giving presentations related to Mahout: Ted Dunning, MapR and Grant Ingersoll of Lucid Imagination (me). Both Ted and Grant are long time committers on the Mahout project. Ted's talk: How and why random projections work? Mine: Using Mahout to Cluster, Classify and Recommend

Minhash key groups

2011-11-02 Thread Grant Ingersoll
What's the Minhash key groups value used for in the MinhashDriver? I mean, I see it is used for building up the key out of the hashed values, but what's the significance of different values for it? The default is 2, what does it mean practically speaking if I choose, say, 10? AFAICT, it
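
To make the question above concrete, here is a sketch of how a key could be built out of groups of hashed values. It is an illustration under stated assumptions, not the MinhashDriver implementation: the wrap-around grouping, the class `MinHashKeyGroupsSketch`, the method `clusterKeys`, and the sample hash values are all hypothetical and have not been checked against the Mahout source.

```java
import java.util.ArrayList;
import java.util.List;

public class MinHashKeyGroupsSketch {

    /**
     * Builds one key per hash function by joining keyGroups consecutive
     * min-hash values with '-', wrapping around at the end of the array.
     * Two documents land in the same bucket only if they share a whole key.
     */
    public static List<String> clusterKeys(int[] minHashValues, int keyGroups) {
        int n = minHashValues.length;
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            StringBuilder key = new StringBuilder();
            for (int j = 0; j < keyGroups; j++) {
                if (j > 0) {
                    key.append('-');
                }
                key.append(minHashValues[(i + j) % n]);
            }
            keys.add(key.toString());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up min-hash signatures that differ in one position.
        int[] docA = {3, 7, 1, 9};
        int[] docB = {3, 7, 2, 9};
        System.out.println(clusterKeys(docA, 1) + " vs " + clusterKeys(docB, 1));
        System.out.println(clusterKeys(docA, 2) + " vs " + clusterKeys(docB, 2));
    }
}
```

In this sketch the two sample documents share three keys when the group size is 1 but only two (`3-7` and `9-3`) when it is 2. If the real job behaves similarly, a larger key-groups value would require more hash values to agree before two documents meet, giving fewer and tighter clusters.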

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Grant Ingersoll
What functionality, specifically, are you proposing to remove? I know we had a lot of discussion around some of this stuff way back when as to how best to do it, but of course, that doesn't mean it has uptake. If it's on the Matrix, then doesn't it more easily get shipped around via the

Re: Embedding mahout in a java app

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 7:17 AM, Tharindu Mathew wrote: I want to create a java UI tool (based on a web app) that can pick and apply different algorithms available in Mahout to different data sets. Very cool! Keep us posted, as this would be immensely useful! Any chance it will be donated back?

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 10:58 AM, Jake Mannix wrote: On Wed, Nov 2, 2011 at 7:34 AM, Grant Ingersoll gsing...@apache.org wrote: What functionality, specifically, are you proposing to remove? I'm suggesting we kill, from Matrix.java and descendents, all of the following methods

How To Contribute

2011-11-02 Thread Grant Ingersoll
In the vein of users become contributors become committers: It seems there has been some spark of interest in contributing more, so I thought I would pass along a few pointers: 1. https://cwiki.apache.org/MAHOUT/how-to-contribute.html -- Details how to submit patches, etc. IDE codestyles at

Re: Production use cases of Mahout

2011-11-01 Thread Grant Ingersoll
available? -- Regards, Tharindu blog: http://mackiemathew.com/ Grant Ingersoll http://www.lucidimagination.com

Re: Exception in thread main org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file _b.fnm

2011-10-26 Thread Grant Ingersoll
...@gmail.com Grant Ingersoll http://www.lucidimagination.com

User vs. Item performance

2011-10-26 Thread Grant Ingersoll
I seem to recall past discussions on where one hits the bottleneck w/ user based recommendation approaches in Mahout, but I can't seem to locate it anymore. Anyone know off hand? Where do user based approaches hit their limits, more or less? Thanks, Grant

Re: User vs. Item performance

2011-10-26 Thread Grant Ingersoll
, but on Hadoop. On Wed, Oct 26, 2011 at 1:56 PM, Grant Ingersoll gsing...@apache.org wrote: I seem to recall past discussions on where one hits the bottleneck w/ user based recommendation approaches in Mahout, but I can't seem to locate it anymore. Anyone know off hand? Where do user based

Mahout Training/Talks at ApacheCon

2011-10-22 Thread Grant Ingersoll
Just a friendly nudge to those on the fence for ApacheCon in Vancouver this year that there will be both a Mahout training and some Mahout talks. I think a few of us committers will also be hacking Mahout on Tuesday if you are interested. Training info: http://na11.apachecon.com/talks/18395

Re: Exception thrown while running K-means clustering using Mahout

2011-10-22 Thread Grant Ingersoll
. Thanks a lot . Grant Ingersoll http://www.lucidimagination.com

Re: Bayes classifier can't get model when running on Hadoop

2011-10-17 Thread Grant Ingersoll
this model to classify new data, all sample will be classified to unknown My Environment: 1. Os : cent-os 5 2. Mahout : 0.5 3. Hadoop : 0.20.205 Thanks, Wangda Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011

Re: RecommenderJob and NaN

2011-10-14 Thread Grant Ingersoll
:17, Grant Ingersoll wrote: Were you able to get the data, Sebastian? On Oct 13, 2011, at 4:01 AM, Sebastian Schelter wrote: Grant, Can you share a little more details about the results, do you get any exceptions? Or do you just get no results? Using the NaNs inside the similarity matrix

Re: RecommenderJob and NaN

2011-10-14 Thread Grant Ingersoll
will probably have to tweak it. Lance On Thu, Oct 13, 2011 at 11:04 PM, Sebastian Schelter s...@apache.org wrote: Only got the raw data, how did you convert it to our standard recommender input? --sebastian On 14.10.2011 01:17, Grant Ingersoll wrote: Were you able to get the data

Re: RecommenderJob and NaN

2011-10-13 Thread Grant Ingersoll
this job worked for someone? On Wed, Oct 12, 2011 at 11:30 AM, Grant Ingersoll gsing...@apache.org wrote: Both local and on EC2 On Oct 12, 2011, at 2:10 PM, Ken Krugler wrote: Hi Grant, Just curious, are you running this locally or distributed? I'd run into a similar issue, though

Re: RecommenderJob and NaN

2011-10-13 Thread Grant Ingersoll
at 7:33 AM, Lance Norskog goks...@gmail.com wrote: Is this job working well for anyone now? When was the last time this job worked for someone? On Wed, Oct 12, 2011 at 11:30 AM, Grant Ingersoll gsing...@apache.org wrote: Both local and on EC2 On Oct 12, 2011, at 2:10 PM, Ken Krugler wrote

Re: RecommenderJob and NaN

2011-10-13 Thread Grant Ingersoll
Note, the next version (13df29e4fe97b4370f24d7e91ab5909de76f0f3b) doesn't work. Debugging. On Oct 13, 2011, at 9:31 PM, Grant Ingersoll wrote: OK, I can confirm that an earlier version (54300025dbdd6e688a4eb3d043016eb641067c7e in github/lucidimagination/mahout) worked. Now, to figure

Re: RecommenderJob and NaN

2011-10-13 Thread Grant Ingersoll
Looks like it is me. Still not sure why, but getting there. On Oct 13, 2011, at 10:35 PM, Grant Ingersoll wrote: Note, the next version (13df29e4fe97b4370f24d7e91ab5909de76f0f3b) doesn't work. Debugging. On Oct 13, 2011, at 9:31 PM, Grant Ingersoll wrote: OK, I can confirm

Re: No output for 20 Newsgroups testing

2011-10-12 Thread Grant Ingersoll
-for-20-Newsgroups-testing-tp3415474p3415474.html Sent from the Mahout User List mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

Re: RecommenderJob and NaN

2011-10-12 Thread Grant Ingersoll
on how to interpret this as I haven't dug into the math here yet or figured out where those NaN are coming from originally. On Oct 11, 2011, at 2:55 PM, Grant Ingersoll wrote: On Oct 11, 2011, at 2:49 PM, Grant Ingersoll wrote: On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: Where

Re: RecommenderJob and NaN

2011-10-12 Thread Grant Ingersoll
. When running locally, this wasn't getting cleared between loops, and thus I got wonky results. The same thing would have happened with JVM reuse enabled. -- Ken On Oct 12, 2011, at 3:28pm, Grant Ingersoll wrote: Digging some more: In AggregateAndRecommend, around line 143, I

RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
I'm running trunk RecommenderJob (via build-asf-email.sh) and am not getting any recommendations due to NaNs being calculated in the AggregateAndRecommend step. I'm not quite sure what is going on as it seems like this was working as little as two weeks ago (post Sebastian's big change to

Re: RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
larger data set on Hadoop, it's just that's a whole lot harder to debug. On Tue, Oct 11, 2011 at 5:34 PM, Grant Ingersoll gsing...@apache.org wrote: I'm running trunk RecommenderJob (via build-asf-email.sh) and am not getting any recommendations due to NaNs being calculated

Re: RecommenderJob and NaN

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 2:49 PM, Grant Ingersoll wrote: On Oct 11, 2011, at 12:36 PM, Sean Owen wrote: Where is the NaN coming up -- what has this value? simColumn seems to be the originator in the Aggregate step. For instance, my current breakpoint shows: {309682

Re: question about clustering

2011-10-10 Thread Grant Ingersoll
at 11:54 AM, Grant Ingersoll gsing...@apache.org wrote: On Oct 2, 2011, at 11:52 PM, Walter Chang wrote: Hi, I have used mahout to produce kmeans clustering for my tf-idf result. I use the mahout command line to produce the clusters and it seems it successfully completes. $MAHOUT_HOME

Re: Any plan to support markov chain based recommender ?

2011-10-08 Thread Grant Ingersoll
to provide some? Thank you, -- Colin Wang Skype : colin.bin.wang Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

Re: question about clustering

2011-10-06 Thread Grant Ingersoll
belongs to. Thanks a lot, Weide Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

Re: San Francisco/Bay Area Mahout users group

2011-09-16 Thread Grant Ingersoll
for the users and devs of Mahout? I will be moving there next week and was curious to know about the networking opportunities with similar minded folks in the coming months. Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http

Re: 92% accuracy on Weka NaiveBayesMultinomial vs 66% with Mahout bayes

2011-09-16 Thread Grant Ingersoll
Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

(C)NB classifier scores

2011-09-15 Thread Grant Ingersoll
What's the interpretation of scores for the output from the new (complementary) naive bayes classifiers? Larger is better, right? Thanks, Grant

Re: (C)NB classifier scores

2011-09-15 Thread Grant Ingersoll
to the complement class, you have highest affinity to the actual class which the data belongs to. Unless the new computation is spitting out positive numbers in which case it's the largest. :-) On Thu, Sep 15, 2011 at 9:18 PM, Grant Ingersoll gsing...@apache.org wrote: What's the interpretation

Re: vectors from pre-tokenized terms

2011-09-14 Thread Grant Ingersoll
to be ordered, but my features are not ordered. I would then use DictionaryVectorizer.createTermFrequencyVectors and TFIDFConverter.processTfIdf, just like in SparseVectorsFromSequenceFiles. Am I on the right track? Grant Ingersoll http

Re: Error while running any clustering tasks

2011-09-14 Thread Grant Ingersoll
... This is the error which I get: http://pastebin.com/ADPm0Vbx Am I missing any steps? Also on a side note is there a post on using MinHash in Mahout? -- Regards, Varun Thacker http://varunthacker.wordpress.com Grant Ingersoll

Re: Email and Collab. Filtering

2011-09-08 Thread Grant Ingersoll
directory (userVectors I think). On Thu, Sep 1, 2011 at 4:30 PM, Grant Ingersoll gsing...@apache.org wrote: On Sep 1, 2011, at 10:04 AM, Sean Owen wrote: Your input needs to be CSV if you want to use it all as-is. But, it quickly creates vectors out of things, so really you can comment out
