Hi all,
I'm trying to run the seq2sparse tool with one of the lucene analyzers but it
throws a class not found exception
mahout seq2sparse -i ./contentDataDir/sequenced -o
./contentDataDir/sparseVectors --namedVector -wt tf -a
org.apache.lucene.analysis.EnglishAnalyzer
Ryan,
Hadoop computes the split size from the min split size (as Matt mentioned),
the max split size, and dfs.block.size. You can calculate the split size from
those as max(minSplit, min(maxSplit, blockSize)).
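That formula can be sanity-checked with a quick shell sketch. The property values below are hypothetical, just for illustration; plug in your own mapred.min.split.size, mapred.max.split.size, and dfs.block.size:

```shell
# Hypothetical values, in bytes:
min_split=1048576        # mapred.min.split.size  (1 MB)
max_split=268435456      # mapred.max.split.size  (256 MB)
block_size=67108864      # dfs.block.size         (64 MB)

# splitSize = max(minSplit, min(maxSplit, blockSize))
inner=$(( max_split < block_size ? max_split : block_size ))
split_size=$(( min_split > inner ? min_split : inner ))
echo "$split_size"       # prints 67108864
```

With typical settings the block size is the deciding term, so each mapper ends up processing roughly one HDFS block; raising mapred.min.split.size above the block size is the usual way to force larger splits (and fewer mappers).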
I have found that for CPU intensive operations on smaller data sets, like I was
doing with
Chris,
Thanks for reporting this.
I am able to replicate this problem with trunk. Created Mahout-1195 to track
this; I'll take a look sometime today.
From: Chris Harrington ch...@heystaks.com
To: user@mahout.apache.org
Sent: Monday, April 22, 2013 6:08 AM
Phew,...
The fix for this was a DUD.
In Lucene 4.2.1 the package name for this class was changed to
org.apache.lucene.analysis.en.EnglishAnalyzer.
Notice 'en' in the package path.
This should work.
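So the invocation from the start of the thread should become (same command, only the analyzer's package path updated to include 'en'):

```shell
mahout seq2sparse -i ./contentDataDir/sequenced -o ./contentDataDir/sparseVectors \
  --namedVector -wt tf -a org.apache.lucene.analysis.en.EnglishAnalyzer
```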
From: Chris Harrington ch...@heystaks.com
To:
I'm losing some documents when running seq2sparse. I think it's
because the documents are composed of common terms, and end up having
no terms at all once common words are pruned. I couldn't find
documentation that this is what's supposed to be happening though, so
I wanted to ask if this is
Hello,
I'm using Mahout in a system where the typical response time should be
below 100ms. I'm using an item-based recommender with float preference
values (with Tanimoto similarity for now, which is passed into a
CachingItemSimilarity object for performance reasons). My model has around
7k
49 seconds is orders of magnitude too long -- something is very wrong
here, for so little data. Are you running this off a database, or are
you somehow counting the overhead of 3-4K network calls?
On Mon, Apr 22, 2013 at 11:22 PM, Gabor Bernat ber...@primeranks.net wrote:
Hello,
I'm using
Hello everyone,
I want to know if it's possible to do a clustering of documents in
SolrCloud indices (multiple index directories) and how would one
accomplish that.
---
I'm using Solr 4.2.1 and Mahout 0.8-SNAPSHOT
I can cluster documents from one Lucene/Solr index. I can even cluster
documents
Nope, and nope.
Note that this is an outlier example; however, even in other cases it
takes 500ms+, which is way too much for what I need.
Thanks,
Bernát GÁBOR
On Tue, Apr 23, 2013 at 12:53 AM, Sean Owen sro...@gmail.com wrote:
49 seconds is orders of magnitude too long -- something is