Yeah, that is what I had in mind as a simple solution. For examining bigger
result sets I always fear the cost of loading a lot of stored fields;
that's why I thought including it in the scoring might be cool. It's not
possible with a plain Collector that maintains a priority queue of docid and
score
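Roughly the kind of plain Collector I mean (a sketch against the Lucene 4.x
API; the class name is made up). It only ever sees (docid, score), which is
exactly the limitation:

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.util.PriorityQueue;

public class DocScoreCollector extends Collector {

  static final class Entry {
    int doc;
    float score;
  }

  private final PriorityQueue<Entry> queue;
  private Scorer scorer;
  private int docBase;

  public DocScoreCollector(int numHits) {
    queue = new PriorityQueue<Entry>(numHits) {
      @Override
      protected boolean lessThan(Entry a, Entry b) {
        return a.score < b.score; // evict the lowest-scoring entry first
      }
    };
  }

  @Override
  public void setScorer(Scorer scorer) {
    this.scorer = scorer;
  }

  @Override
  public void collect(int doc) throws IOException {
    Entry e = new Entry();
    e.doc = docBase + doc;       // global doc id
    e.score = scorer.score();    // only the score is kept;
    queue.insertWithOverflow(e); // stored fields are never touched
  }

  @Override
  public void setNextReader(AtomicReaderContext context) {
    this.docBase = context.docBase;
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;
  }
}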
Thanks for the reply ... I've discretized the continuous time series
observations and assigned them to symbols. The number of hidden states is
2: "out of control" and "not out of control" -- 0 and 1. With the scenario
defined this way, I'm able to get good predictions from the HMM. What I don't
know how
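For reference, the setup looks roughly like the following sketch against
Mahout's sequencelearning.hmm API; the symbol ids, labels, and pseudo-count
are illustrative:

import org.apache.mahout.classifier.sequencelearning.hmm.HmmEvaluator;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

public class ControlStateHmm {
  public static void main(String[] args) {
    // Discretized observations (symbol ids) with their known hidden
    // states for a labeled training window: 0 = in control, 1 = out
    // of control. Values here are made up for illustration.
    int[] observations = {0, 1, 2, 2, 3, 1, 0, 2, 3, 3};
    int[] hiddenStates = {0, 0, 0, 1, 1, 1, 0, 0, 1, 1};

    // 2 hidden states, 4 observable symbols, small pseudo-count to
    // smooth transitions never seen in training.
    HmmModel model =
        HmmTrainer.trainSupervised(2, 4, observations, hiddenStates, 0.1);

    // Decode the most likely hidden state sequence for new data (Viterbi).
    int[] newObservations = {0, 2, 3, 3, 1};
    int[] decoded = HmmEvaluator.decode(model, newObservations, true);
    for (int state : decoded) {
      System.out.println(state == 1 ? "out of control" : "in control");
    }
  }
}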
Yes, what you are describing with diversification is something that I have
called anti-flood. It comes from the fact that we really are optimizing a
portfolio of recommendations rather than a batch of independent
recommendations. Doing this from first principles is very hard, but there are
very s
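One very simple greedy approximation (my sketch, nothing from Mahout; the
Similarity interface is hypothetical) is to discount each candidate by its
maximum similarity to what has already been picked:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public final class AntiFlood {

  /** Caller-supplied similarity in [0,1]; hypothetical interface. */
  public interface Similarity<T> {
    double between(T a, T b);
  }

  /**
   * Greedily selects k items from a score-ranked candidate list,
   * discounting each candidate by its max similarity to items
   * already picked (a trade-off controlled by lambda).
   */
  public static <T> List<T> rerank(List<T> candidates, double[] scores,
                                   Similarity<T> sim, double lambda, int k) {
    List<Integer> remaining = new LinkedList<Integer>();
    for (int i = 0; i < candidates.size(); i++) {
      remaining.add(i);
    }
    List<T> picked = new ArrayList<T>(k);
    while (picked.size() < k && !remaining.isEmpty()) {
      int bestIdx = -1;
      double bestVal = Double.NEGATIVE_INFINITY;
      for (int i : remaining) {
        double maxSim = 0.0;
        for (T p : picked) {
          maxSim = Math.max(maxSim, sim.between(candidates.get(i), p));
        }
        // Raw relevance, penalized by redundancy with earlier picks.
        double val = lambda * scores[i] - (1 - lambda) * maxSim;
        if (val > bestVal) {
          bestVal = val;
          bestIdx = i;
        }
      }
      picked.add(candidates.get(bestIdx));
      remaining.remove(Integer.valueOf(bestIdx));
    }
    return picked;
  }
}

With lambda near 1 the raw scores dominate; smaller lambda pushes diversity
harder.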
Hi all,
Assume we want to run mahout examples like:
$HADOOP_HOME/bin/hadoop jar
$MAHOUT_HOME/core/target/mahout-core--job.jar
org.apache.mahout.classifier.df.tools.Describe -p
testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N
C 8 N 2 C 19 N L
It works well on the command line, b
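In case it helps to invoke the same tool from Java instead of the shell,
calling its main with the same arguments is a reasonable first try (a
sketch; it assumes the classpath and the testdata paths on HDFS are set up
exactly as for the command line):

import org.apache.mahout.classifier.df.tools.Describe;

public class RunDescribe {
  public static void main(String[] args) throws Exception {
    // Same arguments as the command line above.
    Describe.main(new String[] {
        "-p", "testdata/KDDTrain+.arff",
        "-f", "testdata/KDDTrain+.info",
        "-d", "N", "3", "C", "2", "N", "C", "4", "N", "C", "8",
        "N", "2", "C", "19", "N", "L"
    });
  }
}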
This data was for a mobile shopping app. Other answers below.
> On May 21, 2013, at 5:42 PM, Ted Dunning wrote:
>
> Inline
>
>
> On Tue, May 21, 2013 at 8:59 AM, Pat Ferrel wrote:
>
>> In the interest of getting some empirical data out about various
>> architectures:
>>
>> On Mon, May 20, 2
Okay, I got it! I have also always used a basic form of dithering, but we
always called it shuffle, since it basically was/is Collections.shuffle on
a bigger list of results and therefore doesn't take the rank or score into
account. Will try that.
With diversification I really meant more something
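On the dithering side: the rank-aware variant, as I understand the
suggestion, reorders by log(rank) plus Gaussian noise instead of a flat
shuffle, so the head of the list stays mostly stable. A sketch, names
illustrative:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public final class Dither {

  private static final class Keyed<T> {
    final T item;
    final double key;
    Keyed(T item, double key) {
      this.item = item;
      this.key = key;
    }
  }

  /**
   * Re-orders a ranked list by log(rank) + Gaussian noise, so the
   * original order still dominates but lower-ranked items sometimes
   * surface. epsilon controls how much mixing happens.
   */
  public static <T> List<T> dither(List<T> ranked, double epsilon, Random rng) {
    List<Keyed<T>> keyed = new ArrayList<Keyed<T>>(ranked.size());
    for (int rank = 0; rank < ranked.size(); rank++) {
      double key = Math.log(rank + 1) + epsilon * rng.nextGaussian();
      keyed.add(new Keyed<T>(ranked.get(rank), key));
    }
    Collections.sort(keyed, new Comparator<Keyed<T>>() {
      @Override
      public int compare(Keyed<T> a, Keyed<T> b) {
        return Double.compare(a.key, b.key);
      }
    });
    List<T> out = new ArrayList<T>(keyed.size());
    for (Keyed<T> k : keyed) {
      out.add(k.item);
    }
    return out;
  }
}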
I looked, and this job already uses a combiner called OuterProductCombiner.
In fact it was right there in the stack trace, oops. At least, it shows
this is happening in the mapper and the combiner is trying to do its job.
I am still pretty sure both io.sort.* parameters are relevant here.
Anyway
I am actually not sure how to manipulate use of combiners in Hadoop. All I
can say is that the code does make extensive use of combiners, but they were
always "on" for me. I had no idea one might turn their use off.
On Wed, May 22, 2013 at 6:17 AM, Jakub Pawłowski
wrote:
> Yes, I was manipulating io
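For context, in a job you control the combiner is just a Reducer registered
when the job is configured; a minimal sketch using Hadoop's stock
TokenCounterMapper and IntSumReducer (paths and submission omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerWiring {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "combiner-example");
    job.setMapperClass(TokenCounterMapper.class);
    // A combiner is just a Reducer registered here; delete this line
    // and the job runs with combiners "off".
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input/output paths omitted; the point is only where the
    // combiner gets switched on.
  }
}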
HMMs could be useful, but you have to define things a bit differently.
First of all, HMMs want symbolic inputs and want to give you symbolic
outputs. You don't get to see the internal state.
My first approach would be to use k-means clustering on short sequences of
your observed continuous variable
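Concretely, the symbolization step could look like this (plain Java sketch;
the centroids stand in for whatever k-means learns from training windows):

public final class Symbolize {

  static double squaredDistance(double[] a, double[] b) {
    double d = 0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      d += diff * diff;
    }
    return d;
  }

  /**
   * Slides a short window over the continuous series and labels each
   * window with the index of its nearest centroid; the resulting ids
   * are the symbols an HMM can consume.
   */
  public static int[] toSymbols(double[] series, double[][] centroids, int window) {
    int n = series.length - window + 1;
    int[] symbols = new int[n];
    for (int start = 0; start < n; start++) {
      double[] w = java.util.Arrays.copyOfRange(series, start, start + window);
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < centroids.length; c++) {
        double dist = squaredDistance(w, centroids[c]);
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      symbols[start] = best;
    }
    return symbols;
  }
}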
I'm not knowledgeable about statistics or data analysis, so please be
gentle! I am using Mahout to predict the out-of-control state of a time
series. I've had a fair amount of success classifying with SGD and adaptive
regression approaches, but want to see if Hidden Markov Models can do a
better job for my purposes
I mean you would have to write one and modify the code to use it. I
don't know this job well enough to know whether it's possible or not
though. At least, this is getting directly at reducing the amount of
data spilled, rather than reducing the intermediate I/O needed to sort
it.
Doesn't io.sort.*
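To make "write one" concrete: a combiner is a Reducer whose operation is
associative, so partial results can be merged map-side before the spill. A
generic sketch (not the actual types this job uses):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Pre-sums counts per key on the map side so far less data is spilled
// and shuffled. Only valid because addition is associative and
// commutative; input and output types must match the map output types.
public class SumCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
  private final LongWritable sum = new LongWritable();

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long total = 0;
    for (LongWritable v : values) {
      total += v.get();
    }
    sum.set(total);
    context.write(key, sum);
  }
}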
Yes, I was manipulating io.sort.factor too; it speeds up the reducer, and
values around 30 give good results for me.
But my problem is not the reducer, my problem is the Bt-job map task that
spills to disk.
You mentioned a Combiner; how can I turn it on? I'm running my job from the
console like this:
mahout ssvd
I feel like I've seen this too and it's just a bug. You're not running
out of memory.
Are you also setting io.sort.factor? That can help too. You might try
as high as 100.
Also, have you tried a Combiner? If you can apply it, it should help too,
as it is designed to reduce the amount of stuff spilled
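For reference, both knobs are ordinary job configuration; a sketch of
setting them programmatically (values are examples, and io.sort.mb has to
fit inside the map task heap):

import org.apache.hadoop.conf.Configuration;

public class SortTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Map-side sort buffer in MB; bigger means fewer spills, but it
    // must fit inside the map task's JVM heap.
    conf.setInt("io.sort.mb", 512);
    // How many spill segments get merged in one pass.
    conf.setInt("io.sort.factor", 100);
    // The sort buffer is allocated from the task heap, so size it too.
    conf.set("mapred.child.java.opts", "-Xmx1024m");
  }
}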
See inline.
From: Stuti Awasthi
To: "'user@mahout.apache.org'"
Sent: Wednesday, May 22, 2013 7:02 AM
Subject: RE: Feature vector generation from Bag-of-Words
Hi Suneel,
I implemented your suggested approach. This was simple to implement and you
have made
Hi,
I'm trying to tune the mahout ssvd job so it does not spill so much. I'm
trying to tune io.sort.mb:

io.sort.mb = 1047

but when I try to put in any bigger value, i.e.

io.sort.mb = 1247

According to the Hadoop source code this value can be as high as 2047:
http://grepcode.com/file/repository.cloudera.com/content/re
Hi Suneel,
I implemented your suggested approach. This was simple to implement and you
have made the steps pretty clear. Thank you :) . I have a few queries about
creating features using a Multiset:
1. Can't we make keyword matching case-insensitive with the multiset? I.e. my
keyword may be "Day" and in the docum