[
https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhao zhendong updated MAHOUT-232:
-
Description:
After discussing with the guys in this community, I decided to re-implement a
Sequ
[
https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhao zhendong updated MAHOUT-232:
-
Attachment: SequentialSVM_0.4.patch
1) Supporting sequential multi-classification (both one-vs.-o
[
https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802106#action_12802106
]
Ted Dunning commented on MAHOUT-228:
{quote}
make sure that L1 is sparsity inducing my
Hi Drew,
Including the source code in snapshots would be great.
Currently, the HDFS reader does not work in 0.20.2. Without source code,
it's not convenient for me to debug the code.
Cheers,
Zhendong
On Sat, Jan 9, 2010 at 12:25 AM, Drew Farris wrote:
> I wonder if we can get the hadoop peo
[
https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802032#action_12802032
]
Olivier Grisel commented on MAHOUT-228:
---
For the record: I am working on adding more te
On Mon, Jan 18, 2010 at 4:46 PM, Grant Ingersoll wrote:
>
> On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote:
>
> > If it's SF on Thursday, someone will have to have a beer as my proxy.
>
> I volunteer ;-)
>
You're on.
> Sounds like we have a post-meetup meetup brewing. I'm not familiar
I would love to, but there is no chance I could make it that far.
On Mon, Jan 18, 2010 at 2:32 PM, Markus Weimer wrote:
> Hi,
>
> mloss.org will be hosting the workshop on Machine Learning Open Source
> Software at the International Conference on Machine Learning (MLOSS
> '10), following similar
On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote:
> If it's SF on Thursday, someone will have to have a beer as my proxy.
I volunteer ;-)
Sounds like we have a post-meetup meetup brewing. I'm not familiar with the
area, anyone know where we can go afterwards? Also, I'll need a ride bac
On Mon, Jan 18, 2010 at 3:20 PM, Olivier Grisel wrote:
>
> In the mean time could you please give me a hint on how to value the
> probes of the binary randomizer w.r.t. the window size?
>
The basic trade-off is the standard hashed learning trade-off between number
of training examples, dimensiona
Hi,
mloss.org will be hosting the workshop on Machine Learning Open Source
Software at the International Conference on Machine Learning (MLOSS
'10), following similar workshops at NIPS. I believe it would be a
great venue to not only present mahout but also to get in touch with
other MLOSS project
2010/1/18 Ted Dunning :
> THANK YOU.
Thank you you! I was about to implement my own regularized sgd linear
classifier using hashed features when I first stumbled upon your patch
:)
> I have been very grumpy that I couldn't get to doing this yet.
>
> I will coordinate closely with you. I haven't
Hmm, if all you guys are going to be there, I may need to push back my
flight -
I'm scheduled to fly *out* of SFO right around the time of the Meetup, but
if I can push back that flight, I will.
-jake
On Mon, Jan 18, 2010 at 1:24 PM, Ted Dunning wrote:
> I'll be there.
>
> Sean, are you reall
Yes, I'm on the west coast for a week from tomorrow for various
reasons and so will certainly stop in. Looking forward to it.
Sean
On Mon, Jan 18, 2010 at 9:24 PM, Ted Dunning wrote:
> I'll be there.
>
> Sean, are you really going to be there? That would be fantastic.
For the past... 5 years? I've been using Spring as a DI container
at every job I've had. At LinkedIn, in fact we have extended
Spring extensively
(see here: http://www.springsource.com/files/SpringAtLinkedIn.pdf
for some details). It's incredibly powerful, and while the config files
can be prett
I'll be there.
Sean, are you really going to be there? That would be fantastic.
On Mon, Jan 18, 2010 at 6:02 AM, Grant Ingersoll wrote:
>
> On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
>
> > We should have a beer some time anyway and the beers we owe you for
> cleaning
> > up Colt more than
[
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801960#action_12801960
]
Ted Dunning commented on MAHOUT-153:
+1 to what Grant said. Go ahead and post a patch
THANK YOU.
I have been very grumpy that I couldn't get to doing this yet.
I will coordinate closely with you. I haven't used git yet in anger so it
will be a learning experience. Don't expect me to have time, though. ( I
will try ... but expect not to find a hole )
On Mon, Jan 18, 2010 at 12:
2010/1/18 Ted Dunning :
> These bounds were too tight in any case. I had to loosen other bounds
> during development and should have loosened these as well.
>
> Your change is a good one.
Great! so here is the sequel:
I have written a real training convergence test and identified and
fixed two b
These bounds were too tight in any case. I had to loosen other bounds
during development and should have loosened these as well.
Your change is a good one.
On Mon, Jan 18, 2010 at 6:03 AM, Olivier Grisel wrote:
> Is this a consequence of the recent RandomAccessSparseVector
> implementation chan
I am going to address IoC issues only on this thread. The other
repeatability issues should be addressed, but on the other thread.
On Mon, Jan 18, 2010 at 7:10 AM, Sean Owen wrote:
> > I am not especially in favor of my own Random patch. If people are
> > willing to run in 'fork-once' mode to get
[
https://issues.apache.org/jira/browse/MAHOUT-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman resolved MAHOUT-251.
-
Resolution: Fixed
r900519 wrapped up loose ends in the patch, adding new command line arguments
If it's SF on Thursday, someone will have to have a beer as my proxy.
I'll be back here in the snow.
On Mon, Jan 18, 2010 at 12:21 PM, Jeff Eastman
wrote:
> I'm planning on attending
> Jeff
>
>
> Grant Ingersoll wrote:
>>
>> On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
>>
>>
>>>
>>> We should
That looks like a bug, to me... not sure where it is though...
-jake
On Mon, Jan 18, 2010 at 6:03 AM, Olivier Grisel wrote:
> Hello,
>
> I am currently testing the MAHOUT-228-3.patch applied to the current
> trunk. The merge went mostly well except a couple of duplicated chunks
> in the patches
[
https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801872#action_12801872
]
Jake Mannix commented on MAHOUT-261:
Ooooh, we need this in the Vectors.
> Give the pr
I'm planning on attending
Jeff
Grant Ingersoll wrote:
On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
We should have a beer some time anyway and the beers we owe you for cleaning
up Colt more than cancel any potential beer on this issue so I will be happy
to buy (Sean, you are included for
2010/1/18 Robin Anil :
> Could you check the logs? You will see a bigger stack trace that might lead back
> to Mahout classes
In the tasktracker logs I could find a more complete stacktrace (jetty
related, no sign of Mahout classes) and Google pointed me to
this:
https://issues.apache.org/jir
CXF has a very different requirement profile than Mahout. People want
to plug web service clients and servers into all kinds of
environments, and get all huffy if forced to use something like Spring
or Guice. Mahout, at this point in its career, at least, probably
doesn't have this problem.
The in
My 2 cents:
I wouldn't mind making all components that are non-deterministic in
nature having their constructor explicitly pass a RNG instance
(instead of using static magic).
That can be helpful when running several versions of the same
algorithms with different hyper-parameters in separate thre
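A minimal sketch of the constructor-passing idea described above, assuming a hypothetical `SeededSampler` class (it is not part of Mahout; the name and method are illustrative only). The caller decides which `java.util.Random` to hand in, so two instances running in separate threads can each get their own generator, and a test can pass a fixed seed:

```java
import java.util.Random;

// Hypothetical component, for illustration only: the RNG is supplied by
// the caller instead of being fetched from a static helper.
final class SeededSampler {
    private final Random rng;

    SeededSampler(Random rng) {
        this.rng = rng;
    }

    // Draws `count` integers in [0, bound) from the injected generator.
    int[] sample(int count, int bound) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = rng.nextInt(bound);
        }
        return out;
    }
}
```

Production code might pass `new Random()`, while a test passes `new Random(42L)`; two samplers built from generators with the same seed produce the same sequence.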
Could you check the logs? You will see a bigger stack trace that might lead back
to Mahout classes
On Mon, Jan 18, 2010 at 9:19 PM, Olivier Grisel wrote:
> 2010/1/18 Olivier Grisel :
> > 2010/1/18 Robin Anil :
> >> could you be specific on which map/reduce job you encountered the error
> ?
> >
> > I
On Mon, Jan 18, 2010 at 10:47 AM, Drew Farris wrote:
> On Mon, Jan 18, 2010 at 10:10 AM, Sean Owen wrote:
>
>> ... can I try again to drag attention to an actual problem? the
>> repeatability issue. This injection discussion is orthogonal to it.
>
Arrrg. Could we please have a thread for repeata
2010/1/18 Olivier Grisel :
> 2010/1/18 Robin Anil :
>> could you be specific on which map/reduce job you encountered the error ?
>
> I thought it was on:
>
> hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i
> "wikipedi
On Mon, Jan 18, 2010 at 10:10 AM, Sean Owen wrote:
> ... can I try again to drag attention to an actual problem? the
> repeatability issue. This injection discussion is orthogonal to it.
Is the repeatability issue caused by the switch to forkOnce? What
specifically is the issue we're bumping up
I think I might be done with collections. I can't work up any
enthusiasm for iterators, or java.util. decorators, and I think I have
the basic functionality all in place. There are a number of perhaps
pointless ways in which Colt diverges from Java collections,
particularly in the area of return va
[
https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benson Margulies updated MAHOUT-261:
Resolution: Fixed
Status: Resolved (was: Patch Available)
Done.
> Give the primit
I created this subject thread so that you could use the other one for
repeatability.
[
https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benson Margulies updated MAHOUT-261:
Attachment: MAHOUT-261.patch
> Give the primitive-value maps an adjustOrPutValue call, like
Give the primitive-value maps an adjustOrPutValue call, like Trove.
---
Key: MAHOUT-261
URL: https://issues.apache.org/jira/browse/MAHOUT-261
Project: Mahout
Issue Type: Improve
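For readers unfamiliar with the call, here is a rough sketch of the adjust-or-put behaviour as it is commonly understood from Trove: if the key is present, add an adjustment to its value; otherwise store an initial value. The `Counts` class, its parameter names, and its return value are assumptions for illustration, not the signature added by the MAHOUT-261 patch, and the sketch uses a boxed `HashMap` where the real maps hold primitives:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper, for illustration only.
final class Counts {
    private final Map<String, Integer> map = new HashMap<>();

    // If the key is present, adds adjustAmount to its value; otherwise
    // stores putAmount. Returns the value held after the operation.
    int adjustOrPutValue(String key, int adjustAmount, int putAmount) {
        Integer current = map.get(key);
        int updated = (current == null) ? putAmount : current + adjustAmount;
        map.put(key, updated);
        return updated;
    }
}
```

A typical use is counting: `counts.adjustOrPutValue(word, 1, 1)` replaces the separate contains/get/put steps with a single call.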
[
https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benson Margulies updated MAHOUT-261:
Status: Patch Available (was: Open)
> Give the primitive-value maps an adjustOrPutValue ca
On Mon, Jan 18, 2010 at 2:59 PM, Benson Margulies wrote:
> Doing significant work in static code blocks leads to nothing but
> trouble, as the Random situation demonstrates.
I don't know that this is the conclusion? You're critiquing one means
of implementing injection, but neither of the two pro
[
https://issues.apache.org/jira/browse/MAHOUT-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benson Margulies resolved MAHOUT-259.
-
Resolution: Fixed
Fix Version/s: 0.3
committed.
> Remove all code for Object matr
[
https://issues.apache.org/jira/browse/MAHOUT-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benson Margulies reassigned MAHOUT-259:
---
Assignee: Benson Margulies
> Remove all code for Object matrices
> --
On Mon, Jan 18, 2010 at 9:42 AM, Sean Owen wrote:
> You can punt the choice all the way up to fix that. Then regular
> callers are forced to instantiate and supply the RNG in all cases, and
> the API has Randoms all over the place, and I suppose I don't quite
> like that aesthetically.
Point tak
Doing significant work in static code blocks leads to nothing but
trouble, as the Random situation demonstrates.
I thought it would be useful to describe what CXF does to avoid this.
While CXF does use Spring, we don't require CXF users to use Spring.
Instead, we use a simpler internal organizati
On Mon, Jan 18, 2010 at 2:36 PM, Drew Farris wrote:
> I'm suggesting that the instantiator/caller of the class choose
> between a regular and test-friendly RNG. In some classes that creator
> will be a unit test in other cases the creator will be another piece
> of production code. In some cases t
On Mon, Jan 18, 2010 at 9:23 AM, Sean Owen wrote:
> You're suggesting the class choose between a regular and test-friendly
> RNG, by calling one of two methods. Doesn't that put the decision with
> the class instead of externally? Right now it's already external.
> RandomUtils decides what to inst
You're suggesting the class choose between a regular and test-friendly
RNG, by calling one of two methods. Doesn't that put the decision with
the class instead of externally? Right now it's already external.
RandomUtils decides what to instantiate.
On Mon, Jan 18, 2010 at 2:21 PM, Drew Farris wro
On Mon, Jan 18, 2010 at 9:06 AM, Sean Owen wrote:
> (Separately you could argue we're going about this all wrong, by
> trying to depend on the exact output of the RNG..
No argument here. In practice I don't think we can really get around
using a pre-seeded RNG for tests.
> You've moved around t
2010/1/18 Robin Anil :
> could you be specific on which map/reduce job you encountered the error ?
I thought it was on:
hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i
"wikipediadump/chunk-0001.xml" -o wikipediainput-
could you be specific on which map/reduce job you encountered the error ?
On Mon, Jan 18, 2010 at 7:28 PM, Olivier Grisel wrote:
> 2010/1/18 Robin Anil :
> > It's this kind of thing that forced the move to sequence files instead of
> > TextKeyValueInput format and other text-based/CSV-based format
On Mon, Jan 18, 2010 at 2:00 PM, Drew Farris wrote:
> In what cases would you want to reset them all remotely, at the
> beginning of each test?
You pretty much said it -- tests should start from a known, fixed
state, so that the result is the same each time, and we can assert
about the output. Th
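The "track them and reset them all" idea from this thread could look roughly like the sketch below. `TrackedRandoms`, its seed value, and its method names are hypothetical; this is not Mahout's `RandomUtils`, only an illustration of a central factory that remembers the generators it hands out so a test harness can return them all to a known state:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical test-mode factory, for illustration only.
final class TrackedRandoms {
    private static final long TEST_SEED = 12345L; // arbitrary fixed seed
    private static final List<Random> INSTANCES = new ArrayList<>();

    private TrackedRandoms() {}

    // Hands out a generator with the fixed seed and remembers it.
    static synchronized Random create() {
        Random r = new Random(TEST_SEED);
        INSTANCES.add(r);
        return r;
    }

    // Re-seeds every generator created so far, so each one replays
    // the same sequence from the start.
    static synchronized void resetAll() {
        for (Random r : INSTANCES) {
            r.setSeed(TEST_SEED);
        }
    }
}
```

A test base class could call `TrackedRandoms.resetAll()` in its set-up method. Note that this simple version holds strong references to every generator it creates, which a real implementation would probably want to avoid.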
Hello,
I am currently testing the MAHOUT-228-3.patch applied to the current
trunk. The merge went mostly well except a couple of duplicated chunks
in the patches (probably applied otherwise to the trunk) and a
duplicated wordlist.
However, to make the tests pass I had to reduce the precision of som
On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
> We should have a beer some time anyway and the beers we owe you for cleaning
> up Colt more than cancel any potential beer on this issue so I will be happy
> to buy (Sean, you are included for similar reasons if we ever see each
> other).
After t
On Mon, Jan 18, 2010 at 3:58 AM, Sean Owen wrote:
> The real fix is centralizing management of Random, tracking them, and
> being able to reset them all "remotely".
In what cases would you want to reset them all remotely, at the
beginning of each test?
> It is injected already -- that's the pur
2010/1/18 Robin Anil :
> It's this kind of thing that forced the move to sequence files instead of
> TextKeyValueInput format and other text-based/CSV-based formats. Kind of
> regretting the decision to go with tab-separated format for BayesClassifier,
> which I wrote 2 years ago. I will be modify
[
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801755#action_12801755
]
Grant Ingersoll commented on MAHOUT-153:
Please keep the same issue. That way the
[
https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801750#action_12801750
]
Sean Owen commented on MAHOUT-260:
--
My take is that we have injection already, via RandomU
It's this kind of thing that forced the move to sequence files instead of
TextKeyValueInput format and other text-based/CSV-based formats. Kind of
regretting the decision to go with tab-separated format for BayesClassifier,
which I wrote 2 years ago. I will be modifying this to use sparse vectors
[
https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801748#action_12801748
]
Benson Margulies commented on MAHOUT-260:
-
Well,
I thought I saw email go by to th
As I troll through the code at times trying to polish here and there I
notice small issues to bring up --
Line separators. Lots of code independently reads
System.getProperty("line.separator") in order to output a platform
specific line break. I argue this is actually slightly bad, since it
means
2010/1/18 Jeff Eastman :
> Sean Owen wrote:
>>
>> Could be. I took an indirect stab at mitigating possible sources of
>> this issue by increasing encapsulation in the tests -- I still believe
>> fields should never be non-private. This may start to surface the
>> behind-the-scenes dependencies and
[
https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated MAHOUT-260:
-
Attachment: MAHOUT-260_wrapper.patch
This is the wrapper-related change I had in mind. I think this reall
[
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801716#action_12801716
]
Pallavi Palleti commented on MAHOUT-153:
Hi all,
I am ready with my patch. However
This just avoids the class load in the test. I don't think it is necessary.
On Mon, Jan 18, 2010 at 1:04 AM, Sean Owen (JIRA) wrote:
> I still don't understand what this solves. We already 'fixed' the
> performance issue.
>
--
Ted Dunning, CTO
DeepDyve
[
https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801708#action_12801708
]
Sean Owen commented on MAHOUT-260:
--
I still don't understand what this solves. We already
Same here, I don't like Spring myself as it smells like
overengineering -- certainly for this case. I'm otherwise a luddite
though and could more broadly be convinced.
On Mon, Jan 18, 2010 at 2:49 AM, Ted Dunning wrote:
> I have had too many unpleasant experiences using Spring to be enthused abou
On Mon, Jan 18, 2010 at 2:24 AM, Drew Farris wrote:
> On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote:
>> There are already cases where code needs to control the seed (mostly
>> to serialize/deserialize the exact state of an object). I don't think
>> that's the issue per se? The issue is when an
Yeah this was my change that didn't work:
public class DummyOutputCollector<K extends WritableComparable, V extends Writable>
public class DummyOutputCollector<K extends WritableComparable<?>, V
extends Writable>
The latter is more correct and as far as I know identical. I don't see
why this doesn't work, but I undid that.
I also don't understand why I had to 'mvn clean test