Re: build failure

2010-01-18 Thread Sean Owen
Yeah this was my change that didn't work: public class DummyOutputCollector<K extends WritableComparable, V extends Writable> public class DummyOutputCollector<K extends WritableComparable<?>, V extends Writable> The latter is more correct and as far as I know identical. I don't see why this doesn't
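To make the two declarations in the message above concrete, here is a minimal sketch that puts a raw bound (WritableComparable) next to a wildcard bound (WritableComparable<?>). The Writable and WritableComparable interfaces below are local stand-ins so the snippet compiles without Hadoop, and TextKey, IntValue and both collector classes are hypothetical names. It only illustrates the two forms; it does not reproduce the build failure discussed in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local stand-ins for the Hadoop interfaces (assumption: same shape, no methods).
interface Writable {}
interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical key and value types, for illustration only.
final class TextKey implements WritableComparable<TextKey> {
  final String value;
  TextKey(String value) { this.value = value; }
  @Override public int compareTo(TextKey other) { return value.compareTo(other.value); }
}

final class IntValue implements Writable {
  final int value;
  IntValue(int value) { this.value = value; }
}

// Raw bound on K: accepted by javac, but flagged by the rawtypes lint check.
@SuppressWarnings("rawtypes")
final class RawBoundCollector<K extends WritableComparable, V extends Writable> {
  private final Map<K, List<V>> data = new TreeMap<K, List<V>>();
  void collect(K key, V value) {
    data.computeIfAbsent(key, k -> new ArrayList<V>()).add(value);
  }
  int keyCount() { return data.size(); }
}

// Wildcard bound on K: no raw type, and K erases to the same type as above.
final class WildcardBoundCollector<K extends WritableComparable<?>, V extends Writable> {
  private final Map<K, List<V>> data = new TreeMap<K, List<V>>();
  void collect(K key, V value) {
    data.computeIfAbsent(key, k -> new ArrayList<V>()).add(value);
  }
  int keyCount() { return data.size(); }
}
```

In this stand-alone setting both classes accept the same key and value types and behave the same at run time; the difference is limited to the compiler's raw-type warning.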

Re: Unit test lag?

2010-01-18 Thread Sean Owen
On Mon, Jan 18, 2010 at 2:24 AM, Drew Farris drew.far...@gmail.com wrote: On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen sro...@gmail.com wrote: There are already cases where code needs to control the seed (mostly to serialize/deserialize the exact state of an object). I don't think that's the

Re: Unit test lag?

2010-01-18 Thread Sean Owen
Same here, I don't like Spring myself as it smells like overengineering -- certainly for this case. I'm otherwise a luddite though and could more broadly be convinced. On Mon, Jan 18, 2010 at 2:49 AM, Ted Dunning ted.dunn...@gmail.com wrote: I have had too many unpleasant experiences using

[jira] Commented: (MAHOUT-260) An alternative approach to RNG management

2010-01-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801708#action_12801708 ] Sean Owen commented on MAHOUT-260: -- I still don't understand what this solves. We already

Re: [jira] Commented: (MAHOUT-260) An alternative approach to RNG management

2010-01-18 Thread Ted Dunning
This just avoids the class load in the test. I don't think it is necessary. On Mon, Jan 18, 2010 at 1:04 AM, Sean Owen (JIRA) j...@apache.org wrote: I still don't understand what this solves. We already 'fixed' the performance issue. -- Ted Dunning, CTO DeepDyve

[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-01-18 Thread Pallavi Palleti (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801716#action_12801716 ] Pallavi Palleti commented on MAHOUT-153: Hi all, I am ready with my patch.

Re: Eclipse and Maven Don't Agree

2010-01-18 Thread Olivier Grisel
2010/1/18 Jeff Eastman j...@windwardsolutions.com: Sean Owen wrote: Could be. I took an indirect stab at mitigating possible sources of this issue by increasing encapsulation in the tests -- I still believe fields should never be non-private. This may start to surface the behind-the-scenes

Random thought: line separators

2010-01-18 Thread Sean Owen
As I troll through the code at times trying to polish here and there I notice small issues to bring up -- Line separators. Lots of code independently reads System.getProperty("line.separator") in order to output a platform-specific line break. I argue this is actually slightly bad, since it means
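For readers unfamiliar with the pattern being discussed, the sketch below contrasts the platform-specific separator with a fixed one, each defined once instead of being read in every class. The message is cut off before it gives its reasoning, so this only illustrates the two options; the class name LineBreaks and its members are hypothetical and not part of Mahout.

```java
import java.util.Arrays;
import java.util.List;

final class LineBreaks {
  // Platform-specific separator, read once rather than independently in each class.
  static final String PLATFORM = System.getProperty("line.separator");
  // Fixed separator: the output is byte-identical on every platform.
  static final String FIXED = "\n";

  private LineBreaks() {}

  // Appends the chosen separator after every line, including the last one.
  static String join(Iterable<String> lines, String separator) {
    StringBuilder sb = new StringBuilder();
    for (String line : lines) {
      sb.append(line).append(separator);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("first", "second");
    System.out.print(join(lines, FIXED));
  }
}
```

With a single shared constant, switching every writer between the two behaviours is a one-line change.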

[jira] Commented: (MAHOUT-260) An alternative approach to RNG management

2010-01-18 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801748#action_12801748 ] Benson Margulies commented on MAHOUT-260: - Well, I thought I saw email go by to

Re: Random thought: line separators

2010-01-18 Thread Robin Anil
It's this kind of thing that forced a move to sequence files instead of TextKeyValueInput format and other text-based/CSV-based formats. Kind of regretting the decision to go with tab-separated format for BayesClassifier, which I wrote 2 years ago. I will be modifying this to use sparse vectors

[jira] Commented: (MAHOUT-260) An alternative approach to RNG management

2010-01-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801750#action_12801750 ] Sean Owen commented on MAHOUT-260: -- My take is that we have injection already, via

[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-01-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801755#action_12801755 ] Grant Ingersoll commented on MAHOUT-153: Please keep the same issue. That way the

Re: Unit test lag?

2010-01-18 Thread Drew Farris
On Mon, Jan 18, 2010 at 3:58 AM, Sean Owen sro...@gmail.com wrote: The real fix is centralizing management of Random, tracking them, and being able to reset them all remotely. In what cases would you want to reset them all remotely, at the beginning of each test? It is injected already --

Re: Unit test lag?

2010-01-18 Thread Grant Ingersoll
On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: We should have a beer some time anyway and the beers we owe you for cleaning up Colt more than cancel any potential beer on this issue so I will be happy to buy (Sean, you are included for similar reasons if we ever see each other). After the

Low precision in MAHOUT-228 tests (online logistic regression)

2010-01-18 Thread Olivier Grisel
Hello, I am currently testing the MAHOUT-228-3.patch applied to the current trunk. The merge went mostly well except a couple of duplicated chunks in the patches (probably applied otherwise to the trunk) and a duplicated wordlist. However to make the tests pass I had to reduce the precision of

Re: Unit test lag?

2010-01-18 Thread Sean Owen
On Mon, Jan 18, 2010 at 2:00 PM, Drew Farris drew.far...@gmail.com wrote: In what cases would you want to reset them all remotely, at the beginning of each test? You pretty much said it -- tests should start from a known, fixed state, so that the result is the same each time, and we can assert
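The idea discussed in this thread (create every Random through one central place, keep track of the instances, and reset them all to a known seed before a test) could look roughly like the sketch below. It is an illustration only: TrackedRandoms, create, resetForTest and TEST_SEED are hypothetical names, and the code is not a copy of Mahout's RandomUtils.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class TrackedRandoms {
  // Arbitrary fixed seed (assumption); any constant gives repeatable runs.
  private static final long TEST_SEED = 42L;
  private static final List<Random> INSTANCES = new ArrayList<Random>();
  private static boolean testMode = false;

  private TrackedRandoms() {}

  // Production code asks here for a Random instead of calling new Random().
  static synchronized Random create() {
    Random random = testMode ? new Random(TEST_SEED) : new Random();
    INSTANCES.add(random);
    return random;
  }

  // Called at the start of each test: every tracked generator, and every
  // generator created afterwards, restarts from the same seed.
  static synchronized void resetForTest() {
    testMode = true;
    for (Random random : INSTANCES) {
      random.setSeed(TEST_SEED);
    }
  }
}
```

A test would call TrackedRandoms.resetForTest() in its set-up method. The list holds strong references, so a longer-lived version would probably need weak references to avoid keeping every generator alive.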

Re: Random thought: line separators

2010-01-18 Thread Robin Anil
Could you be specific on which map/reduce job you encountered the error? On Mon, Jan 18, 2010 at 7:28 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2010/1/18 Robin Anil robin.a...@gmail.com: It's this kind of thing that forced a move to sequence files instead of TextKeyValueInput

Re: Random thought: line separators

2010-01-18 Thread Olivier Grisel
2010/1/18 Robin Anil robin.a...@gmail.com: could you be specific on which map/reduce job you encountered the error ? I thought it was on: hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i wikipediadump/chunk-0001.xml

Re: Unit test lag?

2010-01-18 Thread Drew Farris
On Mon, Jan 18, 2010 at 9:06 AM, Sean Owen sro...@gmail.com wrote: (Separately you could argue we're going about this all wrong, by trying to depend on the exact output of the RNG.. No argument here. In practice I don't think we can really get around using a pre-seeded RNG for tests. You've

Re: Unit test lag?

2010-01-18 Thread Sean Owen
You're suggesting the class choose between a regular and test-friendly RNG, by calling one of two methods. Doesn't that put the decision with the class instead of externally? Right now it's already external. RandomUtils decides what to instantiate. On Mon, Jan 18, 2010 at 2:21 PM, Drew Farris

Re: Unit test lag?

2010-01-18 Thread Drew Farris
On Mon, Jan 18, 2010 at 9:23 AM, Sean Owen sro...@gmail.com wrote: You're suggesting the class choose between a regular and test-friendly RNG, by calling one of two methods. Doesn't that put the decision with the class instead of externally? Right now it's already external. RandomUtils decides

Re: Unit test lag?

2010-01-18 Thread Sean Owen
On Mon, Jan 18, 2010 at 2:36 PM, Drew Farris drew.far...@gmail.com wrote: I'm suggesting that the instantiator/caller of the class choose between a regular and test-friendly RNG. In some classes that creator will be a unit test in other cases the creator will be another piece of production

Re: Unit test lag?

2010-01-18 Thread Drew Farris
On Mon, Jan 18, 2010 at 9:42 AM, Sean Owen sro...@gmail.com wrote: You can punt the choice all the way up to fix that. Then regular callers are forced to instantiate and supply the RNG in all cases, and the API has Randoms all over the place, and I suppose I don't quite like that

[jira] Resolved: (MAHOUT-259) Remove all code for Object matrices

2010-01-18 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies resolved MAHOUT-259. - Resolution: Fixed Fix Version/s: 0.3 committed. Remove all code for Object

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Sean Owen
On Mon, Jan 18, 2010 at 2:59 PM, Benson Margulies bimargul...@gmail.com wrote: Doing significant work in static code blocks leads to nothing but trouble, as the Random situation demonstrates. I don't know that this is the conclusion? You're critiquing one means of implementing injection, but

[jira] Updated: (MAHOUT-261) Give the primitive-value maps an adjustOrPutValue call, like Trove.

2010-01-18 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-261: Status: Patch Available (was: Open) Give the primitive-value maps an adjustOrPutValue

[jira] Created: (MAHOUT-261) Give the primitive-value maps an adjustOrPutValue call, like Trove.

2010-01-18 Thread Benson Margulies (JIRA)
Give the primitive-value maps an adjustOrPutValue call, like Trove. --- Key: MAHOUT-261 URL: https://issues.apache.org/jira/browse/MAHOUT-261 Project: Mahout Issue Type:
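As a rough illustration of what an adjustOrPutValue call does, here is a sketch on a plain java.util.Map rather than on the primitive-value maps themselves. The semantics shown (add adjustAmount when the key is present, otherwise store putAmount, and return the resulting value) follow my understanding of the Trove method of the same name; the class name and parameter names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class AdjustOrPutSketch {
  private AdjustOrPutSketch() {}

  // Assumed semantics: if the key exists, add adjustAmount to its value;
  // otherwise insert putAmount. Returns the value now stored for the key.
  static int adjustOrPutValue(Map<String, Integer> map, String key,
                              int adjustAmount, int putAmount) {
    Integer current = map.get(key);
    int updated = (current == null) ? putAmount : current + adjustAmount;
    map.put(key, updated);
    return updated;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String word : new String[] {"a", "b", "a"}) {
      adjustOrPutValue(counts, word, 1, 1);
    }
    System.out.println(counts);
  }
}
```

The appeal for counting workloads is that one call replaces the usual containsKey/get/put sequence; a primitive-value map can also avoid the boxing that this Map<String, Integer> version incurs.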

[jira] Updated: (MAHOUT-261) Give the primitive-value maps an adjustOrPutValue call, like Trove.

2010-01-18 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-261: Attachment: MAHOUT-261.patch Give the primitive-value maps an adjustOrPutValue call, like

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Benson Margulies
I created this subject thread so that you could use the other one for repeatability.

[jira] Updated: (MAHOUT-261) Give the primitive-value maps an adjustOrPutValue call, like Trove.

2010-01-18 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-261: Resolution: Fixed Status: Resolved (was: Patch Available) Done. Give the

[math] collections cooked?

2010-01-18 Thread Benson Margulies
I think I might be done with collections. I can't work up any enthusiasm for iterators, or java.util. decorators, and I think I have the basic functionality all in place. There are a number of perhaps pointless ways in which Colt diverges from Java collections, particularly in the area of return

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Drew Farris
On Mon, Jan 18, 2010 at 10:10 AM, Sean Owen sro...@gmail.com wrote: ... can I try again to drag attention to an actual problem? the repeatability issue. This injection discussion is orthogonal to it. Is the repeatability issue caused by the switch to forkOnce? What specifically is the issue

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Benson Margulies
On Mon, Jan 18, 2010 at 10:47 AM, Drew Farris drew.far...@gmail.com wrote: On Mon, Jan 18, 2010 at 10:10 AM, Sean Owen sro...@gmail.com wrote: ... can I try again to drag attention to an actual problem? the repeatability issue. This injection discussion is orthogonal to it. Arrrg. Could we

Re: Random thought: line separators

2010-01-18 Thread Robin Anil
Could you check the logs? You will see a bigger stack trace that might lead back to Mahout classes. On Mon, Jan 18, 2010 at 9:19 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2010/1/18 Olivier Grisel olivier.gri...@ensta.org: 2010/1/18 Robin Anil robin.a...@gmail.com: could you be specific

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Olivier Grisel
My 2 cents: I wouldn't mind making all components that are non-deterministic in nature take an RNG instance explicitly in their constructor (instead of using static magic). That can be helpful when running several versions of the same algorithms with different hyper-parameters in separate
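The constructor-based approach suggested here can be sketched as follows. RandomSampler is a hypothetical component used only to show the shape of the API: the caller decides which Random the component uses, so a test can pass a seeded generator while production code passes an unseeded one.

```java
import java.util.Arrays;
import java.util.Random;

final class RandomSampler {
  private final Random rng;

  // The RNG is supplied by the caller rather than fetched from a static helper.
  RandomSampler(Random rng) {
    this.rng = rng;
  }

  // Draws `count` integers in the range [0, bound).
  int[] sample(int count, int bound) {
    int[] result = new int[count];
    for (int i = 0; i < count; i++) {
      result[i] = rng.nextInt(bound);
    }
    return result;
  }

  public static void main(String[] args) {
    // Two instances with independent, identically seeded generators
    // produce the same draws, which is what a repeatable test relies on.
    int[] first = new RandomSampler(new Random(7L)).sample(5, 100);
    int[] second = new RandomSampler(new Random(7L)).sample(5, 100);
    System.out.println(Arrays.equals(first, second));
  }
}
```

The cost, raised elsewhere in the discussion, is that Random parameters then appear throughout the API and every caller has to supply one.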

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Benson Margulies
CXF has a very different requirement profile than Mahout. People want to plug web service clients and servers into all kinds of environments, and get all huffy if forced to use something like Spring or Guice. Mahout, at this point in its career, at least, probably doesn't have this problem. The

Re: Random thought: line separators

2010-01-18 Thread Olivier Grisel
2010/1/18 Robin Anil robin.a...@gmail.com: Could you check the logs? You will see a bigger stack trace that might lead back to Mahout classes. In the tasktracker logs I could find a more complete stacktrace (Jetty related, no sign of Mahout classes) and Google pointed me to this:

Re: Unit test lag?

2010-01-18 Thread Jeff Eastman
I'm planning on attending Jeff Grant Ingersoll wrote: On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: We should have a beer some time anyway and the beers we owe you for cleaning up Colt more than cancel any potential beer on this issue so I will be happy to buy (Sean, you are included

[jira] Commented: (MAHOUT-261) Give the primitive-value maps an adjustOrPutValue call, like Trove.

2010-01-18 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801872#action_12801872 ] Jake Mannix commented on MAHOUT-261: Ooooh, we need this in the Vectors. Give the

Re: Low precision in MAHOUT-228 tests (online logistic regression)

2010-01-18 Thread Jake Mannix
That looks like a bug, to me... not sure where it is though... -jake On Mon, Jan 18, 2010 at 6:03 AM, Olivier Grisel olivier.gri...@ensta.org wrote: Hello, I am currently testing the MAHOUT-228-3.patch applied to the current trunk. The merge went mostly well except a couple of duplicated

Re: Unit test lag?

2010-01-18 Thread Benson Margulies
If it's SF on Thursday, someone will have to have a beer as my proxy. I'll be back here in the snow. On Mon, Jan 18, 2010 at 12:21 PM, Jeff Eastman j...@windwardsolutions.com wrote: I'm planning on attending Jeff Grant Ingersoll wrote: On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: We

[jira] Resolved: (MAHOUT-251) Generalize Dirichlet models and model distributions to handle n-d and sparse vectors

2010-01-18 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman resolved MAHOUT-251. - Resolution: Fixed r900519 wrapped up loose ends in the patch, adding new command line arguments

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Ted Dunning
I am going to address IoC issues only on this thread. The other repeatability issues should be addressed, but on the other thread. On Mon, Jan 18, 2010 at 7:10 AM, Sean Owen sro...@gmail.com wrote: I am not especially in favor of my own Random patch. If people are willing to run in

Re: Low precision in MAHOUT-228 tests (online logistic regression)

2010-01-18 Thread Ted Dunning
These bounds were too tight in any case. I had to loosen other bounds during development and should have loosened these as well. Your change is a good one. On Mon, Jan 18, 2010 at 6:03 AM, Olivier Grisel olivier.gri...@ensta.org wrote: Is this a consequence of the recent

Re: Low precision in MAHOUT-228 tests (online logistic regression)

2010-01-18 Thread Olivier Grisel
2010/1/18 Ted Dunning ted.dunn...@gmail.com: These bounds were too tight in any case.  I had to loosen other bounds during development and should have loosened these as well. Your change is a good one. Great! so here is the sequel: I have written a real training convergence test and

Re: Low precision in MAHOUT-228 tests (online logistic regression)

2010-01-18 Thread Ted Dunning
THANK YOU. I have been very grumpy that I couldn't get to doing this yet. I will coordinate closely with you. I haven't used git yet in anger so it will be a learning experience. Don't expect me to have time, though. ( I will try ... but expect not to find a hole ) On Mon, Jan 18, 2010 at

[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-01-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801960#action_12801960 ] Ted Dunning commented on MAHOUT-153: +1 to what Grant said. Go ahead and post a patch

Re: Unit test lag?

2010-01-18 Thread Ted Dunning
I'll be there. Sean, are you really going to be there? That would be fantastic. On Mon, Jan 18, 2010 at 6:02 AM, Grant Ingersoll gsing...@apache.org wrote: On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote: We should have a beer some time anyway and the beers we owe you for cleaning up

Re: Status, IoC, Random numbers, etc.

2010-01-18 Thread Jake Mannix
For the past... 5 years? I've been using Spring as a DI container at every job I've had. At LinkedIn, in fact we have extended Spring extensively (see here: http://www.springsource.com/files/SpringAtLinkedIn.pdf for some details). It's incredibly powerful, and while the config files can be

Re: Unit test lag?

2010-01-18 Thread Jake Mannix
Hmm, if all you guys are going to be there, I may need to push back my flight - I'm scheduled to fly *out* of SFO right around the time of the Meetup, but if I can push back that flight, I will. -jake On Mon, Jan 18, 2010 at 1:24 PM, Ted Dunning ted.dunn...@gmail.com wrote: I'll be there.

FYI: Workshop on Machine Learning Open Source Software 2010

2010-01-18 Thread Markus Weimer
Hi, mloss.org will be hosting the workshop on Machine Learning Open Source Software at the International Conference on Machine Learning (MLOSS '10), following similar workshops at NIPS. I believe it would be a great venue to not only present mahout but also to get in touch with other MLOSS

Re: Low precision in MAHOUT-228 tests (online logistic regression)

2010-01-18 Thread Ted Dunning
On Mon, Jan 18, 2010 at 3:20 PM, Olivier Grisel olivier.gri...@ensta.org wrote: In the mean time could you please give me a hint on how to value the probes of the binary randomizer w.r.t. the window size? The basic trade-off is the standard hashed learning trade-off between number of training

Post Meetup Meetup was Re: Unit test lag?

2010-01-18 Thread Grant Ingersoll
On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote: If it's SF on Thursday, someone will have to have a beer as my proxy. I volunteer ;-) Sounds like we have a post-meetup meetup brewing. I'm not familiar with the area, anyone know where we can go afterwards? Also, I'll need a ride

Re: FYI: Workshop on Machine Learning Open Source Software 2010

2010-01-18 Thread Ted Dunning
I would love to, but there is no chance I could make it that far. On Mon, Jan 18, 2010 at 2:32 PM, Markus Weimer mar...@weimo.de wrote: Hi, mloss.org will be hosting the workshop on Machine Learning Open Source Software at the International Conference on Machine Learning (MLOSS '10),

Re: Post Meetup Meetup was Re: Unit test lag?

2010-01-18 Thread Ted Dunning
On Mon, Jan 18, 2010 at 4:46 PM, Grant Ingersoll gsing...@apache.org wrote: On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote: If it's SF on Thursday, someone will have to have a beer as my proxy. I volunteer ;-) You're on. Sounds like we have a post-meetup meetup brewing. I'm

Re: [jira] Commented: (MAHOUT-238) Further Dependency Cleanup

2010-01-18 Thread zhao zhendong
Hi Drew, Including source code in snapshots would be great. Currently, the HDFS reader does not work in 0.20.2. Without source code, it's not convenient for me to debug the code. Cheers, Zhendong On Sat, Jan 9, 2010 at 12:25 AM, Drew Farris drew.far...@gmail.com wrote: I wonder if we

[jira] Commented: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques

2010-01-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802106#action_12802106 ] Ted Dunning commented on MAHOUT-228: {quote} make sure that L1 is sparsity inducing my

[jira] Updated: (MAHOUT-232) Implementation of sequential SVM solver based on Pegasos

2010-01-18 Thread zhao zhendong (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhao zhendong updated MAHOUT-232: - Attachment: SequentialSVM_0.4.patch 1) Supporting sequential multi-classification (both

[jira] Updated: (MAHOUT-232) Implementation of sequential SVM solver based on Pegasos

2010-01-18 Thread zhao zhendong (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhao zhendong updated MAHOUT-232: - Description: After discussing with guys in this community, I decided to re-implement a