On Mon, Jan 18, 2010 at 4:46 PM, Grant Ingersoll wrote:
>
> On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote:
>
> > If it's SF on Thursday, someone will have to have a beer as my proxy.
>
> I volunteer ;-)
>
You're on.
> Sounds like we have a post-meetup meetup brewing. I'm not familiar
On Jan 18, 2010, at 12:34 PM, Benson Margulies wrote:
> If it's SF on Thursday, someone will have to have a beer as my proxy.
I volunteer ;-)
Sounds like we have a post-meetup meetup brewing. I'm not familiar with the
area, anyone know where we can go afterwards? Also, I'll need a ride bac
Hmm, if all you guys are going to be there, I may need to push back my
flight -
I'm scheduled to fly *out* of SFO right around the time of the Meetup, but
if I can push back that flight, I will.
-jake
On Mon, Jan 18, 2010 at 1:24 PM, Ted Dunning wrote:
> I'll be there.
>
> Sean, are you reall
Yes, I'm on the west coast for a week from tomorrow for various
reasons and so will certainly stop in. Looking forward to it.
Sean
On Mon, Jan 18, 2010 at 9:24 PM, Ted Dunning wrote:
> I'll be there.
>
> Sean, are you really going to be there? That would be fantastic.
I'll be there.
Sean, are you really going to be there? That would be fantastic.
On Mon, Jan 18, 2010 at 6:02 AM, Grant Ingersoll wrote:
>
> On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
>
> > We should have a beer some time anyway and the beers we owe you for
> cleaning
> > up Colt more than
If it's SF on Thursday, someone will have to have a beer as my proxy.
I'll be back here in the snow.
On Mon, Jan 18, 2010 at 12:21 PM, Jeff Eastman
wrote:
> I'm planning on attending
> Jeff
>
>
> Grant Ingersoll wrote:
>>
>> On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
>>
>>
>>>
>>> We should
I'm planning on attending
Jeff
Grant Ingersoll wrote:
On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
We should have a beer some time anyway and the beers we owe you for cleaning
up Colt more than cancel any potential beer on this issue so I will be happy
to buy (Sean, you are included for
On Mon, Jan 18, 2010 at 9:42 AM, Sean Owen wrote:
> You can punt the choice all the way up to fix that. Then regular
> callers are forced to instantiate and supply the RNG in all cases, and
> the API has Randoms all over the place, and I suppose I don't quite
> like that aesthetically.
Point tak
On Mon, Jan 18, 2010 at 2:36 PM, Drew Farris wrote:
> I'm suggesting that the instantiator/caller of the class choose
> between a regular and test-friendly RNG. In some classes that creator
> will be a unit test in other cases the creator will be another piece
> of production code. In some cases t
On Mon, Jan 18, 2010 at 9:23 AM, Sean Owen wrote:
> You're suggesting the class choose between a regular and test-friendly
> RNG, by calling one of two methods. Doesn't that put the decision with
> the class instead of externally? Right now it's already external.
> RandomUtils decides what to inst
You're suggesting the class choose between a regular and test-friendly
RNG, by calling one of two methods. Doesn't that put the decision with
the class instead of externally? Right now it's already external.
RandomUtils decides what to instantiate.
On Mon, Jan 18, 2010 at 2:21 PM, Drew Farris wro
On Mon, Jan 18, 2010 at 9:06 AM, Sean Owen wrote:
> (Separately you could argue we're going about this all wrong, by
> trying to depend on the exact output of the RNG..
No argument here. In practice I don't think we can really get around
using a pre-seeded RNG for tests.
> You've moved around t
On Mon, Jan 18, 2010 at 2:00 PM, Drew Farris wrote:
> In what cases would you want to reset them all remotely, at the
> beginning of each test?
You pretty much said it -- tests should start from a known, fixed
state, so that the result is the same each time, and we can assert
about the output. Th
On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
> We should have a beer some time anyway and the beers we owe you for cleaning
> up Colt more than cancel any potential beer on this issue so I will be happy
> to buy (Sean, you are included for similar reasons if we ever see each
> other).
After t
On Mon, Jan 18, 2010 at 3:58 AM, Sean Owen wrote:
> The real fix is centralizing management of Random, tracking them, and
> being able to reset them all "remotely".
In what cases would you want to reset them all remotely, at the
beginning of each test?
> It is injected already -- that's the pur
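A rough sketch of what "tracking them and resetting them all remotely" could look like. The class name, the seed value, and the method names here are hypothetical, not the actual RandomUtils API; a real version would probably hold weak references so tracked generators can be garbage collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical central registry: hands out seeded RNGs and can reset them all. */
public class ResettableRandoms {
  // Assumed fixed seed for illustration only.
  private static final long TEST_SEED = 42L;
  // Strong references keep this short; they would leak in long-running code.
  private static final List<Random> TRACKED = new ArrayList<>();

  /** Creates a generator with the fixed seed and remembers it. */
  static synchronized Random getRandom() {
    Random random = new Random(TEST_SEED);
    TRACKED.add(random);
    return random;
  }

  /** Puts every tracked generator back into its initial state. */
  static synchronized void resetAll() {
    for (Random random : TRACKED) {
      random.setSeed(TEST_SEED);
    }
  }

  public static void main(String[] args) {
    Random held = getRandom();   // imagine this lives in a static field across tests
    int first = held.nextInt();
    held.nextInt();              // a previous test advanced the state
    resetAll();                  // called at the start of the next test
    System.out.println(held.nextInt() == first);
  }
}
```

With something like this, a test setUp() would call resetAll() so that even a generator held in a static field starts each test from the same state.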
Same here, I don't like Spring myself as it smells like
overengineering -- certainly for this case. I'm otherwise a luddite
though and could more broadly be convinced.
On Mon, Jan 18, 2010 at 2:49 AM, Ted Dunning wrote:
> I have had too many unpleasant experiences using Spring to be enthused abou
On Mon, Jan 18, 2010 at 2:24 AM, Drew Farris wrote:
> On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote:
>> There are already cases where code needs to control the seed (mostly
>> to serialize/deserialize the exact state of an object). I don't think
>> that's the issue per se? The issue is when an
The Guice user guide is also very good at describing the benefits of
injection.
http://code.google.com/docreader/#p=google-guice&s=google-guice&t=Motivation
I also like the level of complexity that Guice introduces (nearly zero). My
major problem with Spring is that it introduces and mixes a bun
I prefer the injection method as well.
On Sun, Jan 17, 2010 at 7:51 PM, Drew Farris wrote:
> > If we want to go in Drew's suggested direction, we have to decide what
> > to do about seeds. We either need to define an
> > 'RandomNumberGeneratorFactory' interface which takes seeds and return
> > g
On Sun, Jan 17, 2010 at 10:31 PM, Benson Margulies
wrote:
> Have a look at the patch I posted to MAHOUT-260. It ducks the
> injection question for now.
This looks reasonable.
> However, what's perhaps most interesting is that it makes tests fail!
> Some tests get different answers with the stock
Have a look at the patch I posted to MAHOUT-260. It ducks the
injection question for now.
However, what's perhaps most interesting is that it makes tests fail!
Some tests get different answers with the stock JDK rng.
If we want to go in Drew's suggested direction, we have to decide what
to do abo
I've used spring a great deal as well and generally look pretty
favorably upon it, but readily admit there are definite cons to it too.
However, we can support the concept of injection without having to
commit to using one framework or another. Every class is instantiated
somewhere, so manual injec
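A minimal sketch of injection by hand, with no framework involved. Sampler and its method are made-up names for illustration; the only point is that whoever instantiates the class decides which RNG it gets.

```java
import java.util.Random;

/** Hypothetical class that receives its RNG from whoever creates it. */
class Sampler {
  private final Random random;

  Sampler(Random random) {
    this.random = random;
  }

  /** Returns a value in [0, bound) drawn from the injected generator. */
  int nextIndex(int bound) {
    return random.nextInt(bound);
  }
}

public class ManualInjectionExample {
  public static void main(String[] args) {
    // Production-style caller: default, time-seeded generator.
    Sampler production = new Sampler(new Random());
    System.out.println(production.nextIndex(100) < 100);

    // Test-style caller: fixed seed, so two instances produce the same sequence.
    Sampler a = new Sampler(new Random(42L));
    Sampler b = new Sampler(new Random(42L));
    System.out.println(a.nextIndex(100) == b.nextIndex(100));
  }
}
```

The cost is the one Sean raises elsewhere in the thread: every caller has to supply a Random, so the type shows up throughout the API.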
OK, then the class name appeals to me. I'll propose a patch.
On Sun, Jan 17, 2010 at 9:49 PM, Ted Dunning wrote:
> I have had too many unpleasant experiences using Spring to be enthused about
> jumping fully into it for this one use case.
>
> On Sun, Jan 17, 2010 at 6:35 PM, Benson Margulies
> w
I have had too many unpleasant experiences using Spring to be enthused about
jumping fully into it for this one use case.
On Sun, Jan 17, 2010 at 6:35 PM, Benson Margulies wrote:
> One moral equivalent of Spring is a String property with a
> fully-qualified class name which RandomUtils instantiat
One moral equivalent of Spring is a String property with a
fully-qualified class name which RandomUtils instantiates to get its
RNG. Another is to actually inject the RNG object. Spring would get
really tempting here.
I've had an extended immersion in Spring via CXF, so I have a low
threshold for
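A sketch of the "String property with a fully-qualified class name" idea. The property name example.rng.class and the factory class are assumptions made up for this example; the thread does not say what RandomUtils would actually read.

```java
import java.util.Random;

/** Hypothetical factory that picks the RNG implementation from a system property. */
public class ConfigurableRandomFactory {
  // Assumed property name, for illustration only.
  static final String PROPERTY = "example.rng.class";

  /** Instantiates the configured class via its no-argument constructor. */
  static Random create() {
    String className = System.getProperty(PROPERTY, "java.util.Random");
    try {
      return Class.forName(className)
          .asSubclass(Random.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate RNG class " + className, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(create().getClass().getName());
    System.setProperty(PROPERTY, "java.security.SecureRandom");
    System.out.println(create().getClass().getName());
  }
}
```

Because the class is named by a string, the factory has no compile-time reference to the implementation it loads.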
On Sun, Jan 17, 2010 at 9:10 PM, Sean Owen wrote:
> There are already cases where code needs to control the seed (mostly
> to serialize/deserialize the exact state of an object). I don't think
> that's the issue per se? The issue is when an RNG lives beyond one
> test, and there are legitimate rea
On Sun, Jan 17, 2010 at 6:10 PM, Sean Owen wrote:
> There are already cases where code needs to control the seed (mostly
> to serialize/deserialize the exact state of an object).
>
That is an important case, but it should be deterministic and thus not a
problem for testing. Really the RNG is be
This could be my fault though my tests are passing. Let me look.
On Jan 18, 2010 2:15 AM, "Drew Farris" wrote:
Spoke too soon of course, some tests fail strangely locally:
/u01/eclipse/eclipse-mahout-workspace/mahout-svn/core/src/test/java/org/apache/mahout/ga/watchmaker/EvalMapperTest.java:[48
Spoke too soon of course, some tests fail strangely locally:
/u01/eclipse/eclipse-mahout-workspace/mahout-svn/core/src/test/java/org/apache/mahout/ga/watchmaker/EvalMapperTest.java:[48,25]
type parameter org.apache.hadoop.io.LongWritable is not within its
bound
/u01/eclipse/eclipse-mahout-workspa
There are already cases where code needs to control the seed (mostly
to serialize/deserialize the exact state of an object). I don't think
that's the issue per se? The issue is when an RNG lives beyond one
test, and there are legitimate reasons that may be so.
I don't see how a getTestRandom() met
Ted, It depends on the test implementation itself. Generally, I
believe the pattern that is followed is:
RandomUtils.useTestSeed();
Random r = RandomUtils.getRandom();
The potential issue I see is if any tests expected to run using a seed
>other< than the test seed. Now that we are no longer fork
I can imagine ways to nuke the problem as well.
On Sun, Jan 17, 2010 at 5:46 PM, Sean Owen wrote:
> I can imagine some semi-elaborate ways to actually explicitly manage
> and address this with a wrapper class.
>
--
Ted Dunning, CTO
DeepDyve
Not quite, and you have a good point. Each instance of an RNG is
seeded identically when testing. But if something holds an RNG open
across tests, it won't be reset. I could imagine that happening if
there's a static RNG somewhere in a class, which would be reasonable.
(Or if a test isn't quite using
Do the RandomUtils reset the seed for every test as desired?
On Sun, Jan 17, 2010 at 5:38 PM, Drew Farris wrote:
> On Sun, Jan 17, 2010 at 2:55 PM, Sean Owen wrote:
> > Am I right that running tests in 1 JVM instead of n JVMs helps
> > mitigate this? because I just committed that change.
> >
>
And I think that we need to be robust in the face of either behavior. It
should be fine to initialize once.
On Sun, Jan 17, 2010 at 5:36 PM, Sean Owen wrote:
> I think you are right in that JVMs are allowed to wait until first use
> to load a class, but the one time I checked the Sun JVM it did
On Sun, Jan 17, 2010 at 2:55 PM, Sean Owen wrote:
> Am I right that running tests in 1 JVM instead of n JVMs helps
> mitigate this? because I just committed that change.
>
I just updated to HEAD, and this seems to have fixed the problem. Unit
tests are completing in times in-line with those repor
I think you are right in that JVMs are allowed to wait until first use
to load a class, but the one time I checked the Sun JVM it didn't work
that way. It actively loaded the class (which is also allowed). I
would bet dollars to donuts we'd find it doesn't wait.
On Mon, Jan 18, 2010 at 1:22 AM,
We should have a beer some time anyway and the beers we owe you for cleaning
up Colt more than cancel any potential beer on this issue so I will be happy
to buy (Sean, you are included for similar reasons if we ever see each
other).
Does the difference here matter? If we have zero or one class lo
No. We won't. The JDK RNG is fine for pretty much everything we do. I
agree that we should use a better generator for production use, but for
deterministic tests, there isn't an issue.
And frankly, I try to use algorithms that are robust about the generator they
use. Some applications are really go
Sean, that's not how class loaders work AFAIK. The mere presence of an
import does not trigger the load. You have to touch it.
HOWEVER, if I am wrong, I will (a) buy the beer, and (b) add the
reflective code to get rid of the import.
On Sun, Jan 17, 2010 at 7:26 PM, Sean Owen wrote:
> Nope, sinc
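One way to check the "you have to touch it" claim locally. This small test uses made-up class names and only looks at when a static block runs (class initialization). When the JVM chooses to load the class bytes is a separate question that this does not settle.

```java
/** Demonstrates that a static block runs on first active use, not on mere reference. */
public class LazyInitExample {
  static final StringBuilder LOG = new StringBuilder();

  /** Stand-in for a class with a costly static initializer. */
  static class Expensive {
    static {
      LOG.append("init;");
    }

    static int touch() {
      return 1;
    }
  }

  /** Refers to Expensive in its code, but only uses it when useIt() is called. */
  static class Holder {
    static int useIt() {
      return Expensive.touch();
    }
  }

  public static void main(String[] args) {
    Class<?> reference = Expensive.class; // a class literal does not initialize the class
    System.out.println(reference.getSimpleName() + " referenced: [" + LOG + "]");
    Holder.useIt();                       // first static method call triggers the static block
    System.out.println("after first use: [" + LOG + "]");
  }
}
```

Run as written, the first line prints an empty log and the second prints [init;], so a static block that is never reached by an active use never runs.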
Nope, since it imports MersenneTwisterRNG, that class will be
initialized the moment RandomUtils is loaded.
On Mon, Jan 18, 2010 at 12:19 AM, Benson Margulies
wrote:
> That would make a difference. If the code in RandomUtils never new's
> the Mersenne class, then its static blocks would never ru
That would make a difference. If the code in RandomUtils never new's
the Mersenne class, then its static blocks would never run. If
necessary, the Mersenne class could be loaded explicitly, but I don't
think we have to go that far.
So the question to me is whether we lose any test quality by usin
It sounds like the slow code gets triggered at class-loading time, so
no I don't think this would make a difference. But with the change I
committed we should only have one class loader in play, I think.
On Mon, Jan 18, 2010 at 12:00 AM, Benson Margulies
wrote:
> What if we used the plain old JDK
What if we used the plain old JDK rng when in test mode?
On Sun, Jan 17, 2010 at 3:16 PM, Olivier Grisel
wrote:
> 2010/1/17 Sean Owen :
>> Am I right that running tests in 1 JVM instead of n JVMs helps
>> mitigate this? because I just committed that change.
>
> I have the feeling it helps yes. I
2010/1/17 Sean Owen :
> Am I right that running tests in 1 JVM instead of n JVMs helps
> mitigate this? because I just committed that change.
I have the feeling it helps yes. I haven't timed the tests though.
--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
This is a way of saying "I don't know".
On Sun, Jan 17, 2010 at 12:02 PM, Ted Dunning wrote:
> That might help if the random class is loaded only once.
>
> If the different tests each use a new class loader (seems unlikely) then
> the static stuff will be executed multiply and the problem will b
That might help if the random class is loaded only once.
If the different tests each use a new class loader (seems unlikely) then the
static stuff will be executed multiply and the problem will be retained.
On Sun, Jan 17, 2010 at 11:55 AM, Sean Owen wrote:
> Am I right that running tests in 1
Am I right that running tests in 1 JVM instead of n JVMs helps
mitigate this? because I just committed that change.
On Sun, Jan 17, 2010 at 7:49 PM, Ted Dunning wrote:
> It doesn't affect the random numbers being generated.
>
> But it does eat bits of entropy from /dev/random. That can then get
It doesn't affect the random numbers being generated.
But it does eat bits of entropy from /dev/random. That can then get starved
and block until more entropy is derived. Since the reading is done in a
static block instead of on construction, the cost can't be avoided.
On Sun, Jan 17, 2010 at 4
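For contrast, a sketch of deferring the entropy read until a seed is actually requested, using the initialization-on-demand holder idiom. This is not the uncommons-math code; the class and method names are invented for illustration, and plain java.util.Random stands in for the real generator.

```java
import java.security.SecureRandom;
import java.util.Random;

/** Hypothetical factory that only touches SecureRandom when no seed is supplied. */
public class LazySeedExample {
  // Records whether the secure path was ever taken (for demonstration only).
  static boolean secureSeedRequested = false;

  /** Holder class: its static field is initialized on first access, not earlier. */
  private static final class SecureSeedHolder {
    static final SecureRandom SOURCE = new SecureRandom();
  }

  /** Builds a 64-bit seed from secure seed bytes; may block on some systems. */
  static long secureSeed() {
    secureSeedRequested = true;
    byte[] bytes = SecureSeedHolder.SOURCE.generateSeed(8);
    long seed = 0L;
    for (byte b : bytes) {
      seed = (seed << 8) | (b & 0xFF);
    }
    return seed;
  }

  /** Uses the given seed if present; otherwise falls back to a secure seed. */
  static Random newRandom(Long fixedSeed) {
    return fixedSeed != null ? new Random(fixedSeed) : new Random(secureSeed());
  }

  public static void main(String[] args) {
    Random a = newRandom(42L);
    Random b = newRandom(42L);
    System.out.println(a.nextLong() == b.nextLong());
    System.out.println("secure seed requested: " + secureSeedRequested);
  }
}
```

With a fixed test seed the secure source is never constructed, so a test run would not consume entropy on that path.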
2010/1/17 Drew Farris :
> Olivier,
>
> If you are still interested in trying to debug these, you could
> configure the surefire-plugin to use the options for opening up a port
> for remote debugging when it forks off the java process.
>
> see:
> http://maven.apache.org/plugins/maven-surefire-plugi
Olivier,
If you are still interested in trying to debug these, you could
configure the surefire-plugin to use the options for opening up a port
for remote debugging when it forks off the java process.
see:
http://maven.apache.org/plugins/maven-surefire-plugin/examples/debugging.html
The example
On Sun, Jan 17, 2010 at 1:36 PM, Drew Farris wrote:
> Using a fixed seed doesn't solve the problem due to the way
> SecureRandomSeedGenerator is loaded by MersenneTwisterRNG
OK yeah I understand now. I thought this thread was addressing the
determinism issue, but you're talking about performance.
Ok I have found three non-deterministic tests so far that actually
consume entropy by calling generateSeed:
TransactionTreeTest
CacheTest
AverageAbsoluteDifferenceRecommenderEvaluatorTest
But using eclipse is not really helpful since I am forced to set the
forkMode to "never" to make my debugger
The real problem I originally brought up was that the unit tests were
horribly slow due to blocking on /dev/random.
On Sun, Jan 17, 2010 at 8:21 AM, Sean Owen wrote:
> I think I must be missing something --
>
> We don't use SecureRandom directly, so what would these effects have
> to do with slow
I'm sorry I really think I'm off on my own planet. What issue are you
trying to solve? Performance, or deterministic tests? I'm concerned
with the latter and still do not understand what this has to do with
it.
On Sun, Jan 17, 2010 at 1:31 PM, Olivier Grisel
wrote:
> 2010/1/17 Sean Owen :
>> I th
2010/1/17 Sean Owen :
> I think I must be missing something --
>
> We don't use SecureRandom directly, so what would these effects have
> to do with slow unit tests in our project?
Classloading MersenneTwisterRNG in turn class loads
DefaultSeedGenerator which has the following static block:
pri
I think I must be missing something --
We don't use SecureRandom directly, so what would these effects have
to do with slow unit tests in our project?
And also am I right that, if we use our own seed in
MersenneTwisterRNG, we still get deterministic behavior?
I'm going to change all our tests to
2010/1/17 Benson Margulies :
> On Sun, Jan 17, 2010 at 7:31 AM, Sean Owen wrote:
>> But does that affect code which instantiates a MersenneTwisterRNG with
>> its own seed?
>
> That's what it looked like to me, but I may have been depending on
> Olivier's analysis.
I confirm that the first call to
On Sun, Jan 17, 2010 at 7:31 AM, Sean Owen wrote:
> But does that affect code which instantiates a MersenneTwisterRNG with
> its own seed?
That's what it looked like to me, but I may have been depending on
Olivier's analysis.
>
> On Sun, Jan 17, 2010 at 12:24 PM, Benson Margulies
> wrote:
>>>
2010/1/17 Benson Margulies :
>> I don't know of any further issues with MersenneTwisterRNG though --
>> what's the issue? Don't care what it does with /dev/random as long as
>> in test mode we are seeding it with the same seed, and that's what
>>
>
> Olivier and I found the Mersenne code touching t
But does that affect code which instantiates a MersenneTwisterRNG with
its own seed?
On Sun, Jan 17, 2010 at 12:24 PM, Benson Margulies
wrote:
>> I don't know of any further issues with MersenneTwisterRNG though --
>> what's the issue? Don't care what it does with /dev/random as long as
>> in tes
> I don't know of any further issues with MersenneTwisterRNG though --
> what's the issue? Don't care what it does with /dev/random as long as
> in test mode we are seeding it with the same seed, and that's what
>
Olivier and I found the Mersenne code touching the
SecureRandomNumberGenerator, whic
Not sure what's going on or why that revision would have anything to
do with the slowdown... the only thing of substance it did was
actually let the SamplingIterator test run but it doesn't take long.
I agree with not forking a JVM per test, so will make that change.
Also, yes, we need tests to b
Removing the Maven repository does not solve the problem, and neither does a
fresh checkout of the trunk.
But older revisions don't show any slowdown! I tried the following revisions:
Those old revisions seem OK:
r896946 | srowen |
I'm getting similar slowdowns with my VirtualBox Ubuntu 9.04
I'm suspecting that the problem is not -only- caused by RandomUtils because:
1. I'm familiar with MersenneTwisterRNG slowdowns (I use it a lot) but
the test time used to be reported accurately by maven. Now maven
reports that a test took
>>
> Unit tests should generally be using a fixed seed and not need to load a
> secure seed from /dev/random. I would say that RandomUtils is probably the
> problem here. The secure seed should be loaded lazily only if the test seed
> is not in use.
The problem, as I see it, is that the uncommons
On Sat, Jan 16, 2010 at 1:40 PM, Drew Farris wrote:
> Mahout does per-test forking, which means we're forking off a new JVM
> for each unit test execution; this adds overhead to tests that take
> 0.2s to complete. Is per-test forking strictly needed?
>
It shouldn't be. I would count it a bug i
Some tests are probably not calling:
RandomUtils.useTestSeed();
in a setUp() or static init. Maybe a MahoutTestCase base
class with a default static init that calls it would do.
Otherwise, I confirm that setting forkMode to "once" in maven/pom.xml
solves the issue (and all tests
Oh, I see. We have to give up on the MersenneTwisterRNG in tests and
just use the JRE. Is that OK?
On Sat, Jan 16, 2010 at 5:44 PM, Olivier Grisel
wrote:
> 2010/1/16 Drew Farris :
>> On Sat, Jan 16, 2010 at 4:42 PM, Benson Margulies
>> wrote:
>>> . Running through strace showed
that somethi
I see a way, but it involves loading this class explicitly with reflection.
I'll make a patch.
2010/1/16 Drew Farris :
> On Sat, Jan 16, 2010 at 4:42 PM, Benson Margulies
> wrote:
>> . Running through strace showed
>>> that something was attempting to read from /dev/random. Sometimes
>>> it ran fine, but at least 25-30% of the time it ended up blocking until the
>>> entropy pool is refilled. To tes
This is going to be a lot of fun. That class is in uncommons-math, and
the connection to it from Mahout is hardly obvious.
On Sat, Jan 16, 2010 at 5:34 PM, Benson Margulies wrote:
> It looks as if this could be related
to the loading of the SecureRandomSeedGenerator class.
>>>
>
> Let's fix
It looks as if this could be related
>>> to the loading of the SecureRandomSeedGenerator class.
>>
Let's fix that class to defer until there's a good reason to make a seed.
2010/1/16 Benson Margulies :
> . Running through strace showed
>> that something was attempting to read from /dev/random. Sometimes
>> it ran fine, but at least 25-30% of the time it ended up blocking until the
>> entropy pool is refilled. To test I moved /dev/random, and created a
>> link from /dev/urandom
On Sat, Jan 16, 2010 at 4:42 PM, Benson Margulies wrote:
> . Running through strace showed
>> that something was attempting to read from /dev/random. Sometimes
>> it ran fine, but at least 25-30% of the time it ended up blocking until the
>> entropy pool is refilled. To test I moved /dev/random, and create
. Running through strace showed
> that something was attempting to read from /dev/random. Sometimes
> it ran fine, but at least 25-30% of the time it ended up blocking until the
> entropy pool is refilled. To test I moved /dev/random, and created a
> link from /dev/urandom to /dev/random (the former doesn't
Recently I've been noticing that Mahout's unit tests take a
considerable amount of time to run, generally longer than what is reported
in the individual test output. I took a look as to why this was the
case and found a couple things:
Mahout does per-test forking, which means we're forking of