Re: Question about usage of LuceneTestCase

2018-08-22 Thread Tomoko Uchida
> You don't really have to figure out exactly what the combinations are;
> just execute the test with the "reproduce with" flags set: cut/paste
> the reproduce line from the error message into a command prompt at the
> root of your local Solr source tree.

> ant test  -Dtestcase=CommitsImplTest
> -Dtests.method=testGetSegmentAttributes -Dtests.seed=35AF58F652536895
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=de
> -Dtests.timezone=Africa/Kigali -Dtests.asserts=true
> -Dtests.file.encoding=UTF-8

Thanks for correcting that. :)

> I doubt there's any real point in exercising Luke on non-FS based
> indexes, so disabling the randomization of the filesystem seems fine.

> @BeforeClass
> public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
>   useFactory(null); // uses Standard or NRTCaching, FS based anyway.
> }

The current version of Luke supports FS-based directory implementations only.
(I think it would be better if future versions supported non-FS-based custom
implementations, such as HdfsDirectoryFactory, for users who need them.)
Disabling the randomization, at least for now, sounds reasonable to me too.
I'll try it this way.
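
For reference, a minimal sketch (not Luke's actual code) of the FS-based
open path Luke relies on; the index path here is hypothetical:

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;

public class OpenFsIndex {
  public static void main(String[] args) throws Exception {
    // Open an on-disk index, the only kind of directory Luke currently handles.
    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      System.out.println("maxDoc=" + reader.maxDoc());
    }
  }
}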



> It looks to me as if this test is asserting that the segment in an index it
> just created has some attributes, but in fact it does not. Perhaps there is
> a codec that does not store any attributes with its segments, and Luke does
> not expect this, and maybe the codec is being selected randomly by the
> RandomIndexWriter?

Thanks for your investigation! I'll catch up with your findings.

Regards,
Tomoko

On Thu, Aug 23, 2018 at 6:03 Michael Sokolov :

> It looks to me as if this test is asserting that the segment in an index it
> just created has some attributes, but in fact it does not. Perhaps there is
> a codec that does not store any attributes with its segments, and Luke does
> not expect this, and maybe the codec is being selected randomly by the
> RandomIndexWriter?
>
> On Wed, Aug 22, 2018 at 4:54 PM Michael Sokolov 
> wrote:
>
> > Here's a seed that fails for me consistently in IntelliJ:
> > "FEF692F43FE50191:656E22441676701C" running CommitsImplTest. Warning: I
> > have a bunch of local changes that might have perturbed the randomness,
> > so it might not reproduce for others. I just run the tests, open the
> > "Edit Configurations" dialog, paste
> > -Dtests.seed=FEF692F43FE50191:656E22441676701C into the VM options box,
> > and then I can get the test to fail every time, it seems.
> >
> > On Wed, Aug 22, 2018 at 1:11 PM Erick Erickson 
> > wrote:
> >
>> bq. My understanding at this point is (though it may be a repeat of your
>> words) that first we should find out the combinations behind the failures.
>> If there are any particular patterns, there could be bugs, so we should
>> fix them.
> >>
>> You don't really have to figure out exactly what the combinations are;
>> just execute the test with the "reproduce with" flags set: cut/paste
>> the reproduce line from the error message into a command prompt at the
>> root of your local Solr source tree.
> >>
> >> ant test  -Dtestcase=CommitsImplTest
> >> -Dtests.method=testGetSegmentAttributes -Dtests.seed=35AF58F652536895
> >> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=de
> >> -Dtests.timezone=Africa/Kigali -Dtests.asserts=true
> >> -Dtests.file.encoding=UTF-8
> >>
> >> That should reproduce exactly the same results from random() and
> >> (hopefully) reliably reproduce the problem. Not sure how to mavenize
> >> it, but you shouldn't need to if you have Solr locally. If it fails
> >> every time, you can debug. I've had some luck just defining the
> >> tests.seed in my IDE and running the test there (I use IntelliJ, but
> >> I'm sure Eclipse and Netbeans etc. have an equivalent way to do
> >> things). If just setting the seed as a sysvar in your IDE doesn't do
> >> the trick, you can always define all of them in the IDE.
> >>
>> Even setting all the sysvars in the IDE doesn't always work. That is,
>> executing the entire test from the command line can consistently fail
>> while defining all the sysvars in the IDE succeeds. But when it does
>> fail in the IDE, it makes things _much_ easier ;)
> >>
> >> Second question:
> >>
> >> I doubt there's any real point in exercising Luke on non-FS based
> >> indexes, so disabling the randomization of the filesystem seems fine.
> >>
> >> See SolrTestCaseJ4, the "useFactory" method. You can do something like
> >> this in your test:
> >>
> >> @BeforeClass
>> public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
> >>   useFactory(null); // uses Standard or NRTCaching, FS based anyway.
> >> }
> >>
> >> or even:
> >>
> >> useFactory("solr.StandardDirectoryFactory");
> >>
> >> I'm not sure about
> >> useFactory("org.apache.solr.core.HdfsDirectoryFactory");
> >>
> >> Or if you're really adventurous:
> >>
> >> @BeforeClass
>> public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
> >>   switch (random().nextInt(2)) {
> >>

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Michael Sokolov
It looks to me as if this test is asserting that the segment in an index it
just created has some attributes, but in fact it does not. Perhaps there is
a codec that does not store any attributes with its segments, and Luke does
not expect this, and maybe the codec is being selected randomly by the
RandomIndexWriter?
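
To check that hypothesis, here is a sketch (my assumption about what the
failing assertion reads, not the test's actual code) that dumps per-segment
attributes; the index path is hypothetical:

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.FSDirectory;

public class DumpSegmentAttributes {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"))) {
      SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
      for (SegmentCommitInfo sci : infos) {
        // getAttributes() may legitimately be empty, depending on the codec.
        System.out.println(sci.info.name + " -> " + sci.info.getAttributes());
      }
    }
  }
}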

On Wed, Aug 22, 2018 at 4:54 PM Michael Sokolov  wrote:

> Here's a seed that fails for me consistently in IntelliJ:
> "FEF692F43FE50191:656E22441676701C" running CommitsImplTest. Warning: I
> have a bunch of local changes that might have perturbed the randomness,
> so it might not reproduce for others. I just run the tests, open the
> "Edit Configurations" dialog, paste
> -Dtests.seed=FEF692F43FE50191:656E22441676701C into the VM options box,
> and then I can get the test to fail every time, it seems.
>
> On Wed, Aug 22, 2018 at 1:11 PM Erick Erickson 
> wrote:
>
>> bq. My understanding at this point is (though it may be a repeat of your
>> words) that first we should find out the combinations behind the failures.
>> If there are any particular patterns, there could be bugs, so we should
>> fix them.
>>
>> You don't really have to figure out exactly what the combinations are;
>> just execute the test with the "reproduce with" flags set: cut/paste
>> the reproduce line from the error message into a command prompt at the
>> root of your local Solr source tree.
>>
>> ant test  -Dtestcase=CommitsImplTest
>> -Dtests.method=testGetSegmentAttributes -Dtests.seed=35AF58F652536895
>> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=de
>> -Dtests.timezone=Africa/Kigali -Dtests.asserts=true
>> -Dtests.file.encoding=UTF-8
>>
>> That should reproduce exactly the same results from random() and
>> (hopefully) reliably reproduce the problem. Not sure how to mavenize
>> it, but you shouldn't need to if you have Solr locally. If it fails
>> every time, you can debug. I've had some luck just defining the
>> tests.seed in my IDE and running the test there (I use IntelliJ, but
>> I'm sure Eclipse and Netbeans etc. have an equivalent way to do
>> things). If just setting the seed as a sysvar in your IDE doesn't do
>> the trick, you can always define all of them in the IDE.
>>
>> Even setting all the sysvars in the IDE doesn't always work. That is,
>> executing the entire test from the command line can consistently fail
>> while defining all the sysvars in the IDE succeeds. But when it does
>> fail in the IDE, it makes things _much_ easier ;)
>>
>> Second question:
>>
>> I doubt there's any real point in exercising Luke on non-FS based
>> indexes, so disabling the randomization of the filesystem seems fine.
>>
>> See SolrTestCaseJ4, the "useFactory" method. You can do something like
>> this in your test:
>>
>> @BeforeClass
>> public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
>>   useFactory(null); // uses Standard or NRTCaching, FS based anyway.
>> }
>>
>> or even:
>>
>> useFactory("solr.StandardDirectoryFactory");
>>
>> I'm not sure about
>> useFactory("org.apache.solr.core.HdfsDirectoryFactory");
>>
>> Or if you're really adventurous:
>>
>> @BeforeClass
>> public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
>>   switch (random().nextInt(2)) {
>>     case 0:
>>       useFactory(null); // uses Standard or NRTCaching, FS based anyway.
>>       break;
>>     case 1:
>>       useFactory("org.apache.solr.core.HdfsDirectoryFactory");
>>       break;
>>     // I guess whatever else you wanted...
>>   }
>> }
>>
>>
>> Frankly in this case I'd:
>>
>> 1> see if executing the full reproduce line consistently fails and if so
>> 2> try using the above to disable other filesystems. If that
>> consistently succeeds, consider it done.
>>
>> Since Luke is intended to be used on an existing index I don't see
>> much use in randomizing for edge cases. But that presupposes that
>> it's a problem with some of the directory implementations of course...
>>
>> Best,
>> Erick
>>
>> On Wed, Aug 22, 2018 at 8:13 AM, Tomoko Uchida
>>  wrote:
>> > Can I ask one more question?
>> >
>> > 4> If Mike's intuition that it's one of the file system randomizations
>> > that occasionally gets hit _and_ you determine that that's an invalid
>> > test case (and for Luke requiring that the FS-based tests are all
>> > that are necessary may be fine) I'm pretty sure you can disable
>> > that randomization for your specific tests.
>> >
>> > As you may know, Luke calls relatively low-level Lucene APIs (such as
>> > o.a.l.u.IndexCommit or SegmentInfos) to show commit points, segment
>> > files, etc. (The "Commits" tab does this.)
>> > I am not sure when we could/should disable randomization; could you
>> > give me any pointers on this? Alternatively, real test cases that
>> > disable randomization would be helpful; I will search the Lucene/Solr
>> > code base.
>> >
>> > Thanks,
>> > Tomoko
>> >
>> > On Wed, Aug 22, 2018 at 21:58 Tomoko Uchida :
>> >
>> >> Thanks for your kind explanations,
>> >>
>> >> sorry of course I know what is the

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Michael Sokolov
Here's a seed that fails for me consistently in IntelliJ:
"FEF692F43FE50191:656E22441676701C" running CommitsImplTest. Warning: I
have a bunch of local changes that might have perturbed the randomness,
so it might not reproduce for others. I just run the tests, open the
"Edit Configurations" dialog, paste
-Dtests.seed=FEF692F43FE50191:656E22441676701C into the VM options box,
and then I can get the test to fail every time, it seems.

On Wed, Aug 22, 2018 at 1:11 PM Erick Erickson 
wrote:

> bq. My understanding at this point is (though it may be a repeat of your
> words) that first we should find out the combinations behind the failures.
> If there are any particular patterns, there could be bugs, so we should fix
> them.
>
> You don't really have to figure out exactly what the combinations are;
> just execute the test with the "reproduce with" flags set: cut/paste
> the reproduce line from the error message into a command prompt at the
> root of your local Solr source tree.
>
> ant test  -Dtestcase=CommitsImplTest
> -Dtests.method=testGetSegmentAttributes -Dtests.seed=35AF58F652536895
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=de
> -Dtests.timezone=Africa/Kigali -Dtests.asserts=true
> -Dtests.file.encoding=UTF-8
>
> That should reproduce exactly the same results from random() and
> (hopefully) reliably reproduce the problem. Not sure how to mavenize
> it, but you shouldn't need to if you have Solr locally. If it fails
> every time, you can debug. I've had some luck just defining the
> tests.seed in my IDE and running the test there (I use IntelliJ, but
> I'm sure Eclipse and Netbeans etc. have an equivalent way to do
> things). If just setting the seed as a sysvar in your IDE doesn't do
> the trick, you can always define all of them in the IDE.
>
> Even setting all the sysvars in the IDE doesn't always work. That is,
> executing the entire test from the command line can consistently fail
> while defining all the sysvars in the IDE succeeds. But when it does
> fail in the IDE, it makes things _much_ easier ;)
>
> Second question:
>
> I doubt there's any real point in exercising Luke on non-FS based
> indexes, so disabling the randomization of the filesystem seems fine.
>
> See SolrTestCaseJ4, the "useFactory" method. You can do something like
> this in your test:
>
> @BeforeClass
> public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
>   useFactory(null); // uses Standard or NRTCaching, FS based anyway.
> }
>
> or even:
>
> useFactory("solr.StandardDirectoryFactory");
>
> I'm not sure about useFactory("org.apache.solr.core.HdfsDirectoryFactory");
>
> Or if you're really adventurous:
>
> @BeforeClass
> public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
>   switch (random().nextInt(2)) {
>     case 0:
>       useFactory(null); // uses Standard or NRTCaching, FS based anyway.
>       break;
>     case 1:
>       useFactory("org.apache.solr.core.HdfsDirectoryFactory");
>       break;
>     // I guess whatever else you wanted...
>   }
> }
>
>
> Frankly in this case I'd:
>
> 1> see if executing the full reproduce line consistently fails and if so
> 2> try using the above to disable other filesystems. If that
> consistently succeeds, consider it done.
>
> Since Luke is intended to be used on an existing index I don't see
> much use in randomizing for edge cases. But that presupposes that
> it's a problem with some of the directory implementations of course...
>
> Best,
> Erick
>
> On Wed, Aug 22, 2018 at 8:13 AM, Tomoko Uchida
>  wrote:
> > Can I ask one more question?
> >
> > 4> If Mike's intuition that it's one of the file system randomizations
> > that occasionally gets hit _and_ you determine that that's an invalid
> > test case (and for Luke requiring that the FS-based tests are all
> > that are necessary may be fine) I'm pretty sure you can disable
> > that randomization for your specific tests.
> >
> > As you may know, Luke calls relatively low-level Lucene APIs (such as
> > o.a.l.u.IndexCommit or SegmentInfos) to show commit points, segment
> > files, etc. (The "Commits" tab does this.)
> > I am not sure when we could/should disable randomization; could you
> > give me any pointers on this? Alternatively, real test cases that
> > disable randomization would be helpful; I will search the Lucene/Solr
> > code base.
> >
> > Thanks,
> > Tomoko
> >
> > On Wed, Aug 22, 2018 at 21:58 Tomoko Uchida :
> >
> >> Thanks for your kind explanations,
> >>
> >> sorry, of course I know what the randomization seed is,
> >> but your description and instructions are exactly what I wanted.
> >>
> >> > The randomization can cause different
> >> > combinations of "stuff" to happen. Say the locale is randomized to
> >> > Turkish and a token is also randomly generated that breaks _only_ with
> >> > that combination. You'd never explicitly be able to test all of those
> >> > kinds of combinations, thus the random() function. And there may be
> >> > many calls to random() by the time a test is run.
> >>
> >> My un

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Erick Erickson
bq. My understanding at this point is (though it may be a repeat of your
words) that first we should find out the combinations behind the failures.
If there are any particular patterns, there could be bugs, so we should fix
them.

You don't really have to figure out exactly what the combinations are;
just execute the test with the "reproduce with" flags set: cut/paste
the reproduce line from the error message into a command prompt at the
root of your local Solr source tree.

ant test  -Dtestcase=CommitsImplTest
-Dtests.method=testGetSegmentAttributes -Dtests.seed=35AF58F652536895
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=de
-Dtests.timezone=Africa/Kigali -Dtests.asserts=true
-Dtests.file.encoding=UTF-8

That should reproduce exactly the same results from random() and
(hopefully) reliably reproduce the problem. Not sure how to mavenize
it, but you shouldn't need to if you have Solr locally. If it fails
every time, you can debug. I've had some luck just defining the
tests.seed in my IDE and running the test there (I use IntelliJ, but
I'm sure Eclipse and Netbeans etc. have an equivalent way to do
things). If just setting the seed as a sysvar in your IDE doesn't do
the trick, you can always define all of them in the IDE.
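
To illustrate the reproducibility property with plain java.util.Random (a
sketch; the test framework's random() is driven the same way in principle):

import java.util.Random;

public class SeedDemo {
  public static void main(String[] args) {
    long seed = 0x35AF58F652536895L; // the seed from the reproduce line
    Random r1 = new Random(seed);
    Random r2 = new Random(seed);
    // Same seed, same sequence: every call pair matches, run after run.
    System.out.println(r1.nextInt() == r2.nextInt());         // true
    System.out.println(r1.nextInt() == r2.nextInt());         // true
    System.out.println(r1.nextBoolean() == r2.nextBoolean()); // true
  }
}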

Even setting all the sysvars in the IDE doesn't always work. That is,
executing the entire test from the command line can consistently fail
while defining all the sysvars in the IDE succeeds. But when it does
fail in the IDE, it makes things _much_ easier ;)

Second question:

I doubt there's any real point in exercising Luke on non-FS based
indexes, so disabling the randomization of the filesystem seems fine.

See SolrTestCaseJ4, the "useFactory" method. You can do something like
this in your test:

@BeforeClass
public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
  useFactory(null); // uses Standard or NRTCaching, FS based anyway.
}

or even:

useFactory("solr.StandardDirectoryFactory");

I'm not sure about useFactory("org.apache.solr.core.HdfsDirectoryFactory");

Or if you're really adventurous:

@BeforeClass
public static void beforeTriLevelCompositeIdRoutingTest() throws Exception {
  switch (random().nextInt(2)) {
    case 0:
      useFactory(null); // uses Standard or NRTCaching, FS based anyway.
      break;
    case 1:
      useFactory("org.apache.solr.core.HdfsDirectoryFactory");
      break;
    // I guess whatever else you wanted...
  }
}


Frankly in this case I'd:

1> see if executing the full reproduce line consistently fails and if so
2> try using the above to disable other filesystems. If that
consistently succeeds, consider it done.

Since Luke is intended to be used on an existing index I don't see
much use in randomizing for edge cases. But that presupposes that
it's a problem with some of the directory implementations of course...
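
If the tests in question are pure Lucene tests (extending LuceneTestCase
rather than SolrTestCaseJ4), a sketch of the equivalent on that side,
assuming the test framework's SuppressFileSystems annotation and its
newFSDirectory/createTempDir helpers; the class name is hypothetical:

import java.nio.file.Path;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.LuceneTestCase.SuppressFileSystems;

// "*" opts this test class out of all of the mock filesystem wrappers.
@SuppressFileSystems("*")
public class CommitsImplFsOnlyTest extends LuceneTestCase {
  public void testOnRealFilesystem() throws Exception {
    Path indexPath = createTempDir("fs-only-index");
    try (Directory dir = newFSDirectory(indexPath)) {
      // build an index here and exercise the code under test
    }
  }
}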

Best,
Erick

On Wed, Aug 22, 2018 at 8:13 AM, Tomoko Uchida
 wrote:
> Can I ask one more question?
>
> 4> If Mike's intuition that it's one of the file system randomizations
> that occasionally gets hit _and_ you determine that that's an invalid
> test case (and for Luke requiring that the FS-based tests are all
> that are necessary may be fine) I'm pretty sure you can disable
> that randomization for your specific tests.
>
> As you may know, Luke calls relatively low-level Lucene APIs (such as
> o.a.l.u.IndexCommit or SegmentInfos) to show commit points, segment files,
> etc. (The "Commits" tab does this.)
> I am not sure when we could/should disable randomization; could you
> give me any pointers on this? Alternatively, real test cases that disable
> randomization would be helpful; I will search the Lucene/Solr code base.
>
> Thanks,
> Tomoko
>
> On Wed, Aug 22, 2018 at 21:58 Tomoko Uchida :
>
>> Thanks for your kind explanations,
>>
>> sorry, of course I know what the randomization seed is,
>> but your description and instructions are exactly what I wanted.
>>
>> > The randomization can cause different
>> > combinations of "stuff" to happen. Say the locale is randomized to
>> > Turkish and a token is also randomly generated that breaks _only_ with
>> > that combination. You'd never explicitly be able to test all of those
>> > kinds of combinations, thus the random() function. And there may be
>> > many calls to random() by the time a test is run.
>>
>> My understanding at this point is (though it may be a repeat of your
>> words) that first we should find out the combinations behind the failures.
>> If there are any particular patterns, there could be bugs, so we should
>> fix them.
>>
>> Thanks,
>> Tomoko
>>
>> On Wed, Aug 22, 2018 at 14:59 Erick Erickson :
>>
>>> The pseudo-random generator in the Lucene test framework is used to
>>> randomize lots of test conditions, we're talking about the file system
>>> implementation here, but there are lots of others. Whenever you see a
>>> call to random().whatever, that's the call to the framework's method.
>>>
>>> But here's the thing. The randomization can cause different
>>> combinations of "stuff" to happen. Say the locale is 

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Tomoko Uchida
Can I ask one more question?

4> If Mike's intuition that it's one of the file system randomizations
that occasionally gets hit _and_ you determine that that's an invalid
test case (and for Luke requiring that the FS-based tests are all
that are necessary may be fine) I'm pretty sure you can disable
that randomization for your specific tests.

As you may know, Luke calls relatively low-level Lucene APIs (such as
o.a.l.u.IndexCommit or SegmentInfos) to show commit points, segment files,
etc. (The "Commits" tab does this.)
I am not sure when we could/should disable randomization; could you
give me any pointers on this? Alternatively, real test cases that disable
randomization would be helpful; I will search the Lucene/Solr code base.
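
For concreteness, this is roughly the kind of API usage I mean, as a sketch
using public Lucene APIs (DirectoryReader.listCommits and IndexCommit), not
Luke's actual code; the index path is hypothetical:

import java.nio.file.Paths;
import java.util.List;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.store.FSDirectory;

public class ListCommits {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"))) {
      // List the commit points still present in the index directory.
      List<IndexCommit> commits = DirectoryReader.listCommits(dir);
      for (IndexCommit commit : commits) {
        System.out.println("gen=" + commit.getGeneration()
            + " segments=" + commit.getSegmentCount()
            + " files=" + commit.getFileNames());
      }
    }
  }
}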

Thanks,
Tomoko

On Wed, Aug 22, 2018 at 21:58 Tomoko Uchida :

> Thanks for your kind explanations,
>
> sorry, of course I know what the randomization seed is,
> but your description and instructions are exactly what I wanted.
>
> > The randomization can cause different
> > combinations of "stuff" to happen. Say the locale is randomized to
> > Turkish and a token is also randomly generated that breaks _only_ with
> > that combination. You'd never explicitly be able to test all of those
> > kinds of combinations, thus the random() function. And there may be
> > many calls to random() by the time a test is run.
>
> My understanding at this point is (though it may be a repeat of your
> words) that first we should find out the combinations behind the failures.
> If there are any particular patterns, there could be bugs, so we should
> fix them.
>
> Thanks,
> Tomoko
>
> On Wed, Aug 22, 2018 at 14:59 Erick Erickson :
>
>> The pseudo-random generator in the Lucene test framework is used to
>> randomize lots of test conditions, we're talking about the file system
>> implementation here, but there are lots of others. Whenever you see a
>> call to random().whatever, that's the call to the framework's method.
>>
>> But here's the thing. The randomization can cause different
>> combinations of "stuff" to happen. Say the locale is randomized to
>> Turkish and a token is also randomly generated that breaks _only_ with
>> that combination. You'd never explicitly be able to test all of those
>> kinds of combinations, thus the random() function. And there may be
>> many calls to random() by the time a test is run.
>>
>> Here's the key. When "seeded" with the same number, the calls to
>> random() produce the exact same output every time. So say with seed1 I
>> get
>> nextInt() - 1
>> nextInt() - 67
>> nextBool() - true
>>
>> Whenever I use 1 as the seed, I'll get exactly the above. However, if
>> I use 2 as a seed, I might get
>> nextInt() - 93
>> nextInt() - 63
>> nextBool() - false
>>
>> So the short form is
>>
>> 1. randomization is used to try out various combinations.
>>
>> 2. using a particular seed guarantees that the randomization is
>> repeatable.
>>
>> 3.  when a test fails with a particular seed, running the test with
>> the _same_ seed will produce the same conditions, hopefully allowing
>> that particular error resulting from that particular combination to be
>> reproduced reliably (and fixed).
>>
>> 4. at least that's the theory and in practice it works quite well.
>> There is no _guarantee_ that the test will fail using the same seed,
>> sometimes the failures are a result of subtle timing etc, which is not
>> under control of the randomization. I breathe a sigh of relief,
>> though, when a test _does_ reproduce with a particular seed 'cause
>> then I have a hope of knowing the issue is actually fixed ;).
>>
>>
>> Best,
>> Erick
>>
>> On Tue, Aug 21, 2018 at 3:56 PM, Tomoko Uchida
>>  wrote:
>> > Thanks a lot for your information & insights,
>> >
>> > I will try to reproduce the errors and investigate the results.
>> > And maybe I should learn more about the internals of the test framework;
>> > I'm not familiar with it and still do not understand what "seed" means
>> > exactly in this context.
>> >
>> > Regards,
>> > Tomoko
>> >
>> > On Wed, Aug 22, 2018 at 1:05 Erick Erickson :
>> >
>> >> Couple of things (and I know you've been around for a while, so pardon
>> >> me if it's all old hat to you):
>> >>
>> >> 1> if you run the entire "reproduce with" line and can get a
>> >> consistent failure, then you are halfway there; nothing is as
>> >> frustrating as not getting failures reliably. The critical bit is
>> >> often the -Dtests.seed. As Michael mentioned, there are various
>> >> randomizations done for _many_ things in Lucene tests using a random
>> >> generator.  tests.seed, well, seeds that generator so it produces the
>> >> same numbers every time it's run with that seed. You'll see lots of
>> >> calls to a static random() method. I'll add that if you want to
>> >> use randomness in your tests, use that method and do _not_ use a local
>> >> instance of Java's Random.
>> >>
>> >> 2> Mike: You say IntelliJ succeeds. But that'll use a new random()
>> >> seed. Once you run a test, in the upper right (on my version at
>> >

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Tomoko Uchida
Thanks for your kind explanations,

sorry, of course I know what the randomization seed is,
but your description and instructions are exactly what I wanted.

> The randomization can cause different
> combinations of "stuff" to happen. Say the locale is randomized to
> Turkish and a token is also randomly generated that breaks _only_ with
> that combination. You'd never explicitly be able to test all of those
> kinds of combinations, thus the random() function. And there may be
> many calls to random() by the time a test is run.

My understanding at this point is (though it may be a repeat of your words)
that first we should find out the combinations behind the failures.
If there are any particular patterns, there could be bugs, so we should fix
them.

Thanks,
Tomoko

On Wed, Aug 22, 2018 at 14:59 Erick Erickson :

> The pseudo-random generator in the Lucene test framework is used to
> randomize lots of test conditions, we're talking about the file system
> implementation here, but there are lots of others. Whenever you see a
> call to random().whatever, that's the call to the framework's method.
>
> But here's the thing. The randomization can cause different
> combinations of "stuff" to happen. Say the locale is randomized to
> Turkish and a token is also randomly generated that breaks _only_ with
> that combination. You'd never explicitly be able to test all of those
> kinds of combinations, thus the random() function. And there may be
> many calls to random() by the time a test is run.
>
> Here's the key. When "seeded" with the same number, the calls to
> random() produce the exact same output every time. So say with seed1 I
> get
> nextInt() - 1
> nextInt() - 67
> nextBool() - true
>
> Whenever I use 1 as the seed, I'll get exactly the above. However, if
> I use 2 as a seed, I might get
> nextInt() - 93
> nextInt() - 63
> nextBool() - false
>
> So the short form is
>
> 1. randomization is used to try out various combinations.
>
> 2. using a particular seed guarantees that the randomization is repeatable.
>
> 3.  when a test fails with a particular seed, running the test with
> the _same_ seed will produce the same conditions, hopefully allowing
> that particular error resulting from that particular combination to be
> reproduced reliably (and fixed).
>
> 4. at least that's the theory and in practice it works quite well.
> There is no _guarantee_ that the test will fail using the same seed,
> sometimes the failures are a result of subtle timing etc, which is not
> under control of the randomization. I breathe a sigh of relief,
> though, when a test _does_ reproduce with a particular seed 'cause
> then I have a hope of knowing the issue is actually fixed ;).
>
>
> Best,
> Erick
>
> On Tue, Aug 21, 2018 at 3:56 PM, Tomoko Uchida
>  wrote:
> > Thanks a lot for your information & insights,
> >
> > I will try to reproduce the errors and investigate the results.
> > And maybe I should learn more about the internals of the test framework;
> > I'm not familiar with it and still do not understand what "seed" means
> > exactly in this context.
> >
> > Regards,
> > Tomoko
> >
> > On Wed, Aug 22, 2018 at 1:05 Erick Erickson :
> >
> >> Couple of things (and I know you've been around for a while, so pardon
> >> me if it's all old hat to you):
> >>
> >> 1> if you run the entire "reproduce with" line and can get a
> >> consistent failure, then you are halfway there; nothing is as
> >> frustrating as not getting failures reliably. The critical bit is
> >> often the -Dtests.seed. As Michael mentioned, there are various
> >> randomizations done for _many_ things in Lucene tests using a random
> >> generator.  tests.seed, well, seeds that generator so it produces the
> >> same numbers every time it's run with that seed. You'll see lots of
> >> calls to a static random() method. I'll add that if you want to
> >> use randomness in your tests, use that method and do _not_ use a local
> >> instance of Java's Random.
> >>
> >> 2> Mike: You say IntelliJ succeeds. But that'll use a new random()
> >> seed. Once you run a test, in the upper right (on my version at
> >> least), IntelliJ will show you a little box with the test name and you
> >> can "edit configurations" on it. I often have luck by editing the
> >> configuration and adding the test seed to the "VM option" box for the
> >> test, just the "-Dtests.seed=35AF58F652536895" part. You can add all
> >> of the -D flags in the "reproduce with" line if you want, but often
> >> just the seed works for me. If that works and you track it down, do
> >> remember to take that seed _out_ of the "VM options" box rather than
> >> forget it as I have done ;)
> >>
> >> 3> Mark Miller's beasting script can be used to run a zillion tests
> >> overnight: https://gist.github.com/markrmiller/dbdb792216dc98b018ad
> >>
> >> 4> If Mike's intuition that it's one of the file system randomizations
> >> that occasionally gets hit _and_ you determine that that's an invalid
> >> test case (and for Luke requiring that the