[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-26 Thread ASF subversion and git services (JIRA)
8192: - Commit e80ee7fff85918e68c212757c0e6c4bddbdb5ab6 in lucene-solr's branch refs/heads/branch_7x from [~rcmuir] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e80ee7f ] LUCENE-8192: always enforce index-time offsets are correct with BaseTokenStreamTestCase > Remove offsetsAre

[jira] [Resolved] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-26 Thread Robert Muir (JIRA)
ect from BaseTokenStreamTestCase > - > > Key: LUCENE-8192 > URL: https://issues.apache.org/jira/browse/LUCENE-8192 > Project: Lucene - Core > Issue Type: Bug >

[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-26 Thread ASF subversion and git services (JIRA)
8192: - Commit e595541ef3f9642632ac85d03c62616b5f70f1e4 in lucene-solr's branch refs/heads/master from [~rcmuir] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e595541 ] LUCENE-8192: always enforce index-time offsets are correct with BaseTokenStreamTestCase > Remove offsetsAre

[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-12 Thread Adrien Grand (JIRA)
ct from BaseTokenStreamTestCase > - > > Key: LUCENE-8192 > URL: https://issues.apache.org/jira/browse/LUCENE-8192 > Project: Lucene - Core > Issue Type: Bug >

[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-06 Thread Michael McCandless (JIRA)
yay! > Remove offsetsAreCorrect from BaseTokenStreamTestCase > - > > Key: LUCENE-8192 > URL: https://issues.apache.org/jira/browse/LUCENE-8192 > Project: Lucene - Core >

[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
hese checks weren't really "under" the boolean, but it was difficult to see that. I moved them in the latest patch to make this more obvious, but it doesn't change the logic. > Remove offsetsAreCorrect f

[jira] [Updated] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-8192: Attachment: LUCENE-8192.patch > Remove offsetsAreCorrect from BaseTokenStreamTestC

[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
sInc checks that indexwriter will do too. I'll update the patch. > Remove offsetsAreCorrect from BaseTokenStreamTestCase > - > > Key: LUCENE-8192 > URL: https://issues.apache.org

[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
step? It removes some useless leniency. > Remove offsetsAreCorrect from BaseTokenStreamTestCase > - > > Key: LUCENE-8192 > URL: https://issues.apache.org/jira/browse/LUCENE-8192 >

[jira] [Updated] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-8192: Attachment: LUCENE-8192_take_two.patch > Remove offsetsAreCorrect from BaseTokenStreamTestC

[jira] [Commented] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
seems to be a higher bar, and even tests for filters that claim to support graphs (SynonymGraphFilter) screw this up? Just at a glance, it seems like we want to separate these concerns. The first one should not be optional. > Remove offsetsAreCorrect f

[jira] [Updated] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-8192: Attachment: LUCENE-8192_prototype.patch > Remove offsetsAreCorrect from BaseTokenStreamTestC

[jira] [Created] (LUCENE-8192) Remove offsetsAreCorrect from BaseTokenStreamTestCase

2018-03-04 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-8192: --- Summary: Remove offsetsAreCorrect from BaseTokenStreamTestCase Key: LUCENE-8192 URL: https://issues.apache.org/jira/browse/LUCENE-8192 Project: Lucene - Core

BaseTokenStreamTestCase Custom Debug Output

2017-06-22 Thread ben.demott
Hey all, I wrote a small BaseTokenStreamTestCase, so when a test fails, it gives some useful debugging output explaining the token stream. Makes it pretty easy to get your tests / offsets configured properly. Any comments/ - is BaseTokenStreamTestCase the right place to add this utility logic to

[jira] [Commented] (LUCENE-7622) Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens?

2017-01-07 Thread Uwe Schindler (JIRA)
is still useful, but no longer so drastic. So sorry for being unclear. 🤓 Maybe I change or remove the last sentence in my comment to remove the misunderstanding. > Should BaseTokenStreamTestCase catch analyzers that create dupl

[jira] [Commented] (LUCENE-7622) Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens?

2017-01-07 Thread Robert Muir (JIRA)
> Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens? > > > Key: LUCENE-7622 > URL: https://issues.apache.org/jira/browse/LUCENE-7622 > Project: Luc

[jira] [Commented] (LUCENE-7622) Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens?

2017-01-07 Thread Uwe Schindler (JIRA)
text by duplicating them > Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens? > > > Key: LUCENE-7622 > URL: https://issues.apache.org/jira/browse/LU

[jira] [Commented] (LUCENE-7622) Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens?

2017-01-07 Thread Uwe Schindler (JIRA)
't do this the statistics would be wrong. I agree, for this case it would be better to have a separate field, but some people like to have it in the same. > Should BaseTokenStreamTestCase catch analyzers that create dupl

[jira] [Updated] (LUCENE-7622) Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens?

2017-01-07 Thread Michael McCandless (JIRA)
o pursuing this further now ... I think it's maybe too anal to insist on this from all analyzers ... so I'm posting the patch here in case anyone else gets itchy! > Should BaseTokenStreamTestCase catch analyzers that

[jira] [Created] (LUCENE-7622) Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens?

2017-01-07 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-7622: -- Summary: Should BaseTokenStreamTestCase catch analyzers that create duplicate tokens? Key: LUCENE-7622 URL: https://issues.apache.org/jira/browse/LUCENE-7622

Re: BaseTokenStreamTestCase

2014-05-17 Thread Nitzan Shaked
call restoreState(); > clearAttributes() is not needed before restoreState(). > > If you don’t do this, your filter will work incorrectly if other filters > come **after** it. > > > > The assertion in BaseTokenStreamTestCase is therefore correct and really > mandatory. There ar

RE: BaseTokenStreamTestCase

2014-05-16 Thread Uwe Schindler
assertion in BaseTokenStreamTestCase is therefore correct and really mandatory. There are many filters that show how to do this token inserting correctly. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@theta

BaseTokenStreamTestCase

2014-05-16 Thread Nitzan Shaked
Hi all While writing the unit tests for a new token filter I came across an issue(?) with BaseTokenStreamTestCase.assertTokenStreamContents(): it goes to some length to assure that clearAttributes() was called for every token produced by the filter under test. I suppose this helps most of the tim

Re: BaseTokenStreamTestCase

2014-05-16 Thread Robert Muir
it's not really a use case: you have to clear attributes when creating a new token or you will have dirty state that is not appropriate... On Fri, May 16, 2014 at 12:28 AM, Nitzan Shaked wrote: > Hi all > > While writing the unit tests for a new token filter I came across an > issue(?) with BaseTo

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-05 Thread Commit Tag Bot (JIRA)
ramework. > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseToken

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-05 Thread Commit Tag Bot (JIRA)
te), e.g., oal.analysis.miscellaneous.EmptyTokenStream. Remove EmptyTokenizer from test-framework. > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseToken

[jira] [Resolved] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
! > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > >

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
ith EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > > Key: LUCENE-4656 >

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > > Key: LUCENE-4656 > URL: https://issu

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
ith EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > > Key: LUCENE-4656 >

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > > Key: LUCENE-4656 > URL: https://issu

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
s in affected files. I will commit this later and backport. > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStre

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
ribute), fix BaseTokenStreamTestCase > -- > > Key: LUCENE-4656 > URL: https://issues.apache.org/jira/browse/LUCENE-4656 >

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
atch. > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > >

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
ith EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > > Key: LUCENE-4656 >

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
ithout CharTermAttribute), fix BaseTokenStreamTestCase > -- > > Key: LUCENE-4656 > URL: https://issues.apache.org/jira/

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
core! Only in 4.x's TestDocument! > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseToke

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Adrien Grand (JIRA)
your patch and they passed, so +1. +1 to removing EmptyTokenizer too. > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStre

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
that horrible piece of sh* :-) > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStre

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTestCase > -- > > K

[jira] [Commented] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Robert Muir (JIRA)
/java to queryparser/src/test at least as an improvement, since it is kinda funky. > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (witho

[jira] [Updated] (LUCENE-4656) Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream (without CharTermAttribute), fix BaseTokenStreamTestCase

2013-01-03 Thread Uwe Schindler (JIRA)
CharTermAttribute), fix BaseTokenStreamTestCase (was: Fix EmptyTokenizer) > Fix IndexWriter working together with EmptyTokenizer and EmptyTokenStream > (without CharTermAttribute), fix BaseTokenStreamTe

[jira] [Resolved] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Robert Muir (Resolved) (JIRA)
: if you want to see what the test strings look like now, have a look at ant test -Dtestcase=TestMockAnalyzer -Dtestmethod=testRandomStrings -Dtests.verbose=true > improve BaseTokenStreamTestCase random string generat

[jira] [Updated] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Robert Muir (Updated) (JIRA)
randomRealistic so in that case we get whole words in the same unicode block (good for stemmers), also sometimes uses randomRegexpIshString, so we get lots of punctuation (good for tokenizers/filters, etc) > improve BaseTokenStreamTestCase random string generat

[jira] [Commented] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Robert Muir (Commented) (JIRA)
n use this too. > improve BaseTokenStreamTestCase random string generation > > > Key: LUCENE-3911 > URL: https://issues.apache.org/jira/browse/LUCENE-3911 > Project: Lucene -

[jira] [Commented] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Michael McCandless (Commented) (JIRA)
reat! > improve BaseTokenStreamTestCase random string generation > > > Key: LUCENE-3911 > URL: https://issues.apache.org/jira/browse/LUCENE-3911 > Project: Lucene - Java >

[jira] [Updated] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Robert Muir (Updated) (JIRA)
_testUtil string generation methods too :) > improve BaseTokenStreamTestCase random string generation > > > Key: LUCENE-3911 > URL: https://issues.apache.org/jira/bro

[jira] [Commented] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Robert Muir (Commented) (JIRA)
hort words, since the maxWordLength we pass in is really a max... but we would want that to be the exact number of elements. I'll improve this. > improve BaseTokenStreamTestCase random string generation > > >

[jira] [Updated] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Robert Muir (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3911: Attachment: LUCENE-3911.patch > improve BaseTokenStreamTestCase random string generat

[jira] [Created] (LUCENE-3911) improve BaseTokenStreamTestCase random string generation

2012-03-24 Thread Robert Muir (Created) (JIRA)
improve BaseTokenStreamTestCase random string generation Key: LUCENE-3911 URL: https://issues.apache.org/jira/browse/LUCENE-3911 Project: Lucene - Java Issue Type: Task

[jira] [Resolved] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Michael McCandless (Resolved) (JIRA)
> BaseTokenStreamTestCase should test analyzers on real-ish content > - > > Key: LUCENE-3905 > URL: https://issues.apache.org/jira/browse/LUCENE-3905 > Project: Lucene - Java >

[jira] [Commented] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Michael McCandless (Commented) (JIRA)
for ngram love... > BaseTokenStreamTestCase should test analyzers on real-ish content > - > > Key: LUCENE-3905 > URL: https://issues.apache.org/jira/browse/LUCENE-3905 >

[jira] [Commented] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Robert Muir (Commented) (JIRA)
s an improvement! > BaseTokenStreamTestCase should test analyzers on real-ish content > - > > Key: LUCENE-3905 > URL: https://issues.apache.org/jira/browse/LUCENE-3905 >

[jira] [Commented] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Michael McCandless (Commented) (JIRA)
are unfortunately not OK: they use up tons of RAM when you send random/big tokens through them, because they don't have the same 1024 character limit... I think we should open a new issue for them... in fact I think repairing them could make a good GSoC! > BaseTokenStreamTestCase

[jira] [Commented] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Robert Muir (Commented) (JIRA)
the filter versions of these the same way? e.g. if i have mocktokenizer + (edge)ngramfilter, are they ok? > BaseTokenStreamTestCase should test analyzers on real-ish content > - > >

[jira] [Commented] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Robert Muir (Commented) (JIRA)
e to the first 1024 chars, but that doesn't mean they can't implement end() correctly so that at least highlighting on multivalued fields etc works. > BaseTokenStreamTestCase should test analyzers on

[jira] [Updated] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Michael McCandless (Updated) (JIRA)
tokenizers... > BaseTokenStreamTestCase should test analyzers on real-ish content > - > > Key: LUCENE-3905 > URL: https://issues.apache.org/jira/browse/LUCENE-3905 >

[jira] [Created] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Michael McCandless (Created) (JIRA)
BaseTokenStreamTestCase should test analyzers on real-ish content - Key: LUCENE-3905 URL: https://issues.apache.org/jira/browse/LUCENE-3905 Project: Lucene - Java Issue Type

[jira] [Updated] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Robert Muir (Updated) (JIRA)
ake BaseTokenStreamTestCase a bit more evil > > > Key: LUCENE-3894 > URL: https://issues.apache.org/jira/browse/LUCENE-3894 > Project: Lucene - Java > Issue Type: Improvement >Repo

[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Robert Muir (Commented) (JIRA)
d trivial test (testHugeDoc) found the IO-311 bug, what if we didn't have that silly test? I'll add a patch. > Make BaseTokenStreamTestCase a bit more evil > > > Key: LUCENE-3894 >

[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Commented) (JIRA)
Rob! > Make BaseTokenStreamTestCase a bit more evil > > > Key: LUCENE-3894 > URL: https://issues.apache.org/jira/browse/LUCENE-3894 > Project: Lucene - Java > Issue Type: Improvem

[jira] [Resolved] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3894. Resolution: Fixed > Make BaseTokenStreamTestCase a bit more e

[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Robert Muir (Commented) (JIRA)
d' is not really new, it's from commons-io! we should open a bug over there... > Make BaseTokenStreamTestCase a bit more evil > > > Key: LUCENE-3894 > URL: https://issues.apach

[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Commented) (JIRA)
thod needs to use the incoming offset (ie, pass location + offset, not location, as 2nd arg to input.read)? Does testHugeDoc then pass? > Make BaseTokenStreamTestCase a bit more evil > > >

[jira] [Updated] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Robert Muir (Updated) (JIRA)
t for icutokenizer now passes (spoonfeeding caught a bug). But, now testHugeDoc fails... (not a random test). > Make BaseTokenStreamTestCase a bit more evil > > > Key: LUCENE-3894 > URL: https://is

[jira] [Updated] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Updated) (JIRA)
ake BaseTokenStreamTestCase a bit more evil > > > Key: LUCENE-3894 > URL: https://issues.apache.org/jira/browse/LUCENE-3894 > Project: Lucene - Java > Issue Type: Improvement >

[jira] [Updated] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Updated) (JIRA)
/NGramTokenizers to work w/ spoon feeding, but otherwise no analyzers seem to be failing, at least on one run... I had to do some sneaky things with MockTokenizer to work around its state machine... > Make BaseTokenStreamTestCase a bit more e

[jira] [Created] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Created) (JIRA)
Make BaseTokenStreamTestCase a bit more evil Key: LUCENE-3894 URL: https://issues.apache.org/jira/browse/LUCENE-3894 Project: Lucene - Java Issue Type: Improvement Reporter: Michael

[jira] [Resolved] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-16 Thread Robert Muir (Resolved) (JIRA)
MockGraphTokenFilter into tests. > basetokenstreamtestcase should fail if tokenstream starts with posinc=0 > --- > > Key: LUCENE-3848 > URL: https://issues.apache.org/jira/bro

[jira] [Commented] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-15 Thread Robert Muir (Commented) (JIRA)
nc=0' to posinc=1 anyway. > basetokenstreamtestcase should fail if tokenstream starts with posinc=0 > --- > > Key: LUCENE-3848 > URL: https://issues.apache.org/j

[jira] [Commented] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-15 Thread Michael McCandless (Commented) (JIRA)
+1 > basetokenstreamtestcase should fail if tokenstream starts with posinc=0 > --- > > Key: LUCENE-3848 > URL: https://issues.apache.org/jira/browse/LUCENE-3848 >

[jira] [Updated] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-15 Thread Robert Muir (Updated) (JIRA)
grate Mike's nice MockGraphTokenFilter *yet* but will do this under a separate issue: it's likely to expose a few bugs :) > basetokenstreamtestcase should fail if tokenstream starts

[jira] [Updated] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-05 Thread Robert Muir (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3848: Fix Version/s: (was: 3.6) > basetokenstreamtestcase should fail if tokenstream sta

[jira] [Updated] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-04 Thread Michael McCandless (Updated) (JIRA)
MockGraphTokenFilter we can use to randomly insert fake graph arcs... > basetokenstreamtestcase should fail if tokenstream starts with posinc=0 > --- > > Key: LUCENE-3848 >

[jira] [Updated] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-04 Thread Robert Muir (Updated) (JIRA)
vingfilter. > basetokenstreamtestcase should fail if tokenstream starts with posinc=0 > --- > > Key: LUCENE-3848 > URL: https://issues.apache.org/jira/browse/LUCENE-3848 >

[jira] [Created] (LUCENE-3848) basetokenstreamtestcase should fail if tokenstream starts with posinc=0

2012-03-04 Thread Robert Muir (Created) (JIRA)
basetokenstreamtestcase should fail if tokenstream starts with posinc=0 --- Key: LUCENE-3848 URL: https://issues.apache.org/jira/browse/LUCENE-3848 Project: Lucene - Java

[jira] [Resolved] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-24 Thread Robert Muir (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3717. - Resolution: Fixed > Add fake charfilter to BaseTokenStreamTestCase to find offsets b

[jira] [Updated] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-24 Thread Robert Muir (Updated) (JIRA)
computing end() from the trimmed length * not calling correctOffset * not checking return value of Reader.read causing bugs in some situations (e.g. empty stringreader) > Add fake charfilter to BaseTokenStreamTestCase to find offsets b

[jira] [Commented] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-23 Thread Robert Muir (Commented) (JIRA)
just remains to add the random test to all remaining tokenstreams... > Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs > --- > > Key: LUCENE-3717 >

[jira] [Reopened] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-22 Thread Robert Muir (Reopened) (JIRA)
g the current patch as a start but i think we should check every tokenizer/filter/etc and just clean this up. > Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs > --- > >

[jira] [Updated] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-22 Thread Robert Muir (Updated) (JIRA)
charfilters. * WikipediaTokenizer broken in many ways, in general the tokenizer keeps a ton of state variables, but never resets this state. patch fixes these but I'm sure adding more tests to the remaining filters will find more bugs. > Add fake charfilter to BaseTokenStream

[jira] [Resolved] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-22 Thread Robert Muir (Resolved) (JIRA)
all using checkRandomData (i think most are), just to see if we have any other bugs sitting out there. It would be nice to have these offsets all under control for the next release. > Add fake charfilter to BaseTokenStreamTestCase to find offsets b

[jira] [Created] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-22 Thread Robert Muir (Created) (JIRA)
Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs --- Key: LUCENE-3717 URL: https://issues.apache.org/jira/browse/LUCENE-3717 Project: Lucene - Java Issue

[jira] [Updated] (LUCENE-3717) Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

2012-01-22 Thread Robert Muir (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3717: Attachment: LUCENE-3717.patch > Add fake charfilter to BaseTokenStreamTestCase to f