Lucene 8.7 error searching an index created with 8.3

2020-11-23 Thread Nicolás Lichtmaier
I'm seeing errors like this one (using backwards codecs):
java.lang.ArrayIndexOutOfBoundsException: Index 69 out of bounds for length 33
    at org.apache.lucene.codecs.lucene50.ForUtil.readBlock(ForUtil.java:196)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.r

Re: Using Lucene for technical documentation

2020-11-23 Thread Erick Erickson
You might be able to get something “good enough” with one of the pattern tokenizers, see: https://lucene.apache.org/solr/guide/8_6/tokenizers.html. Won’t be 100% of course. And Paul’s comments are well taken, especially since your input will be inconsistent I’d guess. How much you want to bet t

Re: Using Lucene for technical documentation

2020-11-23 Thread Paul Libbrecht
Hello Trevor, I don’t know of an analyzer for mixes of code and text but I know of an analyzer for mixes of code and formulæ. Clearly, you could build a custom analyzer that would tokenize differently depending on whether you’re in code or in text. That’s not super hard. However, where thin
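
To make the pattern-tokenizer suggestion in this thread concrete, here is a minimal sketch in plain Java using only java.util.regex. It is not Lucene code and not anyone's actual analyzer: the class name PatternTokenSketch, the method tokenize, and the regular expression are all assumptions chosen for illustration. The idea is that a single pattern keeps dotted identifiers such as ForUtil.readBlock together as one token while ordinary words are split as usual. In a real index the same kind of pattern would presumably be handed to one of Lucene's pattern tokenizers rather than run by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTokenSketch {
    // Hypothetical pattern (an assumption, not taken from the thread):
    // an identifier, optionally followed by ".identifier" segments, so that
    // "ForUtil.readBlock" stays one token; or a bare run of digits.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z_][A-Za-z0-9_]*(?:\\.[A-Za-z_][A-Za-z0-9_]*)*|[0-9]+");

    // Returns every match of TOKEN in the input, in order of appearance.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Call ForUtil.readBlock twice. Then stop"));
    }
}
```

A dot is only absorbed into a token when an identifier character follows it immediately, so a sentence-ending period followed by a space still splits tokens. As noted in the thread, this will not be 100% accurate on inconsistent input.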