Hi,
I am trying to implement a custom tokenizer for my application and I have a
few questions about it.
1. Is there a way to give an existing analyzer (say EnglishAnalyzer) a
custom tokenizer and make it use that tokenizer instead of, say,
StandardTokenizer? (One possible approach is sketched below.)
2. Why are analyzers such as S
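Regarding question 1: the bundled analyzers don't expose a way to swap the
tokenizer in directly, so the usual route is a small Analyzer subclass that
repeats the English filter chain on top of your own tokenizer. A minimal
sketch against the Lucene 4.10 API; MyTokenizer is a hypothetical stand-in
for the custom tokenizer, and the chain mirrors EnglishAnalyzer's defaults
(minus stem exclusions):

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.StopFilter;
    import org.apache.lucene.analysis.en.EnglishAnalyzer;
    import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
    import org.apache.lucene.analysis.en.PorterStemFilter;
    import org.apache.lucene.util.Version;

    public final class CustomEnglishAnalyzer extends Analyzer {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // MyTokenizer is hypothetical -- substitute your own Tokenizer subclass.
        Tokenizer source = new MyTokenizer(reader);
        // The rest mirrors EnglishAnalyzer's default filter chain.
        TokenStream result = new EnglishPossessiveFilter(Version.LUCENE_4_10_3, source);
        result = new LowerCaseFilter(Version.LUCENE_4_10_3, result);
        result = new StopFilter(Version.LUCENE_4_10_3, result, EnglishAnalyzer.getDefaultStopSet());
        result = new PorterStemFilter(result);
        return new TokenStreamComponents(source, result);
      }
    }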
Thanks, Robert, for pointing out the difference.
On Sun, Jan 11, 2015 at 10:29 PM, Robert Muir wrote:
> files are either per-segment or per-commit.
>
> the first only returns per-segment files. this means it won't include
> any per-commit files:
> * segments_N itself
> * generational .liv for deletes
Hi Uwe,
Thanks a lot for the detailed reply. I'll see how far I get with it, but as I am
quite new to Lucene, I seem to be lacking a bit of the background needed to
fully understand the response below. In particular, I need to do some
background reading on how token streams and readers work, I
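For anyone else reading along, the consumer side of a TokenStream is short
enough to show in full. A minimal sketch of the reset()/incrementToken()/
end()/close() workflow on the Lucene 4.10 API (field name and sample text
are made up):

    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class TokenStreamDemo {
      public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_10_3);
        // The analyzer wraps the Reader in its tokenizer + filter chain.
        TokenStream ts = analyzer.tokenStream("body", new StringReader("Hello token streams!"));
        // Attributes are how you read the current token's state.
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();                      // mandatory before the first incrementToken()
        while (ts.incrementToken()) {    // advance to the next token
          System.out.println(term.toString());
        }
        ts.end();                        // records end-of-stream offsets
        ts.close();
      }
    }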
files are either per-segment or per-commit.
the first only returns per-segment files. this means it won't include
any per-commit files:
* segments_N itself
* generational .liv for deletes
* generational .fnm/.dvd/etc for docvalues updates.
the second includes per-commit files, too. it doesn't incl
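The gist itself isn't reproduced here, but assuming the two methods boil
down to SegmentInfo.files() versus SegmentCommitInfo.files() (which would
match method1 never returning .liv files), a sketch along these lines shows
the two views on a Lucene 4.10 index (index path taken from args[0]):

    import java.io.File;
    import java.util.Collection;
    import java.util.Set;

    import org.apache.lucene.index.SegmentCommitInfo;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ListSegmentFiles {
      public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File(args[0]));
        SegmentInfos infos = new SegmentInfos();
        infos.read(dir); // loads the latest commit point (segments_N)

        for (SegmentCommitInfo sci : infos) {
          // Per-segment only: never contains .liv or generational docvalues files.
          Set<String> perSegment = sci.info.files();
          // Per-commit view: adds the generational files for this commit
          // (.liv for deletes, updated .fnm/.dvd for docvalues updates);
          // segments_N itself lives outside both lists.
          Collection<String> perCommit = sci.files();
          System.out.println(sci.info.name
              + "\n  per-segment: " + perSegment
              + "\n  per-commit:  " + perCommit);
        }
      }
    }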
I wanted to know: what's the difference between the two ways I am getting
a list of all the files belonging to a segment?
method1 never returns .liv files.
https://gist.github.com/vthacker/98065232c3d2da579700
--
Regards,
Varun Thacker
http://www.vthacker.in/
The highlighter's SimpleSpanFragmenter has a bug,
documented in https://issues.apache.org/jira/browse/LUCENE-2229,
that practically makes it unusable with PhraseQuery.
I can confirm that the bug still exists in version 4.10
(the JIRA issue was created back in 2010).
The symptom is that
if there
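For anyone who wants to try this, roughly the setup in which the bug is
reported to bite: a PhraseQuery highlighted through QueryScorer and
SimpleSpanFragmenter on the Lucene 4.10 API (field name and sample text are
made up):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
    import org.apache.lucene.util.Version;

    public class PhraseHighlightRepro {
      public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_10_3);

        // An exact two-term phrase -- the query type the fragmenter mishandles.
        PhraseQuery phrase = new PhraseQuery();
        phrase.add(new Term("body", "quick"));
        phrase.add(new Term("body", "brown"));

        QueryScorer scorer = new QueryScorer(phrase, "body");
        Highlighter highlighter = new Highlighter(scorer);
        highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 50));

        String text = "the quick brown fox jumps over the lazy dog";
        for (String frag : highlighter.getBestFragments(analyzer, "body", text, 3)) {
          System.out.println(frag);
        }
      }
    }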
On Sat, Jan 10, 2015 at 7:58 PM, Tom Burton-West wrote:
> Thanks Mike,
>
> We run our Solr 3.x indexing with 10GB/shard. I've been testing Solr 4
> with 4, 6, and 8GB heaps. As of Friday night, when the indexes were about
> half done (about 400GB on disk), only the 4GB heap had issues. I'll find out
Hi,
First, there is also a migration guide next to the changes log:
http://lucene.apache.org/core/4_10_3/MIGRATE.html
1. If you implement an Analyzer, you have to override createComponents(), which
returns a TokenStreamComponents object. See other Analyzers' source code to
understand how to use it; a bare-bones skeleton follows below.
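A minimal skeleton of that override, against the Lucene 4.10 API
(StandardTokenizer and LowerCaseFilter here are just placeholders for your
own chain):

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.util.Version;

    public final class MyAnalyzer extends Analyzer {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The tokenizer is the "source" of the chain...
        Tokenizer source = new StandardTokenizer(Version.LUCENE_4_10_3, reader);
        // ...and each filter wraps the stream below it, forming the "sink".
        TokenStream sink = new LowerCaseFilter(Version.LUCENE_4_10_3, source);
        // TokenStreamComponents hands both back so the Analyzer can reuse them.
        return new TokenStreamComponents(source, sink);
      }
    }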
Hi all,
I am currently in the process of upgrading a search engine application from
Lucene 3.5.0 to version 4.10.3. There have been some substantial API changes in
version 4 that break backward compatibility. I have managed to fix most of
them, but a few issues remain that I could use some help with.