On Sat, Jan 10, 2015 at 7:58 PM, Tom Burton-West tburt...@umich.edu wrote:
Thanks Mike,
We run our Solr 3.x indexing with a 10 GB heap per shard. I've been testing Solr 4
with 4, 6, and 8 GB heaps. As of Friday night, when the indexes were about
half done (about 400 GB on disk), only the 4 GB heap had issues.
Hi all,
I am currently in the process of upgrading a search engine application from
Lucene 3.5.0 to version 4.10.3. There have been some substantial API changes in
version 4 that break backward compatibility. I have managed to fix most of
them, but a few issues remain that I could use some help with.
Hi,
First, there is also a migration guide next to the changes log:
http://lucene.apache.org/core/4_10_3/MIGRATE.html
1. If you implement an Analyzer, you have to override createComponents(), which
returns a TokenStreamComponents object. See other Analyzers' source code to
understand how to do it.
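A minimal sketch of what such an override looks like in Lucene 4.10 (the class name and the choice of StandardTokenizer plus LowerCaseFilter are illustrative, not from the original mail):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

import java.io.Reader;

public class MyAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The tokenizer produces the raw token stream; filters wrap it.
        StandardTokenizer source = new StandardTokenizer(Version.LUCENE_4_10_3, reader);
        TokenStream filtered = new LowerCaseFilter(Version.LUCENE_4_10_3, source);
        // TokenStreamComponents ties the source tokenizer to the final filter chain.
        return new TokenStreamComponents(source, filtered);
    }
}
```

Note that in 4.x createComponents() still takes a Reader; that parameter was removed later, which is one of the compatibility breaks a migration has to account for.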
The highlighter's SimpleSpanFragmenter has a bug, documented in
https://issues.apache.org/jira/browse/LUCENE-2229,
that practically makes it unusable with PhraseQuery.
I can confirm that the bug still exists in version 4.10
(the JIRA issue was created back in 2010).
The symptom is that if there
I wanted to know what's the difference between the two ways that I am getting
a list of all the segment files belonging to a segment.
method1 never returns .liv files.
https://gist.github.com/vthacker/98065232c3d2da579700
--
Regards,
Varun Thacker
http://www.vthacker.in/
files are either per-segment or per-commit.
the first only returns per-segment files. this means it won't include
any per-commit files:
* segments_N itself
* generational .liv for deletes
* generational .fnm/.dvd/etc for docvalues updates.
the second includes per-commit files, too. it doesn't
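The distinction above can be sketched against the Lucene 4.10 API: SegmentInfo.files() gives the per-segment view and SegmentCommitInfo.files() the per-commit view. This is an assumption-laden sketch (the index path is a placeholder), not the code from the gist:

```java
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.File;

public class ListSegmentFiles {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
        SegmentInfos sis = new SegmentInfos();
        sis.read(dir); // reads the latest segments_N
        for (SegmentCommitInfo sci : sis) {
            // per-segment files only: no .liv, no generational docvalues files
            System.out.println(sci.info.name + " per-segment: " + sci.info.files());
            // per-commit view: also includes generational .liv/.fnm/.dvd files
            System.out.println(sci.info.name + " per-commit:  " + sci.files());
        }
    }
}
```

On an index with deletes, only the second listing should show .liv files, which matches the "method1 never returns .liv files" observation.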
Hi Uwe,
Thanks a lot for the detailed reply. I'll see how far I get with it, but being
quite new to Lucene, it seems I am lacking a bit of background information to
fully understand the response below. In particular, I need to do some
background reading on how token streams and readers work,
Thanks Robert for pointing out the difference.
On Sun, Jan 11, 2015 at 10:29 PM, Robert Muir rcm...@gmail.com wrote:
files are either per-segment or per-commit.
the first only returns per-segment files. this means it won't include
any per-commit files:
* segments_N itself
* generational
Hi,
I am trying to implement a custom tokenizer for my application and I have a
few queries regarding it.
1. Is there a way to provide an existing analyzer (say, EnglishAnalyzer) with
the custom tokenizer and make it use this tokenizer instead of, say,
StandardTokenizer?
2. Why are analyzers such as
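For question 1, one common approach in Lucene 4.x (a sketch, not the only answer): analyzers like EnglishAnalyzer hard-wire their tokenizer inside createComponents(), so you can't inject a different one; instead you write your own Analyzer that rebuilds EnglishAnalyzer's filter chain on top of your tokenizer. Here "MyCustomTokenizer" is a hypothetical placeholder for the custom tokenizer class:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.util.Version;

import java.io.Reader;

public class CustomEnglishAnalyzer extends Analyzer {
    private static final Version V = Version.LUCENE_4_10_3;

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Your tokenizer replaces StandardTokenizer here (hypothetical class).
        Tokenizer source = new MyCustomTokenizer(reader);
        // The rest mirrors EnglishAnalyzer's default filter chain.
        TokenStream result = new EnglishPossessiveFilter(V, source);
        result = new LowerCaseFilter(V, result);
        result = new StopFilter(V, result, EnglishAnalyzer.getDefaultStopSet());
        result = new PorterStemFilter(result);
        return new TokenStreamComponents(source, result);
    }
}
```

This keeps the English-specific filtering (possessives, stopwords, Porter stemming) while swapping only the tokenization step.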