Re: IndexReader close listeners and NRT

2013-11-08 Thread Michael McCandless
On Fri, Nov 8, 2013 at 12:22 AM, Ravikumar Govindarajan
ravikumar.govindara...@gmail.com wrote:
 So, in your code, reader is the top-level reader, not the one
 segment you are pulling a scorer on (context.reader()).

 So you are building your cache on the top-level reader, not the
 segment's reader?  Is that intentional?  (It's not NRT friendly).

 Not really. It is an IndexSearcher(AtomicReader) that populates the BitSet

Hmm, I see the code referencing reader, but it never assigns it?  So
I assumed this was your top-level reader (somewhere).  Maybe you are
missing an AtomicReader reader = context.reader() in that code?

 But, yes, your ReaderClosedListener will be called once that top-level
 reader is closed, and that will then evict its entries from the cache.

 This is the current problem I am facing. I actually want to key on
 CoreClosedListener for this cache, but Lucene exposes only a
 ReaderClosedListener, which causes frequent purge/rebuild of the cache
 during the NRT life-cycle.

 Is it possible to hook into a CoreClosedListener somehow, so that I can
 mimic FieldCacheImpl behavior and become free from NRT logic?

You can cast the AtomicReader to SegmentReader and call .addCoreClosedListener?

 Also, when we have a getCoreCacheKey() exposed from IndexReader, should we
 not also have an addCoreClosedListener() in it? Will it cause too much
 confusion, as only SegmentReader might have a valid impl for that method?

You really should only use .getCoreCacheKey on SegmentReader; all
other impls will just return this (and then you have full cache
turnover after every NRT reopen).
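
To make the difference between keying on the reader and keying on the
segment core concrete, here is a minimal plain-Java sketch. It does not
use Lucene at all: FakeCore and FakeSegmentReader are hypothetical
stand-ins for the shared per-segment state and for SegmentReader, and
coreKey() / addClosedListener() only mimic the roles of
getCoreCacheKey() and addCoreClosedListener(). It assumes single-threaded
use; a real cache would need synchronization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for per-segment state shared across NRT reopens (not a Lucene class). */
final class FakeCore {
    private final List<Runnable> closedListeners = new ArrayList<>();

    void addClosedListener(Runnable listener) {
        closedListeners.add(listener);
    }

    /** Called once, when the last reader sharing this core goes away. */
    void close() {
        for (Runnable listener : closedListeners) {
            listener.run();
        }
    }
}

/** Hypothetical stand-in for a segment reader: a new instance per reopen, but the same core. */
final class FakeSegmentReader {
    final FakeCore core;

    FakeSegmentReader(FakeCore core) {
        this.core = core;
    }

    /** Plays the role of getCoreCacheKey(): identical for all readers sharing the core. */
    Object coreKey() {
        return core;
    }
}

/** Cache keyed on the core key, evicted only when the core itself closes. */
final class CoreKeyedCacheSketch {
    private final Map<Object, long[]> cache = new HashMap<>();
    int builds = 0; // counts how often the expensive build ran

    long[] get(FakeSegmentReader reader) {
        final Object key = reader.coreKey();
        long[] bits = cache.get(key);
        if (bits == null) {
            bits = new long[] {42L}; // placeholder for the expensive per-segment bit set
            builds++;
            cache.put(key, bits);
            // Evict when the core closes, not when this particular reader closes.
            reader.core.addClosedListener(() -> cache.remove(key));
        }
        return bits;
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        FakeCore core = new FakeCore();
        CoreKeyedCacheSketch cache = new CoreKeyedCacheSketch();
        cache.get(new FakeSegmentReader(core)); // first NRT generation: builds the entry
        cache.get(new FakeSegmentReader(core)); // reopened reader, same core: cache hit
        System.out.println("builds = " + cache.builds);
        core.close();                           // segment merged away: entry evicted
        System.out.println("entries = " + cache.size());
    }
}
```

If the map were keyed on the reader instance instead, the second get()
would miss and rebuild, which is the full turnover described above.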

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Twitter analyser

2013-11-08 Thread Lance Norskog
This is a parts-of-speech analyzer for tweets. It would make your index 
far more useful.


http://www.ark.cs.cmu.edu/TweetNLP/

On 11/04/2013 11:40 PM, Stéphane Nicoll wrote:

Hi,

I am building an application that indexes tweets and offers some basic
search facilities on them.

I am trying to find a combination where the following would work:

* foo matches the foo word, a mention (@foo) or the hashtag (#foo)
* @foo only matches the mention
* #foo matches only the hashtag

It should match complete words, so I used the WhitespaceAnalyzer for indexing.

Any recommendation for this use case?

Thanks !
S.
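
One possible way to get the three matching rules listed above is to
index each mention or hashtag twice: once verbatim (@foo, #foo) and once
with the prefix stripped (foo). The sketch below shows the idea in plain
Java with a toy in-memory inverted index. It is not Lucene code:
TweetIndexSketch, terms(), add() and search() are hypothetical names
used only for illustration. In Lucene, the equivalent would presumably
be a custom TokenFilter placed after a whitespace tokenizer that emits
the stripped token at the same position; that part is an assumption and
is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index illustrating "index both @foo/#foo and foo" (hypothetical, not Lucene). */
final class TweetIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits on whitespace; for @x and #x also emits the bare word x. */
    static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            out.add(token); // the verbatim token: foo, @foo or #foo
            char first = token.charAt(0);
            if ((first == '@' || first == '#') && token.length() > 1) {
                out.add(token.substring(1)); // the bare word, so a plain "foo" query matches too
            }
        }
        return out;
    }

    void add(int docId, String text) {
        for (String term : terms(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Exact term lookup; returns matching document ids in ascending order. */
    Set<Integer> search(String queryTerm) {
        return postings.getOrDefault(queryTerm, new TreeSet<>());
    }

    public static void main(String[] args) {
        TweetIndexSketch index = new TweetIndexSketch();
        index.add(1, "hello foo");
        index.add(2, "thanks @foo");
        index.add(3, "loving #foo today");
        System.out.println("foo  -> " + index.search("foo"));
        System.out.println("@foo -> " + index.search("@foo"));
        System.out.println("#foo -> " + index.search("#foo"));
    }
}
```

The query side must keep @ and # intact (again, whitespace splitting
only) for the last two rules to hold. The sketch does no lowercasing and
does not strip trailing punctuation such as "@foo:", which a real
analyzer chain would probably need to handle.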

