Re: How to add ASCIIFoldingFilter in ClassicAnalyzer

2016-10-20 Thread Kumaran Ramasubramanian
Hi Adrien

Thanks a lot for the pointer.


--
Kumaran R


On Wed, Oct 19, 2016 at 8:07 PM, Adrien Grand wrote:

> You would need to override the wrapComponents method in order to wrap the
> TokenStream. See, for instance, Lucene's LimitTokenCountAnalyzer.
>
> On Tue, Oct 18, 2016 at 18:46, Kumaran Ramasubramanian wrote:
>
> > Hi Adrien
> >
> > How do I do this? Any pointers?
> >
> > > If it is fine to add the ASCII folding filter at the end of the analysis
> > > chain, then you could use AnalyzerWrapper.
> >
> > -
> > Kumaran R
> >
> > On Tue, Oct 11, 2016 at 9:59 PM, Kumaran Ramasubramanian <kums@gmail.com>
> > wrote:
> >
> > >
> > > @Ahmet, Uwe: Thanks a lot for your suggestion. I have already written a
> > > custom analyzer as you said, but I am trying to avoid adding a new
> > > component to my search flow.
> > >
> > > @Adrien: How do I add a filter using AnalyzerWrapper? Any pointers?
> > >
> > > On Tue, Oct 11, 2016 at 8:16 PM, Uwe Schindler wrote:
> > >
> > >> I'd suggest using CustomAnalyzer to define your own analyzer. It
> > >> allows you to build your own analyzer from the components (tokenizers
> > >> and filters) you'd like to have.
> > >>
> > >> Uwe
> > >>
> > >> -
> > >> Uwe Schindler
> > >> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> http://www.thetaphi.de
> > >> eMail: u...@thetaphi.de
> > >>
> > >> > -Original Message-
> > >> > From: Adrien Grand [mailto:jpou...@gmail.com]
> > >> > Sent: Tuesday, October 11, 2016 4:37 PM
> > >> > To: java-user@lucene.apache.org
> > >> > Subject: Re: How to add ASCIIFoldingFilter in ClassicAnalyzer
> > >> >
> > >> > Hi Kumaran,
> > >> >
> > >> > If it is fine to add the ASCII folding filter at the end of the
> > >> > analysis chain, then you could use AnalyzerWrapper. Otherwise, you
> > >> > need to create a new analyzer that has the same analysis chain as
> > >> > ClassicAnalyzer, plus an ASCIIFoldingFilter.
> > >> >
> > >> > On Tue, Oct 11, 2016 at 16:22, Kumaran Ramasubramanian wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > Is there any way to add ASCIIFoldingFilter on top of ClassicAnalyzer
> > >> > > without writing a new custom analyzer? Should I extend
> > >> > > StopwordAnalyzerBase again?
> > >> > >
> > >> > >
> > >> > > I know that ClassicAnalyzer is final. Is there any special reason for
> > >> > > making it final? I ask because StandardAnalyzer was not final before.
> > >> > >
> > >> > > public final class ClassicAnalyzer extends StopwordAnalyzerBase
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Kumaran R
> > >> > >
> > >>
> > >>
> > >> -
> > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >>
> > >>
> > >
> >
>
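For reference, here is a minimal sketch of the wrapComponents approach Adrien describes, modelled on LimitTokenCountAnalyzer. It assumes the Lucene 6.x API that was current when this thread was written (the `TokenStreamComponents(Tokenizer, TokenStream)` constructor and `getTokenizer()`); later releases changed `TokenStreamComponents` (for example `getSource()` in 8.x) and ClassicAnalyzer may live in a different package, so adjust for your version. The class name `ASCIIFoldingAnalyzerWrapper` and the `tokens()` helper are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Hypothetical wrapper: delegates to another analyzer (e.g. ClassicAnalyzer)
 * and appends ASCIIFoldingFilter at the END of its analysis chain.
 * Written against the Lucene 6.x API.
 */
public final class ASCIIFoldingAnalyzerWrapper extends AnalyzerWrapper {
    private final Analyzer delegate;

    public ASCIIFoldingAnalyzerWrapper(Analyzer delegate) {
        // Reuse token stream components the same way the wrapped analyzer does.
        super(delegate.getReuseStrategy());
        this.delegate = delegate;
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
        return delegate;
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName,
                                                   TokenStreamComponents components) {
        // Keep the delegate's tokenizer and add the folding filter after its last filter.
        return new TokenStreamComponents(components.getTokenizer(),
                new ASCIIFoldingFilter(components.getTokenStream()));
    }

    /** Hypothetical helper: runs the analyzer over text and collects the terms. */
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new ASCIIFoldingAnalyzerWrapper(new ClassicAnalyzer())) {
            System.out.println(tokens(analyzer, "Café Déjà vu"));
        }
    }
}
```

Because the folding filter is appended after ClassicAnalyzer's own filters, it runs after lowercasing and stop-word removal; if folding has to happen earlier in the chain, a separate analyzer (as Adrien and Uwe suggest) is needed.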


RE: Can ByteBufferIndexInput use buffering?

2016-10-20 Thread Uwe Schindler
Hi,

Adding buffering to ByteBufferIndexInput would not only be an anti-pattern, it
would also slow things down. What is the point of copying data from memory
location A to memory location B before reading it?

I'd suggest reading this to understand what virtual memory and
ByteBufferIndexInput do before trying anything like this:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
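To make the "copy from A to B" point concrete, here is a plain java.nio sketch. It does not use Lucene's ByteBufferIndexInput or BufferedIndexInput; the class and method names are hypothetical. `sumDirect` reads bytes straight out of a memory-mapped buffer, while `sumViaCopy` first copies each chunk into a private `byte[]` and then reads from that copy, which is the extra step a buffering wrapper would add on top of pages the OS already keeps in memory. It makes no timing claims.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadSketch {

    // Unbuffered style: read every byte directly from the mapped region
    // using absolute gets (the buffer's position is not changed).
    static long sumDirect(ByteBuffer mapped) {
        long sum = 0;
        for (int i = 0; i < mapped.limit(); i++) {
            sum += mapped.get(i);
        }
        return sum;
    }

    // Buffered style: copy a chunk from the mapped region into a private
    // byte[] (memory location A -> B), then read from the copy.
    static long sumViaCopy(ByteBuffer mapped, int bufferSize) {
        ByteBuffer view = mapped.duplicate(); // own position/limit, same memory
        byte[] buffer = new byte[bufferSize];
        long sum = 0;
        while (view.hasRemaining()) {
            int n = Math.min(bufferSize, view.remaining());
            view.get(buffer, 0, n);           // the extra copy
            for (int i = 0; i < n; i++) {
                sum += buffer[i];
            }
        }
        return sum;
    }

    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 7);
        }
        Files.write(file, data);
        MappedByteBuffer mapped = map(file);
        // Same result either way; the buffered variant only adds copying.
        System.out.println(sumDirect(mapped) == sumViaCopy(mapped, 1024)); // prints true
    }
}
```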

Kind regards,
Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Ravikumar Govindarajan [mailto:ravikumar.govindara...@gmail.com]
> Sent: Thursday, October 20, 2016 9:26 AM
> To: java-user@lucene.apache.org
> Subject: Can ByteBufferIndexInput use buffering?
> 
> When we use NIOFSDirectory, Lucene internally buffers reads via
> BufferedIndexInput (1 KB etc.) while reading from the file.
> 
> However, for MMapDirectory (ByteBufferIndexInput) there is no such
> buffering; data is read from the mapped bytes directly.
> 
> Would it be too much of a performance drag if I wrapped ByteBufferIndexInput
> with a BufferedIndexInput? I mean, is it an anti-pattern with respect to
> zero-copy reads, etc.?
> 
> Any help is much appreciated
> 
> --
> Ravi





Re: Can ByteBufferIndexInput use buffering?

2016-10-20 Thread Michael McCandless
The fact that MMapIndexInput does no buffering is an important
performance gain over NIOFSDirectory, which, e.g., loads far too many
bytes when seeking to a term.

Why do you want to add buffering to it?

The OS should already do a good job keeping recently accessed pages
hot, doing the buffering for you.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Oct 20, 2016 at 3:25 AM, Ravikumar Govindarajan wrote:
> When we use NIOFSDirectory, Lucene internally buffers reads via
> BufferedIndexInput (1 KB etc.) while reading from the file.
>
> However, for MMapDirectory (ByteBufferIndexInput) there is no such
> buffering; data is read from the mapped bytes directly.
>
> Would it be too much of a performance drag if I wrapped ByteBufferIndexInput
> with a BufferedIndexInput? I mean, is it an anti-pattern with respect to
> zero-copy reads, etc.?
>
> Any help is much appreciated.
>
> --
> Ravi




Re: ReaderManager, more drama with things not being closed before closing the Directory

2016-10-20 Thread Michael McCandless
Maybe you can contribute the code you have for managing multiple
indices and we can iterate/debug from there?

Somehow we need to expose this failure in a standalone test case so we
can isolate it.
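As a possible starting point for such a standalone test, the following is a simplified, hypothetical model (plain Java, not Lucene code) of the reference-counting contract that IndexReader and ReferenceManager follow: a reader starts with refCount 1, owned by the manager; each acquire() adds a reference and each release() removes one; a refresh swaps in a new reader and drops the manager's reference to the old one. Under that contract the first reader can only reach 0 after the manager has refreshed away from it and every acquire() has been matched by exactly one release(). A test against the real classes could assert the same counts via `DirectoryReader.getRefCount()`. The names `RefCountModel`, `CountedResource` and `Manager` are illustrative only.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified, hypothetical model of reader reference counting (not Lucene code). */
public class RefCountModel {

    /** Stands in for a reader: starts with refCount 1, owned by whoever opened it. */
    static final class CountedResource {
        private final AtomicInteger refCount = new AtomicInteger(1);
        private volatile boolean closed;

        // Only succeeds while the resource is still alive (refCount > 0).
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        void decRef() {
            int rc = refCount.decrementAndGet();
            if (rc == 0) {
                closed = true; // a real reader would close its files here
            } else if (rc < 0) {
                throw new IllegalStateException("too many decRef calls: refCount is " + rc);
            }
        }

        int getRefCount() { return refCount.get(); }
        boolean isClosed() { return closed; }
    }

    /** Stands in for a manager that holds exactly one reference to "current". */
    static final class Manager {
        private CountedResource current = new CountedResource();

        synchronized CountedResource acquire() {
            if (!current.tryIncRef()) {
                throw new IllegalStateException("current resource already closed");
            }
            return current;
        }

        void release(CountedResource r) { r.decRef(); }

        // Swap in a new resource and drop the manager's reference to the old one.
        synchronized void refresh() {
            CountedResource old = current;
            current = new CountedResource();
            old.decRef();
        }
    }

    public static void main(String[] args) {
        Manager m = new Manager();
        CountedResource first = m.acquire(); // manager + caller: refCount 2
        m.refresh();                         // manager drops its reference: refCount 1
        System.out.println(first.getRefCount() + " " + first.isClosed()); // prints "1 false"
        m.release(first);                    // last reference gone: closed
        System.out.println(first.getRefCount() + " " + first.isClosed()); // prints "0 true"
    }
}
```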

Mike McCandless

http://blog.mikemccandless.com


On Thu, Oct 20, 2016 at 1:57 AM, Trejkaz wrote:
> Hi all.
>
> I seem to have a situation where ReaderManager is reducing a refCount
> to 0 before it actually releases all its references.
>
> It's difficult because it's all mixed up in our framework for multiple
> ReaderManagers, which I'm still not convinced works because the
> concurrency is impossible to figure out, and probably won't be allowed
> to publish in order to have anyone at Lucene look at it either. (Which
> is why I hope that someone at Lucene figures out how to manage more
> than one index reliably one day...)
>
> The stack trace trying to close the directory is just trying to
> refresh the reader, but I guess this reader was the last one using a
> Directory, so now we're closing that as well:
>
> java.lang.RuntimeException: Resources inside the directory did not get closed before closing the directory
>     at com.acme.storage.textindex.store.CloseCheckingDirectory.close(CloseCheckingDirectory.java:109)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer$IndexReaderWrapper.release(DefaultIndexReaderSharer.java:146)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer$IndexReaderWrapper.access$100(DefaultIndexReaderSharer.java:77)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer.release(DefaultIndexReaderSharer.java:45)
>     at com.acme.storage.textindex.DefaultTextIndex$WrappingReaderManager$1.doClose(DefaultTextIndex.java:370)
>     at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:253)
>     at com.acme.storage.textindex.DefaultTextIndex$WrappingReaderManager.decRef(DefaultTextIndex.java:331)
>     at com.acme.storage.textindex.DefaultTextIndex$WrappingReaderManager.decRef(DefaultTextIndex.java:306)
>     at org.apache.lucene.search.ReferenceManager.release(ReferenceManager.java:274)
>     at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:189)
>     at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
>
> The stack trace which opened the resource and didn't close it is
> apparently from the first reader that the ReaderManager opened:
>
> Caused by: java.lang.RuntimeException: unclosed IndexInput: _7d.tvd
>     at com.acme.storage.textindex.store.CloseCheckingDirectory.addOpenResource(CloseCheckingDirectory.java:82)
>     at com.acme.storage.textindex.store.CloseCheckingDirectory.openInput(CloseCheckingDirectory.java:57)
>     at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:144)
>     at org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
>     at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:120)
>     at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
>     at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
>     at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
>     at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
>     at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer$CustomReaderManager.<init>(DefaultIndexReaderSharer.java:164)
>
> But if it's the first reader held by the ReaderManager, I wouldn't
> expect the refCount to be 0, so it shouldn't be closing the directory.
>
> I can't reproduce this myself, so I can't just dump out conveniently
> placed messages to figure out how it's happening...
>
> But has anyone else seen something like this?
>
> CustomReaderManager is probably shareable; it just does this:
>
> private static class CustomReaderManager
>         extends ReferenceManager<DirectoryReader> {
>     private CustomReaderManager(Directory directory) throws IOException {
>         current =
>             UnInvertingDirectoryReader.wrap(DirectoryReader.open(directory));
>     }
>
>     @Override
>     protected void decRef(DirectoryReader reference) throws IOException {
>         reference.decRef();
>     }
>
>     @Override
>     protected DirectoryReader refreshIfNeeded(DirectoryReader referenceToRefresh)
>             throws IOException {
>         return DirectoryReader.openIfChanged(referenceToRefresh);
>     }
>
>     @Override
>     protected boolean tryIncRef(DirectoryReader reference) {
>         return 

Can ByteBufferIndexInput use buffering?

2016-10-20 Thread Ravikumar Govindarajan
When we use NIOFSDirectory, Lucene internally buffers reads via
BufferedIndexInput (1 KB etc.) while reading from the file.

However, for MMapDirectory (ByteBufferIndexInput) there is no such
buffering; data is read from the mapped bytes directly.

Would it be too much of a performance drag if I wrapped ByteBufferIndexInput
with a BufferedIndexInput? I mean, is it an anti-pattern with respect to
zero-copy reads, etc.?

Any help is much appreciated.

--
Ravi