Re: How to add ASCIIFoldingFilter in ClassicAnalyzer
Hi Adrien,

Thanks a lot for the pointer.

--
Kumaran R

On Wed, Oct 19, 2016 at 8:07 PM, Adrien Grand wrote:
> You would need to override the wrapComponents method in order to wrap
> the token stream. See for instance Lucene's LimitTokenCountAnalyzer.
>
> On Tue, Oct 18, 2016 at 18:46, Kumaran Ramasubramanian wrote:
> > Hi Adrien,
> >
> > How do I do this? Any pointers?
> >
> > > If it is fine to add the ascii folding filter at the end of the
> > > analysis chain, then you could use AnalyzerWrapper.
> >
> > --
> > Kumaran R
> >
> > On Tue, Oct 11, 2016 at 9:59 PM, Kumaran Ramasubramanian
> > <kums@gmail.com> wrote:
> > > @Ahmet, Uwe: Thanks a lot for your suggestion. I have already
> > > written a custom analyzer as you said, but I am trying to avoid a
> > > new component in my search flow.
> > >
> > > @Adrien: How do I add a filter using AnalyzerWrapper? Any pointers?
> > >
> > > On Tue, Oct 11, 2016 at 8:16 PM, Uwe Schindler wrote:
> > >> I'd suggest using CustomAnalyzer for defining your own analyzer.
> > >> This allows you to build your own analyzer with the components
> > >> (tokenizers and filters) you like to have.
> > >>
> > >> Uwe
> > >>
> > >> -----
> > >> Uwe Schindler
> > >> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> http://www.thetaphi.de
> > >> eMail: u...@thetaphi.de
> > >>
> > >> > -----Original Message-----
> > >> > From: Adrien Grand [mailto:jpou...@gmail.com]
> > >> > Sent: Tuesday, October 11, 2016 4:37 PM
> > >> > To: java-user@lucene.apache.org
> > >> > Subject: Re: How to add ASCIIFoldingFilter in ClassicAnalyzer
> > >> >
> > >> > Hi Kumaran,
> > >> >
> > >> > If it is fine to add the ascii folding filter at the end of the
> > >> > analysis chain, then you could use AnalyzerWrapper. Otherwise,
> > >> > you need to create a new analyzer that has the same analysis
> > >> > chain as ClassicAnalyzer, plus an ASCIIFoldingFilter.
> > >> >
> > >> > On Tue, Oct 11, 2016 at 16:22, Kumaran Ramasubramanian wrote:
> > >> > > Hi All,
> > >> > >
> > >> > > Is there any way to add ASCIIFoldingFilter on top of
> > >> > > ClassicAnalyzer without writing a new custom analyzer? Should
> > >> > > I extend StopwordAnalyzerBase again?
> > >> > >
> > >> > > I know that ClassicAnalyzer is final. Is there any special
> > >> > > reason for making it final? StandardAnalyzer was not final
> > >> > > before.
> > >> > >
> > >> > > public final class ClassicAnalyzer extends StopwordAnalyzerBase
> > >> > >
> > >> > > --
> > >> > > Kumaran R
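
A minimal sketch of the AnalyzerWrapper approach Adrien describes, assuming a Lucene 5.x/6.x classpath (lucene-core plus lucene-analyzers-common). The class name AsciiFoldingAnalyzerWrapper and the analyze helper are made up for illustration; the two overridden methods follow the same shape as LimitTokenCountAnalyzer. The getTokenizer() call is specific to that version range, so treat it as an assumption if you are on a newer release.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical wrapper: appends ASCIIFoldingFilter to the end of a delegate's chain. */
final class AsciiFoldingAnalyzerWrapper extends AnalyzerWrapper {
  private final Analyzer delegate;

  AsciiFoldingAnalyzerWrapper(Analyzer delegate) {
    // Reuse the delegate's strategy so cached components stay consistent.
    super(delegate.getReuseStrategy());
    this.delegate = delegate;
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    return delegate; // same delegate for every field
  }

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName,
                                                 TokenStreamComponents components) {
    // Keep the original tokenizer, wrap the existing token stream with the folding filter.
    return new TokenStreamComponents(
        components.getTokenizer(),
        new ASCIIFoldingFilter(components.getTokenStream()));
  }

  /** Illustration-only helper: collects the terms an analyzer emits for some text. */
  static List<String> analyze(Analyzer analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new AsciiFoldingAnalyzerWrapper(new ClassicAnalyzer());
    System.out.println(analyze(analyzer, "Café Zürich"));
  }
}
```

Because the filter is appended after the delegate's whole chain, folding happens after ClassicAnalyzer's stop-word removal. If folding has to run earlier in the chain, this wrapper is not enough and a separately built analyzer (for example via CustomAnalyzer, as Uwe suggests) is the alternative.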
RE: Can ByteBufferIndexInput use buffering?
Hi,

Adding buffering to ByteBufferIndexInput would not only be an anti-pattern, it would also slow things down. What is the sense of copying data from memory location A to memory location B before reading it?

I'd suggest reading this and understanding what virtual memory and ByteBufferIndexInput do before trying anything like this:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Kind regards,
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Ravikumar Govindarajan [mailto:ravikumar.govindara...@gmail.com]
> Sent: Thursday, October 20, 2016 9:26 AM
> To: java-user@lucene.apache.org
> Subject: Can ByteBufferIndexInput use buffering?
>
> When we use NIOFSDirectory, Lucene internally uses buffering via
> BufferedIndexInput (1KB etc.) while reading from the file.
>
> However, for MMapDirectory (ByteBufferIndexInput) there is no such
> buffering and data is read from the mapped bytes directly.
>
> Will it be too much of a performance drag if I wrap ByteBufferIndexInput
> with a BufferedIndexInput? I mean, is it an anti-pattern for zero-copy
> reads etc.?
>
> Any help is much appreciated
>
> --
> Ravi
Re: Can ByteBufferIndexInput use buffering?
The fact that MMapIndexInput does no buffering is an important performance gain vs NIOFSDirectory, which e.g. on seeking to a term loads way too many bytes.

Why do you want to add buffering to it? The OS should already do a good job keeping recently accessed pages hot, doing the buffering for you.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Oct 20, 2016 at 3:25 AM, Ravikumar Govindarajan wrote:
> When we use NIOFSDirectory, Lucene internally uses buffering via
> BufferedIndexInput (1KB etc.) while reading from the file.
>
> However, for MMapDirectory (ByteBufferIndexInput) there is no such
> buffering and data is read from the mapped bytes directly.
>
> Will it be too much of a performance drag if I wrap ByteBufferIndexInput
> with a BufferedIndexInput? I mean, is it an anti-pattern for zero-copy
> reads etc.?
>
> Any help is much appreciated
>
> --
> Ravi
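
The point both replies make can be illustrated with plain java.nio, without any Lucene classes. The sketch below is an assumption-laden illustration, not how ByteBufferIndexInput is implemented: MappedReadSketch, readDirect and readViaCopy are hypothetical names. It maps a file and reads one byte either straight from the mapping or through an intermediate heap buffer, which yields the same value at the cost of an extra copy.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustration only: direct reads from a mapping vs. reads through an extra buffer. */
final class MappedReadSketch {

  /** Maps the whole file read-only; the mapping stays valid after the channel is closed. */
  static MappedByteBuffer map(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    }
  }

  /** Reads one byte straight from the mapped region (backed by the OS page cache). */
  static byte readDirect(ByteBuffer mapped, int pos) {
    return mapped.get(pos);
  }

  /** Same read, but first copies up to bufferSize bytes into a heap array. */
  static byte readViaCopy(ByteBuffer mapped, int pos, int bufferSize) {
    byte[] heap = new byte[Math.min(bufferSize, mapped.limit() - pos)];
    ByteBuffer view = mapped.duplicate(); // independent position, same memory
    view.position(pos);
    view.get(heap);                       // the additional copy from A to B
    return heap[0];
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("mapped-read", ".bin");
    file.toFile().deleteOnExit();
    byte[] data = new byte[4096];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    Files.write(file, data);

    MappedByteBuffer mapped = map(file);
    // Both paths return the same byte; only the second one copies 1 KB first.
    System.out.println(readDirect(mapped, 1000) == readViaCopy(mapped, 1000, 1024));
  }
}
```

With NIOFSDirectory-style file reads the intermediate buffer saves system calls; with a mapping there is no system call per read to save, so the buffer only adds work.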
Re: ReaderManager, more drama with things not being closed before closing the Directory
Maybe you can contribute the code you have for managing multiple indices and we can iterate/debug from there?

Somehow we need to expose this failure in a standalone test case so we can isolate it.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Oct 20, 2016 at 1:57 AM, Trejkaz wrote:
> Hi all.
>
> I seem to have a situation where ReaderManager is reducing a refCount
> to 0 before it actually releases all its references.
>
> It's difficult because it's all mixed up in our framework for multiple
> ReaderManagers, which I'm still not convinced works because the
> concurrency is impossible to figure out, and probably won't be allowed
> to publish in order to have anyone at Lucene look at it either. (Which
> is why I hope that someone at Lucene figures out how to manage more
> than one index reliably one day...)
>
> The stack trace trying to close the directory is just trying to
> refresh the reader, but I guess this reader was the last one using a
> Directory, so now we're closing that as well:
>
> java.lang.RuntimeException: Resources inside the directory did not get closed before closing the directory
>     at com.acme.storage.textindex.store.CloseCheckingDirectory.close(CloseCheckingDirectory.java:109)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer$IndexReaderWrapper.release(DefaultIndexReaderSharer.java:146)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer$IndexReaderWrapper.access$100(DefaultIndexReaderSharer.java:77)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer.release(DefaultIndexReaderSharer.java:45)
>     at com.acme.storage.textindex.DefaultTextIndex$WrappingReaderManager$1.doClose(DefaultTextIndex.java:370)
>     at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:253)
>     at com.acme.storage.textindex.DefaultTextIndex$WrappingReaderManager.decRef(DefaultTextIndex.java:331)
>     at com.acme.storage.textindex.DefaultTextIndex$WrappingReaderManager.decRef(DefaultTextIndex.java:306)
>     at org.apache.lucene.search.ReferenceManager.release(ReferenceManager.java:274)
>     at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:189)
>     at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
>
> The stack trace which opened the resource and didn't close it is
> apparently the first reader which ReaderManager:
>
> Caused by: java.lang.RuntimeException: unclosed IndexInput: _7d.tvd
>     at com.acme.storage.textindex.store.CloseCheckingDirectory.addOpenResource(CloseCheckingDirectory.java:82)
>     at com.acme.storage.textindex.store.CloseCheckingDirectory.openInput(CloseCheckingDirectory.java:57)
>     at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:144)
>     at org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
>     at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:120)
>     at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
>     at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
>     at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
>     at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
>     at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
>     at com.acme.storage.textindex.index.DefaultIndexReaderSharer$CustomReaderManager.<init>(DefaultIndexReaderSharer.java:164)
>
> But if it's the first reader held by the ReaderManager, I wouldn't
> expect the refCount to be 0, so it shouldn't be closing the directory.
>
> I can't reproduce this myself, so I can't just dump out conveniently
> placed messages to figure out how it's happening...
>
> But has anyone else seen something like this?
>
> CustomReaderManager is probably shareable, it just does this:
>
> private static class CustomReaderManager extends ReferenceManager<DirectoryReader> {
>     private CustomReaderManager(Directory directory) throws IOException {
>         current = UnInvertingDirectoryReader.wrap(DirectoryReader.open(directory));
>     }
>
>     @Override
>     protected void decRef(DirectoryReader reference) throws IOException {
>         reference.decRef();
>     }
>
>     @Override
>     protected DirectoryReader refreshIfNeeded(DirectoryReader referenceToRefresh) throws IOException {
>         return DirectoryReader.openIfChanged(referenceToRefresh);
>     }
>
>     @Override
>     protected boolean tryIncRef(DirectoryReader reference) {
>         return
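
For readers following the refCount discussion, here is a small self-contained sketch of the reference-counting contract that tryIncRef and decRef are expected to honor. It is an illustration under assumptions, not Lucene's code and not a diagnosis of the failure above: RefCountedResource is a hypothetical stand-in for a reader, and the "close" is just a flag. It shows how a single unbalanced decRef drives the count to zero while another holder still believes it owns a reference.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative stand-in for a ref-counted reader; not a Lucene class. */
final class RefCountedResource {
  // The creator implicitly holds the first reference.
  private final AtomicInteger refCount = new AtomicInteger(1);
  private volatile boolean closed;

  /** Takes a reference unless the count has already dropped to zero. */
  boolean tryIncRef() {
    int count;
    while ((count = refCount.get()) > 0) {
      if (refCount.compareAndSet(count, count + 1)) {
        return true;
      }
    }
    return false; // already closed, caller must not use the resource
  }

  /** Drops one reference; whoever drops the last one closes the resource. */
  void decRef() {
    int remaining = refCount.decrementAndGet();
    if (remaining == 0) {
      closed = true; // a real reader would release its files here
    } else if (remaining < 0) {
      throw new IllegalStateException("decRef called more often than incRef");
    }
  }

  boolean isClosed() {
    return closed;
  }

  public static void main(String[] args) {
    RefCountedResource reader = new RefCountedResource(); // count = 1 (owner)
    reader.tryIncRef();  // a searcher acquires: count = 2
    reader.decRef();     // the searcher releases: count = 1
    reader.decRef();     // accidental second release: count = 0, closed under the owner
    System.out.println(reader.isClosed());  // prints true
    System.out.println(reader.tryIncRef()); // prints false
  }
}
```

In a wrapper stack like the one in the trace, every layer that forwards decRef has to forward exactly one matching incRef as well; otherwise the innermost count can reach zero early in just this way.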
Can ByteBufferIndexInput use buffering?
When we use NIOFSDirectory, Lucene internally uses buffering via BufferedIndexInput (1KB etc.) while reading from the file.

However, for MMapDirectory (ByteBufferIndexInput) there is no such buffering and data is read from the mapped bytes directly.

Will it be too much of a performance drag if I wrap ByteBufferIndexInput with a BufferedIndexInput? I mean, is it an anti-pattern for zero-copy reads etc.?

Any help is much appreciated

--
Ravi