Re: writer.updateDocument() not working (possible bug?)

2014-05-14 Thread Michael McCandless
How did you produce the document that you are sending to
updateDocument?  Are you loading it from IndexReader.document() or
IndexSearcher.doc(), changing it, then passing that to
IW.updateDocument?  If so, that's probably your bug: a loaded document
is not identical to the original Document you indexed.  In 5.0 we've
fixed this to be strongly typed ...

Mike McCandless

http://blog.mikemccandless.com
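[Editor's note: the distinction above can be illustrated with a plain-Java sketch (hypothetical classes, not Lucene's API): only stored fields survive IndexReader.document(), so re-indexing the loaded copy silently drops every index-only field.]

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration (hypothetical, not Lucene's API) of why a document
// loaded back from the index is not identical to the one you indexed.
public class StoredFieldsDemo {
    // A "document" with stored fields (survive retrieval) and
    // index-only fields (searchable, but never returned by document()).
    static class Doc {
        final Map<String, String> stored = new LinkedHashMap<>();
        final Map<String, String> indexOnly = new LinkedHashMap<>();
    }

    static final Map<String, Doc> index = new HashMap<>();

    static void updateDocument(String id, Doc doc) {
        index.put(id, doc);
    }

    // Like IndexReader.document(): only the stored fields come back.
    static Doc document(String id) {
        Doc loaded = new Doc();
        loaded.stored.putAll(index.get(id).stored);
        return loaded;
    }

    public static void main(String[] args) {
        Doc original = new Doc();
        original.stored.put("id", "42");
        original.indexOnly.put("body", "full text, indexed but not stored");
        updateDocument("42", original);

        Doc loaded = document("42");
        // The round-tripped copy has lost the index-only field; re-indexing
        // it would make the document unsearchable by its body.
        System.out.println(loaded.indexOnly.isEmpty()); // prints "true"
    }
}
```

The fix is to rebuild the Document from your original source data when calling updateDocument, rather than from the copy the reader hands back.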


On Tue, May 13, 2014 at 9:24 AM, Jamie  wrote:
> Greetings
>
> I am using Lucene NRT search. After calling writer.updateDocument(term, doc)
> and then search(), the document is no longer visible in the search results.
> The program must be restarted to see it again. In addition, the update is
> not being applied. The original document (before the update) is visible in
> the search results. If updateDocument(term,doc) is called, passing the
> original doc (without any changes), the doc is still removed from the index
> (i.e. the change is not the cause). On each search I am calling indexReader
> = DirectoryReader.open(writer, true); We have tried to call commit() and/or
> close() immediately after the update, but it makes no difference.
>
> This occurs both in Lucene 4.7.2 and 4.8. As far as we know, our code used
> to work with prior versions of Lucene. Has anyone encountered this?
>
> Regards
>
> Jamie
>
>
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>




Re: Lucene: Index Writer to write in multiple file instead make one heavy file

2014-05-14 Thread Yogesh patel
Thanks for the reply!

Could you please provide sample code for it? I got the idea, but I don't
know how to implement it.

Thanks
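[Editor's note: Mike's suggestion can be sketched roughly as below. This is an untested configuration sketch; the method names are from the Lucene 3.x javadocs, and the exact wiring differs between 3.0.1 (IndexWriter.setMergePolicy) and 3.1+ (IndexWriterConfig.setMergePolicy), so verify against your version.]

```java
// Configuration sketch for Lucene 3.x (hedged: check your version's javadocs).
// LogByteSizeMergePolicy stops merging segments once they exceed maxMergeMB,
// so no single segment file grows without bound.
LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
mergePolicy.setMaxMergeMB(1024.0); // segments above ~1 GB are left out of further merges
indexWriter.setMergePolicy(mergePolicy);
```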


On Tue, May 13, 2014 at 7:02 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> You can tell the MergePolicy to limit the maximum size of segments it
> should merge.
>
> Also, you should try to upgrade: 3.0.1 is REALLY old.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, May 13, 2014 at 1:58 AM, Yogesh patel
>  wrote:
> > HI
> >
> > I am using Lucene 3.0.1. I am writing many documents with Lucene
> > IndexWriter, but IndexWriter adds all documents to a single file, which
> > grows to more than 4 GB in my case. Can I distribute the index across
> > multiple files, or partition it?
> >
> > --
> >
> > *Regards, Yogesh Patel*
>
>


-- 
*Regards, Yogesh Patel*


Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Liviu Matei
Hi,

I am encountering the following issue with Lucene 3.6.1. If you could let me
know whether I am doing something wrong, or point out the mistake I am
making, that would be great.

To improve the performance of the application I am working on, I took the
approach of reusing the IndexReader and reopening it every 8 hours to pick up
the latest changes. (The IndexReader is declared as a global static
variable.) The search method is called from multiple threads in parallel, so
the index reader is shared between threads. If I don't close the old index
reader, I notice the virtual memory grows with every reopen (this should not
be an issue on 64-bit Linux, correct? That is the configuration I am using,
and the indexes are on a shared NTFS mount).
Also, from time to time I notice JVM crashes with the following stack:
 Thread Stack Trace:
    at memcpy+160()@0x381aa7b060
    -- Java stack --
    at java/nio/DirectByteBuffer.get(DirectByteBuffer.java:294)[optimized]
    at org/apache/lucene/store/MMapDirectory$MMapIndexInput.readBytes(MMapDirectory.java:298)[optimized]
    at org/apache/lucene/store/DataInput.readBytes(DataInput.java:72)
    at org/apache/lucene/index/CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:275)[optimized]
    at org/apache/lucene/store/BufferedIndexInput.refill(BufferedIndexInput.java:270)[optimized]
    at org/apache/lucene/store/BufferedIndexInput.readByte(BufferedIndexInput.java:40)[inlined]
    at org/apache/lucene/store/DataInput.readVInt(DataInput.java:107)[inlined]
    at org/apache/lucene/store/BufferedIndexInput.readVInt(BufferedIndexInput.java:217)[optimized]
    at org/apache/lucene/index/FieldsReader.doc(FieldsReader.java:235)
    at org/apache/lucene/index/SegmentReader.document(SegmentReader.java:492)
    at org/apache/lucene/index/DirectoryReader.document(DirectoryReader.java:568)
    at org/apache/lucene/index/MultiReader.document(MultiReader.java:252)
    at org/apache/lucene/index/IndexReader.document(IndexReader.java:1138)
    at org/apache/lucene/search/IndexSearcher.doc(IndexSearcher.java:258)[inlined]


Can you please tell me whether all this corruption is caused by the fact that
I am not closing the old IndexReader? But if I do close it, given that it is
shared by multiple threads, I will need to check before each search whether
the IndexReader is still open, correct? Say one thread reopens the
IndexReader and another thread afterwards reuses the old one; in that case I
should do the check, correct? Or is there a smarter mechanism in place?

Any help with this would be more than welcome.


Thank you very much and best regards,
Liviu


Re: best choice for ramBufferSizeMB

2014-05-14 Thread Shai Erera
Well, first make sure that you set ramBufferSizeMB to well below the max
Java heap size, otherwise you could run into OOMs.
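[Editor's note: as a rough rule of thumb (a heuristic for illustration, not an official Lucene formula), you can derive a safe upper bound for the buffer from the max heap:]

```java
// Hypothetical helper (a rule of thumb, not an official Lucene formula):
// keep the RAM buffer well under the max heap - say at most a quarter of
// it - with a floor of 16 MB and a ceiling of 512 MB, beyond which gains
// are rarely observed.
public class RamBufferHint {
    static double suggestRamBufferMB(long maxHeapMB) {
        double quarterHeap = maxHeapMB / 4.0;
        return Math.min(512.0, Math.max(16.0, quarterHeap));
    }

    public static void main(String[] args) {
        // Derive the suggestion from the JVM's actual -Xmx setting.
        long maxHeapMB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("suggested ramBufferSizeMB: " + suggestRamBufferMB(maxHeapMB));
    }
}
```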

While a larger RAM buffer may speed up indexing (since it flushes less
often to disk), it's not the only factor that affects indexing speed.

For instance, if a big portion of your indexing work is reading the files
from a slow storage device (an NFS share, remote HTTP, etc.), then that
could easily overshadow any benefit of a large RAM buffer.

Also, do you index with a single thread or multiple threads? Lucene supports
multi-threaded indexing, and it's recommended whenever you can, e.g. when
you run on sufficiently strong hardware (4+ cores...).

Another thing: in the past I noticed that overly large RAM buffers did not
improve indexing at all. If your underlying IO system is slow (e.g. indexing
to an NFS share, a distributed file system, etc.), the cost of flushing a big
RAM buffer becomes significant, and I observed no improvement with
ramBufferSizeMB=512 vs 128. A big RAM buffer also uses more heap and makes
the GC's job harder. So a too-big RAM buffer may actually slow things down
rather than speed them up.

Indexing speed is affected by multiple parameters, the RAM buffer is only
one of them...

Shai


On Wed, May 14, 2014 at 4:33 PM, Gudrun Siedersleben <
siedersle...@mpdl.mpg.de> wrote:

> Hi all,
>
> we want to speed up building our Lucene index. We set ramBufferSizeMB to
> values between 32 and 128 MB, but that does not make any difference in the
> time used for reindexing. We did not set maxBufferedDocs or anything else
> that could conflict.
> We start the JVM with the following JAVA_OPTS:
>
> -Xms128m -Xmx512m -XX:MaxPermSize=256m
>
> What is the recommended value for ramBufferSizeMB depending on JAVA_OPTS
> and perhaps other lucene parameters set? We use Lucene 3.6.0.
>
> Best regards
>
> Gudrun
>
>
>


Re: best choice for ramBufferSizeMB

2014-05-14 Thread Michael McCandless
Generally larger is better, as long as the JVM's heap is big enough to
allow IW to use that much RAM.

Larger flushed segments mean less merging later ...

Mike McCandless

http://blog.mikemccandless.com






Re: ConcurrentModificationException in ICU analyzer

2014-05-14 Thread Robert Muir
fyi: this bug was already found and fixed in ICU's trunk:
http://bugs.icu-project.org/trac/ticket/10767


On Wed, May 14, 2014 at 4:32 AM, Robert Muir  wrote:
> This looks like a bug in ICU? I'll try to reproduce it. We are also a
> little out of date, maybe they've already fixed it.
>
> Thank you for reporting this.
>
> On Fri, May 9, 2014 at 12:14 PM, feedly team  wrote:
>> I am using the 4.7.0 ICU analyzer (via Elasticsearch) and noticed this
>> exception in the logs. It's sporadic. Any ideas what is going on, or if
>> this is already fixed?
>>
>> java.util.ConcurrentModificationException
>>   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>>   at java.util.HashMap$KeyIterator.next(HashMap.java:928)
>>   at com.ibm.icu.text.RuleBasedBreakIterator.getEngineFor(RuleBasedBreakIterator.java:1011)
>>   at com.ibm.icu.text.RuleBasedBreakIterator.handleNext(RuleBasedBreakIterator.java:1085)
>>   at com.ibm.icu.text.RuleBasedBreakIterator.next(RuleBasedBreakIterator.java:449)
>>   at org.apache.lucene.analysis.icu.segmentation.BreakIteratorWrapper$RBBIWrapper.next(BreakIteratorWrapper.java:96)
>>   at org.apache.lucene.analysis.icu.segmentation.CompositeBreakIterator.next(CompositeBreakIterator.java:60)
>>   at org.apache.lucene.analysis.icu.segmentation.ICUTokenizer.incrementTokenBuffer(ICUTokenizer.java:212)
>>   at org.apache.lucene.analysis.icu.segmentation.ICUTokenizer.incrementToken(ICUTokenizer.java:106)
>>   at org.apache.lucene.analysis.icu.ICUNormalizer2Filter.incrementToken(ICUNormalizer2Filter.java:80)
>>   at org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter.incrementToken(KeywordMarkerFilter.java:45)
>>   at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82)
>>   at org.apache.lucene.analysis.en.EnglishPossessiveFilter.incrementToken(EnglishPossessiveFilter.java:57)
>>   at org.apache.lucene.analysis.util.ElisionFilter.incrementToken(ElisionFilter.java:52)
>>   at org.apache.lucene.analysis.icu.ICUNormalizer2Filter.incrementToken(ICUNormalizer2Filter.java:80)
>>   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:182)
>>   at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
>>   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
>>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:455)
>>   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1534)
>>   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1204)
>>   at org.elasticsearch.index.engine.internal.InternalEngine.innerCreate(InternalEngine.java:451)
>>   at org.elasticsearch.index.engine.internal.InternalEngine.create(InternalEngine.java:382)
>>   at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:374)
>>   at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:431)
>>   at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:160)
>>   at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
>>   at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:722)




AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
> But if I close it, given that it is shared by multiple threads, I will
> need to check each time before doing the search whether the IndexReader is
> still open, correct?
You can make use of IndexReader#incRef/#decRef, i.e.:
ir.incRef();
try
{
    // ... run the search ...
}
finally
{
    ir.decRef();
}

Or maybe SearcherManager 
http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html  
may fit your needs?



Re: ConcurrentModificationException in ICU analyzer

2014-05-14 Thread Robert Muir
I opened https://issues.apache.org/jira/browse/LUCENE-5671

for now, if you are able to use the latest release of ICU, it should
prevent the bug.





AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
Not closing an IndexReader most probably (to say the least) results in a
memory leak -> OOM.

> But if I close it, given that it is shared by multiple threads, I will
> need to check each time before doing the search whether the IndexReader is
> still open, correct?
You can make use of IndexReader#incRef/#decRef , i.e.
ir.incRef();
try
{

}
finally
{
ir.decRef();
}
...
IFF ir.getRefCount() > 1 THEN you are safe to close the "old" ir.

Maybe  SearcherManager 
http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html  
fits your needs?
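[Editor's note: to make the refcount discipline concrete, here is a plain-Java model (a hypothetical class that mirrors, but does not reproduce, Lucene's implementation): the reader starts at refCount 1, every searching thread brackets its work with incRef/decRef, and the underlying resources are released only when the count reaches zero, so closing the "old" reader after a reopen can never yank it out from under an in-flight search.]

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java model (hypothetical, not Lucene's code) of IndexReader's
// incRef/decRef discipline.
public class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean released = false;

    // A searching thread calls this before using the reader.
    public void incRef() {
        if (refCount.getAndIncrement() <= 0) {
            refCount.decrementAndGet(); // undo; the reader was already gone
            throw new IllegalStateException("reader already closed");
        }
    }

    // Called by searchers when done, and by the owner instead of close().
    public void decRef() {
        if (refCount.decrementAndGet() == 0) {
            released = true; // here the real reader would free its resources
        }
    }

    public boolean isReleased() {
        return released;
    }

    public static void main(String[] args) {
        RefCountedReader old = new RefCountedReader();
        old.incRef();                          // a search thread takes a reference
        old.decRef();                          // owner "closes" after reopening a new reader
        System.out.println(old.isReleased()); // prints "false": a search is still running
        old.decRef();                          // the search thread finishes
        System.out.println(old.isReleased()); // prints "true": now actually released
    }
}
```

SearcherManager packages exactly this bookkeeping behind acquire()/release().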




Merger performance degradation on 3.6.1

2014-05-14 Thread danielv
Hi,

We have an index of about 550M records (~800GB), and once a week we merge
thousands of mini-indexes into it using Hadoop - 45 mappers on 2 Hadoop
nodes. After upgrading to Lucene 3.6.1 we noticed that the merge process
continuously slows down.
After testing a couple of options, it looks like we found the source of the
problem, but we have no idea how to fix it.
What we do: first we merge all mini-indexes into one intermediate mini-index,
and then merge that one into the big (final) one.
The difference is the presence of deleted records in the mini-indexes:
If the merged mini-indexes contain no deleted records, the merge runs for
about 2 hours, at roughly 0.5s-2s per mini-index.
If there are deleted records, after about 10 minutes we see a dramatic
degradation in the time to merge mini-indexes into the intermediate one: the
first 100-200 mini-indexes each merge in less than a second, after 10
minutes a single mini-index takes more than 10s, and after an hour or two it
takes a couple of minutes!

This is from a jstack of one of the mappers:

   java.lang.Thread.State: RUNNABLE
        at java.lang.Thread.isAlive(Native Method)
        at org.apache.lucene.util.CloseableThreadLocal.purge(CloseableThreadLocal.java:115)
        - locked <0x0007db0d6140> (a java.util.WeakHashMap)
        at org.apache.lucene.util.CloseableThreadLocal.maybePurge(CloseableThreadLocal.java:105)
        at org.apache.lucene.util.CloseableThreadLocal.get(CloseableThreadLocal.java:88)
        at org.apache.lucene.index.TermInfosReader.getThreadResources(TermInfosReader.java:160)
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:184)
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
        at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:66)
        at org.apache.lucene.index.BufferedDeletesStream.applyTermDeletes(BufferedDeletesStream.java:346)
        - locked <0x0007805766f0> (a org.apache.lucene.index.BufferedDeletesStream)
        at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:248)
        - locked <0x0007805766f0> (a org.apache.lucene.index.BufferedDeletesStream)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3615)
        - locked <0x0007805739a0> (a org.apache.lucene.index.IndexWriter)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3120)
        at org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:3064)

We tried org.apache.lucene.index.IndexWriter.addIndexes instead of
org.apache.lucene.index.IndexWriter.addIndexesNoOptimize - same behavior.

How can we eliminate this behavior and improve the performance of our
merge?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Merger-performance-degradation-on-3-6-1-tp4135593.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Can RAMDirectory work for gigabyte data which needs refreshing of the index all the time?

2014-05-14 Thread Cheng
Hi,

I have an index of multiple gigabytes that serves 5-10 threads and needs
refreshing very often. I wonder whether RAMDirectory is a good candidate for
this purpose. If not, what kind of directory is better?

Thanks,
Cheng