[jira] Created: (LUCENE-1509) IndexCommit.getFileNames() should not return dups

2009-01-02 Thread Michael McCandless (JIRA)
IndexCommit.getFileNames() should not return dups
-------------------------------------------------

 Key: LUCENE-1509
 URL: https://issues.apache.org/jira/browse/LUCENE-1509
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.4, 2.9
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9


If the index was created with autoCommit false, and more than one
segment was flushed during the IndexWriter session, then the shared
doc-store files are incorrectly duplicated in
IndexCommit.getFileNames().  This is because that method walks
through each SegmentInfo, appending its files to a list.  Since
multiple SegmentInfos may share the doc-store files, this causes dups.

To fix this, I've added a SegmentInfos.files(...) method, and
refactored all places that were computing their files one SegmentInfo
at a time to use this new method instead.
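The de-duplication described above amounts to collecting each segment's file list into an insertion-ordered set. A minimal illustrative sketch (class and method names here are hypothetical, not the actual patch):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: shared doc-store files appear in several
// segments' file lists, so collecting into an insertion-ordered set
// removes the duplicates while keeping the original ordering.
public class FileDedup {
    public static List<String> files(List<List<String>> perSegmentFiles) {
        Set<String> unique = new LinkedHashSet<String>();
        for (List<String> segFiles : perSegmentFiles) {
            unique.addAll(segFiles);
        }
        return new ArrayList<String>(unique);
    }
}
```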


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1509) IndexCommit.getFileNames() should not return dups

2009-01-02 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1509:
---------------------------------------

Attachment: LUCENE-1509.patch

Attached patch.  I plan to commit in a day or two.

> IndexCommit.getFileNames() should not return dups
> -------------------------------------------------
>
> Key: LUCENE-1509
> URL: https://issues.apache.org/jira/browse/LUCENE-1509
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.4, 2.9
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1509.patch
>
>
> If the index was created with autoCommit false, and more than 1
> segment was flushed during the IndexWriter session, then the shared
> doc-store files are incorrectly duplicated in
> IndexCommit.getFileNames().  This is because that method is walking
> through each SegmentInfo, appending its files to a list.  Since
> multiple SegmentInfo's may share the doc store files, this causes dups.
> To fix this, I've added a SegmentInfos.files(...) method, and
> refactored all places that were computing their files one SegmentInfo
> at a time to use this new method instead.




[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660322#action_12660322
 ] 

Mark Miller commented on LUCENE-1483:
-------------------------------------

So what looks like a promising strategy?

Off the top I am thinking something as simple as:

start with ORD with no fallback on the largest.
if the next segments are fairly large, use ORD_VAL
if the segments get somewhat smaller, move to ORD_DEM

Oddly, I've seen VAL perform well in certain situations, so maybe it has its 
place, but I don't know where yet.
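The switching idea above can be sketched as a simple size-based selector. The mode names follow the comment; the threshold and method shape are invented placeholders, not any Lucene API:

```java
// Hedged sketch of the heuristic: ORD with no fallback on the largest
// segment, ORD_VAL while segments stay fairly large, ORD_DEM once they
// get somewhat smaller. The half-of-largest cutoff is illustrative only.
public class SortStrategy {
    enum Mode { ORD, ORD_VAL, ORD_DEM }

    static Mode pick(boolean isLargest, int segmentDocs, int largestDocs) {
        if (isLargest) return Mode.ORD;                 // no fallback needed
        if (segmentDocs * 2 >= largestDocs) return Mode.ORD_VAL; // fairly large
        return Mode.ORD_DEM;                            // somewhat smaller
    }
}
```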

> Change IndexSearcher multisegment searches to search each individual segment 
> using a single HitCollector
> 
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660322#action_12660322
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/2/09 6:24 AM:
----------------------------------------------------------------------

So what looks like a promising strategy?

Off the top I am thinking something as simple as:

start with ORD with no fallback on the largest.
if the next segments are fairly large, use ORD_VAL
if the segments get somewhat smaller, move to ORD_DEM

Oddly, I've seen VAL perform well in certain situations, so maybe it has its 
place, but I don't know where yet.

*edit*

Oh, yeah, queue size should also play a role in the switching.

  was (Author: markrmil...@gmail.com):
So what looks like a promising strategy?

Off the top I am thinking something as simple as:

start with ORD with no fallback on the largest.
if the next segments are fairly large, use ORD_VAL
if the segments get somewhat smaller, move to ORD_DEM

Oddly, I've seen VAL perform well in certain situations, so maybe it has its 
place, but I don't know where yet.
  
> Change IndexSearcher multisegment searches to search each individual segment 
> using a single HitCollector
> 
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.




[jira] Created: (LUCENE-1510) InstantiatedIndexReader throws NullPointerException in norms() when used with a MultiReader

2009-01-02 Thread Robert Newson (JIRA)
InstantiatedIndexReader throws NullPointerException in norms() when used with a 
MultiReader
--------------------------------------------------------------------------------------------

 Key: LUCENE-1510
 URL: https://issues.apache.org/jira/browse/LUCENE-1510
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 2.4
Reporter: Robert Newson



When using InstantiatedIndexReader under a MultiReader where the other Reader 
contains documents, a NullPointerException is thrown here:

public void norms(String field, byte[] bytes, int offset) throws IOException {
  byte[] norms = getIndex().getNormsByFieldNameAndDocumentNumber().get(field);
  System.arraycopy(norms, 0, bytes, offset, norms.length);
}

The 'norms' variable is null. Performing the copy only when norms is not null 
does work, though I'm sure it's not the right fix.
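The workaround the reporter describes, sketched as a self-contained helper (the Map stands in for getIndex().getNormsByFieldNameAndDocumentNumber(); this mirrors the workaround, not necessarily the proper fix):

```java
import java.util.Map;

// Sketch of the null-guard workaround: copy the norms only when the
// field actually has norms in this reader, avoiding the NPE when a
// MultiReader asks about a field this reader never indexed.
public class NormsGuard {
    static void copyNorms(Map<String, byte[]> normsByField,
                          String field, byte[] bytes, int offset) {
        byte[] norms = normsByField.get(field);
        if (norms != null) { // the guard that avoids the NullPointerException
            System.arraycopy(norms, 0, bytes, offset, norms.length);
        }
    }
}
```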

java.lang.NullPointerException
	at org.apache.lucene.store.instantiated.InstantiatedIndexReader.norms(InstantiatedIndexReader.java:297)
	at org.apache.lucene.index.MultiReader.norms(MultiReader.java:273)
	at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:70)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:112)
	at org.apache.lucene.search.Searcher.search(Searcher.java:136)
	at org.apache.lucene.search.Searcher.search(Searcher.java:146)
	at org.apache.lucene.store.instantiated.TestWithMultiReader.test(TestWithMultiReader.java:41)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:164)
	at junit.framework.TestCase.runBare(TestCase.java:130)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:120)
	at junit.framework.TestSuite.runTest(TestSuite.java:230)
	at junit.framework.TestSuite.run(TestSuite.java:225)
	at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)






[jira] Updated: (LUCENE-1510) InstantiatedIndexReader throws NullPointerException in norms() when used with a MultiReader

2009-01-02 Thread Robert Newson (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Newson updated LUCENE-1510:
----------------------------------

Attachment: TestWithMultiReader.java


Test case to demonstrate NPE.

> InstantiatedIndexReader throws NullPointerException in norms() when used with 
> a MultiReader
> ------------------------------------------------------------------------------
>
> Key: LUCENE-1510
> URL: https://issues.apache.org/jira/browse/LUCENE-1510
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.4
>Reporter: Robert Newson
> Attachments: TestWithMultiReader.java
>
>
> When using InstantiatedIndexReader under a MultiReader where the other Reader 
> contains documents, a NullPointerException is thrown here;
> public void norms(String field, byte[] bytes, int offset) throws IOException {
>   byte[] norms = getIndex().getNormsByFieldNameAndDocumentNumber().get(field);
>   System.arraycopy(norms, 0, bytes, offset, norms.length);
> }
> the 'norms' variable is null. Performing the copy only when norms is not null 
> does work, though I'm sure it's not the right fix.
> java.lang.NullPointerException
> 	at org.apache.lucene.store.instantiated.InstantiatedIndexReader.norms(InstantiatedIndexReader.java:297)
> 	at org.apache.lucene.index.MultiReader.norms(MultiReader.java:273)
> 	at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:70)
> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:112)
> 	at org.apache.lucene.search.Searcher.search(Searcher.java:136)
> 	at org.apache.lucene.search.Searcher.search(Searcher.java:146)
> 	at org.apache.lucene.store.instantiated.TestWithMultiReader.test(TestWithMultiReader.java:41)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at junit.framework.TestCase.runTest(TestCase.java:164)
> 	at junit.framework.TestCase.runBare(TestCase.java:130)
> 	at junit.framework.TestResult$1.protect(TestResult.java:106)
> 	at junit.framework.TestResult.runProtected(TestResult.java:124)
> 	at junit.framework.TestResult.run(TestResult.java:109)
> 	at junit.framework.TestCase.run(TestCase.java:120)
> 	at junit.framework.TestSuite.runTest(TestSuite.java:230)
> 	at junit.framework.TestSuite.run(TestSuite.java:225)
> 	at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
> 	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)




Too many open files

2009-01-02 Thread Nuno Seco

Hello.

I'm struggling with the following exception:

Exception in thread "Lucene Merge Thread #1037" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /home/plopes/aktwise/server-commons/data/lucenedata/_80c.tii (Too many open files)
	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:286)
Caused by: java.io.FileNotFoundException: /home/plopes/aktwise/server-commons/data/lucenedata/_80c.tii (Too many open files)



I have only one IndexWriter instantiated, and every time I update the 
index (add or remove a document), I commit the IndexWriter and run the 
following code to make the searcher aware of the new documents:


private synchronized void refreshSearcher() throws CorruptIndexException, IOException
{
  try
  {
    IndexReader reader = searcher.getIndexReader().reopen();
    searcher.close();

    searcher = new IndexSearcher(reader);

    if (reader != searcher.getIndexReader())
      searcher.getIndexReader().close();
  }
  catch (Exception e)
  {
    e.printStackTrace();
  }
}
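One common cause of leaked handles with this kind of code is that the old reader is never closed when reopen() returns a new instance (the comparison above runs against the just-created searcher's own reader, so it may never match). A self-contained sketch of the usual reopen idiom, using stand-in classes rather than the Lucene API, to show the missing close:

```java
// Stand-in classes (not Lucene): the counter simulates open file
// handles so the leak-free pattern can be demonstrated end to end.
public class ReopenPattern {
    static int openReaders = 0; // simulated open file handles

    static class Reader {
        boolean stale;                       // true once the index changed
        Reader() { openReaders++; }
        Reader reopen() { return stale ? new Reader() : this; }
        void close() { openReaders--; }
    }

    // Reopen and, crucially, close the OLD reader when a new one comes back.
    static Reader refresh(Reader current) {
        Reader reopened = current.reopen();
        if (reopened != current) {
            current.close(); // the step the code above may be missing
        }
        return reopened;
    }
}
```

With this pattern, the number of open readers stays constant across refreshes instead of growing until the process hits the OS file-descriptor limit.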


I have tried lowering the merge factor and the number of merge threads by 
executing the following:

((ConcurrentMergeScheduler) writer.getMergeScheduler()).setMaxThreadCount(1);
writer.setMergeFactor(3);


I am using Lucene 2.4.0 and there is only one thread manipulating the 
index.


Any help would be appreciated.

Nuno Seco







Re: Too many open files

2009-01-02 Thread Nuno Seco
I have just noticed that I subscribed to the wrong list. I meant to 
subscribe and email the Java User List.


Sorry
