[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654751#action_12654751
 ] 

Michael McCandless commented on LUCENE-1483:


Mark, did you intend to attach the patch here?

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level, and reloading the FieldCache on reopen can be much faster 
> because only changed segments need to be reloaded.
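A minimal sketch of the per-segment idea in plain Java, with no Lucene dependency: values are cached per segment and keyed by the segment object's identity, so after a reopen only the newly added segment is loaded. `Segment`, `getInts`, and `warm` are hypothetical stand-ins, not Lucene's FieldCache API.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene's FieldCache): sort values are cached per
 * segment, keyed by segment identity, so a reopen loads only new segments.
 */
public class PerSegmentCacheSketch {

  /** Hypothetical stand-in for a single-segment IndexReader. */
  static final class Segment {
    final String name;
    final int[] fieldValues; // pretend per-document sort-field values

    Segment(String name, int[] fieldValues) {
      this.name = name;
      this.fieldValues = fieldValues;
    }
  }

  // Identity-keyed: an unchanged segment is the same object after a reopen.
  private final Map<Segment, int[]> cache = new IdentityHashMap<>();
  int loads = 0; // number of segments actually loaded

  int[] getInts(Segment seg) {
    return cache.computeIfAbsent(seg, s -> {
      loads++;
      return s.fieldValues.clone(); // stands in for the expensive per-segment load
    });
  }

  /** Ensures every segment of a (re)opened index has cached values. */
  void warm(List<Segment> segments) {
    for (Segment s : segments) {
      getInts(s);
    }
  }

  public static void main(String[] args) {
    Segment a = new Segment("_a", new int[] {3, 1, 2});
    Segment b = new Segment("_b", new int[] {7, 5});
    PerSegmentCacheSketch cache = new PerSegmentCacheSketch();

    cache.warm(List.of(a, b));                                   // loads _a and _b
    cache.warm(List.of(a, b, new Segment("_c", new int[] {4}))); // only _c is new
    System.out.println("segments loaded: " + cache.loads);       // segments loaded: 3
  }
}
```

A single top-level cache, by contrast, would have to be rebuilt over every document in _a, _b, and _c after the reopen.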

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Mark Miller

Michael McCandless wrote:


I think it does make sense (it's well defined).  This is what the 
SubsearcherTopDocs.convertTopDoc method is doing (in the 
multisearcher.take2.patch on LUCENE-1471).


In fact, returning by document order is a particularly trivial sort, 
since you'd just have to concatenate the results coming out of the 
pqueues (ie you wouldn't need a 2nd pqueue).  In fact, any SortField[] 
that contains a SortField.FIELD_DOC could be truncated since that sort 
order is "total".  But these are minor optimizations which we 
shouldn't worry about for now...


Mike
Yeah, right again. I was just trying to get around what wasn't working and 
seemed like it should have worked without any effort from me.






[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1483:


Attachment: LUCENE-1483.patch

I had meant to attach a patch, but then a bunch of stuff wasn't working...

This is still a poor man's patch. I need to switch to using the expose 
subreaders patch. This also doesn't include the MultiSearcher sort patch yet, 
because when I tried the first one (2nd rev) everything broke. I'll work on 
integrating that later.

I think all tests pass except for the very last sort test.

Some cleanup is still needed, possibly including dropping the use of MultiSearcher itself.

Basically, it's still in a proof-of-concept stage.

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>



Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Michael McCandless


Mark Miller wrote:


Michael McCandless wrote:


Mark Miller wrote:


Mark Miller wrote:


Which new sort stuff are you referring to?  Is it LUCENE-1471?


Yes. The first thing I did was try to patch this in, but the sort  
tests failed. The order would be right, except that the two  
center docs would be reversed or something. No time to dig in, so  
I just switched to the trunk MultiSearcher, and all tests passed  
except for the two with the above issues.
Spoke too soon. It wasn't LUCENE-1471's fault; it was just hitting  
different aspects of an issue that's messed up with the old  
MultiSearcher as well.


OK.  If you're building on LUCENE-1471, make sure you start from  
the first patch.  It'd be good to factor that logic (2nd pqueue for  
merging) out so it can be reused b/w IndexSearcher & MultiSearcher.
I actually worked with the second. I'll take a look at the first  
instead. I'm sticking with using the MultiSearcher for the first  
patch; it can be worked out later if it speeds things up.


OK.  And, the first now has a 2nd iteration (factors  
ParallelMultiSearcher to do the merge sort too).


Does returning results in document id order even make sense with this,  
though? Did it make sense with MultiSearcher? They are pseudo ids  
(mapped), so it almost seems I can't support that properly; it would  
depend on the order of the readers.


I think it does make sense (it's well defined).  This is what the  
SubsearcherTopDocs.convertTopDoc method is doing (in the  
multisearcher.take2.patch on LUCENE-1471).


In fact, returning by document order is a particularly trivial sort,  
since you'd just have to concatenate the results coming out of the  
pqueues (ie you wouldn't need a 2nd pqueue).  In fact, any SortField[]  
that contains a SortField.FIELD_DOC could be truncated since that sort  
order is "total".  But these are minor optimizations which we  
shouldn't worry about for now...
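A minimal sketch of why doc-id order needs no second pqueue, assuming each sub-searcher's hits come back in ascending local doc id and sub-searchers are visited in the order of their `starts` offsets (global id = start + local id): the merged result is just a concatenation. `mergeByDocId` and the `starts` array are hypothetical, not the code in multisearcher.take2.patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: when hits are sorted by doc id, merging per-sub-searcher
 * results is a plain concatenation once local ids are mapped to global ids.
 */
public class DocOrderMergeSketch {

  /**
   * @param localHits per sub-searcher, local doc ids in ascending order
   * @param starts    starts[i] is the global id of sub-searcher i's doc 0;
   *                  assumed ascending, one entry per sub-searcher
   * @param topN      maximum number of hits to return
   */
  static List<Integer> mergeByDocId(List<int[]> localHits, int[] starts, int topN) {
    List<Integer> merged = new ArrayList<>();
    // Sub-searcher i's global ids all lie in [starts[i], starts[i+1]), so
    // visiting sub-searchers in order already yields ascending global ids.
    for (int i = 0; i < localHits.size() && merged.size() < topN; i++) {
      for (int localDoc : localHits.get(i)) {
        if (merged.size() == topN) {
          break;
        }
        merged.add(starts[i] + localDoc); // local -> global id mapping
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<int[]> hits = List.of(new int[] {1, 4}, new int[] {0, 2}, new int[] {3});
    int[] starts = {0, 10, 20}; // first two sub-searchers hold 10 docs each
    System.out.println(mergeByDocId(hits, starts, 4)); // [1, 4, 10, 12]
  }
}
```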


Mike




Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Michael McCandless


Mark Miller wrote:


Mark Miller wrote:

Mark Miller wrote:


Which new sort stuff are you referring to?  Is it LUCENE-1471?


Yes. The first thing I did was try to patch this in, but the sort  
tests failed. The order would be right, except that the two center  
docs would be reversed or something. No time to dig in, so I just  
switched to the trunk MultiSearcher, and all tests passed except for  
the two with the above issues.

Got the auto detection working though.
Bah, I didn't. It brought up an old bug I've seen before: if you use  
MultiSearcher and an index doesn't have the field, AUTO won't work.  
The advice I always got was not to use AUTO, but even Lucene uses it  
internally. I thought I had a workaround, but it didn't quite work. Not  
sure what to do about this one; I'll have to mull it and the ids  
issue over a bit, I suppose.



Hmm... I think we have to keep the AUTO -> true type resolution that  
MultiReader would do?  Ie, ask MultiReader for the TermEnum, not the  
first sub-reader, for resolving.


In fact we should factor out an explicit method to do this; it's  
currently in ExtendedFieldCache.autoCache.createValue.


As long as you do that resolving up front w/ the MultiReader, and pass  
only resolved SortField[] to each sub-reader, that should fix it?
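A hedged sketch of resolving AUTO once, up front, over the union of all sub-readers' terms instead of the first sub-reader's, so a sub-index that lacks the field no longer breaks detection; only the resolved type would then be passed down. The int, then float, then string guess is an assumption for illustration; the real logic lives in ExtendedFieldCache.autoCache.createValue and is not reproduced here. `SortType` and `resolveAuto` are hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch: resolve an AUTO sort type once, against the union of
 * all sub-readers' terms, before searching each sub-reader.
 */
public class AutoSortResolveSketch {

  enum SortType { INT, FLOAT, STRING }

  /**
   * @param termsPerSubReader each sub-reader's terms for the sort field
   *                          (empty if that index lacks the field)
   */
  static SortType resolveAuto(List<List<String>> termsPerSubReader) {
    // Merge all sub-readers' terms, like a top-level (MultiReader-style) term enum.
    TreeSet<String> allTerms = new TreeSet<>();
    for (List<String> terms : termsPerSubReader) {
      allTerms.addAll(terms);
    }
    if (allTerms.isEmpty()) {
      throw new IllegalStateException("no terms in field; cannot determine type");
    }
    String first = allTerms.first().trim();
    // Assumed heuristic: try int, then float, else fall back to string.
    try {
      Integer.parseInt(first);
      return SortType.INT;
    } catch (NumberFormatException notInt) {
      try {
        Float.parseFloat(first);
        return SortType.FLOAT;
      } catch (NumberFormatException notFloat) {
        return SortType.STRING;
      }
    }
  }

  public static void main(String[] args) {
    // The first sub-reader lacks the field; asking only it would fail.
    List<List<String>> subs = List.of(List.of(), List.of("42", "7"), List.of("13"));
    // This resolved type, not AUTO, is what each sub-reader would sort by.
    System.out.println(resolveAuto(subs)); // INT
  }
}
```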


Mike




[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2008-12-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1482:
---

Attachment: (was: LUCENE-1482.patch)

> Replace infoSteram by a logging framework (SLF4J)
> -
>
> Key: LUCENE-1482
> URL: https://issues.apache.org/jira/browse/LUCENE-1482
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 2.4.1, 2.9
>
> Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, 
> slf4j-nop-1.5.6.jar
>
>
> Lucene uses infoStream to output messages only in its indexing code. 
> For debugging purposes, when the search application runs on the customer 
> side, getting messages from other code flows, like search, query parsing, 
> analysis, etc., can be extremely useful.
> There are two main problems with infoStream today:
> 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
> other classes I need to either expose an API or propagate infoStream to all 
> classes (see for example DocumentsWriter, which receives its infoStream 
> instance from IndexWriter).
> 2. I can only turn debugging on or off for the entire code base.
> Introducing a logging framework allows each class to control its logging 
> independently and, more importantly, allows the application to turn on 
> logging for only specific areas in the code (e.g., org.apache.lucene.index.*).
> I've investigated SLF4J (Simple Logging Facade for Java), which is, 
> as its name states, a facade over different logging frameworks. As such, you 
> can include slf4j.jar in your application, and it picks up at deployment 
> time the actual logging framework you'd like to use. SLF4J comes with 
> adapters for Java logging, Log4j, and others. If you know your 
> application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in 
> your classpath, and your logging statements will use Java logging under 
> the covers.
> This makes the logging code very simple. For a class A, the logger is 
> instantiated like this:
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
> }
> and later used like this:
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
>   public void foo() {
>     // The guard avoids building the message when DEBUG is disabled
>     if (logger.isDebugEnabled()) {
>       logger.debug("message");
>     }
>   }
> }
> That's all!
> Checking isDebugEnabled is very quick, at least with the JDK14 adapter 
> (and I assume it is also fast with other logging frameworks).
> The important thing is that every class controls its own logger. Not all classes 
> have to output logging messages, and we can improve Lucene's logging 
> gradually, w/o changing the API, by adding more logging messages to 
> interesting classes.
> I will submit a patch shortly.
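To make the per-package control concrete: as I understand the slf4j-jdk14 binding, each SLF4J logger is backed by the java.util.logging logger of the same name, with debug mapped to FINE. Assuming that, this sketch uses only java.util.logging to enable debug output for org.apache.lucene.index while other packages stay at the JDK's default INFO level. `enableIndexDebugOnly` and `debugEnabledFor` are hypothetical helpers.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Sketch: a level set on a package logger is inherited by every class logger
 * below it, so one package can be switched to debug independently.
 */
public class PackageLoggingSketch {

  // Keep a strong reference: java.util.logging may otherwise garbage-collect
  // the package logger and lose the level configured on it.
  private static final Logger INDEX_PACKAGE = Logger.getLogger("org.apache.lucene.index");

  static void enableIndexDebugOnly() {
    // Assumed mapping: SLF4J debug corresponds to JUL FINE under slf4j-jdk14.
    INDEX_PACKAGE.setLevel(Level.FINE);
  }

  static boolean debugEnabledFor(String className) {
    // A class logger with no level of its own inherits from its nearest ancestor.
    return Logger.getLogger(className).isLoggable(Level.FINE);
  }

  public static void main(String[] args) {
    enableIndexDebugOnly();
    // Expected results assume the JDK's default root level of INFO.
    System.out.println(debugEnabledFor("org.apache.lucene.index.IndexWriter"));    // true
    System.out.println(debugEnabledFor("org.apache.lucene.search.IndexSearcher")); // false
  }
}
```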




[jira] Commented: (LUCENE-1471) Faster MultiSearcher.search merge docs

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654755#action_12654755
 ] 

Michael McCandless commented on LUCENE-1471:


Luke, it looks like the 2nd patch lost the necessary mods to 
FieldDocSortedHitQueue -- can you post a new patch that includes them?  Thanks.

> Faster MultiSearcher.search merge docs 
> ---
>
> Key: LUCENE-1471
> URL: https://issues.apache.org/jira/browse/LUCENE-1471
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1471.patch, multisearcher.patch, 
> multisearcher.take2.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> MultiSearcher.search places sorted search results from individual searchers 
> into a PriorityQueue.  This can be made more efficient by taking 
> advantage of the fact that the results returned are already sorted.  
> The proposed solution places each sub-searcher's results iterator into a custom 
> PriorityQueue that produces the sorted ScoreDocs.
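A sketch of the general technique (a k-way merge), using java.util.PriorityQueue rather than Lucene's own PriorityQueue class: the queue holds one cursor per sub-searcher, so it never grows beyond the number of sub-searchers, and each step pops the best current hit and re-inserts that cursor with its next hit. `Hit`, `Cursor`, and `merge` are hypothetical; this is not the attached patch's code, and a real sorted search would compare SortField values rather than score alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: k-way merge of per-searcher hit lists that are already sorted. */
public class SortedHitsMergeSketch {

  /** Hypothetical hit: a global doc id and a score. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }

    @Override
    public String toString() {
      return doc + ":" + score;
    }
  }

  /** Cursor into one sub-searcher's results, sorted by descending score. */
  private static final class Cursor {
    final List<Hit> hits;
    int pos = 0;

    Cursor(List<Hit> hits) {
      this.hits = hits;
    }

    Hit current() {
      return hits.get(pos);
    }
  }

  static List<Hit> merge(List<List<Hit>> perSearcher, int topN) {
    // Higher score first; ties broken by lower doc id.
    PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> {
      int c = Float.compare(b.current().score, a.current().score);
      return c != 0 ? c : Integer.compare(a.current().doc, b.current().doc);
    });
    for (List<Hit> hits : perSearcher) {
      if (!hits.isEmpty()) {
        pq.add(new Cursor(hits));
      }
    }
    List<Hit> out = new ArrayList<>();
    while (out.size() < topN && !pq.isEmpty()) {
      Cursor top = pq.poll(); // cursor whose current hit is best overall
      out.add(top.current());
      top.pos++;              // advance only after removal, keeping the heap valid
      if (top.pos < top.hits.size()) {
        pq.add(top);          // re-insert with its next hit
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Hit>> perSearcher = List.of(
        List.of(new Hit(0, 0.9f), new Hit(3, 0.2f)),
        List.of(new Hit(10, 0.8f), new Hit(11, 0.5f)));
    System.out.println(merge(perSearcher, 3)); // [0:0.9, 10:0.8, 11:0.5]
  }
}
```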




[jira] Commented: (LUCENE-1471) Faster MultiSearcher.search merge docs

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654791#action_12654791
 ] 

Mark Miller commented on LUCENE-1471:
-

Re: threads, something makes me think an approach more like the IndexWriter 
merge stuff would be better: a max of 3 (or n) threads, that type of thing. One 
thread per sub-searcher worries me.
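A sketch of the bounded alternative, using a fixed-size java.util.concurrent pool so that at most `maxThreads` threads run no matter how many sub-searchers there are; extra tasks simply wait in the pool's queue. The `Callable` tasks stand in for per-sub-searcher searches, and `searchAll` is a hypothetical helper, not ParallelMultiSearcher's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: search sub-searchers on a bounded pool instead of one thread each. */
public class BoundedParallelSearchSketch {

  /** Runs every per-sub-searcher task using at most maxThreads threads. */
  static <T> List<T> searchAll(List<Callable<T>> subSearches, int maxThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    try {
      // invokeAll queues tasks beyond maxThreads and waits for all of them.
      List<Future<T>> futures = pool.invokeAll(subSearches);
      List<T> results = new ArrayList<>();
      for (Future<T> f : futures) {
        results.add(f.get()); // results stay in sub-searcher order
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Callable<String>> tasks = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      final int sub = i;
      // Hypothetical stand-in for searching sub-searcher i.
      tasks.add(() -> "hits from sub-searcher " + sub);
    }
    List<String> results = searchAll(tasks, 3); // 10 sub-searchers, 3 threads
    System.out.println(results.size());         // 10
  }
}
```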

> Faster MultiSearcher.search merge docs 
> ---
>
> Key: LUCENE-1471
> URL: https://issues.apache.org/jira/browse/LUCENE-1471
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1471.patch, multisearcher.patch, 
> multisearcher.take2.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>



[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2008-12-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1482:
---

Attachment: (was: LUCENE-1482.patch)

> Replace infoSteram by a logging framework (SLF4J)
> -
>
> Key: LUCENE-1482
> URL: https://issues.apache.org/jira/browse/LUCENE-1482
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 2.4.1, 2.9
>
> Attachments: slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar
>
>



[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2008-12-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1482:
---

Attachment: LUCENE-1482.patch

Forgot to clean up some code in tests which made use of JDK logging.

> Replace infoSteram by a logging framework (SLF4J)
> -
>
> Key: LUCENE-1482
> URL: https://issues.apache.org/jira/browse/LUCENE-1482
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 2.4.1, 2.9
>
> Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, 
> slf4j-nop-1.5.6.jar
>
>



[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654798#action_12654798
 ] 

Mark Miller commented on LUCENE-1483:
-

Quick micro-benchmark: I ran it twice, and both times it came out 17% slower. 
Hopefully we'll get a lot of that back with the new MultiSearcher sort work and 
maybe some optimizations.


{code}
OLD
 [java] > Report Sum By (any) Name (12 about 42 out of 43)
 [java] Operation                round  2  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
 [java] Rounds                       0  1   50       1     2020012  11,532.6      175.16   49,688,960  200,736,768
 [java] Run_4                        0  1   50       1     2020012  11,532.6      175.16   49,688,960  200,736,768
 [java] Populate                     -  -    -       4          50  21,446.3       93.26   81,397,296  156,942,336
 [java] CreateIndex                  -  -    -       4           0       0.0        0.23   16,492,194  112,984,064
 [java] MAddDocs_50                  -  -    -       4          50  28,656.2       69.79   83,686,552  153,223,168
 [java] Optimize                     -  -    -       4           0       0.0       23.22  101,362,928  156,942,336
 [java] CloseIndex                   -  -    -       4           0       0.0        0.00   81,397,296  156,942,336
 [java] TestSortSpeed                -  -    -       4        5003     246.0       81.35   98,312,320  157,941,760
 [java] OpenReader                   -  -    -       4           1     266.7        0.01   81,397,296  156,942,336
 [java] LoadFieldCacheAndSearch      -  -    -       4           1       6.2        0.64   90,550,496  156,942,336
 [java] SearchWithSort_5000          -  -    -       4        5000     247.9       80.69  101,017,720  157,941,760
 [java] CloseReader                  -  -    -       4           1   4,000.0        0.00   95,036,504  157,941,760
 [java] 
 [java] ###  D O N E !!! ###
 [java] 

NEW
 [java] > Report Sum By (any) Name (12 about 42 out of 43)
 [java] Operation                round  2  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
 [java] Rounds                       0  1   50       1     2020012  10,445.5      193.38  125,468,912  208,535,552
 [java] Run_4                        0  1   50       1     2020012  10,445.5      193.38  125,468,912  208,535,552
 [java] Populate                     -  -    -       4          50  20,650.1       96.85   84,097,072  162,316,288
 [java] CreateIndex                  -  -    -       4           0       0.0        0.12   16,564,602  116,604,928
 [java] MAddDocs_50                  -  -    -       4          50  28,772.4       69.51   87,705,952  159,956,992
 [java] Optimize                     -  -    -       4           0       0.0       27.20   99,096,816  162,316,288
 [java] CloseIndex                   -  -    -       4           0       0.0        0.00   84,097,072  162,316,288
 [java] TestSortSpeed                -  -    -       4        5003     208.5       95.99   98,749,480  164,020,224
 [java] OpenReader                   -  -    -       4           1     222.2        0.02   84,097,072  162,316,288
 [java] LoadFieldCacheAndSearch      -  -    -       4           1       5.0        0.81   90,882,496  163,725,312
 [java] SearchWithSort_5000          -  -    -       4        5000     210.2       95.17   95,207,336  164,020,224
 [java] CloseReader                  -  -    -       4           1   4,000.0        0.00   93,868,880  163,905,536
 [java] 
 [java] ###  D O N E !!! ###
 [java] 

{code}
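As a rough check on the 17% figure, the SearchWithSort_5000 rows of the two reports give the ratio of elapsed seconds and of searches per second (new over old):

```latex
\frac{t_{\text{new}}}{t_{\text{old}}} = \frac{95.17\,\text{s}}{80.69\,\text{s}} \approx 1.18,
\qquad
\frac{r_{\text{new}}}{r_{\text{old}}} = \frac{210.2\ \text{rec/s}}{247.9\ \text{rec/s}} \approx 0.85
```

So the new code takes roughly 18% longer and delivers roughly 15% fewer searches per second; the quoted 17% sits between the two ways of measuring it.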

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>



Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Mark Miller

Michael McCandless wrote:


Mark Miller wrote:


Mark Miller wrote:

Mark Miller wrote:


Which new sort stuff are you referring to?  Is it LUCENE-1471?


Yes. The first thing I did was try to patch this in, but the sort 
tests failed. The order would be right, except that the two center 
docs would be reversed or something. No time to dig in, so I just 
switched to the trunk MultiSearcher, and all tests passed except for 
the two with the above issues.

Got the auto detection working though.
Bah, I didn't. It brought up an old bug I've seen before: if you use 
MultiSearcher and an index doesn't have the field, AUTO won't work. 
The advice I always got was not to use AUTO, but even Lucene uses it 
internally. I thought I had a workaround, but it didn't quite work. Not 
sure what to do about this one; I'll have to mull it and the ids 
issue over a bit, I suppose.



Hmm... I think we have to keep the AUTO -> true type resolution that 
MultiReader would do?  Ie, ask MultiReader for the TermEnum, not the 
first sub-reader, for resolving.


In fact we should factor out an explicit method to do this; it's 
currently in ExtendedFieldCache.autoCache.createValue.


As long as you do that resolving up front w/ the MultiReader, and pass 
only resolved SortField[] to each sub-reader, that should fix it?


Mike
You're right. I get caught up in trying to hack it into working quickly 
before I do it right.


- Mark




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654798#action_12654798
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 5:41 AM:
--

Quick micro-benchmark: I ran it twice, and both times it came out 17% slower. 
Hopefully we'll get a lot of that back with the new MultiSearcher sort work and 
maybe some optimizations.

{panel:title=OLD}
{noformat}
 [java] > Report Sum By (any) Name (12 about 42 out of 43)
 [java] Operation                round  2  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
 [java] Rounds                       0  1   50       1     2020012  11,532.6      175.16   49,688,960  200,736,768
 [java] Run_4                        0  1   50       1     2020012  11,532.6      175.16   49,688,960  200,736,768
 [java] Populate                     -  -    -       4          50  21,446.3       93.26   81,397,296  156,942,336
 [java] CreateIndex                  -  -    -       4           0       0.0        0.23   16,492,194  112,984,064
 [java] MAddDocs_50                  -  -    -       4          50  28,656.2       69.79   83,686,552  153,223,168
 [java] Optimize                     -  -    -       4           0       0.0       23.22  101,362,928  156,942,336
 [java] CloseIndex                   -  -    -       4           0       0.0        0.00   81,397,296  156,942,336
 [java] TestSortSpeed                -  -    -       4        5003     246.0       81.35   98,312,320  157,941,760
 [java] OpenReader                   -  -    -       4           1     266.7        0.01   81,397,296  156,942,336
 [java] LoadFieldCacheAndSearch      -  -    -       4           1       6.2        0.64   90,550,496  156,942,336
 [java] SearchWithSort_5000          -  -    -       4        5000     247.9       80.69  101,017,720  157,941,760
 [java] CloseReader                  -  -    -       4           1   4,000.0        0.00   95,036,504  157,941,760
 [java] 
 [java] ###  D O N E !!! ###
 [java] 
{noformat}
{panel}
{panel:title=NEW}
{noformat}
 [java] > Report Sum By (any) Name (12 about 42 out of 43)
 [java] Operation                round  2  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
 [java] Rounds                       0  1   50       1     2020012  10,445.5      193.38  125,468,912  208,535,552
 [java] Run_4                        0  1   50       1     2020012  10,445.5      193.38  125,468,912  208,535,552
 [java] Populate                     -  -    -       4          50  20,650.1       96.85   84,097,072  162,316,288
 [java] CreateIndex                  -  -    -       4           0       0.0        0.12   16,564,602  116,604,928
 [java] MAddDocs_50                  -  -    -       4          50  28,772.4       69.51   87,705,952  159,956,992
 [java] Optimize                     -  -    -       4           0       0.0       27.20   99,096,816  162,316,288
 [java] CloseIndex                   -  -    -       4           0       0.0        0.00   84,097,072  162,316,288
 [java] TestSortSpeed                -  -    -       4        5003     208.5       95.99   98,749,480  164,020,224
 [java] OpenReader                   -  -    -       4           1     222.2        0.02   84,097,072  162,316,288
 [java] LoadFieldCacheAndSearch      -  -    -       4           1       5.0        0.81   90,882,496  163,725,312
 [java] SearchWithSort_5000          -  -    -       4        5000     210.2       95.17   95,207,336  164,020,224
 [java] CloseReader                  -  -    -       4           1   4,000.0        0.00   93,868,880  163,905,536
 [java] 
 [java] ###  D O N E !!! ###
 [java] 

{noformat}
{panel}


[jira] Updated: (LUCENE-1471) Faster MultiSearcher.search merge docs

2008-12-09 Thread Luke Nezda (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Nezda updated LUCENE-1471:
---

Attachment: multisearcher.take3.patch

Doh.  Sorry Michael, I reverted my local changes and tested this patch :).

I agree, Mark: an unbounded number of threads is a little worrisome.

> Faster MultiSearcher.search merge docs 
> ---
>
> Key: LUCENE-1471
> URL: https://issues.apache.org/jira/browse/LUCENE-1471
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1471.patch, multisearcher.patch, 
> multisearcher.take2.patch, multisearcher.take3.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>



[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2008-12-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1482:
---

Attachment: slf4j-nop-1.5.6.jar
slf4j-api-1.5.6.jar
LUCENE-1482.patch

Thanks Doug,

I've replaced the JDK14 jar with the NOP jar and deleted the logging test I 
added (since NOP does not log anything).

> Replace infoSteram by a logging framework (SLF4J)
> -
>
> Key: LUCENE-1482
> URL: https://issues.apache.org/jira/browse/LUCENE-1482
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 2.4.1, 2.9
>
> Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, 
> slf4j-nop-1.5.6.jar
>
>
> Lucene makes use of infoStream to output messages in its indexing code only. 
> For debugging purposes, when the search application is run on the customer 
> side, getting messages from other code flows, like search, query parsing, 
> analysis, etc., can be extremely useful.
> There are two main problems with infoStream today:
> 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
> other classes I need to either expose an API or propagate infoStream to all 
> classes (see for example DocumentsWriter, which receives its infoStream 
> instance from IndexWriter).
> 2. I can only turn debugging on or off for the entire code base.
> Introducing a logging framework can allow each class to control its logging 
> independently and, more importantly, allow the application to turn on 
> logging for only specific areas of the code (e.g., org.apache.lucene.index.*).
> I've investigated SLF4J (Simple Logging Facade for Java) which is, 
> as its name states, a facade over different logging frameworks. As such, you 
> can include slf4j.jar in your application, and the actual logging framework 
> is chosen at deploy time by the binding jar you put next to it. SLF4J comes with 
> several adapters for Java logging, Log4j and others. If you know your 
> application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar into 
> your classpath, and your logging statements will use Java logging under 
> the covers.
> This makes the logging code very simple. For a class A the logger will be 
> instantiated like this:
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
> }
> And will later be used like this:
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
>   public void foo() {
>     if (logger.isDebugEnabled()) {
>       logger.debug("message");
>     }
>   }
> }
> That's all!
> Checking isDebugEnabled is very quick, at least with the JDK14 adapter 
> (and I assume it's fast with other logging frameworks too).
> The important thing is that every class controls its own logger. Not all classes 
> have to output logging messages, and we can improve Lucene's logging 
> gradually, without changing the API, by adding more logging messages to 
> interesting classes.
> I will submit a patch shortly.
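
The per-package control described above comes from the underlying framework's logger hierarchy. The sketch below uses plain java.util.logging, the framework the slf4j-jdk14 binding delegates to. As far as I know, that binding maps SLF4J debug() to Level.FINE. `PerPackageLogging` and `CollectingHandler` are hypothetical names, and the logger names are just dotted strings chosen to mirror Lucene packages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class PerPackageLogging {

    /** Handler that keeps published messages in memory so the effect can be checked. */
    static final class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord r) {
            if (isLoggable(r)) {
                messages.add(r.getLoggerName() + ": " + r.getMessage());
            }
        }

        @Override public void flush() {}
        @Override public void close() {}
    }

    static Logger configure(String packageName, Level level, Handler handler) {
        Logger pkg = Logger.getLogger(packageName);
        pkg.setLevel(level);
        pkg.addHandler(handler);
        pkg.setUseParentHandlers(false); // keep output away from the root console handler
        return pkg;
    }

    static List<String> run() {
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(Level.ALL);
        // Debug-level (FINE) output only for the index package; search stays at INFO.
        Logger index = configure("org.apache.lucene.index", Level.FINE, handler);
        Logger search = configure("org.apache.lucene.search", Level.INFO, handler);

        // Per-class loggers inherit the level of their nearest configured ancestor.
        Logger writer = Logger.getLogger("org.apache.lucene.index.IndexWriter");
        Logger searcher = Logger.getLogger("org.apache.lucene.search.IndexSearcher");

        if (writer.isLoggable(Level.FINE)) {
            writer.fine("flushing segment");  // published via the index package's handler
        }
        if (searcher.isLoggable(Level.FINE)) {
            searcher.fine("scoring query");   // skipped: effective level is INFO
        }
        index.removeHandler(handler);
        search.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [org.apache.lucene.index.IndexWriter: flushing segment]
    }
}
```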




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654805#action_12654805
 ] 

Marvin Humphrey commented on LUCENE-1483:
-

> Quick micro bench - did it twice and both times came out 17% slower.

I'd guess that all the OO construction/destruction costs in this part of your 
patch are slowing things down.

{code}
+Searchable[] searchers = new Searchable[readers.length];
+for(int i = 0; i < readers.length; i++) {
+  searchers[i] = new IndexSearcher(readers[i]);
+}
+
+MultiSearcher multiSearcher = new MultiSearcher(searchers);
+return multiSearcher.search(weight, filter, nDocs, sort);
{code}
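
If that guess is right, one mitigation is to build the sub-searcher array once per reader and reuse it across searches, rather than rebuilding it on every sorted search. The sketch below uses hypothetical stand-in classes (`SubSearcher`, `ReusingSearcher`). It illustrates only the hoisting and does not use Lucene's IndexSearcher/MultiSearcher API.

```java
import java.util.Arrays;
import java.util.List;

class SubSearcherReuse {

    /** Hypothetical stand-in for a per-segment IndexSearcher; counts constructions. */
    static final class SubSearcher {
        static int constructed = 0;
        final String segment;

        SubSearcher(String segment) {
            this.segment = segment;
            constructed++;
        }

        int search(String query) {
            return (segment + query).length(); // dummy work standing in for a real search
        }
    }

    /** Builds the sub-searchers once for a fixed set of segments and reuses them. */
    static final class ReusingSearcher {
        private final SubSearcher[] subs;

        ReusingSearcher(List<String> segments) {
            subs = new SubSearcher[segments.size()];
            for (int i = 0; i < subs.length; i++) {
                subs[i] = new SubSearcher(segments.get(i));
            }
        }

        int search(String query) {
            int total = 0;
            for (SubSearcher s : subs) {
                total += s.search(query);
            }
            return total;
        }
    }

    public static void main(String[] args) {
        ReusingSearcher searcher = new ReusingSearcher(Arrays.asList("_0", "_1", "_2"));
        for (int i = 0; i < 1000; i++) {
            searcher.search("q");
        }
        System.out.println(SubSearcher.constructed); // 3, not 3000
    }
}
```

Later comments in this thread attribute the first measured slowdown to the benchmark setup rather than to construction cost, so treat this as an option, not a diagnosis.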

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller
Great, because that's probably the main optimization spot we have. I also  
made things a bit difficult with a merge factor of 50. I'll try 10  
later.


- Mark


On Dec 9, 2008, at 9:20 AM, "Marvin Humphrey (JIRA)" <[EMAIL PROTECTED]>  
wrote:




>    [ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654805#action_12654805 ]
>
> Marvin Humphrey commented on LUCENE-1483:
> -
>
>> Quick micro bench - did it twice and both times came out 17% slower.
>
> I'd guess that all the OO construction/destruction costs in this
> part of your patch are slowing things down.
>
> {code}
> +Searchable[] searchers = new Searchable[readers.length];
> +for(int i = 0; i < readers.length; i++) {
> +  searchers[i] = new IndexSearcher(readers[i]);
> +}
> +
> +MultiSearcher multiSearcher = new MultiSearcher(searchers);
> +return multiSearcher.search(weight, filter, nDocs, sort);
> {code}
>
>> Change IndexSearcher to use MultiSearcher semantics for sorted searches
>> ---
>>
>>    Key: LUCENE-1483
>>    URL: https://issues.apache.org/jira/browse/LUCENE-1483
>>    Project: Lucene - Java
>> Issue Type: Improvement
>>   Affects Versions: 2.9
>>   Reporter: Mark Miller
>>   Priority: Minor
>>    Attachments: LUCENE-1483.patch
>>
>> Here is a quick test patch. FieldCache for sorting is done at the
>> individual IndexReader level and reloading the fieldcache on reopen
>> can be much faster as only changed segments need to be reloaded.








Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Yonik Seeley
On Tue, Dec 9, 2008 at 9:23 AM, Mark Miller <[EMAIL PROTECTED]> wrote:
> Great, because that's probably the main optimization spot we have. I also made
> things a bit difficult with a merge factor of 50. I'll try 10 later.

It's useful to report the number of segments in the index too.  Even
with high merge factors, you can get lucky and have very few segments.

-Yonik




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654812#action_12654812
 ] 

Mark Miller commented on LUCENE-1483:
-

I'll be sure to include that info with the next set of results. 

I don't think those results represent getting lucky though: it's 4 rounds and 2 
runs with the same results (17% both runs). Nothing scientific, I just did it 
quickly to get a baseline feel for the slowdown before the patch is finished up.

Here is the alg I used:

{noformat}

merge.factor=mrg:50
compound=false

sort.rng=2:1:2:1

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=10

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
    { "Run"
        ResetSystemErase

        { "Populate"
            -CreateIndex
            { "MAddDocs" AddDoc(100) > : 50
            -Optimize
            -CloseIndex
        }

        { "TestSortSpeed"
            OpenReader
            { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1
            { "SearchWithSort" SearchWithSort(sort_field) > : 5000
            CloseReader
        }

        NewRound
    } : 4

}

RepSumByName

{noformat}

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654812#action_12654812
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 6:55 AM:
--

I'll be sure to include that info with the next set of results. 

I don't think those results represent getting lucky though: it's 4 rounds and 2 
runs with the same results (17% both runs). Nothing scientific, I just did it 
quickly to get a baseline feel for the slowdown before the patch is finished up.

Here is the alg I used:

{noformat}

merge.factor=mrg:50
compound=false

sort.rng=2:1:2:1

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=10

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
    { "Run"
        ResetSystemErase

        { "Populate"
            -CreateIndex
            { "MAddDocs" AddDoc(100) > : 50
            -CloseIndex
        }

        { "TestSortSpeed"
            OpenReader
            { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1
            { "SearchWithSort" SearchWithSort(sort_field) > : 5000
            CloseReader
        }

        NewRound
    } : 4

}

RepSumByName

{noformat}

  was (Author: [EMAIL PROTECTED]):
I'll be sure to include that info with the next set of results. 

I don't think those results represent getting lucky though: it's 4 rounds and 2 
runs with the same results (17% both runs). Nothing scientific, I just did it 
quickly to get a baseline feel for the slowdown before the patch is finished up.

Here is the alg I used:

{noformat}

merge.factor=mrg:50
compound=false

sort.rng=2:1:2:1

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=10

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
    { "Run"
        ResetSystemErase

        { "Populate"
            -CreateIndex
            { "MAddDocs" AddDoc(100) > : 50
            -Optimize
            -CloseIndex
        }

        { "TestSortSpeed"
            OpenReader
            { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1
            { "SearchWithSort" SearchWithSort(sort_field) > : 5000
            CloseReader
        }

        NewRound
    } : 4

}

RepSumByName

{noformat}
  
> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654812#action_12654812
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 7:00 AM:
--

I'll be sure to include that info with the next set of results. 

I don't think those results represent getting lucky though: it's 4 rounds and 2 
runs with the same results (17% both runs). Nothing scientific, I just did it 
quickly to get a baseline feel for the slowdown before the patch is finished up.

*EDIT* Just as I forgot to take the optimize out of the sort alg when I 
pasted it here, it looks like I missed it for the benches as well. Disregard those 
numbers.

Here is the alg I used:

{noformat}

merge.factor=mrg:50
compound=false

sort.rng=2:1:2:1

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=10

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
    { "Run"
        ResetSystemErase

        { "Populate"
            -CreateIndex
            { "MAddDocs" AddDoc(100) > : 50
            -CloseIndex
        }

        { "TestSortSpeed"
            OpenReader
            { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1
            { "SearchWithSort" SearchWithSort(sort_field) > : 5000
            CloseReader
        }

        NewRound
    } : 4

}

RepSumByName

{noformat}

  was (Author: [EMAIL PROTECTED]):
I'll be sure to include that info with the next set of results. 

I don't think those results represent getting lucky though: it's 4 rounds and 2 
runs with the same results (17% both runs). Nothing scientific, I just did it 
quickly to get a baseline feel for the slowdown before the patch is finished up.

Here is the alg I used:

{noformat}

merge.factor=mrg:50
compound=false

sort.rng=2:1:2:1

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=10

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
    { "Run"
        ResetSystemErase

        { "Populate"
            -CreateIndex
            { "MAddDocs" AddDoc(100) > : 50
            -CloseIndex
        }

        { "TestSortSpeed"
            OpenReader
            { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1
            { "SearchWithSort" SearchWithSort(sort_field) > : 5000
            CloseReader
        }

        NewRound
    } : 4

}

RepSumByName

{noformat}
  
> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1483:


Comment: was deleted

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654820#action_12654820
 ] 

Michael McCandless commented on LUCENE-831:
---

Marvin, does KS/Lucy have something like FieldCache?  If so, what API do you 
use?  Is it iterator-only?

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to the new API.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654822#action_12654822
 ] 

Mark Miller commented on LUCENE-1483:
-

Okay, I straightened things out, and now it looks like there is possibly no loss (for 
few segments, anyway). When I last looked at the index, it had only 6 segments. I've got to 
put real time into all this later though; I've only been able to give it some 
background time this morning.

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654828#action_12654828
 ] 

Yonik Seeley commented on LUCENE-1483:
--

bq. Okay, I straightened things out, and now it looks like possibly no loss

So if there was a 17% loss on the optimized index, and very little loss on a 
segmented index, I assume that means matching/scoring is so much slower on 
the segmented index that the loss in sorting performance matters less?
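
The reasoning in this question can be made concrete with a simple additive cost model. This model is an assumption for illustration and was not measured in this thread. Let T_match be the per-query matching/scoring time, T_sort the sorting/collection time, and ΔT_sort the extra sorting cost the patch adds. The observed relative slowdown is then

```latex
\[
  r \;=\; \frac{\Delta T_{\text{sort}}}{T_{\text{match}} + T_{\text{sort}}}
\]
```

For a fixed ΔT_sort, r shrinks as T_match grows. That is consistent with a smaller measured loss on a segmented index, provided matching there is slower.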


> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654834#action_12654834
 ] 

Mark Miller commented on LUCENE-1483:
-

Ignore those first results entirely. It turns out I had the latest 1471 patched 
in. That shouldn't slow down a single segment though: I thought neither this 
patch nor 1471 should slow things down, because they only affect multi-segment and 
multi-index searches. Odd, but I junked all of that and started 
fresh, did the tests a little more carefully, and see the numbers looking the 
same. I didn't want to get too deep into benchmarking before the patch is sorted out a bit more. 
I'll try to find enough time to be more rigorous later though. My free moments 
are under heavy attack by the female that appears to have made herself at home 
in my house.

As a side note, 1471 doesn't work with this patch in a couple of ways: it throws 
both a NullPointerException and a ClassCastException in different 
circumstances.

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654839#action_12654839
 ] 

Michael McCandless commented on LUCENE-1483:


I think there should be very little impact to performance, for single or multi 
segment indices, for the search itself against a warmed reader.  (And actually 
LUCENE-1471 should make things a wee bit faster, especially if n,m are 
largeish, though this will typically be in the noise).

But warming after reopen should be much faster with this patch (we should try 
to measure that too).
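
Warming after reopen should get cheaper because cached sort values are keyed per segment. Segments that survive a reopen are the same objects, so only new segments are loaded. The sketch below assumes hypothetical `Segment` and `PerSegmentCache` classes and is not Lucene's FieldCache API. The weak map reflects my understanding of how FieldCache ties entries to reader lifetime, so it should be read as an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

class PerSegmentCache {

    /** Hypothetical immutable segment: a name plus the raw values of one int sort field. */
    static final class Segment {
        final String name;
        final int[] fieldValues;

        Segment(String name, int... fieldValues) {
            this.name = name;
            this.fieldValues = fieldValues;
        }
        // No equals/hashCode override: object identity is the cache key.
    }

    // Weak keys let entries for segments dropped by merges be collected along with them.
    private final Map<Segment, int[]> cache = new WeakHashMap<>();
    int loads = 0;

    /** Returns the per-segment array, computing it only the first time a segment is seen. */
    int[] values(Segment seg) {
        return cache.computeIfAbsent(seg, s -> {
            loads++;
            return s.fieldValues.clone(); // stands in for the expensive load from the index
        });
    }

    /** Warms every segment of a freshly (re)opened reader. */
    void warm(List<Segment> segments) {
        for (Segment s : segments) {
            values(s);
        }
    }

    public static void main(String[] args) {
        PerSegmentCache cache = new PerSegmentCache();
        Segment s0 = new Segment("_0", 5, 3, 9);
        Segment s1 = new Segment("_1", 7, 1);
        cache.warm(Arrays.asList(s0, s1));     // initial open: both segments loaded
        Segment s2 = new Segment("_2", 4);     // small segment flushed before the reopen
        cache.warm(Arrays.asList(s0, s1, s2)); // reopen: only _2 is loaded
        System.out.println(cache.loads);       // 3
    }
}
```

A single cache keyed on the top-level reader would instead reload every segment's values after each reopen.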

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654842#action_12654842
 ] 

Mark Miller commented on LUCENE-1483:
-

bq. I think there should be very little impact to performance, for single or 
multi segment indices, for the search itself against a warmed reader. (And 
actually LUCENE-1471 should make things a wee bit faster, especially if n,m are 
largeish, though this will typically be in the noise). 

That seems to be in line with what I got with 6 segments. I'm running some 30-50 
seg range tests on my other laptop now.

bq. But warming after reopen should be much faster with this patch (we should 
try to measure that too).

I've got a base alg for that type of thing around somewhere too, from 831. It 
should be about the same, which means pretty dramatic reopen improvements if 
you have multiple segments, especially if the new segment is small. It's likely to be 
small in comparison to all of the segments anyway, which means pretty great 
improvements. 

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-09 Thread Grant Ingersoll
See http://lucene.markmail.org/message/fu34tuomnqejchfj?q=RemoteSearchable 
 for just such a proposal


On Dec 8, 2008, at 1:52 PM, Doug Cutting (JIRA) wrote:



   [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513 ]


Doug Cutting commented on LUCENE-1473:
--

Would it take any more lines of code to remove Serializable from  
the core classes and re-implement RemoteSearchable in a separate  
layer on top of the core APIs?  That layer could be a contrib module  
and could get all the Externalizable love it needs.  It could  
support a specific popular subset of query and filter classes,  
rather than arbitrary Query implementations.  It would be  
extensible, so that if folks wanted to support new kinds of queries,  
they easily could.  This other approach seems like a slippery slope,  
complicating already complex code with new concerns.  It would be  
better to encapsulate these concerns in a layer atop APIs whose  
back-compatibility we already make promises about, no?
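
A separate, extensible layer like this could look roughly like a registry of per-type codecs. Only registered query types can be sent, and supporting a new type means registering one more codec rather than touching core classes. The sketch below rests on that assumption. `RemoteQueryCodecs`, `Q`, `TermQ` and `Codec` are hypothetical and do not correspond to existing Lucene or contrib classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class RemoteQueryCodecs {

    /** Hypothetical query model standing in for Lucene's Query classes. */
    interface Q {}

    static final class TermQ implements Q {
        final String field, text;
        TermQ(String field, String text) { this.field = field; this.text = text; }
    }

    /** One codec per supported query type. */
    interface Codec<T extends Q> {
        void write(T q, DataOutput out) throws IOException;
        T read(DataInput in) throws IOException;
    }

    private final Map<String, Codec<? extends Q>> byName = new HashMap<>();
    private final Map<Class<?>, String> nameByClass = new HashMap<>();

    <T extends Q> void register(String name, Class<T> type, Codec<T> codec) {
        byName.put(name, codec);
        nameByClass.put(type, name);
    }

    @SuppressWarnings("unchecked")
    byte[] encode(Q q) throws IOException {
        String name = nameByClass.get(q.getClass());
        if (name == null) {
            throw new IllegalArgumentException("unsupported query type: " + q.getClass());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(name);                          // type tag
        ((Codec<Q>) byName.get(name)).write(q, out); // type-specific body
        out.flush();
        return bytes.toByteArray();
    }

    Q decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String name = in.readUTF();
        Codec<? extends Q> codec = byName.get(name);
        if (codec == null) {
            throw new IOException("unknown query type: " + name);
        }
        return codec.read(in);
    }

    /** A registry that supports just one query type; others are added the same way. */
    static RemoteQueryCodecs withTermQuery() {
        RemoteQueryCodecs codecs = new RemoteQueryCodecs();
        codecs.register("term", TermQ.class, new Codec<TermQ>() {
            public void write(TermQ q, DataOutput out) throws IOException {
                out.writeUTF(q.field);
                out.writeUTF(q.text);
            }
            public TermQ read(DataInput in) throws IOException {
                return new TermQ(in.readUTF(), in.readUTF()); // arguments read left to right
            }
        });
        return codecs;
    }

    public static void main(String[] args) throws IOException {
        RemoteQueryCodecs codecs = withTermQuery();
        TermQ back = (TermQ) codecs.decode(codecs.encode(new TermQ("body", "lucene")));
        System.out.println(back.field + ":" + back.text); // body:lucene
    }
}
```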



Implement standard Serialization across Lucene versions
---

   Key: LUCENE-1473
   URL: https://issues.apache.org/jira/browse/LUCENE-1473
   Project: Lucene - Java
Issue Type: Bug
Components: Search
  Affects Versions: 2.4
  Reporter: Jason Rutherglen
  Priority: Minor
   Attachments: custom-externalizable-reader.patch,  
LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch,  
LUCENE-1473.patch


 Original Estimate: 8h
Remaining Estimate: 8h

To maintain serialization compatibility between Lucene versions,  
serialVersionUID needs to be added to classes that implement  
java.io.Serializable.  java.io.Externalizable may be implemented in  
classes for faster performance.
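
The two mechanisms in this description can be combined. A fixed serialVersionUID keeps the class descriptor compatible across releases, and Externalizable lets the class write its own compact form instead of relying on default field-by-field serialization. The sketch below uses a hypothetical `FieldTerm` class, not Lucene's Term.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableExample {

    /** Hypothetical value class. */
    public static final class FieldTerm implements Externalizable {
        // A pinned UID keeps the stream's class descriptor matching after compatible
        // changes, instead of a computed default that changes with the class shape.
        private static final long serialVersionUID = 1L;

        private String field;
        private String text;

        public FieldTerm() {} // Externalizable requires a public no-arg constructor

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(field); // explicit, compact wire format
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            field = in.readUTF();
            text = in.readUTF();
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    static FieldTerm roundTrip(FieldTerm t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (FieldTerm) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new FieldTerm("body", "lucene"))); // body:lucene
    }
}
```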





--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2008-12-09 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654875#action_12654875
 ] 

Doug Cutting commented on LUCENE-1482:
--

safeDebugMsg is protected in a public class, which means it will appear in the 
javadoc, which it should not.  Also, logging the thread ID should be done by 
the logging system, not by Lucene.  So that method should just be removed, no?
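
With java.util.logging, for example, the framework records the calling thread's ID in every LogRecord. A formatter can print it, and the logging call sites need nothing extra. The sketch below uses a hypothetical `ThreadIdFormatter`. `LogRecord.getThreadID()` is deprecated in favor of `getLongThreadID()` on Java 16 and later, but it still works.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class ThreadIdFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        // The framework filled in the thread ID when the record was created,
        // so the code that called the logger did not have to.
        return "[thread " + record.getThreadID() + "] " + record.getLevel() + " "
            + record.getLoggerName() + ": " + formatMessage(record) + System.lineSeparator();
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("org.apache.lucene.index.IndexWriter");
        ConsoleHandler console = new ConsoleHandler(); // writes to System.err
        console.setFormatter(new ThreadIdFormatter());
        logger.addHandler(console);
        logger.setUseParentHandlers(false);
        logger.info("flushing segment");
    }
}
```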

Also, you've added braces to all of the log statements.  This is in conformance 
with our style guidelines, but I prefer that logging add a minimum of vertical 
space, so that more real logic is visible at once.  I suggest you not make this 
style change in this patch, but propose it separately, if at all.


> Replace infoSteram by a logging framework (SLF4J)
> -
>
> Key: LUCENE-1482
> URL: https://issues.apache.org/jira/browse/LUCENE-1482
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 2.4.1, 2.9
>
> Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, 
> slf4j-nop-1.5.6.jar
>
>
> Lucene makes use of infoStream to output messages in its indexing code only. 
> For debugging purposes, when the search application is run on the customer 
> side, getting messages from other code flows, like search, query parsing, 
> analysis, etc., can be extremely useful.
> There are two main problems with infoStream today:
> 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
> other classes I need to either expose an API or propagate infoStream to all 
> classes (see for example DocumentsWriter, which receives its infoStream 
> instance from IndexWriter).
> 2. I can only turn debugging on or off for the entire code base.
> Introducing a logging framework can allow each class to control its logging 
> independently and, more importantly, allow the application to turn on 
> logging for only specific areas of the code (e.g., org.apache.lucene.index.*).
> I've investigated SLF4J (Simple Logging Facade for Java) which is, 
> as its name states, a facade over different logging frameworks. As such, you 
> can include slf4j.jar in your application, and the actual logging framework 
> is chosen at deploy time by the binding jar you put next to it. SLF4J comes with 
> several adapters for Java logging, Log4j and others. If you know your 
> application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar into 
> your classpath, and your logging statements will use Java logging under 
> the covers.
> This makes the logging code very simple. For a class A the logger will be 
> instantiated like this:
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
> }
> It will later be used like this:
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
>   public void foo() {
>     if (logger.isDebugEnabled()) {
>       logger.debug("message");
>     }
>   }
> }
> That's all!
> Checking isDebugEnabled is very quick, at least with the JDK14 adapter 
> (and I assume it is also fast with other logging frameworks).
> The important thing is that every class controls its own logger. Not all 
> classes have to output logging messages, and we can improve Lucene's logging 
> gradually, w/o changing the API, by adding more logging messages to 
> interesting classes.
> I will submit a patch shortly
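The per-package control described above can be tried without extra jars using java.util.logging, the JDK logging that the slf4j-jdk14 binding delegates to. This is a minimal sketch, not Lucene code: the logger names are hypothetical and only mimic Lucene's package layout. A level set on a parent logger is inherited by child loggers that have no level of their own.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: per-package logging control with java.util.logging.
// Logger names are hypothetical; they only mimic Lucene's package layout.
final class PerPackageLogging {
  // Hold a strong reference to the package logger: LogManager keeps loggers
  // weakly, so a level set on an unreferenced logger could be lost after GC.
  static final Logger INDEX_PACKAGE = Logger.getLogger("org.apache.lucene.index");

  static final Logger WRITER = Logger.getLogger("org.apache.lucene.index.IndexWriter");
  static final Logger SEARCHER = Logger.getLogger("org.apache.lucene.search.IndexSearcher");

  // Turn on debug-level (FINE) logging for the index package only.
  // Child loggers without an explicit level inherit the nearest ancestor's level.
  static void enableIndexDebugging() {
    INDEX_PACKAGE.setLevel(Level.FINE);
  }

  // Guarded call, the java.util.logging analogue of isDebugEnabled():
  // the message string is only built when FINE is enabled for this logger.
  static void logFlush(int numDocs) {
    if (WRITER.isLoggable(Level.FINE)) {
      WRITER.fine("flushing segment with " + numDocs + " docs");
    }
  }
}
```

Note that the default ConsoleHandler only publishes INFO and above, so FINE records also need a handler configured at FINE before they actually appear.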

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654895#action_12654895
 ] 

Mark Miller commented on LUCENE-1483:
-

I'll bench again after this issue is polished up, but it looks like at 100 
segments I am seeing the 20% drop. I didn't see any drop at 6 segments in a 
retest.

I'll do some longer, more carefully planned benchmarks when the patch is in 
better shape.

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Updated: (LUCENE-1314) IndexReader.clone

2008-12-09 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1314:
-

Summary: IndexReader.clone  (was: IndexReader.reopen(boolean force))

> IndexReader.clone
> -
>
> Key: LUCENE-1314
> URL: https://issues.apache.org/jira/browse/LUCENE-1314
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Affects Versions: 2.3.1
>Reporter: Jason Rutherglen
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
> lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
> lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
> lucene-1314.patch
>
>
> Based on discussion 
> http://www.nabble.com/IndexReader.reopen-issue-td18070256.html.  The problem 
> is that reopen returns the same reader if there are no changes, so if docs are 
> deleted from the new reader, they are also deleted in the previous reader, 
> which is not always the desired behavior.
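The aliasing problem can be shown with a toy model. This is a hypothetical sketch, not Lucene's IndexReader: reopenUnchanged() models reopen() on an unchanged index (the same instance comes back), and cloneReader() models the proposed clone, which gives the caller its own copy of the deletions.

```java
import java.util.BitSet;

// Hypothetical toy reader illustrating reopen() aliasing vs. clone().
final class ToyReader {
  private final BitSet deleted;

  ToyReader() { this(new BitSet()); }

  private ToyReader(BitSet deleted) { this.deleted = deleted; }

  // Models reopen() when the index has not changed: the same instance is returned,
  // so the caller shares deletions with every other holder of this reader.
  ToyReader reopenUnchanged() { return this; }

  // Models clone(): a new instance with its own copy of the current deletions.
  ToyReader cloneReader() { return new ToyReader((BitSet) deleted.clone()); }

  void deleteDocument(int doc) { deleted.set(doc); }

  boolean isDeleted(int doc) { return deleted.get(doc); }
}
```

Copying the deletions eagerly is the simplest choice; a real implementation might instead share them and copy only on the first delete (copy-on-write).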




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654922#action_12654922
 ] 

Michael McCandless commented on LUCENE-1483:



Hmmm.

OK I think I see what could explain this: insertion into the pqueue is
fairly costly.  So, because we now make 100 pqueues, each gathering
top N results, we are paying much more insertion cost overall than the
single queue that IndexSearcher(MultiReader) uses.

So how about still doing the searches per-sub-reader(searcher),
but, make a HitCollector that gathers the results into a single
pqueue, passing that HitCollector to each sub-searcher?

If that turns out OK, then I think it would make LUCENE-1471 moot
because we should similarly change MultiSearcher to use a single
shared pqueue.
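A single pqueue shared across sub-readers could look roughly like the sketch below. The class and method names are hypothetical (this is not Lucene's HitCollector API), and java.util.PriorityQueue stands in for Lucene's own PriorityQueue. The collector is told each sub-reader's starting doc id (docBase) before that sub-reader is searched, so hits from all segments compete for the same N slots instead of each segment filling its own queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit type: doc is the global doc id (docBase + segment doc id).
final class ScoredDoc {
  final int doc;
  final float score;
  ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
}

// Hypothetical collector: one bounded min-heap shared by all sub-readers.
final class SharedTopNCollector {
  // "Weakest first": lower score first; on equal scores the larger doc id is weaker.
  private static final Comparator<ScoredDoc> WEAKEST_FIRST = (a, b) -> {
    int c = Float.compare(a.score, b.score);
    return c != 0 ? c : Integer.compare(b.doc, a.doc);
  };

  private final int n;
  private final PriorityQueue<ScoredDoc> heap;
  private int docBase; // first global doc id of the sub-reader being searched

  SharedTopNCollector(int n) {
    this.n = n;
    this.heap = new PriorityQueue<>(Math.max(1, n), WEAKEST_FIRST);
  }

  // Called before searching each sub-reader.
  void setDocBase(int docBase) { this.docBase = docBase; }

  // Called for every matching doc in the current sub-reader.
  void collect(int segmentDoc, float score) {
    ScoredDoc hit = new ScoredDoc(docBase + segmentDoc, score);
    if (heap.size() < n) {
      heap.add(hit);
    } else if (n > 0 && WEAKEST_FIRST.compare(hit, heap.peek()) > 0) {
      heap.poll(); // evict the current weakest hit
      heap.add(hit);
    }
  }

  // Best hits first.
  List<ScoredDoc> topDocs() {
    List<ScoredDoc> out = new ArrayList<>(heap);
    out.sort(WEAKEST_FIRST.reversed());
    return out;
  }
}
```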

Actually I think this approach should be a bit faster, because there
is some very small method call overhead to how MultiReader implements
TermDocs/Positions by "concatenating" its sub-readers.  So by pushing
Searcher down onto each SegmentReader we should gain a bit, but it
could very well be in the noise.  For this reason we may in fact want
to do this same thing for the "normal" (sort by relevance)
IndexSearcher.search.

I wish I had thought of this sooner.  Sorry for the runaround, Mark!


> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Created: (LUCENE-1484) Remove SegmentReader.document synchronization

2008-12-09 Thread Jason Rutherglen (JIRA)
Remove SegmentReader.document synchronization
-

 Key: LUCENE-1484
 URL: https://issues.apache.org/jira/browse/LUCENE-1484
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4
Reporter: Jason Rutherglen


This is probably the last synchronization issue in Lucene: the document 
method in SegmentReader.  It can be avoided by using a ThreadLocal for 
FieldsReader.  
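A minimal sketch of the ThreadLocal idea, using hypothetical stand-in classes rather than Lucene's FieldsReader or SegmentReader: each thread lazily gets its own reader instance, so document() needs no synchronized block.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a non-thread-safe stored-fields reader: it keeps
// mutable per-read state (like a file position), so it cannot be shared freely.
final class ToyFieldsReader {
  private int position;

  String doc(int n) {
    position = n; // mutable state that would race without synchronization
    return "doc-" + position;
  }
}

// Hypothetical segment reader: one ToyFieldsReader per thread, no locking.
final class ToySegmentReader {
  final AtomicInteger readersCreated = new AtomicInteger();

  private final ThreadLocal<ToyFieldsReader> fieldsReader =
      ThreadLocal.withInitial(() -> {
        readersCreated.incrementAndGet();
        return new ToyFieldsReader();
      });

  // Not synchronized: each calling thread works on its own ToyFieldsReader.
  String document(int n) {
    return fieldsReader.get().doc(n);
  }
}
```

One trade-off: per-thread copies are only reclaimed when the thread ends, remove() is called, or the ThreadLocal itself becomes unreachable and its stale entries are cleaned up, so closing the segment reader does not by itself close them.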




[jira] Commented: (LUCENE-1475) Expose sub-IndexReaders from MultiReader or MultiSegmentReader

2008-12-09 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654923#action_12654923
 ] 

Jason Rutherglen commented on LUCENE-1475:
--

RE:
"It should return an empty array, not null when there are no sub readers."

It should return null because there are no results.  An empty array almost 
implies that the SegmentReader can contain other readers, or that they may show 
up in the future.  IMO the API is garbage anyway because it should be using an 
interface like the JDK classes do.

MM:
"What should be returned if a Multi*Reader has embedded Multi*Readers as 
sub-readers?"

I don't like this approach, and the comments sound like over-engineering a 
simple solution.  If the user wants all the sub-readers of sub-readers, they 
need to write that code externally to Lucene.  Otherwise it is not easy to know 
what the sub-readers are for the given reader.  

> Expose sub-IndexReaders from MultiReader or MultiSegmentReader
> --
>
> Key: LUCENE-1475
> URL: https://issues.apache.org/jira/browse/LUCENE-1475
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-1475.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> MultiReader and MultiSegmentReader are package protected and do not expose 
> the underlying sub-IndexReaders.  A way to expose the sub-readers is to have 
> an interface that an IndexReader may be cast to that exposes the underlying 
> readers.  
> This is for realtime indexing.




[jira] Commented: (LUCENE-1475) Expose sub-IndexReaders from MultiReader or MultiSegmentReader

2008-12-09 Thread robert engels (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654924#action_12654924
 ] 

robert engels commented on LUCENE-1475:
---

That is not correct. By returning a non-null array, it is trivial to get an 
ordered list of all subreaders using simple recursion.

It does not need to be an interface... there is no reason for one if a new 
method is added to IndexReader (and the implementations are changed).
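With the non-null convention, collecting every leaf reader in order is a short recursion. The reader types and the getSubReaders() name below are hypothetical, for illustration only; leaves return an empty array rather than null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader types for illustration; getSubReaders() is an assumed name.
abstract class SketchReader {
  // Convention under discussion: never null; leaves return an empty array.
  abstract SketchReader[] getSubReaders();
}

final class SketchLeafReader extends SketchReader {
  final String name;
  SketchLeafReader(String name) { this.name = name; }
  @Override SketchReader[] getSubReaders() { return new SketchReader[0]; }
}

final class SketchCompositeReader extends SketchReader {
  private final SketchReader[] subs;
  SketchCompositeReader(SketchReader... subs) { this.subs = subs.clone(); }
  @Override SketchReader[] getSubReaders() { return subs.clone(); }
}

final class LeafGatherer {
  // Depth-first, left to right, so leaves come out in sub-reader order.
  static void gatherLeaves(SketchReader reader, List<SketchReader> out) {
    SketchReader[] subs = reader.getSubReaders();
    if (subs.length == 0) { // a leaf: no null check needed
      out.add(reader);
      return;
    }
    for (SketchReader sub : subs) {
      gatherLeaves(sub, out);
    }
  }
}
```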

> Expose sub-IndexReaders from MultiReader or MultiSegmentReader
> --
>
> Key: LUCENE-1475
> URL: https://issues.apache.org/jira/browse/LUCENE-1475
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-1475.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> MultiReader and MultiSegmentReader are package protected and do not expose 
> the underlying sub-IndexReaders.  A way to expose the sub-readers is to have 
> an interface that an IndexReader may be cast to that exposes the underlying 
> readers.  
> This is for realtime indexing.




[jira] Created: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader

2008-12-09 Thread Jason Rutherglen (JIRA)
Use OpenBitSet instead of BitVector in SegmentReader


 Key: LUCENE-1485
 URL: https://issues.apache.org/jira/browse/LUCENE-1485
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4
Reporter: Jason Rutherglen
Priority: Minor


I tried out BitVector.get vs. OpenBitSet.get; here are the results (in 
milliseconds, after running 25 times), which are about the same.  Presumably 
implementing DocIdSetIterator in SegmentTermDocs will speed things up more.

bit set size: 10,485,760
set bits count: 524,032
openbitset: 68
bitvector: 89

OpenBitSet took about 24% less time.

I will implement a patch that adds the WriteableBitSet interface and makes a 
subclass of OpenBitSet that is writeable to disk.  We're working on an isSparse 
method for OpenBitSet.  
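For reference, the two layouts being compared differ mainly in word size. The classes below are simplified, hypothetical re-implementations for illustration, not Lucene's BitVector or OpenBitSet: one packs 8 bits per byte, the other 64 bits per long, and get() is a shift, a mask, and an array load in both.

```java
// Simplified, hypothetical bit sets (not Lucene's classes) with the same get() semantics.
final class ByteBits { // BitVector-style layout: 8 bits per byte
  private final byte[] bits;
  ByteBits(int size) { bits = new byte[(size + 7) >>> 3]; }
  void set(int bit) { bits[bit >> 3] |= (byte) (1 << (bit & 7)); }
  boolean get(int bit) { return (bits[bit >> 3] & (1 << (bit & 7))) != 0; }
}

final class LongBits { // OpenBitSet-style layout: 64 bits per long
  private final long[] bits;
  LongBits(int size) { bits = new long[(size + 63) >>> 6]; }
  void set(int bit) { bits[bit >> 6] |= 1L << (bit & 63); }
  boolean get(int bit) { return (bits[bit >> 6] & (1L << (bit & 63))) != 0; }
}

final class BitsScan {
  // Calls get() on every index and returns the count, so the loop has an
  // observable result and cannot simply be optimized away.
  static int countSetBits(ByteBits b, int size) {
    int count = 0;
    for (int i = 0; i < size; i++) if (b.get(i)) count++;
    return count;
  }

  static int countSetBits(LongBits b, int size) {
    int count = 0;
    for (int i = 0; i < size; i++) if (b.get(i)) count++;
    return count;
  }
}
```

Timing countSetBits with System.nanoTime gives only a rough picture; micro-benchmark results like these are sensitive to JIT warm-up and JVM mode.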




[jira] Updated: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader

2008-12-09 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1485:
-

Description: 
I tried out BitVector.get vs. OpenBitSet.get; here are the results (in 
milliseconds, after running 25 times), which are about the same.  Presumably 
implementing DocIdSetIterator in SegmentTermDocs will speed things up more.

bit set size: 10,485,760
set bits count: 524,032
openbitset: 68
bitvector: 89

OpenBitSet took about 24% less time.

I will implement a patch that adds the WriteableBitSet interface and makes a 
subclass of OpenBitSet that is writeable to disk.  We're working on an isSparse 
method for OpenBitSet.  

  was:
Tried out BitVector.get vs OpenBitSet.get here's the results which are about 
the same after running 25 times in milliseconds.  It is assumed that 
implementing DocIdSetIterator for in SegmentTermDocs will speed things up more.

bit set size: 10,485,760
set bits count: 524,032
openbitset: 68
bitvector: 89

24% speed increase.

I will implement a patch that adds the WriteableBitSet interface and make a 
subclass of OpenBitSet that is writeable to disk.  We're working on an isSparse 
method for OpenBitSet.  


> Use OpenBitSet instead of BitVector in SegmentReader
> 
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Tried out BitVector.get vs OpenBitSet.get here's the results which are about 
> the same after running 25 times in milliseconds.  It is assumed that 
> implementing DocIdSetIterator in SegmentTermDocs will speed things up more.
> bit set size: 10,485,760
> set bits count: 524,032
> openbitset: 68
> bitvector: 89
> 24% speed increase.
> I will implement a patch that adds the WriteableBitSet interface and make a 
> subclass of OpenBitSet that is writeable to disk.  We're working on an 
> isSparse method for OpenBitSet.  




[jira] Updated: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader

2008-12-09 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1485:
-

Attachment: TestDeletedDocsSpeed.java

TestDeletedDocsSpeed.java

Executes get on BitVector and OpenBitSet; fastGet is called on OpenBitSet.  

> Use OpenBitSet instead of BitVector in SegmentReader
> 
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: TestDeletedDocsSpeed.java
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Tried out BitVector.get vs OpenBitSet.get here's the results which are about 
> the same after running 25 times in milliseconds.  It is assumed that 
> implementing DocIdSetIterator in SegmentTermDocs will speed things up more.
> bit set size: 10,485,760
> set bits count: 524,032
> openbitset: 68
> bitvector: 89
> 24% speed increase.
> I will implement a patch that adds the WriteableBitSet interface and make a 
> subclass of OpenBitSet that is writeable to disk.  We're working on an 
> isSparse method for OpenBitSet.  




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654940#action_12654940
 ] 

Doug Cutting commented on LUCENE-1483:
--

> make a HitCollector that gathers the results into a single pqueue

That's good when everything's local, but bad when things are distributed.  If 
we move RemoteSearchable to contrib (as discussed in LUCENE-1314) then this may 
not be a problem, but we might still leave hooks so that someone can write a 
search that uses a separate top-queue per remote segment.
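For the distributed case, the alternative is for each remote segment (or shard) to keep its own top-N queue and for the caller to merge the already-sorted lists at the end. A minimal sketch of that merge step, with hypothetical types: a heap holds one cursor per shard list, so producing the global top N costs roughly O(N log S) for S shards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit from one shard's own top-N queue.
final class ShardHit {
  final int shard;
  final int doc;
  final float score;
  ShardHit(int shard, int doc, float score) { this.shard = shard; this.doc = doc; this.score = score; }
}

final class TopNMerger {
  // Best first: higher score, then lower shard index, then lower doc id.
  static final Comparator<ShardHit> BEST_FIRST =
      Comparator.comparingDouble((ShardHit h) -> h.score).reversed()
          .thenComparingInt(h -> h.shard)
          .thenComparingInt(h -> h.doc);

  // Each inner list must already be sorted BEST_FIRST (the output of that shard's queue).
  static List<ShardHit> merge(List<List<ShardHit>> perShard, int n) {
    // Heap entry = {shard list index, position in that list}; one cursor per shard.
    PriorityQueue<int[]> cursors = new PriorityQueue<>(
        Math.max(1, perShard.size()),
        (a, b) -> BEST_FIRST.compare(perShard.get(a[0]).get(a[1]), perShard.get(b[0]).get(b[1])));
    for (int s = 0; s < perShard.size(); s++) {
      if (!perShard.get(s).isEmpty()) cursors.add(new int[] {s, 0});
    }
    List<ShardHit> merged = new ArrayList<>();
    while (merged.size() < n && !cursors.isEmpty()) {
      int[] best = cursors.poll();
      merged.add(perShard.get(best[0]).get(best[1]));
      if (best[1] + 1 < perShard.get(best[0]).size()) {
        cursors.add(new int[] {best[0], best[1] + 1}); // advance that shard's cursor
      }
    }
    return merged;
  }
}
```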

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654962#action_12654962
 ] 

Michael McCandless commented on LUCENE-1483:


{quote}
>> make a HitCollector that gathers the results into a single pqueue
>
> That's good when everything's local, but bad when things are distributed. If 
> we move RemoteSearchable to contrib (as discussed in LUCENE-1314) then this 
> may not be a problem, but we might still leave hooks so that someone can 
> write a search that uses a separate top-queue per remote segment.
{quote}

Good point; so this means we can't blindly apply this optimization to 
MultiSearcher (without an option to use separate queues, merged at the end).  
But for IndexSearcher(Multi*Reader).search it should be safe?

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654970#action_12654970
 ] 

Doug Cutting commented on LUCENE-1483:
--

> But for IndexSearcher(Multi*Reader).search it should be safe?

Right.  Perhaps this is a reason to encourage folks to use MultiReader instead 
of MultiSearcher.  Are there cases, other than distributed, where MultiSearcher 
is required?  If not, perhaps it could be moved to the contrib/distributed 
layer too.

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




Bounds checking in BitVector

2008-12-09 Thread Jason Rutherglen
I ran another test on the speed of BitVector vs. OpenBitSet.  Unless the
DocIdSetIterator is faster for OpenBitSet than an equivalent for BitVector, BV
is faster when its bounds checking is removed.  I'm trying to figure out a
good way to allow a modified version of BitVector that does not do bounds
checking.  BitVector is final, but perhaps making it non-final and creating
a FastBitVector that SegmentReader instantiates based on a system property
(is there an alternative?) is one way to go.
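One way to read the proposal, as a hypothetical sketch: a small interface with a checked and an unchecked implementation, chosen from a system property. The interface, class names, and property name are made up for illustration; since BitVector itself is final, the suggestion above is to make it non-final instead, which would avoid the extra interface call.

```java
// Hypothetical names throughout; the property name is made up for illustration.
interface DeletedDocsBits {
  boolean get(int doc);
}

// Byte-per-8-bits layout with an explicit range check against the logical size.
final class CheckedBits implements DeletedDocsBits {
  private final byte[] bits;
  private final int size;
  CheckedBits(byte[] bits, int size) { this.bits = bits; this.size = size; }
  @Override public boolean get(int doc) {
    if (doc < 0 || doc >= size) {
      throw new IndexOutOfBoundsException("doc " + doc + " out of range [0, " + size + ")");
    }
    return (bits[doc >> 3] & (1 << (doc & 7))) != 0;
  }
}

// Same layout without the explicit check; the JVM still throws if doc falls
// outside the backing array, but not for doc ids in [size, 8 * bits.length).
final class UncheckedBits implements DeletedDocsBits {
  private final byte[] bits;
  UncheckedBits(byte[] bits) { this.bits = bits; }
  @Override public boolean get(int doc) {
    return (bits[doc >> 3] & (1 << (doc & 7))) != 0;
  }
}

final class DeletedDocsBitsFactory {
  static final String FAST_PROPERTY = "example.deletedDocs.unchecked"; // hypothetical

  // Reads the property on every call here; a real reader would likely read it once.
  static DeletedDocsBits create(byte[] bits, int size) {
    return Boolean.getBoolean(FAST_PROPERTY) ? new UncheckedBits(bits) : new CheckedBits(bits, size);
  }
}
```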


[jira] Commented: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader

2008-12-09 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654994#action_12654994
 ] 

Jason Rutherglen commented on LUCENE-1485:
--

The above test used the -client JVM option on Mac OS X.  With -server, the 
numbers are almost the same for OpenBitSet and BitVector, with BitVector 
slightly faster.  

> Use OpenBitSet instead of BitVector in SegmentReader
> 
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: TestDeletedDocsSpeed.java
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Tried out BitVector.get vs OpenBitSet.get here's the results which are about 
> the same after running 25 times in milliseconds.  It is assumed that 
> implementing DocIdSetIterator in SegmentTermDocs will speed things up more.
> bit set size: 10,485,760
> set bits count: 524,032
> openbitset: 68
> bitvector: 89
> 24% speed increase.
> I will implement a patch that adds the WriteableBitSet interface and make a 
> subclass of OpenBitSet that is writeable to disk.  We're working on an 
> isSparse method for OpenBitSet.  




[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2008-12-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655002#action_12655002
 ] 

Uwe Schindler commented on LUCENE-1478:
---

Just a note:
For the FieldCache it is also important that the parser is a singleton or 
implements hashCode() and equals(). If not, each call to sort with another 
SortField using a different parser instance (but of the same class) would 
create a new FieldCache entry. This is why I said that SortComparators and 
Parsers should generally be made static final class variables (and thus 
singletons), as I have done in TrieUtils. That way you can be sure that all 
SortFields hit the same cache entry when looking up using FieldCacheImpl.Entry. 
Implementing hashCode and equals for parsers is the other option, but it does 
not really make sense (as long as parsers and comparators have no 
instance-specific state).
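The effect is easy to see with a cache keyed on (field, parser), which is how the comment describes the FieldCacheImpl.Entry lookup. This is a hypothetical toy, not FieldCacheImpl: a parser class that does not override equals() gets a separate cache entry for every new instance, while a static final singleton always hits the same entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical parser interface, standing in for a FieldCache parser.
interface ToyLongParser {
  long parseLong(String value);
}

// Hypothetical cache keyed on (field, parser), like the Entry lookup described.
final class ToyFieldCache {
  private static final class Key {
    final String field;
    final Object parser;
    Key(String field, Object parser) { this.field = field; this.parser = parser; }
    @Override public boolean equals(Object o) {
      if (!(o instanceof Key)) return false;
      Key other = (Key) o;
      // Uses the parser's equals(): identity unless the parser class overrides it.
      return field.equals(other.field) && Objects.equals(parser, other.parser);
    }
    @Override public int hashCode() { return Objects.hash(field, parser); }
  }

  private final Map<Key, long[]> cache = new HashMap<>();

  long[] getLongs(String field, ToyLongParser parser, String[] rawValues) {
    return cache.computeIfAbsent(new Key(field, parser), k -> {
      long[] values = new long[rawValues.length];
      for (int i = 0; i < rawValues.length; i++) values[i] = parser.parseLong(rawValues[i]);
      return values;
    });
  }

  int entryCount() { return cache.size(); }
}

// Stateless parser with a shared singleton, as the comment recommends.
final class PlainLongParser implements ToyLongParser {
  static final PlainLongParser INSTANCE = new PlainLongParser();
  @Override public long parseLong(String value) { return Long.parseLong(value); }
}
```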

> Missing possibility to supply custom FieldParser when sorting search results
> 
>
> Key: LUCENE-1478
> URL: https://issues.apache.org/jira/browse/LUCENE-1478
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.4
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: LUCENE-1478-cleanup.patch, 
> LUCENE-1478-no-superinterface.patch, LUCENE-1478.patch, LUCENE-1478.patch, 
> LUCENE-1478.patch, LUCENE-1478.patch, LUCENE-1478.patch
>
>
> When implementing the new TrieRangeQuery for contrib (LUCENE-1470), I was 
> confronted by the problem that the special trie-encoded values (which are 
> longs in a special encoding) cannot be sorted by Searcher.search() and 
> SortField. The problem is: If you use SortField.LONG, you get 
> NumberFormatExceptions. The trie encoded values may be sorted using 
> SortField.String (as the encoding is in such a way, that they are sortable as 
> Strings), but this is very memory ineffective.
> ExtendedFieldCache gives the possibility to specify a custom LongParser when 
> retrieving the cached values. But you cannot use this during searching, 
> because there is no possibility to supply this custom LongParser to the 
> SortField.
> I propose a change in the sort classes:
> Include a pointer to the parser instance to be used in SortField (if not 
> given use the default). My idea is to create a SortField using a new 
> constructor
> {code}SortField(String field, int type, Object parser, boolean reverse){code}
> The parser is "object" because all current parsers have no super-interface. 
> The ideal solution would be to have:
> {code}SortField(String field, int type, FieldCache.Parser parser, boolean 
> reverse){code}
> and FieldCache.Parser is a super-interface (just empty, more like a 
> marker-interface) of all other parsers (like LongParser...). The sort 
> implementation then must be changed to respect the given parser (if not 
> NULL), else use the default FieldCache.get without parser.
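A minimal sketch of the proposed marker-interface design. All names are hypothetical stand-ins rather than Lucene's FieldCache or SortField classes: an empty Parser super-interface lets one constructor accept any parser, the constructor can reject a parser that does not match the declared type, and sorting falls back to default parsing when no parser is given.

```java
// Hypothetical stand-ins for the proposed API (not Lucene's classes).
interface Parser {} // empty marker super-interface

interface LongValueParser extends Parser { long parseLong(String value); }

interface IntValueParser extends Parser { int parseInt(String value); }

final class SortFieldSketch {
  enum Type { INT, LONG }

  final String field;
  final Type type;
  final Parser parser; // null means "use the default parser"
  final boolean reverse;

  SortFieldSketch(String field, Type type, Parser parser, boolean reverse) {
    // Fail early if the parser cannot produce values of the declared type.
    if (parser != null) {
      boolean matches = (type == Type.LONG && parser instanceof LongValueParser)
          || (type == Type.INT && parser instanceof IntValueParser);
      if (!matches) {
        throw new IllegalArgumentException("parser does not match sort type " + type);
      }
    }
    this.field = field;
    this.type = type;
    this.parser = parser;
    this.reverse = reverse;
  }

  // What the sort code would do for a LONG field: respect the given parser,
  // else fall back to default parsing.
  long parseLongValue(String raw) {
    if (type != Type.LONG) throw new IllegalStateException("not a LONG sort field");
    return parser != null ? ((LongValueParser) parser).parseLong(raw) : Long.parseLong(raw);
  }
}
```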




[jira] Issue Comment Edited: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results

2008-12-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655002#action_12655002
 ] 

thetaphi edited comment on LUCENE-1478 at 12/9/08 2:55 PM:


Just a note:
For the FieldCache it is also important that the parser is a singleton or 
implements hashCode() and equals(). If not, each call to sort with another 
SortField using a different parser instance (but of the same class) would 
create a new FieldCache entry. This is why I said that SortComparators and 
Parsers should generally be made static final members (and thus singletons), 
as I have done in TrieUtils. That way you can be sure that all SortFields hit 
the same cache entry when looking up using FieldCacheImpl.Entry. Implementing 
hashCode and equals for parsers is the other option, but it does not really 
make sense (as long as parsers and comparators have no instance-specific 
state).

  was (Author: thetaphi):
Just a note:
For the FieldCache it is also important, that the parser is a singleton or 
implements hashCode() and equals(). If not, each call to sort with another 
SortField using a different parser instance (but from same class) would create 
a new FieldCache. This is why I said, that SortComparators and Parsers should 
generally be made static final class variables (and so singletons) like I have 
done it in TrieUtils. With that you can be sure, that all SortFields hit the 
same cache entry when looking up using FieldCacheImpl.Entry. The use of 
hashCode and equals for parsers is the other variant, but it does not make 
really sense (as long as parsers and comparators do not have an 
instance-specific state).
  
> Missing possibility to supply custom FieldParser when sorting search results
> 
>
> Key: LUCENE-1478
> URL: https://issues.apache.org/jira/browse/LUCENE-1478
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.4
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: LUCENE-1478-cleanup.patch, 
> LUCENE-1478-no-superinterface.patch, LUCENE-1478.patch, LUCENE-1478.patch, 
> LUCENE-1478.patch, LUCENE-1478.patch, LUCENE-1478.patch
>
>
> When implementing the new TrieRangeQuery for contrib (LUCENE-1470), I was 
> confronted by the problem that the special trie-encoded values (which are 
> longs in a special encoding) cannot be sorted by Searcher.search() and 
> SortField. The problem is: If you use SortField.LONG, you get 
> NumberFormatExceptions. The trie encoded values may be sorted using 
> SortField.String (as the encoding is in such a way, that they are sortable as 
> Strings), but this is very memory ineffective.
> ExtendedFieldCache gives the possibility to specify a custom LongParser when 
> retrieving the cached values. But you cannot use this during searching, 
> because there is no possibility to supply this custom LongParser to the 
> SortField.
> I propose a change in the sort classes:
> Include a pointer to the parser instance to be used in SortField (if not 
> given use the default). My idea is to create a SortField using a new 
> constructor
> {code}SortField(String field, int type, Object parser, boolean reverse){code}
> The parser is "object" because all current parsers have no super-interface. 
> The ideal solution would be to have:
> {code}SortField(String field, int type, FieldCache.Parser parser, boolean 
> reverse){code}
> and FieldCache.Parser is a super-interface (just empty, more like a 
> marker-interface) of all other parsers (like LongParser...). The sort 
> implementation then must be changed to respect the given parser (if not 
> NULL), else use the default FieldCache.get without parser.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655005#action_12655005
 ] 

Mark Miller commented on LUCENE-1483:
-

bq. make a HitCollector that gathers the results into a single pqueue 

Well, it certainly made the code cleaner and the patch a bit nicer, but on a 
first quick test I still see the 20% slowdown with roughly 100 segments.

I'm looking through it to see where I may have done something funny.

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655025#action_12655025
 ] 

Michael McCandless commented on LUCENE-1483:


bq. on first quick test, I still see the 20% slowdown with 100 more/less 
segments.

Argh!  Can you post your current patch?




[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1483:


Attachment: LUCENE-1483.patch

Here is what I've got. The final sort test still fails, but the rest should 
pass.

- Mark

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> ---
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-1483.patch, LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual 
> IndexReader level and reloading the fieldcache on reopen can be much faster 
> as only changed segments need to be reloaded.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655036#action_12655036
 ] 

Mark Miller commented on LUCENE-1483:
-

I also did a quick reopen benchmark alg. The speed gain here can really vary 
depending on index access patterns. I added 50 (very small) docs with a random 
sort field, then added 5 docs and reopened 10 times, and repeated all of that 4 
times. The comparison is of the time it takes to load the FieldCache and do one 
search. With this patch it came out about 40-50% faster. Obviously that will 
depend on many factors in the real world, though; in certain applications I'm 
sure it could be many times faster or slower.
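The reopen gain comes from the per-segment caching described in the issue: unchanged segments keep their cached sort values, and only new segments are loaded. A minimal sketch, assuming segments are identified by name and that a hypothetical `loader` function reads one segment's values (a real implementation would more likely key on the segment reader itself):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-segment sort cache that survives reopen. */
final class PerSegmentSortCache {
    private final Map<String, int[]> cache = new HashMap<>();
    private final Function<String, int[]> loader; // hypothetical: reads one segment's sort values
    private int loads;                             // segments actually loaded so far

    PerSegmentSortCache(Function<String, int[]> loader) {
        this.loader = loader;
    }

    /** Returns per-segment sort values for the segments of a (re)opened index. */
    Map<String, int[]> valuesFor(List<String> segments) {
        Map<String, int[]> result = new HashMap<>();
        for (String segment : segments) {
            int[] values = cache.get(segment);
            if (values == null) {          // new segment: load it once
                values = loader.apply(segment);
                cache.put(segment, values);
                loads++;
            }
            result.put(segment, values);   // unchanged segment: reuse the cached array
        }
        cache.keySet().retainAll(segments); // forget segments that were merged away
        return result;
    }

    int loads() {
        return loads;
    }
}
```

With a single top-level cache, adding 5 docs would force reloading values for every document; here only the newly flushed segment is loaded.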




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655048#action_12655048
 ] 

Mark Miller commented on LUCENE-1483:
-

From a little profiling on the new code, the top results of interest are:

FieldSortedHitQueue.lessThan(Object,Object) approx 12%
FieldSortedHitQueue.insertWithOverflow(Object) approx 12%
MultiReaderTopFieldDocCollector.collect(int,float) 6.3%
FieldSortedHitQueue$4.compare() 5.3%

and on...






[jira] Updated: (LUCENE-1484) Remove SegmentReader.document synchronization

2008-12-09 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1484:
-

Attachment: LUCENE-1484.patch

LUCENE-1484.patch

- FieldsReader implements Cloneable
- fieldsReaderLocal added to SegmentReader
- TestIndexReader, TestFieldsReader, TestSegmentReader, and 
TestParallelMultiSearcher pass
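The ThreadLocal approach can be sketched as follows, assuming a reader whose per-read state (here a hypothetical `position` field standing in for a file pointer) makes a shared instance unsafe without locking. `StatefulFieldsReader` and `SegmentReaderSketch` are illustrative stand-ins, not the patch's classes, and closing the per-thread clones is omitted:

```java
/** Stand-in for a reader with mutable per-read state. */
final class StatefulFieldsReader implements Cloneable {
    private final String[] storedDocs; // shared, read-only data
    private int position;              // per-instance mutable state ("file pointer")

    StatefulFieldsReader(String[] storedDocs) {
        this.storedDocs = storedDocs;
    }

    String doc(int n) {
        position = n; // "seek", then "read"
        return storedDocs[position];
    }

    @Override
    public StatefulFieldsReader clone() {
        try {
            return (StatefulFieldsReader) super.clone(); // shares storedDocs, copies position
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }
}

/** Hypothetical segment reader: document() needs no synchronized block. */
final class SegmentReaderSketch {
    // One private clone per thread, created lazily from the original, which is
    // only used as a template and never read through directly.
    private final ThreadLocal<StatefulFieldsReader> fieldsReaderLocal;

    SegmentReaderSketch(StatefulFieldsReader fieldsReader) {
        this.fieldsReaderLocal = ThreadLocal.withInitial(fieldsReader::clone);
    }

    String document(int n) {
        return fieldsReaderLocal.get().doc(n);
    }
}
```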

> Remove SegmentReader.document synchronization
> -
>
> Key: LUCENE-1484
> URL: https://issues.apache.org/jira/browse/LUCENE-1484
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
> Attachments: LUCENE-1484.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> This is probably the last synchronization issue in Lucene.  It is the 
> document method in SegmentReader.  It is avoidable by using a ThreadLocal for 
> FieldsReader.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655048#action_12655048
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 6:15 PM:
--

From a little profiling on the new code, the top results of interest are:

FieldSortedHitQueue.lessThan(Object,Object) approx 12%
FieldSortedHitQueue.insertWithOverflow(Object) approx 12%
MultiReaderTopFieldDocCollector.collect(int,float) 6.3%
FieldSortedHitQueue$4.compare() 5.3%

and on...


For Lucene trunk, a day or two ago:

FieldSortedHitQueue.insertWithOverflow(Object) approx 11%
TopFieldDocCollector.collect(int,float) 7.1%
FieldSortedHitQueue.lessThan(Object,Object) approx 6.7%
FieldSortedHitQueue.updateMaxScore 3.2%
FieldSortedHitQueue$4.compare() 3.2%
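Lucene's `PriorityQueue`, which `FieldSortedHitQueue` extends, routes its comparisons through `lessThan`, so `lessThan` and `insertWithOverflow` tend to show up together in a profile. A simplified sketch of the `insertWithOverflow` contract on a bounded queue, built on `java.util.PriorityQueue` rather than Lucene's own heap, with a hypothetical counter standing in for the profiler:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Keeps the maxSize "largest" elements, smallest on top. */
final class BoundedQueue<T> {
    private final int maxSize;
    private final Comparator<? super T> order;
    private final PriorityQueue<T> heap;
    private long lessThanCalls; // what a profiler would attribute to lessThan

    /** maxSize must be at least 1. */
    BoundedQueue(int maxSize, Comparator<? super T> order) {
        this.maxSize = maxSize;
        this.order = order;
        // Heap ordering also goes through lessThan, so every sift costs lessThan calls.
        this.heap = new PriorityQueue<>(maxSize,
            (a, b) -> lessThan(a, b) ? -1 : (lessThan(b, a) ? 1 : 0));
    }

    boolean lessThan(T a, T b) {
        lessThanCalls++;
        return order.compare(a, b) < 0;
    }

    /**
     * Adds element if there is room; otherwise replaces the smallest element
     * unless element is smaller than it. Returns whatever fell out of the
     * queue (the old smallest, or element itself), or null if nothing did.
     */
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T top = heap.peek();
        if (!lessThan(element, top)) {
            heap.poll();
            heap.add(element);
            return top;
        }
        return element;
    }

    long lessThanCalls() {
        return lessThanCalls;
    }
}
```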





RE: jira attachments ?

2008-12-09 Thread Andrew Myers

Hi Robert,

I'm part of the JIRA support team, and one of our devs brought this up,
so I've taken a quick look to see if I can replicate your
problem.


I just upgraded to Safari 3.2 on OS X 10.5 and tested on a couple of  
different versions of JIRA (3.12.2 and 3.13.2) and didn't have any  
troubles. I also had a go at testing on https://issues.apache.org/jira/browse/TST-76
and that worked without issues.


Unfortunately I don't have a version of 10.4 handy to test at the
moment, so maybe it's something to do with the combination of 10.4 and
Safari 3.2.


Apologies if this comes through twice, I hadn't subscribed to the list  
before so I'm not sure it worked :)


Andrew
Atlassian Support




Begin forwarded message:


 Original Message 
Subject:Re: jira attachments ?
Date:   Thu, 4 Dec 2008 18:07:41 -0600
From:   robert engels <[EMAIL PROTECTED]>
Reply-To:   java-dev@lucene.apache.org
To: java-dev@lucene.apache.org




Could be...  I will try next time...

Seems a strange (and serious) bug in Jira (I have no problems
with other "add attachment" sites)...


On Dec 4, 2008, at 5:59 PM, Michael McCandless wrote:



Hmmm, the only time I've seen this was also with Safari
(though on an older version). It caused me to switch [back] to
Firefox. Try Firefox?


Mike

robert engels wrote:


I am using Safari 3.2 (on OSX Tiger).

On Dec 4, 2008, at 5:38 PM, Michael McCandless wrote:



Robert which browser are you using?

Mike

robert engels wrote:

Dear God, I've been blocked! What will the Lucene community do! :)


On Dec 4, 2008, at 3:27 PM, Uwe Schindler wrote:


Hi Robert,

two minutes ago I uploaded a patch...

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]


From: robert engels [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 9:37 PM
To: java-dev@lucene.apache.org
Subject: jira attachments ?

I am having a problem posting an attachment to Jira. It just spins, and
spins...

Everything else seems to work fine (comments, etc.).

Anyone else experiencing this?

Thanks.




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-09 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655081#action_12655081
 ] 

Marvin Humphrey commented on LUCENE-831:


> Marvin, does KS/Lucy have something like FieldCache? If so, what API do you
> use? Is it iterator-only? 

At present, KS only caches the docID -> ord map as an array.  It builds that
array by iterating over the terms in the sort field's Lexicon and mapping the
docIDs from each term's posting list.
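Building the docID -> ord array for a single segment, as described above, can be sketched like this, assuming a single-valued sort field and a hypothetical in-memory lexicon given as one posting list (array of doc IDs) per term, in sorted term order:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: docID -> ord array for one segment. */
final class OrdArrayBuilder {
    /**
     * postings.get(ord) holds the doc IDs containing the ord-th term in sorted
     * order. Documents without a term for the field get -1.
     */
    static int[] buildOrds(int maxDoc, List<int[]> postings) {
        int[] ords = new int[maxDoc];
        Arrays.fill(ords, -1);
        for (int ord = 0; ord < postings.size(); ord++) {
            for (int doc : postings.get(ord)) {
                ords[doc] = ord; // a term's ord is its position in the sorted lexicon
            }
        }
        return ords;
    }
}
```

Within one segment, comparing two documents then reduces to comparing `ords[a]` and `ords[b]`.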

Building the docID -> ord array is straightforward for a single-segment
SegLexicon.  The multi-segment case requires that several SegLexicons be
collated using a priority queue.  In KS, there's a MultiLexicon class which
handles this; I don't believe that Lucene has an analogous class.
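The multi-segment collation can be sketched as a k-way merge: a priority queue holds one cursor per segment lexicon, and equal terms from different segments receive the same global ord. This only illustrates the general technique and is not KS's `MultiLexicon`; each lexicon is assumed to be a sorted, duplicate-free `String[]`:

```java
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical collator: maps each segment's term ords to global ords. */
final class LexiconCollator {
    /** Position within one segment's sorted lexicon. */
    private static final class Cursor {
        final int segment;
        final String[] terms;
        int pos;

        Cursor(int segment, String[] terms) {
            this.segment = segment;
            this.terms = terms;
        }

        String term() {
            return terms[pos];
        }
    }

    /** Returns segToGlobal[segment][segmentOrd] = global ord. */
    static int[][] globalOrds(List<String[]> segmentLexicons) {
        int[][] segToGlobal = new int[segmentLexicons.size()][];
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
        for (int s = 0; s < segmentLexicons.size(); s++) {
            String[] terms = segmentLexicons.get(s);
            segToGlobal[s] = new int[terms.length];
            if (terms.length > 0) {
                pq.add(new Cursor(s, terms));
            }
        }
        int globalOrd = -1;
        String previous = null;
        while (!pq.isEmpty()) {
            Cursor c = pq.poll();            // smallest term across all segments
            if (!c.term().equals(previous)) { // a new distinct term
                globalOrd++;
                previous = c.term();
            }
            segToGlobal[c.segment][c.pos] = globalOrd;
            if (++c.pos < c.terms.length) {
                pq.add(c);                   // re-insert with this segment's next term
            }
        }
        return segToGlobal;
    }
}
```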

Relying on the docID -> ord array alone works quite well until you get to the
MultiSearcher case.  As you know, at that point you need to be able to
retrieve the actual field values from the ordinal numbers, so that you can
compare across multiple searchers (ordinals from different searchers cannot
be compared with each other).

{code}
Lex_Seek_By_Num(lexicon, term_num);
field_val = Lex_Get_Term(lexicon);
{code}

The problem is that seeking by ordinal value on a MultiLexicon iterator
requires a gnarly implementation and is very expensive.  I got it working, but
I consider it a dead-end design and a failed experiment.

The planned replacement for these iterator-based quasi-FieldCaches involves
several topics of recent discussion:

  1) A "keyword" field type, implemented using a format similar to what Nate 
 and I came up with for the lexicon index.
  2) Write per-segment docID -> ord maps at index time for sort fields.
  3) Memory mapping.
  4) Segment-centric searching.

We'd mmap the pre-composed docID -> ord map and use it for intra-segment
sorting.  The keyword field type would be implemented in such a way that we'd
be able to mmap a few files and get a per-segment field cache, which we'd then
use to sort hits from multiple segments.
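The plan of writing docID -> ord maps at index time and mmapping them at search time can be sketched with `java.nio`. The file layout here (one big-endian int ord per document) is a hypothetical format chosen for illustration, not KS's or Lucene's:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical memory-mapped docID -> ord map for one segment. */
final class MappedOrds {
    private final IntBuffer ords;

    /** Search time: map the prebuilt file instead of rebuilding the array. */
    MappedOrds(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // The mapping stays valid after the channel is closed.
            this.ords = mapped.asIntBuffer();
        }
    }

    /** Index time: write one big-endian int ord per document. */
    static void write(Path file, int[] docToOrd) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(docToOrd.length * Integer.BYTES);
        buf.asIntBuffer().put(docToOrd);
        Files.write(file, buf.array());
    }

    int ord(int doc) {
        return ords.get(doc);
    }

    int maxDoc() {
        return ords.limit();
    }
}
```

Opening a segment then costs a map call rather than a pass over the lexicon, and the OS page cache can share the pages between processes.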

> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to new API.




[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2008-12-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1482:
---

Attachment: LUCENE-1482-2.patch

I kept safeDebugMsg because it was used by a class which extended IndexWriter 
and relied on that method to be called. However, I fixed the class by 
overriding testPoint instead. So I can now remove safeDebugMsg.

As for the output format, I agree that it should be handled by the logging 
system, but I wanted to confirm that with other members before changing it. I'm 
glad that you agree with that too.

Attached is a new patch which removes the method.

> Replace infoSteram by a logging framework (SLF4J)
> -
>
> Key: LUCENE-1482
> URL: https://issues.apache.org/jira/browse/LUCENE-1482
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 2.4.1, 2.9
>
> Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, 
> slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar
>
>
> Lucene makes use of infoStream to output messages in its indexing code only. 
> For debugging purposes, when the search application is run on the customer 
> side, getting messages from other code flows, like search, query parsing, 
> analysis, etc., can be extremely useful.
> There are two main problems with infoStream today:
> 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
> other classes I need to either expose an API or propagate infoStream to all 
> classes (see for example DocumentsWriter, which receives its infoStream 
> instance from IndexWriter).
> 2. I can only turn debugging on or off for the entire code.
> Introducing a logging framework would allow each class to control its logging 
> independently and, more importantly, allow the application to turn on 
> logging for only specific areas in the code (e.g., org.apache.lucene.index.*).
> I've investigated SLF4J (Simple Logging Facade for Java), which is, as its 
> name states, a facade over different logging frameworks. As such, you can 
> include slf4j.jar in your application, and at deploy time it recognizes which 
> logging framework you'd like to use. SLF4J comes with adapters for several 
> frameworks, including Java logging, Log4j and others. If you know your 
> application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in 
> your classpath, and your logging statements will use Java logging under 
> the covers.
> This makes the logging code very simple. For a class A, the logger is 
> instantiated like this:
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
> }
> And is later used like this:
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
>
>   public void foo() {
>     // the guard avoids building the message when debug output is off
>     if (logger.isDebugEnabled()) {
>       logger.debug("message");
>     }
>   }
> }
> That's all!
> Checking isDebugEnabled is very quick, at least using the JDK14 adapter 
> (and I assume it is also fast with other logging frameworks).
> The important thing is, every class controls its own logger. Not all classes 
> have to output logging messages, and we can improve Lucene's logging 
> gradually, w/o changing the API, by adding more logging messages to 
> interesting classes.
> I will submit a patch shortly
