[JENKINS] Solr-trunk - Build # 1544 - Failure

2011-06-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-trunk/1544/

All tests passed

Build Log (for compile errors):
[...truncated 18107 lines...]






[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2458:
--

Attachment: SOLR-2458.patch

Attaching final patch which will be committed shortly. Added better error 
handling.

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: post.jar
 Fix For: 3.3

 Attachments: SOLR-2458.patch, SOLR-2458.patch, SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML requesthandler, such as CSV.
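 [Editorial illustration] One way around the limitation described above (not
 necessarily what the attached patch does) is to issue the commit as a separate
 request using Solr's commit=true parameter, which works regardless of the data
 format being posted. The URL below is an assumption (default local Solr):
 {code}
 // Hedged sketch: trigger the commit with its own request instead of appending
 // <commit/> to the posted data stream.
 import java.net.HttpURLConnection;
 import java.net.URL;

 public class SeparateCommit {
   public static void main(String[] args) throws Exception {
     URL url = new URL("http://localhost:8983/solr/update?commit=true");
     HttpURLConnection conn = (HttpURLConnection) url.openConnection();
     conn.setRequestMethod("GET");
     System.out.println("Commit request returned HTTP " + conn.getResponseCode());
     conn.disconnect();
   }
 }
 {code}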




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-06-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055040#comment-13055040
 ] 

Jan Høydahl commented on LUCENE-3130:
-

The feature is absolutely needed. Probably it's enough to be able to specify a 
global term boost factor per query for all synonyms, so Robert's method would 
work for me.

Another use case is phonetic variants. Currently I use a separate field for 
phonetic normalization and include it with a lower weight in DisMax. If the 
phonetic variant was instead stored alongside the original with posIncr=0 and 
tokenType=phonetic, I could instead specify a deboost factor for phonetic terms 
and even highlighting would work ootb!

Yet another is lower/upper case search. If the LowerCaseFilter would keep the 
original token and add a lowercased token on same posIncr with 
tokenType=lowercase, we could support case insensitive match with preference 
for correct case.

If a user needs different boosts for different fields, perhaps the TokenType name 
could be configurable on each filter.
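
[Editorial illustration] A rough sketch of the kind of filter Jan describes:
the original token is kept and a variant is injected at the same position
(posIncr=0) with a distinct token type, so downstream code could deboost tokens
of that type. The phonetic encoding step is just a placeholder; class and field
names are invented for the example.
{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public final class VariantInjectingFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
  private State pending; // saved state used to emit the variant token

  protected VariantInjectingFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      // emit the variant: same position as the original, custom token type
      restoreState(pending);
      pending = null;
      posIncrAtt.setPositionIncrement(0);
      typeAtt.setType("phonetic");
      String variant = encode(termAtt.toString());
      termAtt.setEmpty().append(variant);
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    pending = captureState(); // emit the original now, the variant on the next call
    return true;
  }

  private String encode(String term) {
    // placeholder for a real phonetic encoder (e.g. from commons-codec)
    return term.toLowerCase();
  }
}
{code}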

 Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use QueryTime synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  (This would be fairly 
 straightforward for the simple synonyms => BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ; possibly just punt for now.)
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all).
 Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters could give 
 penalizing payloads to terms when used at index time.




[jira] [Commented] (SOLR-2618) Indexing and search on more then one type (Mapping)

2011-06-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055041#comment-13055041
 ] 

Jan Høydahl commented on SOLR-2618:
---

You might want to talk to Chris Male, who gave a talk about improving SolrJ for 
interacting with domain objects at Berlin Buzzwords: 
http://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/IntegratingSolrJEEApplications.pdf

I think your idea about storing a class name with the document and using 
reflection to pick the right domain object is interesting.
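
[Editorial illustration] A small sketch of the reflection idea being discussed,
assuming a stored field (here called classtype) that holds the fully qualified
class name; SolrJ's DocumentObjectBinder does the actual binding. Field and
class names are illustrative only.
{code}
import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
import org.apache.solr.common.SolrDocument;

public class DomainObjectMapper {
  private final DocumentObjectBinder binder = new DocumentObjectBinder();

  // Pick the domain class from the assumed "classtype" field and map the
  // document onto it via reflection.
  public Object toDomainObject(SolrDocument doc) throws ClassNotFoundException {
    String className = (String) doc.getFieldValue("classtype");
    Class<?> clazz = Class.forName(className);
    return binder.getBean(clazz, doc);
  }
}
{code}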

 Indexing and search on more then one type (Mapping)
 ---

 Key: SOLR-2618
 URL: https://issues.apache.org/jira/browse/SOLR-2618
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 3.2
Reporter: Monica Storfjord
Priority: Minor

 It would be very beneficial for a project that I am currently working on to 
 have the ability to index and search on various subclasses of an object and 
 map the objects directly to the actual domain object. This functionality 
 exists in Hibernate Search, for instance. Is this something that future 
 releases have in mind? I would think that this is something that will make 
 Solr more valuable to a lot of users. 
 We are testing SolrJ 3.2 with the use of the SolrJ client and the web 
 interface to index, change and search. It should be possible to make a 
 solution that maps against a special type field (like <field name="classtype" 
 type="class"/>) in schema.xml that is indexed every time and uses reflection 
 against the actual class?
 - Monica
  




[jira] [Commented] (SOLR-2620) Remove log4j jar from the clustering contrib (uses slf4j).

2011-06-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055047#comment-13055047
 ] 

Jan Høydahl commented on SOLR-2620:
---

You should commit the CHANGES.TXT entry to trunk as well

 Remove log4j jar from the clustering contrib (uses slf4j).
 --

 Key: SOLR-2620
 URL: https://issues.apache.org/jira/browse/SOLR-2620
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Affects Versions: 3.3
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 3.3







[jira] [Resolved] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2458.
---

   Resolution: Fixed
Fix Version/s: (was: 3.3)

Committed for trunk and 3.x

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: post.jar
 Attachments: SOLR-2458.patch, SOLR-2458.patch, SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML requesthandler, such as CSV.




Need to create new version 3.4 in JIRA

2011-06-26 Thread Jan Høydahl
Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to have 
rights for this

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com





[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-06-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055050#comment-13055050
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
Currently I use a separate field for phonetic normalization and include it with 
a lower weight in DisMax. If the phonetic variant was instead stored alongside 
the original with posIncr=0 and tokenType=phonetic, I could instead specify a 
deboost factor for phonetic terms and even highlighting would work ootb!
{quote}

This doesn't make any sense to me: how is this better shoved into one field 
than two fields? I don't see any advantage at all. field A with original terms 
and field B with phonetic terms is no less efficient in the index than having 
field AB with both mixed up, but keeping them separate keeps code and 
configurations simple.

As for the highlighting, that sounds like a highlighting problem, not an 
analysis problem. If it's often the case that users use things like copyField 
and do this boosting, then highlighting in Solr needs to be fixed to correlate 
the offsets back to the original stored field: but we need not make analysis 
more complicated because of this limitation.


{quote}
If the LowerCaseFilter would keep the original token and add a lowercased token 
on same posIncr with tokenType=lowercase, we could support case insensitive 
match with preference for correct case.
{quote}

I don't think we should complicate our tokenfilters with such things: in this 
case I think it would just make the code more complicated and make relevance 
worse: often case is totally meaningless and boosting terms for some arbitrary 
reason will skew scores.

This is for the same reason as above. If you want to do this, I think you 
should use two fields, one with no case, and one with case, and boost one of 
them. 
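
[Editorial illustration] For reference, the two-field setup Robert and Jan
describe can be expressed with a plain dismax query that boosts the original
field over the normalized one. A hedged SolrJ sketch; the field names (title,
title_phonetic) are invented for the example.
{code}
import org.apache.solr.client.solrj.SolrQuery;

public class TwoFieldDeboost {
  public static SolrQuery build(String userQuery) {
    SolrQuery q = new SolrQuery(userQuery);
    q.set("defType", "dismax");
    // original field gets full weight, the phonetic "shadow" field a lower one
    q.set("qf", "title^1.0 title_phonetic^0.3");
    return q;
  }
}
{code}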


 Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use QueryTime synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  (This would be fairly 
 straightforward for the simple synonyms => BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ; possibly just punt for now.)
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all).
 Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters could give 
 penalizing payloads to terms when used at index time.




Re: Need to create new version 3.4 in JIRA

2011-06-26 Thread Robert Muir
I created this in JIRA over a week ago, it exists!

On Sun, Jun 26, 2011 at 7:00 AM, Jan Høydahl jan@cominvent.com wrote:
 Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to have 
 rights for this

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com





Re: Need to create new version 3.4 in JIRA

2011-06-26 Thread Simon Willnauer
On Sun, Jun 26, 2011 at 1:00 PM, Jan Høydahl jan@cominvent.com wrote:
 Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to have 
 rights for this

power granted :) you are a JIRA admin now on both solr & lucene!

simon

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com





[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-06-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3218:


Fix Version/s: 3.4

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


 Currently CFS is created once all files are written during a flush / merge. 
 Once on disk the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can at any time write at least one file 
 directly into the CFS which can save a reasonable amount of IO. For instance 
 stored fields could be written directly during indexing and during a Codec 
 Flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but more importantly for 
 DocValues and LUCENE-3216: we could transparently pack per-field files into a 
 single file only for docvalues without changing any code once LUCENE-3216 is 
 resolved.




[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3201:


Fix Version/s: (was: 3.3)
   3.4

 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3201.patch, LUCENE-3201.patch


 Currently CompoundFileReader could use some improvements, i see the following 
 problems
 * its CSIndexInput extends BufferedIndexInput, which is stupid for 
 directories like mmap.
 * it seeks on every readInternal
 * it's not possible for a directory to override or improve the handling of 
 compound files.
 for example: it seems if you were impl'ing this thing from scratch, you would 
 just wrap the II directly (not extend BufferedIndexInput),
 and add compound file offset X to seek() calls, and override length(). But of 
 course, then you couldn't throw "read past EOF" always when you should,
 as a user could read into the next file and be left unaware.
 however, some directories could handle this better. for example MMapDirectory 
 could return an IndexInput that simply mmaps the 'slice' of the CFS file.
 its underlying ByteBuffer etc. naturally does bounds checks already, so it 
 wouldn't need to be buffered, not even needing to add any offsets to seek(),
 as its position would just work.
 So I think we should try to refactor this so that a Directory can customize 
 how compound files are handled; the simplest 
 case for the least code change would be to add this to Directory.java:
 {code}
 public Directory openCompoundInput(String filename) {
   return new CompoundFileReader(this, filename);
 }
 {code}
 Because most code depends upon the fact that compound files are implemented as 
 a Directory and transparent. At least then a subclass could override... 
 but the 'recursion' is a little ugly... we could still label it 
 expert+internal+experimental or whatever.
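
 [Editorial illustration] A tiny sketch of the 'slice' idea above, under the
 assumption that the whole CFS file is already memory-mapped and that the
 sub-file's offset and length are known from the CFS table of contents; the
 helper name is invented.
 {code}
 import java.nio.ByteBuffer;

 public final class CfsSlices {
   // View over one sub-file of a mapped CFS buffer. The returned buffer does
   // its own bounds checking, so no extra offset math or buffering is needed
   // when reading the sub-file.
   static ByteBuffer sliceSubFile(ByteBuffer mappedCfs, int offset, int length) {
     ByteBuffer dup = mappedCfs.duplicate();  // independent position/limit
     dup.position(offset);
     dup.limit(offset + length);
     return dup.slice();                      // position 0 == start of sub-file
   }
 }
 {code}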




[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2458:
--

  Component/s: (was: clients - java)
Affects Version/s: (was: 3.1)
Fix Version/s: 3.4

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: post.jar
 Fix For: 3.4

 Attachments: SOLR-2458.patch, SOLR-2458.patch, SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML requesthandler, such as CSV.




Re: Need to create new version 3.4 in JIRA

2011-06-26 Thread Jan Høydahl
Perhaps you did it only on the LUCENE JIRA? I had to create it for SOLR just 
now.
Thanks for Admin karma Simon :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 26. juni 2011, at 13.04, Robert Muir wrote:

 I created this in JIRA over a week ago, it exists!
 
 On Sun, Jun 26, 2011 at 7:00 AM, Jan Høydahl jan@cominvent.com wrote:
 Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to 
 have rights for this
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 



[jira] [Commented] (SOLR-2620) Remove log4j jar from the clustering contrib (uses slf4j).

2011-06-26 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055054#comment-13055054
 ] 

Dawid Weiss commented on SOLR-2620:
---

This JAR was no longer in trunk -- somebody removed it earlier.

 Remove log4j jar from the clustering contrib (uses slf4j).
 --

 Key: SOLR-2620
 URL: https://issues.apache.org/jira/browse/SOLR-2620
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Affects Versions: 3.3
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 3.3







[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Fix Version/s: 3.4
   Labels: UpdateProcessor  (was: )

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
  Labels: UpdateProcessor
 Fix For: 3.4

 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.
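
 [Editorial illustration] A minimal sketch of the idea in the description,
 using Tika's LanguageIdentifier directly; the field names (title, language)
 and the routing convention are assumptions, not taken from the attached
 patches.
 {code}
 import org.apache.solr.common.SolrInputDocument;
 import org.apache.tika.language.LanguageIdentifier;

 public class LanguageRouting {
   // Detect the language of a field and copy the text into a language-specific
   // field (title_en, title_no, ...), plus a plain language field.
   public static void route(SolrInputDocument doc) {
     Object value = doc.getFieldValue("title");
     if (value == null) return;
     String text = value.toString();
     String lang = new LanguageIdentifier(text).getLanguage(); // e.g. "en"
     doc.addField("title_" + lang, text);
     doc.setField("language", lang);
   }
 }
 {code}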




[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter

2011-06-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055061#comment-13055061
 ] 

Michael McCandless commented on LUCENE-3212:


I think this idea is similar to the CachedFilterIndexReader on LUCENE-1536?  
See 
https://issues.apache.org/jira/browse/LUCENE-1536?focusedCommentId=12908914page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12908914

 Supply FilterIndexReader based on any o.a.l.search.Filter
 -

 Key: LUCENE-3212
 URL: https://issues.apache.org/jira/browse/LUCENE-3212
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


 When coding LUCENE-2919 (PKIndexSplitter), Mike and me had the idea, how to 
 effectively apply filters on the lowest level (before query execution). This 
 is very useful for e.g. security Filters that simply hide some documents. 
 Currently when you apply the filter after searching, lots of useless work was 
 done like scoring filtered documents, iterating term positions (for 
 Phrases),...
 This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too 
 complicated to implement), that hides filtered documents by returning them in 
 getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work on 
 per-segment (without SlowMultiReaderWrapper), so per segment search keeps 
 available and reopening can be done very efficient, as the filter is only 
 calculated on openeing new or changed segments.
 This filter should improve use-cases where the filter can be applied one time 
 before all queries (like security filters) on (re-)opening the IndexReader.




[jira] [Reopened] (LUCENE-1742) Wrap SegmentInfos in public class

2011-06-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-1742:



 Wrap SegmentInfos in public class 
 --

 Key: LUCENE-1742
 URL: https://issues.apache.org/jira/browse/LUCENE-1742
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, 
 LUCENE-1742.patch, LUCENE-1742.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not 
 need to be in the org.apache.lucene.index package.  




[jira] [Updated] (LUCENE-1742) Wrap SegmentInfos in public class

2011-06-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1742:
---

Comment: was deleted

(was: [Thanks|http://rullymisar.com/])

 Wrap SegmentInfos in public class 
 --

 Key: LUCENE-1742
 URL: https://issues.apache.org/jira/browse/LUCENE-1742
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, 
 LUCENE-1742.patch, LUCENE-1742.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not 
 need to be in the org.apache.lucene.index package.  




[jira] [Resolved] (LUCENE-1742) Wrap SegmentInfos in public class

2011-06-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1742.


Resolution: Fixed

 Wrap SegmentInfos in public class 
 --

 Key: LUCENE-1742
 URL: https://issues.apache.org/jira/browse/LUCENE-1742
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, 
 LUCENE-1742.patch, LUCENE-1742.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not 
 need to be in the org.apache.lucene.index package.  




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055063#comment-13055063
 ] 

Michael McCandless commented on LUCENE-3179:


Thanks for fixing these Uwe!

I actually don't like how generic OBS has become... ie, that all methods have 
an int and long version, that the OBS doesn't know how many bits it holds (I 
added this field recently, but only for assertions), that some methods grow 
the number of bits and others don't, some methods accept out-of-bounds indices 
(negative and > numBits), etc.  I think it's grown to accommodate too many 
users but I'm not sure what we should do to fix this.  Maybe factor out 
(yet another) bit set impl that doesn't grow, knows its number of bits, has 
these fast getNext/getPrev set bit methods, operates only on int indices, etc.

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Assignee: Paul Elschot
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, 
 LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, 
 TestOpenBitSet.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055064#comment-13055064
 ] 

Michael McCandless commented on LUCENE-3179:


bq. One more comment: When working on the code, the symmetry all other methods 
have between long and int is broken here. For consistency we should add the 
long method, too. I just don't like the missing consistency.

I think we should add the long version, for consistency.

bq. Also: OpenBitSet.nextSetBit() does not use Long.numberOfTrailingZeros() 
but the new prevSetBit() does. As both methods have intrinsics, why only use 
one of them? Yonik?

Good question!  In testing on this issue, above, Dawid and Paul found the 
intrinsics were faster on modern JREs... seems like nextSetBit should cutover 
too?
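
[Editorial illustration] For readers following the intrinsics discussion, the
per-word step of a prevSetBit-style scan boils down to masking off the bits
above the start index and taking the highest remaining set bit. A self-contained
sketch (not the patch's code) using the Long intrinsics being discussed:
{code}
public final class BitWordUtil {
  // Highest set bit at index <= pos within a single 64-bit word, or -1 if
  // there is none. Long.numberOfLeadingZeros is JIT-intrinsified on modern
  // JREs, which is the point of the discussion above.
  static int prevSetBitInWord(long word, int pos) { // 0 <= pos <= 63
    long masked = word & (-1L >>> (63 - pos));      // keep bits 0..pos
    if (masked == 0) return -1;
    return 63 - Long.numberOfLeadingZeros(masked);
  }
}
{code}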

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Assignee: Paul Elschot
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, 
 LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, 
 TestOpenBitSet.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055069#comment-13055069
 ] 

Michael McCandless commented on LUCENE-3228:


bq.  lets just commit the package-list files for all third party libs we use 
into dev-tools and completely eliminate the need for net when building javadocs.

+1

Hitting build failures because we can't download these package lists is silly.

 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir

 Currently, we fail on all javadocs warnings.
 However, you get a warning if it cannot download the package-list from sun.com
 So I think we should allow you to optionally set a sysprop using linkoffline.
 Then we would get far fewer hudson fake failures.
 I feel like Mike opened an issue for this already but I cannot find it.




[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-26 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Made the signature of EasySimilarity.score() a bit saner.

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:




[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055071#comment-13055071
 ] 

Robert Muir commented on LUCENE-3228:
-

I agree with hossman too. I'm just a javadocs dummy and was doing what I could 
to stop the 30-minute builds.

I can't figure out this linkoffline (at least with my experiments it's 
confusing)... but this sounds great.

 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir

 Currently, we fail on all javadocs warnings.
 However, you get a warning if it cannot download the package-list from sun.com
 So I think we should allow you to optionally set a sysprop using linkoffline.
 Then we would get far fewer hudson fake failures.
 I feel like Mike opened an issue for this already but I cannot find it.




[jira] [Resolved] (LUCENE-3225) Optimize TermsEnum.seek when caller doesn't need next term

2011-06-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3225.


Resolution: Fixed

 Optimize TermsEnum.seek when caller doesn't need next term
 --

 Key: LUCENE-3225
 URL: https://issues.apache.org/jira/browse/LUCENE-3225
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3225.patch, LUCENE-3225.patch


 Some codecs are able to save CPU if the caller is only interested in
 exact matches.  EG, Memory codec and SimpleText can do more efficient
 FSTEnum lookup if they know the caller doesn't need to know the term
 following the seek term.
 We have cases like this in Lucene, eg when IW deletes documents by
 Term, if the term is not found in a given segment then it doesn't need
 to know the ceiling term.  Likewise when TermQuery looks up the term
 in each segment.
 I had done this change as part of LUCENE-3030, which is a new terms
 index that's able to save seeking for exact-only lookups, but now that
 we have Memory codec that can also save CPU I think we should commit
 this today.
 The change adds a boolean onlyExact param to seek(BytesRef).
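
 [Editorial illustration] A hedged usage sketch of the API change as described
 above; the onlyExact parameter name comes from the issue text, and both the
 committed seek signature and the Terms.iterator() call reflect the 2011 trunk
 flex API, so they may differ from the final code.
 {code}
 import java.io.IOException;

 import org.apache.lucene.index.Terms;
 import org.apache.lucene.index.TermsEnum;
 import org.apache.lucene.util.BytesRef;

 public class ExactSeekSketch {
   // When deleting by term, the caller never needs the term following the seek
   // term, so it can ask for exact-only positioning ('true' = onlyExact).
   static boolean termExists(Terms terms, BytesRef deleteTerm) throws IOException {
     TermsEnum termsEnum = terms.iterator();
     return termsEnum.seek(deleteTerm, true) == TermsEnum.SeekStatus.FOUND;
   }
 }
 {code}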




[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #160: POMs out of sync

2011-06-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/160/

No tests ran.

Build Log (for compile errors):
[...truncated 12698 lines...]






[jira] [Created] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)
FastVectorHighlighter - add position offset to 
FieldPhraseList.WeightedPhraseInfo.Toffs
---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor


Needed to return position offsets along with highlighted snippets when using 
FVH for highlighting. 

Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
patch I was able to get the fragInfo for a particular Phrase search. Currently 
the Toffs(Term offsets) class only stores the start and end offset.

To get the position offset, I added the position offset information in Toffs 
and FieldPhraseList class.






[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: (was: LUCENE-3243.patch.diff)

 FastVectorHighlighter - add position offset to 
 FieldPhraseList.WeightedPhraseInfo.Toffs
 ---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor
  Labels: feature, lucene

 Needed to return position offsets along with highlighted snippets when using 
 FVH for highlighting. 
 Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
 patch I was able to get the fragInfo for a particular Phrase search. 
 Currently the Toffs(Term offsets) class only stores the start and end offset.
 To get the position offset, I added the position offset information in Toffs 
 and FieldPhraseList class.




[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: LUCENE-3243.patch.diff

 FastVectorHighlighter - add position offset to 
 FieldPhraseList.WeightedPhraseInfo.Toffs
 ---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor
  Labels: feature, lucene

 Needed to return position offsets along with highlighted snippets when using 
 FVH for highlighting. 
 Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
 patch I was able to get the fragInfo for a particular Phrase search. 
 Currently the Toffs(Term offsets) class only stores the start and end offset.
 To get the position offset, I added the position offset information in Toffs 
 and FieldPhraseList class.




[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

2011-06-26 Thread Jahangir Anwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jahangir Anwari updated LUCENE-3243:


Attachment: LUCENE-3243.patch.diff

 FastVectorHighlighter - add position offset to 
 FieldPhraseList.WeightedPhraseInfo.Toffs
 ---

 Key: LUCENE-3243
 URL: https://issues.apache.org/jira/browse/LUCENE-3243
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 3.2
 Environment: Lucene 3.2
Reporter: Jahangir Anwari
Priority: Minor
  Labels: feature, lucene
 Attachments: LUCENE-3243.patch.diff


 Needed to return position offsets along with highlighted snippets when using 
 FVH for highlighting. 
 Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) 
 patch I was able to get the fragInfo for a particular Phrase search. 
 Currently the Toffs(Term offsets) class only stores the start and end offset.
 To get the position offset, I added the position offset information in Toffs 
 and FieldPhraseList class.




[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055096#comment-13055096
 ] 

Michael McCandless commented on LUCENE-3171:


bq. The possible inefficiency is the same as the one for any sparsely filled 
OpenBitSet.

Ahh, OK.  Though, I suspect this (the linear scan OBS does for next/prevSetBit) 
is a minor cost overall, if indeed the app has so many child docs per parent 
that a sparse bit set would be warranted?  Ie, the Query/Collector would still 
be visiting these many child docs per parent, I guess?  (Unless the query hits 
few results).

I don't think a jdoc warning is really required for this... but I'm fine if you 
want to add one?

I'll commit this soon and resolve LUCENE-2454 as duplicate!

 BlockJoinQuery/Collector
 

 Key: LUCENE-3171
 URL: https://issues.apache.org/jira/browse/LUCENE-3171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Reporter: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch


 I created a single-pass Query + Collector to implement nested docs.
 The approach is similar to LUCENE-2454, in that the app must index
 documents in join order, as a block (IW.add/updateDocuments), with
 the parent doc at the end of the block, except that this impl is one
 pass.
 Once you join at indexing time, you can take any query that matches
 child docs and join it up to the parent docID space, using
 BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
 docs by provided Sort, to gather results, grouped by parent; this
 collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
 retains the child docs corresponding to each collected parent doc.
 After searching is done, you retrieve the TopGroups from a provided
 BlockJoinQuery.
 Like LUCENE-2454, this is less general than the arbitrary joins in
 Solr (SOLR-2272) or parent/child from ElasticSearch
 (https://github.com/elasticsearch/elasticsearch/issues/553), since you
 must do the join at indexing time as a doc block, but it should be
 able to handle nested joins as well as joins to multiple tables,
 though I don't yet have test cases for these.
 I put this in a new Join module (modules/join); I think as we
 refactor join impls we should put them here.
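
 [Editorial illustration] A hedged end-to-end sketch of the workflow the
 description lays out. BlockJoinQuery and its ScoreMode follow the naming in
 this issue (the class lives in the new join module, so it is not imported
 here) and may differ from what is finally committed; field names are invented.
 {code}
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;

 import org.apache.lucene.document.Document;
 import org.apache.lucene.index.IndexWriter;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.CachingWrapperFilter;
 import org.apache.lucene.search.Filter;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.search.QueryWrapperFilter;
 import org.apache.lucene.search.TermQuery;

 public class BlockJoinSketch {
   // Index a parent with its children as one contiguous block, parent last.
   static void indexBlock(IndexWriter writer, Document parent, List<Document> children)
       throws IOException {
     List<Document> block = new ArrayList<Document>(children);
     block.add(parent);            // parent must be the last doc in the block
     writer.addDocuments(block);   // addDocuments keeps the block contiguous
   }

   // Join a child-level query up to the parent doc space (names per the issue).
   static Query joinToParents(Query childQuery) {
     Filter parents = new CachingWrapperFilter(
         new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));
     return new BlockJoinQuery(childQuery, parents, BlockJoinQuery.ScoreMode.Max);
   }
 }
 {code}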




Re: [VOTE] release 3.3 (take two)

2011-06-26 Thread Michael McCandless
+1

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jun 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote:
 Artifacts here:

 http://s.apache.org/lusolr330rc1

 working release notes here:

 http://wiki.apache.org/lucene-java/ReleaseNote33
 http://wiki.apache.org/solr/ReleaseNote33

 To see the changes between the previous release candidate (rc0):
 svn diff -r 1139028:1139775
 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3

 Here is my +1




[jira] [Commented] (LUCENE-2949) FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper

2011-06-26 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055115#comment-13055115
 ] 

Mike Sokolov commented on LUCENE-2949:
--

This looks like the same issue as LUCENENET-350?

 FastVectorHighlighter FieldTermStack could likely benefit from using 
 TermVectorMapper
 -

 Key: LUCENE-2949
 URL: https://issues.apache.org/jira/browse/LUCENE-2949
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.0.3, 4.0
Reporter: Grant Ingersoll
Assignee: Koji Sekiguchi
Priority: Minor
  Labels: FastVectorHighlighter, Highlighter
 Fix For: 3.3

 Attachments: LUCENE-2949.patch


 Based on my reading of the FieldTermStack constructor that loads the vector 
 from disk, we could probably save a bunch of time and memory by using the 
 TermVectorMapper callback mechanism instead of materializing the full array 
 of terms into memory and then throwing most of them out.




[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter

2011-06-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055116#comment-13055116
 ] 

Uwe Schindler commented on LUCENE-3212:
---

It's similar, but I don't understand the impl there. I would simply override 
getDeletedDocs to return the deleted docs OR'ed with the filtered docs. Then you 
don't need to override terms() and fields().
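
[Editorial illustration] The override Uwe describes essentially needs a Bits
view that is the union of the reader's deleted docs and the filter's excluded
docs. A hedged sketch; the class name is invented and the interface follows the
4.0 trunk Bits API of the time.
{code}
import org.apache.lucene.util.Bits;

final class UnionBits implements Bits {
  private final Bits deleted;   // may be null if the segment has no deletions
  private final Bits filtered;  // docs hidden by the filter

  UnionBits(Bits deleted, Bits filtered) {
    this.deleted = deleted;
    this.filtered = filtered;
  }

  // A doc counts as "deleted" if it is really deleted or excluded by the filter.
  public boolean get(int index) {
    return (deleted != null && deleted.get(index)) || filtered.get(index);
  }

  public int length() {
    return filtered.length();
  }
}
{code}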

 Supply FilterIndexReader based on any o.a.l.search.Filter
 -

 Key: LUCENE-3212
 URL: https://issues.apache.org/jira/browse/LUCENE-3212
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


 When coding LUCENE-2919 (PKIndexSplitter), Mike and me had the idea, how to 
 effectively apply filters on the lowest level (before query execution). This 
 is very useful for e.g. security Filters that simply hide some documents. 
 Currently when you apply the filter after searching, lots of useless work was 
 done like scoring filtered documents, iterating term positions (for 
 Phrases),...
 This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too 
 complicated to implement), that hides filtered documents by returning them in 
 getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work on 
 per-segment (without SlowMultiReaderWrapper), so per segment search keeps 
 available and reopening can be done very efficient, as the filter is only 
 calculated on openeing new or changed segments.
 This filter should improve use-cases where the filter can be applied one time 
 before all queries (like security filters) on (re-)opening the IndexReader.




[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-26 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3179:
--

Attachment: LUCENE-3179-long-ntz.patch

Here is the patch with the long version and Long.numberOfTrailingZeros() instead 
of BitUtil.ntz().

Patch was already available on my checkout. We should just also test the long 
versions (according to Clover, all of them are not really tested).

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Assignee: Paul Elschot
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, 
 LUCENE-3179-long-ntz.patch, LUCENE-3179.patch, LUCENE-3179.patch, 
 LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()

2011-06-26 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3179:
--

Attachment: LUCENE-3179-long-ntz.patch

New patch that also improves tests to check all uncovered long methods (of 
course the indexes are still < Integer.MAX_VALUE).

 OpenBitSet.prevSetBit()
 ---

 Key: LUCENE-3179
 URL: https://issues.apache.org/jira/browse/LUCENE-3179
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Paul Elschot
Assignee: Paul Elschot
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, 
 LUCENE-3179-long-ntz.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179.patch, 
 LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch


 Find a previous set bit in an OpenBitSet.
 Useful for parent testing in nested document query execution LUCENE-2454 .




[jira] [Updated] (LUCENE-3231) Add fixed size DocValues int variants expose Arrays where possible

2011-06-26 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3231:


Attachment: LUCENE-3231.patch

here is a new patch, 

* adds Field API for new int types
* adds tests for getArray / hasArray
* adds tests for new Int types
* unifies some of the existing tests
* adds javadocs

I think we're ready here... all tests pass.

 Add fixed size DocValues int variants  expose Arrays where possible
 

 Key: LUCENE-3231
 URL: https://issues.apache.org/jira/browse/LUCENE-3231
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3231.patch, LUCENE-3231.patch


 currently we only have variable bit packed ints implementation. for flexible 
 scoring or loading field caches it is desirable to have fixed int 
 implementations for 8, 16, 32 and 64 bit. 




[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it

2011-06-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1536:
---

Attachment: LUCENE-1536.patch

Initial patch for trunk... lots of nocommits, but tests all pass and I
think this is [roughly] the approach we should take to get fast(er)
Filter perf.

Conceptually, this change is fairly easy, because the flex APIs all
accept a Bits to apply low-level filtering.  However, this Bits is
inverted vs the Filter that callers pass to IndexSearcher (skipDocs vs
keepDocs), so, my patch inverts 1) the meaning of this first arg to
the Docs/AndPositions enums (it becomes an acceptDocs instead of
skipDocs), and 2) deleted docs coming back from IndexReaders (renames
IR.getDeletedDocs -> IR.getNotDeletedDocs).

That change (inverting the Bits to be keepDocs not skipDocs) is the
vast majority of the patch.

The real change is to add DocIdSet.getRandomAccessBits and
bitsIncludesDeletedDocs, which IndexSearcher then consults to figure
out whether to push the filter low instead of high.  I then fixed
OpenBitSet to return this from getRandomAccessBits, and fixed
CachingWrapperFilter to turn this on/off as well as state whether
deleted docs were folded into the filter.

This means filters cached with CachingWrapperFilter will apply low,
and if it's DeletesMode.RECACHE then it's a single filter that's
applied (else I wrap with an AND NOT deleted check per docID), but
custom filters are also free to impl these methods to have their
filters applied low.


 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, eg 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
 AND 3 AND 4.  u s means united states (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method high means I use random-access filter API in
 IndexSearcher's main loop.  Method low means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter

2011-06-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055146#comment-13055146
 ] 

Michael McCandless commented on LUCENE-3212:


That's a good point -- I'm not sure why I didn't just override getDeletedDocs!  
It seems like that should work fine.
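
A rough sketch of that (illustrative names, not the exact 4.0 API): the wrapping reader's getDeletedDocs() would return a Bits that reports a document as deleted when it is either really deleted or excluded by the filter, so nothing above the reader ever sees it.

{code}
// Sketch of the Bits such a FilterIndexReader-style wrapper could return.
interface Bits {
  boolean get(int docId);
  int length();
}

final class DeletedOrFilteredBits implements Bits {
  private final Bits realDeletes;    // may be null when the segment has no deletions
  private final Bits filterAccepts;  // docs the filter lets through

  DeletedOrFilteredBits(Bits realDeletes, Bits filterAccepts) {
    this.realDeletes = realDeletes;
    this.filterAccepts = filterAccepts;
  }

  public boolean get(int docId) {
    if (realDeletes != null && realDeletes.get(docId)) {
      return true;                    // genuinely deleted
    }
    return !filterAccepts.get(docId); // hidden by the filter
  }

  public int length() { return filterAccepts.length(); }
}
{code}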

 Supply FilterIndexReader based on any o.a.l.search.Filter
 -

 Key: LUCENE-3212
 URL: https://issues.apache.org/jira/browse/LUCENE-3212
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


 When coding LUCENE-2919 (PKIndexSplitter), Mike and I had the idea of how to 
 effectively apply filters at the lowest level (before query execution). This 
 is very useful for e.g. security Filters that simply hide some documents. 
 Currently, when you apply the filter after searching, lots of useless work is 
 done, like scoring filtered documents and iterating term positions (for 
 Phrases),...
 This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too 
 complicated to implement) that hides filtered documents by returning them in 
 getDeletedDocs(). In contrast to LUCENE-2919, the filtering works 
 per-segment (without SlowMultiReaderWrapper), so per-segment search remains 
 available and reopening can be done very efficiently, as the filter is only 
 calculated on opening new or changed segments.
 This filter should improve use-cases where the filter can be applied one time 
 before all queries (like security filters) on (re-)opening the IndexReader.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3203) Rate-limit IO used by merging

2011-06-26 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3203.


   Resolution: Fixed
Fix Version/s: (was: 3.3)
   (was: 4.0)
   IOContext branch

 Rate-limit IO used by merging
 -

 Key: LUCENE-3203
 URL: https://issues.apache.org/jira/browse/LUCENE-3203
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: IOContext branch

 Attachments: LUCENE-3203.patch, LUCENE-3203.patch


 Large merges can mess up searches and increase NRT reopen time (see
 http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html).
 A simple rate limiter improves the spikey NRT reopen times during big
 merges, so I think we should somehow make this possible.  Likely this
 would reduce impact on searches as well.
 Typically apps that do indexing and searching on same box are in no
 rush to see the merges complete so this is a good tradeoff.
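
For illustration, the core of such a limiter can be very small (a hedged sketch, not the committed classes): a Directory wrapper used for merge writes could call pause(bytes) around each buffer flush, and the limiter blocks just long enough to keep the long-term write rate under the configured cap.

{code}
// Minimal byte-rate limiter sketch (illustrative, not the actual implementation).
final class SimpleRateLimiter {
  private final double bytesPerSec;
  private long nextFreeNanos = System.nanoTime();

  SimpleRateLimiter(double mbPerSec) {
    this.bytesPerSec = mbPerSec * 1024 * 1024;
  }

  /** Call before writing {@code bytes}; blocks while writes are ahead of the allowed rate. */
  synchronized void pause(long bytes) throws InterruptedException {
    long now = System.nanoTime();
    long sleepNanos = nextFreeNanos - now;        // how far ahead of the cap we already are
    if (sleepNanos > 0) {
      Thread.sleep(sleepNanos / 1000000L, (int) (sleepNanos % 1000000L));
    } else {
      nextFreeNanos = now;                        // idle credit is not banked
    }
    // reserve the time these bytes are allowed to take at the configured rate
    nextFreeNanos += (long) ((bytes / bytesPerSec) * 1000000000L);
  }
}
{code}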

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-06-26 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055151#comment-13055151
 ] 

Bill Bell commented on SOLR-2242:
-

OK. Here are some test cases.

I am getting a weird error on running it: ant 
-Dtestcase=NumFacetTermsFacetsTest test

{code}
junit-sequential:
[junit] Testsuite: org.apache.solr.request.NumFacetTermsFacetsTest
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 4.072 sec
[junit] 
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=NumFacetTermsFacetsTest 
-Dtestmethod=testNumFacetTermsFacetCounts 
-Dtests.seed=3921835369594659663:-3219730304883530389
[junit] *** BEGIN 
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: 
Insane FieldCache usage(s) ***
[junit] SUBREADER: Found caches for descendants of 
DirectoryReader(segments_3 _0(4.0):C6)+hgid_i1
[junit] 'DirectoryReader(segments_3 _0(4.0):C6)'='hgid_i1',class 
org.apache.lucene.search.FieldCache$DocTermsIndex,org.apache.lucene.search.cache.DocTermsIndexCreator@603bb3eb=org.apache.lucene.search.cache.DocTermsIndexCreator$DocTermsIndexImpl#1026179434
 (size =~ 372 bytes)
[junit] 
'org.apache.lucene.index.SegmentCoreReaders@7e8905bd'='hgid_i1',int,org.apache.lucene.search.cache.IntValuesCreator@30781822=org.apache.lucene.search.cache.CachedArray$IntValues#291172425
 (size =~ 92 bytes)
[junit] 
[junit] *** END 
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: 
Insane FieldCache usage(s) ***
[junit] -  ---
[junit] Testcase: 
testNumFacetTermsFacetCounts(org.apache.solr.request.NumFacetTermsFacetsTest):  
  FAILED
[junit] 
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: 
Insane FieldCache usage(s) found expected:0 but was:1
[junit] junit.framework.AssertionFailedError: 
org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: 
Insane FieldCache usage(s) found expected:0 but was:1
[junit] at 
org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:725)
[junit] at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:620)
[junit] at 
org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:96)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
[junit] 
[junit] 
[junit] Test org.apache.solr.request.NumFacetTermsFacetsTest FAILED

{code}

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, 
 SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional 

[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field

2011-06-26 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: SOLR-2242-notworkingtest.patch

The test case gives an error. Not familiar with this error

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-06-26 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055155#comment-13055155
 ] 

Bill Bell commented on SOLR-2242:
-

I think it has to do with an NPE in grouping in 4.0; it fails on other code. 
Nothing to do with this patch.

{code}

  assertQ("check group and facet counts with numFacetTerms=1",
      req("q", "id:[1 TO 6]"
          ,"indent", "on"
          ,"facet", "true"
          ,"group", "true"
          ,"group.field", "hgid_i1"
          ,"f.hgid_i1.facet.limit", "-1"
          ,"f.hgid_i1.facet.mincount", "1"
          ,"f.hgid_i1.facet.numFacetTerms", "1"
          ,"facet.field", "hgid_i1"
          )
      ,"*[count(//arr[@name='groups'])=1]"
      ,"*[count(//lst[@name='facet_fields']/lst[@name='hgid_i1']/int)=1]" // there is 1 unique item
      ,"//lst[@name='hgid_i1']/int[@name='numFacetTerms'][.='4']"
      );

{code}

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [VOTE] release 3.3 (take two)

2011-06-26 Thread Steven A Rowe
+1

I looked at the differences, and then just ran tests on the Solr and Lucene 
source tarballs.

Steve

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Sunday, June 26, 2011 11:12 AM
 To: dev@lucene.apache.org
 Subject: [VOTE] release 3.3 (take two)
 
 Artifacts here:
 
 http://s.apache.org/lusolr330rc1
 
 working release notes here:
 
 http://wiki.apache.org/lucene-java/ReleaseNote33
 http://wiki.apache.org/solr/ReleaseNote33
 
 To see the changes between the previous release candidate (rc0):
 svn diff -r 1139028:1139775
 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3
 
 Here is my +1
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-06-26 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055158#comment-13055158
 ] 

Bill Bell commented on SOLR-2242:
-

{code}
junit-sequential:
[junit] Testsuite: org.apache.solr.request.NumFacetTermsFacetsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.48 sec
[junit] 
{code}

I fixed the NamedList() generic too.




 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field

2011-06-26 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: SOLR-2242.shard.withtests.patch

I left the group in there; we can uncomment it when it starts working again (if it 
does).


 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] release 3.3 (take two)

2011-06-26 Thread Sanne Grinovero
+1

All tests are fine on both Infinispan and Hibernate Search.

While I understand that APIs often needed changes, I'm very happy to
state that for the first time three major releases are fully API
compatible!
(As far as tested on these projects, Lucene versions 3.1.0, 3.2.0,
3.3.0 are drop-in compatible replacements)

Regards,
Sanne

2011/6/26 Steven A Rowe sar...@syr.edu:
 +1

 I looked at the differences, and then just ran tests on the Solr and Lucene 
 source tarballs.

 Steve

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Sunday, June 26, 2011 11:12 AM
 To: dev@lucene.apache.org
 Subject: [VOTE] release 3.3 (take two)

 Artifacts here:

 http://s.apache.org/lusolr330rc1

 working release notes here:

 http://wiki.apache.org/lucene-java/ReleaseNote33
 http://wiki.apache.org/solr/ReleaseNote33

 To see the changes between the previous release candidate (rc0):
 svn diff -r 1139028:1139775
 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3

 Here is my +1

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-06-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055169#comment-13055169
 ] 

Uwe Schindler commented on LUCENE-1536:
---

Hi Mike,
Nice patch, only a little bit big. I reviewed the essential parts like applying 
the filter in IndexSearcher, really cool. Also CachingWrapperFilter looks fine 
(not closely reviewed).

My question: Do we really need to make the delDocs inverse in *this* issue? The 
IndexSearcher impl can also be done using a simple OrNotBits(delDocs, 
filterDocs) wrapper (instead of AndBits) implementation and NotBits (if no delDocs 
are available). The patch is unreadable because of that. In general, reversing the 
delDocs might be a good idea, but we should do it separately and as a hard switch 
(not allowing both variants implemented by IndexReader & Co.). The method name 
getNotDeletedDocs() should also be getVisibleDocs() or similar [I don't like 
double negation].

About the filters: I like the new API (it is as discussed before), so the 
DocIdSet is extended by an optional getBits() method, defaulting to null.

About the impls: FieldCacheRangeFilter can also implement getBits() directly as 
FieldCache is random access. It should just return an own Bits impl for the 
DocIdSet that checks the filtering in get(index).
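
As a sketch of that last point (illustrative names only, not FieldCacheRangeFilter's actual code): with the values already sitting in a field-cache array, the DocIdSet's Bits can evaluate the range predicate directly in get(doc), with no separate bit set to build.

{code}
// Hypothetical random-access Bits backed by a cached per-document int array.
interface Bits {
  boolean get(int index);
  int length();
}

final class CachedIntRangeBits implements Bits {
  private final int[] cachedValues;  // one value per doc, e.g. from the field cache
  private final int lower, upper;    // inclusive bounds

  CachedIntRangeBits(int[] cachedValues, int lower, int upper) {
    this.cachedValues = cachedValues;
    this.lower = lower;
    this.upper = upper;
  }

  public boolean get(int doc) {
    final int v = cachedValues[doc];
    return v >= lower && v <= upper;   // evaluate the range check lazily, per lookup
  }

  public int length() { return cachedValues.length; }
}
{code}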

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, eg 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
 AND 3 AND 4.  u s means united states (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method high means I use random-access filter API in
 IndexSearcher's main loop.  Method low means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-06-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055181#comment-13055181
 ] 

Uwe Schindler commented on LUCENE-1536:
---

One more comment about DocIdSet.bitsIncludesDeletedDocs(). I think the default 
in DocIdSet and of course OpenBitSet should be true, because current filters 
always respect deleted docs (this was a requirement: MTQ uses deleted docs, 
FCRF explicitly ANDs it in). So the default is fine here. Of course 
CachingWrapperFilter sets this to false if the SegmentReader got new deletes.

 if a filter can support random access API, we should use it
 ---

 Key: LUCENE-1536
 URL: https://issues.apache.org/jira/browse/LUCENE-1536
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.4
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
 LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
 LUCENE-1536.patch


 I ran some performance tests, comparing applying a filter via
 random-access API instead of current trunk's iterator API.
 This was inspired by LUCENE-1476, where we realized deletions should
 really be implemented just like a filter, but then in testing found
 that switching deletions to iterator was a very sizable performance
 hit.
 Some notes on the test:
   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
   * I test across multiple queries.  1-X means an OR query, eg 1-4
 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
 AND 3 AND 4.  u s means united states (phrase search).
   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
 95, 98, 99, 99.9 (filter is non-null but all bits are set),
 100 (filter=null, control)).
   * Method high means I use random-access filter API in
 IndexSearcher's main loop.  Method low means I use random-access
 filter API down in SegmentTermDocs (just like deleted docs
 today).
   * Baseline (QPS) is current trunk, where filter is applied as iterator up
 high (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter

2011-06-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055185#comment-13055185
 ] 

Uwe Schindler commented on LUCENE-3212:
---

I don't think this issue is obsolete with LUCENE-1536:
If you have one filter that's e.g. applied for one user every time, maybe for
all his queries, it can live as long as the SegmentReader lives. So simply
wrapping the IndexReader with a Filter has much more flexibility, as it's done
one time on creating the IndexReader - so I think this filter could
additionally live in contrib. If we have RandomAccessFilters, this one and also
PKIndexSplitter (which will only use this FIR and drop its own impl) can
directly use the Bits supplied by the Filter's DocIdSet.

 Supply FilterIndexReader based on any o.a.l.search.Filter
 -

 Key: LUCENE-3212
 URL: https://issues.apache.org/jira/browse/LUCENE-3212
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


 When coding LUCENE-2919 (PKIndexSplitter), Mike and I had the idea of how to 
 effectively apply filters at the lowest level (before query execution). This 
 is very useful for e.g. security Filters that simply hide some documents. 
 Currently, when you apply the filter after searching, lots of useless work is 
 done, like scoring filtered documents and iterating term positions (for 
 Phrases),...
 This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too 
 complicated to implement) that hides filtered documents by returning them in 
 getDeletedDocs(). In contrast to LUCENE-2919, the filtering works 
 per-segment (without SlowMultiReaderWrapper), so per-segment search remains 
 available and reopening can be done very efficiently, as the filter is only 
 calculated on opening new or changed segments.
 This filter should improve use-cases where the filter can be applied one time 
 before all queries (like security filters) on (re-)opening the IndexReader.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] release 3.3 (take two)

2011-06-26 Thread Chris Male
+1

On Mon, Jun 27, 2011 at 8:32 AM, Sanne Grinovero
sanne.grinov...@gmail.comwrote:

 +1

 All tests are fine on both Infinispan and Hibernate Search.

 While I understand that APIs often needed changes, I'm very happy to
 state that for the first time three major releases are fully API
 compatible!
 (As far as tested on these projects, Lucene versions 3.1.0, 3.2.0,
 3.3.0 are drop-in compatible replacements)

 Regards,
 Sanne

 2011/6/26 Steven A Rowe sar...@syr.edu:
  +1
 
  I looked at the differences, and then just ran tests on the Solr and
 Lucene source tarballs.
 
  Steve
 
  -Original Message-
  From: Robert Muir [mailto:rcm...@gmail.com]
  Sent: Sunday, June 26, 2011 11:12 AM
  To: dev@lucene.apache.org
  Subject: [VOTE] release 3.3 (take two)
 
  Artifacts here:
 
  http://s.apache.org/lusolr330rc1
 
  working release notes here:
 
  http://wiki.apache.org/lucene-java/ReleaseNote33
  http://wiki.apache.org/solr/ReleaseNote33
 
  To see the changes between the previous release candidate (rc0):
  svn diff -r 1139028:1139775
  https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3
 
  Here is my +1
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl


[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Attachment: SOLR-1979.patch

Fixed threshold so that Tika distance 0.1 gives certainty 0.5 and distance 0.02 
gives certainty 0.9. The default threshold of 0.5 now works pretty well, at 
least for the tests...

*New parameters:*
Field name mapping is now configurable with a user-defined pattern, so to map 
ABC_title to title_lang, you set:
{code}
langid.map.pattern=ABC_(.*)
langid.map.replace=$1_{lang}
{code}
A parameter to map multiple detected languages to the same field name. E.g. to map 
Japanese, Korean and Chinese texts to a field *_cjk, do:
{code}langid.map.lcmap=jp:cjk zh:cjk ko:cjk{code}
Turn off validation of field names against schema (useful if you want to rename 
or delete fields later in the UpdateChain):
{code}langid.enforceSchema=false{code}
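
To make the mapping parameters above concrete, here is a tiny standalone illustration of the intended semantics (my own sketch, not the processor's code): langid.map.pattern is matched against the source field name, langid.map.replace is the regex replacement, and the {lang} token is then substituted with the detected language (after langid.map.lcmap has collapsed e.g. jp/zh/ko into cjk).

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class FieldNameMapper {
  /** Applies pattern/replace to a field name, then fills in the detected language. */
  public static String mapFieldName(String fieldName, String patternStr,
                                    String replaceStr, String detectedLang) {
    Matcher m = Pattern.compile(patternStr).matcher(fieldName);
    if (!m.matches()) {
      return fieldName;                        // no mapping if the pattern does not match
    }
    String mapped = m.replaceFirst(replaceStr); // e.g. "ABC_title" -> "title_{lang}"
    return mapped.replace("{lang}", detectedLang);
  }

  public static void main(String[] args) {
    // "ABC_title" with pattern "ABC_(.*)" and replace "$1_{lang}" -> "title_en"
    System.out.println(mapFieldName("ABC_title", "ABC_(.*)", "$1_{lang}", "en"));
  }
}
{code}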

*Other changes*
Removed default on langField, i.e. if langField is not specified, the detected 
language will not be written anywhere. A typical minimal config for only 
detecting language and writing to a field is now:
{code}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <defaults>
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
  </defaults>
</processor>
{code}

Also added multiple other languages to the tests.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
  Labels: UpdateProcessor
 Fix For: 3.4

 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9102 - Failure

2011-06-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9102/

All tests passed

Build Log (for compile errors):
[...truncated 16720 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)
Contrib/Module-uptodate assume name matches path and jar


 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male


With adding a new 'queries' module, I am trying to change the project name of 
contrib/queries to queries-contrib.  However currently the contrib-uptodate 
assumes that the name property is used in the path and in the jar name.

By using the name in the path, I must set the value to 'queries' (since the 
path is contrib/queries).  However because the project name is now 
queries-contrib, the actual jar file will be 
lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as is 
expected.

Consequently I think we need to separate the path name from the jar name 
properties.  For simplicity I think adding a new jar-name property will 
suffice, which can be optional and if omitted, is filled in with the name 
property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055239#comment-13055239
 ] 

Chris Male commented on LUCENE-3244:


Actually I now see the ability to set the full jarfile in the contrib-uptodate 
macro.  I still want to avoid this, since it requires the invoker of the macro 
to know the full path.

Instead I think having an optional 'project-name' property will suffice.

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male

 With adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3244:
---

Attachment: LUCENE-3244.patch

Patch adds contrib-src-name attribute to contrib-uptodate.  This allows the 
name of the src for the contrib to be different to the contrib's project name.  

The name attribute is now assumed to be the project name.  

If the contrib-src-name property is omitted, name is used.

I have code that makes use of this (in changing the queries contrib to 
queries-contrib) and have verified it works.

It'd be great if someone could review this to see any implications I might have 
missed.

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
 Attachments: LUCENE-3244.patch


 With adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2383) Velocity: Generalize range and date facet display

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2383:
--

Attachment: SOLR-2383-branch_3x.patch

This patch (SOLR-2383-branch_3x.patch) works with 3x branch. Ready for commit?

 Velocity: Generalize range and date facet display
 -

 Key: SOLR-2383
 URL: https://issues.apache.org/jira/browse/SOLR-2383
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
  Labels: facet, range, velocity
 Fix For: 3.3

 Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, 
 SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, 
 SOLR-2383.patch


 Velocity (/browse) GUI has a hardcoded price range facet and a hardcoded 
 manufacturedate_dt date facet. We need a general solution which works for any 
 facet.range and facet.date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055265#comment-13055265
 ] 

Steven Rowe commented on LUCENE-3244:
-

bq. Can the good stuff in the queries contrib move to the module, and the 
sandbox stuff (if any) go somewhere else?!

+1

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
 Attachments: LUCENE-3244.patch


 With adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2383) Velocity: Generalize range and date facet display

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2383:
--

Fix Version/s: (was: 3.3)
   4.0
   3.4

 Velocity: Generalize range and date facet display
 -

 Key: SOLR-2383
 URL: https://issues.apache.org/jira/browse/SOLR-2383
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
  Labels: facet, range, velocity
 Fix For: 3.4, 4.0

 Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, 
 SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, 
 SOLR-2383.patch


 Velocity (/browse) GUI has a hardcoded price range facet and a hardcoded 
 manufacturedate_dt date facet. We need a general solution which works for any 
 facet.range and facet.date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-26 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3234:
---

Attachment: LUCENE-3234.patch

Updated patch attached. I added CHANGES.txt entries for Lucene and Solr, used 
Integer.MAX_VALUE for the default and added @param for phraseLimit in the new 
constructor javadoc. Will commit soon.

 Provide limit on phrase analysis in FastVectorHighlighter
 -

 Key: LUCENE-3234
 URL: https://issues.apache.org/jira/browse/LUCENE-3234
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
Reporter: Mike Sokolov
Assignee: Koji Sekiguchi
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, 
 LUCENE-3234.patch


 With larger documents, FVH can spend a lot of time trying to find the 
 best-scoring snippet as it examines every possible phrase formed from 
 matching terms in the document.  If one is willing to accept
 less-than-perfect scoring by limiting the number of phrases that are 
 examined, substantial speedups are possible.  This is analogous to the 
 Highlighter limit on the number of characters to analyze.
 The patch includes an artificial test case that shows > 1000x speedup.  In a 
 more normal test environment, with English documents and random queries, I am 
 seeing speedups of around 3-10x when setting phraseLimit=1, which has the 
 effect of selecting the first possible snippet in the document.  Most of our 
 sites operate in this way (just show the first snippet), so this would be a 
 big win for us.
 With phraseLimit = -1, you get the existing FVH behavior. At larger values of 
 phraseLimit, you may not get substantial speedup in the normal case, but you 
 do get the benefit of protection against blow-up in pathological cases.
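
For illustration, the limiting itself can be as simple as a guarded counter (a generic sketch, not the FVH internals): once phraseLimit candidates have been collected, enumeration stops, trading best-possible snippet scoring for bounded work.

{code}
import java.util.ArrayList;
import java.util.List;

final class PhraseCollector {
  private final int phraseLimit;               // e.g. Integer.MAX_VALUE keeps old behaviour
  private final List<String> phrases = new ArrayList<String>();

  PhraseCollector(int phraseLimit) {
    this.phraseLimit = phraseLimit;
  }

  /** @return false once the limit is reached, so the caller can stop enumerating */
  boolean add(String phrase) {
    if (phrases.size() >= phraseLimit) {
      return false;
    }
    phrases.add(phrase);
    return true;
  }

  List<String> phrases() { return phrases; }
}
{code}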

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055269#comment-13055269
 ] 

Chris Male commented on LUCENE-3244:


Absolutely.  I intended to do that afterward I had resolved the FunctionQuery 
moving (as its a dependency for many other issues).  Would you guys prefer I do 
that and not make this change? Or are you okay with this change as well?

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
 Attachments: LUCENE-3244.patch


 With adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055269#comment-13055269
 ] 

Chris Male edited comment on LUCENE-3244 at 6/27/11 1:11 AM:
-

Absolutely.  I intended to do that after I had resolved the FunctionQuery 
moving (as its a dependency for many other issues).  Would you guys prefer I do 
that and not make this change? Or are you okay with this change as well?

  was (Author: cmale):
Absolutely.  I intended to do that afterward I had resolved the 
FunctionQuery moving (as its a dependency for many other issues).  Would you 
guys prefer I do that and not make this change? Or are you okay with this 
change as well?
  
 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
 Attachments: LUCENE-3244.patch


 With adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-26 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3234:
---

Attachment: LUCENE-3234.patch

Oops, wrong patch. This one is correct.

 Provide limit on phrase analysis in FastVectorHighlighter
 -

 Key: LUCENE-3234
 URL: https://issues.apache.org/jira/browse/LUCENE-3234
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
Reporter: Mike Sokolov
Assignee: Koji Sekiguchi
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, 
 LUCENE-3234.patch, LUCENE-3234.patch


 With larger documents, FVH can spend a lot of time trying to find the 
 best-scoring snippet as it examines every possible phrase formed from 
 matching terms in the document.  If one is willing to accept
 less-than-perfect scoring by limiting the number of phrases that are 
 examined, substantial speedups are possible.  This is analogous to the 
 Highlighter limit on the number of characters to analyze.
 The patch includes an artificial test case that shows > 1000x speedup.  In a 
 more normal test environment, with English documents and random queries, I am 
 seeing speedups of around 3-10x when setting phraseLimit=1, which has the 
 effect of selecting the first possible snippet in the document.  Most of our 
 sites operate in this way (just show the first snippet), so this would be a 
 big win for us.
 With phraseLimit = -1, you get the existing FVH behavior. At larger values of 
 phraseLimit, you may not get substantial speedup in the normal case, but you 
 do get the benefit of protection against blow-up in pathological cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2383) Velocity: Generalize range and date facet display

2011-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2383:
--

Attachment: SOLR-2383-branch_3x.patch

Moved date facet over to range facet. Fixed popularity facet.

The only problem now is that 3.x does not support exclusive range queries 
([from TO to}), so the count when clicking a range facet is wrong.

 Velocity: Generalize range and date facet display
 -

 Key: SOLR-2383
 URL: https://issues.apache.org/jira/browse/SOLR-2383
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
  Labels: facet, range, velocity
 Fix For: 3.4, 4.0

 Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, 
 SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, 
 SOLR-2383.patch, SOLR-2383.patch


 Velocity (/browse) GUI has a hardcoded price range facet and a hardcoded 
 manufacturedate_dt date facet. We need a general solution which works for any 
 facet.range and facet.date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3244.


   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Chris Male

Committed revision 1139989.

I'm going to leave module-uptodate alone till there is a need to change it.

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
Assignee: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3244.patch


 With adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055279#comment-13055279
 ] 

Robert Muir commented on LUCENE-3244:
-

I committed a tiny fix, a ${name} -> @{name}

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
Assignee: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3244.patch


 With adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055280#comment-13055280
 ] 

Chris Male commented on LUCENE-3244:


Thanks Robert!

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
Assignee: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3244.patch


 While adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar

2011-06-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055284#comment-13055284
 ] 

Chris Male commented on LUCENE-3244:


Murphy's law, I needed to fix module-uptodate.

Committed revision 1139996.

 Contrib/Module-uptodate assume name matches path and jar
 

 Key: LUCENE-3244
 URL: https://issues.apache.org/jira/browse/LUCENE-3244
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Chris Male
Assignee: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3244.patch


 While adding a new 'queries' module, I am trying to change the project name of 
 contrib/queries to queries-contrib.  However currently the contrib-uptodate 
 assumes that the name property is used in the path and in the jar name.
 By using the name in the path, I must set the value to 'queries' (since the 
 path is contrib/queries).  However because the project name is now 
 queries-contrib, the actual jar file will be 
 lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as 
 is expected.
 Consequently I think we need to separate the path name from the jar name 
 properties.  For simplicity I think adding a new jar-name property will 
 suffice, which can be optional and if omitted, is filled in with the name 
 property.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module

2011-06-26 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3240:
---

Attachment: LUCENE-3240.patch

First patch which migrates the queries contrib over to queries-contrib and 
establishes the queries module.

Now moving on to moving the files.

 Move FunctionQuery, ValueSources and DocValues to Queries module
 

 Key: LUCENE-3240
 URL: https://issues.apache.org/jira/browse/LUCENE-3240
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3240.patch


 Having resolved the FunctionQuery sorting issue and moved the MutableValue 
 classes, we can now move FunctionQuery, ValueSources and DocValues to a 
 Queries module.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module

2011-06-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055287#comment-13055287
 ] 

Chris Male commented on LUCENE-3240:


Command for using first patch:

{code}
svn move dev-tools/idea/lucene/contrib/queries/queries.iml 
dev-tools/idea/lucene/contrib/queries/queries-contrib.iml
{code}

 Move FunctionQuery, ValueSources and DocValues to Queries module
 

 Key: LUCENE-3240
 URL: https://issues.apache.org/jira/browse/LUCENE-3240
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3240.patch


 Having resolved the FunctionQuery sorting issue and moved the MutableValue 
 classes, we can now move FunctionQuery, ValueSources and DocValues to a 
 Queries module.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-26 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-3234.


Resolution: Fixed

trunk: Committed revision 1139995.
3x: Committed revision 1139997.

Thanks, Mike!

 Provide limit on phrase analysis in FastVectorHighlighter
 -

 Key: LUCENE-3234
 URL: https://issues.apache.org/jira/browse/LUCENE-3234
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
Reporter: Mike Sokolov
Assignee: Koji Sekiguchi
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, 
 LUCENE-3234.patch, LUCENE-3234.patch


 With larger documents, FVH can spend a lot of time trying to find the 
 best-scoring snippet as it examines every possible phrase formed from 
 matching terms in the document.  If one is willing to accept
 less-than-perfect scoring by limiting the number of phrases that are 
 examined, substantial speedups are possible.  This is analogous to the 
 Highlighter limit on the number of characters to analyze.
 The patch includes an artificial test case that shows a > 1000x speedup.  In a 
 more normal test environment, with English documents and random queries, I am 
 seeing speedups of around 3-10x when setting phraseLimit=1, which has the 
 effect of selecting the first possible snippet in the document.  Most of our 
 sites operate in this way (just show the first snippet), so this would be a 
 big win for us.
 With phraseLimit = -1, you get the existing FVH behavior. At larger values of 
 phraseLimit, you may not get substantial speedup in the normal case, but you 
 do get the benefit of protection against blow-up in pathological cases.
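
A minimal usage sketch (not taken from the patch itself), assuming the committed 
change exposes a setPhraseLimit(int) setter on FastVectorHighlighter and that 
the highlighted field is indexed with term vectors (positions and offsets):

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
import org.apache.lucene.search.vectorhighlight.FieldQuery;

public class PhraseLimitExample {
  // Returns up to a 100-character snippet for the "body" field of docId.
  public static String firstSnippet(IndexReader reader, Query query, int docId)
      throws IOException {
    FastVectorHighlighter fvh = new FastVectorHighlighter();
    fvh.setPhraseLimit(1); // examine only the first candidate phrase
    FieldQuery fieldQuery = fvh.getFieldQuery(query);
    return fvh.getBestFragment(fieldQuery, reader, docId, "body", 100);
  }
}
{code}

With the limit at 1 the highlighter simply takes the first possible snippet 
instead of scoring every candidate phrase, which is where the 3-10x speedup 
described above comes from.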

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-26 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055301#comment-13055301
 ] 

Mike Sokolov commented on LUCENE-3234:
--

Thank you, Koji - it's nice to have my first patch committed!

um - one little comment: since you made the default MAX_VALUE, the javadoc 
comment that still says it is 5000 should be updated.

 Provide limit on phrase analysis in FastVectorHighlighter
 -

 Key: LUCENE-3234
 URL: https://issues.apache.org/jira/browse/LUCENE-3234
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
Reporter: Mike Sokolov
Assignee: Koji Sekiguchi
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, 
 LUCENE-3234.patch, LUCENE-3234.patch


 With larger documents, FVH can spend a lot of time trying to find the 
 best-scoring snippet as it examines every possible phrase formed from 
 matching terms in the document.  If one is willing to accept
 less-than-perfect scoring by limiting the number of phrases that are 
 examined, substantial speedups are possible.  This is analogous to the 
 Highlighter limit on the number of characters to analyze.
 The patch includes an artificial test case that shows a > 1000x speedup.  In a 
 more normal test environment, with English documents and random queries, I am 
 seeing speedups of around 3-10x when setting phraseLimit=1, which has the 
 effect of selecting the first possible snippet in the document.  Most of our 
 sites operate in this way (just show the first snippet), so this would be a 
 big win for us.
 With phraseLimit = -1, you get the existing FVH behavior. At larger values of 
 phraseLimit, you may not get substantial speedup in the normal case, but you 
 do get the benefit of protection against blow-up in pathological cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module

2011-06-26 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3240:
---

Attachment: LUCENE-3240.patch

Patch that moves FunctionQuery, DocValues and ValueSource.  It also establishes 
the module, sets up dependencies, fixes javadocs, etc.

Everything compiles and tests pass.

I'd like to commit this before going through and moving the actual impls, since 
some will stay in Solr and some will go to a spatial module.

Command to use the patch coming up.



 Move FunctionQuery, ValueSources and DocValues to Queries module
 

 Key: LUCENE-3240
 URL: https://issues.apache.org/jira/browse/LUCENE-3240
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3240.patch, LUCENE-3240.patch


 Having resolved the FunctionQuery sorting issue and moved the MutableValue 
 classes, we can now move FunctionQuery, ValueSources and DocValues to a 
 Queries module.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module

2011-06-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055307#comment-13055307
 ] 

Chris Male commented on LUCENE-3240:


Command to use the patch:

{code}
svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function
svn move solr/src/java/org/apache/solr/search/function/DocValues.java 
modules/queries/src/java/org/apache/lucene/queries/function/DocValues.java
svn move solr/src/java/org/apache/solr/search/function/ValueSource.java 
modules/queries/src/java/org/apache/lucene/queries/function/ValueSource.java
svn move solr/src/java/org/apache/solr/search/function/FunctionQuery.java 
modules/queries/src/java/org/apache/lucene/queries/function/FunctionQuery.java
svn move dev-tools/idea/lucene/contrib/queries/queries.iml 
dev-tools/idea/lucene/contrib/queries/queries-contrib.iml
{code}
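
For downstream code, the effect of the move (assuming the target package is 
org.apache.lucene.queries.function, as the paths above suggest, and no API 
changes beyond the package) is just an import change, e.g. a hypothetical 
caller:

{code}
// before: import org.apache.solr.search.function.FunctionQuery;
// before: import org.apache.solr.search.function.ValueSource;
import org.apache.lucene.queries.function.FunctionQuery;
import org.apache.lucene.queries.function.ValueSource;

public class FunctionQueryCaller {
  // Wraps a ValueSource so each document is scored by the value it produces.
  public static FunctionQuery wrap(ValueSource source) {
    return new FunctionQuery(source);
  }
}
{code}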

 Move FunctionQuery, ValueSources and DocValues to Queries module
 

 Key: LUCENE-3240
 URL: https://issues.apache.org/jira/browse/LUCENE-3240
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3240.patch, LUCENE-3240.patch


 Having resolved the FunctionQuery sorting issue and moved the MutableValue 
 classes, we can now move FunctionQuery, ValueSources and DocValues to a 
 Queries module.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter

2011-06-26 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055309#comment-13055309
 ] 

Koji Sekiguchi commented on LUCENE-3234:


Thank you again for checking the commit, Mike! The javadoc has been fixed.

 Provide limit on phrase analysis in FastVectorHighlighter
 -

 Key: LUCENE-3234
 URL: https://issues.apache.org/jira/browse/LUCENE-3234
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3
Reporter: Mike Sokolov
Assignee: Koji Sekiguchi
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, 
 LUCENE-3234.patch, LUCENE-3234.patch


 With larger documents, FVH can spend a lot of time trying to find the 
 best-scoring snippet as it examines every possible phrase formed from 
 matching terms in the document.  If one is willing to accept
 less-than-perfect scoring by limiting the number of phrases that are 
 examined, substantial speedups are possible.  This is analogous to the 
 Highlighter limit on the number of characters to analyze.
 The patch includes an artificial test case that shows a > 1000x speedup.  In a 
 more normal test environment, with English documents and random queries, I am 
 seeing speedups of around 3-10x when setting phraseLimit=1, which has the 
 effect of selecting the first possible snippet in the document.  Most of our 
 sites operate in this way (just show the first snippet), so this would be a 
 big win for us.
 With phraseLimit = -1, you get the existing FVH behavior. At larger values of 
 phraseLimit, you may not get substantial speedup in the normal case, but you 
 do get the benefit of protection against blow-up in pathological cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3245) Realtime terms dictionary

2011-06-26 Thread Jason Rutherglen (JIRA)
Realtime terms dictionary
-

 Key: LUCENE-3245
 URL: https://issues.apache.org/jira/browse/LUCENE-3245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor


For LUCENE-2312 we need a realtime terms dictionary.  While 
ConcurrentSkipListMap may be used, it has drawbacks in terms of high object 
overhead which can impact GC collection times and heap memory usage.  

If we implement a skip list that uses primitive backing arrays, we can 
hopefully have a data structure that is [as] fast and memory efficient.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module

2011-06-26 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3240:
---

Attachment: LUCENE-3240.patch

New patch which fixes the dependencies in xml-query-parser.

Everything passes now (including ant generate-maven-artifacts).

 Move FunctionQuery, ValueSources and DocValues to Queries module
 

 Key: LUCENE-3240
 URL: https://issues.apache.org/jira/browse/LUCENE-3240
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0

 Attachments: LUCENE-3240.patch, LUCENE-3240.patch, LUCENE-3240.patch


 Having resolved the FunctionQuery sorting issue and moved the MutableValue 
 classes, we can now move FunctionQuery, ValueSources and DocValues to a 
 Queries module.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3245) Realtime terms dictionary

2011-06-26 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3245:
-

Attachment: LUCENE-3245.patch

Here's a basic initial patch implementing a single-writer, multiple-reader skip 
list backed by an atomic integer array.

The next step is to tie in the ByteBlockPool to store terms, e.g., implement an 
RTTermsDictAIA class and an RTTermsDictCSLM class.

We can then load the same Wiki-EN terms and measure the comparative write 
speeds.

Then create a set of terms to look up from each terms dict and measure the time 
difference.

I am not yet sure how the speed of AtomicIntegerArray will compare with CSLM's 
usage of AtomicReferenceFieldUpdater.  Of note is that, because of DWPTs, we do 
not need a skip list that supports concurrent writes, and because we're only 
adding new unique terms, we do not need delete functionality.  I.e., AIA could 
be faster, though we may need to inline code and perform various tuning tricks.
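
To make the level-zero layout concrete, here is a toy, hypothetical sketch (not 
the attached patch): a single-writer, multi-reader sorted list whose links live 
in one flat AtomicIntegerArray rather than in per-node objects. A real terms 
dictionary would add skip levels above this list and store term bytes in a 
ByteBlockPool; plain int keys stand in for terms here, and the fixed capacity 
is assumed not to be exceeded.

{code}
import java.util.concurrent.atomic.AtomicIntegerArray;

// Toy sketch only: single writer, multiple concurrent readers, no skip levels.
public class SingleWriterIntList {
  private static final int NIL = -1;

  private final int[] keys;              // written once per slot by the single writer
  private final AtomicIntegerArray next; // volatile links publish new entries to readers
  private volatile int head = NIL;
  private int count = 0;                 // used by the writer only

  public SingleWriterIntList(int capacity) {
    keys = new int[capacity];
    next = new AtomicIntegerArray(capacity);
  }

  // Writer-only: insert a new unique key in sorted order.
  public void add(int key) {
    int slot = count++;
    keys[slot] = key;                    // plain write, published by the volatile link below
    int prev = NIL;
    int cur = head;
    while (cur != NIL && keys[cur] < key) { // walk the level-zero list
      prev = cur;
      cur = next.get(cur);
    }
    next.set(slot, cur);                 // point the new entry at its successor first
    if (prev == NIL) {
      head = slot;                       // volatile write makes the entry visible
    } else {
      next.set(prev, slot);
    }
  }

  // Readers: may run concurrently with add(); they see either the old or new list.
  public boolean contains(int key) {
    for (int cur = head; cur != NIL; cur = next.get(cur)) {
      if (keys[cur] == key) {
        return true;
      }
      if (keys[cur] > key) {
        return false;                    // the list is sorted, so we can stop early
      }
    }
    return false;
  }
}
{code}

The point of the flat arrays is that the whole structure is a handful of 
primitive arrays instead of one object per entry, which is where the GC and 
heap savings relative to ConcurrentSkipListMap would come from.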

 Realtime terms dictionary
 -

 Key: LUCENE-3245
 URL: https://issues.apache.org/jira/browse/LUCENE-3245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3245.patch


 For LUCENE-2312 we need a realtime terms dictionary.  While 
 ConcurrentSkipListMap may be used, it has drawbacks in terms of high object 
 overhead which can impact GC collection times and heap memory usage.  
 If we implement a skip list that uses primitive backing arrays, we can 
 hopefully have a data structure that is [as] fast and memory efficient.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-trunk - Build # 1607 - Failure

2011-06-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1607/

10 tests failed.
FAILED:  
org.apache.lucene.util.packed.TestPackedInts.testSortWithScoreAndMaxScoreTracking

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.


REGRESSION:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
this writer hit an OutOfMemoryError; cannot commit

Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot 
commit
at 
org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3724)
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2649)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2720)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2702)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2686)
at 
org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:378)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)


REGRESSION:  org.apache.lucene.index.TestNorms.testNorms

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:605)


REGRESSION:  org.apache.lucene.search.TestFieldCache.testInfoStream

Error Message:
this writer hit an OutOfMemoryError; cannot complete optimize

Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot 
complete optimize
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1696)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1640)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1610)
at 
org.apache.lucene.index.RandomIndexWriter.doRandomOptimize(RandomIndexWriter.java:322)
at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:336)
at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:308)
at org.apache.lucene.search.TestFieldCache.setUp(TestFieldCache.java:84)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)


FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestFieldCache

Error Message:
ensure your setUp() calls super.setUp() and your tearDown() calls 
super.tearDown()!!!

Stack Trace:
junit.framework.AssertionFailedError: ensure your setUp() calls super.setUp() 
and your tearDown() calls super.tearDown()!!!
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:403)


FAILED:  
junit.framework.TestSuite.org.apache.lucene.search.TestNumericRangeQuery32

Error Message:
this writer hit an OutOfMemoryError; cannot commit

Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot 
commit
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2638)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2720)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2702)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2686)
at 
org.apache.lucene.index.RandomIndexWriter.maybeCommit(RandomIndexWriter.java:218)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:166)
at 
org.apache.lucene.search.TestNumericRangeQuery32.beforeClass(TestNumericRangeQuery32.java:88)


FAILED:  
junit.framework.TestSuite.org.apache.lucene.search.TestNumericRangeQuery32

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestNumericRangeQuery32.afterClass(TestNumericRangeQuery32.java:98)


REGRESSION:  org.apache.lucene.search.TestPhraseQuery.testRandomPhrases

Error Message:
Index: 8, Size: 7

Stack Trace:
java.lang.IndexOutOfBoundsException: Index: 8, Size: 7
at java.util.ArrayList.rangeCheck(ArrayList.java:571)
at java.util.ArrayList.get(ArrayList.java:349)
at org.apache.lucene.store.RAMFile.getBuffer(RAMFile.java:70)
at 
org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:154)
  

[jira] [Updated] (LUCENE-3245) Realtime terms dictionary

2011-06-26 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-3245:
-

Attachment: LUCENE-3245.patch

Added and fixed the code that traverses the skip list down to the level-zero 
linked list and iterates over it.

I need to reuse the starts int array; that's next.

 Realtime terms dictionary
 -

 Key: LUCENE-3245
 URL: https://issues.apache.org/jira/browse/LUCENE-3245
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Jason Rutherglen
Priority: Minor
 Attachments: LUCENE-3245.patch, LUCENE-3245.patch


 For LUCENE-2312 we need a realtime terms dictionary.  While 
 ConcurrentSkipListMap may be used, it has drawbacks in terms of high object 
 overhead which can impact GC collection times and heap memory usage.  
 If we implement a skip list that uses primitive backing arrays, we can 
 hopefully have a data structure that is [as] fast and memory efficient.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org