[JENKINS] Solr-trunk - Build # 1544 - Failure
Build: https://builds.apache.org/job/Solr-trunk/1544/ All tests passed Build Log (for compile errors): [...truncated 18107 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2458: -- Attachment: SOLR-2458.patch Attaching final patch which will be committed shortly. Added better error handling. post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: post.jar Fix For: 3.3 Attachments: SOLR-2458.patch, SOLR-2458.patch, SOLR-2458.patch SimplePostTool.java by default tries to issue a commit after posting. Problem is that it does this by appending commit/ to the stream. This does not work when using non-XML requesthandler, such as CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
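One way to read the problem described above: a commit element appended to the posted stream only makes sense when the stream is XML, so a handler-agnostic alternative is to request the commit through a URL parameter. The following is a minimal sketch under that assumption, not the committed patch. The class and method names are hypothetical, and it assumes the update handler in use honors the `commit=true` request parameter.

```java
import java.net.URI;

// Hypothetical helper: builds an update URL that asks for a commit via a
// request parameter instead of appending XML to the posted stream.
final class CommitUrlSketch {

    /** Returns updateUrl with commit=true added to its query string. */
    static String withCommit(String updateUrl) {
        URI uri = URI.create(updateUrl);
        // Use '?' when there is no query yet, '&' when one already exists.
        String separator = (uri.getRawQuery() == null) ? "?" : "&";
        return updateUrl + separator + "commit=true";
    }
}
```

With this shape the same call works for a CSV handler URL and for an XML handler URL, because nothing is written into the document stream itself.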
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055040#comment-13055040 ] Jan Høydahl commented on LUCENE-3130: - The feature is absolutely needed. Probably it's enough to be able to specify a global term boost factor per query for all synonyms, so Robert's method would work for me. Another usecase is Phonetic variants. Currently I use a separate field for phonetic normalization and include it with a lower weight in DisMax. If phonetic variant instead was stored alongside the original with posIncr=0 and tokenType=phonetic, I could instead specify a deboost factor for phonetic terms and even highlighting would work ootb! Yet another is lower/upper case search. If the LowerCaseFilter would keep the original token and add a lowercased token on same posIncr with tokenType=lowercase, we could support case insensitive match with preference for correct case. If user needs different boost for different fields, perhaps the TokenType name could be configurable on each filter. Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts --- Key: LUCENE-3130 URL: https://issues.apache.org/jira/browse/LUCENE-3130 Project: Lucene - Java Issue Type: Improvement Reporter: Hoss Man A recent thread asked if there was anyway to use QueryTime synonyms such that matches on the original term specified by the user would score higher then matches on the synonym. It occurred to me later that a float Attribute could be set by the SynonymFilter in such situations, and QueryParser could use that float as a boost in the resulting Query. 
(This would be fairly straightforward for the simple synonyms = BooleanQuery case, but we'd have to decide how to handle the case of synonyms with multiple terms that produce MTPQ, possibly just punt for now) Likewise, there may be other TokenFilters that inject artificial tokens at query time where it also might make sense to have a reduced boost factor... * SynonymFilter * CommonGramsFilter * WordDelimiterFilter * etc... In all of these cases, the amount of the boost could be configured, and for back compat could default to 1.0 (or null to not set a boost at all) Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied the boost attribute into the payload attribute, these same filters (when used at index time) could give penalizing payloads to terms.
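The proposal for the simple BooleanQuery case can be sketched without any Lucene classes: each token at a position carries a float boost, the original term keeps 1.0, and injected synonyms get a lower value that ends up in the generated query. Everything below (the `Token` class, `toQuery`, the query-string rendering) is hypothetical and only illustrates the idea; it is not the Lucene attribute or QueryParser API.

```java
import java.util.List;

// Hypothetical model of "tokens at one position, each with a boost".
final class SynonymBoostSketch {

    static final class Token {
        final String term;
        final float boost; // 1.0 for the user's original term, lower for injected synonyms

        Token(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    /** Renders tokens that share a position as an OR clause, writing non-default boosts as term^boost. */
    static String toQuery(List<Token> samePosition) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < samePosition.size(); i++) {
            Token t = samePosition.get(i);
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(t.term);
            if (t.boost != 1.0f) {
                sb.append('^').append(t.boost);
            }
        }
        return sb.append(')').toString();
    }
}
```

A default boost of 1.0 leaves the rendered query unchanged, which matches the back-compat default suggested in the issue description.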
[jira] [Commented] (SOLR-2618) Indexing and search on more then one type (Mapping)
[ https://issues.apache.org/jira/browse/SOLR-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055041#comment-13055041 ] Jan Høydahl commented on SOLR-2618: --- You might want to talk to Chris Male who held a talk about improving SolrJ for interacting with domain objects at Berlin Buzzwords: http://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/IntegratingSolrJEEApplications.pdf I think your idea about storing a class name with the document and using reflection to pick the right domain object is interesting.. Indexing and search on more then one type (Mapping) --- Key: SOLR-2618 URL: https://issues.apache.org/jira/browse/SOLR-2618 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 3.2 Reporter: Monica Storfjord Priority: Minor It would be very beneficial for a project that I am currently working on to have the ability to index and search on various subclasses of an object and map the objects directly to the actual domain-object. This functionality exist in Hibernate search for instance. Is this something that future releases have in mind? I would think that this is something that will make the value of Solr more efficient to a lot of users. We are testing SolrJ 3.2 with the use of the SolrJ client and the web interface to index change and search. It should be possible to make a solution that map against a special type field(like field name=classtype type=class) in schemas.xml that are indexed every time and use reflection against the actual class? - Monica -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
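The idea of storing a class name with each document and using reflection to pick the domain object could look roughly like the sketch below. It is an illustration only: the document is modeled as a plain `Map`, the `classtype` field name follows the example in the issue text, and `DomainMapperSketch` and `Book` are hypothetical names, not SolrJ API.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical mapper: picks the domain class from a type field and fills
// its fields by name via reflection.
final class DomainMapperSketch {

    static final String TYPE_FIELD = "classtype";

    // Example domain class, assumed for illustration only.
    static class Book {
        String title;
        int pages;
    }

    static Object toDomainObject(Map<String, Object> doc) {
        try {
            Class<?> type = Class.forName((String) doc.get(TYPE_FIELD));
            Object instance = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : doc.entrySet()) {
                if (TYPE_FIELD.equals(entry.getKey())) {
                    continue; // the type field selects the class, it is not a property
                }
                Field field = type.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(instance, entry.getValue());
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map document to domain object", e);
        }
    }
}
```

A real mapper would probably restrict which class names are acceptable, since instantiating arbitrary classes named in index data is risky.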
[jira] [Commented] (SOLR-2620) Remove log4j jar from the clustering contrib (uses slf4j).
[ https://issues.apache.org/jira/browse/SOLR-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055047#comment-13055047 ] Jan Høydahl commented on SOLR-2620: --- You should commit the CHANGES.TXT entry to trunk as well Remove log4j jar from the clustering contrib (uses slf4j). -- Key: SOLR-2620 URL: https://issues.apache.org/jira/browse/SOLR-2620 Project: Solr Issue Type: Improvement Components: contrib - Clustering Affects Versions: 3.3 Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Trivial Fix For: 3.3 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-2458. --- Resolution: Fixed Fix Version/s: (was: 3.3) Committed for trunk and 3.x post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: post.jar Attachments: SOLR-2458.patch, SOLR-2458.patch, SOLR-2458.patch SimplePostTool.java by default tries to issue a commit after posting. Problem is that it does this by appending commit/ to the stream. This does not work when using non-XML requesthandler, such as CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Need to create new version 3.4 in JIRA
Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to have rights for this -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055050#comment-13055050 ] Robert Muir commented on LUCENE-3130: - {quote} Currently I use a separate field for phonetic normalization and include it with a lower weight in DisMax. If phonetic variant instead was stored alongside the original with posIncr=0 and tokenType=phonetic, I could instead specify a deboost factor for phonetic terms and even highlighting would work ootb! {quote} This doesn't make any sense to me: how is this better shoved into one field than two fields? I don't see any advantage at all. field A with original terms and field B with phonetic terms is no less efficient in the index than having field AB with both mixed up, but keeping them separate keeps code and configurations simple. As for the highlighting, that sounds like a highlighting problem, not an analysis problem. If its often the case that users use things like copyField and do this boosting, then highlighting in Solr needs to be fixed to correlate the offsets back to the original stored field: but we need not make analysis more complicated because of this limitation. {quote} If the LowerCaseFilter would keep the original token and add a lowercased token on same posIncr with tokenType=lowercase, we could support case insensitive match with preference for correct case. {quote} I don't think we should complicate our tokenfilters with such things: in this case I think it would just make the code more complicated and make relevance worse: often case is totally meaningless and boosting terms for some arbitrary reason will skew scores. This is for the same reason as above. If you want to do this, I think you should use two fields, one with no case, and one with case, and boost one of them. 
Re: Need to create new version 3.4 in JIRA
I created this in JIRA over a week ago, it exists! On Sun, Jun 26, 2011 at 7:00 AM, Jan Høydahl jan@cominvent.com wrote: Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to have rights for this -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
Re: Need to create new version 3.4 in JIRA
On Sun, Jun 26, 2011 at 1:00 PM, Jan Høydahl jan@cominvent.com wrote: Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to have rights for this power granted :) you are a JIRA admin now on both Solr and Lucene! simon -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3218: Fix Version/s: 3.4 Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk, the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more important for DocValues and LUCENE-3216: we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved.
[jira] [Updated] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3201: Fix Version/s: (was: 3.3) 3.4 improved compound file handling --- Key: LUCENE-3201 URL: https://issues.apache.org/jira/browse/LUCENE-3201 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3201.patch, LUCENE-3201.patch Currently CompoundFileReader could use some improvements, i see the following problems * its CSIndexInput extends bufferedindexinput, which is stupid for directories like mmap. * it seeks on every readInternal * its not possible for a directory to override or improve the handling of compound files. for example: it seems if you were impl'ing this thing from scratch, you would just wrap the II directly (not extend BufferedIndexInput, and add compound file offset X to seek() calls, and override length(). But of course, then you couldnt throw read past EOF always when you should, as a user could read into the next file and be left unaware. however, some directories could handle this better. for example MMapDirectory could return an indexinput that simply mmaps the 'slice' of the CFS file. its underlying bytebuffer etc naturally does bounds checks already etc, so it wouldnt need to be buffered, not even needing to add any offsets to seek(), as its position would just work. So I think we should try to refactor this so that a Directory can customize how compound files are handled, the simplest case for the least code change would be to add this to Directory.java: {code} public Directory openCompoundInput(String filename) { return new CompoundFileReader(this, filename); } {code} Because most code depends upon the fact compound files are implemented as a Directory and transparent. at least then a subclass could override... but the 'recursion' is a little ugly... we could still label it expert+internal+experimental or whatever. 
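The point about an mmap-style directory returning a 'slice' can be illustrated with plain `java.nio`: a sliced `ByteBuffer` starts at position 0, needs no offset added on seek, and already refuses reads past its own end. This is a sketch of that observation only; `CompoundSliceSketch` and `slice` are hypothetical names and nothing here is Lucene's Directory or IndexInput API.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: one sub-file of a compound file exposed as a
// bounds-checked view over the compound file's buffer.
final class CompoundSliceSketch {

    /** Returns a view of length bytes starting at offset; positions in the view start at 0. */
    static ByteBuffer slice(ByteBuffer compoundFile, int offset, int length) {
        // Work on a duplicate so the caller's position and limit stay untouched.
        ByteBuffer window = compoundFile.duplicate();
        window.position(offset);
        window.limit(offset + length);
        return window.slice();
    }
}
```

Reading beyond the slice raises an exception from the buffer itself, so a reader built on it cannot silently run into the next sub-file.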
[jira] [Updated] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2458: -- Component/s: (was: clients - java) Affects Version/s: (was: 3.1) Fix Version/s: 3.4 post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: post.jar Fix For: 3.4 Attachments: SOLR-2458.patch, SOLR-2458.patch, SOLR-2458.patch SimplePostTool.java by default tries to issue a commit after posting. Problem is that it does this by appending commit/ to the stream. This does not work when using non-XML requesthandler, such as CSV. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Need to create new version 3.4 in JIRA
Perhaps you did it only on the LUCENE JIRA? I had to create it for SOLR just now. Thanks for Admin karma Simon :) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 26 June 2011, at 13:04, Robert Muir wrote: I created this in JIRA over a week ago, it exists! On Sun, Jun 26, 2011 at 7:00 AM, Jan Høydahl jan@cominvent.com wrote: Now that 3.3 is being shipped we need 3.4 version in JIRA. I seem not to have rights for this -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
[jira] [Commented] (SOLR-2620) Remove log4j jar from the clustering contrib (uses slf4j).
[ https://issues.apache.org/jira/browse/SOLR-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055054#comment-13055054 ] Dawid Weiss commented on SOLR-2620: --- This JAR was no longer in trunk -- somebody removed it earlier. Remove log4j jar from the clustering contrib (uses slf4j). -- Key: SOLR-2620 URL: https://issues.apache.org/jira/browse/SOLR-2620 Project: Solr Issue Type: Improvement Components: contrib - Clustering Affects Versions: 3.3 Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Trivial Fix For: 3.3 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Fix Version/s: 3.4 Labels: UpdateProcessor (was: ) Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Labels: UpdateProcessor Fix For: 3.4 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter
[ https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055061#comment-13055061 ] Michael McCandless commented on LUCENE-3212: I think this idea is similar to the CachedFilterIndexReader on LUCENE-1536? See https://issues.apache.org/jira/browse/LUCENE-1536?focusedCommentId=12908914&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12908914 Supply FilterIndexReader based on any o.a.l.search.Filter - Key: LUCENE-3212 URL: https://issues.apache.org/jira/browse/LUCENE-3212 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 When coding LUCENE-2919 (PKIndexSplitter), Mike and I had the idea of how to effectively apply filters on the lowest level (before query execution). This is very useful for e.g. security Filters that simply hide some documents. Currently, when you apply the filter after searching, lots of useless work is done, like scoring filtered documents, iterating term positions (for Phrases),... This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too complicated to implement) that hides filtered documents by returning them in getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work per-segment (without SlowMultiReaderWrapper), so per-segment search remains available and reopening can be done very efficiently, as the filter is only calculated on opening new or changed segments. This filter should improve use-cases where the filter can be applied one time before all queries (like security filters) on (re-)opening the IndexReader.
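The mechanism in the description, evaluating the filter once when a segment is opened and then reporting rejected documents as deleted, can be sketched with `java.util.BitSet`. This is an assumption-laden illustration, not the patch: `FilteredSegmentSketch` and its methods are hypothetical, and the segment is reduced to a document count plus a deleted-docs bit set.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical per-segment view that hides documents rejected by a filter
// by treating them like deleted documents.
final class FilteredSegmentSketch {

    private final int maxDoc;
    private final BitSet hidden;

    /** Evaluates the filter once, at open time, over every document id in the segment. */
    FilteredSegmentSketch(int maxDoc, BitSet alreadyDeleted, IntPredicate filterAccepts) {
        this.maxDoc = maxDoc;
        this.hidden = (BitSet) alreadyDeleted.clone();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!filterAccepts.test(doc)) {
                hidden.set(doc);
            }
        }
    }

    /** True for documents that were really deleted or that the filter rejected. */
    boolean isDeleted(int doc) {
        return hidden.get(doc);
    }

    /** Number of documents still visible to queries. */
    int numDocs() {
        return maxDoc - hidden.cardinality();
    }
}
```

Because the bit set is built per segment, only new or changed segments would need the filter re-evaluated on reopen, which is the efficiency argument made in the issue.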
[jira] [Reopened] (LUCENE-1742) Wrap SegmentInfos in public class
[ https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-1742: Wrap SegmentInfos in public class -- Key: LUCENE-1742 URL: https://issues.apache.org/jira/browse/LUCENE-1742 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch Original Estimate: 48h Remaining Estimate: 48h Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not need to be in the org.apache.lucene.index package. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-1742) Wrap SegmentInfos in public class
[ https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1742: --- Comment: was deleted (was: [Thanks|http://rullymisar.com/]) Wrap SegmentInfos in public class -- Key: LUCENE-1742 URL: https://issues.apache.org/jira/browse/LUCENE-1742 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch Original Estimate: 48h Remaining Estimate: 48h Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not need to be in the org.apache.lucene.index package. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-1742) Wrap SegmentInfos in public class
[ https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1742. Resolution: Fixed Wrap SegmentInfos in public class -- Key: LUCENE-1742 URL: https://issues.apache.org/jira/browse/LUCENE-1742 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch, LUCENE-1742.patch Original Estimate: 48h Remaining Estimate: 48h Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not need to be in the org.apache.lucene.index package. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055063#comment-13055063 ] Michael McCandless commented on LUCENE-3179: Thanks for fixing these Uwe! I actually don't like how generic OBS has become... ie, that all methods have an int and long version, that the OBS doesn't know how many bits it holds (I added this field recently, but only for assertions), that some methods grow the number of bits and others don't, some methods accept out-of-bounds indices (negative and numBits), etc. I think it's grown to accommodate too many users but I'm not sure what we should do to fix this. Maybe factor out (yet another) bit set impl that doesn't grow, knows its number of bits, has these fast getNext/getPrev set bit methods, operates only on int indices, etc. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Assignee: Paul Elschot Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 . -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055064#comment-13055064 ] Michael McCandless commented on LUCENE-3179: bq. One more comment: When working on the code, the symmetry all other methods have between long and int is broken here. For consistency we should add the long method, too. I just don't like the missing consistency. I think we should add the long version, for consistency. bq. Also: OpenBitSet.nextSetBit() does not use Long.numberOfTrailingZeroes() but the new prevSetBit() does. As both methods have intrinsics, why only use one of them? Yonik? Good question! In testing on this issue, above, Dawid and Paul found the intrinsics were faster on modern JREs... seems like nextSetBit should cutover too? OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Assignee: Paul Elschot Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 . -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
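As a rough picture of the bit set described in these two comments (fixed size, knows its number of bits, int indices only, next and previous set-bit lookups built on the `Long` intrinsics), here is a minimal sketch. `FixedBitsSketch` is a hypothetical class written for this illustration and is not OpenBitSet or any other Lucene class.

```java
// Hypothetical fixed-size bit set: never grows, rejects out-of-bounds indices,
// and uses Long.numberOfTrailingZeros / Long.numberOfLeadingZeros for scans.
final class FixedBitsSketch {

    private final long[] words;
    private final int numBits;

    FixedBitsSketch(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    void set(int index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        words[index >> 6] |= 1L << index; // long shifts use only the low 6 bits
    }

    /** First set bit at or after index, or -1; index may equal numBits to end an iteration. */
    int nextSetBit(int index) {
        if (index < 0 || index > numBits) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        if (index == numBits) {
            return -1;
        }
        int i = index >> 6;
        long word = words[i] >>> index; // drop bits below index
        if (word != 0) {
            return index + Long.numberOfTrailingZeros(word);
        }
        while (++i < words.length) {
            if (words[i] != 0) {
                return (i << 6) + Long.numberOfTrailingZeros(words[i]);
            }
        }
        return -1;
    }

    /** Last set bit at or before index, or -1; index may be -1 to end an iteration. */
    int prevSetBit(int index) {
        if (index < -1 || index >= numBits) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        if (index == -1) {
            return -1;
        }
        int i = index >> 6;
        int sub = index & 63;
        long word = words[i] << (63 - sub); // drop bits above index
        if (word != 0) {
            return (i << 6) + sub - Long.numberOfLeadingZeros(word);
        }
        while (--i >= 0) {
            if (words[i] != 0) {
                return (i << 6) + 63 - Long.numberOfLeadingZeros(words[i]);
            }
        }
        return -1;
    }
}
```

Both scans are symmetric here: each masks the starting word with a shift and then falls back to whole-word checks, which is the consistency the comment asks for between the next and previous lookups.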
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055069#comment-13055069 ] Michael McCandless commented on LUCENE-3228: bq. lets just commit the package-list files for all third party libs we use into dev-tools and completely eliminate the need for net when building javadocs. +1 Hitting build failures because we can't download these package lists is silly. build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Made the signature of EasySimilarity.score() a bit saner. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055071#comment-13055071 ] Robert Muir commented on LUCENE-3228: - I agree with hossman too. I'm just a javadocs dummy and was doing what I could to stop the 30-minute builds. I can't figure out this linkoffline (at least in my experiments it's confusing)... but this sounds great. build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com. So I think we should allow you to optionally set a sysprop using linkoffline. Then we would get far fewer fake hudson failures. I feel like Mike opened an issue for this already but I cannot find it.
[jira] [Resolved] (LUCENE-3225) Optimize TermsEnum.seek when caller doesn't need next term
[ https://issues.apache.org/jira/browse/LUCENE-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3225. Resolution: Fixed Optimize TermsEnum.seek when caller doesn't need next term -- Key: LUCENE-3225 URL: https://issues.apache.org/jira/browse/LUCENE-3225 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3225.patch, LUCENE-3225.patch Some codecs are able to save CPU if the caller is only interested in exact matches. EG, Memory codec and SimpleText can do more efficient FSTEnum lookup if they know the caller doesn't need to know the term following the seek term. We have cases like this in Lucene, eg when IW deletes documents by Term, if the term is not found in a given segment then it doesn't need to know the ceiling term. Likewise when TermQuery looks up the term in each segment. I had done this change as part of LUCENE-3030, which is a new terms index that's able to save seeking for exact-only lookups, but now that we have Memory codec that can also save CPU I think we should commit this today. The change adds a boolean onlyExact param to seek(BytesRef). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
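A rough illustration of the exact-only seek idea described above. This is only a sketch under stated assumptions: SeekSketch, its Status enum and the sorted String array are hypothetical stand-ins, not Lucene's TermsEnum or any codec. With a plain binary search the ceiling term is nearly free, so the saving here is purely illustrative; the point is the contract, namely that with onlyExact=true the caller must not rely on the enum being positioned on the following term.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a terms dictionary; NOT Lucene's TermsEnum. */
final class SeekSketch {
    enum Status { FOUND, NOT_FOUND, END }

    private final String[] sortedTerms;
    private String current; // term the enum is positioned on, or null

    SeekSketch(String... terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms);
    }

    /**
     * Seeks to target. When onlyExact is true and the term is missing,
     * the enum is left unpositioned instead of locating the ceiling term.
     */
    Status seek(String target, boolean onlyExact) {
        int idx = Arrays.binarySearch(sortedTerms, target);
        if (idx >= 0) {
            current = sortedTerms[idx];
            return Status.FOUND;
        }
        if (onlyExact) {
            current = null; // caller promised not to need the next term
            return Status.NOT_FOUND;
        }
        int ceiling = -idx - 1; // insertion point = first term > target
        if (ceiling == sortedTerms.length) {
            current = null;
            return Status.END;
        }
        current = sortedTerms[ceiling];
        return Status.NOT_FOUND;
    }

    String term() {
        return current;
    }
}
```

A delete-by-term or TermQuery style lookup would call seek(term, true) and only look at the returned status; a range-style consumer would pass false and read term() afterwards.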
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #160: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/160/ No tests ran. Build Log (for compile errors): [...truncated 12698 lines...]
[jira] [Created] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs
FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs --- Key: LUCENE-3243 URL: https://issues.apache.org/jira/browse/LUCENE-3243 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.2 Environment: Lucene 3.2 Reporter: Jahangir Anwari Priority: Minor Needed to return position offsets along with highlighted snippets when using FVH for highlighting. Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) patch I was able to get the fragInfo for a particular Phrase search. Currently the Toffs(Term offsets) class only stores the start and end offset. To get the position offset, I added the position offset information in Toffs and FieldPhraseList class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs
[ https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jahangir Anwari updated LUCENE-3243: Attachment: (was: LUCENE-3243.patch.diff) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs --- Key: LUCENE-3243 URL: https://issues.apache.org/jira/browse/LUCENE-3243 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.2 Environment: Lucene 3.2 Reporter: Jahangir Anwari Priority: Minor Labels: feature, lucene Needed to return position offsets along with highlighted snippets when using FVH for highlighting. Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) patch I was able to get the fragInfo for a particular Phrase search. Currently the Toffs(Term offsets) class only stores the start and end offset. To get the position offset, I added the position offset information in Toffs and FieldPhraseList class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs
[ https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jahangir Anwari updated LUCENE-3243: Attachment: LUCENE-3243.patch.diff FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs --- Key: LUCENE-3243 URL: https://issues.apache.org/jira/browse/LUCENE-3243 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.2 Environment: Lucene 3.2 Reporter: Jahangir Anwari Priority: Minor Labels: feature, lucene Needed to return position offsets along with highlighted snippets when using FVH for highlighting. Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) patch I was able to get the fragInfo for a particular Phrase search. Currently the Toffs(Term offsets) class only stores the start and end offset. To get the position offset, I added the position offset information in Toffs and FieldPhraseList class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3243) FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs
[ https://issues.apache.org/jira/browse/LUCENE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jahangir Anwari updated LUCENE-3243: Attachment: LUCENE-3243.patch.diff FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs --- Key: LUCENE-3243 URL: https://issues.apache.org/jira/browse/LUCENE-3243 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.2 Environment: Lucene 3.2 Reporter: Jahangir Anwari Priority: Minor Labels: feature, lucene Attachments: LUCENE-3243.patch.diff Needed to return position offsets along with highlighted snippets when using FVH for highlighting. Using the ([LUCENE-3141|https://issues.apache.org/jira/browse/LUCENE-3141]) patch I was able to get the fragInfo for a particular Phrase search. Currently the Toffs(Term offsets) class only stores the start and end offset. To get the position offset, I added the position offset information in Toffs and FieldPhraseList class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector
[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055096#comment-13055096 ] Michael McCandless commented on LUCENE-3171: bq. The possible inefficiency is the same as the one for any sparsely filled OpenBitSet. Ahh, OK. Though, I suspect this (the linear scan OBS does for next/prevSetBit) is a minor cost overall, if indeed the app has so many child docs per parent that a sparse bit set would be warranted? Ie, the Query/Collector would still be visiting these many child docs per parent, I guess? (Unless the query hits few results). I don't think a jdoc warning is really required for this... but I'm fine if you want to add one? I'll commit this soon and resolve LUCENE-2454 as duplicate! BlockJoinQuery/Collector Key: LUCENE-3171 URL: https://issues.apache.org/jira/browse/LUCENE-3171 Project: Lucene - Java Issue Type: Improvement Components: modules/other Reporter: Michael McCandless Fix For: 3.3, 4.0 Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch I created a single-pass Query + Collector to implement nested docs. The approach is similar to LUCENE-2454, in that the app must index documents in join order, as a block (IW.add/updateDocuments), with the parent doc at the end of the block, except that this impl is one pass. Once you join at indexing time, you can take any query that matches child docs and join it up to the parent docID space, using BlockJoinQuery. You then use BlockJoinCollector, which sorts parent docs by provided Sort, to gather results, grouped by parent; this collector finds any BlockJoinQuerys (using Scorer.visitScorers) and retains the child docs corresponding to each collected parent doc. After searching is done, you retrieve the TopGroups from a provided BlockJoinQuery. 
Like LUCENE-2454, this is less general than the arbitrary joins in Solr (SOLR-2272) or parent/child from ElasticSearch (https://github.com/elasticsearch/elasticsearch/issues/553), since you must do the join at indexing time as a doc block, but it should be able to handle nested joins as well as joins to multiple tables, though I don't yet have test cases for these. I put this in a new Join module (modules/join); I think as we refactor join impls we should put them here. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] release 3.3 (take two)
+1 Mike McCandless http://blog.mikemccandless.com On Sun, Jun 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote: Artifacts here: http://s.apache.org/lusolr330rc1 working release notes here: http://wiki.apache.org/lucene-java/ReleaseNote33 http://wiki.apache.org/solr/ReleaseNote33 To see the changes between the previous release candidate (rc0): svn diff -r 1139028:1139775 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3 Here is my +1
[jira] [Commented] (LUCENE-2949) FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper
[ https://issues.apache.org/jira/browse/LUCENE-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055115#comment-13055115 ] Mike Sokolov commented on LUCENE-2949: -- This looks like the same issue as LUCENENET-350? FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper - Key: LUCENE-2949 URL: https://issues.apache.org/jira/browse/LUCENE-2949 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.0.3, 4.0 Reporter: Grant Ingersoll Assignee: Koji Sekiguchi Priority: Minor Labels: FastVectorHighlighter, Highlighter Fix For: 3.3 Attachments: LUCENE-2949.patch Based on my reading of the FieldTermStack constructor that loads the vector from disk, we could probably save a bunch of time and memory by using the TermVectorMapper callback mechanism instead of materializing the full array of terms into memory and then throwing most of them out. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter
[ https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055116#comment-13055116 ] Uwe Schindler commented on LUCENE-3212: --- It's similar, but I don't understand the impl there. I would simply override getDeletedDocs to return the deleted docs ORed with the filtered docs. Then you don't need to override terms() and fields(). Supply FilterIndexReader based on any o.a.l.search.Filter - Key: LUCENE-3212 URL: https://issues.apache.org/jira/browse/LUCENE-3212 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 When coding LUCENE-2919 (PKIndexSplitter), Mike and I had the idea of how to effectively apply filters at the lowest level (before query execution). This is very useful for e.g. security Filters that simply hide some documents. Currently, when you apply the filter after searching, lots of useless work is done, like scoring filtered documents, iterating term positions (for Phrases),... This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too complicated to implement) that hides filtered documents by returning them in getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work per-segment (without SlowMultiReaderWrapper), so per-segment search stays available and reopening can be done very efficiently, as the filter is only calculated on opening new or changed segments. This filter should improve use-cases where the filter can be applied one time before all queries (like security filters) on (re-)opening the IndexReader.
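To make the "deleted docs ORed with the filtered docs" suggestion concrete, here is a minimal sketch. It is an assumption-laden illustration: it uses java.util.BitSet rather than Lucene's Bits/OpenBitSet, and FilteredDeletesSketch and effectiveDeletes are hypothetical names, not part of any patch on this issue.

```java
import java.util.BitSet;

/** Hypothetical helper; uses java.util.BitSet, not Lucene classes. */
final class FilteredDeletesSketch {
    /**
     * Returns the set a filtering reader could report as "deleted":
     * every doc that is really deleted OR not accepted by the filter.
     *
     * @param deleted  docs deleted in the segment
     * @param accepted docs the filter lets through
     * @param maxDoc   number of docs in the segment
     */
    static BitSet effectiveDeletes(BitSet deleted, BitSet accepted, int maxDoc) {
        BitSet hidden = new BitSet(maxDoc);
        hidden.set(0, maxDoc);    // start with every doc hidden
        hidden.andNot(accepted);  // un-hide docs the filter accepts
        hidden.or(deleted);       // re-hide docs that are actually deleted
        return hidden;
    }
}
```

Because the result is computed once per segment, it would only need recomputing for new or changed segments on reopen, which matches the per-segment argument in the issue description.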
[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3179: -- Attachment: LUCENE-3179-long-ntz.patch Here is the patch with the long version, using Long.numberOfTrailingZeros() instead of BitUtil.ntz(). The patch was already available in my checkout. We should also test the long versions (according to Clover, none of them is really tested). OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Assignee: Paul Elschot Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution (LUCENE-2454).
[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3179: -- Attachment: LUCENE-3179-long-ntz.patch New patch that also improves tests to check all uncovered long methods (of course the indexes are still below Integer.MAX_VALUE). OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Assignee: Paul Elschot Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution (LUCENE-2454).
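For readers who want to see what a prevSetBit lookup involves, here is a minimal sketch over a plain long[] of 64-bit words. It is not the code from any of the attached patches: PrevSetBitSketch is a hypothetical name, and the use of Long.numberOfLeadingZeros() is my assumption about one reasonable way to find the highest set bit in a word.

```java
/** Hypothetical sketch over a raw long[]; NOT the OpenBitSet implementation. */
final class PrevSetBitSketch {
    /**
     * Returns the index of the highest set bit at or before index,
     * or -1 if there is none. Bit i lives in words[i >> 6] at position i & 63.
     */
    static int prevSetBit(long[] words, int index) {
        if (index < 0 || words.length == 0) {
            return -1;
        }
        int wordIdx = index >> 6;
        long word;
        if (wordIdx >= words.length) {
            // index is past the end: start from the last word, unmasked
            wordIdx = words.length - 1;
            word = words[wordIdx];
        } else {
            // drop the bits above (index & 63) so only bits at or before index remain
            int shift = 63 - (index & 63);
            word = (words[wordIdx] << shift) >>> shift;
        }
        while (true) {
            if (word != 0) {
                return (wordIdx << 6) + 63 - Long.numberOfLeadingZeros(word);
            }
            if (--wordIdx < 0) {
                return -1;
            }
            word = words[wordIdx]; // linear scan backwards over empty words
        }
    }
}
```

The backwards loop over empty words is the linear scan mentioned in the LUCENE-3171 discussion: it only becomes noticeable when the set is very sparse.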
[jira] [Updated] (LUCENE-3231) Add fixed size DocValues int variants expose Arrays where possible
[ https://issues.apache.org/jira/browse/LUCENE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3231: Attachment: LUCENE-3231.patch Here is a new patch: * adds Field API for new int types * adds tests for getArray / hasArray * adds tests for new Int types * unifies some of the existing tests * adds javadocs I think we're ready here... all tests pass. Add fixed size DocValues int variants expose Arrays where possible Key: LUCENE-3231 URL: https://issues.apache.org/jira/browse/LUCENE-3231 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3231.patch, LUCENE-3231.patch Currently we only have a variable bit-packed ints implementation. For flexible scoring or loading field caches it is desirable to have fixed int implementations for 8, 16, 32 and 64 bits.
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1536: --- Attachment: LUCENE-1536.patch Initial patch for trunk... lots of nocommits, but tests all pass and I think this is [roughly] the approach we should take to get fast(er) Filter perf. Conceptually, this change is fairly easy, because the flex APIs all accept a Bits to apply low-level filtering. However, this Bits is inverted vs the Filter that callers pass to IndexSearcher (skipDocs vs keepDocs), so my patch inverts 1) the meaning of this first arg to the Docs/AndPositions enums (it becomes an acceptDocs instead of skipDocs), and 2) deleted docs coming back from IndexReaders (renames IR.getDeletedDocs -> IR.getNotDeletedDocs). That change (inverting the Bits to be keepDocs not skipDocs) is the vast majority of the patch. The real change is to add DocIdSet.getRandomAccessBits and bitsIncludesDeletedDocs, which IndexSearcher then consults to figure out whether to push the filter low instead of high. I then fixed OpenBitSet to return this from getRandomAccessBits, and fixed CachingWrapperFilter to turn this on/off as well as state whether deleted docs were folded into the filter. This means filters cached with CachingWrapperFilter will apply low, and if it's DeletesMode.RECACHE then it's a single filter that's applied (else I wrap with an AND NOT deleted check per docID), but custom filters are also free to impl these methods to have their filters applied low. 
if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 2.4 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to iterator was a very sizable performance hit. Some notes on the test: * Index is first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. * I test across multiple queries. 1-X means an OR query, eg 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 AND 3 AND 4. u s means united states (phrase search). * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)). * Method high means I use random-access filter API in IndexSearcher's main loop. Method low means I use random-access filter API down in SegmentTermDocs (just like deleted docs today). * Baseline (QPS) is current trunk, where filter is applied as iterator up high (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
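The difference between applying a filter as an iterator and applying it through a random-access check can be sketched as follows. This is a toy model under stated assumptions: postings are a sorted int[], the random-access filter is a java.util.function.IntPredicate, and FilterStrategySketch with its two methods is a hypothetical illustration rather than IndexSearcher or SegmentTermDocs code. Both methods must return the same hit count; they differ only in how the filter is consulted.

```java
import java.util.function.IntPredicate;

/** Hypothetical toy model of the two filter strategies; NOT Lucene code. */
final class FilterStrategySketch {
    /**
     * Random-access style: while walking the postings, ask the filter
     * about each doc directly (the way deleted docs are checked).
     */
    static int countRandomAccess(int[] postings, IntPredicate acceptDocs) {
        int hits = 0;
        for (int doc : postings) {
            if (acceptDocs.test(doc)) {
                hits++;
            }
        }
        return hits;
    }

    /**
     * Iterator style: treat the filter as a second sorted doc iterator
     * and leapfrog the two lists to find their intersection.
     */
    static int countIterator(int[] postings, int[] filterDocs) {
        int hits = 0;
        int i = 0;
        int j = 0;
        while (i < postings.length && j < filterDocs.length) {
            if (postings[i] == filterDocs[j]) {
                hits++;
                i++;
                j++;
            } else if (postings[i] < filterDocs[j]) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }
}
```

With a java.util.BitSet holding the accepted docs, the random-access variant can be called as countRandomAccess(postings, bits::get).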
[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter
[ https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055146#comment-13055146 ] Michael McCandless commented on LUCENE-3212: That's a good point -- I'm not sure why I didn't just override getDeletedDocs! It seems like that should work fine. Supply FilterIndexReader based on any o.a.l.search.Filter - Key: LUCENE-3212 URL: https://issues.apache.org/jira/browse/LUCENE-3212 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 When coding LUCENE-2919 (PKIndexSplitter), Mike and I had the idea of how to effectively apply filters at the lowest level (before query execution). This is very useful for e.g. security Filters that simply hide some documents. Currently, when you apply the filter after searching, lots of useless work is done, like scoring filtered documents, iterating term positions (for Phrases),... This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too complicated to implement) that hides filtered documents by returning them in getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work per-segment (without SlowMultiReaderWrapper), so per-segment search stays available and reopening can be done very efficiently, as the filter is only calculated on opening new or changed segments. This filter should improve use-cases where the filter can be applied one time before all queries (like security filters) on (re-)opening the IndexReader.
[jira] [Resolved] (LUCENE-3203) Rate-limit IO used by merging
[ https://issues.apache.org/jira/browse/LUCENE-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3203. Resolution: Fixed Fix Version/s: (was: 3.3) (was: 4.0) IOContext branch Rate-limit IO used by merging - Key: LUCENE-3203 URL: https://issues.apache.org/jira/browse/LUCENE-3203 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: IOContext branch Attachments: LUCENE-3203.patch, LUCENE-3203.patch Large merges can mess up searches and increase NRT reopen time (see http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html). A simple rate limiter improves the spikey NRT reopen times during big merges, so I think we should somehow make this possible. Likely this would reduce impact on searches as well. Typically apps that do indexing and searching on same box are in no rush to see the merges complete so this is a good tradeoff. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
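As an illustration of what "a simple rate limiter" for merge IO could look like, here is a minimal sketch. It is not the committed patch: MergeRateLimiterSketch, pauseNanos and the MB/sec constructor argument are hypothetical, and time is passed in explicitly so the arithmetic can be checked without actually sleeping.

```java
/** Hypothetical sketch of a byte-rate limiter; NOT the LUCENE-3203 patch. */
final class MergeRateLimiterSketch {
    private final double nanosPerByte;
    private long nextAllowedNanos; // earliest time the next write may start

    MergeRateLimiterSketch(double mbPerSec, long startNanos) {
        this.nanosPerByte = 1_000_000_000.0 / (mbPerSec * 1024 * 1024);
        this.nextAllowedNanos = startNanos;
    }

    /**
     * Accounts for a write of the given size at time nowNanos and returns
     * how long the writer should pause first so the average rate stays
     * at or below the configured limit.
     */
    long pauseNanos(long bytes, long nowNanos) {
        long startAt = Math.max(nextAllowedNanos, nowNanos);
        nextAllowedNanos = startAt + (long) (bytes * nanosPerByte);
        return startAt - nowNanos;
    }
}
```

A merge thread would call pauseNanos(bytesAboutToWrite, System.nanoTime()) before each write and sleep for the returned duration, which trades slower merges for less contention with searches and NRT reopens.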
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055151#comment-13055151 ] Bill Bell commented on SOLR-2242: - OK. Here are some test cases. I am getting a weird error on running it: ant -Dtestcase=NumFacetTermsFacetsTest test {code} junit-sequential: [junit] Testsuite: org.apache.solr.request.NumFacetTermsFacetsTest [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 4.072 sec [junit] [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=NumFacetTermsFacetsTest -Dtestmethod=testNumFacetTermsFacetCounts -Dtests.seed=3921835369594659663:-3219730304883530389 [junit] *** BEGIN org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: Insane FieldCache usage(s) *** [junit] SUBREADER: Found caches for descendants of DirectoryReader(segments_3 _0(4.0):C6)+hgid_i1 [junit] 'DirectoryReader(segments_3 _0(4.0):C6)'='hgid_i1',class org.apache.lucene.search.FieldCache$DocTermsIndex,org.apache.lucene.search.cache.DocTermsIndexCreator@603bb3eb=org.apache.lucene.search.cache.DocTermsIndexCreator$DocTermsIndexImpl#1026179434 (size =~ 372 bytes) [junit] 'org.apache.lucene.index.SegmentCoreReaders@7e8905bd'='hgid_i1',int,org.apache.lucene.search.cache.IntValuesCreator@30781822=org.apache.lucene.search.cache.CachedArray$IntValues#291172425 (size =~ 92 bytes) [junit] [junit] *** END org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: Insane FieldCache usage(s) *** [junit] - --- [junit] Testcase: testNumFacetTermsFacetCounts(org.apache.solr.request.NumFacetTermsFacetsTest): FAILED [junit] org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: Insane FieldCache usage(s) found expected:0 but was:1 [junit] junit.framework.AssertionFailedError: org.apache.solr.request.NumFacetTermsFacetsTest.testNumFacetTermsFacetCounts: Insane FieldCache usage(s) found expected:0 but was:1 [junit] at 
org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:725) [junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:620) [junit] at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:96) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348) [junit] [junit] [junit] Test org.apache.solr.request.NumFacetTermsFacetsTest FAILED {code} Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example:
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
This currently only works on facet.field. 
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups).
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated SOLR-2242: Attachment: SOLR-2242-notworkingtest.patch The test case gives an error. I am not familiar with this error. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example:
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). 
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055155#comment-13055155 ] Bill Bell commented on SOLR-2242: - I think it has to do with an NPE in grouping on 4.0; it fails in other code. Nothing to do with this patch.
{code}
assertQ("check group and facet counts with numFacetTerms=1",
    req("q", "id:[1 TO 6]"
        ,"indent", "on"
        ,"facet", "true"
        ,"group", "true"
        ,"group.field", "hgid_i1"
        ,"f.hgid_i1.facet.limit", "-1"
        ,"f.hgid_i1.facet.mincount", "1"
        ,"f.hgid_i1.facet.numFacetTerms", "1"
        ,"facet.field", "hgid_i1"
    )
    ,"*[count(//arr[@name='groups'])=1]"
    ,"*[count(//lst[@name='facet_fields']/lst[@name='hgid_i1']/int)=1]" // there are 1 unique items
    ,"//lst[@name='hgid_i1']/int[@name='numFacetTerms'][.='4']"
);
{code}
Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example:
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [VOTE] release 3.3 (take two)
+1 I looked at the differences, and then just ran tests on the Solr and Lucene source tarballs. Steve -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Sunday, June 26, 2011 11:12 AM To: dev@lucene.apache.org Subject: [VOTE] release 3.3 (take two) Artifacts here: http://s.apache.org/lusolr330rc1 working release notes here: http://wiki.apache.org/lucene-java/ReleaseNote33 http://wiki.apache.org/solr/ReleaseNote33 To see the changes between the previous release candidate (rc0): svn diff -r 1139028:1139775 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3 Here is my +1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055158#comment-13055158 ] Bill Bell commented on SOLR-2242: -
{code}
junit-sequential:
[junit] Testsuite: org.apache.solr.request.NumFacetTermsFacetsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.48 sec
[junit]
{code}
I fixed the NamedList() generic too. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example:
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated SOLR-2242: Attachment: SOLR-2242.shard.withtests.patch I left the group in there, we can uncomment when it starts working again (if it does). Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example:
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] release 3.3 (take two)
+1 All tests are fine on both Infinispan and Hibernate Search. While I understand that often APIs needed changes, I'm very happy to state that for the first time three major releases are fully API compatible! (As far as tested on these projects, Lucene versions 3.1.0, 3.2.0, 3.3.0 are drop-in compatible replacements) Regards, Sanne 2011/6/26 Steven A Rowe sar...@syr.edu: +1 I looked at the differences, and then just ran tests on the Solr and Lucene source tarballs. Steve -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Sunday, June 26, 2011 11:12 AM To: dev@lucene.apache.org Subject: [VOTE] release 3.3 (take two) Artifacts here: http://s.apache.org/lusolr330rc1 working release notes here: http://wiki.apache.org/lucene-java/ReleaseNote33 http://wiki.apache.org/solr/ReleaseNote33 To see the changes between the previous release candidate (rc0): svn diff -r 1139028:1139775 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3 Here is my +1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055169#comment-13055169 ] Uwe Schindler commented on LUCENE-1536: --- Hi Mike, nice patch, only a little bit big. I reviewed the essential parts like applying the filter in IndexSearcher, really cool. Also CachingWrapperFilter looks fine (not closely reviewed). My question: Do we really need to make the delDocs inverse in *this* issue? The IndexSearcher impl can also be done using a simple OrNotBits(delDocs, filterDocs) wrapper implementation (instead of AndBits) and NotBits (if no delDocs are available)? The patch is unreadable because of that. In general, reversing the delDocs might be a good idea, but we should do it separately and as a hard switch (not allowing both variants to be implemented by IndexReader & Co.). The method name getNotDeletedDocs() should also be getVisibleDocs() or similar [I don't like double negation]. About the filters: I like the new API (it is as discussed before), so the DocIdSet is extended by an optional getBits() method, defaulting to null. About the impls: FieldCacheRangeFilter can also implement getBits() directly, as FieldCache is random access. It should just return its own Bits impl for the DocIdSet that checks the filtering in get(index). if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 2.4 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API.
This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to iterator was a very sizable performance hit. Some notes on the test: * Index is first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. * I test across multiple queries. 1-X means an OR query, eg 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 AND 3 AND 4. u s means united states (phrase search). * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)). * Method high means I use random-access filter API in IndexSearcher's main loop. Method low means I use random-access filter API down in SegmentTermDocs (just like deleted docs today). * Baseline (QPS) is current trunk, where filter is applied as iterator up high (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
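One possible reading of the OrNotBits(delDocs, filterDocs) wrapper mentioned in the review comment is a random-access view in the "set bit means skip this document" convention: a document is hidden if it is deleted or if the filter does not accept it. The sketch below is written under that assumption and is not taken from the patch. The `Bits` interface here is a local stand-in declared so the example compiles on its own, and `ArrayBits` is a hypothetical helper for the demonstration.

```java
/** Local stand-in for a random-access bit view; declared here so the sketch is self-contained. */
interface Bits {
    boolean get(int index);
    int length();
}

/** Hypothetical helper that exposes a boolean[] as Bits, for demonstration only. */
final class ArrayBits implements Bits {
    private final boolean[] bits;

    ArrayBits(boolean... bits) {
        this.bits = bits;
    }

    public boolean get(int index) {
        return bits[index];
    }

    public int length() {
        return bits.length;
    }
}

/**
 * Assumed semantics: get(doc) is true when the document must be skipped,
 * i.e. it is deleted OR the filter does NOT accept it.
 */
final class OrNotBits implements Bits {
    private final Bits delDocs;    // set bit = deleted; may be null when there are no deletions
    private final Bits filterDocs; // set bit = accepted by the filter

    OrNotBits(Bits delDocs, Bits filterDocs) {
        this.delDocs = delDocs;
        this.filterDocs = filterDocs;
    }

    public boolean get(int doc) {
        boolean deleted = delDocs != null && delDocs.get(doc);
        return deleted || !filterDocs.get(doc);
    }

    public int length() {
        return filterDocs.length();
    }
}
```

With a null `delDocs` the wrapper degenerates to a plain negation of the filter, which corresponds to the NotBits case in the comment.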
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055181#comment-13055181 ] Uwe Schindler commented on LUCENE-1536: --- One more comment about DocIdSet.bitsIncludesDeletedDocs(). I think the default in DocIdSet and of course OpenBitSet should be true, because current filters always respect deleted docs (this was a requirement: MTQ uses deleted docs, FCRF explicitly ands it in). So the default is fine here. Of course CachingWrapperFilter sets this to false if the SegmentReader got new deletes. if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 2.4 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to iterator was a very sizable performance hit. Some notes on the test: * Index is first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. * I test across multiple queries. 1-X means an OR query, eg 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 AND 3 AND 4. u s means united states (phrase search). * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)).
* Method high means I use random-access filter API in IndexSearcher's main loop. Method low means I use random-access filter API down in SegmentTermDocs (just like deleted docs today). * Baseline (QPS) is current trunk, where filter is applied as iterator up high (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter
[ https://issues.apache.org/jira/browse/LUCENE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055185#comment-13055185 ] Uwe Schindler commented on LUCENE-3212: --- I don't think this issue is made obsolete by LUCENE-1536: if you have one filter that's applied every time for one user, maybe for all his queries, it can live as long as the SegmentReader lives. So simply wrapping the IndexReader with a Filter has much more flexibility, as it's done once when creating the IndexReader, so I think this filter could additionally live in contrib. If we have RandomAccessFilters, this one and also PKIndexSplitter (which will only use this FIR and drop its own impl) can directly use the Bits supplied by the Filter's DocIdSet. Supply FilterIndexReader based on any o.a.l.search.Filter - Key: LUCENE-3212 URL: https://issues.apache.org/jira/browse/LUCENE-3212 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 When coding LUCENE-2919 (PKIndexSplitter), Mike and I had the idea of how to effectively apply filters on the lowest level (before query execution). This is very useful for e.g. security Filters that simply hide some documents. Currently, when you apply the filter after searching, lots of useless work is done, like scoring filtered documents, iterating term positions (for Phrases),... This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too complicated to implement) that hides filtered documents by returning them in getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work per-segment (without SlowMultiReaderWrapper), so per-segment search remains available and reopening can be done very efficiently, as the filter is only calculated on opening new or changed segments.
This filter should improve use-cases where the filter can be applied one time before all queries (like security filters) on (re-)opening the IndexReader. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] release 3.3 (take two)
+1 On Mon, Jun 27, 2011 at 8:32 AM, Sanne Grinovero sanne.grinov...@gmail.com wrote: +1 All tests are fine on both Infinispan and Hibernate Search. While I understand that often APIs needed changes, I'm very happy to state that for the first time three major releases are fully API compatible! (As far as tested on these projects, Lucene versions 3.1.0, 3.2.0, 3.3.0 are drop-in compatible replacements) Regards, Sanne 2011/6/26 Steven A Rowe sar...@syr.edu: +1 I looked at the differences, and then just ran tests on the Solr and Lucene source tarballs. Steve -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Sunday, June 26, 2011 11:12 AM To: dev@lucene.apache.org Subject: [VOTE] release 3.3 (take two) Artifacts here: http://s.apache.org/lusolr330rc1 working release notes here: http://wiki.apache.org/lucene-java/ReleaseNote33 http://wiki.apache.org/solr/ReleaseNote33 To see the changes between the previous release candidate (rc0): svn diff -r 1139028:1139775 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3 Here is my +1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Attachment: SOLR-1979.patch Fixed threshold so that Tika distance 0.1 gives certainty 0.5 and distance 0.02 gives certainty 0.9. The default threshold of 0.5 now works pretty well, at least for the tests...
*New parameters:*
Field name mapping is now configurable with a user-defined pattern, so to map ABC_title to title_lang, you set:
{code}
langid.map.pattern=ABC_(.*)
langid.map.replace=$1_{lang}
{code}
A parameter to map multiple detected languages to the same field regex. For example, to map Japanese, Korean and Chinese texts to a field *_cjk, do:
{code}langid.map.lcmap=jp:cjk zh:cjk ko:cjk{code}
Turn off validation of field names against schema (useful if you want to rename or delete fields later in the UpdateChain):
{code}langid.enforceSchema=false{code}
*Other changes*
Removed default on langField, i.e. if langField is not specified, the detected language will not be written anywhere. A typical minimal config for only detecting language and writing to a field is now:
{code}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <defaults>
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
  </defaults>
</processor>
{code}
Also added multiple other languages to the tests. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Labels: UpdateProcessor Fix For: 3.4 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor.
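To make the effect of langid.map.pattern, langid.map.replace and langid.map.lcmap concrete, here is a minimal sketch of how such a mapping could be evaluated with java.util.regex. It is an illustration under stated assumptions, not the processor's actual code: the class `LangFieldMapper` and its methods are hypothetical, and it assumes the pattern must match the whole field name and that `{lang}` is substituted before the regex replacement is applied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the field-name mapping parameters described above. */
final class LangFieldMapper {

    /** Parses a space-separated list such as "jp:cjk zh:cjk ko:cjk" into a language-code map. */
    static Map<String, String> parseLcMap(String spec) {
        Map<String, String> map = new HashMap<>();
        for (String pair : spec.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // empty specification
            }
            String[] kv = pair.split(":", 2);
            if (kv.length == 2) {
                map.put(kv[0], kv[1]);
            }
        }
        return map;
    }

    /**
     * Rewrites a field name using the pattern/replace pair; "{lang}" in the replacement
     * becomes the (possibly lcmap-translated) language code. Non-matching names are unchanged.
     */
    static String mapField(String field, String pattern, String replace,
                           String lang, Map<String, String> lcMap) {
        String code = lcMap.containsKey(lang) ? lcMap.get(lang) : lang;
        Matcher m = Pattern.compile(pattern).matcher(field);
        if (!m.matches()) {
            return field;
        }
        // quoteReplacement keeps '$' or '\' in the language code from being read as regex syntax
        return m.replaceAll(replace.replace("{lang}", Matcher.quoteReplacement(code)));
    }
}
```

Under these assumptions, `mapField("ABC_title", "ABC_(.*)", "$1_{lang}", "en", lcMap)` gives `title_en`, and a detected language of `zh` with the lcmap above gives `title_cjk`.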
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9102 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9102/ All tests passed Build Log (for compile errors): [...truncated 16720 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male With adding a new 'queries' module, I am trying to change the project name of contrib/queries to queries-contrib. However currently the contrib-uptodate assumes that the name property is used in the path and in the jar name. By using the name in the path, I must set the value to 'queries' (since the path is contrib/queries). However because the project name is now queries-contrib, the actual jar file will be lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as is expected. Consequently I think we need to separate the path name from the jar name properties. For simplicity I think adding a new jar-name property will suffice, which can be optional and if omitted, is filled in with the name property. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055239#comment-13055239 ] Chris Male commented on LUCENE-3244: Actually I now see the ability to set the full jarfile in the contrib-uptodate macro. I still want to avoid this, since it requires the invoker of the macro to know the full path. Instead I think having an optional 'project-name' property will suffice. Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male With adding a new 'queries' module, I am trying to change the project name of contrib/queries to queries-contrib. However currently the contrib-uptodate assumes that the name property is used in the path and in the jar name. By using the name in the path, I must set the value to 'queries' (since the path is contrib/queries). However because the project name is now queries-contrib, the actual jar file will be lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as is expected. Consequently I think we need to separate the path name from the jar name properties. For simplicity I think adding a new jar-name property will suffice, which can be optional and if omitted, is filled in with the name property. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3244: --- Attachment: LUCENE-3244.patch Patch adds contrib-src-name attribute to contrib-uptodate. This allows the name of the src for the contrib to be different from the contrib's project name. The name attribute is now assumed to be the project name. If the contrib-src-name property is omitted, name is used. I have code that makes use of this (in changing the queries contrib to queries-contrib) and have verified it works. It'd be great if someone could review this to see any implications I might have missed. Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Attachments: LUCENE-3244.patch With adding a new 'queries' module, I am trying to change the project name of contrib/queries to queries-contrib. However currently the contrib-uptodate assumes that the name property is used in the path and in the jar name. By using the name in the path, I must set the value to 'queries' (since the path is contrib/queries). However because the project name is now queries-contrib, the actual jar file will be lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as is expected. Consequently I think we need to separate the path name from the jar name properties. For simplicity I think adding a new jar-name property will suffice, which can be optional and if omitted, is filled in with the name property. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2383: -- Attachment: SOLR-2383-branch_3x.patch This patch (SOLR-2383-branch_3x.patch) works with 3x branch. Ready for commit? Velocity: Generalize range and date facet display - Key: SOLR-2383 URL: https://issues.apache.org/jira/browse/SOLR-2383 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Assignee: Grant Ingersoll Labels: facet, range, velocity Fix For: 3.3 Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch Velocity (/browse) GUI has hardcoded price range facet and a hardcoded manufacturedate_dt date facet. Need general solution which work for any facet.range and facet.date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055265#comment-13055265 ] Steven Rowe commented on LUCENE-3244: - bq. Can the good stuff in the queries contrib move to the module, and the sandbox stuff (if any) go somewhere else?! +1 Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Attachments: LUCENE-3244.patch With adding a new 'queries' module, I am trying to change the project name of contrib/queries to queries-contrib. However currently the contrib-uptodate assumes that the name property is used in the path and in the jar name. By using the name in the path, I must set the value to 'queries' (since the path is contrib/queries). However because the project name is now queries-contrib, the actual jar file will be lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as is expected. Consequently I think we need to separate the path name from the jar name properties. For simplicity I think adding a new jar-name property will suffice, which can be optional and if omitted, is filled in with the name property. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2383: -- Fix Version/s: (was: 3.3) 4.0 3.4 Velocity: Generalize range and date facet display - Key: SOLR-2383 URL: https://issues.apache.org/jira/browse/SOLR-2383 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Assignee: Grant Ingersoll Labels: facet, range, velocity Fix For: 3.4, 4.0 Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch Velocity (/browse) GUI has hardcoded price range facet and a hardcoded manufacturedate_dt date facet. Need general solution which work for any facet.range and facet.date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-3234: --- Attachment: LUCENE-3234.patch Updated patch attached. I added CHANGES.txt entries for Lucene and Solr, used Integer.MAX_VALUE for the default and added @param for phraseLimit in the new constructor javadoc. Will commit soon. Provide limit on phrase analysis in FastVectorHighlighter - Key: LUCENE-3234 URL: https://issues.apache.org/jira/browse/LUCENE-3234 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3 Reporter: Mike Sokolov Assignee: Koji Sekiguchi Fix For: 3.4, 4.0 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch With larger documents, FVH can spend a lot of time trying to find the best-scoring snippet as it examines every possible phrase formed from matching terms in the document. If one is willing to accept less-than-perfect scoring by limiting the number of phrases that are examined, substantial speedups are possible. This is analogous to the Highlighter limit on the number of characters to analyze. The patch includes an artificial test case that shows 1000x speedup. In a more normal test environment, with English documents and random queries, I am seeing speedups of around 3-10x when setting phraseLimit=1, which has the effect of selecting the first possible snippet in the document. Most of our sites operate in this way (just show the first snippet), so this would be a big win for us. With phraseLimit = -1, you get the existing FVH behavior. At larger values of phraseLimit, you may not get substantial speedup in the normal case, but you do get the benefit of protection against blow-up in pathological cases. -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
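The phraseLimit idea described in LUCENE-3234 can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class PhraseLimitSketch, the Candidate type and the pickSnippet method are hypothetical names, and the sketch assumes candidates arrive in document order and that Integer.MAX_VALUE means "no limit", in line with the default mentioned in the comment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; this is not the FastVectorHighlighter code from the patch.
final class PhraseLimitSketch {

    // A candidate phrase match: where it starts and how well it scores.
    static final class Candidate {
        final int startOffset;
        final float score;

        Candidate(int startOffset, float score) {
            this.startOffset = startOffset;
            this.score = score;
        }
    }

    // Examines at most phraseLimit candidates, in document order, and returns
    // the best-scoring one seen (null if none was examined).
    static Candidate pickSnippet(List<Candidate> inDocumentOrder, int phraseLimit) {
        Candidate best = null;
        int examined = 0;
        for (Candidate c : inDocumentOrder) {
            if (examined >= phraseLimit) {
                break; // cap reached: accept less-than-perfect scoring
            }
            examined++;
            if (best == null || c.score > best.score) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate(10, 1.0f), new Candidate(50, 3.0f), new Candidate(90, 2.0f));
        // Limit of 1 picks the first candidate; no limit picks the best-scoring one.
        System.out.println(pickSnippet(candidates, 1).startOffset);
        System.out.println(pickSnippet(candidates, Integer.MAX_VALUE).startOffset);
    }
}
```

With a limit of 1 the loop stops after the first candidate, which matches the "just show the first snippet" case; a large limit only matters for documents with very many candidate phrases, where it bounds the work done.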
[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055269#comment-13055269 ] Chris Male commented on LUCENE-3244: Absolutely. I intended to do that afterward I had resolved the FunctionQuery moving (as it's a dependency for many other issues). Would you guys prefer I do that and not make this change? Or are you okay with this change as well? Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Attachments: LUCENE-3244.patch With adding a new 'queries' module, I am trying to change the project name of contrib/queries to queries-contrib. However currently the contrib-uptodate assumes that the name property is used in the path and in the jar name. By using the name in the path, I must set the value to 'queries' (since the path is contrib/queries). However because the project name is now queries-contrib, the actual jar file will be lucene-queries-contrib-${version}.jar, not lucene-queries-${version}.jar, as is expected. Consequently I think we need to separate the path name from the jar name properties. For simplicity I think adding a new jar-name property will suffice, which can be optional and if omitted, is filled in with the name property.
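The defaulting rule proposed for jar-name can be written out as a small sketch. The real change lives in the Ant build files, so this Java version is only an analogy: the class JarNameSketch and the method jarFileName are hypothetical, and the "lucene-" prefix and version suffix simply follow the jar names quoted in the description.

```java
// Hypothetical sketch of the "jar-name falls back to name" rule; not build code.
final class JarNameSketch {

    // name:    the path component, e.g. "queries" for contrib/queries
    // jarName: optional override, e.g. "queries-contrib"; null or empty means "use name"
    static String jarFileName(String name, String jarName, String version) {
        String effective = (jarName == null || jarName.isEmpty()) ? name : jarName;
        return "lucene-" + effective + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Without an override the jar is derived from the path name.
        System.out.println(jarFileName("queries", null, "4.0"));
        // With an override the path name and the jar name can differ.
        System.out.println(jarFileName("queries", "queries-contrib", "4.0"));
    }
}
```

Keeping the override optional means existing contribs whose path and jar names already agree would not need to change.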
[jira] [Issue Comment Edited] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055269#comment-13055269 ] Chris Male edited comment on LUCENE-3244 at 6/27/11 1:11 AM: - Absolutely. I intended to do that after I had resolved the FunctionQuery moving (as it's a dependency for many other issues). Would you guys prefer I do that and not make this change? Or are you okay with this change as well? was (Author: cmale): Absolutely. I intended to do that afterward I had resolved the FunctionQuery moving (as it's a dependency for many other issues). Would you guys prefer I do that and not make this change? Or are you okay with this change as well? Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Attachments: LUCENE-3244.patch
[jira] [Updated] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-3234: --- Attachment: LUCENE-3234.patch Oops, wrong patch. This one is correct. Provide limit on phrase analysis in FastVectorHighlighter - Key: LUCENE-3234 URL: https://issues.apache.org/jira/browse/LUCENE-3234 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3 Reporter: Mike Sokolov Assignee: Koji Sekiguchi Fix For: 3.4, 4.0 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch
[jira] [Updated] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2383: -- Attachment: SOLR-2383-branch_3x.patch Moved date facet over to range facet. Fixed popularity facet. Only problem now is that 3.x does not have support for exclusive range queries [from TO to} so the count when clicking a range facet is wrong. Velocity: Generalize range and date facet display - Key: SOLR-2383 URL: https://issues.apache.org/jira/browse/SOLR-2383 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Assignee: Grant Ingersoll Labels: facet, range, velocity Fix For: 3.4, 4.0 Attachments: SOLR-2383-branch_32.patch, SOLR-2383-branch_3x.patch, SOLR-2383-branch_3x.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch
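A small worked example may help show why a missing exclusive upper bound leads to a wrong count. It assumes, and this is an assumption rather than something stated in the message, that each facet bucket is counted with an inclusive lower and exclusive upper bound, while the filter applied on click can only be the fully inclusive [from TO to]. The class and method names are hypothetical.

```java
// Hypothetical sketch: half-open bucket counts versus a fully inclusive filter.
final class RangeCountSketch {

    // Counts values v with lower <= v < upper, i.e. the [lower TO upper} form.
    static int countHalfOpen(int[] values, int lower, int upper) {
        int count = 0;
        for (int v : values) {
            if (v >= lower && v < upper) {
                count++;
            }
        }
        return count;
    }

    // Counts values v with lower <= v <= upper, i.e. the [lower TO upper] form.
    static int countClosed(int[] values, int lower, int upper) {
        int count = 0;
        for (int v : values) {
            if (v >= lower && v <= upper) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] prices = {0, 50, 100, 100, 150};
        // Count for the bucket 0..100 with an exclusive upper bound.
        System.out.println(countHalfOpen(prices, 0, 100));
        // Count for the same bounds with an inclusive upper bound.
        System.out.println(countClosed(prices, 0, 100));
    }
}
```

In this example the two documents priced exactly 100 belong to the next bucket, yet the inclusive filter also returns them, so the number of results no longer matches the count displayed next to the facet.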
[jira] [Resolved] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male resolved LUCENE-3244. Resolution: Fixed Fix Version/s: 4.0 Assignee: Chris Male Committed revision 1139989. I'm going to leave module-uptodate alone till there is a need to change it. Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Assignee: Chris Male Fix For: 4.0 Attachments: LUCENE-3244.patch
[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055279#comment-13055279 ] Robert Muir commented on LUCENE-3244: - I committed a tiny fix, a ${name} -> @{name} Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Assignee: Chris Male Fix For: 4.0 Attachments: LUCENE-3244.patch
[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055280#comment-13055280 ] Chris Male commented on LUCENE-3244: Thanks Robert! Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Assignee: Chris Male Fix For: 4.0 Attachments: LUCENE-3244.patch
[jira] [Commented] (LUCENE-3244) Contrib/Module-uptodate assume name matches path and jar
[ https://issues.apache.org/jira/browse/LUCENE-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055284#comment-13055284 ] Chris Male commented on LUCENE-3244: Murphy's law, I needed to fix module-uptodate. Committed revision 1139996. Contrib/Module-uptodate assume name matches path and jar Key: LUCENE-3244 URL: https://issues.apache.org/jira/browse/LUCENE-3244 Project: Lucene - Java Issue Type: Bug Components: general/build Reporter: Chris Male Assignee: Chris Male Fix For: 4.0 Attachments: LUCENE-3244.patch
[jira] [Updated] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module
[ https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3240: --- Attachment: LUCENE-3240.patch First patch which migrates the queries contrib over to queries-contrib and establishes the queries module. Now moving onto moving files. Move FunctionQuery, ValueSources and DocValues to Queries module Key: LUCENE-3240 URL: https://issues.apache.org/jira/browse/LUCENE-3240 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3240.patch Having resolved the FunctionQuery sorting issue and moved the MutableValue classes, we can now move FunctionQuery, ValueSources and DocValues to a Queries module.
[jira] [Commented] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module
[ https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055287#comment-13055287 ] Chris Male commented on LUCENE-3240: Command for using first patch: {code} svn move dev-tools/idea/lucene/contrib/queries/queries.iml dev-tools/idea/lucene/contrib/queries/queries-contrib.iml {code} Move FunctionQuery, ValueSources and DocValues to Queries module Key: LUCENE-3240 URL: https://issues.apache.org/jira/browse/LUCENE-3240 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3240.patch
[jira] [Resolved] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved LUCENE-3234. Resolution: Fixed trunk: Committed revision 1139995. 3x: Committed revision 1139997. Thanks, Mike! Provide limit on phrase analysis in FastVectorHighlighter - Key: LUCENE-3234 URL: https://issues.apache.org/jira/browse/LUCENE-3234 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3 Reporter: Mike Sokolov Assignee: Koji Sekiguchi Fix For: 3.4, 4.0 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch
[jira] [Commented] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055301#comment-13055301 ] Mike Sokolov commented on LUCENE-3234: -- Thank you, Koji - it's nice to have my first patch committed! um - one little comment; since you made the default be MAX_VALUE, there is a javadoc comment that should be updated which says it is 5000. Provide limit on phrase analysis in FastVectorHighlighter - Key: LUCENE-3234 URL: https://issues.apache.org/jira/browse/LUCENE-3234 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3 Reporter: Mike Sokolov Assignee: Koji Sekiguchi Fix For: 3.4, 4.0 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch
[jira] [Updated] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module
[ https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3240: --- Attachment: LUCENE-3240.patch Patch that moves FunctionQuery, DocValues and ValueSource. Also establishes module, sets up dependencies, fixes javadocs etc. Everything compiles and tests pass. I'd like to commit this before going through and moving the actual impls, since some will stay in Solr and some will go to a spatial module. Command to use the patch coming up. Move FunctionQuery, ValueSources and DocValues to Queries module Key: LUCENE-3240 URL: https://issues.apache.org/jira/browse/LUCENE-3240 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3240.patch, LUCENE-3240.patch
[jira] [Commented] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module
[ https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055307#comment-13055307 ] Chris Male commented on LUCENE-3240: Command to use the patch:
{code}
svn --parents mkdir modules/queries/src/java/org/apache/lucene/queries/function
svn move solr/src/java/org/apache/solr/search/function/DocValues.java modules/queries/src/java/org/apache/lucene/queries/function/DocValues.java
svn move solr/src/java/org/apache/solr/search/function/ValueSource.java modules/queries/src/java/org/apache/lucene/queries/function/ValueSource.java
svn move solr/src/java/org/apache/solr/search/function/FunctionQuery.java modules/queries/src/java/org/apache/lucene/queries/function/FunctionQuery.java
svn move dev-tools/idea/lucene/contrib/queries/queries.iml dev-tools/idea/lucene/contrib/queries/queries-contrib.iml
{code}
Move FunctionQuery, ValueSources and DocValues to Queries module Key: LUCENE-3240 URL: https://issues.apache.org/jira/browse/LUCENE-3240 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3240.patch, LUCENE-3240.patch
[jira] [Commented] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055309#comment-13055309 ] Koji Sekiguchi commented on LUCENE-3234: Thank you again for checking the commit, Mike! The javadoc has been fixed. Provide limit on phrase analysis in FastVectorHighlighter - Key: LUCENE-3234 URL: https://issues.apache.org/jira/browse/LUCENE-3234 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3 Reporter: Mike Sokolov Assignee: Koji Sekiguchi Fix For: 3.4, 4.0 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch
[jira] [Created] (LUCENE-3245) Realtime terms dictionary
Realtime terms dictionary - Key: LUCENE-3245 URL: https://issues.apache.org/jira/browse/LUCENE-3245 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Jason Rutherglen Priority: Minor For LUCENE-2312 we need a realtime terms dictionary. While ConcurrentSkipListMap may be used, it has drawbacks in terms of high object overhead which can impact GC collection times and heap memory usage. If we implement a skip list that uses primitive backing arrays, we can hopefully have a data structure that is [as] fast and memory efficient.
[jira] [Updated] (LUCENE-3240) Move FunctionQuery, ValueSources and DocValues to Queries module
[ https://issues.apache.org/jira/browse/LUCENE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3240: --- Attachment: LUCENE-3240.patch New patch which fixes the dependencies in xml-query-parser. Everything passes now (including ant generate-maven-artifacts). Move FunctionQuery, ValueSources and DocValues to Queries module Key: LUCENE-3240 URL: https://issues.apache.org/jira/browse/LUCENE-3240 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3240.patch, LUCENE-3240.patch, LUCENE-3240.patch
[jira] [Updated] (LUCENE-3245) Realtime terms dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-3245: - Attachment: LUCENE-3245.patch Here's a basic initial patch implementing a single threaded writer, multiple reader atomic integer array skip list. The next step is to tie in the ByteBlockPool to store terms, eg, implement an RTTermsDictAIA class, and an RTTermsDictCSLM class. We can then load the same Wiki-EN terms, and measure the comparative write speeds. Then create a set of terms to lookup from each terms dict and measure the time difference. I am not yet sure how the speed of AtomicIntegerArray will compare with CSLM's usage of AtomicReferenceFieldUpdater. Of note is the fact that because of DWPTs we do not need a skip list that supports concurrent writes. And because we're only adding new unique terms, we do not need delete functionality. Ie, AIA could be faster, though we may need to inline code and perform various tuning tricks. Realtime terms dictionary - Key: LUCENE-3245 URL: https://issues.apache.org/jira/browse/LUCENE-3245 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Jason Rutherglen Priority: Minor Attachments: LUCENE-3245.patch
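For readers unfamiliar with the idea, a single-writer, multiple-reader skip list over primitive arrays might look roughly like the sketch below. It is not the code from the attached patch. The class name IntArraySkipListSketch, the fixed capacity, the maximum level of 8 and the use of plain int keys (standing in for references to terms stored elsewhere, for example in a ByteBlockPool) are all assumptions made for illustration. Only add-if-absent, lookup and level-zero iteration are shown, since the message notes that deletes and concurrent writes are not needed.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: one writer thread calls add(); any thread may read.
final class IntArraySkipListSketch {
    private static final int MAX_LEVEL = 8;
    private static final int HEAD = 0; // node 0 is the head sentinel
    private static final int NIL = 0;  // nothing points at the head, so 0 also means "no next node"

    private final int[] keys;                     // keys[node], written before the node is linked
    private final AtomicIntegerArray next;        // forward pointer of node at level: node * MAX_LEVEL + level
    private final Random random = new Random(42); // used by the writer thread only
    private int size;                             // number of nodes, writer thread only

    IntArraySkipListSketch(int capacity) {
        keys = new int[capacity + 1];
        next = new AtomicIntegerArray((capacity + 1) * MAX_LEVEL);
    }

    private static int idx(int node, int level) {
        return node * MAX_LEVEL + level;
    }

    // Adds key if absent; returns false if it was already present.
    boolean add(int key) {
        int[] update = new int[MAX_LEVEL]; // last node before the insert point, per level
        int x = HEAD;
        for (int l = MAX_LEVEL - 1; l >= 0; l--) {
            int n;
            while ((n = next.get(idx(x, l))) != NIL && keys[n] < key) {
                x = n;
            }
            update[l] = x;
        }
        int candidate = next.get(idx(x, 0));
        if (candidate != NIL && keys[candidate] == key) {
            return false;
        }
        if (size == keys.length - 1) {
            throw new IllegalStateException("skip list is full");
        }
        int node = ++size;
        keys[node] = key; // plain write, published by the volatile sets below
        int height = 1;
        while (height < MAX_LEVEL && random.nextBoolean()) {
            height++;
        }
        // Fill in the new node's own pointers first, then link it in bottom-up,
        // so a reader that reaches the node always finds valid forward pointers.
        for (int l = 0; l < height; l++) {
            next.set(idx(node, l), next.get(idx(update[l], l)));
        }
        for (int l = 0; l < height; l++) {
            next.set(idx(update[l], l), node);
        }
        return true;
    }

    boolean contains(int key) {
        int x = HEAD;
        for (int l = MAX_LEVEL - 1; l >= 0; l--) {
            int n;
            while ((n = next.get(idx(x, l))) != NIL && keys[n] < key) {
                x = n;
            }
        }
        int n = next.get(idx(x, 0));
        return n != NIL && keys[n] == key;
    }

    // Walks the level-zero linked list and returns the keys in ascending order.
    int[] sortedKeys() {
        int[] out = new int[keys.length - 1];
        int count = 0;
        for (int n = next.get(idx(HEAD, 0)); n != NIL; n = next.get(idx(n, 0))) {
            out[count++] = keys[n];
        }
        return Arrays.copyOf(out, count);
    }
}
```

The whole structure is two flat arrays, so there is no per-node object for the garbage collector to track, which is the motivation given in the issue. Whether this is actually faster than ConcurrentSkipListMap is exactly the open question the message raises; the sketch makes no claim either way.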
[JENKINS] Lucene-trunk - Build # 1607 - Failure
Build: https://builds.apache.org/job/Lucene-trunk/1607/

10 tests failed.

FAILED: org.apache.lucene.util.packed.TestPackedInts.testSortWithScoreAndMaxScoreTracking
Error Message: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

REGRESSION: org.apache.lucene.index.TestNRTThreads.testNRTThreads
Error Message: this writer hit an OutOfMemoryError; cannot commit
Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3724)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2649)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2720)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2702)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2686)
    at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:378)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)

REGRESSION: org.apache.lucene.index.TestNorms.testNorms
Error Message: Some threads threw uncaught exceptions!
Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:605)

REGRESSION: org.apache.lucene.search.TestFieldCache.testInfoStream
Error Message: this writer hit an OutOfMemoryError; cannot complete optimize
Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete optimize
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1696)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1640)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1610)
    at org.apache.lucene.index.RandomIndexWriter.doRandomOptimize(RandomIndexWriter.java:322)
    at org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:336)
    at org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:308)
    at org.apache.lucene.search.TestFieldCache.setUp(TestFieldCache.java:84)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)

FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestFieldCache
Error Message: ensure your setUp() calls super.setUp() and your tearDown() calls super.tearDown()!!!
Stack Trace:
junit.framework.AssertionFailedError: ensure your setUp() calls super.setUp() and your tearDown() calls super.tearDown()!!!
    at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:403)

FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestNumericRangeQuery32
Error Message: this writer hit an OutOfMemoryError; cannot commit
Stack Trace:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2638)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2720)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2702)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2686)
    at org.apache.lucene.index.RandomIndexWriter.maybeCommit(RandomIndexWriter.java:218)
    at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:166)
    at org.apache.lucene.search.TestNumericRangeQuery32.beforeClass(TestNumericRangeQuery32.java:88)

FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestNumericRangeQuery32
Error Message: null
Stack Trace:
java.lang.NullPointerException
    at org.apache.lucene.search.TestNumericRangeQuery32.afterClass(TestNumericRangeQuery32.java:98)

REGRESSION: org.apache.lucene.search.TestPhraseQuery.testRandomPhrases
Error Message: Index: 8, Size: 7
Stack Trace:
java.lang.IndexOutOfBoundsException: Index: 8, Size: 7
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.lucene.store.RAMFile.getBuffer(RAMFile.java:70)
    at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:154)
[jira] [Updated] (LUCENE-3245) Realtime terms dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-3245: - Attachment: LUCENE-3245.patch Added and fixed the code that traverses the skip list to the level zero linked list and iterates. I need to reuse the starts int array, that's next. Realtime terms dictionary - Key: LUCENE-3245 URL: https://issues.apache.org/jira/browse/LUCENE-3245 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Jason Rutherglen Priority: Minor Attachments: LUCENE-3245.patch, LUCENE-3245.patch For LUCENE-2312 we need a realtime terms dictionary. While ConcurrentSkipListMap may be used, it has drawbacks in terms of high object overhead which can impact GC collection times and heap memory usage. If we implement a skip list that uses primitive backing arrays, we can hopefully have a data structure that is [as] fast and memory efficient.
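The idea in the issue description, a skip list that keeps its nodes in primitive backing arrays instead of one object per node, might look roughly like the sketch below. It is only an illustration and not the code in LUCENE-3245.patch: the class name IntArraySkipList, its methods, the int keys, the fixed MAX_LEVEL, and the reused preds scratch array are all assumptions made here, and the sketch is single-threaded, so it says nothing about the concurrency a realtime terms dictionary would need.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: a single-threaded skip list of int keys held in primitive arrays. */
final class IntArraySkipList {
    private static final int MAX_LEVEL = 8; // assumed fixed tower height limit
    private static final int NIL = -1;      // marks "no successor"
    private static final int HEAD = 0;      // node 0 is a sentinel that holds no key

    private int[] keys = new int[16];              // keys[node] = key stored in that node
    private int[] next = new int[16 * MAX_LEVEL];  // next[node * MAX_LEVEL + level] = successor node
    private final int[] preds = new int[MAX_LEVEL]; // scratch array reused by every search
    private final Random random = new Random(42);
    private int nodeCount = 1; // the sentinel is already allocated
    private int levels = 1;    // number of levels currently in use

    IntArraySkipList() {
        Arrays.fill(next, NIL);
    }

    /** Walks from the top level down and records the last node before key on each level. */
    private int findPredecessor(int key) {
        int node = HEAD;
        for (int level = levels - 1; level >= 0; level--) {
            int succ;
            while ((succ = next[node * MAX_LEVEL + level]) != NIL && keys[succ] < key) {
                node = succ;
            }
            preds[level] = node;
        }
        return node;
    }

    /** Inserts key; returns false if it was already present. */
    boolean add(int key) {
        int pred = findPredecessor(key);
        int succ = next[pred * MAX_LEVEL];
        if (succ != NIL && keys[succ] == key) {
            return false;
        }
        int height = 1;
        while (height < MAX_LEVEL && random.nextBoolean()) {
            height++;
        }
        // Levels that were unused until now start at the sentinel.
        for (int level = levels; level < height; level++) {
            preds[level] = HEAD;
        }
        levels = Math.max(levels, height);
        if (nodeCount == keys.length) {
            grow();
        }
        int node = nodeCount++;
        keys[node] = key;
        for (int level = 0; level < height; level++) {
            int slot = preds[level] * MAX_LEVEL + level;
            next[node * MAX_LEVEL + level] = next[slot];
            next[slot] = node;
        }
        return true;
    }

    boolean contains(int key) {
        int succ = next[findPredecessor(key) * MAX_LEVEL];
        return succ != NIL && keys[succ] == key;
    }

    /** Iterates the level zero linked list, which visits every key in ascending order. */
    int[] toSortedArray() {
        int[] out = new int[nodeCount - 1];
        int i = 0;
        for (int node = next[HEAD * MAX_LEVEL]; node != NIL; node = next[node * MAX_LEVEL]) {
            out[i++] = keys[node];
        }
        return out;
    }

    /** Doubles both backing arrays and marks the new successor slots as empty. */
    private void grow() {
        int oldCapacity = keys.length;
        keys = Arrays.copyOf(keys, oldCapacity * 2);
        next = Arrays.copyOf(next, oldCapacity * 2 * MAX_LEVEL);
        Arrays.fill(next, oldCapacity * MAX_LEVEL, next.length, NIL);
    }
}
```

Adding 5, 1, 9, 3 and 7 and then calling toSortedArray() returns the keys in ascending order. The whole structure lives in two int arrays plus a small scratch array, so the garbage collector sees a handful of objects regardless of how many keys are stored, which is the overhead concern raised against ConcurrentSkipListMap above.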