[jira] [Assigned] (SOLR-2603) Encoding of alternate fields in highlighting
[ https://issues.apache.org/jira/browse/SOLR-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-2603: Assignee: Koji Sekiguchi Encoding of alternate fields in highlighting Key: SOLR-2603 URL: https://issues.apache.org/jira/browse/SOLR-2603 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 3.2 Reporter: Massimo Schiavon Assignee: Koji Sekiguchi Priority: Minor Could be useful if the method DefaultSolrHighlighter.alternateField(NamedList, SolrParams, Document, String) applies configured encoding to alternate fields. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2603) Encoding of alternate fields in highlighting
[ https://issues.apache.org/jira/browse/SOLR-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2603: - Affects Version/s: 4.0 3.1 Fix Version/s: 4.0 3.3 Encoding of alternate fields in highlighting Key: SOLR-2603 URL: https://issues.apache.org/jira/browse/SOLR-2603 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 3.1, 3.2, 4.0 Reporter: Massimo Schiavon Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.3, 4.0 Could be useful if the method DefaultSolrHighlighter.alternateField(NamedList, SolrParams, Document, String) applies configured encoding to alternate fields. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2603) Encoding of alternate fields in highlighting
[ https://issues.apache.org/jira/browse/SOLR-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2603: - Attachment: SOLR-2603.patch Massimo, thank you for opening the issue. Can you try the attached patch? Encoding of alternate fields in highlighting Key: SOLR-2603 URL: https://issues.apache.org/jira/browse/SOLR-2603 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 3.1, 3.2, 4.0 Reporter: Massimo Schiavon Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2603.patch Could be useful if the method DefaultSolrHighlighter.alternateField(NamedList, SolrParams, Document, String) applies configured encoding to alternate fields. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2793: -- Attachment: LUCENE-2793.patch I made some more changes to the earlier patch. I tried putting a nocommit wherever I thought the code was leading to an assertion error or a "bufferSize is 0" error. Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible.
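To make the IOContext proposal above more concrete, here is a minimal sketch of a value object that bundles the buffer size with access hints. Everything in it is hypothetical: the class name `IOContextSketch`, the `Usage` enum, the factory methods and the buffer sizes are illustrative assumptions, not taken from any of the attached patches.

```java
/** Hypothetical sketch of an IOContext-style value object; not the patch's actual API. */
final class IOContextSketch {
    /** Why a file is being opened (illustrative values only). */
    enum Usage { SEARCH, MERGE }

    final Usage usage;
    final int bufferSize;      // read buffer size in bytes
    final boolean direct;      // hint: bypass the OS buffer cache
    final boolean sequential;  // hint: the file will be read sequentially

    private IOContextSketch(Usage usage, int bufferSize, boolean direct, boolean sequential) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be > 0, got " + bufferSize);
        }
        this.usage = usage;
        this.bufferSize = bufferSize;
        this.direct = direct;
        this.sequential = sequential;
    }

    /** Context for searching: small buffer, no special hints (sizes are assumed). */
    static IOContextSketch forSearch() {
        return new IOContextSketch(Usage.SEARCH, 1024, false, false);
    }

    /** Context for merging: larger buffer, DIRECT and SEQUENTIAL hints (sizes are assumed). */
    static IOContextSketch forMerge() {
        return new IOContextSketch(Usage.MERGE, 4096, true, true);
    }

    public static void main(String[] args) {
        IOContextSketch merge = forMerge();
        System.out.println(merge.usage + " bufferSize=" + merge.bufferSize
            + " direct=" + merge.direct + " sequential=" + merge.sequential);
    }
}
```

Rejecting a non-positive buffer size in the constructor is one way to surface the "bufferSize is 0" class of error at construction time rather than at the first read.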
[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2919: -- Attachment: LUCENE-2919-filter.patch New patch:
- simplified the Filter logic
- added an option to negate the filter in the IndexReader; this enables the use of only *one* TermRangeFilter, which is simply negated for the second pass
- made the code close correctly using IOUtils.closeSafely
Tests are still ugly. IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Priority: Minor Attachments: LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change.
[jira] [Commented] (LUCENE-3206) FST package API refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051475#comment-13051475 ] Dawid Weiss commented on LUCENE-3206: - bq. this could be a non-negligible increase in FST size for the non-ascii case I think? I don't know. If the non-ASCII is encoded as UTF8 for the BytesRef, then storing full unicode points on transitions shouldn't really account for much more (in fact it may create fewer states/ transitions because multibyte UTF8 sequences will require multiple transitions)? This we would need to check, of course. And I assume input sequences ARE text, which in general may not be the case... I think I'll leave BYTE1/BYTE4 an option for now and see if I can improve on it once I have a working test suite. bq. I think SimpleText codec is a good example? Also VariableGapTermsIndexReader, and MemoryCodec? Each of these use the BytesRefFSTEnum, I believe. I wasn't clear -- I can find the places where they're used, but I wanted to clarify the nature of stored keys and values (are they UTF8 text, utf16, unicode, random bytes)? I can go through the code, but you're probably a faster source of information on this one. Robert, if you're reading this -- anything you envision could be stored as transition labels? FST package API refactoring --- Key: LUCENE-3206 URL: https://issues.apache.org/jira/browse/LUCENE-3206 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Affects Versions: 3.2 Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3206.patch The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051480#comment-13051480 ] Stefan Matheis (steffkes) commented on SOLR-2399: -
Shawn: Of course, it's just a quick prototype to demonstrate the functionality. The layout will change once it's integrated :)
Uwe: Thanks for the changes! Yes, the Analysis page has a few things that need to be changed, mainly regarding layout/arrangement, but also functionality. I will see if I can finish working on that next week.
noah: Thanks, it's integrated .. {{id}} as property works w/o problems? The layout on Safari looks good, compared to the provided screenshots?
Erick: that was a [quick change|https://github.com/steffkes/solr-admin/commit/799da2e97889b7a576eaf1a516511bc126dcb1b4]: every entry now has its own URL, so after reloading the page, the view will be the same as before.
Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch
*The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
*Features:*
* [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
* [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
* [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
* [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400)
* [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
* [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
* [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
* [Replication|http://files.mathe.is/solr-admin/10_replication.png]
* [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
* [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
** Stub (using static data)
Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051484#comment-13051484 ] Paul Elschot commented on LUCENE-2454: -- I tried the current patch here to make use of prevSetBit, but ran into a problem with the query weight that could be related to LUCENE-3208. When fixing the patch here so that NestedDocumentQuery.java looks like this:
{code}
public Weight createWeight(IndexSearcher searcher) throws IOException {
  return new NestedDocumentQueryWeight(childQuery.createWeight(searcher));
}
{code}
the TestNestedDocumentQuery from the patch here fails with an UnsupportedOperationException. After adding the class name to the code in Query.java that constructs this exception, the test fails with: UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery That means that the above fix to the patch is probably wrong. Any comments on how to continue this? Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051486#comment-13051486 ] Michael McCandless commented on LUCENE-2454: I suspect the NestedDocumentQuery must impl rewrite, and rewrite the childQuery. I hit this on LUCENE-3171, too. Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
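The suggestion that a wrapping query must rewrite its child can be illustrated with a toy model. The sketch below deliberately does not use Lucene's classes: `Q`, `TermQ`, `RangeQ`, `WrapperQ` and `rewriteFully` are hypothetical stand-ins, and the only behavior assumed from the thread is that a query which has not been rewritten to a primitive form throws UnsupportedOperationException when asked for a weight.

```java
/** Toy model (not Lucene's classes) of why a wrapper query should rewrite its child. */
final class RewriteSketch {
    /** Stand-in for a query; by default it cannot create a weight. */
    static abstract class Q {
        Q rewrite() { return this; }
        String createWeight() {
            throw new UnsupportedOperationException(getClass().getSimpleName());
        }
    }

    /** Primitive query: can create a weight directly. */
    static final class TermQ extends Q {
        @Override String createWeight() { return "weight(term)"; }
    }

    /** Range-like query: only usable after being rewritten to a primitive query. */
    static final class RangeQ extends Q {
        @Override Q rewrite() { return new TermQ(); }
    }

    /** Wrapper that delegates weight creation to its child. */
    static final class WrapperQ extends Q {
        final Q child;
        WrapperQ(Q child) { this.child = child; }

        /** Rewrites the child; returns this only when the child is already primitive. */
        @Override Q rewrite() {
            Q rewritten = child.rewrite();
            return rewritten == child ? this : new WrapperQ(rewritten);
        }

        @Override String createWeight() { return "nested(" + child.createWeight() + ")"; }
    }

    /** Rewrites until a fixed point is reached, before any weight is created. */
    static Q rewriteFully(Q q) {
        for (Q next = q.rewrite(); next != q; next = q.rewrite()) {
            q = next;
        }
        return q;
    }

    public static void main(String[] args) {
        Q query = new WrapperQ(new RangeQ());
        System.out.println(rewriteFully(query).createWeight());
    }
}
```

In this model a wrapper whose `rewrite()` simply returned `this` would leave the range-like child untouched, and the delegated `createWeight()` call would fail in the same way as the reported test.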
[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051487#comment-13051487 ] Michael McCandless commented on LUCENE-2919: Patch looks great Uwe! I love how generic it is now, that you can just provide any Filter. IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Priority: Minor Attachments: LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2919: --- Attachment: LUCENE-2919-3x.patch Here's a patch for back-porting the original approach to 3.x. IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Priority: Minor Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change.
[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051490#comment-13051490 ] Uwe Schindler commented on LUCENE-2919: --- I will fix the test and commit this, then backport again, using your TermPositions. IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Priority: Minor Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-2919: - Assignee: Uwe Schindler IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3206) FST package API refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051491#comment-13051491 ] Michael McCandless commented on LUCENE-3206: {quote} bq. this could be a non-negligible increase in FST size for the non-ascii case I think? I don't know. If the non-ASCII is encoded as UTF8 for the BytesRef, then storing full unicode points on transitions shouldn't really account for much more (in fact it may create fewer states/ transitions because multibyte UTF8 sequences will require multiple transitions)? This we would need to check, of course. And I assume input sequences ARE text, which in general may not be the case... I think I'll leave BYTE1/BYTE4 an option for now and see if I can improve on it once I have a working test suite. {quote} Ahh, yes I agree it'd be a more interesting comparison if you use UTF32 instead of UTF8. The case I was worried about is if you must use UTF8 (ie because TermsEnum speaks only BytesRef), then writing those bytes as a vInt instead of a fixed byte is a penalty to non-ascii. {quote} bq. I think SimpleText codec is a good example? Also VariableGapTermsIndexReader, and MemoryCodec? Each of these use the BytesRefFSTEnum, I believe. I wasn't clear -- I can find the places where they're used, but I wanted to clarify the nature of stored keys and values (are they UTF8 text, utf16, unicode, random bytes)? I can go through the code, but you're probably a faster source of information on this one. Robert, if you're reading this -- anything you envision could be stored as transition labels? {quote} Ahh... I think all uses have BytesRef (UTF8 encoded term) as the key, and various things as the values. I don't think we've used FST during analysis yet but we should try; then I suspect we'd use UTF16 labels? 
FST package API refactoring --- Key: LUCENE-3206 URL: https://issues.apache.org/jira/browse/LUCENE-3206 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Affects Versions: 3.2 Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3206.patch The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikola Tankovic updated LUCENE-2308: Attachment: LUCENE-2308-2.patch A baby step towards dividing AbstractField and Field into Field, TextField, StringField, NumericField and BinaryField with default FieldTypes. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. E.g. it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
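The refactoring described in this issue can be pictured with a small sketch in which the indexing options move to a shared type object while each field keeps only its name and value. The classes below are hypothetical illustrations: the names `FieldTypeSketch`, `FieldType` and `Field` and the four boolean options are assumptions chosen for brevity, not the API from the attached patches.

```java
/** Hypothetical sketch of the proposed Field/FieldType split; not Lucene's API. */
final class FieldTypeSketch {
    /** Per-field indexing options, factored out so one instance can be shared. */
    static final class FieldType {
        final boolean indexed;
        final boolean stored;
        final boolean analyzed;
        final boolean omitNorms;

        FieldType(boolean indexed, boolean stored, boolean analyzed, boolean omitNorms) {
            this.indexed = indexed;
            this.stored = stored;
            this.analyzed = analyzed;
            this.omitNorms = omitNorms;
        }
    }

    /** A field holds only its name, its value and a reference to its type. */
    static final class Field {
        final String name;
        final String value;
        final FieldType type;

        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        // One type instance is configured once and re-used across several fields.
        FieldType text = new FieldType(true, true, true, false);
        Field title = new Field("title", "Nested documents", text);
        Field body = new Field("body", "Some longer body text", text);
        System.out.println("shared type: " + (title.type == body.type));
    }
}
```

Because the type object is only held in memory and referenced by fields, nothing here implies a schema or any change to what is written into the index, which matches the intent stated in the description.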
[jira] [Issue Comment Edited] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051492#comment-13051492 ] Nikola Tankovic edited comment on LUCENE-2308 at 6/18/11 10:00 AM: --- New patch: a baby step towards dividing AbstractField and Field into Field, TextField, StringField, NumericField and BinaryField with default FieldTypes. was (Author: ntankovic): A baby step towards dividing AbstractField and Field towards Field, TextField, StringField, NumericField and BinaryField with default FieldType's. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. E.g. it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2919: -- Attachment: LUCENE-2919-filter.patch Final patch:
- improved tests
- changed the API to be able to pass an arbitrary filter
This is ready to commit; I will do this soon, as the current trunk is unfortunately broken (it splits incorrectly). IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change.
[jira] [Updated] (LUCENE-3171) BlockJoinQuery/Collector
[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3171: --- Attachment: LUCENE-3171.patch New patch, I think it's ready to commit! BlockJoinQuery/Collector Key: LUCENE-3171 URL: https://issues.apache.org/jira/browse/LUCENE-3171 Project: Lucene - Java Issue Type: Improvement Components: modules/other Reporter: Michael McCandless Fix For: 3.3, 4.0 Attachments: LUCENE-3171.patch, LUCENE-3171.patch I created a single-pass Query + Collector to implement nested docs. The approach is similar to LUCENE-2454, in that the app must index documents in join order, as a block (IW.add/updateDocuments), with the parent doc at the end of the block, except that this impl is one pass. Once you join at indexing time, you can take any query that matches child docs and join it up to the parent docID space, using BlockJoinQuery. You then use BlockJoinCollector, which sorts parent docs by provided Sort, to gather results, grouped by parent; this collector finds any BlockJoinQuerys (using Scorer.visitScorers) and retains the child docs corresponding to each collected parent doc. After searching is done, you retrieve the TopGroups from a provided BlockJoinQuery. Like LUCENE-2454, this is less general than the arbitrary joins in Solr (SOLR-2272) or parent/child from ElasticSearch (https://github.com/elasticsearch/elasticsearch/issues/553), since you must do the join at indexing time as a doc block, but it should be able to handle nested joins as well as joins to multiple tables, though I don't yet have test cases for these. I put this in a new Join module (modules/join); I think as we refactor join impls we should put them here. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
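The indexing convention described above (children first, parent as the last document of each block) makes the child-to-parent join a simple lookup. The following sketch shows only that idea with `java.util.BitSet`; the class `BlockJoinSketch` and the method `joinToParents` are hypothetical names, and this is not the code from the LUCENE-3171 patch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of joining child docIDs up to parent docIDs within doc blocks. */
final class BlockJoinSketch {
    /**
     * Groups matching child docIDs by parent. Because each block ends with its
     * parent, the parent of a child is the first parent docID after the child.
     */
    static Map<Integer, List<Integer>> joinToParents(BitSet parentBits, int[] matchingChildren) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int child : matchingChildren) {
            if (parentBits.get(child)) {
                throw new IllegalArgumentException("doc " + child + " is a parent, not a child");
            }
            int parent = parentBits.nextSetBit(child);
            if (parent < 0) {
                throw new IllegalStateException("child " + child + " has no parent after it");
            }
            groups.computeIfAbsent(parent, p -> new ArrayList<>()).add(child);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two blocks: docs 0-1 are children of parent 2, docs 3-5 are children of parent 6.
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(6);
        System.out.println(joinToParents(parentBits, new int[] {1, 3, 5}));
    }
}
```

The same bit set can be walked in the other direction (previous set bit plus one) to find where a parent's block of children starts, which is why the join only works when documents are added as contiguous blocks.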
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051495#comment-13051495 ] Paul Elschot commented on LUCENE-2454: -- NestedDocumentQuery already implements rewrite() by returning *this*, just as in 3171. This is a more complete traceback of the exception:
{noformat}
[junit] java.lang.UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery
[junit] at org.apache.lucene.search.Query.createWeight(Query.java:91)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.nested.NestedDocumentQuery.createWeight(NestedDocumentQuery.java:65)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:292)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
[junit] at org.apache.lucene.search.TestNestedDocumentQuery.testSimple(TestNestedDocumentQuery.java:92)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1414)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1332)
{noformat}
Could BooleanWeight be the offender?
Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3206) FST package API refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051496#comment-13051496 ] Dawid Weiss commented on LUCENE-3206: - I think I know how to compare storing a byte[] of UTF8 against vint-encoded UTF32 codepoints -- I'll encode the wikipedia terms list in both ways and we will see what comes out. Theoretically they should be very, very similar (and full unicode codepoints should generate fewer arcs) because UTF8 uses an encoding scheme with similar overhead to vint encoding... so if something is a single-byte sequence in UTF8, it will remain a single-byte vint. A double-byte UTF8 character will remain a double-byte vint (the last double-byte codepoint is 0x7ff=2047, whereas the last double-byte vint is 2^14-1=16383). And so on. So for text, vint-encoded UTF32 should be more compact than UTF8... The gain is of course when your labels are not text, but arbitrary bytes -- then byte[] representation would be nicer. FST package API refactoring --- Key: LUCENE-3206 URL: https://issues.apache.org/jira/browse/LUCENE-3206 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Affects Versions: 3.2 Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3206.patch The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept.
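The size argument in the comment above can be checked per code point. The sketch below compares the UTF-8 length of a code point with the length of the same value written as a variable-length int (7 payload bits per byte plus a continuation bit). The class and method names (`VIntVsUtf8`, `vIntLength`, `utf8Length`) are hypothetical, and the sample code points are arbitrary choices; it measures only label bytes, not the number of arcs or states in an FST.

```java
import java.nio.charset.StandardCharsets;

/** Compares per-code-point byte counts of UTF-8 and a 7-bits-per-byte vInt encoding. */
final class VIntVsUtf8 {
    /** Number of bytes a non-negative value needs as a vInt (7 payload bits per byte). */
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    /** Number of bytes one (non-surrogate) code point needs in UTF-8. */
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Boundary values for both encodings, plus two ordinary characters.
        int[] samples = {0x7F, 0x80, 0x7FF, 0x800, 0x20AC, 0x3FFF, 0x4000, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X utf8=%d vint=%d%n", cp, utf8Length(cp), vIntLength(cp));
        }
    }
}
```

The two encodings agree up to U+07FF; from U+0800 to U+3FFF the vInt needs two bytes where UTF-8 needs three, and code points above the BMP need three bytes as a vInt versus four in UTF-8.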
[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051495#comment-13051495 ] Paul Elschot edited comment on LUCENE-2454 at 6/18/11 10:45 AM: NestedDocumentQuery already implements rewrite() by returning *this*, just as in 3171. This is a more complete traceback of the exception: {noformat} [junit] java.lang.UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery [junit] at org.apache.lucene.search.Query.createWeight(Query.java:91) [junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177) [junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358) [junit] at org.apache.lucene.search.nested.NestedDocumentQuery.createWeight(NestedDocumentQuery.java:65) [junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177) [junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358) [junit] at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676) [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:292) [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281) [junit] at org.apache.lucene.search.TestNestedDocumentQuery.testSimple(TestNestedDocumentQuery.java:92) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1414) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1332) {noformat} Could BooleanWeight be the offender? Or should NestedDocumentQuery rewrite by rewriting its child, as you suggested? But rewrite to what then? was (Author: paul.elsc...@xs4all.nl): NestedDocumentQuery already implements rewrite() by returning *this*, just as in 3171. 
This is a more complete traceback of exception: {noformat} [junit] java.lang.UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery [junit] at org.apache.lucene.search.Query.createWeight(Query.java:91) [junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:177) [junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358) [junit] at org.apache.lucene.search.nested.NestedDocumentQuery.createWeight(NestedDocumentQuery.java:65) [junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:177) [junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358) [junit] at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676) [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:292) [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281) [junit] at org.apache.lucene.search.TestNestedDocumentQuery.testSimple(TestNestedDocumentQuery.java:92) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1414) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1332) {noformat} Could BooleanWeight be the offendor? Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051497#comment-13051497 ] Uwe Schindler commented on LUCENE-2919: --- Committed trunk revision: 1137162 Backporting... IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2919: -- Attachment: (was: LUCENE-2919-3x.patch) IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Attachments: LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2919: -- Attachment: LUCENE-2919-3x.patch Patch for 3.x (not merged one). IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk - Build # 1598 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1598/ No tests ran. Build Log (for compile errors): [...truncated 11605 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2919. --- Resolution: Fixed Fix Version/s: 4.0 3.3 Committed 3.x revision: 1137166 IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3212) Supply FilterIndexReader based on any o.a.l.search.Filter
Supply FilterIndexReader based on any o.a.l.search.Filter - Key: LUCENE-3212 URL: https://issues.apache.org/jira/browse/LUCENE-3212 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 When coding LUCENE-2919 (PKIndexSplitter), Mike and I had an idea for how to apply filters efficiently at the lowest level (before query execution). This is very useful for e.g. security Filters that simply hide some documents. Currently, when you apply the filter after searching, lots of useless work is done, like scoring filtered documents, iterating term positions (for Phrases),... This patch will provide a FilterIndexReader subclass (4.0 only, 3.x is too complicated to implement) that hides filtered documents by returning them in getDeletedDocs(). In contrast to LUCENE-2919, the filtering will work per segment (without SlowMultiReaderWrapper), so per-segment search stays available and reopening can be done very efficiently, as the filter is only calculated when opening new or changed segments. This filter should improve use-cases where the filter can be applied one time before all queries (like security filters) on (re-)opening the IndexReader. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
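The per-segment idea described above (evaluate the filter once when a segment is first opened, then expose the rejected documents as if they were deleted) can be illustrated without any Lucene classes. Everything in this sketch is hypothetical: SegmentFilterCache, the segment names and the IntPredicate standing in for a Filter are illustrations only, not the API of the proposed patch.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical per-segment cache of hidden documents; none of these names are Lucene API. */
final class SegmentFilterCache {
    private final Map<String, BitSet> hiddenBySegment = new HashMap<>();
    private int computations = 0;

    /**
     * Returns the hidden-document bits for a segment. The filter (stood in for by
     * an IntPredicate) is evaluated only the first time a segment name is seen,
     * so unchanged segments cost nothing on reopen.
     */
    BitSet hiddenDocs(String segmentName, int maxDoc, IntPredicate accept) {
        return hiddenBySegment.computeIfAbsent(segmentName, name -> {
            computations++;
            BitSet hidden = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (!accept.test(doc)) {
                    hidden.set(doc); // rejected by the filter: treat as deleted
                }
            }
            return hidden;
        });
    }

    /** Number of times the filter was actually evaluated over a segment. */
    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        SegmentFilterCache cache = new SegmentFilterCache();
        IntPredicate evenDocsOnly = doc -> doc % 2 == 0;
        System.out.println(cache.hiddenDocs("_0", 4, evenDocsOnly));
        cache.hiddenDocs("_0", 4, evenDocsOnly); // simulated reopen, segment unchanged
        cache.hiddenDocs("_1", 2, evenDocsOnly); // simulated reopen, one new segment
        System.out.println(cache.computations());
    }
}
```

A search over such a reader would skip the hidden documents before scoring or position iteration, which is where the saving described in the issue would come from.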
[jira] [Created] (LUCENE-3213) Use AtomicReaderContext also for CustomScoreProvider
Use AtomicReaderContext also for CustomScoreProvider Key: LUCENE-3213 URL: https://issues.apache.org/jira/browse/LUCENE-3213 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 When moving to AtomicReaderContext, one place was not changed to use it: CustomScoreQuery's CustomScoreProvider. It should also take AtomicReaderContext instead of IndexReader, as this may help users to effectively implement custom scoring where absolute DocIds are needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051502#comment-13051502 ] Paul Elschot edited comment on LUCENE-2454 at 6/18/11 11:39 AM: One of the nocommits in the patch is about the use of an Filter for the parent filter. NesteDocumentQuery uses an OpenBitSet from this Filter for next() and advance() just like a Filter and also as a parent filter. So how about adding sth like this: {code} public abstract class ParentFilter { public abstract ParentDISI getParentDISI(IndexReader reader); } public class ParentDISI extends DocIdSetIterator { public int getParent(); // to be used only after next() or advance() returned NO_MORE_DOCS } {code} together with another constructor for NestedDocumentQuery with a ParentFilter argument? was (Author: paul.elsc...@xs4all.nl): One of the nocommits in the patch is about the use of an Filter for the parent filter. NesteDocumentQuery uses an OpenBitSet from this Filter for next() and advance() just like a Filter and also as a parent filter. So how about adding sth like this: {code} public abstract class ParentFilter { public abstract ParentDISI getParentDISI(IndexReader reader); } public class ParentDISI extends DocIdSetIterator { public int getParent(); // to be used only after next() or advance() returned NO_MORE_DOCS } {code} together with another constructor for NestedDocumentIterator with a ParentFilter argument? 
Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051502#comment-13051502 ] Paul Elschot commented on LUCENE-2454: -- One of the nocommits in the patch is about the use of an Filter for the parent filter. NesteDocumentQuery uses an OpenBitSet from this Filter for next() and advance() just like a Filter and also as a parent filter. So how about adding sth like this: {code} public abstract class ParentFilter { public abstract ParentDISI getParentDISI(IndexReader reader); } public class ParentDISI extends DocIdSetIterator { public int getParent(); // to be used only after next() or advance() returned NO_MORE_DOCS } {code} together with another constructor for NestedDocumentIterator with a ParentFilter argument? Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051502#comment-13051502 ] Paul Elschot edited comment on LUCENE-2454 at 6/18/11 11:41 AM: One of the nocommits in the patch is about the use of a Filter for the parent filter. NestedDocumentQuery uses an OpenBitSet from this Filter for next() and advance() just like a Filter and also as a parent filter. So how about adding something like this: {code} public abstract class ParentFilter { public abstract ParentDISI getParentDISI(IndexReader reader); } public abstract class ParentDISI extends DocIdSetIterator { public abstract int getParent(); // to be used only after next() or advance() returned NO_MORE_DOCS } {code} together with another constructor for NestedDocumentQuery with a ParentFilter argument? was (Author: paul.elsc...@xs4all.nl): One of the nocommits in the patch is about the use of an Filter for the parent filter. NesteDocumentQuery uses an OpenBitSet from this Filter for next() and advance() just like a Filter and also as a parent filter. So how about adding sth like this: {code} public abstract class ParentFilter { public abstract ParentDISI getParentDISI(IndexReader reader); } public class ParentDISI extends DocIdSetIterator { public int getParent(); // to be used only after next() or advance() returned NO_MORE_DOCS } {code} together with another constructor for NestedDocumentQuery with a ParentFilter argument? 
Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-2454: - Comment: was deleted (was: One of the nocommits in the patch is about the use of an Filter for the parent filter. NestedDocumentQuery uses an OpenBitSet from this Filter for next() and advance() just like a Filter and also as a parent filter. So how about adding sth like this: {code} public abstract class ParentFilter { public abstract ParentDISI getParentDISI(IndexReader reader); } public abstract class ParentDISI extends DocIdSetIterator { public abstract int getParent(); // to be used only after next() or advance() returned NO_MORE_DOCS } {code} together with another constructor for NestedDocumentQuery with a ParentFilter argument? ) Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3213) Use AtomicReaderContext also for CustomScoreProvider
[ https://issues.apache.org/jira/browse/LUCENE-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3213: -- Attachment: LUCENE-3213.patch Easy patch, will commit soon! Use AtomicReaderContext also for CustomScoreProvider Key: LUCENE-3213 URL: https://issues.apache.org/jira/browse/LUCENE-3213 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 Attachments: LUCENE-3213.patch When moving to AtomicReaderContext, one place was not changed to use it: CustomScoreQuery's CustomScoreProvider. It should also take AtomicReaderContext instead of IndexReader, as this may help users to effectively implement custom scoring where absolute DocIds are needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3213) Use AtomicReaderContext also for CustomScoreProvider
[ https://issues.apache.org/jira/browse/LUCENE-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-3213. --- Resolution: Fixed Committed trunk revision: 1137176 Use AtomicReaderContext also for CustomScoreProvider Key: LUCENE-3213 URL: https://issues.apache.org/jira/browse/LUCENE-3213 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 Attachments: LUCENE-3213.patch When moving to AtomicReaderContext, one place was not changed to use it: CustomScoreQuery's CustomScoreProvider. It should also take AtomicReaderContext instead of IndexReader, as this may help users to effectively implement custom scoring where absolute DocIds are needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2607) Clean up /clients directory
Clean up /clients directory --- Key: SOLR-2607 URL: https://issues.apache.org/jira/browse/SOLR-2607 Project: Solr Issue Type: Task Components: clients - java, clients - ruby - flare Affects Versions: 4.0 Reporter: Eric Pugh Priority: Minor The /clients directory is a bit of a mess. The only actively maintained client, SolrJ, is actually in the /dist directory! The other clients that used to be in here, /php and /javascript (I think!), have been moved. The only one left is /ruby, and it isn't actively maintained the way other ruby clients are. I'd recommend just removing the /clients directory, since it's very confusing to a new user who would logically go here to find clients and only find a ruby one! It would also let us slim down the size of the download. Alternatively, if we want the /clients directory, then let's copy over the solrj lib to this dir instead of /dist. I am happy to submit a patch if this makes sense. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051512#comment-13051512 ] Michael McCandless commented on LUCENE-2308: Patch looks good, thanks Nikola! When you make the patch, can you run svn diff from the top-level dir? Ie, so that file paths look like lucene/src/java/org/apache/lucene/document/Field.java A couple of minor code-formatting things: * Please add { } around one-line ifs, eg in FieldType.toString * import lines go after the copyright (FieldType.java) * If possible please try to avoid adding noise to the patch, for example re-formatting javadocs (eg NumericField.java). It's fine to clean things up (add missing {}'s to existing code) as you go, but if it's simply a reformat, that just adds noise which makes it harder to see real changes. Other stuff: * The DEFAULT_TYPE for each field can be final, right? * For FieldType, can we use direct members of the class, instead of the EnumSet? (Ie, boolean indexed, boolean stored, etc.). The patch causes compilation errors when I run ant compile-core, but that's expected, right? I think our immediate goal here should be to get a compilable patch with tests passing, ie the dirt path. Then we can go back and iterate. But, because so many tests rely on the current Document/Field API... I think in order to stage this we should make a totally new package, call it document2 for now, and create all these new classes inside there. Then, one by one we can cut over tests to use document2/*, starting with TestDemo. Eventually, once everything is cut over, we can remove document and rename document2 to document. 
Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
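To make the refactoring discussed above concrete, here is a minimal sketch of a FieldType that holds its options as plain boolean members and is shared by several Field instances, each of which keeps only its own value. The classes below are stand-ins written for this illustration; their names, constructor and option set are assumptions and not the API in the attached patches.

```java
import java.util.StringJoiner;

/** Hypothetical sketch of the FieldType idea; not the classes from the patch. */
final class FieldTypeSketch {

    /** Index-time options shared by many fields, held as direct boolean members. */
    static final class FieldType {
        final boolean indexed;
        final boolean stored;
        final boolean tokenized;
        final boolean omitNorms;

        FieldType(boolean indexed, boolean stored, boolean tokenized, boolean omitNorms) {
            this.indexed = indexed;
            this.stored = stored;
            this.tokenized = tokenized;
            this.omitNorms = omitNorms;
        }

        @Override
        public String toString() {
            StringJoiner parts = new StringJoiner(",");
            if (indexed) {
                parts.add("indexed");
            }
            if (stored) {
                parts.add("stored");
            }
            if (tokenized) {
                parts.add("tokenized");
            }
            if (omitNorms) {
                parts.add("omitNorms");
            }
            return parts.toString();
        }
    }

    /** A field holds only its name and value and points at a shared type. */
    static final class Field {
        final String name;
        final String value;
        final FieldType type;

        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    /** One final, reusable type for primary-key style fields. */
    static final FieldType ID_TYPE = new FieldType(true, true, false, true);

    public static void main(String[] args) {
        Field first = new Field("id", "doc-1", ID_TYPE);
        Field second = new Field("id", "doc-2", ID_TYPE);
        System.out.println(first.type == second.type); // both fields share one type object
        System.out.println(ID_TYPE);
    }
}
```

Because the type is immutable and not tied to any value, a per-field setting such as an analyzer could later hang off the same object, which is the direction the issue description suggests.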
faceting on multivalued fields
Hello all, I'm new to this list, so if I don't use it correctly, please say so. I have a question about faceting on multivalued fields. I have indexed some data from a product feed. One of my fields, the category field, is a multivalued field. This field contains multiple categories related to the product. For example, when I index a product feed composed of fashion items, the category field can be filled with the values women men boys when this piece of clothing is available for women, men and boys. Like I said, I indexed the category as a multivalued field. Now I want to facet on it. Faceting works, however, not as expected. When I query Solr with the following URL q=*:*&facet=true&facet.field=category&fq=category:women, I receive the following response <int name="women">71</int> <int name="men">6</int> <int name="babies">1</int> <int name="baby">0</int> <int name="boys">0</int> <int name="girls">0</int> It looks like Solr returns every document where 'women' is part of the multivalued category field, but also returns the facets (counts) for all keywords that were indexed as part of the multivalued field, along with 'women'. In this example I got back documents which had the category field indexed like women men, women babies and women babies men. What is worse, since it also calculates the facets for babies and men, when I put another facet.field in the query (like brand), the response also returns brands for categories men and babies. Is this as designed? Is there a way to let Solr *only* return the documents which have 'women' in their category field? Thanks a lot for the help! Regards, Dennis
[jira] [Resolved] (LUCENE-3209) Memory codec
[ https://issues.apache.org/jira/browse/LUCENE-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3209. Resolution: Fixed Memory codec Key: LUCENE-3209 URL: https://issues.apache.org/jira/browse/LUCENE-3209 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3209.patch This codec stores all terms/postings in RAM. It uses an FST<BytesRef>. This is useful on a primary key field to ensure lookups don't need to hit disk, to keep NRT reopen time fast even under IO contention. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: faceting on multivalued fields
On Sat, Jun 18, 2011 at 5:36 PM, Dennis de Boer datdeb...@gmail.com wrote: Hello all, I'm new to this list, so if I don't use it correctly, please say so. I have a question about facetting on multivalued fields. I have indexed some data from a product feed. One of my fields, the category field, is a multivalued field. [...] Please show us the definition of this field in the Solr schema, including the tokenizers/analyzers on the field type. My guess is that your facet field is getting analyzed. This might be of help: http://wiki.apache.org/solr/SolrFacetingOverview Regards, Gora - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: faceting on multivalued fields
As Mohanty said, your facet field seems to be analysed with the white space tokenizer (field type probably text) which would generate individual tokens for category - women babies men and hence the individual facets. You should use string as the field type for category so that it is not tokenized. Regards, Jayendra On Sat, Jun 18, 2011 at 11:50 AM, Gora Mohanty g...@mimirtech.com wrote: On Sat, Jun 18, 2011 at 5:36 PM, Dennis de Boer datdeb...@gmail.com wrote: Hello all, I'm new to this list, so if I don't use it correctly, please say so. I have a question about facetting on multivalued fields. I have indexed some data from a product feed. One of my fields, the category field, is a multivalued field. [...] Please show us the definition of this field in the Solr schema, including the tokenizers/analyzers on the field type. My guess is that your facet field is getting analyzed. This might be of help: http://wiki.apache.org/solr/SolrFacetingOverview Regards, Gora - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
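The effect suggested in the replies above, an analyzed field producing one facet value per token while a string field keeps each value whole, can be imitated in plain Java. This is only a sketch under assumptions: it does not call Solr, the class and method names are invented for the illustration, and the three sample values are the ones quoted in the original question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Imitates facet counting over a field that is or is not whitespace-tokenized. */
final class FacetTokenizationSketch {

    /** Counts facet values; when tokenized, each value is first split on whitespace. */
    static Map<String, Integer> facetCounts(List<String> fieldValues, boolean tokenized) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            String[] terms = tokenized ? value.trim().split("\\s+") : new String[] {value};
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList("women men", "women babies", "women babies men");
        System.out.println("tokenized:   " + facetCounts(categories, true));
        System.out.println("untokenized: " + facetCounts(categories, false));
    }
}
```

In the tokenized case every document that matches women also contributes counts for men and babies, which resembles the response in the question; with untokenized values each distinct string is its own facet entry.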
[jira] [Created] (LUCENE-3214) Ability to mlock certain fields from the terms dict
Ability to mlock certain fields from the terms dict --- Key: LUCENE-3214 URL: https://issues.apache.org/jira/browse/LUCENE-3214 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless This is a hacked up prototype! It works but I'm not sure how to get this to a committable point. The patch invokes mlock() (tested only on Linux), locking pages from the terms dictionary file that hold terms for a specified field. You can only do this with MMapDirectory. I used this to lock pages for the id field in the NRT stress test; it's an alternative to MemoryCodec. But, it requires you set up the OS to allow the app/user to lock pages in RAM. It works very well in reducing the NRT reopen latency even when large merges are running... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3214) Ability to mlock certain fields from the terms dict
[ https://issues.apache.org/jira/browse/LUCENE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3214: --- Attachment: LUCENE-3214.patch Prototype hacked up but working patch. Ability to mlock certain fields from the terms dict --- Key: LUCENE-3214 URL: https://issues.apache.org/jira/browse/LUCENE-3214 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3214.patch This is a hacked up prototype! It works but I'm not sure how to get this to a committable point. The patch invokes mlock() (tested only on Linux), locking pages from the terms dictionary file that hold terms for a specified field. You can only do this with MMapDirectory. I used this to lock pages for the id field in the NRT stress test; it's an alternative to MemoryCodec. But, it requires you set up the OS to allow the app/user to lock pages in RAM. It works very well in reducing the NRT reopen latency even when large merges are running... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051552#comment-13051552 ] Jason Rutherglen commented on LUCENE-2919: -- Thanks, committing this means I can remove a custom GitHub branch with only this patch. Also, it'd be great if we somehow published nightly versions to Maven repositories. Though they'd accumulate over time. IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
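The idea in the LUCENE-2919 description, splitting by primary key rather than by docid, can be illustrated without any Lucene classes. The following is only a minimal sketch under assumptions: the class `PrimaryKeySplitSketch`, the method `splitByKey`, and the choice of a single "middle" key are hypothetical and are not taken from the attached patches. It shows that membership in a partition depends on the key value alone, not on document order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrimaryKeySplitSketch {

    /**
     * Hypothetical helper: partitions primary keys into two groups.
     * Keys that sort before midKey go into the first list, all others
     * into the second. Input order (a stand-in for docid order) does not
     * influence which partition a key lands in.
     */
    static List<List<String>> splitByKey(List<String> keys, String midKey) {
        List<String> lower = new ArrayList<>();
        List<String> upper = new ArrayList<>();
        for (String key : keys) {
            if (key.compareTo(midKey) < 0) {
                lower.add(key);
            } else {
                upper.add(key);
            }
        }
        return Arrays.asList(lower, upper);
    }

    public static void main(String[] args) {
        // Keys arrive in arbitrary (docid-like) order.
        List<String> keys = Arrays.asList("a1", "c3", "b2", "d4");
        List<List<String>> parts = splitByKey(keys, "c0");
        System.out.println("lower partition: " + parts.get(0));
        System.out.println("upper partition: " + parts.get(1));
    }
}
```

A docid-based split would instead cut the input list at a position, so the same key could end up in different partitions depending on indexing order.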
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051553#comment-13051553 ] Nikola Tankovic commented on LUCENE-2308: - Thanks Mike, everything sounds good. I'll correct the suggested things, then start with the document2 package! :) Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-2.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
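The refactoring proposed in LUCENE-2308 (options factored out of Field into a reusable type object, with the Field still holding the value) might look roughly like the sketch below. This is not the patch attached to the issue: `FieldTypeSketch`, `SimpleFieldType`, `SimpleField`, and `buildDocument` are hypothetical names chosen for illustration, and only a few of the options mentioned in the description are modelled.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldTypeSketch {

    /** Hypothetical holder for per-field indexing options; not Lucene's API. */
    static final class SimpleFieldType {
        final boolean indexed;
        final boolean stored;
        final boolean analyzed;
        final boolean omitNorms;

        SimpleFieldType(boolean indexed, boolean stored, boolean analyzed, boolean omitNorms) {
            this.indexed = indexed;
            this.stored = stored;
            this.analyzed = analyzed;
            this.omitNorms = omitNorms;
        }
    }

    /** Hypothetical field: carries only a name, a value, and a reference to its type. */
    static final class SimpleField {
        final String name;
        final String value;
        final SimpleFieldType type;

        SimpleField(String name, String value, SimpleFieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    /** Builds a small "document" in which two fields share one type instance. */
    static List<SimpleField> buildDocument() {
        SimpleFieldType idType = new SimpleFieldType(true, true, false, true);
        SimpleFieldType textType = new SimpleFieldType(true, false, true, false);

        List<SimpleField> doc = new ArrayList<>();
        doc.add(new SimpleField("id", "doc-1", idType));
        doc.add(new SimpleField("title", "Separately specify a field's type", textType));
        doc.add(new SimpleField("body", "Options live on the type, values on the field.", textType));
        return doc;
    }

    public static void main(String[] args) {
        for (SimpleField f : buildDocument()) {
            System.out.println(f.name + ": indexed=" + f.type.indexed
                + " stored=" + f.type.stored
                + " analyzed=" + f.type.analyzed
                + " omitNorms=" + f.type.omitNorms);
        }
    }
}
```

Because the type object is shared, a per-field setting such as an analyzer could be attached to it once instead of being wrapped around the whole document, which is the direction the description suggests.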
[jira] [Resolved] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time
[ https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3197. Resolution: Fixed Optimize runs forever if you keep deleting docs at the same time Key: LUCENE-3197 URL: https://issues.apache.org/jira/browse/LUCENE-3197 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3197.patch Because we cascade merges for an optimize... if you also delete documents while the merges are running, then the merge policy will see the resulting single segment as still not optimized (since it has pending deletes) and do a single-segment merge, and will repeat indefinitely (as long as your app keeps deleting docs). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051575#comment-13051575 ] Ryan McKinley commented on LUCENE-2919: --- to get the current maven build, check: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/ IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051576#comment-13051576 ] Jason Rutherglen commented on LUCENE-2919: -- @Ryan Thanks! What would one place as the artifact info into the pom.xml? IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2919) IndexSplitter that divides by primary key term
[ https://issues.apache.org/jira/browse/LUCENE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051586#comment-13051586 ] Ryan McKinley commented on LUCENE-2919: --- Jason... not really sure what you are asking 4.0-SNAPSHOT? https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/org/apache/lucene/lucene-core/4.0-SNAPSHOT/maven-metadata.xml IndexSplitter that divides by primary key term -- Key: LUCENE-2919 URL: https://issues.apache.org/jira/browse/LUCENE-2919 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Uwe Schindler Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-2919-3x.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919-filter.patch, LUCENE-2919.patch Index splitter that divides by primary key term. The contrib MultiPassIndexSplitter we have divides by docid, however to guarantee external constraints it's sometimes necessary to split by a primary key term id. I think this implementation is a fairly trivial change. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3206) FST package API refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051597#comment-13051597 ] Dawid Weiss commented on LUCENE-3206: - I encoded the Wikipedia term list in UTF32 (int4) and UTF8 (int1). Interesting results: {noformat} 271,461,850 utf32.fst Arcs: 64.485.082 Nodes: 36.624.613 270,137,939 utf8.fst Arcs: 66.478.193 Nodes: 38.687.637 {noformat} So... the files are pretty much the same size... UTF32 is slightly bigger, but (as predicted) it has fewer arcs and fewer nodes. I checked and ALL input UTF8 strings are the same length as or longer than the vint-coded UTF32 sequences... So how come the UTF32 automaton is larger? I have no clue -- I assume it may have something to do with the size of v-coded pointers. In any case, the size gain from using int1 to encode UTF8 is minimal over just using full unicode codepoints and v-coded int4. Performance-wise it may be a hit (because one would need to convert UTF8/UTF16 to full unicode codepoints), but size-wise it seems to be relatively the same. FST package API refactoring --- Key: LUCENE-3206 URL: https://issues.apache.org/jira/browse/LUCENE-3206 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Affects Versions: 3.2 Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3206.patch The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
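The observation that UTF8 strings are never shorter than the vint-coded code point sequences can be checked with a few lines of plain Java. The sketch below assumes that "vint-coded" means the usual variable-length int layout with 7 payload bits per byte; the class `VIntVsUtf8` and its methods are hypothetical helpers, not part of the FST package, and the sample strings are arbitrary.

```java
import java.nio.charset.StandardCharsets;

public class VIntVsUtf8 {

    /** Bytes needed for a non-negative int written with 7 payload bits per byte. */
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Length of the string when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Total bytes when each Unicode code point is written as one variable-length int. */
    static int vIntCodePointLength(String s) {
        return s.codePoints().map(VIntVsUtf8::vIntLength).sum();
    }

    public static void main(String[] args) {
        // ASCII, Latin-1 range, Devanagari, CJK, and a supplementary-plane character.
        String[] samples = {"lucene", "Z\u00FCrich", "\u0905", "\u65E5\u672C\u8A9E", "\uD835\uDD38"};
        for (String s : samples) {
            System.out.println(s + ": utf8=" + utf8Length(s)
                + " vint=" + vIntCodePointLength(s));
        }
    }
}
```

The two encodings tie for code points below U+0800 and from U+4000 to U+FFFF; the variable-length int form saves one byte per code point in the ranges U+0800 to U+3FFF and U+10000 to U+10FFFF, which is consistent with the "same or longer" finding in the comment.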
[jira] [Commented] (LUCENE-3206) FST package API refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051598#comment-13051598 ] Dawid Weiss commented on LUCENE-3206: - Oh, a wild guess: with int4 more nodes will be expanded into bsearch arrays (fixed size arcs). This may account for the observed size difference. And it may matter for traversals too (because int4 nodes will have a higher fanout, especially at root and first levels... something to consider). FST package API refactoring --- Key: LUCENE-3206 URL: https://issues.apache.org/jira/browse/LUCENE-3206 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Affects Versions: 3.2 Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3206.patch The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8901 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8901/ 1 tests failed. REGRESSION: org.apache.lucene.TestExternalCodecs.testPerFieldCodec Error Message: expected:727 but was:728 Stack Trace: junit.framework.AssertionFailedError: expected:727 but was:728 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333) at org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:566) Build Log (for compile errors): [...truncated 3256 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8901 - Failure
this one was triggered by LUCENE-3197 On Sat, Jun 18, 2011 at 5:59 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8901/ 1 tests failed. REGRESSION: org.apache.lucene.TestExternalCodecs.testPerFieldCodec Error Message: expected:727 but was:728 Stack Trace: junit.framework.AssertionFailedError: expected:727 but was:728 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333) at org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:566) Build Log (for compile errors): [...truncated 3256 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051611#comment-13051611 ] Paul Elschot commented on LUCENE-2454: -- At Query, the javadocs of both createWeight() and rewrite() start with a word of warning. I'll probably need at least a few days to wrap my head around it, so in case anyone meanwhile can provide more help... Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2589) Commenting out the <arr name="queries"> section in firstSearcher generates an NPE
[ https://issues.apache.org/jira/browse/SOLR-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051619#comment-13051619 ] Erick Erickson commented on SOLR-2589: -- I forgot to include the issue number in the commit comment, so the commit is not showing up here. The revision is r1137092. Here's the ViewVC link: http://svn.apache.org/viewvc?view=revision&revision=r1137092 Thanks Steve for pointing this out. Commenting out the <arr name="queries"> section in firstSearcher generates an NPE -- Key: SOLR-2589 URL: https://issues.apache.org/jira/browse/SOLR-2589 Project: Solr Issue Type: Bug Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Trivial Fix For: 3.3, 4.0 Attachments: SOLR-2589-3x.patch, SOLR-2589.patch, SOLR-2589.patch Original Estimate: 1h Remaining Estimate: 1h This has been around since at least 1.4.1. It just clutters up the log; it's pretty harmless but easy to fix. I'll get it done as soon as I get my account set up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
Way cool! On Sat, Jun 18, 2011 at 4:56 AM, Stefan Matheis (steffkes) (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051480#comment-13051480 ] Stefan Matheis (steffkes) commented on SOLR-2399: - Shawn: Of course, it's just a quick prototype to demonstrate the functionality. The Layout will change if it's integrated :) Uwe: Thanks for the Changes! Yes, the Analysis-Page has a few things that needs to be changed - mainly regarding layout/arrangement, but also functionality. Will see if i can finish working on that next Week. noah: Thanks, it's integrated .. {{id}} as property works w/o problems? The Layout on Safari looks good, compared to the provided Screenshots? Erick: that was a [quick change|https://github.com/steffkes/solr-admin/commit/799da2e97889b7a576eaf1a516511bc126dcb1b4] : every entry has now it's own url .. so after reloading the page, the view will be the same as before. 
Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-3.x - Build # 412 - Still Failing
Build: https://builds.apache.org/job/Lucene-3.x/412/ 2 tests failed. REGRESSION: org.apache.lucene.search.TestPhraseQuery.testRandomPhrases Error Message: GC overhead limit exceeded Stack Trace: java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:89) at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:62) at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) at org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:118) at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:119) at org.apache.lucene.index.TermInfosWriter.writeTerm(TermInfosWriter.java:227) at org.apache.lucene.index.TermInfosWriter.add(TermInfosWriter.java:191) at org.apache.lucene.index.FormatPostingsDocsWriter.finish(FormatPostingsDocsWriter.java:122) at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:314) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:119) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3542) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3507) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2063) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2030) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:108) at org.apache.lucene.search.TestPhraseQuery.testRandomPhrases(TestPhraseQuery.java:659) FAILED: org.apache.lucene.util.fst.TestFSTs.testBigSet Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.IntsRef.copy(IntsRef.java:111) 
at org.apache.lucene.util.IntsRef.init(IntsRef.java:44) at org.apache.lucene.util.fst.TestFSTs$FSTTester.verifyPruned(TestFSTs.java:791) at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:499) at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:363) at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:211) at org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:944) at org.apache.lucene.util.fst.TestFSTs.testBigSet(TestFSTs.java:964) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1272) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1190) Build Log (for compile errors): [...truncated 12536 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-2535: Attachment: SOLR-2535.patch Here's the patch I used. As before, it's just David's with the extra changes omitted. In Solr 3.2 and trunk the admin/file handler fails to show directory listings - Key: SOLR-2535 URL: https://issues.apache.org/jira/browse/SOLR-2535 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 3.1, 3.2, 4.0 Environment: java 1.6, jetty Reporter: Peter Wolanin Fix For: 3.3 Attachments: SOLR-2535.patch, SOLR-2535_fix_admin_file_handler_for_directory_listings.patch In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like: {noformat} <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst> <lst name="files"> <lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst> ... </lst> </response> {noformat} I can list the xslt sub-dir using solr/admin/files?file=/xslt In Solr 3.1.0, both of these fail with a 500 error: {noformat} HTTP ERROR 500 Problem accessing /solr/admin/file/. Reason: did not find a CONTENT object java.io.IOException: did not find a CONTENT object {noformat} Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk - Build # 1599 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1599/ 2 tests failed. FAILED: org.apache.lucene.search.TestPhraseQuery.testRandomPhrases Error Message: close() called in wrong state: INCREMENT Stack Trace: junit.framework.AssertionFailedError: close() called in wrong state: INCREMENT at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333) at org.apache.lucene.analysis.MockTokenizer.close(MockTokenizer.java:176) at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:48) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:187) at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:293) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:229) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:372) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1474) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1234) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1215) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:163) at org.apache.lucene.search.TestPhraseQuery.testRandomPhrases(TestPhraseQuery.java:659) FAILED: org.apache.lucene.util.fst.TestFSTs.testBigSet Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.IntsRef.copy(IntsRef.java:111) at org.apache.lucene.util.IntsRef.init(IntsRef.java:44) at org.apache.lucene.util.fst.TestFSTs$FSTTester.verifyPruned(TestFSTs.java:793) at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:501) at org.apache.lucene.util.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:365) at org.apache.lucene.util.fst.TestFSTs.doTest(TestFSTs.java:213) at 
org.apache.lucene.util.fst.TestFSTs.testRandomWords(TestFSTs.java:946) at org.apache.lucene.util.fst.TestFSTs.testBigSet(TestFSTs.java:966) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333) Build Log (for compile errors): [...truncated 11546 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8907 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8907/ 1 tests failed. REGRESSION: org.apache.lucene.TestExternalCodecs.testPerFieldCodec Error Message: expected:720 but was:721 Stack Trace: junit.framework.AssertionFailedError: expected:720 but was:721 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333) at org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:566) Build Log (for compile errors): [...truncated 3266 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org