[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044608#comment-13044608 ]

Paul Elschot edited comment on LUCENE-2454 at 6/7/11 7:37 AM:

I finally had some time to start taking a look at the grouping module and again at the patch here. There is too much code there for me to come up with a test case soon, so please don't wait for me to commit this.

An easy way to test this would be a boolean query with a required term and an optional term, where the optional term occurs in a document group in a document before (i.e. with a lower docId than) a document in the same group that contains the required term. In case I run into this I'll open a separate issue.

Nested Document query support
Key: LUCENE-2454
URL: https://issues.apache.org/jira/browse/LUCENE-2454
Project: Lucene - Java
Issue Type: New Feature
Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip

A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

-- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
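The ordering issue behind the suggested test can be modeled without Lucene classes. This is a minimal sketch under stated assumptions: each parent document is indexed directly after its children, parents are marked in a java.util.BitSet, and ParentMappedIterator with its methods is a hypothetical name, not code from the patch. Advancing to a parent has to restart from the first child of that parent's group; moving the child iterator straight to the parent docId skips children that have lower docIds.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: exposes the parents of a sorted list of matching child
 * docIds, assuming each parent is indexed directly after its children.
 */
final class ParentMappedIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] childDocs; // sorted docIds of children that match a term
    private final BitSet parents;  // one set bit per parent docId
    private int pos = 0;           // index of the next unconsumed child
    private int parentDoc = -1;    // current parent docId

    ParentMappedIterator(int[] childDocs, BitSet parents) {
        this.childDocs = childDocs;
        this.parents = parents;
    }

    int docID() {
        return parentDoc;
    }

    /** Correct: restart from the first child of the group that can hold parentTarget. */
    int advance(int parentTarget) {
        int firstChild = parents.previousSetBit(parentTarget - 1) + 1;
        return seekChild(firstChild);
    }

    /** Buggy variant: jumps the child cursor straight to the parent docId. */
    int naiveAdvance(int parentTarget) {
        return seekChild(parentTarget);
    }

    /** Moves to the first child >= target and maps it to its parent. */
    private int seekChild(int target) {
        while (pos < childDocs.length && childDocs[pos] < target) {
            pos++;
        }
        if (pos == childDocs.length) {
            return parentDoc = NO_MORE_DOCS;
        }
        parentDoc = parents.nextSetBit(childDocs[pos]);
        // Skip the remaining children of this parent so the next call moves on.
        while (pos < childDocs.length && childDocs[pos] < parentDoc) {
            pos++;
        }
        return parentDoc;
    }
}
```

With parents at docIds 2 and 5 and the optional term only in child 0, advance(2) returns 2 while naiveAdvance(2) runs off the end: the kind of silent miss the proposed test is meant to expose.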
[jira] [Assigned] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker reassigned LUCENE-2793:
Assignee: Varun Thacker (was: Simon Willnauer)

Directory createOutput and openInput should take an IOContext
Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
Labels: gsoc2011, lucene-gsoc-11, mentor
Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch

Today for merging we pass down a larger readBufferSize than for searching because we get better performance.

I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc.

Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging.

This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc., if that were somehow possible.
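The IOContext idea in the description can be sketched as a small value object handed to createOutput/openInput. The class name, the Flag enum and the buffer sizes below are hypothetical illustrations, not the API from the attached patches; a directory along the lines of DirectIOLinuxDirectory could consult isDirect() to decide whether to bypass the OS cache.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical IOContext: a read buffer size plus access-pattern flags. */
final class IOContextSketch {
    enum Flag { DIRECT, SEQUENTIAL }

    // Illustrative values only: merging reads with a larger buffer than searching.
    static final int SEARCH_BUFFER = 1024;
    static final int MERGE_BUFFER = 4096;

    final int readBufferSize;
    final Set<Flag> flags;

    private IOContextSketch(int readBufferSize, Set<Flag> flags) {
        this.readBufferSize = readBufferSize;
        this.flags = flags;
    }

    /** Searching: small buffer, normal OS caching. */
    static IOContextSketch forSearch() {
        return new IOContextSketch(SEARCH_BUFFER, EnumSet.noneOf(Flag.class));
    }

    /** Merging: larger buffer, bypass the OS cache, read sequentially. */
    static IOContextSketch forMerge() {
        return new IOContextSketch(MERGE_BUFFER, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));
    }

    boolean isDirect() {
        return flags.contains(Flag.DIRECT);
    }
}
```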
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045314#comment-13045314 ]

Paul Elschot commented on LUCENE-2454:

That is very nicely readable XML.

The problem might occur when a document with an optional term occurs before a document in the same group with a required term. So the second question is the one for which the problem might occur. The score value for Grant's resume should then be higher than the score value for Sean's. Testing only for the set of expected results is not enough for this particular query.

The problem might also occur in another disguise when both terms are required. In that case the set of expected results is enough to test, but this is not as easily tested because one does not know beforehand the order in which the terms are going to be advance()d.

The case with an optional term is simpler to test because the optional term is certain to be advance()d to compute the score value after the required term determines that there is a match (see ReqOptSumScorer.score()). To be certain of the correct advance() on the optional term, one needs to test the score value.
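Why the result set alone is not enough follows from the shape of a required/optional sum: the required iterator decides which documents match, and the optional iterator is only advanced afterwards to add its contribution. A minimal sketch over sorted parent docId arrays with unit term scores (both simplifications, not ReqOptSumScorer's actual code); treating Grant's resume as doc 2 and Sean's as doc 5 is a hypothetical assignment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReqOptSum {
    /**
     * Scores every doc in required; adds 1.0 when the optional iterator lands
     * exactly on the same doc. Both arrays must be sorted ascending.
     */
    static Map<Integer, Double> score(int[] required, int[] optional) {
        Map<Integer, Double> scores = new LinkedHashMap<>();
        int optPos = 0;
        for (int doc : required) {
            double score = 1.0; // contribution of the required clause
            // "advance" the optional iterator to the first doc >= doc
            while (optPos < optional.length && optional[optPos] < doc) {
                optPos++;
            }
            if (optPos < optional.length && optional[optPos] == doc) {
                score += 1.0; // optional clause matches the same doc
            }
            scores.put(doc, score);
        }
        return scores;
    }
}
```

If a faulty advance() made the optional clause miss doc 2, both docs would still match and only the score of doc 2 would drop from 2.0 to 1.0, so only a score assertion catches it.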
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated LUCENE-2793:
Attachment: LUCENE-2793.patch

I have made the required changes. There are places where I have set Context=Other, some of which might be wrong. Please suggest where to make the necessary changes.
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045319#comment-13045319 ]

Paul Elschot commented on LUCENE-2454:

Looking at the structure of the BooleanQuery, I would expect this to work correctly. The ParentsFilter on the unfiltered scorer of the required term (mahout) should return the docId of the parent (resume) when the unfiltered scorer is at the document containing the required term.
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045322#comment-13045322 ]

Simon Willnauer commented on LUCENE-2793:

Very very quick review:
* I think OneMerge should not be required for creating an IOContext; maybe we should add a default ctor.
* IOContext.Other is confusing, I think. If an IOContext doesn't make much sense somewhere, we should not need to pass it in. Can't we simply have overloaded methods? Maybe I just don't like the name; maybe use DEFAULT?
* IOContext seems pretty straightforward: you either read or write, but it seems to be confused with high-level operations like Merge and Flush. Either we go high-level or we only have read and write here. Since read / write is implicit (you either pull input or output), we should make this high-level only. So maybe we have Query or Search instead of Read here? Maybe it makes sense to specify stuff like Consume or Sequential here too; some high-level APIs define sequential access, so I think it does not conflict?
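The review points (a default constructor, a high-level name like DEFAULT instead of Other, and contexts that describe the operation rather than read/write) can be sketched as overloaded constructors. IOContextDraft, its Context values and the MergeStub/FlushStub classes are hypothetical; the stubs only stand in for IndexWriter's OneMerge and the segment being flushed.

```java
/** Hypothetical IOContext with high-level contexts and overloaded constructors. */
final class IOContextDraft {
    enum Context { DEFAULT, MERGE, FLUSH, SEARCH }

    /** Stand-in for IndexWriter's OneMerge (hypothetical). */
    static final class MergeStub {
        final long estimatedBytes;
        MergeStub(long estimatedBytes) { this.estimatedBytes = estimatedBytes; }
    }

    /** Stand-in for the segment being flushed (hypothetical). */
    static final class FlushStub {
        final int numDocs;
        FlushStub(int numDocs) { this.numDocs = numDocs; }
    }

    final Context context;
    final MergeStub merge; // non-null only for MERGE
    final FlushStub flush; // non-null only for FLUSH

    /** Callers with nothing specific to say get DEFAULT, with no OneMerge required. */
    IOContextDraft() { this(Context.DEFAULT, null, null); }

    IOContextDraft(Context context) { this(context, null, null); }

    IOContextDraft(MergeStub merge) { this(Context.MERGE, merge, null); }

    IOContextDraft(FlushStub flush) { this(Context.FLUSH, null, flush); }

    private IOContextDraft(Context context, MergeStub merge, FlushStub flush) {
        this.context = context;
        this.merge = merge;
        this.flush = flush;
    }
}
```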
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045334#comment-13045334 ]

Mark Harwood commented on LUCENE-2454:

bq. Looking at the structure of the BooleanQuery, I would expect this to work correctly.

I've found it to be robust so far - you just need to be clear about directing criteria at only one child or potentially different children. The main challenge in using this functionality is allowing users to articulate the nuances of such queries, and LUCENE-3133 is a holding place for this.

Under the covers, using the same cached filter for parent filters certainly helps with performance, and I typically wrap the ParentFilter tag in the XML queries with a CachedFilter tag to achieve this.
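The benefit of caching the parent filter comes from computing the parents bit set once per index reader and reusing it for every query. A minimal model, assuming a hypothetical loader function in place of the real ParentFilter and a WeakHashMap keyed on the reader object so entries can go away with their reader:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical cache: one parents BitSet per reader, computed on first use. */
final class CachedParentsFilter {
    private final Function<Object, BitSet> loader; // stand-in for the real ParentFilter
    private final Map<Object, BitSet> cache = new WeakHashMap<>();
    private int loads = 0;

    CachedParentsFilter(Function<Object, BitSet> loader) {
        this.loader = loader;
    }

    /** Returns the cached parents for this reader, loading them only once. */
    synchronized BitSet parents(Object reader) {
        BitSet bits = cache.get(reader);
        if (bits == null) {
            bits = loader.apply(reader);
            loads++;
            cache.put(reader, bits);
        }
        return bits;
    }

    synchronized int loads() {
        return loads;
    }
}
```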
[jira] [Updated] (LUCENE-3177) Decouple indexer from Document/Field impls
[ https://issues.apache.org/jira/browse/LUCENE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3177:
Attachment: LUCENE-3177.patch

Patch. Tests pass, but there are many nocommits.

I was *almost* able to create only IndexableField, so that IW.addDocument took Iterable<IndexableField>, except for doc-level boost, so I had to create IndexableDocument.

I also cut over to a .binaryValue(BytesRef reuse) API here, replacing getBinaryValue/Length/Offset.

I would actually like to take IndexableDocument/Field further, so that eg responsibility for analysis lies under the tokenStreamValue() method, but I think we should leave that for LUCENE-2309. This is a big enough change already...

Decouple indexer from Document/Field impls
Key: LUCENE-3177
URL: https://issues.apache.org/jira/browse/LUCENE-3177
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-3177.patch

I think we should define minimal iterator interfaces, IndexableDocument/Field, that indexer requires to index documents.

Indexer would consume only these bare minimum interfaces, not the concrete Document/Field/FieldType classes from oal.document package.

Then, the Document/Field/FieldType hierarchy is one concrete impl of these interfaces. Apps are free to make their own impls as well. Maybe eventually we make another impl that enforces a global schema, eg factored out of Solr's impl.

I think this frees design pressure on our Document/Field/FieldType hierarchy, ie, these classes are free to become concrete fully-featured user-space classes with all sorts of friendly sugar APIs for adding/removing fields, getting/setting values, types, etc., but they don't need substantial extensibility/hierarchy. Ie, the extensibility point shifts to IndexableDocument/Field interface.

I think this means we can collapse the three classes we now have for a Field (Fieldable/AbstractField/Field) down to a single concrete class (well, except for LUCENE-2308 where we want to break out dedicated classes for different field types...).
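The decoupling can be sketched with two small interfaces that a toy indexer consumes, so any class implementing them can be indexed. The interface names come from the proposal, but their methods (name(), stringValue(), boost()), the whitespace-splitting "indexer" and the SimpleField/SimpleDocument impls are hypothetical simplifications, not the patch's API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

interface IndexableField {
    String name();
    String stringValue();
}

/** Iterable over fields, plus the doc-level boost a plain Iterable cannot carry. */
interface IndexableDocument extends Iterable<IndexableField> {
    float boost();
}

/** Toy indexer: it only ever sees the two interfaces above. */
final class ToyIndexer {
    final Map<String, Integer> termCounts = new TreeMap<>(); // "field:token" -> occurrences
    final List<Float> boosts = new ArrayList<>();            // doc-level boosts seen

    void addDocument(IndexableDocument doc) {
        boosts.add(doc.boost());
        for (IndexableField field : doc) {
            for (String token : field.stringValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    termCounts.merge(field.name() + ":" + token, 1, Integer::sum);
                }
            }
        }
    }
}

/** One possible app-side implementation; apps are free to write their own. */
final class SimpleField implements IndexableField {
    private final String name;
    private final String value;
    SimpleField(String name, String value) { this.name = name; this.value = value; }
    public String name() { return name; }
    public String stringValue() { return value; }
}

final class SimpleDocument implements IndexableDocument {
    private final List<IndexableField> fields = new ArrayList<>();
    private final float boost;
    SimpleDocument(float boost) { this.boost = boost; }
    SimpleDocument add(IndexableField field) { fields.add(field); return this; }
    public float boost() { return boost; }
    public Iterator<IndexableField> iterator() { return fields.iterator(); }
}
```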
[jira] [Created] (LUCENE-3177) Decouple indexer from Document/Field impls
Decouple indexer from Document/Field impls
Key: LUCENE-3177
URL: https://issues.apache.org/jira/browse/LUCENE-3177
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-3177.patch
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045351#comment-13045351 ]

Michael McCandless commented on LUCENE-2308:

Thanks for the patch Nikola!

Note: when you submit patches that you intend to donate to Apache, you should remember to check the box that says "Grant license to ASF...", as long as you are the sole creator of that patch (and thus have the right to grant this patch to the ASF). Patches that incorporate someone else's source code are more interesting because we have to ensure the license is compatible with Apache's, update our LICENSE/NOTICE, etc.

Stepping back here... I think we should think a bit about the target end goal here and then work out the baby steps to get there?

I think ideally once we are done here, it should be incredibly simple to create a document, something like this:

{code}
Document d = new Document();
d.add(new TextField(title));
d.add(new StringField(id));
d.add(new BinaryField(bytes));
d.add(new NumericField(price));
{code}

These classes each use a default FieldType under the hood:
* TextField indexes, tokenizes, with norms and TFAP
* StringField indexes untokenized and no norms, no TFAP (maybe)
* BinaryField only stores the byte[]
* NumericField does what it does today

If an app wants to tweak the type, it can do so, something like this:

{code}
FieldType titleFieldType = new FieldType(TextField.DEFAULT_TYPE);
titleFieldType.setOmitNorms(true);
titleFieldType.setOmitTFAP(true);
d.add(new Field(titleFieldType, title));
{code}

Ie, the default *Field classes are sugar for binding to the common default type, but you can easily go and customize the type if you want to.

Does that sound roughly like the goal here?

Separately specify a field's type
Key: LUCENE-2308
URL: https://issues.apache.org/jira/browse/LUCENE-2308
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Labels: gsoc2011, lucene-gsoc-11, mentor
Fix For: 4.0
Attachments: LUCENE-2308.patch, LUCENE-2308.patch

This came up from discussions on IRC. I'm summarizing here...

Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc.

I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields.

The Field instance would still hold the actual value.

We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper).

This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index.

This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
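The target API can be modeled in plain Java to check that the sugar classes and the copy-and-tweak pattern fit together. Everything here (the FieldType flags and setters, the DEFAULT_TYPE constants, the Field(FieldType, name, value) constructor) is a hypothetical toy mirroring the snippets above, not Lucene's eventual API; unlike those snippets, fields here take an explicit name, and TFAP stands for term frequencies and positions.

```java
/** Hypothetical FieldType: per-field indexing options, reusable across fields. */
final class FieldType {
    boolean indexed, tokenized, stored, omitNorms, omitTFAP;

    FieldType() {}

    /** Copy constructor so an app can start from a default type and tweak it. */
    FieldType(FieldType other) {
        indexed = other.indexed;
        tokenized = other.tokenized;
        stored = other.stored;
        omitNorms = other.omitNorms;
        omitTFAP = other.omitTFAP;
    }

    FieldType setOmitNorms(boolean v) { omitNorms = v; return this; }
    FieldType setOmitTFAP(boolean v) { omitTFAP = v; return this; }
}

/** A field binds a name and value to a (possibly shared) type. */
class Field {
    final FieldType type;
    final String name;
    final Object value;

    Field(FieldType type, String name, Object value) {
        this.type = type;
        this.name = name;
        this.value = value;
    }
}

/** Sugar: indexed, tokenized, with norms and TFAP. */
final class TextField extends Field {
    // Mutable for brevity; a real design would freeze shared default types.
    static final FieldType DEFAULT_TYPE = new FieldType();
    static {
        DEFAULT_TYPE.indexed = true;
        DEFAULT_TYPE.tokenized = true;
    }
    TextField(String name, String value) { super(DEFAULT_TYPE, name, value); }
}

/** Sugar: indexed but untokenized, no norms, no TFAP. */
final class StringField extends Field {
    static final FieldType DEFAULT_TYPE = new FieldType();
    static {
        DEFAULT_TYPE.indexed = true;
        DEFAULT_TYPE.omitNorms = true;
        DEFAULT_TYPE.omitTFAP = true;
    }
    StringField(String name, String value) { super(DEFAULT_TYPE, name, value); }
}
```

Copying the default type before tweaking it keeps the shared TextField.DEFAULT_TYPE untouched, which is what lets one FieldType instance be reused safely across many fields.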
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045350#comment-13045350 ]

Michael McCandless commented on LUCENE-2308:

I've opened LUCENE-3177 (and linked to this issue), to strongly decouple what the indexer needs when indexing documents/fields from what we do in this issue.

Ie, LUCENE-3177 gives us more freedom here, I think, to create a specific concrete FieldType hierarchy for creating documents.
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045354#comment-13045354 ]

Michael McCandless commented on LUCENE-2793:

Patch looks great! Comments:
* I think IOCtx should have a ctor taking only OneMerge, which would set the OneMerge and set the context as Merge?
* Likewise, a ctor taking a SegmentInfo to mean context = Flush?
* And finally a default ctor that maps to Other (or Default or Unknown or Unspecified or something).
* You don't need to create the IOCtx in IW.maybeMerge?
* Maybe we want a readOnce boolean in IOCtx? When we read del docs, norms, terms index, doc values, segments file / segments.gen, we would set this? (And UnixDir would send eg NO_REUSE down to the OS.)
* I think we'll need a NativeMMapDir as well as NativeDir (or NativeUnix/WindowsDir), because mmap can also take flags giving hints about access patterns. I'll open a new issue...
* Why does SegmentCoreReaders hang onto the IOCtx? Seems like classes shouldn't hang onto it... (also: PreFlexFields).
* Hmm, does createOutput even need an IOCtx...? What would a dir do with this? I suppose if it's a merge and we had IO prioritization (someday) we could set lower prio... OK, let's keep it.
* I think Codec.fieldsProducer/Consumer should take an IOCtx?
* Still need to fix IW's ReaderPool to key off of IOCtx.Context plus the info. Maybe put a // nocommit in there so we remember...? Eg, where you commented out the readBufferSize = ... inside ReaderPool.get is a good place.
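Keying the reader pool on the context plus the segment can be sketched as a map whose key combines both, so a reader opened for merging is never handed out for searching. PoolKey, ReaderPoolSketch, the Context values and the String stand-ins for segment names and readers are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

final class ReaderPoolSketch {
    enum Context { DEFAULT, MERGE, FLUSH, READ }

    /** Pool key: the segment plus the context the reader was opened for. */
    static final class PoolKey {
        final String segment;
        final Context context;

        PoolKey(String segment, Context context) {
            this.segment = segment;
            this.context = context;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PoolKey)) return false;
            PoolKey k = (PoolKey) o;
            return segment.equals(k.segment) && context == k.context;
        }

        @Override
        public int hashCode() {
            return Objects.hash(segment, context);
        }
    }

    private final Map<PoolKey, String> readers = new HashMap<>();
    private final BiFunction<String, Context, String> opener; // hypothetical reader factory

    ReaderPoolSketch(BiFunction<String, Context, String> opener) {
        this.opener = opener;
    }

    /** Returns the pooled reader for (segment, context), opening one if needed. */
    String get(String segment, Context context) {
        return readers.computeIfAbsent(new PoolKey(segment, context),
                k -> opener.apply(k.segment, k.context));
    }

    int size() {
        return readers.size();
    }
}
```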
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045374#comment-13045374 ]

Nikola Tankovic commented on LUCENE-2308:

Mike, that's exactly what I needed, a quick API goal summary. This looks very clean and nice to me. The next patch will continue in that direction.

Basically, like I said, we should remove AbstractField and keep only Field (with the Fieldable interface). Then extend Field with Text, String, Binary and Numeric.
[jira] [Updated] (LUCENE-3175) speed up core tests
[ https://issues.apache.org/jira/browse/LUCENE-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3175:
Attachment: LUCENE-3175_2.patch

I shaved another minute off on my Mac with this patch.

speed up core tests
Key: LUCENE-3175
URL: https://issues.apache.org/jira/browse/LUCENE-3175
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir
Fix For: 3.3, 4.0
Attachments: LUCENE-3175.patch, LUCENE-3175.patch, LUCENE-3175_2.patch

Our core tests have gotten slower and slower; if you don't have a really fast computer, it's probably frustrating.

I think we should:
1. still have random parameters, but make the 'obscene' settings like SimpleText rarer... we can always make them happen more on NIGHTLY
2. have tests that make a lot of documents conditionalize on NIGHTLY, so that they still do a reasonable test on ordinary runs, e.g. numdocs = (NIGHTLY ? 1 : 1000) * multiplier
3. refactor some of the slow huge classes with lots of tests like TestIW/TestIR, at least pulling really slow methods like TestIR.testDiskFull out into their own classes. This gives better parallelization.
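Points 1 and 2 can be sketched as two helpers. The nightly flag, the multiplier, the codec names and every number here are hypothetical placeholders chosen so that nightly runs do more work; they are not the test framework's real settings.

```java
import java.util.Random;

/** Hypothetical helpers: cheap ordinary runs, heavier nightly runs. */
final class TestKnobs {
    /** Scale document counts: ordinary runs stay small, nightly runs go big. */
    static int numDocs(boolean nightly, int multiplier) {
        return (nightly ? 10_000 : 1_000) * multiplier;
    }

    /** Pick the slow SimpleText codec rarely, but more often on nightly runs. */
    static String pickCodec(Random random, boolean nightly) {
        int oneIn = nightly ? 10 : 100;
        return random.nextInt(oneIn) == 0 ? "SimpleText" : "Default";
    }
}
```

Keeping the randomness but lowering the odds preserves coverage of the slow settings across many runs while keeping a single ordinary run short.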
[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term collection statistics
[ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3174:
Lucene Fields: [New, Patch Available] (was: [New])

Similarity.Stats class for term collection statistics
Key: LUCENE-3174
URL: https://issues.apache.org/jira/browse/LUCENE-3174
Project: Lucene - Java
Issue Type: Sub-task
Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Priority: Minor
Fix For: flexscoring branch
Attachments: LUCENE-3174.patch

In order to support ranking methods besides TF-IDF, we need to make the statistics they need available. These statistics could be computed in computeWeight (soon to become computeStats) and stored in a separate object for easy access. Since this object will be used solely by subclasses of Similarity, it should be implemented as a static nested class, i.e. Similarity.Stats.

There are two ways this could be implemented:
- as a single Similarity.Stats class, reused by all ranking algorithms. In this case, this class would have a member field for all statistics;
- as a hierarchy of Stats classes, one for each ranking algorithm. Each subclass would define only the statistics needed for the ranking algorithm.

In the second case, the Stats class in DefaultSimilarity would have a single field, idf, while the one in e.g. BM25Similarity would have idf and average field/document length.
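The second option (a hierarchy of Stats classes) can be sketched as static nested classes of a toy similarity. The class layout follows the description; the idf formulas are assumptions chosen for illustration: the classic Lucene TF-IDF form 1 + ln(numDocs / (docFreq + 1)) and a common BM25 variant ln(1 + (N - df + 0.5) / (df + 0.5)).

```java
/** Toy model: each ranking model's Stats holds only the statistics it needs. */
abstract class SimilaritySketch {
    abstract static class Stats {}

    /** TF-IDF only needs idf. */
    static final class DefaultStats extends Stats {
        final double idf;
        DefaultStats(long numDocs, long docFreq) {
            idf = 1.0 + Math.log(numDocs / (double) (docFreq + 1));
        }
    }

    /** BM25 needs idf plus the average field length. */
    static final class BM25Stats extends Stats {
        final double idf;
        final double avgFieldLength;
        BM25Stats(long numDocs, long docFreq, long totalFieldLength) {
            idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
            avgFieldLength = totalFieldLength / (double) numDocs;
        }
    }

    /** Computed once per query term (computeWeight, soon computeStats, in the issue). */
    abstract Stats computeStats(long numDocs, long docFreq, long totalFieldLength);
}

final class DefaultSimilaritySketch extends SimilaritySketch {
    @Override
    Stats computeStats(long numDocs, long docFreq, long totalFieldLength) {
        return new DefaultStats(numDocs, docFreq); // field length is not needed
    }
}

final class BM25SimilaritySketch extends SimilaritySketch {
    @Override
    Stats computeStats(long numDocs, long docFreq, long totalFieldLength) {
        return new BM25Stats(numDocs, docFreq, totalFieldLength);
    }
}
```

The single-class option would instead put idf and avgFieldLength (and every future statistic) on one Stats class, leaving some fields unused by each model.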
[jira] [Updated] (LUCENE-3174) Similarity.Stats class for term collection statistics
[ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3174:
Attachment: LUCENE-3174.patch

Patch v0.1
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045428#comment-13045428 ] Michael McCandless commented on LUCENE-2308: I think we can create only Field? I.e., no Fieldable interface nor AbstractField? I think IndexableField (LUCENE-3177) is the only interface we need? Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. E.g., it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
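A minimal sketch of the FieldType idea, assuming simplified stand-in classes: the FieldType and Field classes below are hypothetical illustrations, not Lucene's API, and model only a few of the options listed in the issue. The point is that one FieldType instance holds the per-field options and is shared by several Field instances, each of which holds only its name and value.

```java
// Hypothetical stand-ins for the proposed refactoring; not Lucene's API.
// All per-field options live in an immutable, shareable FieldType.
final class FieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;
    final boolean omitNorms;

    FieldType(boolean indexed, boolean stored, boolean tokenized, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
        this.omitNorms = omitNorms;
    }
}

// A Field carries only its name and value, plus a reference to a shared type.
final class Field {
    final String name;
    final String value;
    final FieldType type; // shared across fields, never copied

    Field(String name, String value, FieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

A per-field analyzer, as suggested in the issue, would then become one more member of FieldType rather than a separate PerFieldAnalyzerWrapper.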
[jira] [Commented] (SOLR-1967) New Native PHP Response Writer Class
[ https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045435#comment-13045435 ] Eric Pugh commented on SOLR-1967: - Seems like this could be closed as a won't fix. One of the most popular clients for PHP, solr-php-client (http://code.google.com/p/solr-php-client/), doesn't use the PHP writer at all! And isn't going to: http://code.google.com/p/solr-php-client/issues/detail?id=6#c1 I'd echo Peter Wolanin's comment that having lots of writers that don't get tested/updated isn't a good thing. New Native PHP Response Writer Class Key: SOLR-1967 URL: https://issues.apache.org/jira/browse/SOLR-1967 Project: Solr Issue Type: New Feature Components: clients - php, Response Writers Affects Versions: 1.4 Reporter: Israel Ekpo Labels: php, response, solrclient, writer Fix For: 3.3 Attachments: phpnative.tar.gz, phpnativeresponsewriter.jar Original Estimate: 0h Remaining Estimate: 0h Hi Solr users, If you are using Apache Solr via PHP, I have some good news for you. There is a new response writer for the PHP native extension, currently available as a plugin. This new feature adds a new response writer class to the org.apache.solr.request package. This class is used by the PHP Native Solr Client driver to prepare the query response from Solr. This response writer allows you to configure the way the data is serialized for the PHP client. You can use your own class name, and you can also control how the properties are serialized. The formatting of the response data is very similar to the way it is currently done by the PECL extension on the client side. The only difference now is that this serialization is happening on the server side instead.
You will find this new response writer particularly useful when dealing with highlighting, admin threads and more like this responses, to mention just a few. You can pass the objectClassName request parameter to specify the class name to be used for serializing objects. Please note that the class must be available on the client side; otherwise unserialization produces __PHP_Incomplete_Class objects. You can also pass in the objectPropertiesStorageMode request parameter with either a 0 (independent properties) or a 1 (combined properties). These parameters can also be passed as a named list when loading the response writer in the solrconfig.xml file. Having this control allows you to create custom objects, which gives you the flexibility of implementing custom __get methods and the ArrayAccess and Iterator (and therefore Traversable) interfaces on the PHP client side. Until this class is incorporated into Solr, you simply have to copy the jar file containing this plugin into your lib directory under $SOLR_HOME. The jar file is available here and so is the source code. Then set up the configuration as shown below and restart your servlet container. Below is an example configuration in solrconfig.xml:
{code}
<queryResponseWriter name="phpnative" class="org.apache.solr.request.PHPNativeResponseWriter">
  <!-- You can choose a different class for your objects. Just make sure the class is available in the client -->
  <str name="objectClassName">SolrObject</str>
  <!--
    0 means OBJECT_PROPERTIES_STORAGE_MODE_INDEPENDENT
    1 means OBJECT_PROPERTIES_STORAGE_MODE_COMBINED

    In independent mode, each property is a separate property.
    In combined mode, all the properties are merged into a _properties array.
    The combined mode allows you to create custom __getters, and you could also
    implement ArrayAccess and Iterator (and therefore Traversable).
  -->
  <int name="objectPropertiesStorageMode">0</int>
</queryResponseWriter>
{code}
Below is an example implementation on the PHP client side.
Support for specifying custom response writers will be available starting from version 0.9.11 of the PECL extension for Solr, currently available here: http://pecl.php.net/package/solr Here is an example of how to use the new response writer with the PHP client.
{code}
<?php

class SolrClass
{
    public $_properties = array();

    public function __get($property_name)
    {
        if (property_exists($this, $property_name)) {
            return $this->$property_name;
        } else if (isset($this->_properties[$property_name])) {
            // combined mode: all properties live in the _properties array
            return $this->_properties[$property_name];
        }

        return null;
    }
}

$options = array
(
    'hostname' => 'localhost',
    'port'     => 8983,
    'path'     => '/solr/'
);

$client = new SolrClient($options);

$client->setResponseWriter("phpnative");

$response = $client->ping();

$query = new SolrQuery();

$query->setQuery("*:*");

$query->set("objectClassName", "SolrClass");

$query->set("objectPropertiesStorageMode", 1);

$response = $client->query($query);

$resp = $response->getResponse();

?>
{code}
Documentation of the
RE: svn commit: r1133021 - /lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java
Wuh, what a leftover from earlier days! The generics policeman thanks for correcting that! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: mikemcc...@apache.org [mailto:mikemcc...@apache.org] Sent: Tuesday, June 07, 2011 4:14 PM To: comm...@lucene.apache.org Subject: svn commit: r1133021 - /lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java Author: mikemccand Date: Tue Jun 7 14:13:48 2011 New Revision: 1133021 URL: http://svn.apache.org/viewvc?rev=1133021&view=rev Log: fix redundant cast compilation warning Modified: lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java Modified: lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java?rev=1133021&r1=1133020&r2=1133021&view=diff
==
--- lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java (original)
+++ lucene/dev/trunk/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java Tue Jun 7 14:13:48 2011
@@ -165,7 +165,7 @@ public class TokenSources {
       this.tokens = tokens;
       termAtt = addAttribute(CharTermAttribute.class);
       offsetAtt = addAttribute(OffsetAttribute.class);
-      posincAtt = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
+      posincAtt = addAttribute(PositionIncrementAttribute.class);
     }
 
     @Override
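Why the cast was redundant: addAttribute is generic in its Class argument, so the compiler already knows the return type. The sketch below uses a hypothetical, simplified MiniAttributeSource to show the same signature shape; unlike Lucene's AttributeSource, which maps attribute interfaces to implementation classes, it simply instantiates the concrete class it is given.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for AttributeSource (not Lucene's code).
class MiniAttributeSource {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    // T is bound by the Class<T> argument, so callers never need a cast.
    <T> T addAttribute(Class<T> attClass) {
        Object existing = attributes.get(attClass);
        if (existing == null) {
            try {
                existing = attClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot instantiate " + attClass, e);
            }
            attributes.put(attClass, existing);
        }
        // Class.cast is checked at runtime and typed as T at compile time.
        return attClass.cast(existing);
    }
}

// Hypothetical attribute class used only for this illustration.
class PosIncAttribute {
    int positionIncrement = 1;
}
```

With this signature, `PosIncAttribute a = src.addAttribute(PosIncAttribute.class);` compiles without a cast, and a second call returns the same instance.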
Re: [VOTE] Release PyLucene 3.2.0
+1 I built on OS X 10.6.6, passed all tests (I think? No overall summary in the end, but I didn't see any obvious problem), and ran my usual smoke test indexing first 100K docs from a line file from Wikipedia, and running a few searches. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 6, 2011 at 4:58 PM, Andi Vajda va...@apache.org wrote: The PyLucene 3.2.0-1 release closely tracking the recent release of Lucene Java 3.2 is ready. A release candidate is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_2/CHANGES PyLucene 3.2.0 is built with JCC 2.9 included in these release artifacts. A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 3.2.0-1. Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
[jira] [Commented] (SOLR-2491) spellcheck.maxCollationTries breaks when using FieldCollapsing
[ https://issues.apache.org/jira/browse/SOLR-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045453#comment-13045453 ] James Dyer commented on SOLR-2491: -- I think this issue can go in separately from SOLR-2564 and have it use ungrouped queries. This little patch allows people to use both features in tandem now rather than waiting for later (for instance, I have an app in production using this patch now...). As a follow-up to SOLR-2564, it would be nice to give the user the option to return the # of grouped hits. If the end-user is receiving groups as results and the app gives a message like "300 results (groups) returned", then in the case of a misspelled query, any did-you-mean message that includes the # of hits would probably need to be consistent and give the # of groups rather than the # of documents. So this would be useful additional functionality, whenever we indeed get grouping that can return the # of groups... Maybe a separate issue should be opened just for this, and it can be worked on after SOLR-2564 goes in? spellcheck.maxCollationTries breaks when using FieldCollapsing -- Key: SOLR-2491 URL: https://issues.apache.org/jira/browse/SOLR-2491 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2491.patch If specifying spellcheck.maxCollationTries and group=true on the same query, you never get any Spell Check Collations back. The problem is that SpellCheckCollator relies on ResponseBuilder.getToLog().get("hits") to see how many results each test query returns. When group=true, the toLog isn't populated, so SpellCheckCollator is unable to find a collation that can return results.
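To make the failure mode concrete, here is a hypothetical sketch of a collation-checking loop. The runQuery function stands in for executing a test query and returning its log map (the role ResponseBuilder.getToLog() plays in the description); CollationChecker and its method are assumptions, not Solr's SpellCheckCollator. If a grouped test query never writes a "hits" entry, every candidate looks like a zero-hit query, which matches the reported symptom; running the test queries ungrouped, as the comment proposes, restores the entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not Solr's SpellCheckCollator.
final class CollationChecker {
    /**
     * Runs at most maxCollationTries candidate collations and keeps those whose
     * test query reports a positive "hits" count in its log map.
     */
    static List<String> verifiedCollations(List<String> candidates,
                                           int maxCollationTries,
                                           Function<String, Map<String, Object>> runQuery) {
        List<String> verified = new ArrayList<>();
        int tries = 0;
        for (String candidate : candidates) {
            if (tries++ >= maxCollationTries) {
                break;
            }
            Map<String, Object> log = runQuery.apply(candidate);
            Object hits = log.get("hits");
            // A missing entry (the grouped case) is treated as zero hits,
            // so the candidate is silently discarded.
            if (hits instanceof Number && ((Number) hits).longValue() > 0) {
                verified.add(candidate);
            }
        }
        return verified;
    }
}
```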
[jira] [Created] (LUCENE-3178) Native MMapDir
Native MMapDir -- Key: LUCENE-3178 URL: https://issues.apache.org/jira/browse/LUCENE-3178 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Spinoff from LUCENE-2793. Just like we will create a native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle.
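For context on why native code is being considered: from pure Java, a file is memory-mapped with FileChannel.map(), which takes only a mode, an offset and a length. The sketch below (MapDemo is a hypothetical helper) shows that there is no argument through which mmap() flags or madvise() hints such as sequential or random access could be passed; MappedByteBuffer.load() is the closest built-in knob, and it only asks the OS to page the whole mapping in.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps a file read-only and returns its first byte.
final class MapDemo {
    static byte firstByte(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Only mode, position and size can be specified; there is no
            // parameter for access-pattern hints.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Best-effort request to page the mapping in; not a madvise() flag.
            buf.load();
            // The mapping stays valid after the channel is closed.
            return buf.get(0);
        }
    }
}
```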
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045455#comment-13045455 ] Michael McCandless commented on LUCENE-2793: {quote} I think we'll need a NativeMMapDir as well as NativeDir (or NativeUnix/WindowsDir), because mmap can also take flags giving hints about access patterns. I'll open a new issue... {quote} I opened LUCENE-3178. Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc., if that were somehow possible.
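A minimal sketch of what such an IOContext could carry, assuming a hypothetical immutable class with two predefined contexts. The flag names follow the issue text; the buffer sizes are illustrative values, not taken from the patch.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the proposed IOContext; not the LUCENE-2793 patch.
final class IOContext {
    enum Flag { DIRECT, SEQUENTIAL }

    final int readBufferSize;
    private final Set<Flag> flags;

    private IOContext(int readBufferSize, EnumSet<Flag> flags) {
        this.readBufferSize = readBufferSize;
        this.flags = Collections.unmodifiableSet(EnumSet.copyOf(flags));
    }

    /** Searching: smaller buffer, reads go through the OS buffer cache. */
    static final IOContext SEARCH = new IOContext(1024, EnumSet.noneOf(Flag.class));

    /** Merging: larger buffer, read once front to back, bypass the cache. */
    static final IOContext MERGE =
        new IOContext(4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));

    boolean has(Flag flag) {
        return flags.contains(flag);
    }
}
```

A Directory implementation could then check `ctx.has(IOContext.Flag.DIRECT)` in openInput to decide which OS-level flags to pass, which is the kind of decision the native directories discussed in LUCENE-3178 would make.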
[jira] [Commented] (SOLR-2491) spellcheck.maxCollationTries breaks when using FieldCollapsing
[ https://issues.apache.org/jira/browse/SOLR-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045458#comment-13045458 ] Robert Muir commented on SOLR-2491: --- James: sounds like a plan. Let's try to get this one resolved and we can follow up with the option (and maybe change the default or whatever) when that makes sense. I'll review the patch shortly. spellcheck.maxCollationTries breaks when using FieldCollapsing -- Key: SOLR-2491 URL: https://issues.apache.org/jira/browse/SOLR-2491 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2491.patch If specifying spellcheck.maxCollationTries and group=true on the same query, you never get any Spell Check Collations back. The problem is that SpellCheckCollator relies on ResponseBuilder.getToLog().get("hits") to see how many results each test query returns. When group=true, the toLog isn't populated, so SpellCheckCollator is unable to find a collation that can return results.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045460#comment-13045460 ] Nikola Tankovic commented on LUCENE-2308: - Yes, IndexableField looks sufficient. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. E.g., it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
[jira] [Updated] (SOLR-2542) dataimport global session putVal blank
[ https://issues.apache.org/jira/browse/SOLR-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Wesemann updated SOLR-2542: - Attachment: TestContext.java JUnitTest for this dataimport global session putVal blank -- Key: SOLR-2542 URL: https://issues.apache.org/jira/browse/SOLR-2542 Project: Solr Issue Type: Bug Affects Versions: 3.1 Reporter: Linbin Chen Labels: dataimport Fix For: 3.3 Attachments: TestContext.java, dataimport-globalSession-bug-solr3.1.patch
{code:title=ContextImpl.java}
private void putVal(String name, Object val, Map map) {
  if(val == null) map.remove(name);
  else entitySession.put(name, val);
}
{code}
change to
{code:title=ContextImpl.java}
private void putVal(String name, Object val, Map map) {
  if(val == null) map.remove(name);
  else map.put(name, val);
}
{code}
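The bug is easy to reproduce outside DIH. In the stand-alone sketch below (SessionDemo is hypothetical, not ContextImpl), the buggy putVal ignores the map it is handed and always writes to the entity session, so a value meant for the global session never arrives there; the fixed version writes to the map it was given.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reproduction of the putVal bug; not DIH's ContextImpl.
final class SessionDemo {
    final Map<String, Object> entitySession = new HashMap<>();
    final Map<String, Object> globalSession = new HashMap<>();

    // Buggy version: removal honours `map`, but insertion always goes to entitySession.
    void buggyPutVal(String name, Object val, Map<String, Object> map) {
        if (val == null) {
            map.remove(name);
        } else {
            entitySession.put(name, val);
        }
    }

    // Fixed version: both branches operate on the map for the requested scope.
    void fixedPutVal(String name, Object val, Map<String, Object> map) {
        if (val == null) {
            map.remove(name);
        } else {
            map.put(name, val);
        }
    }
}
```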
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045461#comment-13045461 ] Robert Muir commented on LUCENE-3178: - Can the flags you need all be set with madvise(), or are some only available as flags to mmap()? If they can all be set with madvise(), it might not be that bad. Native MMapDir -- Key: LUCENE-3178 URL: https://issues.apache.org/jira/browse/LUCENE-3178 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Spinoff from LUCENE-2793. Just like we will create a native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle.
Adding shard info to returned document
With the DocTransformer stuff in place, we should be able to return the shard info with the documents (like SOLR-705). I see two options: 1. Each server adds its own ID to the documents -- I like this approach, but (as far as I can tell) the shards don't really know their ID (or that they are in a distributed request). To support this, we could pass a parameter like shard.id=localhost:9877 along with the request. 2. The controlling server adds the ID to documents as they are returned from the shards. This is kinda messy, but avoids passing an extra parameter. Thoughts? ryan
Re: Adding shard info to returned document
On Tue, Jun 7, 2011 at 10:57 AM, Ryan McKinley ryan...@gmail.com wrote: With the DocTransformer stuff in place, we should be able to return the shard info with the documents. (like SOLR-705) I see two options: 1. Each server adds its own ID to the documents -- I like this approach, but (as far as i can tell) the shards don't really know their ID (or that they are in a distributed request). To support this, we could pass a parameter like shard.id=localhost:9877 along with the request Shards currently know that they are in a distrib request via isShard=true I originally favored #2 (the controlling server adds the ID), but thinking about it again, I'm starting to lean toward your #1. If/when we move to micro-sharding (keeping multiple indexes around so we can rebalance easily), a distrib request should state what parts of the index it is requesting from the server. -Yonik http://www.lucidimagination.com 2. The controlling server adds the ID to documents as they are returned from the shards. This is kinda messy, but avoids passing an extra parameter. thoughts? ryan
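Option 1, which Yonik leans toward, could look roughly like the hypothetical sketch below: each shard checks that it is serving a distributed request (isShard=true, which per the thread shards already receive) and, if the proposed shard.id parameter is present, tags each of its documents. Documents are modeled as plain maps, and the "[shard]" field name and the ShardIdTagger class are assumptions, not Solr's DocTransformer API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option 1; not Solr's DocTransformer API.
final class ShardIdTagger {
    static void tag(Map<String, String> params, List<Map<String, Object>> docs) {
        String shardId = params.get("shard.id");
        if (!"true".equals(params.get("isShard")) || shardId == null) {
            return; // not a distributed sub-request, or no ID supplied: add nothing
        }
        for (Map<String, Object> doc : docs) {
            doc.put("[shard]", shardId);
        }
    }
}
```

Option 2 would instead run the same loop on the controlling server, using the address it sent each sub-request to, which avoids the extra parameter but puts the logic in the merge step.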
[jira] [Resolved] (SOLR-2491) spellcheck.maxCollationTries breaks when using FieldCollapsing
[ https://issues.apache.org/jira/browse/SOLR-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2491. --- Resolution: Fixed Assignee: Robert Muir Committed revision 1133043. Thanks James! spellcheck.maxCollationTries breaks when using FieldCollapsing -- Key: SOLR-2491 URL: https://issues.apache.org/jira/browse/SOLR-2491 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Assignee: Robert Muir Priority: Minor Fix For: 4.0 Attachments: SOLR-2491.patch If specifying spellcheck.maxCollationTries and group=true on the same query, you never get any Spell Check Collations back. The problem is that SpellCheckCollator relies on ResponseBuilder.getToLog().get("hits") to see how many results each test query returns. When group=true, the toLog isn't populated, so SpellCheckCollator is unable to find a collation that can return results.
Is docs for PatternReplaceFilterFactory missing on wiki...?
Seems like the documentation for PatternReplaceFilterFactory should be added to this wiki page? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?#TokenFilterFactories Is this page meant to be an exhaustive index of all the available Analyzers etc.? I know the page explicitly says it isn't. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
[jira] [Commented] (SOLR-2571) IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup
[ https://issues.apache.org/jira/browse/SOLR-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045523#comment-13045523 ] James Dyer commented on SOLR-2571: -- I added thresholdTokenFrequency to the SpellCheckComponent wiki page. IndexBasedSpellChecker thresholdTokenFrequency fails with a ClassCastException on startup --- Key: SOLR-2571 URL: https://issues.apache.org/jira/browse/SOLR-2571 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.4.1, 3.1, 4.0 Reporter: James Dyer Assignee: Robert Muir Priority: Minor Labels: whereIsHossManWhenYouNeedHim Fix For: 3.3, 4.0 Attachments: SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.patch, SOLR-2571.solr3.2.patch When parsing the configuration for thresholdTokenFrequency, the IndexBasedSpellChecker tries to pull a Float from the DataConfig.xml-derived NamedList. However, this comes through as a String. Therefore, a ClassCastException is always thrown whenever this parameter is specified. The code ought to be doing Float.parseFloat(...) on the value. This looks like a nice feature to use in cases where the data contains misspelled or rare words leading to spurious "correct" queries. I would have liked to have used this with a project we just completed; however, this bug prevented that. This issue came up recently on the users' mailing list, so I am raising an issue now.
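The fix the description asks for amounts to parsing instead of casting. A minimal sketch, assuming a hypothetical helper that accepts the raw NamedList value (either a String or a Number) plus a caller-chosen default:

```java
// Hypothetical helper; the name and default handling are assumptions.
final class ConfigParsing {
    static float thresholdTokenFrequency(Object raw, float defaultValue) {
        if (raw == null) {
            return defaultValue; // parameter not configured
        }
        if (raw instanceof Number) {
            return ((Number) raw).floatValue(); // already numeric: no cast to Float needed
        }
        // Config values typically arrive as Strings, so parse rather than cast.
        return Float.parseFloat(raw.toString().trim());
    }
}
```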
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045526#comment-13045526 ] Michael McCandless commented on LUCENE-3178: I think we want to call madvise, and not change the flags passed to the original mmap invocation? But I'm not sure... Native MMapDir -- Key: LUCENE-3178 URL: https://issues.apache.org/jira/browse/LUCENE-3178 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Spinoff from LUCENE-2793. Just like we will create a native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle.
RE: Is docs for PatternReplaceFilterFactory missing on wiki...?
Hi Eric, If you want to add it, you should. Steve -Original Message- From: Eric Pugh [mailto:ep...@opensourceconnections.com] Sent: Tuesday, June 07, 2011 12:21 PM To: dev@lucene.apache.org Subject: Is docs for PatternReplaceFilterFactory missing on wiki...? Seems like the documentation for PatternReplaceFilterFactory should be added to this wiki page? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?#TokenFilterFactories Is there a desire for this page meant to be an exhaustive index of all the Analyzers etc available? I know it's explicitly called out that it isn't. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Re: Is docs for PatternReplaceFilterFactory missing on wiki...?
On Tue, Jun 7, 2011 at 12:20 PM, Eric Pugh ep...@opensourceconnections.com wrote: Seems like the documentation for PatternReplaceFilterFactory should be added to this wiki page? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?#TokenFilterFactories Is there a desire for this page meant to be an exhaustive index of all the Analyzers etc available? I know it's explicitly called out that it isn't. I'm not sure it's meant to be exhaustive, but it should include anything generally useful enough (or at least a pointer to somewhere else that lists some of the generally useful stuff). PatternReplaceFilterFactory certainly seems to fit the bill of useful! -Yonik http://www.lucidimagination.com
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045561#comment-13045561 ] Paul Elschot commented on LUCENE-2454: -- So one concern that is left is performance for parent testing. I'll open an issue for OpenBitSet.prevSetBit(). Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
[jira] [Created] (LUCENE-3179) OpenBitSet.prevSetBit()
OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution (LUCENE-2454).
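One way prevSetBit() can work, sketched over a plain long[] laid out like OpenBitSet's words (bit i lives in word i >>> 6 at position i & 63): mask off the bits above the start index in the first word, then scan backwards word by word and use Long.numberOfLeadingZeros to locate the highest set bit. This is an illustration under those assumptions, not the code in the attached patch.

```java
// Hypothetical sketch; not the LUCENE-3179 patch.
final class PrevSetBit {
    /** Returns the highest set bit at or below index, or -1 if there is none. */
    static int prevSetBit(long[] bits, int index) {
        if (index < 0) {
            return -1;
        }
        int word = index >>> 6;
        if (word >= bits.length) {
            // Index is past the end: start from the last bit of the last word.
            word = bits.length - 1;
            if (word < 0) {
                return -1;
            }
            index = (word << 6) + 63;
        }
        // Keep only bits 0..(index & 63) of the starting word.
        long w = bits[word] & (-1L >>> (63 - (index & 63)));
        while (true) {
            if (w != 0) {
                // Position of the highest set bit within this word.
                return (word << 6) + 63 - Long.numberOfLeadingZeros(w);
            }
            if (--word < 0) {
                return -1;
            }
            w = bits[word];
        }
    }
}
```

In a nested-document index, calling this on a bitset that marks parent documents is one way to find the nearest preceding parent for a given docId, which is the kind of parent testing the issue refers to.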
[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-3179: - Attachment: LUCENE-3197.patch Add prevSetBit() and tests. Also moves some test code from TestOpenBitSet to TestBitUtil. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3197.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution (LUCENE-2454).
[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045561#comment-13045561 ] Paul Elschot edited comment on LUCENE-2454 at 6/7/11 6:21 PM: -- So one concern that is left is performance for parent testing. I'll open an issue for OpenBitSet.prevSetBit(), LUCENE-3197 was (Author: paul.elsc...@xs4all.nl): So one concern that is left is performance for parent testing. I'll open an issue for OpenBitSet.prevSetBit(). Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
[jira] [Issue Comment Edited] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045561#comment-13045561 ] Paul Elschot edited comment on LUCENE-2454 at 6/7/11 6:22 PM: -- So one concern that is left is performance for parent testing. I'll open an issue for OpenBitSet.prevSetBit(), LUCENE-3179 was (Author: paul.elsc...@xs4all.nl): So one concern that is left is performance for parent testing. I'll open an issue for OpenBitSet.prevSetBit(), LUCENE-3197 Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045576#comment-13045576 ] Yonik Seeley commented on LUCENE-3179: -- Hey Paul, did you try this implementation against Long.numberOfLeadingZeros? The later Oracle Java6 implementations have intrinsified this method, so it might be faster: http://bugs.sun.com/view_bug.do?bug_id=6823354 OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3197.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
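To make the comparison concrete, here is a sketch of a portable binary-search leading-zero count in the Hacker's Delight style (not necessarily the code in the patch) next to Long.numberOfLeadingZeros, which HotSpot may intrinsify. Either one turns "index of the highest set bit in a 64-bit word" into a single subtraction, which is the core step of a word-at-a-time prevSetBit(). The class name is hypothetical.

```java
// Sketch: two ways to find the highest set bit in a 64-bit word.
class HighestBit {
    /** Portable leading-zero count by binary search; returns 64 for x == 0. */
    static int nlzPortable(long x) {
        if (x == 0) return 64;
        int n = 0;
        // Each step checks whether the top half of the remaining window is empty;
        // if so, it counts those zeros and shifts them out.
        if ((x & 0xFFFFFFFF00000000L) == 0) { n += 32; x <<= 32; }
        if ((x & 0xFFFF000000000000L) == 0) { n += 16; x <<= 16; }
        if ((x & 0xFF00000000000000L) == 0) { n += 8;  x <<= 8;  }
        if ((x & 0xF000000000000000L) == 0) { n += 4;  x <<= 4;  }
        if ((x & 0xC000000000000000L) == 0) { n += 2;  x <<= 2;  }
        if ((x & 0x8000000000000000L) == 0) { n += 1; }
        return n;
    }

    /** Index (0..63) of the highest set bit, or -1 if word == 0. */
    static int highestSetBit(long word) {
        // Long.numberOfLeadingZeros(0) is 64, so this yields -1 for an empty word.
        return 63 - Long.numberOfLeadingZeros(word);
    }

    public static void main(String[] args) {
        long w = 0xA00L; // bits 9 and 11 set
        System.out.println(nlzPortable(w) == Long.numberOfLeadingZeros(w)); // true
        System.out.println(highestSetBit(w)); // 11
    }
}
```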
[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-3179: - Attachment: LUCENE-3179.patch Correct the issue number in the patch, and remove a superfluous javadoc comment. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch, LUCENE-3197.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-3179: - Attachment: LUCENE-3179.patch Correct the issue number in the patch, remove a superfluous javadoc comment, and grant licence ... OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3197.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-3179: - Attachment: (was: LUCENE-3197.patch) OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Updated] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-3179: - Attachment: (was: LUCENE-3179.patch) OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045587#comment-13045587 ] Paul Elschot commented on LUCENE-3179: -- I did not try this against Long.numberOfLeadingZeros, but in case that is faster we should use that of course. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Created] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information
DirectSolrSpellChecker is not returning frequency information - Key: SOLR-2576 URL: https://issues.apache.org/jira/browse/SOLR-2576 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 DirectSolrSpellChecker is not returning frequency information. This also causes the correctlySpelled flag in extended results to sometimes be wrong.
[jira] [Updated] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information
[ https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2576: - Attachment: SOLR-2576.patch This patch fixes DirectSolrSpellChecker to correctly forward the frequency data. Results are now consistent with IndexBasedSpellChecker. An additional DSSC unit test is also added. I also changed the method name SpellingResult.add(Token token, int docFreq) to SpellingResult.addFrequency(Token token, int docFreq). This less ambiguous method name should help prevent this kind of error in the future. Note, however, that if back-porting to 3.x, it might be wise to add back a deprecated SpellingResult.add(Token token, int docFreq) method. This will prevent us from breaking anyone's custom Solr spellcheckers... DirectSolrSpellChecker is not returning frequency information - Key: SOLR-2576 URL: https://issues.apache.org/jira/browse/SOLR-2576 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2576.patch DirectSolrSpellChecker is not returning frequency information. This also causes the correctlySpelled flag in extended results to sometimes be wrong.
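The back-compat suggestion amounts to keeping the old method name as a deprecated alias that delegates to the new one. The sketch below uses a hypothetical, simplified stand-in for SpellingResult (a String instead of Lucene's Token, and a plain map for the frequencies); it is not the real Solr class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Solr's SpellingResult: a String replaces
// Lucene's Token, and only the original-token frequencies are modelled.
class SpellingResultSketch {
    private final Map<String, Integer> tokenFrequency = new LinkedHashMap<String, Integer>();

    /** New, unambiguous name: records the document frequency of an original query token. */
    void addFrequency(String token, int docFreq) {
        tokenFrequency.put(token, docFreq);
    }

    /**
     * Old name kept only so existing custom spellcheckers keep compiling.
     * @deprecated use {@link #addFrequency(String, int)} instead
     */
    @Deprecated
    void add(String token, int docFreq) {
        addFrequency(token, docFreq); // delegate so both names behave identically
    }

    /** Recorded frequency for token, or null if none was added. */
    Integer getTokenFrequency(String token) {
        return tokenFrequency.get(token);
    }

    public static void main(String[] args) {
        SpellingResultSketch result = new SpellingResultSketch();
        result.add("speling", 3);        // legacy call site, still works
        result.addFrequency("solr", 42); // preferred call site
        System.out.println(result.getTokenFrequency("speling")); // 3
        System.out.println(result.getTokenFrequency("solr"));    // 42
    }
}
```

Delegating, rather than duplicating the body, keeps the two names from drifting apart while the old one is phased out.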
[jira] [Assigned] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information
[ https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned SOLR-2576: - Assignee: Robert Muir DirectSolrSpellChecker is not returning frequency information - Key: SOLR-2576 URL: https://issues.apache.org/jira/browse/SOLR-2576 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Assignee: Robert Muir Priority: Minor Fix For: 4.0 Attachments: SOLR-2576.patch DirectSolrSpellChecker is not returning frequency information. This also causes the correctlySpelled flag in extended results to sometimes be wrong.
[jira] [Created] (SOLR-2577) Give option for spellcheck.collateExtendedResults w/Grouping to return #-of-Grouped-Hits
Give option for spellcheck.collateExtendedResults w/Grouping to return #-of-Grouped-Hits -- Key: SOLR-2577 URL: https://issues.apache.org/jira/browse/SOLR-2577 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 Currently, if using spellcheck.collateExtendedResults in conjunction with group=true, the spelling collation report always gives # of hits as # of documents. It would be useful to give users the option to get # of groups back instead (or possibly in addition). This cannot happen, however, until Solr's group function can return the total # of groups. This functionality is indicated in SOLR-2564.
[jira] [Commented] (LUCENE-3174) Similarity.Stats class for term collection statistics
[ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045598#comment-13045598 ] David Mark Nemeskey commented on LUCENE-3174: - Here's what the patch does:
- it introduces the Similarity.Stats class and its subclasses
- renames computeWeight() to computeStats()
- fixes methods that call computeStats()
What remains to be done:
- rewrite the javadoc
- Stats will be used inside other Similarity methods, so its availability should be ensured somehow. The current solution in MockBM25Similarity is not satisfactory because there is only one Similarity object at a time.
- MultiPhraseWeight, PhraseWeight, SpanWeight and TermWeight call computeStats() and extract the IDFExplain object. This level of coupling is not desirable and should be eliminated, all the more so as not all Similarity subclasses will have an idf.
- Does it even make sense to expose computeStats()?
To consider:
- it might be better if Stats were static, because they could inherit fields from each other
Similarity.Stats class for term collection statistics --- Key: LUCENE-3174 URL: https://issues.apache.org/jira/browse/LUCENE-3174 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Priority: Minor Fix For: flexscoring branch Attachments: LUCENE-3174.patch In order to support ranking methods besides TF-IDF, we need to make the statistics they need available. These statistics could be computed in computeWeight (soon to become computeStats) and stored in a separate object for easy access. Since this object will be used solely by subclasses of Similarity, it should be implemented as a static nested class, i.e. Similarity.Stats. There are two ways this could be implemented:
- as a single Similarity.Stats class, reused by all ranking algorithms. In this case, this class would have a member field for all statistics;
- as a hierarchy of Stats classes, one for each ranking algorithm. Each subclass would define only the statistics needed for the ranking algorithm.
In the second case, the Stats class in DefaultSimilarity would have a single field, idf, while the one in e.g. BM25Similarity would have idf and average field/document length.
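The second option, one Stats subclass per ranking algorithm, could look roughly like the sketch below. All names are hypothetical stand-ins that only mirror the issue text, the computeStats() parameters are assumptions, and the formulas are the classic Lucene TF-IDF idf, log(numDocs / (docFreq + 1)) + 1, and one common BM25 idf variant. Making the Stats classes static nested classes is what lets the BM25 Stats inherit the idf field, as suggested in the comment.

```java
// Hypothetical sketch of "one Stats subclass per ranking algorithm".
// None of these classes are Lucene APIs; the names only mirror the issue text.
abstract class SimilaritySketch {
    /** Base holder for collection statistics; static, so subclasses can extend it. */
    static class Stats {
    }

    /** Computes the collection statistics a particular ranking algorithm needs. */
    abstract Stats computeStats(int docFreq, int numDocs, long sumOfFieldLengths);

    /** TF-IDF style similarity: needs only idf. */
    static class DefaultSimilaritySketch extends SimilaritySketch {
        static class TfIdfStats extends Stats {
            final double idf;

            TfIdfStats(double idf) {
                this.idf = idf;
            }
        }

        @Override
        TfIdfStats computeStats(int docFreq, int numDocs, long sumOfFieldLengths) {
            // Classic Lucene idf: log(numDocs / (docFreq + 1)) + 1
            return new TfIdfStats(Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }
    }

    /** BM25 style similarity: needs idf plus the average field length. */
    static class BM25SimilaritySketch extends SimilaritySketch {
        static class BM25Stats extends DefaultSimilaritySketch.TfIdfStats {
            final double avgFieldLength;

            BM25Stats(double idf, double avgFieldLength) {
                super(idf); // inherits the idf field instead of redeclaring it
                this.avgFieldLength = avgFieldLength;
            }
        }

        @Override
        BM25Stats computeStats(int docFreq, int numDocs, long sumOfFieldLengths) {
            // One common BM25 idf variant: log(1 + (N - n + 0.5) / (n + 0.5))
            double idf = Math.log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
            return new BM25Stats(idf, sumOfFieldLengths / (double) numDocs);
        }
    }
}
```

Covariant return types let each computeStats() override return its own Stats subclass, so callers that know the concrete similarity need no casts.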
[jira] [Closed] (LUCENE-2229) SimpleSpanFragmenter fails to start a new fragment
[ https://issues.apache.org/jira/browse/LUCENE-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elmer Garduno closed LUCENE-2229. - Resolution: Won't Fix SimpleSpanFragmenter fails to start a new fragment -- Key: LUCENE-2229 URL: https://issues.apache.org/jira/browse/LUCENE-2229 Project: Lucene - Java Issue Type: Bug Components: modules/highlighter Reporter: Elmer Garduno Priority: Minor Attachments: LUCENE-2229.patch Original Estimate: 1h Remaining Estimate: 1h SimpleSpanFragmenter fails to identify a new fragment when there is more than one stop word after a span is detected. This problem can be observed when the Query contains a PhraseQuery. The problem is that the span extends toward the end of the TokenGroup. This is because of {{waitForPos = positionSpans.get(i).end + 1;}} and {{position += posIncAtt.getPositionIncrement();}}: when stop words are skipped, this generates a value of {{position}} greater than the value of {{waitForPos}}, so {{(waitForPos == position)}} never matches.
{code:title=SimpleSpanFragmenter.java}
public boolean isNewFragment() {
  position += posIncAtt.getPositionIncrement();

  if (waitForPos == position) {
    waitForPos = -1;
  } else if (waitForPos != -1) {
    return false;
  }

  WeightedSpanTerm wSpanTerm = queryScorer.getWeightedSpanTerm(termAtt.term());

  if (wSpanTerm != null) {
    List<PositionSpan> positionSpans = wSpanTerm.getPositionSpans();

    for (int i = 0; i < positionSpans.size(); i++) {
      if (positionSpans.get(i).start == position) {
        waitForPos = positionSpans.get(i).end + 1;
        break;
      }
    }
  }
  ...
{code}
An example is provided in the test case below for the following Document and the phrase query *all tokens*, which is followed in the document by the stop words _of a_.
{panel:title=Document}
Attribute instances are reused for *all tokens* _of a_ document. Thus, a TokenStream/-Filter needs to update the appropriate Attribute(s) in incrementToken(). The consumer, commonly the Lucene indexer, consumes the data in the Attributes and then calls incrementToken() again until it retuns false, which indicates that the end of the stream was reached. This means that in each call of incrementToken() a TokenStream/-Filter can safely overwrite the data in the Attribute instances.
{panel}
{code:title=HighlighterTest.java}
public void testSimpleSpanFragmenter() throws Exception {
  ...
  doSearching("\"all tokens\"");

  maxNumFragmentsRequired = 2;

  scorer = new QueryScorer(query, FIELD_NAME);
  highlighter = new Highlighter(this, scorer);

  for (int i = 0; i < hits.totalHits; i++) {
    String text = searcher.doc(hits.scoreDocs[i].doc).get(FIELD_NAME);
    TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new StringReader(text));

    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 20));

    String result = highlighter.getBestFragments(tokenStream, text,
        maxNumFragmentsRequired, "...");
    System.out.println("\t" + result);
  }
}
{code}
{panel:title=Result}
are reused for <B>all</B> <B>tokens</B> of a document. Thus, a TokenStream/-Filter needs to update the appropriate Attribute(s) in incrementToken(). The consumer, commonly the Lucene indexer, consumes the data in the Attributes and then calls incrementToken() again until it retuns false, which indicates that the end of the stream was reached. This means that in each call of incrementToken() a TokenStream/-Filter can safely overwrite the data in the Attribute instances.
{panel}
{panel:title=Expected Result}
for <B>all</B> <B>tokens</B> of a document
{panel}
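The failure can be reproduced with the position bookkeeping alone. In the sketch below (a hypothetical stand-in, not the highlighter code), removed stop words give the next token a position increment greater than 1, so position jumps over waitForPos and an equality test never fires. Comparing with >= is shown as one possible fix; it is not necessarily what the attached patch does, and the issue was ultimately closed as Won't Fix. The token positions are assumed for illustration.

```java
// Hypothetical stand-in that reproduces only the position bookkeeping of
// SimpleSpanFragmenter.isNewFragment(); it is not the highlighter code.
class WaitForPosDemo {
    /**
     * Feeds position increments one token at a time, with waitForPos already set
     * to spanEnd + 1, and returns the position at which the fragmenter would stop
     * waiting, or -1 if it never does.
     */
    static int releasePosition(int[] posIncs, int spanEnd, boolean useGreaterOrEqual) {
        int position = -1;
        int waitForPos = spanEnd + 1;
        for (int inc : posIncs) {
            position += inc;
            boolean reached = useGreaterOrEqual ? position >= waitForPos : position == waitForPos;
            if (reached) {
                return position;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Tokens at positions 0..4, then two removed stop words ("of a"),
        // so the next token arrives with a position increment of 3 (position 7).
        int[] posIncs = {1, 1, 1, 1, 1, 3};
        int spanEnd = 4; // the matched span is assumed to end at position 4
        System.out.println(releasePosition(posIncs, spanEnd, false)); // -1: never released
        System.out.println(releasePosition(posIncs, spanEnd, true));  // 7
    }
}
```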
[jira] [Created] (SOLR-2578) ReplicationHandler Backups -- clean up old backups
ReplicationHandler Backups -- clean up old backups -- Key: SOLR-2578 URL: https://issues.apache.org/jira/browse/SOLR-2578 Project: Solr Issue Type: Improvement Components: replication (java) Affects Versions: 3.2, 4.0 Reporter: James Dyer Priority: Minor Fix For: 3.3, 4.0 It would be nice when performing backups if there was an easy way to tell ReplicationHandler to only keep so many and then delete the older ones.
[jira] [Updated] (SOLR-2578) ReplicationHandler Backups -- clean up old backups
[ https://issues.apache.org/jira/browse/SOLR-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2578: - Attachment: SOLR-2578.patch This patch adds the functionality with a new parameter: numberToKeep . The unit test has been enhanced to do 2 backups and then check to see if the first one was automatically deleted (numberToKeep=1). ReplicationHandler Backups -- clean up old backups -- Key: SOLR-2578 URL: https://issues.apache.org/jira/browse/SOLR-2578 Project: Solr Issue Type: Improvement Components: replication (java) Affects Versions: 3.2, 4.0 Reporter: James Dyer Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2578.patch It would be nice when performing backups if there was an easy way to tell ReplicationHandler to only keep so many and then delete the older ones.
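The retention rule behind numberToKeep can be sketched as a pure function over directory names: collect the backups, order them oldest first, and return all but the newest numberToKeep for deletion. The "snapshot." prefix and the assumption that the timestamp suffix sorts chronologically as a string are assumptions about the naming scheme, not taken from the patch; the helper name is hypothetical, and the actual deletion is left to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: decides which backup directories to delete so that only
// the newest numberToKeep remain. ASSUMES names like "snapshot.20110607120000000"
// whose timestamp suffix sorts lexicographically in chronological order.
class BackupRetention {
    static final String PREFIX = "snapshot.";

    static List<String> toDelete(List<String> dirNames, int numberToKeep) {
        if (numberToKeep < 1) {
            throw new IllegalArgumentException("numberToKeep must be >= 1");
        }
        List<String> backups = new ArrayList<String>();
        for (String name : dirNames) {
            if (name.startsWith(PREFIX)) {
                backups.add(name); // ignore unrelated directories such as "index"
            }
        }
        Collections.sort(backups); // oldest first under the naming assumption
        int excess = backups.size() - numberToKeep;
        return excess > 0 ? new ArrayList<String>(backups.subList(0, excess))
                          : new ArrayList<String>();
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<String>();
        Collections.addAll(dirs, "index", "snapshot.20110607120000000", "snapshot.20110606120000000");
        System.out.println(toDelete(dirs, 1)); // [snapshot.20110606120000000]
    }
}
```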
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045655#comment-13045655 ] Uwe Schindler commented on LUCENE-3179: --- If it's faster, should we not replace it completely in Lucene? The impl in Java 5 (Sun JDK) is identical to ours in BitUtil, so why duplicate it? If it gets intrinsified, it can only get faster. I assume it's a relic from pre-Java-1.5 times, like Lucene 2.9. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045656#comment-13045656 ] Uwe Schindler commented on LUCENE-3179: --- With the previous comment I also refer to nextSetBit(). OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045659#comment-13045659 ] Dawid Weiss commented on LUCENE-3179: - I posted the benchmarks of intrinsic vs. manual (OpenBitSet) performance of nlz and pop (bitcount) methods a while ago -- they should still be around JIRA somewhere. If I recall right, the difference was significant, although not like an order of magnitude or something... and on CPUs without intrinsic instructions the implementation handcrafted by Yonik was actually faster than the one in the standard library. Of course these days most CPUs will have popcnt/ nlz instructions, so it makes sense to switch. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045662#comment-13045662 ] Dawid Weiss commented on LUCENE-3179: - I think it's 1.6 that adds these intrinsics -- I don't know if they've been backported to 1.5 updates, but this should be relatively easy to verify empirically. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Created] (SOLR-2579) UIMAUpdateRequestProcessor ignore error fails if text.length() 0
UIMAUpdateRequestProcessor ignore error fails if text.length() 0 -- Key: SOLR-2579 URL: https://issues.apache.org/jira/browse/SOLR-2579 Project: Solr Issue Type: Bug Affects Versions: 3.2 Reporter: Elmer Garduno Priority: Minor Fix For: 3.3 If UIMAUpdateRequestProcessor is configured to ignore errors, an exception is raised when logging the error if text.length() < 100, because the message is built with text.substring(0, 100):
if (solrUIMAConfiguration.isIgnoreErrors())
  log.warn(new StringBuilder("skip the text processing due to ")
    .append(e.getLocalizedMessage()).append(optionalFieldInfo)
    .append(" text=\"").append(text.substring(0, 100)).append("...\"").toString());
else {
  throw new SolrException(ErrorCode.SERVER_ERROR,
    new StringBuilder("processing error: ")
      .append(e.getLocalizedMessage()).append(optionalFieldInfo)
      .append(" text=\"").append(text.substring(0, 100)).append("...\"").toString(), e);
}
I'm submitting a patch.
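String.substring(0, 100) throws StringIndexOutOfBoundsException whenever the text has fewer than 100 characters, so the ignore-errors branch fails while building its own log message. One way to guard it is to clamp the end index; the helper below is a hypothetical sketch, not necessarily what the submitted patch does.

```java
// Hypothetical helper for building the warning/exception message safely.
class SafeTruncate {
    /** Returns at most maxLength leading characters of text, never throwing for short text. */
    static String head(String text, int maxLength) {
        return text.substring(0, Math.min(maxLength, text.length()));
    }

    public static void main(String[] args) {
        String shortText = "tiny";
        // shortText.substring(0, 100) would throw StringIndexOutOfBoundsException;
        // the clamped version simply returns the whole string.
        String message = new StringBuilder("skip the text processing due to ")
            .append("some error")
            .append(" text=\"").append(head(shortText, 100)).append("...\"").toString();
        System.out.println(message);
    }
}
```

The same helper would serve both the log.warn call and the SolrException message, since both build the message the same way.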
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045667#comment-13045667 ] Dawid Weiss commented on LUCENE-3179: - Intrinsics are implemented at the HotSpot (JIT) level, so you won't see them in src.jar -- calls to specific methods in Long.* or Integer.* are replaced by handcrafted assembly (usually processor-specific instructions that do what the given method should do). If you're interested, check out the OpenJDK HotSpot code and search for intrinsics (or popcnt). OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045669#comment-13045669 ] Uwe Schindler commented on LUCENE-3179: --- You misunderstood me, I know what intrinsics are. My confusion was related to this:
bq. and on CPUs without intrinsic instructions the implementation handcrafted by Yonik was actually faster than the one in the standard library
And the so called hand crafted method is identical in src.jar and Yonik's code. So without intrinsics, the standard library and Yonik's code should be identical in performance, since it was the same code the last time I looked into it. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045674#comment-13045674 ] Yonik Seeley commented on LUCENE-3179: -- bq. And the so called hand crafted method is identical in src.jar and Yonik's code.
For pop, yes. But not for ntz or pop_array and friends. BitUtil.pop exists because this was originally written to work with Java 1.4, which didn't have Long.bitCount(): http://markmail.org/message/5ay4m2thsvsahk3c OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045676#comment-13045676 ] Paul Elschot commented on LUCENE-3179: -- The micro benchmarks for ntz() and pop() are at LUCENE-2221 OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045679#comment-13045679 ] Dawid Weiss commented on LUCENE-3179: - Oh, OK -- clear. So my comment was about the various methods of doing bit counts and other bit-fiddling on arrays of long values (for example pop_array); these are Hacker's Delight-derived implementations. I compared them to naive loops using intrinsics and to naive loops on CPUs (and JVMs) without intrinsics: simple loops with intrinsics were faster than Lucene's code, but Lucene's code was faster than simple loops without intrinsics (effectively using whatever was in the standard library). OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Priority: Minor Fix For: 3.3 Attachments: LUCENE-3179.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 .
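For reference, the "naive loop" variant is just a per-word Long.bitCount sum, which benefits directly when the JIT maps Long.bitCount to a POPCNT instruction. A sketch follows; the method names are hypothetical and this is not Lucene's BitUtil code. It includes an intersection count as an example of the "friends" of pop_array.

```java
// Sketch of "naive loop" population counts over arrays of 64-bit words.
// Not Lucene's BitUtil code; method names are hypothetical.
class PopArray {
    /** Number of set bits in words[offset .. offset + numWords - 1]. */
    static long popArray(long[] words, int offset, int numWords) {
        long count = 0;
        for (int i = offset, end = offset + numWords; i < end; i++) {
            count += Long.bitCount(words[i]); // may be compiled to POPCNT
        }
        return count;
    }

    /** Number of bits set in both a and b over the same word range. */
    static long popIntersect(long[] a, long[] b, int offset, int numWords) {
        long count = 0;
        for (int i = offset, end = offset + numWords; i < end; i++) {
            count += Long.bitCount(a[i] & b[i]);
        }
        return count;
    }

    public static void main(String[] args) {
        long[] words = {0xFFL, 0x1L, -1L};
        System.out.println(popArray(words, 0, 3)); // 73 = 8 + 1 + 64
        System.out.println(popIntersect(new long[] {0xFFL}, new long[] {0x0FL}, 0, 1)); // 4
    }
}
```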
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045682#comment-13045682 ] Uwe Schindler commented on LUCENE-3179: --- OK, so we can safely remove BitUtil.pop and replace it with the Java 5 method, Long.bitCount() (maybe also review the code in src.jar again for ntz). And if this one is an intrinsic in Java 6, it's even faster. Now we talk the same language :-)
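Before swapping hand-written bit tricks for Long.bitCount() and Long.numberOfTrailingZeros() (both added in Java 5), a cheap sanity check is to compare the two on edge cases and random inputs. The sketch below does that with a classic SWAR popcount in the Hacker's Delight style and a binary-search ntz; both are illustrative stand-ins, not Lucene's actual BitUtil code, and the class name BitCountCheck is hypothetical.

```java
import java.util.Random;

/** Cross-checks hand-written bit tricks against the Java 5 JDK methods. */
final class BitCountCheck {

  /** SWAR popcount in the Hacker's Delight style: sum bits in 2-, 4-, then 8-bit fields. */
  static int swarPop(long x) {
    x = x - ((x >>> 1) & 0x5555555555555555L);
    x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
    x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;
    // Multiplying by 0x0101... accumulates all byte counts into the top byte.
    return (int) ((x * 0x0101010101010101L) >>> 56);
  }

  /** Binary-search number of trailing zeros; returns 64 for 0, like the JDK. */
  static int ntz(long x) {
    if (x == 0) {
      return 64;
    }
    int n = 0;
    if ((x & 0xFFFFFFFFL) == 0) { n += 32; x >>>= 32; }
    if ((x & 0xFFFFL) == 0) { n += 16; x >>>= 16; }
    if ((x & 0xFFL) == 0) { n += 8; x >>>= 8; }
    if ((x & 0xFL) == 0) { n += 4; x >>>= 4; }
    if ((x & 0x3L) == 0) { n += 2; x >>>= 2; }
    if ((x & 0x1L) == 0) { n += 1; }
    return n;
  }

  private static boolean matchesJdk(long x) {
    return swarPop(x) == Long.bitCount(x) && ntz(x) == Long.numberOfTrailingZeros(x);
  }

  /** True if both helpers agree with the JDK on edge cases and random samples. */
  static boolean agreesWithJdk(long seed, int samples) {
    long[] edges = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE, 0x8000000000000001L};
    for (long x : edges) {
      if (!matchesJdk(x)) {
        return false;
      }
    }
    Random random = new Random(seed);
    for (int i = 0; i < samples; i++) {
      if (!matchesJdk(random.nextLong())) {
        return false;
      }
    }
    return true;
  }
}
```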
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045685#comment-13045685 ] Paul Elschot commented on LUCENE-3179: -- As to the performance, the current patch at LUCENE-2454 has a bitwise linear search to do this.
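For comparison, a bitwise linear search walks backward one bit at a time, so it costs one probe per bit between the start index and the result, against roughly one probe per 64-bit word for a word-wise scan. A minimal sketch assuming the OpenBitSet-style long[] layout (bit i in words[i >> 6] at position i & 63); the class name is hypothetical and this is not the code in the LUCENE-2454 patch.

```java
/** Bitwise backward search over a long[] bit set, testing one bit per step. */
final class BitwisePrevSetBit {

  /** Returns the index of the last set bit at or before index, or -1 if there is none. */
  static int prevSetBit(long[] words, int index) {
    // Clamp to the last valid bit so we never index past the array.
    int start = Math.min(index, words.length * 64 - 1);
    for (int i = start; i >= 0; i--) {
      // One probe per bit: up to 64 probes to cross a single empty word.
      if ((words[i >> 6] & (1L << (i & 63))) != 0) {
        return i;
      }
    }
    return -1;
  }
}
```

The answer is the same as a word-wise scan would give; the difference is only the number of probes when the previous set bit is far away.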
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045683#comment-13045683 ] Shawn Heisey commented on SOLR-2399: The xinclude stuff looks pretty cool! A suggestion on it: shade the background of the entire expanded section, so it stands out better. Thanks for including this! Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » 
https://github.com/steffkes/solr-admin
[jira] [Commented] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information
[ https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045690#comment-13045690 ] Robert Muir commented on SOLR-2576: --- Thanks James, patch looks good! This is definitely the source of confusion, because there are several overloaded methods named add(), one of which does a completely different thing :) DirectSolrSpellChecker is not returning frequency information - Key: SOLR-2576 URL: https://issues.apache.org/jira/browse/SOLR-2576 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Assignee: Robert Muir Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2576.patch DirectSolrSpellChecker is not returning frequency information. This also causes the correctlySpelled flag in extended results to sometimes be wrong.
[jira] [Updated] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information
[ https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2576: -- Fix Version/s: 3.3 adding fix version 3.3 to backport the API improvement.
[jira] [Resolved] (SOLR-2576) DirectSolrSpellChecker is not returning frequency information
[ https://issues.apache.org/jira/browse/SOLR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2576. --- Resolution: Fixed Committed revisions 1133187 (trunk) and 1133190 (branch_3x).
[jira] [Commented] (LUCENE-3174) Similarity.Stats class for term collection statistics
[ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045706#comment-13045706 ] Robert Muir commented on LUCENE-3174: - Hi David, after reviewing the patch, I think we should do this: * make Similarity.Stats static * pass this, instead of Weight, to exactDocScorer() and sloppyDocScorer(). This should fix the MockBM25Sim issue, as it won't need to hold a Stats since it's passed here. Similarity.Stats class for term collection statistics --- Key: LUCENE-3174 URL: https://issues.apache.org/jira/browse/LUCENE-3174 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Priority: Minor Fix For: flexscoring branch Attachments: LUCENE-3174.patch In order to support ranking methods besides TF-IDF, we need to make the statistics they need available. These statistics could be computed in computeWeight (soon to become computeStats) and stored in a separate object for easy access. Since this object will be used solely by subclasses of Similarity, it should be implemented as a static inner class, i.e. Similarity.Stats. There are two ways this could be implemented: - as a single Similarity.Stats class, reused by all ranking algorithms. In this case, this class would have a member field for all statistics; - as a hierarchy of Stats classes, one for each ranking algorithm. Each subclass would define only the statistics needed for the ranking algorithm. In the second case, the Stats class in DefaultSimilarity would have a single field, idf, while the one in e.g. BM25Similarity would have idf and average field/document length.
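To make the proposal concrete, here is a minimal sketch of the second option from the description (a hierarchy of static Stats classes), with the scorer factory taking a Stats instead of a Weight as suggested in the comment. All class names, signatures and the simplified scoring formulas are assumptions for illustration, not the flexscoring branch API; only exactDocScorer() is shown, and sloppyDocScorer() would take the Stats the same way.

```java
/** Base similarity: per-term statistics live in a static nested Stats hierarchy. */
abstract class SimilaritySketch {
  /** Base statistics holder; subclasses add only the fields their model needs. */
  static class Stats {}

  /** Scores one document given the term frequency and the field length. */
  interface ExactDocScorer {
    double score(int freq, int fieldLength);
  }

  /** Computed once per query term (cf. computeWeight, soon computeStats). */
  abstract Stats computeStats(long docFreq, long numDocs, double avgFieldLength);

  /** Receives the Stats directly, so the scorer need not keep a Weight around. */
  abstract ExactDocScorer exactDocScorer(Stats stats);
}

/** TF-IDF flavour: the only statistic needed is idf. */
final class TfIdfSketch extends SimilaritySketch {
  static final class TfIdfStats extends Stats {
    final double idf;
    TfIdfStats(double idf) { this.idf = idf; }
  }

  @Override
  Stats computeStats(long docFreq, long numDocs, double avgFieldLength) {
    // DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    return new TfIdfStats(1 + Math.log(numDocs / (double) (docFreq + 1)));
  }

  @Override
  ExactDocScorer exactDocScorer(Stats stats) {
    final double idf = ((TfIdfStats) stats).idf;
    return (freq, fieldLength) -> Math.sqrt(freq) * idf; // length norm omitted for brevity
  }
}

/** BM25 flavour: needs idf plus the average field length. */
final class Bm25Sketch extends SimilaritySketch {
  private static final double K1 = 1.2;
  private static final double B = 0.75;

  static final class Bm25Stats extends Stats {
    final double idf;
    final double avgFieldLength;
    Bm25Stats(double idf, double avgFieldLength) {
      this.idf = idf;
      this.avgFieldLength = avgFieldLength;
    }
  }

  @Override
  Stats computeStats(long docFreq, long numDocs, double avgFieldLength) {
    // One common BM25 idf variant; the "1 +" keeps it non-negative.
    double idf = Math.log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    return new Bm25Stats(idf, avgFieldLength);
  }

  @Override
  ExactDocScorer exactDocScorer(Stats stats) {
    final Bm25Stats s = (Bm25Stats) stats;
    return (freq, fieldLength) -> {
      double norm = K1 * (1 - B + B * fieldLength / s.avgFieldLength);
      return s.idf * freq * (K1 + 1) / (freq + norm);
    };
  }
}
```

The design point is that each similarity owns its Stats subclass: the TF-IDF flavour carries only idf, the BM25 flavour adds the average length, and nothing in the scorer reaches back into a Weight.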
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8685 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8685/ All tests passed Build Log (for compile errors): [...truncated 14853 lines...]
[jira] [Commented] (LUCENE-91) IndexWriter ctor does not release lock on exception
[ https://issues.apache.org/jira/browse/LUCENE-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045794#comment-13045794 ] Adam Ahmed commented on LUCENE-91: -- Please note that this also happens when Lock.obtain times out. As far as I can tell, the only way to avoid that possibility is to use LOCK_OBTAIN_WAIT_FOREVER, and waiting forever generally sounds like a bad idea. I would say that is a bug, and more clearly so than exceptions due to the index not existing. IndexWriter ctor does not release lock on exception --- Key: LUCENE-91 URL: https://issues.apache.org/jira/browse/LUCENE-91 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 1.2 Environment: Operating System: All Platform: All Reporter: Alex Staubo Assignee: Lucene Developers If IndexWriter construction fails with an exception, the write.lock lock is not released. For example, this happens if one tries to open an IndexWriter on an FSDirectory which does not contain a Lucene index. FileNotFoundException will be thrown by org.apache.lucene.store.FSInputStream, after which the write lock will remain in the directory, and nobody can open the index. I have been using this pattern -- doing IndexWriter(..., false), catching FileNotFoundException and doing IndexWriter(..., true) -- in my code to initialize the index on demand, because the app never knows if the index already exists.
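One common way to avoid this kind of leak is to release the lock in a finally block unless construction completed. A minimal sketch of that pattern; the WriteLock and IndexOpener interfaces and the WriterSketch and InMemoryLock classes are hypothetical stand-ins, not Lucene's Lock or IndexWriter API.

```java
import java.io.IOException;

/** Hypothetical lock abstraction standing in for the directory's write.lock. */
interface WriteLock {
  boolean obtain() throws IOException;
  void release() throws IOException;
  boolean isLocked();
}

/** Hypothetical hook that reads (or creates) the index; may throw. */
interface IndexOpener {
  void open(boolean create) throws IOException;
}

final class WriterSketch {
  private final WriteLock writeLock;

  WriterSketch(WriteLock writeLock, IndexOpener opener, boolean create) throws IOException {
    if (!writeLock.obtain()) {
      throw new IOException("Index locked for write");
    }
    this.writeLock = writeLock;
    boolean success = false;
    try {
      // Reading or creating the index may throw, e.g. FileNotFoundException
      // when create == false and no index exists yet.
      opener.open(create);
      success = true;
    } finally {
      if (!success) {
        // Do not leave the write lock behind when construction fails.
        writeLock.release();
      }
    }
  }

  void close() throws IOException {
    writeLock.release();
  }
}

/** Simple in-memory lock, only for demonstrating the pattern. */
final class InMemoryLock implements WriteLock {
  private boolean held;

  public synchronized boolean obtain() {
    if (held) {
      return false;
    }
    held = true;
    return true;
  }

  public synchronized void release() { held = false; }

  public synchronized boolean isLocked() { return held; }
}
```

With the lock released on failure, the open-or-create pattern from the report (try create=false, catch FileNotFoundException, retry with create=true) no longer runs into a stale lock.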
[jira] [Commented] (LUCENE-91) IndexWriter ctor does not release lock on exception
[ https://issues.apache.org/jira/browse/LUCENE-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045800#comment-13045800 ] Adam Ahmed commented on LUCENE-91: -- The better place for the timeout fix would probably be in Lock.obtain(), where it should attempt something similar to Lock.release() if a timeout occurs.
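The suggestion above leaves open exactly what should be undone on timeout, so the following is only one reading of it: a polling obtain-with-timeout loop, in the spirit of Lock.obtain(long), that runs a cleanup hook before giving up. The class, the WAIT_FOREVER constant (mirroring the idea of LOCK_OBTAIN_WAIT_FOREVER), the hook, and returning false instead of throwing are all assumptions. The hook should only undo state this instance created; deleting a lock held by a live writer would defeat the lock.

```java
import java.io.IOException;

/** Polling obtain-with-timeout with a hook that runs when the timeout expires. */
abstract class PollingLockSketch {
  /** Mirrors the idea of LOCK_OBTAIN_WAIT_FOREVER: never time out. */
  static final long WAIT_FOREVER = -1;

  private final long pollIntervalMs;

  PollingLockSketch(long pollIntervalMs) {
    this.pollIntervalMs = pollIntervalMs;
  }

  /** A single non-blocking attempt to take the lock. */
  abstract boolean tryObtain() throws IOException;

  /** Hypothetical hook: undo only state this instance created while trying. */
  protected abstract void cleanupAfterTimeout() throws IOException;

  /** Polls tryObtain() until it succeeds or timeoutMs elapses; returns false on timeout. */
  final boolean obtain(long timeoutMs) throws IOException, InterruptedException {
    final long start = System.nanoTime();
    while (!tryObtain()) {
      long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
      if (timeoutMs != WAIT_FOREVER && elapsedMs >= timeoutMs) {
        cleanupAfterTimeout(); // the release-like step proposed for the timeout path
        return false;
      }
      Thread.sleep(pollIntervalMs);
    }
    return true;
  }
}
```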