[Lucene.Net] [jira] [Created] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net -- Key: LUCENENET-406 URL: https://issues.apache.org/jira/browse/LUCENENET-406 Project: Lucene.Net Issue Type: New Feature Components: Lucene.Net Core Reporter: Pasha Bizhan Priority: Minor

Lucene.Net 1.4. NUnit tests included.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
[ https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pasha Bizhan updated LUCENENET-406: --- Component/s: (was: Lucene.Net Core)
[Lucene.Net] [jira] [Updated] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
[ https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pasha Bizhan updated LUCENENET-406: --- Attachment: solr.net.zip

full source code with NUnit tests
[Lucene.Net] [jira] [Issue Comment Edited] (LUCENENET-406) Solr.Net - port of the synonyms analyzers from Solr for Lucene.Net
[ https://issues.apache.org/jira/browse/LUCENENET-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010037#comment-13010037 ] Pasha Bizhan edited comment on LUCENENET-406 at 3/23/11 9:38 AM: - full source code with nUnit tests was (Author: pbizhan): full source code with nInut tests
[Lucene.Net] [jira] [Commented] (LUCENENET-380) Evaluate Sharpen as a port tool
[ https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010051#comment-13010051 ] Scott Lombard commented on LUCENENET-380: -

Why fork outside of ASF if we can keep it inside? Is an independent project justified? It seems to me there is a lot of infrastructure that needs to be duplicated and maintained. I agreed to starting a fork outside of ASF because I didn't think there was any possibility to bring the code into Lucene.Net. Now, I just don't understand licensing well enough to rule out a dOCL license from db4o.

Evaluate Sharpen as a port tool --- Key: LUCENENET-380 URL: https://issues.apache.org/jira/browse/LUCENENET-380 Project: Lucene.Net Issue Type: Task Components: Build Automation, Lucene.Net Contrib, Lucene.Net Core, Lucene.Net Demo, Lucene.Net Test Reporter: George Aroush Assignee: Alex Thompson Attachments: 3.0.2_JavaToCSharpConverter_AfterPostProcessing.zip, 3.0.2_JavaToCSharpConverter_NoPostProcessing.zip, IndexWriter.java, Lucene.Net.3_0_3_Sharpen20110106.zip, Lucene.Net.Sharpen20101104.zip, Lucene.Net.Sharpen20101114.zip, NIOFSDirectory.java, QueryParser.java, TestBufferedIndexInput.java, TestDateFilter.java

This task is to evaluate Sharpen as a port tool for Lucene.Net. The files to be evaluated are attached. We need to run those files (which are off Java Lucene 2.9.2) against Sharpen and compare the result against the JLCA result.
[Lucene.Net] Wrong home link in lucene.net website
Just a quick bug report on the website: on http://incubator.apache.org/lucene.net/ the Lucene.Net logo links to the homepage of the Incubator and not to the homepage of the project.

Simone

-- Simone Chiaretta Microsoft MVP ASP.NET - ASPInsider Blog: http://codeclimber.net.nz RSS: http://feeds2.feedburner.com/codeclimber twitter: @simonech Any sufficiently advanced technology is indistinguishable from magic Life is short, play hard
[Lucene.Net] [jira] [Commented] (LUCENENET-380) Evaluate Sharpen as a port tool
[ https://issues.apache.org/jira/browse/LUCENENET-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010206#comment-13010206 ] Alex Thompson commented on LUCENENET-380: -

My thoughts on the fork have been to make something that would be useful beyond Lucene, and the scope of the problems seems to be beyond the scope of Lucene.Net, so I do think an independent project would be a more natural fit. And if we used, say, BitBucket, would the infrastructure really be that much of a barrier?
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010022#comment-13010022 ] Simon Willnauer commented on LUCENE-2310: -

Hey Chris, good that you reactivated this issue! I was looking into similar stuff while working on docvalues, since it really needs to add stuff to Field / Fieldable. With a cleanup and eventually FieldType this would be way less painful, I guess. I have a couple of questions and comments on the current patch. Btw. I like the fact that the previous patch was uploaded March 21 2010 and the latest took one year to come up, on March 23 2011 :)

* Why do you reformat all the stuff in Field, is that necessary here at all? I mean it's needed eventually, but for the deprecation of things it only bloats the patch, doesn't it?
* When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. I see that it still needs to subclass Fieldable / AbstractField, but could it stand alone now, so that we can simply remove the extends / implements on Field once we drop things in 4.0? It looks good from looking at the patch, though.
* I don't like the name getAllFields on Document, since it implies that we have a getPartialFields or something. I see that you cannot use getFields, since it only differs in return type, which doesn't belong to the signature. Maybe we should implement Iterable<Field> here and offer an additional method getFieldsAsList, or maybe getFields(List<Field> fields).
* Once we have this in, what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? Or do we have two totally new classes, FieldType and FieldValue, something like this:
{code}
class FieldValue {
  FieldType type;
  float boost;
  String name;
  Object value;
}
{code}
* I wonder if this patch raises tons of deprecation warnings all over Lucene where Fieldable was used? In IW we use it all over the place. We must fix that in this issue too, otherwise Uwe will go mad, I guess :)

Thanks for bringing this up again!

Reduce Fieldable, AbstractField and Field complexity Key: LUCENE-2310 URL: https://issues.apache.org/jira/browse/LUCENE-2310 Project: Lucene - Java Issue Type: Sub-task Components: Index Reporter: Chris Male Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-DocumentGetFields-core.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch

In order to move field-type-like functionality into its own class, we really need to try to tackle the hierarchy of Fieldable, AbstractField and Field. Currently AbstractField depends on Field, and does not provide much more functionality than storing fields, most of which is being moved over to FieldType. Therefore it seems ideal to try to deprecate AbstractField (and possibly Fieldable), moving much of the functionality into Field and FieldType.

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
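Simon's Iterable<Field> suggestion can be sketched roughly as follows. This is a hypothetical, stand-alone illustration, not Lucene's actual API: the names SimpleField, SimpleDocument, count() and getFieldsAsList() are made up for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for Lucene's Field and Document, illustrating the
// Iterable<Field>-plus-count() idea from the discussion above.
class SimpleField {
    final String name;
    final String value;

    SimpleField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

class SimpleDocument implements Iterable<SimpleField> {
    private final List<SimpleField> fields = new ArrayList<SimpleField>();

    void add(SimpleField field) {
        fields.add(field);
    }

    // Iterating replaces getAllFields(): callers just use for-each.
    public Iterator<SimpleField> iterator() {
        return fields.iterator();
    }

    // count() covers the common case of fetching the List only for its size.
    int count() {
        return fields.size();
    }

    // Explicit variant for callers that really need a List (defensive copy).
    List<SimpleField> getFieldsAsList() {
        return new ArrayList<SimpleField>(fields);
    }
}
```

The return-type clash Simon mentions goes away because iteration is exposed through the interface rather than through a second getFields() overload.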
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010027#comment-13010027 ] Chris Male commented on LUCENE-2310:

Thanks for taking a look at this Simon.

bq. Why do you reformat all the stuff in Field, is that necessary here at all? I mean it's needed eventually, but for the deprecation of things it only bloats the patch, doesn't it?

Because for me this issue is about reducing the complexity of these classes, and Field is a mess. Making it more readable reduces the complexity. If need be I will do this in two patches, but I don't feel this issue is resolved till the code in Field is readable.

bq. When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. I see that it still needs to subclass Fieldable / AbstractField, but could it stand alone now, so that we can simply remove the extends / implements on Field once we drop things in 4.0?

I don't really understand what you're suggesting here. In 3.x, where the deprecations will be occurring, Field has to continue to extend AbstractField. Yes, in 4.0 we can drop that extension, but addressing the deprecations is not in the scope of 3.x.

bq. I don't like the name getAllFields on Document, since it implies that we have a getPartialFields or something. Maybe we should implement Iterable<Field> here and offer an additional method getFieldsAsList, or maybe getFields(List<Field> fields).

Yeah, good call. I think implementing Iterable<Field> is best, but it will also require adding a count() method to Document, since people often retrieve the List just to get the number of fields.

bq. Once we have this in, what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? Or do we have two totally new classes, FieldType and FieldValue?

Once FieldType is in, all the various metadata properties (isIndexed, isStored etc.) will be moved to FieldType, leaving Field as what you suggest as FieldValue. Field will contain its type, boost, name and value. If we have Analyzers on FieldTypes, then we will be able to remove the TokenStream from Field.

bq. I wonder if this patch raises tons of deprecation warnings all over Lucene where Fieldable was used? In IW we use it all over the place. We must fix that in this issue too, otherwise Uwe will go mad, I guess.

Yeah, but not in 3.x unfortunately. As it stands people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW, for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to Field. I could do this in many places already, of course, but core classes like IW would have to remain as they are.

I will wait for your thoughts on the reformatting and then make a new patch.
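Chris's description of the split — metadata on a shared FieldType, with Field reduced to type, boost, name and value — can be sketched like this. These are hypothetical stand-in classes (FieldTypeSketch, FieldSketch) to illustrate the shape of the proposal, not the API that eventually landed in Lucene.

```java
// Metadata lives on a reusable, immutable FieldType-like object...
class FieldTypeSketch {
    final boolean indexed;
    final boolean stored;

    FieldTypeSketch(boolean indexed, boolean stored) {
        this.indexed = indexed;
        this.stored = stored;
    }
}

// ...while the field itself carries only per-value state, as Simon's
// FieldValue snippet suggested: type, name, value, boost.
class FieldSketch {
    final FieldTypeSketch type;
    final String name;
    final Object value;
    float boost = 1.0f;

    FieldSketch(FieldTypeSketch type, String name, Object value) {
        this.type = type;
        this.name = name;
        this.value = value;
    }
}
```

One FieldTypeSketch instance can then be shared by every field of the same kind across many documents, instead of each Field carrying its own isIndexed/isStored flags.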
Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]
Hey Simon and all,

May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots Apache got, and which issues will be selected? The student application period opens on the 28th, so I'm just wondering if I should go ahead and apply or wait for the decision.

Thanks, David

On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote: Hey folks, Google Summer of Code 2011 is very close and the Project Applications Period has started recently. Now it's time to get some excited students on board for this year's GSoC. I encourage students to submit an application to the Google Summer of Code web application. Lucene and Solr are amazing projects, and GSoC is an incredible opportunity to join the community and push the project forward. If you are a student and you are interested in spending some time on a great open source project while getting paid for it, you should submit your application from March 28 - April 8, 2011. There are only 3 weeks until this process starts! Quote from the GSoC website: "We hear almost universally from our mentoring organizations that the best applications they receive are from students who took the time to interact and discuss their ideas before submitting an application, so make sure to check out each organization's Ideas list to get to know a particular open source organization better." So if you have any ideas what Lucene and Solr should have, or if you find any of the GSoC pre-selected projects [1] interesting, please join us on dev@lucene.apache.org [2]. Since you as a student must apply for a certain project via the GSoC website [3], it's a good idea to work on it ahead of time and include the community and possible mentors as soon as possible. Open source development here at the Apache Software Foundation happens almost exclusively in public, and I encourage you to follow this.
Don't mail folks privately; please use the mailing list to get the best possible visibility, attract interested community members and push your idea forward. As always, it's the idea that counts, not the person! That said, please do not underestimate the complexity of even small GSoC projects. Don't try to rewrite Lucene or Solr! A project usually gains more from a smaller, well-discussed, carefully crafted and tested feature than from a half-baked monster change that's too large to work with. Once your proposal has been accepted and you begin work, you should give the community the opportunity to iterate with you. We prefer progress over perfection, so don't hesitate to describe your overall vision, but when the rubber meets the road let's take it in small steps. A code patch of 20 KB is likely to be reviewed very quickly so you get fast feedback, while a patch even 60 KB in size can take very long. So try to break up your vision and the community will work with you to get things done!

On behalf of the Lucene and Solr community: Go! Join the mailing list and apply for GSoC 2011,

Simon

[1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQuery=labels+%3D+lucene-gsoc-11 [2] http://lucene.apache.org/java/docs/mailinglists.html [3] http://www.google-melange.com
[jira] [Created] (LUCENE-2983) FieldInfos should be read-only if loaded from disk
FieldInfos should be read-only if loaded from disk -- Key: LUCENE-2983 URL: https://issues.apache.org/jira/browse/LUCENE-2983 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Minor Fix For: 4.0

Currently FieldInfos creates a private FieldNumberBiMap when it is loaded from a directory, which is necessary due to a limitation we need to face with IW#addIndexes(Dir). If we add an index via a directory to an existing index, field numbers can conflict with the global field numbers in the IW receiving the directories. Those field number conflicts will remain until those segments are merged and we stabilize again based on the IW global field numbers. Yet, we are unnecessarily creating a BiMap here, where we actually should enforce read-only semantics, since nobody should modify a FieldInfos instance loaded from the directory. If somebody needs a modifiable copy, they should simply create a new one and add all FieldInfo instances to it.
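The read-only semantics the issue asks for can be sketched with a simple pattern: a loaded instance rejects mutation, and callers who need to modify it take a copy first. This is a hypothetical illustration (FieldInfosSketch, addOrGet, mutableCopy are invented names), not the actual Lucene FieldInfos API or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "read-only if loaded from disk": the loaded instance refuses
// to assign new field numbers; callers copy it before modifying.
class FieldInfosSketch {
    private final Map<String, Integer> numbersByField = new HashMap<String, Integer>();
    private final boolean readOnly;

    FieldInfosSketch(boolean readOnly) {
        this.readOnly = readOnly;
    }

    boolean isReadOnly() {
        return readOnly;
    }

    // Returns the existing number for a field, or assigns the next free one.
    int addOrGet(String field) {
        Integer number = numbersByField.get(field);
        if (number != null) {
            return number;
        }
        if (readOnly) {
            throw new IllegalStateException("loaded from disk: read-only");
        }
        number = numbersByField.size();
        numbersByField.put(field, number);
        return number;
    }

    // The modifiable copy the issue description suggests: a new instance
    // seeded with all existing FieldInfo entries.
    FieldInfosSketch mutableCopy() {
        FieldInfosSketch copy = new FieldInfosSketch(false);
        copy.numbersByField.putAll(numbersByField);
        return copy;
    }
}
```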
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010030#comment-13010030 ] Simon Willnauer commented on LUCENE-2310: -

bq. I don't really understand what you're suggesting here. In 3.x, where the deprecations will be occurring, Field has to continue to extend AbstractField. Yes, in 4.0 we can drop that extension, but addressing the deprecations is not in the scope of 3.x.

What I mean here is: if I simply removed the extends AbstractField from Field, would it still compile, or are there any dependencies on AbstractField? IMO AbstractField should just be empty now, right?
[jira] [Updated] (LUCENE-2983) FieldInfos should be read-only if loaded from disk
[ https://issues.apache.org/jira/browse/LUCENE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2983: Attachment: LUCENE-2983.patch

here is a patch with tests. All tests pass
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010032#comment-13010032 ] Chris Male commented on LUCENE-2310:

Yes, Field would still compile if you removed the extends. However, if we empty AbstractField then any client code that also extends AbstractField would break. That's why I deprecate the whole class but leave its code in. We could empty it and change it to extend Field; I think that would still work.
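The option Chris mentions at the end — emptying AbstractField and flipping it to extend Field — would look roughly like this. The class names here (FieldImpl, AbstractFieldShell, LegacyClientField) are hypothetical stand-ins used only to show why existing subclasses keep compiling.

```java
// The concrete class absorbs all behavior...
class FieldImpl {
    private final String name;

    FieldImpl(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// ...and the deprecated abstract class becomes an empty shell extending it,
// so client code written against the old hierarchy still compiles.
@Deprecated
class AbstractFieldShell extends FieldImpl {
    AbstractFieldShell(String name) {
        super(name);
    }
}

// A pre-existing client subclass: it now transitively is-a FieldImpl.
class LegacyClientField extends AbstractFieldShell {
    LegacyClientField() {
        super("legacy");
    }
}
```

The trade-off is exactly the one discussed above: the shell keeps old subclasses alive through the deprecation window, at the cost of an extra inheritance hop until 4.0 removes it.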
[jira] [Created] (LUCENE-2984) Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos
Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos -- Key: LUCENE-2984 URL: https://issues.apache.org/jira/browse/LUCENE-2984 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0

Spin-off from LUCENE-2881, which had this change already; due to some random failures related to this change, I removed this part of the patch to make it more isolated and easier to test.
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010035#comment-13010035 ] Simon Willnauer commented on LUCENE-2310: -

{quote} Yeah, but not in 3.x unfortunately. As it stands people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW, for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to Field. I could do this in many places already, of course, but core classes like IW would have to remain as they are. {quote}

So, what is the reason for doing this in 3.x at all? Can't we simply drop stuff in 4.0 and leave 3.x alone?

Simon
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010036#comment-13010036 ] Chris Male commented on LUCENE-2310:

bq. So, what is the reason for doing this in 3.x at all? Can't we simply drop stuff in 4.0 and leave 3.x alone?

Very good question. Certainly we are simplifying the codebase, and I feel that Field is what most users use (not AbstractField). But I know some expert users do use AbstractField. Then again, maybe they can handle the hard change?
[jira] [Updated] (LUCENE-2984) Move hasVectors() hasProx() responsibility out of SegmentInfo to FieldInfos
[ https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2984: Description: Spin-off from LUCENE-2881 which had this change already but due to some random failures related to this change I remove this part of the patch to make it more isolated and easier to test. (was: Spin-off from LUCENe-2881 which had this change already but due to some random failures related to this change I remove this part of the patch to make it more isolated and easier to test.)
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010039#comment-13010039 ] Doron Cohen commented on LUCENE-2980: -

bq. Perhaps we should add a specific test in CSTest for this problem? I wouldn't use file.delete() as an indicator because on Linux it will pass

Agree, I'll add one.

Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch

file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010043#comment-13010043 ] Doron Cohen commented on LUCENE-2980: - bq. Perhaps we should add a specific test in CSTest for this problem? I wouldn't use file.delete() as an indicator because on Linux it will pass Changed my mind about adding this test to ContentSourceTest - I think such a test fits better in the CommonCompress project, because it should directly call CompressorStreamFactory.createCompressorInputStream(in). In our test we invoke ContentSource.getInputStream(File) and so we cannot pass such a close-sensing stream. But this is a valid point; in particular, the test case I provided to COMPRESS-127 will fail on Windows but will likely pass on Linux. I'll add a reference to your comment in COMPRESS-127. Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
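The fix being discussed boils down to normalizing the file name before matching its suffix, so that "file.GZ" resolves to the same type as "file.gz". A minimal sketch of that idea — the class and enum names here are made up for illustration, not taken from the actual benchmark ContentSource code:

```java
import java.util.Locale;

// Illustrative sketch of case-insensitive file-type detection by suffix.
public class FileTypeDetect {
    enum Type { GZIP, BZIP2, TEXT }

    static Type detect(String fileName) {
        // Normalize with a fixed locale before comparing suffixes,
        // so "file.GZ" matches ".gz" just like "file.gz" does.
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz") || name.endsWith(".gzip")) {
            return Type.GZIP;
        }
        if (name.endsWith(".bz2") || name.endsWith(".bzip")) {
            return Type.BZIP2;
        }
        return Type.TEXT; // anything else is treated as plain text
    }
}
```

Using Locale.ROOT (rather than the platform default) also avoids surprises in locales with unusual case mappings, such as the Turkish dotless i.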
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Summary: WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name (was: WriteLineDocTask should write gzip/bzip2/txt according to the extension of specifie output file name) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Since the readers behave this way, it would be nice and handy if this line writer did too.
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010064#comment-13010064 ] Shai Erera commented on LUCENE-2980: Agreed. Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
[jira] [Updated] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2980: Attachment: LUCENE-2980.patch Updated patch applies the workaround only for the GZIP format, as the other types do close their wrapped stream (COMPRESS-127). Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
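The GZIP workaround referenced above can be pictured as a small close-delegating wrapper: when a decompressor stream does not close the stream it wraps (the COMPRESS-127 behavior), close() has to be propagated by hand. This is a hedged sketch under assumed names, not the actual ContentSource code:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative wrapper: its close() closes both the decompressor stream
// and the raw stream underneath it, for decompressors that do not
// propagate close() themselves. The class name is made up for this sketch.
public class CloseDelegatingInputStream extends FilterInputStream {
    private final InputStream wrapped; // the raw (file) stream underneath

    public CloseDelegatingInputStream(InputStream decompressor, InputStream wrapped) {
        super(decompressor);
        this.wrapped = wrapped;
    }

    @Override
    public void close() throws IOException {
        try {
            in.close(); // close the decompressor first
        } finally {
            wrapped.close(); // then make sure the underlying stream is released
        }
    }
}
```

On Windows the leaked file handle keeps the file locked, which is why the missing close() shows up there but tends to pass silently on Linux.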
[jira] [Resolved] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-2980. - Resolution: Fixed Lucene Fields: (was: [New]) Committed: - trunk: r1084544, r1084549 - 3x: r1084552 Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text) --- Key: LUCENE-2980 URL: https://issues.apache.org/jira/browse/LUCENE-2980 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch file.gz is correctly handled as gzip, but file.GZ is handled as text, which is wrong.
[jira] [Commented] (LUCENE-2982) Get rid of ContentSource's workaround for closing b/gzip input stream once this is fixed in CommonCompress
[ https://issues.apache.org/jira/browse/LUCENE-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010086#comment-13010086 ] Doron Cohen commented on LUCENE-2982: - COMPRESS-127 was fixed, so whenever a new CommonsCompress release is available, we should be able to complete this one. I subscribed to annou...@apache.org to be notified when that happens... Get rid of ContentSource's workaround for closing b/gzip input stream once this is fixed in CommonCompress - Key: LUCENE-2982 URL: https://issues.apache.org/jira/browse/LUCENE-2982 Project: Lucene - Java Issue Type: Task Components: contrib/benchmark Reporter: Doron Cohen Priority: Minor Once COMPRESS-127 is fixed, get rid of the entire workaround method ContentSource.closableCompressorInputStream(). It would simplify the code and would perform better without that delegation.
[jira] [Created] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing
Build SegmentCodecs incrementally for consistent codecIDs during indexing - Key: LUCENE-2985 URL: https://issues.apache.org/jira/browse/LUCENE-2985 Project: Lucene - Java Issue Type: Improvement Components: Codecs, Index Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Currently we build the SegmentCodecs during flush, which is fine as long as no codec needs to know which fields it should handle. This will change with DocValues or when we expose StoredFields / TermVectors via Codec (see LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a consistent view of which codec belongs to which field during indexing and all FieldInfo instances are unassigned (set to -1). Instead we should build the SegmentCodecs incrementally as fields come in, so that no matter when a codec needs to be selected to process a document / field we have the right codec ID assigned.
Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]
On Wed, Mar 23, 2011 at 9:37 AM, David Nemeskey nemeskey.da...@sztaki.hu wrote: Hey Simon and all, May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots Apache got, and which issues will be selected? The student application period opens on the 28th, so I'm just wondering if I should go ahead and apply or wait for the decision. David, you should go ahead and apply via the GSoC website and reference the issue there; this is how I understand it works. We will later rate the proposals from the GSoC website and decide which we choose. This is also when slots get assigned. simon Thanks, David On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote: Hey folks, Google Summer of Code 2011 is very close and the Project Applications Period has started recently. Now it's time to get some excited students on board for this year's GSoC. I encourage students to submit an application to the Google Summer of Code web-application. Lucene & Solr are amazing projects and GSoC is an incredible opportunity to join the community and push the project forward. If you are a student and you are interested in spending some time on a great open source project while getting paid for it, you should submit your application from March 28 - April 8, 2011. There are only 3 weeks until this process starts! Quote from the GSoC website: We hear almost universally from our mentoring organizations that the best applications they receive are from students who took the time to interact and discuss their ideas before submitting an application, so make sure to check out each organization's Ideas list to get to know a particular open source organization better. So if you have any ideas about what Lucene & Solr should have, or if you find any of the GSoC pre-selected projects [1] interesting, please join us on dev@lucene.apache.org [2]. 
Since you as a student must apply for a certain project via the GSoC website [3], it's a good idea to work on it ahead of time and include the community and possible mentors as soon as possible. Open source development here at the Apache Software Foundation happens almost exclusively in the public and I encourage you to follow this. Don't mail folks privately; please use the mailing list to get the best possible visibility and attract interested community members and push your idea forward. As always, it's the idea that counts, not the person! That said, please do not underestimate the complexity of even small GSoC projects. Don't try to rewrite Lucene or Solr! A project usually gains more from a smaller, well-discussed and carefully crafted, tested feature than from a half-baked monster change that's too large to work with. Once your proposal has been accepted and you begin work, you should give the community the opportunity to iterate with you. We prefer progress over perfection, so don't hesitate to describe your overall vision, but when the rubber meets the road let's take it in small steps. A code patch of 20 KB is likely to be reviewed very quickly, so you get fast feedback, while a patch even 60 KB in size can take very long. So try to break up your vision and the community will work with you to get things done! On behalf of the Lucene & Solr community, Go! join the mailing list and apply for GSoC 2011, Simon [1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11 [2] http://lucene.apache.org/java/docs/mailinglists.html [3] http://www.google-melange.com
[jira] [Updated] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing
[ https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2985: Attachment: LUCENE-2985.patch Here is an initial patch that uses a SegmentCodecBuilder to assign codec IDs during indexing in DocFieldProcessorPerThread. Build SegmentCodecs incrementally for consistent codecIDs during indexing - Key: LUCENE-2985 URL: https://issues.apache.org/jira/browse/LUCENE-2985 Project: Lucene - Java Issue Type: Improvement Components: Codecs, Index Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch, 4.0 Attachments: LUCENE-2985.patch Currently we build the SegmentCodecs during flush, which is fine as long as no codec needs to know which fields it should handle. This will change with DocValues or when we expose StoredFields / TermVectors via Codec (see LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a consistent view of which codec belongs to which field during indexing and all FieldInfo instances are unassigned (set to -1). Instead we should build the SegmentCodecs incrementally as fields come in, so that no matter when a codec needs to be selected to process a document / field we have the right codec ID assigned.
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #70: POMs out of sync
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-3.x/70/ No tests ran. Build Log (for compile errors): [...truncated 22 lines...]
Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
Hey Dawid, Thanks for doing this. It would be good, too, if we no longer had to pass in -Dsolr.clustering.enabled=true as there is no reason why we can't just have it on like the other components. -Grant On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote: Author: dweiss Date: Tue Mar 22 20:44:21 2011 New Revision: 1084345 URL: http://svn.apache.org/viewvc?rev=1084345&view=rev Log: Removing the note about excluded JARs (everything is included). Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff == --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original) +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 20:44:21 2011 @@ -1183,12 +1183,10 @@ http://wiki.apache.org/solr/ClusteringComponent - This relies on third party jars which are not included in the - release. To use this component (and the /clustering handler) - Those jars will need to be downloaded, and you'll need to set - the solr.cluster.enabled system property when running solr... + You'll need to set the solr.cluster.enabled system property + when running solr to run with clustering enabled: - java -Dsolr.clustering.enabled=true -jar start.jar + java -Dsolr.clustering.enabled=true -jar start.jar -- searchComponent name=clustering enable=${solr.clustering.enabled:false}
[jira] [Resolved] (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.
[ https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-2967. - Resolution: Won't Fix Lucene Fields: (was: [New]) I spent some time on this. It's quite fascinating: the number of collisions for the default probing is smaller than: a) linear probing with murmurhash mix of the original hash b) linear probing without murmurhash mix (start from raw hash only). Curiously, the number of collisions for (b) is smaller than for (a) -- this could be explained if we assume bits are spread evenly throughout the entire 32-bit range after murmurhash, so after masking to table size there should be more collisions on lower bits compared to a raw hash (this would have more collisions on upper bits and fewer on lower bits because it is multiplicative... or at least I think so). Anyway, I tried many different versions and I don't see any significant difference in favor of linear probing here. I measured the GC overhead during my tests too, but it is not the primary factor contributing to the total cost of constructing the FST (about 3-5% of the total time, running in parallel, typically). Use linear probing with an additional good bit avalanching function in FST's NodeHash. -- Key: LUCENE-2967 URL: https://issues.apache.org/jira/browse/LUCENE-2967 Project: Lucene - Java Issue Type: Improvement Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Trivial Fix For: 4.0 Attachments: LUCENE-2967.patch I recently had an interesting discussion with Sebastiano Vigna (fastutil), who suggested that linear probing, given a hash mixing function with good avalanche properties, is a way better method of constructing lookups in associative arrays compared to quadratic probing. Indeed, with linear probing you can implement removals from a hash map without removed slot markers, and linear probing has nice properties with respect to modern CPUs (caches). 
I've reimplemented HPPC's hash maps to use linear probing and we observed a nice speedup (the same applies for fastutil, of course). This patch changes NodeHash's implementation to use linear probing. The code is a bit simpler (I think :). I also moved the load factor to a constant -- 0.5 seems like a generous load factor, especially if we allow large FSTs to be built. I don't see any significant speedup in constructing large automata, but there is no slowdown either (I checked on one machine only for now, but will verify on other machines too).
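For readers following the thread, the technique being compared can be sketched as a tiny open-addressed int set: linear probing steps one slot at a time, and a MurmurHash3-style finalizer supplies the bit avalanching so clustered keys spread across the table. This is only an illustration of the idea with made-up names, not the NodeHash code:

```java
import java.util.Arrays;

// Minimal open-addressed int set using linear probing plus a
// MurmurHash3-style finalizer for avalanching. Illustrative only.
public class LinearProbingIntSet {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel; assumes keys never equal it
    private int[] slots;
    private int size;

    public LinearProbingIntSet(int capacity) {
        // Round up to a power of two so masking can replace modulo.
        slots = new int[Integer.highestOneBit(capacity * 2 - 1) * 2];
        Arrays.fill(slots, EMPTY);
    }

    // MurmurHash3 finalizer: cheap, with good avalanche properties.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public boolean add(int key) {
        if (size * 2 >= slots.length) grow(); // keep load factor <= 0.5
        int mask = slots.length - 1;
        int i = mix(key) & mask;
        while (slots[i] != EMPTY) {          // linear probing: step by one
            if (slots[i] == key) return false;
            i = (i + 1) & mask;
        }
        slots[i] = key;
        size++;
        return true;
    }

    public boolean contains(int key) {
        int mask = slots.length - 1;
        int i = mix(key) & mask;
        while (slots[i] != EMPTY) {
            if (slots[i] == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    private void grow() {
        int[] old = slots;
        slots = new int[old.length * 2];
        Arrays.fill(slots, EMPTY);
        size = 0;
        for (int k : old) if (k != EMPTY) add(k);
    }
}
```

Without the mix() step, keys that share low bits (e.g. multiples of a power of two) land on the same slots after masking and form long probe chains; the finalizer is what makes linear probing's cache-friendly sequential scan pay off.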
Re: [VOTE] Release Lucene/Solr 3.1
+1 * Ran Solr example * Perused entire structure of both binary and source distros Noticed the minor issues others have reported; to echo Ryan, none seem like blockers to me. And also to echo Ryan's thanks: huge thanks for everyone's hard work on the 3.1 Lucene/Solr release(s). This is a big milestone for the technology and community. Erik On Mar 22, 2011, at 23:42, Ryan McKinley wrote: +1 * Walked through the solr example * Tested a simple maven project, worked well I don't think the minor issues listed so far are blockers Thanks to everyone who worked on this! ryan On Tue, Mar 22, 2011 at 10:21 AM, Yonik Seeley yo...@lucidimagination.com wrote: Please vote to release the artifacts at http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2 as Lucene 3.1 and Solr 3.1 Thanks for everyone's help pulling all this together! -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010110#comment-13010110 ] Mark Harwood commented on LUCENE-2454: -- bq. I have not looked at this patch so this comment may be off base. The slideshare deck gives a good overview: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene As a simple Lucene-focused addition I'd prefer not to explore all the possible implications for Solr adoption here. The affected areas in Solr are extensive and would include schema definitions, query syntax, facets/filter caching, result-fetching, DIH, etc. Probably best discussed elsewhere. Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated SOLR-2436: -- Attachment: SOLR-2436_2.patch Hello Koji, I've tested your patch; I needed to align it with the latest applied patch (see SOLR-2387) to make the tests work (see attached patch). In my opinion this solution is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However, I think it would be good if it were also possible to get rid of the uimaConfig file, defining each parameter inside the Processor with Solr elements (str/lst/int, etc.) instead. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag.
[jira] [Issue Comment Edited] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010112#comment-13010112 ] Tommaso Teofili edited comment on SOLR-2436 at 3/23/11 1:26 PM: Hello Koji, I've tested your patch; I needed to align it with the latest applied patch (see SOLR-2387) to make the tests work (see attached patch). In my opinion the solution you're proposing is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However, I think it would be good if it were also possible to get rid of the uimaConfig file, defining each parameter inside the Processor with Solr elements (str/lst/int, etc.) instead. was (Author: teofili): Hello Koji, I've tested your patch, I needed to align it to latest patch applied (see SOLR-2387) to make tests work (see attached patch). In my opinion this solution is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well. move uimaConfig to under the uima's update processor in solrconfig.xml -- Key: SOLR-2436 URL: https://issues.apache.org/jira/browse/SOLR-2436 Project: Solr Issue Type: Improvement Affects Versions: 3.1 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag.
Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
On Mar 23, 2011, at 9:20 AM, Dawid Weiss wrote: Sure, I'll change it. Can I alter branch_3x too? That's fine to change 3_x, the 3.1 release is on lucene_solr_3_1 (or something similar). This way it will be on in 3.2. -Grant Don't know what the policy is after the RCs have been published. Dawid On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll gsing...@apache.org wrote: Hey Dawid, Thanks for doing this. It would be good, too, if we no longer had to pass in -Dsolr.clustering.enabled=true as there is no reason why we can't just have it on like the other components. -Grant On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote: Author: dweiss Date: Tue Mar 22 20:44:21 2011 New Revision: 1084345 URL: http://svn.apache.org/viewvc?rev=1084345&view=rev Log: Removing the note about excluded JARs (everything is included). Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff == --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original) +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 20:44:21 2011 @@ -1183,12 +1183,10 @@ http://wiki.apache.org/solr/ClusteringComponent - This relies on third party jars which are not included in the - release. To use this component (and the /clustering handler) - Those jars will need to be downloaded, and you'll need to set - the solr.cluster.enabled system property when running solr... 
+ You'll need to set the solr.cluster.enabled system property + when running solr to run with clustering enabled: - java -Dsolr.clustering.enabled=true -jar start.jar + java -Dsolr.clustering.enabled=true -jar start.jar -- searchComponent name=clustering enable=${solr.clustering.enabled:false} -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2573: Attachment: LUCENE-2573.patch Here is my current state on this issue. I didn't add all JDocs needed (by far) and I will wait until we have settled on the API for FlushPolicy. * I removed the complex TieredFlushPolicy entirely and added one DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets the biggest DWPT pending. * DW will stall threads if we reach 2 x maxNetRam, which is retrieved from FlushPolicy so folks can lower that depending on their env. * DWFlushControl checks if a single DWPT grows too large and sets it forcefully pending once its RAM consumption is 1.9 GB. That should be enough buffer to not reach the 2048MB limit. We should consider making this configurable. * FlushPolicy now has three methods, onInsert, onUpdate and onDelete, while DefaultFlushPolicy only implements onInsert and onDelete; the abstract base class just calls those on an update. * I removed FlushControl from IW * added documentation on IWC for FlushPolicy and removed the jdocs for the RAM limit. I think we should add some lines about how RAM is now used and that users should balance the RAM with the number of threads they are using. Will do that later on though. * For testing I added a ThrottledIndexOutput that makes flushing slow so I can test if we are stalled and / or blocked. This is passed to MockDirectoryWrapper. It's currently under util but should rather go under store, no? * byte consumption is now committed before FlushPolicy is called since we don't have the multitier flush which required that to reliably proceed across tier boundaries (not required, but it was easier to test really). So FP doesn't need to take care of the delta * FlushPolicy now also flushes on maxBufferedDeleteTerms, while the buffered delete terms are not yet connected to DW#getNumBufferedDeleteTerms(), which causes some failures though. 
I added //nocommit @Ignore to those tests. * this patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy; I couldn't figure out why it is failing, but it could be due to an old version of LUCENE-2881 on this branch. I will see if it still fails once we have merged. * Healthiness now doesn't stall if we are not flushing on RAM consumption, to ensure we don't lock in threads. Overall this seems much closer now. I will start writing jdocs. Flush on buffered delete terms might need some tests, and I should also write a more reliable test for Healthiness... currently it relies on the ThrottledIndexOutput slowing down indexing enough to block, which might not always be the case. It hasn't failed yet. Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach: - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM) - Flush all DWPTs at a high water mark (e.g. at 110%) - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%. Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
[jira] [Created] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity
divorce defaultsimilarityprovider from defaultsimilarity Key: LUCENE-2986 URL: https://issues.apache.org/jira/browse/LUCENE-2986 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Priority: Minor Fix For: 4.0 In LUCENE-2236 as a start, we made DefaultSimilarity which implements the factory interface (SimilarityProvider), and also extends Similarity. Its factory interface just returns itself always by default. Doron mentioned it would be cleaner to split the two, and I thought it would be good to revisit it later. Today as I was looking at SOLR-2338, it became pretty clear that we should do this; it makes things a lot cleaner. I think it's currently confusing to users to see the two APIs mixed if they are trying to subclass. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity
[ https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2986: Attachment: LUCENE-2986.patch Attached is a patch: adds DefaultSimilarityProvider, which has our default implementations of the non-field-specific methods (coord/queryNorm/etc), and always returns DefaultSimilarity. divorce defaultsimilarityprovider from defaultsimilarity Key: LUCENE-2986 URL: https://issues.apache.org/jira/browse/LUCENE-2986 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Priority: Minor Fix For: 4.0 Attachments: LUCENE-2986.patch In LUCENE-2236 as a start, we made DefaultSimilarity which implements the factory interface (SimilarityProvider), and also extends Similarity. Its factory interface just returns itself always by default. Doron mentioned it would be cleaner to split the two, and I thought it would be good to revisit it later. Today as I was looking at SOLR-2338, it became pretty clear that we should do this, it makes things a lot cleaner. I think currently its confusing to users to see the two apis mixed if they are trying to subclass. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Attachment: LUCENE-2977.patch Patch for auto-detecting output compression mode of result line file: - getInputStream() moved from ContentSource to a new class StreamUtils under util. It is now named inputStream(File). - outputStream() method added to StreamUtils. Before applying this patch *svn mv modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/ContentSourceTest.java modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/utils/StreamUtilsTest.java* I kept for now the force-bzip logic in WriteLineDocTask but I would like to remove it - it is strange, and in any case LineDocSource would only auto-detect bzip input format if WriteLineDocTask was able to auto-detect bzip output format. Removing it will also simplify StreamUtils. Any opinions on removing this force-bzip option? WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
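The extension-based auto-detection described in this patch can be roughly illustrated as follows. This is a simplified stand-in for the idea behind StreamUtils, not the actual patch code; the class and method names are hypothetical:

```java
import java.util.Locale;

public class CompressionSniffer {
    enum Type { GZIP, BZIP2, PLAIN }

    // Illustrative: choose a compression codec from the output file's
    // extension, the way the patch has outputStream() decide how to wrap
    // the stream (and inputStream(File) decide how to unwrap it).
    static Type fromFileName(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz") || lower.endsWith(".gzip")) return Type.GZIP;
        if (lower.endsWith(".bz2") || lower.endsWith(".bzip")) return Type.BZIP2;
        return Type.PLAIN; // any other extension is treated as plain text
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("docs.txt.gz"));   // GZIP
        System.out.println(fromFileName("docs.txt.bz2"));  // BZIP2
        System.out.println(fromFileName("docs.txt"));      // PLAIN
    }
}
```

With detection on both ends, WriteLineDocTask and LineDocSource agree on the format purely through the file name, which is what makes the separate force-bzip switch redundant.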
[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-2945: - Attachment: LUCENE-2945d.patch Also has the changes to SpanNearClauseFactory. Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, LUCENE-2945d.patch In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010218#comment-13010218 ] Paul Elschot edited comment on LUCENE-2945 at 3/23/11 5:01 PM: --- New -2945d patch that also has the changes to SpanNearClauseFactory. was (Author: paul.elsc...@xs4all.nl): Also has the changes to SpanNearClauseFactory. Surround Query doesn't properly handle equals/hashcode -- Key: LUCENE-2945 URL: https://issues.apache.org/jira/browse/LUCENE-2945 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1.1, 4.0 Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, LUCENE-2945d.patch In looking at using the surround queries with Solr, I am hitting issues caused by collisions due to equals/hashcode not being implemented on the anonymous inner classes that are created by things like DistanceQuery (branch 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2438) Case Insensitive Search for Wildcard Queries
Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done by Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2438: --- Attachment: SOLR-2438.patch Attached patch file Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Attachments: SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010263#comment-13010263 ] Shai Erera commented on LUCENE-2977: Patch looks good! In StreamUtils you have *.bz* -- it should be *.bz2* bq. Any opinions on removing this force-bzip option? +1 (you mean the bzip.compression property in WLDT right?). I think that it's reasonable to request the user to specify an output file with .bz2 extension if he wants bzip compression. I don't see how it will simplify StreamUtils though, but I trust you :) (perhaps you meant it will simplify WLDT?) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010268#comment-13010268 ] Peter Sturge commented on SOLR-2438: If you're like me, you may have often wondered why MyTerm, myterm, myter* and MyTer* can return different, and sometimes empty, results. This patch addresses this for wildcard queries by adding an attribute to relevant solr.TextField entries in schema.xml. The new attribute is called: {{ignoreCaseForWildcards}} Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}
<fieldType name="text_lcws" class="solr.TextField" positionIncrementGap="100" ignoreCaseForWildcards="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}
It's worth noting that this will lower-case text for ALL terms that match the field type - including synonyms and stemmers. For backward compatibility, the default behaviour is as before - i.e. a case-sensitive wildcard search ({{ignoreCaseForWildcards=false}}). The patch was created against the lucene_solr_3_1 branch. I've not applied it yet on trunk. [caveat emptor] I freely admit I'm no schema expert, so committers and community members may see use cases where this approach could pose problems. I'm all for feedback to enhance the functionality... The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches in Solr - in line with the 'it just works' Solr philosophy. Enjoy! 
Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Attachments: SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
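The behaviour SOLR-2438 is after can be shown with a toy matcher: fold case on both the wildcard pattern and the term before comparing, when the field opts in. This is a hypothetical sketch, not code from the patch - Solr itself would lower-case the wildcard term before constructing the WildcardQuery rather than regex-matching terms:

```java
import java.util.Locale;

public class WildcardCaseDemo {
    // Toy illustration of what an opt-in flag like the patch's
    // ignoreCaseForWildcards attribute means: lower-case both sides
    // before the wildcard comparison, so MyTer* and myter* agree.
    static boolean matches(String pattern, String term, boolean ignoreCase) {
        if (ignoreCase) {
            pattern = pattern.toLowerCase(Locale.ROOT);
            term = term.toLowerCase(Locale.ROOT);
        }
        // translate the simple wildcard syntax (* and ?) to a regex
        String regex = pattern.replace(".", "\\.")
                              .replace("*", ".*")
                              .replace("?", ".");
        return term.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("MyTer*", "myterm", false)); // false
        System.out.println(matches("MyTer*", "myterm", true));  // true
    }
}
```

The indexed term is already lower-cased by the LowerCaseFilterFactory in the analysis chain; the gap the patch closes is that wildcard query terms normally bypass that chain.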
Re: write byte[] directly to TokenStream
works great - thanks! On Wed, Mar 23, 2011 at 1:04 AM, Robert Muir rcm...@gmail.com wrote: On Mar 22, 2011 11:38 PM, Ryan McKinley ryan...@gmail.com wrote: I'm messing with putting binary data directly in the index. I have a field class with:

@Override
public TokenStream tokenStreamValue() {
  byte[] value = (byte[]) fieldsData;
  Token token = new Token(0, value.length, "geo");
  token.resizeBuffer(value.length);
  BytesRef ref = token.getBytesRef();
  ref.bytes = value;
  ref.length = value.length;
  ref.offset = 0;
  token.setLength(ref.length);
  return new SingleTokenTokenStream(token);
}

but that is just writing an empty token. Is it possible to set the Token value without converting to char[]? check out Test2BTerms for an example... - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2415) Change XMLWriter version parameter to wt.xml.version
[ https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010308#comment-13010308 ] Hoss Man commented on SOLR-2415: bq. how should we handle the desire to change the faceting format (to make it easier to add metadata like total number of constraints, etc)? version would be one way. facet.format would be another way. i don't think the *structure* of the response (ie: the facet response section) should be driven by the same param as the *format* of the response, which is what version currently is. Something like facet.format seems more appropriate when dealing with a specific component like that ... but i don't think it should be a numeric version-esque property, i think it should be descriptive (ie: flat vs nested, or something) bq. perhaps we should add a getVersion() parameter on SolrQueryRequest and have that used across all components. when i suggested we have a common wt.version param that all of the response writers could use, i didn't mean to suggest that it should have a singular id space. my suggestion was that the specific values specified for version or wt.version or whatever would only be meaningful to the specific response writer used -- just as the current values of the version param that the XMLResponseWriter uses are meaningless to the JSONResponseWriter. the overlap would only be in reusing the param name (in the same way that q is the common param name for the main query, regardless of what query parser is specified by defType) bq. Look at how long the existing response writers have hung around in their current format, independent of the version # changes (1.2, 1.3, 1.4, and now 3.1) the version param of the XML response writer has never been in sync with the solr version, it was never intended to be. it's always been the version number of the xml format. 
Change XMLWriter version parameter to wt.xml.version -- Key: SOLR-2415 URL: https://issues.apache.org/jira/browse/SOLR-2415 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Priority: Trivial Fix For: 4.0 The XMLWriter has a parameter called 'version'. This controls some specifics about how the XMLWriter works. Using the parameter name 'version' made sense back when the XMLWriter was the only option, but with all the various writers and different places where 'version' makes sense, I think we should change this parameter name to wt.xml.version so that it specifically refers to the XMLWriter. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010318#comment-13010318 ] Stefan Matheis (steffkes) commented on SOLR-2399: - Ryan: ty, will take your points on my list - pretty sure that it should be possible to integrate them Mark: ty! :) For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png Actually thinking about the following points: * Tree structure - a good way to solve it? * Do we need the possibility to collapse/expand the tree/its children? The list could be longer (the screenshot is cropped, just for layout reasons), especially while using SolrCloud, which adds about 30 Loggers * In the current er .. interface you are able to see that the row you're looking at has a level set and at the end (at the right) which is the effective level - for me, that does not matter. If a row/logger has level-x - that's enough to know. No need to see if this level is set or inherited. * Just a quick idea: if you change e.g. {{org.apache.solr}}, then the interface will automatically update all children in realtime; affects all nested/sub loggers w/o an assigned level. Thoughts on these points? Anyone? : Short note: i moved Logging to a global level, because it's not configurable on a per-core basis. 
Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.0 *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin [This commit shows the differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d] between the old/existing index.jsp and my new one (which I copy/pasted from the existing one). Main Action takes place in [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js] which is actually neither clean nor pretty .. just work-in-progress. Actually it's Work in Progress, so ... give it a try. It's developed with Firefox as Browser, so, for a first impression .. please don't use _things_ like Internet Explorer or so ;o Jan already suggested a bunch of good things, i'm sure there are more ideas over there :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Attachment: LUCENE-2977.patch Thanks for reviewing, Shai! bq. In StreamUtils you have *.bz* -- it should be *.bz2* Good catch! Fixed. bq. +1 (you mean the bzip.compression property in WLDT right?). Yes. bq. I think that it's reasonable to request the user to specify an output file with .bz2 extension if he wants bzip compression. Great, I removed it. bq. I don't see how it will simplify StreamUtils though, but I trust you :) (perhaps you meant it will simplify WLDT?) It allowed keeping just one of the two variations of StreamUtils.outputStream(). WLDT and the tests became simpler as well. Attaching updated patch. (Again, first apply that svn mv...) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch, LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010318#comment-13010318 ] Stefan Matheis (steffkes) edited comment on SOLR-2399 at 3/23/11 8:15 PM: -- Ryan: ty, will take your points on my list - pretty sure that it should be possible to integrate them Mark: ty! :) For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png Actually thinking about the following points: * Tree structure - a good way to solve it? * Do we need the possibility to collapse/expand the tree/its children? The list could be longer (the screenshot is cropped, just for layout reasons), especially while using SolrCloud, which adds about 30 Loggers * In the current er .. interface you are able to see that the row you're looking at has a level set and at the end (at the right) which is the effective level - for me, that does not matter. If a row/logger has level-x - that's enough to know. No need to see if this level is set or inherited. * Just a quick idea: if you change e.g. {{org.apache.solr}}, then the interface will automatically update all children in realtime; affects all nested/sub loggers w/o an assigned level. Thoughts on these points? Anyone? : Short note: i moved Logging to a global level, because it's not configurable on a per-core basis. # Edit What i forgot to mention .. actually it's based on a [static logging.json-file|https://github.com/steffkes/solr-admin/blob/master/logging.json] but will try to change the {{LogLevelSection}} Servlet so that it outputs the needed json-structure was (Author: steffkes): Ryan: ty, will take your points on my list - pretty sure, that it should be possible to integrate them Mark: ty! :) For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. 
on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png Actually thinking about the following points: * Tree Structure good way to solve it? * Do we need the possibitly to collapse/expand the three/the childrens? The List could be longer (the screenshot is cropped, just for layout reasons) especially while using SolrCloud which adds about 30 Loggers * In the current er .. Interface you are able to see that the row you're looking at has a level set and in the end (at the right) which is the effective level - for me, that does not matter. if a row/logger, has level-x - that's enough to know. don't need to see if this level is set or inherited. * just a quick idea: if you change f.e. {{org.apache.solr}} then the interface will automatically update all childrens in realtime, affects all nested/sub loggers w/o a assigned level. Thoughts on these points? anyone? : Short Note: i moved Logging to a global level, because it's not configurable on a per-core basis. Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.0 *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin [This commit shows the differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d] between old/existing index.jsp and my new one (which is could copy-cut/paste'd from the existing one). Main Action takes place in [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js] which is actually neither clean nor pretty .. just work-in-progress. 
Actually it's Work in Progress, so ... give it a try. It's developed with Firefox as Browser, so, for a first impression .. please don't use _things_ like Internet Explorer or so ;o Jan already suggested a bunch of good things, i'm sure there are more ideas over there :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2439) change solr javadocs to link to local lucene javadocs w/relative links
change solr javadocs to link to local lucene javadocs w/relative links -- Key: SOLR-2439 URL: https://issues.apache.org/jira/browse/SOLR-2439 Project: Solr Issue Type: Task Components: documentation Reporter: Hoss Man Fix For: 3.2 Now that solr/lucene are in lock step development, and solr releases include the entire lucene-java release, the solr ant targets for building javadocs should depend on the lucene (and module) targets for building javadocs and link directly to the local copies of those docs (using relative paths) (currently, the links point to https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.1
: Please vote to release the artifacts at : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2 -0 I can't in good conscience vote for these artifacts. For the most part, there are only a few minor hiccups -- but the big blocker (in my opinion) is that since RC1, dev-tools has been removed from the solr src packages and this causes the top level build.xml (and instructions for IDE users in the top level README.txt file) to be broken. My detailed notes below... ## ### apache-solr-3.1.0-src.tgz dev-tools isn't in here -- this totally boggles my mind, particularly since there was a deliberate and conscious switch to make the source releases match what you get when doing an svn export. Because dev-tools is missing, 3 of the top level ant targets advertised by ant -p don't work, including 'ant idea' and 'ant eclipse' which are also explicitly mentioned in the top level README.txt as how people using those IDEs should get started developing the code. This seems like a major issue to me. we're setting ourselves up to make the release look completely broken right out of the gate for anyone using one of those IDEs. Asking about this on IRC, yonik and ryan indicated that a couple of folks had said they would veto any release with dev-tools in it because that stuff is supposed to be unsupported ... this makes no sense to me as we have lots of places in the code base where things are documented as being experimental, subject to change, and/or for developer use only. i don't really see how dev-tools should be any different. 
if there is really such violent opposition to including dev-tools in src releases, then the top level build.xml should not depend on it, and the top level README.txt should not refer to it (except maybe with something like "people interested in hacking on the src should use svn, which includes some unofficial 'dev-tools'") --- Now that the src packages are driven by svn exports, more files exist than were in RC1 and some of the changes we made to the solr/README.txt based on the earlier release candidates are misleading. In particular a lot of things are listed as being in the docs directory of a binary distribution, but those files *do* exist in the src packages -- if you look in the site directory. This seems silly, but at no point is the README.txt factually incorrect, so I guess it's not a big enough deal to worry about. --- running all tests, running the example, and building the javadocs all worked fine. ## ### apache-solr-3.1.0.tgz docs look good, basic example usage works fine. ## ### apache-solr-3.1.0.zip Diffing the contents of apache-solr-3.1.0.tgz with apache-solr-3.1.0.zip (using diff --ignore-all-space --strip-trailing-cr -r) turned up quite a few instances where the CRLF fixing in build.xml seems to have corrupted some non-ascii characters in a few files contrib/dataimporthandler/lib/activation-LICENSE.txt contrib/dataimporthandler/lib/mail-LICENSE.txt docs/skin/CommonMessages_de.xml docs/skin/CommonMessages_es.xml docs/skin/CommonMessages_fr.xml example/solr/conf/velocity/facet_dates.vm ...but these changes don't seem to have substantively harmed the files. ## ### lucene-3.1.0-src.tar.gz tests and javadocs worked fine. ## ### lucene-3.1.0.tar.gz docs look good, demo runs fine. ## ### lucene-3.1.0.zip no differences found with lucene-3.1.0.tar.gz -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010547#comment-13010547 ] Shai Erera commented on LUCENE-2977: Looks good to me. WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name - Key: LUCENE-2977 URL: https://issues.apache.org/jira/browse/LUCENE-2977 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2977.patch, LUCENE-2977.patch Since the readers behave this way it would be nice and handy if also this line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.1
: Please vote to release the artifacts at : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2 -0 I can't in good conscience vote for these artifacts. I don't want to suggest anything to slow down the release... but if the problems are with the source release, what about just doing a single source release for lucene+solr? We currently have: lucene-solr-3.1RC2/lucene/ lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz lucene-solr-3.1RC2/lucene/... lucene-solr-3.1RC2/solr/ lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz lucene-solr-3.1RC2/solr/... Why not: lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz lucene-solr-3.1RC2/lucene/... lucene-solr-3.1RC2/solr/... and let the src release be as close to svn export as possible? This will make sure the result builds just as it does when we actually build it! With the maven artifacts, we have source for each jar: http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar I'm not sure the exact ASF source requirements, but maybe the maven source.jar files are good enough? Again, I don't think this should be a blocker, but it would be nice to have things simplified for the next release -- gasp. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2338:
------------------------------

    Attachment: SOLR-2338.patch

Here's a first stab: I included LUCENE-2986's cleanup work for easy testing (this issue depends upon it). Here is the syntax:

{noformat}
<!-- specify a Similarity classname directly -->
<fieldType name="sim1" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
</fieldType>

<!-- specify a Similarity factory -->
<fieldType name="sim2" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <similarity class="org.apache.solr.schema.CustomSimilarityFactory">
    <str name="echo">is there an echo?</str>
  </similarity>
</fieldType>
{noformat}

Additionally, it's necessary to allow customization of the SimilarityProvider too, in order to customize the non-field-specific stuff like coord()... this is done via:

{noformat}
<!-- expert: SimilarityProvider contains scoring routines that are not
     field-specific, such as coord() and queryNorm(). Most scoring
     customization happens in the fieldtype. A custom similarity provider
     may be specified here, but the default is fine for most applications. -->
<similarityProvider class="org.apache.solr.schema.CustomSimilarityProviderFactory">
  <str name="echo">is there an echo?</str>
</similarityProvider>
{noformat}

> improved per-field similarity integration into schema.xml
> ---------------------------------------------------------
>
>                 Key: SOLR-2338
>                 URL: https://issues.apache.org/jira/browse/SOLR-2338
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: SOLR-2338.patch
>
> Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level: to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with <similarity class="..."/> and manage the per-field mapping yourself in Java code. Instead, I think it would be better if you could just specify the Similarity in the FieldType, e.g. right after the analyzer. As for the example, one idea from LUCENE-1360 was to make a short_text or metadata_text type, used by the various metadata fields in the example, that has better norm quantization for its shortness...
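[Editorial note] The issue description mentions that, without this patch, you must "manage the per-field mapping yourself in java code". As a rough, hypothetical sketch of that hand-written bookkeeping (plain Java for illustration only, not actual Solr or patch code; the class and all names below are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the per-field similarity dispatch that a
// custom SimilarityProvider otherwise has to hard-code: a field-name ->
// similarity mapping with a fallback default. Here similarities are
// represented by their class names as plain strings to stay self-contained.
class PerFieldSimilarityMap {
    private final Map<String, String> similarityByField = new HashMap<>();
    private final String defaultSimilarity;

    PerFieldSimilarityMap(String defaultSimilarity) {
        this.defaultSimilarity = defaultSimilarity;
    }

    // Register a field-specific similarity (e.g. for a short metadata field).
    void register(String field, String similarityClass) {
        similarityByField.put(field, similarityClass);
    }

    // Look up the similarity for a field, falling back to the default.
    String get(String field) {
        return similarityByField.getOrDefault(field, defaultSimilarity);
    }
}
```

With the patch, this mapping instead lives declaratively in schema.xml, as a similarity element inside each fieldType.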
Re: [VOTE] Release Lucene/Solr 3.1
On Thu, Mar 24, 2011 at 12:18 AM, Ryan McKinley ryan...@gmail.com wrote:
> I don't want to suggest anything to slow down the release... but if the
> problems are with the source release, what about just doing a single
> source release for lucene+solr?
> [...]
> I'm not sure of the exact ASF source requirements, but maybe the maven
> source.jar files are good enough?

I don't think someone should have to deal with maven to get the lucene source release... I think lucene should have its own artifacts as in the past (the source code being the most important).
GSoC 2011
Hello,

I am planning to submit a project proposal to GSoC 2011, and Lucene seems to have a lot of GSoC projects this year. Last year I did a GSoC project using Lucene for the PhotArk project. This year, instead of just using Lucene, I am planning to contribute code to it.

My experience with Lucene is as a regular user; the only code I have changed/extended so far is token streams/analyzers and the query parser, so I know that part of the code best. Based on that, I'm planning to focus on query parser and analyzer/token stream projects. Does that sound reasonable?

I will be studying the code and planning the proposal(s), so you should start seeing more posts from me over the next few days.

--
Phillipe Ramalho
Re: [VOTE] Release Lucene/Solr 3.1
> I don't think someone should have to deal with maven to get the lucene
> source release... I think lucene should have its own artifacts as in the
> past (the source code being the most important).

Sorry, did not mean to muddy the water with the maven discussion... ignore my comment.

When you say lucene should have its own artifacts, do you mean lucene w/o solr? Or could a single source artifact include everything? (making the release process easier and apparently cleaner)