[jira] [Updated] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3223: --- Fix Version/s: 4.0 > SearchWithSortTask ignores sorting by Doc > - > > Key: LUCENE-3223 > URL: https://issues.apache.org/jira/browse/LUCENE-3223 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch > > > During my work in LUCENE-3912, I found the following code: > {code} > if (field.equals("doc")) { > sortField0 = SortField.FIELD_DOC; > } if (field.equals("score")) { > sortField0 = SortField.FIELD_SCORE; > } ... > {code} > This means the setting of SortField.FIELD_DOC is ignored. While I don't know > much about this code, this seems like a valid setting and obviously just a > bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male resolved LUCENE-3223. Resolution: Fixed Committed revision 1137882. > SearchWithSortTask ignores sorting by Doc > - > > Key: LUCENE-3223 > URL: https://issues.apache.org/jira/browse/LUCENE-3223 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch > > > During my work in LUCENE-3912, I found the following code: > {code} > if (field.equals("doc")) { > sortField0 = SortField.FIELD_DOC; > } if (field.equals("score")) { > sortField0 = SortField.FIELD_SCORE; > } ... > {code} > This means the setting of SortField.FIELD_DOC is ignored. While I don't know > much about this code, this seems like a valid setting and obviously just a > bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052380#comment-13052380 ] Dawid Weiss commented on LUCENE-2341: - I'll take a look at the differences between Morfologik and Morfeusz right now, actually. I'll post the results once I have something. > explore morfologik integration > -- > > Key: LUCENE-2341 > URL: https://issues.apache.org/jira/browse/LUCENE-2341 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Robert Muir >Assignee: Dawid Weiss > Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar > > > Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer > available: > http://sourceforge.net/projects/morfologik/ > This works differently than LUCENE-2298, and ideally would be another option > for users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052377#comment-13052377 ] Chris Male commented on LUCENE-3219: You'll have to guide me on the backwards compat issue since this is a break due to the fields being public and some methods changing from returning int to returning SortField.Type. > Change SortField types to an Enum > - > > Key: LUCENE-3219 > URL: https://issues.apache.org/jira/browse/LUCENE-3219 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, > LUCENE-3219.patch > > > When updating my SOLR-2533 patch, one issue was that the int value I had > given my new type had been used by another change in the mean time. Since we > don't use these fields in a bitset kind of way, we can convert them to an > enum. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052376#comment-13052376 ] Dawid Weiss commented on LUCENE-2341: - Thanks for the contribution, Michał. Robert: the dictionary is licensed under MPL or CC-SA (to be selected by the user depending on one's needs). Do you know which one is preferable over another? Michał: there is also another (much larger) dictionary that has been released recently and comes from the Morfeusz project. http://sgjp.pl/morfeusz/dopobrania.html This dictionary is actually licensed under BSD license, so no legal worries at all. Both dictionaries are nearly identical (they differ slightly in their convention of morphosyntactic annotations) and Morfeusz's dictionary could be compiled into an automaton for use with Morfologik. Which way should we go? What do you think? > explore morfologik integration > -- > > Key: LUCENE-2341 > URL: https://issues.apache.org/jira/browse/LUCENE-2341 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Robert Muir >Assignee: Dawid Weiss > Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar > > > Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer > available: > http://sourceforge.net/projects/morfologik/ > This works differently than LUCENE-2298, and ideally would be another option > for users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052368#comment-13052368 ] Simon Willnauer commented on LUCENE-3219: - looks good to me. BTW. should we backport those changes? > Change SortField types to an Enum > - > > Key: LUCENE-3219 > URL: https://issues.apache.org/jira/browse/LUCENE-3219 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, > LUCENE-3219.patch > > > When updating my SOLR-2533 patch, one issue was that the int value I had > given my new type had been used by another change in the mean time. Since we > don't use these fields in a bitset kind of way, we can convert them to an > enum. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052362#comment-13052362 ] Simon Willnauer commented on LUCENE-3223: - bq. Simple patch fixing the problem. Do I need a CHANGES entry for trivial things like this? looks good, I don't think we need a changes entry for this. go ahead and commit! > SearchWithSortTask ignores sorting by Doc > - > > Key: LUCENE-3223 > URL: https://issues.apache.org/jira/browse/LUCENE-3223 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch > > > During my work in LUCENE-3912, I found the following code: > {code} > if (field.equals("doc")) { > sortField0 = SortField.FIELD_DOC; > } if (field.equals("score")) { > sortField0 = SortField.FIELD_SCORE; > } ... > {code} > This means the setting of SortField.FIELD_DOC is ignored. While I don't know > much about this code, this seems like a valid setting and obviously just a > bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene 3.3 release soon?
+1 wait for grouping post facet counts... Go Martijn v Groningen !!

On 6/20/11 12:03 PM, "Michael McCandless" wrote:

>+1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
>3.2.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir wrote:
>> i was planning on doing an RC in a few weeks actually.
>>
>> we have a lot of good stuff in there today already, however i wanted
>> to give a few weeks for the grouping stuff to run on hudson.
>>
>> On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
>> wrote:
>>> I would say within the next 3 months.
>>>
>>> Thoughts?
>>>
>>> On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček
>>> wrote:
>>>> Hi,
>>>> How soon can we expect official Lucene 3.3 release?
>>>> Best regards,
>>>> Lukas
[jira] [Commented] (SOLR-1967) New Native PHP Response Writer Class
[ https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052342#comment-13052342 ]

Israel Ekpo commented on SOLR-1967:
-----------------------------------

To use the 'json' response writer in lieu of phpnative, see the documentation for SolrClient::__construct():
http://www.php.net/manual/en/solrclient.construct.php

> New Native PHP Response Writer Class
> ------------------------------------
>
>                 Key: SOLR-1967
>                 URL: https://issues.apache.org/jira/browse/SOLR-1967
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - php, Response Writers
>    Affects Versions: 1.4
>            Reporter: Israel Ekpo
>              Labels: php, response, solrclient, writer
>             Fix For: 3.3
>
>         Attachments: phpnative.tar.gz, phpnativeresponsewriter.jar
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Hi Solr users,
> If you are using Apache Solr via PHP, I have some good news for you.
> There is a new response writer for the PHP native extension, currently available as a plugin.
> This new feature adds a new response writer class to the org.apache.solr.request package.
> This class is used by the PHP Native Solr Client driver to prepare the query response from Solr.
> This response writer allows you to configure the way the data is serialized for the PHP client.
> You can use your own class name and you can also control how the properties are serialized.
> The formatting of the response data is very similar to the way it is currently done by the PECL extension on the client side.
> The only difference now is that this serialization is happening on the server side instead.
> You will find this new response writer particularly useful when dealing with responses for
> - highlighting
> - admin threads responses
> - more like this responses
> to mention just a few.
> You can pass the "objectClassName" request parameter to specify the class name to be used for serializing objects.
> Please note that the class must be available on the client side to avoid a PHP_Incomplete_Object error during the unserialization process.
> You can also pass in the "objectPropertiesStorageMode" request parameter with either a 0 (independent properties) or a 1 (combined properties).
> These parameters can also be passed as a named list when loading the response writer in the solrconfig.xml file.
> Having this control allows you to create custom objects, which gives you the flexibility of implementing custom __get methods and the ArrayAccess, Traversable and Iterator interfaces on the PHP client side.
> Until this class is incorporated into Solr, you simply have to copy the jar file containing this plugin into your lib directory under $SOLR_HOME.
> The jar file is available here and so is the source code.
> Then set up the configuration as shown below and restart your servlet container.
> Below is an example configuration in solrconfig.xml:
>
> <queryResponseWriter name="phpnative" class="org.apache.solr.request.PHPNativeResponseWriter">
>   <str name="objectClassName">SolrObject</str>
>   <int name="objectPropertiesStorageMode">0</int>
> </queryResponseWriter>
>
> Below is an example implementation on the PHP client side.
> Support for specifying custom response writers will be available starting from the 0.9.11 version of the PECL extension for Solr, currently available here:
> http://pecl.php.net/package/solr
> Here is an example of how to use the new response writer with the PHP client.
>
> <?php
> class SolrClass
> {
>     public $_properties = array();
>
>     public function __get($property_name) {
>         // Look for a real property first, then fall back to the combined-properties array.
>         if (property_exists($this, $property_name)) {
>             return $this->$property_name;
>         } else if (isset($this->_properties[$property_name])) {
>             return $this->_properties[$property_name];
>         }
>         return null;
>     }
> }
>
> $options = array
> (
>     'hostname' => 'localhost',
>     'port' => 8983,
>     'path' => '/solr/'
> );
> $client = new SolrClient($options);
> $client->setResponseWriter("phpnative");
> $response = $client->ping();
> $query = new SolrQuery();
> $query->setQuery("*:*");
> $query->set("objectClassName", "SolrClass");
> $query->set("objectPropertiesStorageMode", 1);
> $response = $client->query($query);
> $resp = $response->getResponse();
> ?>
>
> Documentation of the changes to the PECL extension is available here:
> http://docs.php.net/manual/en/solrclient.construct.php
> http://docs.php.net/manual/en/solrclient.setresponsewriter.php
> Please contact me at ie...@php.net if you have any questions or comments.
[jira] [Closed] (SOLR-1967) New Native PHP Response Writer Class
[ https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Israel Ekpo closed SOLR-1967.
-----------------------------

    Resolution: Won't Fix

The latest version of the PECL extension now supports JSON response writer which should be easier to use without additional configuration.

> New Native PHP Response Writer Class
> ------------------------------------
>
>                 Key: SOLR-1967
>                 URL: https://issues.apache.org/jira/browse/SOLR-1967
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - php, Response Writers
>    Affects Versions: 1.4
>            Reporter: Israel Ekpo
>              Labels: php, response, solrclient, writer
>             Fix For: 3.3
>
>         Attachments: phpnative.tar.gz, phpnativeresponsewriter.jar
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
[jira] [Assigned] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male reassigned LUCENE-3223: -- Assignee: Chris Male > SearchWithSortTask ignores sorting by Doc > - > > Key: LUCENE-3223 > URL: https://issues.apache.org/jira/browse/LUCENE-3223 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch > > > During my work in LUCENE-3912, I found the following code: > {code} > if (field.equals("doc")) { > sortField0 = SortField.FIELD_DOC; > } if (field.equals("score")) { > sortField0 = SortField.FIELD_SCORE; > } ... > {code} > This means the setting of SortField.FIELD_DOC is ignored. While I don't know > much about this code, this seems like a valid setting and obviously just a > bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male reassigned LUCENE-3219: -- Assignee: Chris Male > Change SortField types to an Enum > - > > Key: LUCENE-3219 > URL: https://issues.apache.org/jira/browse/LUCENE-3219 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Chris Male >Assignee: Chris Male >Priority: Minor > Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, > LUCENE-3219.patch > > > When updating my SOLR-2533 patch, one issue was that the int value I had > given my new type had been used by another change in the mean time. Since we > don't use these fields in a bitset kind of way, we can convert them to an > enum. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3223: --- Attachment: LUCENE-3223.patch Simple patch fixing the problem. Do I need a CHANGES entry for trivial things like this? > SearchWithSortTask ignores sorting by Doc > - > > Key: LUCENE-3223 > URL: https://issues.apache.org/jira/browse/LUCENE-3223 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch > > > During my work in LUCENE-3912, I found the following code: > {code} > if (field.equals("doc")) { > sortField0 = SortField.FIELD_DOC; > } if (field.equals("score")) { > sortField0 = SortField.FIELD_SCORE; > } ... > {code} > This means the setting of SortField.FIELD_DOC is ignored. While I don't know > much about this code, this seems like a valid setting and obviously just a > bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3223: --- Attachment: LUCENE-3223-test.patch Test demonstrating error. > SearchWithSortTask ignores sorting by Doc > - > > Key: LUCENE-3223 > URL: https://issues.apache.org/jira/browse/LUCENE-3223 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3223-test.patch > > > During my work in LUCENE-3912, I found the following code: > {code} > if (field.equals("doc")) { > sortField0 = SortField.FIELD_DOC; > } if (field.equals("score")) { > sortField0 = SortField.FIELD_SCORE; > } ... > {code} > This means the setting of SortField.FIELD_DOC is ignored. While I don't know > much about this code, this seems like a valid setting and obviously just a > bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
SearchWithSortTask ignores sorting by Doc
-----------------------------------------

                 Key: LUCENE-3223
                 URL: https://issues.apache.org/jira/browse/LUCENE-3223
             Project: Lucene - Java
          Issue Type: Bug
          Components: modules/benchmark
            Reporter: Chris Male
            Priority: Minor


During my work in LUCENE-3912, I found the following code:

{code}
if (field.equals("doc")) {
  sortField0 = SortField.FIELD_DOC;
} if (field.equals("score")) {
  sortField0 = SortField.FIELD_SCORE;
} ...
{code}

This means the setting of SortField.FIELD_DOC is ignored. While I don't know much about this code, this seems like a valid setting and obviously just a bug.
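The effect described in LUCENE-3223 can be reproduced in isolation. The sketch below is not the benchmark module's code: `SortChoice`, `buggyResolve` and `fixedResolve` are hypothetical stand-ins, and it assumes that the elided part of the chain ends in a fallback `else` for ordinary field names. Under that assumption, the missing `else` before the second `if` lets the fallback overwrite the value chosen for "doc", while an `else if` chain keeps it.

```java
public class SortFieldChainDemo {

    // Hypothetical stand-in for the SortField constants named in the issue.
    public enum SortChoice { FIELD_DOC, FIELD_SCORE, CUSTOM_FIELD }

    // Same shape as the reported snippet: the second `if` is not chained to the first.
    public static SortChoice buggyResolve(String field) {
        SortChoice choice = null;
        if (field.equals("doc")) {
            choice = SortChoice.FIELD_DOC;
        } if (field.equals("score")) {
            choice = SortChoice.FIELD_SCORE;
        } else {
            // Assumed fallback branch: it also runs for "doc" and overwrites the choice.
            choice = SortChoice.CUSTOM_FIELD;
        }
        return choice;
    }

    // Chaining the branches with `else if` keeps the "doc" assignment.
    public static SortChoice fixedResolve(String field) {
        SortChoice choice;
        if (field.equals("doc")) {
            choice = SortChoice.FIELD_DOC;
        } else if (field.equals("score")) {
            choice = SortChoice.FIELD_SCORE;
        } else {
            choice = SortChoice.CUSTOM_FIELD;
        }
        return choice;
    }

    public static void main(String[] args) {
        System.out.println(buggyResolve("doc")); // prints CUSTOM_FIELD
        System.out.println(fixedResolve("doc")); // prints FIELD_DOC
    }
}
```

Sorting by "score" behaves the same in both versions; only "doc" is affected, which matches the issue title.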
[jira] [Updated] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3219:
-------------------------------

    Attachment: LUCENE-3219.patch

Updated patch to incorporate Simon's suggestions:

- SearchWithSortTask now uses SortField.Type.valueOf(). This changes the exception thrown to an IllegalArgumentException.
- I haven't added Type to FieldCache.Parser, since the constructor in SortField that accepts Parsers is deprecated and you can pull the Type from the CachedArrayCreator, which is the preferred way of creating a SortField. I did exploit this to reduce the code in the instanceof comparisons.

> Change SortField types to an Enum
> ---------------------------------
>
>                 Key: LUCENE-3219
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3219
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Chris Male
>            Priority: Minor
>         Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch
>
>
> When updating my SOLR-2533 patch, one issue was that the int value I had given my new type had been used by another change in the meantime. Since we don't use these fields in a bitset kind of way, we can convert them to an enum.
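The change of exception type noted in the patch comment follows from standard Java enum behaviour: the generated `valueOf(String)` throws IllegalArgumentException for a name that matches no constant. A minimal sketch, assuming a hypothetical `Type` enum with made-up constants rather than Lucene's actual `SortField.Type`; the upper-casing of the input is likewise an assumption about how a caller might normalize names.

```java
import java.util.Locale;

public class SortTypeLookupDemo {

    // Hypothetical enum; the real SortField.Type constants may differ.
    public enum Type { SCORE, DOC, STRING, INT }

    // Resolves a user-supplied name to a constant. Enum.valueOf throws
    // IllegalArgumentException when no constant has that exact name.
    public static Type parse(String name) {
        return Type.valueOf(name.toUpperCase(Locale.ROOT));
    }

    // Convenience check that turns the exception into a boolean.
    public static boolean isKnown(String name) {
        try {
            parse(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("int"));     // prints INT
        System.out.println(isKnown("bogus")); // prints false
    }
}
```

Because constants are looked up by name rather than by a hand-assigned int, two patches adding new types cannot silently claim the same value, which is the collision the issue description mentions.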
[jira] [Updated] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinicius Barros updated LUCENE-1768:
------------------------------------

    Attachment: week4.patch

This patch includes the builder for numeric range queries. This week I intend to start writing JUnit tests.

> NumericRange support for new query parser
> -----------------------------------------
>
>                 Key: LUCENE-1768
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1768
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/queryparser
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Adriano Crestani
>              Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: week1.patch, week2.patch, week3.patch, week4.patch
>
>
> It would be good to specify some type of "schema" for the query parser in future, to automatically create NumericRangeQuery for different numeric types. It would then be possible to index a numeric value (double,float,long,int) using NumericField, and the query parser would then know which type of field this is and so correctly create a NumericRangeQuery for strings like "[1.567..*]" or "(1.787..19.5]".
> There is currently no way to extract from the index whether a field is numeric, so the user will have to configure the FieldConfig objects in the ConfigHandler. But if this is done, it will not be that difficult to implement the rest.
> The only difference from the current handling of RangeQuery is then the instantiation of the correct Query type and conversion of the entered numeric values (simple Number.valueOf(...) cast of the user entered numbers). Everything else is identical; NumericRangeQuery also supports the MTQ rewrite modes (as it is a MTQ).
> Another thing is a change in Date semantics. There are some strange flags in the current parser that tell it how to handle dates.
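The per-field configuration idea in the LUCENE-1768 description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the query parser's FieldConfig API: `NumericBoundDemo`, `NumericType`, `configure` and `parseBound` are hypothetical names, and treating "*" as an open bound simply follows the example strings quoted in the issue.

```java
import java.util.HashMap;
import java.util.Map;

public class NumericBoundDemo {

    // Hypothetical set of numeric types a field could be configured with.
    public enum NumericType { INT, LONG, FLOAT, DOUBLE }

    // User-supplied "schema": field name -> numeric type.
    private final Map<String, NumericType> fieldTypes = new HashMap<>();

    public void configure(String field, NumericType type) {
        fieldTypes.put(field, type);
    }

    // Converts one range bound using the type configured for the field.
    // Returns null for the open bound "*"; throws IllegalArgumentException
    // when the field has no configured type.
    public Number parseBound(String field, String text) {
        NumericType type = fieldTypes.get(field);
        if (type == null) {
            throw new IllegalArgumentException("no numeric type configured for field: " + field);
        }
        if ("*".equals(text)) {
            return null;
        }
        switch (type) {
            case INT:   return Integer.valueOf(text);
            case LONG:  return Long.valueOf(text);
            case FLOAT: return Float.valueOf(text);
            default:    return Double.valueOf(text);
        }
    }

    public static void main(String[] args) {
        NumericBoundDemo config = new NumericBoundDemo();
        config.configure("price", NumericType.DOUBLE);
        System.out.println(config.parseBound("price", "1.567")); // prints 1.567
        System.out.println(config.parseBound("price", "*"));     // prints null
    }
}
```

A parser built along these lines would pass the two converted bounds, plus the inclusive/exclusive flags taken from the bracket style, to whichever numeric range query matches the configured type.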
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052273#comment-13052273 ] Ryan McKinley commented on SOLR-2399: - right now this only works with 4.0 Once all the kinks are worked out -- and the things it depends on are ported to 3.x, this will likely also get ported to 3.x > Solr Admin Interface, reworked > -- > > Key: SOLR-2399 > URL: https://issues.apache.org/jira/browse/SOLR-2399 > Project: Solr > Issue Type: Improvement > Components: web gui >Reporter: Stefan Matheis (steffkes) >Assignee: Ryan McKinley >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, > SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, > SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, > SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch > > > *The idea was to create a new, fresh (and hopefully clean) Solr Admin > Interface.* [Based on this > [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] > *Features:* > * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] > * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] > * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] > * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, > SOLR-2400) > * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] > * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) > * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] > * [Replication|http://files.mathe.is/solr-admin/10_replication.png] > * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] > * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) > ** Stub (using static data) > Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI > I've quickly created a 
Github-Repository (Just for me, to keep track of the > changes) > » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052268#comment-13052268 ]

Young Kim commented on SOLR-2399:
---------------------------------

Out of curiosity, is this compatible with 3.2.0 or just 4.0? I've followed the instructions (with 3.2.0 of course), and I keep on hitting a build failure.

> Solr Admin Interface, reworked
> ------------------------------
>
>                 Key: SOLR-2399
>                 URL: https://issues.apache.org/jira/browse/SOLR-2399
>             Project: Solr
>          Issue Type: Improvement
>          Components: web gui
>            Reporter: Stefan Matheis (steffkes)
>            Assignee: Ryan McKinley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> *Features:*
> * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
> * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
> * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
> * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400)
> * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
> * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
> * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
> * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
> * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
> * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
> ** Stub (using static data)
> Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
> I've quickly created a Github-Repository (Just for me, to keep track of the changes)
> » https://github.com/steffkes/solr-admin
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052252#comment-13052252 ]

Robert Muir commented on SOLR-2452:

bq. So I hope to have this issue resolved this week.

Really? That's awesome!

Worst case, some of those top-level targets could literally be 'put back', probably with minimal modifications. My idea with the temporary nuking was to try to start over, extending Lucene's build system, as otherwise I got lost in all the XML.

> rewrite solr build system
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
> Issue Type: Task
> Components: Build
> Reporter: Robert Muir
> Assignee: Steven Rowe
> Fix For: 3.3, 4.0
> Attachments: SOLR-2452-post-reshuffling.patch, SOLR-2452.dir.reshuffle.sh
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052251#comment-13052251 ] Robert Muir commented on LUCENE-2341: - Sorry, about my second comment i was confusing this with the stuff you have for the morfologik jar itself, which is correct :) What i should have said was, I think we should include this information in the top-level modules/analysis/LICENSE.txt and modules/analysis/NOTICE.txt > explore morfologik integration > -- > > Key: LUCENE-2341 > URL: https://issues.apache.org/jira/browse/LUCENE-2341 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Robert Muir >Assignee: Dawid Weiss > Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar > > > Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer > available: > http://sourceforge.net/projects/morfologik/ > This works differently than LUCENE-2298, and ideally would be another option > for users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052250#comment-13052250 ]

Steven Rowe commented on SOLR-2452:

bq. However, I think I would recommend thinking about when you want to make the change: it will make merging code up to this branch nearly impossible... is it holding back other changes or is this a final step?

It's not a final step. All of the targets you removed need to be put back (I counted 40 or so). But I think this will be a minor amount of work, comparatively.

I think for the moment I'll keep iterating on the patch, rather than committing it to the branch, to minimize merge costs, until I have all of the Solr targets re-implemented. I don't think it'll take too long, maybe another day or two.

Once that's done, I'll commit the moves/copies from the shell script and the patch, then generate a full patch for review. Assuming there are no objections then, I plan to commit within a day or so to minimize merge costs.

So I hope to have this issue resolved this week.

> rewrite solr build system
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
> Issue Type: Task
> Components: Build
> Reporter: Robert Muir
> Fix For: 3.3, 4.0
> Attachments: SOLR-2452-post-reshuffling.patch, SOLR-2452.dir.reshuffle.sh
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.
[jira] [Updated] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated SOLR-2452: -- Fix Version/s: 4.0 > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe reassigned SOLR-2452: - Assignee: Steven Rowe > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052246#comment-13052246 ]

Robert Muir commented on LUCENE-2341:

Hi Michał,

This patch looks great! I took a quick glance; here are a couple of suggestions:

* In the MorfologikFilter, I think we should implement reset(), first calling the superclass reset(), then clearing the stemsAcc list. This ensures that all of the filter's state is cleared before it is reused. Under normal operation this should not be necessary, but some consumers in Lucene (e.g. LimitTokenCountFilter, and some similar code in the Highlighter) will only partially consume the stream up to some point, then suddenly stop. By clearing this list in reset() we ensure that there is no chance any leftover stems will appear in the next stream.
* Because the data is licensed under the MPL, I think we should explicitly list a hyperlink, if possible, to the source code used, in NOTICE.txt. I saw you included some wording in LICENSE.txt, but I think this should only say 'XYZ data is under this license', with the actual MPL license text. In NOTICE.txt we should link to the source code, I think... there is some more information on this under the section "Category B: Reciprocal Licenses" at http://www.apache.org/legal/3party.html

> explore morfologik integration
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Robert Muir
> Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option for users.
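The reset() suggestion above can be sketched as follows. This is only an illustration and does not use Lucene's real TokenStream/TokenFilter API: `SketchTokenStream`, `SketchTokenFilter`, `SketchListSource`, `SketchStemFilter`, and the dictionary map are hypothetical stand-ins; only the `stemsAcc` name comes from the comment. The point it tries to show is that stems left pending by a consumer that stops early would leak into the next stream unless reset() clears them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reusable token stream; NOT Lucene's TokenStream.
abstract class SketchTokenStream {
    /** Returns the next term, or null when the stream is exhausted. */
    abstract String next();

    /** Clears per-stream state so the instance can be reused. */
    void reset() { }
}

// Hypothetical filter base class: reset() is forwarded to the wrapped input.
abstract class SketchTokenFilter extends SketchTokenStream {
    protected final SketchTokenStream input;

    SketchTokenFilter(SketchTokenStream input) {
        this.input = input;
    }

    @Override
    void reset() {
        input.reset();
    }
}

// Hypothetical source that replays a fixed list of tokens.
class SketchListSource extends SketchTokenStream {
    private List<String> tokens = new ArrayList<>();
    private int pos = 0;

    void setTokens(String... newTokens) {
        tokens = Arrays.asList(newTokens);
    }

    @Override
    String next() {
        return pos < tokens.size() ? tokens.get(pos++) : null;
    }

    @Override
    void reset() {
        pos = 0;
    }
}

// Emits every stem of each input token; stems not yet returned wait in stemsAcc.
class SketchStemFilter extends SketchTokenFilter {
    private final Map<String, List<String>> dictionary;
    private final Deque<String> stemsAcc = new ArrayDeque<>();

    SketchStemFilter(SketchTokenStream input, Map<String, List<String>> dictionary) {
        super(input);
        this.dictionary = dictionary;
    }

    @Override
    String next() {
        while (stemsAcc.isEmpty()) {
            String token = input.next();
            if (token == null) {
                return null;
            }
            // Tokens missing from the dictionary pass through unchanged.
            stemsAcc.addAll(dictionary.getOrDefault(token, Collections.singletonList(token)));
        }
        return stemsAcc.poll();
    }

    @Override
    void reset() {
        super.reset();     // first reset the superclass (and, through it, the input)
        stemsAcc.clear();  // then drop stems left over from a partially consumed stream
    }
}
```

If the `stemsAcc.clear()` line is removed, a consumer that stops after the first stem of a multi-stem token would see the remaining stems at the start of the next stream.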
[jira] [Updated] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michał Dybizbański updated LUCENE-2341: --- Attachment: morfologik-stemming-1.5.0.jar LUCENE-2341.diff Hi This patch introduces stemming filter and analyzer, that use [Morfologik library|http://morfologik.blogspot.com], developed by Dawid Weiss and Marcin Miłkowski. Tokens are stemmed by Morfologik with a dictionary, and current distribution provides a dictionary for polish language. The MorfologikFilter yields one or more terms for each token. Each of those terms is given the same position in the index. I'm attaching a binary distribution of the library (morfologik-stemming-1.5.0.jar), that needs to be placed in modules/analysis/morfologik/lib/ subdirectory. It is also available as a [Maven artifact|http://mvnrepository.com/artifact/org.carrot2/morfologik-stemming/1.5.0]. The library is BSD-licensed and a dictionary uses data from [Polish dictionary for aspell/ispell/myspell (SJP.PL)|http://www.sjp.pl/slownik/en/], which is licensed under GPL, LGPL, MPL and CC SA licenses. This is my first contribution to the Lucene project, so please be forgiving :) Thanks to Dawid for help. Regards, Michał > explore morfologik integration > -- > > Key: LUCENE-2341 > URL: https://issues.apache.org/jira/browse/LUCENE-2341 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Robert Muir >Assignee: Dawid Weiss > Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar > > > Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer > available: > http://sourceforge.net/projects/morfologik/ > This works differently than LUCENE-2298, and ideally would be another option > for users. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052237#comment-13052237 ]

Michael McCandless commented on LUCENE-2548:

Woops -- my comment was just saying that both the == and != cases weren't always caught by PMD/findbugs. But maybe I somehow messed up running them!

> Remove all interning of field names from flex API
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Uwe Schindler
> Assignee: Michael McCandless
> Fix For: 4.0
> Attachments: LUCENE-2548.patch, LUCENE-2548.patch
>
> In previous versions of Lucene, interning of fields was important to minimize string comparison cost when iterating TermEnums, to detect changes in field name. As we separated field names from terms in flex, no query compares field names anymore, so the whole performance-problematic interning can be removed.
> I will start with doing this, but we need to carefully review some places, e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) Robert?
[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052236#comment-13052236 ] Uwe Schindler commented on LUCENE-2548: --- bq. Can you explain shortly what "Unable to render embedded object: File" has to do with interning? That was just a JIRA formatting issue in Mike's comment I was referring to. > Remove all interning of field names from flex API > - > > Key: LUCENE-2548 > URL: https://issues.apache.org/jira/browse/LUCENE-2548 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-2548.patch, LUCENE-2548.patch > > > In previous versions of Lucene, interning of fields was important to minimize > string comparison cost when iterating TermEnums, to detect changes in field > name. As we separated field names from terms in flex, no query compares field > names anymore, so the whole performance problematic interning can be removed. > I will start with doing this, but we need to carefully review some places > e.g. in preflex codec. > Maybe before this issue we should remove the Term class completely. :-) > Robert? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052223#comment-13052223 ] Mark Harwood commented on LUCENE-2454: -- bq. prevSetBit is called for each child doc You could call nextSetBit on the first child to know the "safe" range of child docs attributable to the same parent but you would be taking a gamble that this was worth the call i.e. there were many possible children per parent to be tested. bq. It uses 2 passes if you also want to collect child docs per parent I tend to work with distributed indexes so it involves a 2 pass op anyway - one to understand best parents across the multiple shards first then the perparentlimitedquery to ensure we only pay the retrieve costs for those parents that make the final cut. bq. I think it should use a PQ to find the lowest child to evict per parent doc? Careful object reuse would need to be factored in to avoid excessive GC - each parent would fill a PQ full of child-match object instances that could/should be reused in assessing the next parent > Nested Document query support > - > > Key: LUCENE-2454 > URL: https://issues.apache.org/jira/browse/LUCENE-2454 > Project: Lucene - Java > Issue Type: New Feature > Components: core/search >Affects Versions: 3.0.2 >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Attachments: LUCENE-2454.patch, LUCENE-2454.patch, > LuceneNestedDocumentSupport.zip > > > A facility for querying nested documents in a Lucene index as outlined in > http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1305#comment-1305 ]

Michael McCandless commented on LUCENE-3218:

Patch looks cool!

So the CFW will take the first output opened against it and let it write directly into the "actual" CFS file, and then if another file is opened while that first one is still open, the 2nd file will write to a separate file and then will copy in on close.

We may want to delegate the separate files too? So that on close they copy themselves into the CFS and remove the original? This way IW won't have to separately create CFS in the end.

Somehow we need IW to add the biggest sub-file first...

s/compund/compound

CFW.close should assert currentOutput != null (and, if we delegate sep entries, that they are also all closed)?

You might need to sync the CompoundFileWriter.this.currentOutput test / setting to null? Though... Lucene is always single threaded in writing files for the same segment, today anyway.

Can we make a separate createCompoundOutput? (I.e., instead of passing OpenMode to openCompoundInput.)

And: I'm assuming a given compound output can only be opened once, appended to / separate files copied into, closed and then never opened again for writing? (I.e., still "write once" at the file level.)

> Make CFS appendable
>
> Key: LUCENE-3218
> URL: https://issues.apache.org/jira/browse/LUCENE-3218
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 4.0
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Fix For: 4.0
> Attachments: LUCENE-3218.patch
>
> Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved.
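The behaviour described in the comment above (the first output writes straight into the compound file; outputs opened while it is still open write on the side and are copied in afterwards) can be sketched roughly as below. This is not the patch's CompoundFileWriter: `SketchOutput`, `SketchCompoundWriter`, and all method names are hypothetical, in-memory buffers stand in for files, and the choice to defer the copy until the direct output is closed is an assumption made here to keep the sub-files contiguous.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical output handle; deliberately much simpler than Lucene's IndexOutput. */
interface SketchOutput {
    void write(byte[] data);
    void close();
}

/** Hypothetical sketch of an appendable compound file, held entirely in memory. */
class SketchCompoundWriter {
    private final ByteArrayOutputStream compound = new ByteArrayOutputStream();
    // name -> {offset, length} of each sub-file inside the compound buffer
    private final Map<String, long[]> entries = new LinkedHashMap<>();
    // closed side files waiting for the direct output to finish
    private final Map<String, byte[]> pending = new LinkedHashMap<>();
    private boolean directOpen;
    private int separateOpen;

    SketchOutput createOutput(final String name) {
        if (!directOpen) {
            // First output: append straight to the compound buffer, no copy needed.
            directOpen = true;
            final long start = compound.size();
            return new SketchOutput() {
                public void write(byte[] data) {
                    compound.write(data, 0, data.length);
                }
                public void close() {
                    entries.put(name, new long[] {start, compound.size() - start});
                    directOpen = false;
                    copyPending();
                }
            };
        }
        // Another output is still open: write to a side buffer and copy in on close.
        separateOpen++;
        final ByteArrayOutputStream side = new ByteArrayOutputStream();
        return new SketchOutput() {
            public void write(byte[] data) {
                side.write(data, 0, data.length);
            }
            public void close() {
                separateOpen--;
                pending.put(name, side.toByteArray());
                if (!directOpen) {
                    copyPending();
                }
            }
        };
    }

    /** Appends every closed side file to the compound buffer and records its entry. */
    private void copyPending() {
        for (Map.Entry<String, byte[]> e : pending.entrySet()) {
            long start = compound.size();
            compound.write(e.getValue(), 0, e.getValue().length);
            entries.put(e.getKey(), new long[] {start, e.getValue().length});
        }
        pending.clear();
    }

    long offsetOf(String name) { return entries.get(name)[0]; }

    long lengthOf(String name) { return entries.get(name)[1]; }

    /** Refuses to finish while any output is still open; returns the compound bytes. */
    byte[] finish() {
        if (directOpen || separateOpen > 0) {
            throw new IllegalStateException("outputs still open");
        }
        return compound.toByteArray();
    }
}
```

In this sketch only the side files pay for a copy, which is why the comment's remark about adding the biggest sub-file first matters: the file that gets the direct slot is the one whose bytes are written once.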
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8953 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8953/

8 tests failed.

FAILED: org.apache.lucene.util.automaton.TestMinimize.testElements

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
    at java.lang.Thread.run(Thread.java:636)

REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-107: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/7/test5658056595tmp/_g_5.pyl (Too many open files in system)

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-107: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/7/test5658056595tmp/_g_5.pyl (Too many open files in system)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/7/test5658056595tmp/_g_5.pyl (Too many open files in system)
    at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)

REGRESSION: org.apache.lucene.index.TestStressIndexing2.testRandomIWReader

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test4023440248tmp/_c_1.tib (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test4023440248tmp/_c_1.tib (Too many open files in system)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:416)
    at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293)
    at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:375)
    at org.apache.lucene.index.codecs.BlockTermsWriter.<init>(BlockTermsWriter.java:75)
    at org.apache.lucene.index.codecs.mockrandom.MockRandomCodec.fieldsConsumer(MockRandomCodec.java:226)
    at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:73)
    at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:61)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:565)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3466)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3110)
    at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1877)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1872)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1868)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:287)
    at org.apache.lucene.index.TestStressIndexing2.testRandomIWReader(TestStressIndexing2.java:67)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)

REGRESSION: org.apache.lucene.search.TestBooleanOr.testElements

Error Message:
org/apache/lucene/search/MatchAllDocsQuery$MatchAllScorer

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/search/MatchAllDocsQuery$MatchAllScorer
    at org.apache.lucene.search.MatchAllDocsQuery.createWeight(MatchAllDocsQuery.java:153)
    at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676)
    at org.apache.lucene.search.QueryWrapperFilter.getDocIdSet(QueryWrapperFilter.java:55)
    at org.apache.lucene.index.BufferedDeletesStream.applyQueryDeletes(BufferedDeletesStream.java:441)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:281)
    at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2836)
    at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2827)
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2803)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2772)
    at org.apache.lu
[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052215#comment-13052215 ]

Uwe Schindler commented on LUCENE-2548:

Hi Mike,

patch looks great, thanks for doing this hard work :-)

PreFlexCodec looks fine, I see no problems there.

Lucene code iterating TermsEnums was successfully cleaned up (the lovely MTQs): T.createTerm was removed and equals was added at some places. I cannot check whether there are comparisons missing. I wonder why PMD/Findbugs does not find all occurrences; maybe some SuppressWarnings are also hiding them? Can you explain shortly what "Unable to render embedded object: File" has to do with interning?

Solr code is fine, I expected more to change. Some places in Solr still seem to use some "placeholder" terms (called idTerm and other names). We should maybe check if they are only field names in reality?

GREAT WORK! I AM SO HAPPY, dumdidumm...!

> Remove all interning of field names from flex API
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Uwe Schindler
> Assignee: Michael McCandless
> Fix For: 4.0
> Attachments: LUCENE-2548.patch, LUCENE-2548.patch
>
> In previous versions of Lucene, interning of fields was important to minimize string comparison cost when iterating TermEnums, to detect changes in field name. As we separated field names from terms in flex, no query compares field names anymore, so the whole performance-problematic interning can be removed.
> I will start with doing this, but we need to carefully review some places, e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) Robert?
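For readers following the == versus equals discussion in this issue, here is a small, self-contained illustration of why identity comparisons on field names have to become equals() once the names are no longer interned. `FieldNameCompare` and its methods are hypothetical helpers written for this note, not Lucene code.

```java
// Hypothetical helper, not Lucene code: contrasts identity and value comparison
// of field names.
class FieldNameCompare {
    /** Identity comparison: only reliable when both strings were interned. */
    static boolean sameByIdentity(String a, String b) {
        return a == b;
    }

    /** Value comparison: correct whether or not the strings were interned. */
    static boolean sameByValue(String a, String b) {
        return a.equals(b);
    }

    /** Builds a field name at runtime, so the result is a distinct, non-interned object. */
    static String runtimeName(String prefix, String suffix) {
        return new StringBuilder(prefix).append(suffix).toString();
    }
}
```

Two names built with `runtimeName("ti", "tle")` are equal by value but are different objects, so `sameByIdentity` reports a mismatch unless both are passed through `String.intern()` first. That is the kind of leftover `==` or `!=` comparison the review above is worried about missing.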
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052194#comment-13052194 ]

Michael McCandless commented on LUCENE-2454:

bq. Would modules/grouping meanwhile be a better place for this than lucene/contrib/queries?

I think modules/join is the right place? When we factor out Solr's generic join impl it can go there too...

I have some concerns about the current approach here (this is why I opened LUCENE-3171):

* prevSetBit is called for each child doc, which is an O(N^2) cost (N = number of child docs for one parent) I think? Admittedly, "typically" N is probably small...
* It uses 2 passes if you also want to collect child docs per parent
* PerParentLimitedQuery is also O(N^2) cost, both on insert of a new child and on popping the child docs per group: I think it should use a PQ to find the lowest child to evict per parent doc?
* I think "typically" an app will want to collect the top N groups (parent docs and their children), so it's more efficient to gather those top N and only in the end sort each set of children per parent? (This is similar to how the 2nd pass grouping collector works.)
* PerParentLimitedQuery only supports relevance sort w/in each parent.
* You don't get the parent/child structure back from PerParentLimitedQuery (but now we have TopGroups, which is a great match for representing each parent and its children).

If you always only use PerParentLimitedQuery on the top parents from the first pass, e.g. you AND/filter it against those parent docs, then the O(N^2) cost is less severe since it'll have a small constant in front, but since it's a Query I imagine users will use it w/o that filter, which is bad... I think using a TopN Collector is a better match here.

> Nested Document query support
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 3.0.2
> Reporter: Mark Harwood
> Assignee: Mark Harwood
> Priority: Minor
> Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip
>
> A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
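The "use a PQ to find the lowest child to evict" idea from the comment above can be sketched like this. It is an assumption-laden illustration, not code from the patch: it uses `java.util.PriorityQueue` rather than Lucene's own PriorityQueue, ranks by score only, and `TopChildrenPerParent` and `ChildHit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keeps the best maxChildren child hits for one parent doc.
class TopChildrenPerParent {
    /** One matching child document: its doc id and its score. */
    static final class ChildHit {
        final int doc;
        final float score;

        ChildHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int maxChildren;
    // Min-heap on score: the head is always the weakest child kept so far,
    // so finding the child to evict is O(1) and replacing it is O(log N).
    private final PriorityQueue<ChildHit> queue =
        new PriorityQueue<>(Comparator.comparingDouble((ChildHit h) -> h.score));

    TopChildrenPerParent(int maxChildren) {
        if (maxChildren < 1) {
            throw new IllegalArgumentException("maxChildren must be >= 1");
        }
        this.maxChildren = maxChildren;
    }

    /** Offers one child hit; it is kept only if it beats the current weakest. */
    void collect(int doc, float score) {
        if (queue.size() < maxChildren) {
            queue.add(new ChildHit(doc, score));
        } else if (score > queue.peek().score) {
            queue.poll();
            queue.add(new ChildHit(doc, score));
        }
    }

    /** Returns the kept children, best score first, and empties the queue for the next parent. */
    List<ChildHit> popSortedAndReset() {
        List<ChildHit> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(queue.poll()); // ascending by score
        }
        Collections.reverse(result);
        return result;
    }
}
```

The children are put in final order only when a parent's group is popped, in the spirit of the "only in the end sort each set of children" point. This sketch allocates a new `ChildHit` per kept child; the object-reuse concern raised elsewhere in the thread is not addressed here.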
[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052184#comment-13052184 ] Uwe Schindler commented on LUCENE-2548: --- Yupee Juhee. I was on business trip whole day. Insane! Will review soon! > Remove all interning of field names from flex API > - > > Key: LUCENE-2548 > URL: https://issues.apache.org/jira/browse/LUCENE-2548 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-2548.patch, LUCENE-2548.patch > > > In previous versions of Lucene, interning of fields was important to minimize > string comparison cost when iterating TermEnums, to detect changes in field > name. As we separated field names from terms in flex, no query compares field > names anymore, so the whole performance problematic interning can be removed. > I will start with doing this, but we need to carefully review some places > e.g. in preflex codec. > Maybe before this issue we should remove the Term class completely. :-) > Robert? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2548: --- Attachment: LUCENE-2548.patch I agree -- I removed createTerm! And fixed the nocommits Beast chewed on this for a while and didn't hit any failures except various Solr tests that still intermittently fail... I think it's ready! > Remove all interning of field names from flex API > - > > Key: LUCENE-2548 > URL: https://issues.apache.org/jira/browse/LUCENE-2548 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-2548.patch, LUCENE-2548.patch > > > In previous versions of Lucene, interning of fields was important to minimize > string comparison cost when iterating TermEnums, to detect changes in field > name. As we separated field names from terms in flex, no query compares field > names anymore, so the whole performance problematic interning can be removed. > I will start with doing this, but we need to carefully review some places > e.g. in preflex codec. > Maybe before this issue we should remove the Term class completely. :-) > Robert? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
REMINDER: Participation Requested: Survey about Open-Source Software Development
Hi, Apologies for any inconvenience and thank you to those who have already completed the survey. We will keep the survey open for another couple of weeks. But, we do hope you will consider responding to the email request below (sent 2 weeks ago). Thanks, Dr. Jeffrey Carver Assistant Professor University of Alabama (v) 205-348-9829 (f) 205-348-0219 http://www.cs.ua.edu/~carver -Original Message- From: Jeffrey Carver [mailto:opensourcesur...@cs.ua.edu] Sent: Monday, June 13, 2011 11:27 AM To: 'dev@lucene.apache.org' Subject: Participation Requested: Survey about Open-Source Software Development Hi, Drs. Jeffrey Carver, Rosanna Guadagno, Debra McCallum, and Mr. Amiangshu Bosu, University of Alabama, and Dr. Lorin Hochstein, University of Southern California, are conducting a survey of open-source software developers. This survey seeks to understand how developers on distributed, virtual teams, like open-source projects, interact with each other to accomplish their tasks. You must be at least 19 years of age to complete the survey. The survey should take approximately 15 minutes to complete. If you are actively participating as a developer, please consider completing our survey. Here is the link to the survey: http://goo.gl/HQnux We apologize for inconvenience and if you receive multiple copies of this email. This survey has been approved by The University of Alabama IRB board. Thanks, Dr. Jeffrey Carver Assistant Professor University of Alabama (v) 205-348-9829 (f) 205-348-0219 http://www.cs.ua.edu/~carver - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3222) Buffered deletes under count RAM
Buffered deletes under count RAM Key: LUCENE-3222 URL: https://issues.apache.org/jira/browse/LUCENE-3222 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3, 4.0 I found this while working on LUCENE-2548: when we freeze the deletes (create FrozenBufferedDeletes), when we set the bytesUsed we are failing to account for RAM required for the term bytes (and now term field). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
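To make the undercount concrete, here is a minimal sketch of the two ways of charging RAM for buffered delete terms. It is not the FrozenBufferedDeletes code: DeleteTerm, DeleteRamEstimator and both constants are hypothetical names and values chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical stand-in for a buffered delete term; not Lucene's Term class. */
final class DeleteTerm {
  final String field;
  final byte[] bytes;

  DeleteTerm(String field, String text) {
    this.field = field;
    this.bytes = text.getBytes(StandardCharsets.UTF_8);
  }
}

final class DeleteRamEstimator {
  // Assumed fixed cost per entry (headers, references); illustrative value only.
  static final int BYTES_PER_ENTRY = 32;
  // A Java char occupies two bytes.
  static final int BYTES_PER_CHAR = 2;

  /** Undercounts: charges only the fixed per-entry overhead. */
  static long fixedOnly(List<DeleteTerm> terms) {
    return (long) terms.size() * BYTES_PER_ENTRY;
  }

  /** Also charges the variable-length term bytes and the field name. */
  static long withTermData(List<DeleteTerm> terms) {
    long total = 0;
    for (DeleteTerm t : terms) {
      total += BYTES_PER_ENTRY + t.bytes.length + (long) t.field.length() * BYTES_PER_CHAR;
    }
    return total;
  }
}
```

The gap between the two estimates grows with term length, which is why long delete terms would be the ones most affected by an undercount of this kind.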
[jira] [Commented] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052159#comment-13052159 ] Robert Muir commented on LUCENE-3201: - I didn't commit because I didn't measure any performance improvements from the patch (this frustrated me). Also, I didn't address Uwe's last comment... In general, I was thinking that this would be a good performance win, but it isn't. So we should consider it from a refactoring perspective only. > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3201.patch, LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. 
> So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
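The 'slice' idea from the description can be sketched with a plain java.nio.ByteBuffer. This is only an illustration of why a sliced buffer needs no offset arithmetic and gets bounds checks for free; SliceInput is a hypothetical class, not MMapDirectory's IndexInput, and a heap buffer stands in for the mapped CFS file.

```java
import java.nio.ByteBuffer;

/** Hypothetical read-only view over one sub-file inside a larger buffer. */
final class SliceInput {
  private final ByteBuffer slice;

  SliceInput(ByteBuffer whole, int offset, int length) {
    ByteBuffer dup = whole.duplicate(); // shares content, has its own position/limit
    dup.clear();                        // position = 0, limit = capacity
    dup.position(offset);
    dup.limit(offset + length);
    this.slice = dup.slice();           // position 0 of the slice is 'offset' in the parent
  }

  /** Length of the sub-file, not of the whole compound file. */
  long length() {
    return slice.limit();
  }

  /** No compound-file offset is added; positions are relative to the slice. */
  void seek(int pos) {
    slice.position(pos); // throws IllegalArgumentException if pos is beyond the slice
  }

  /** Reading past the end of the slice throws BufferUnderflowException. */
  byte readByte() {
    return slice.get();
  }
}
```

Because the slice has its own limit, a reader cannot run into the next file in the compound container, which is the failure mode the description worries about when merely adding an offset to seek().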
[jira] [Commented] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052156#comment-13052156 ] Simon Willnauer commented on LUCENE-3201: - This seems ready to commit... I think we should get that in so I can take it further on LUCENE-3218. Robert, is it OK with you if I commit this, or are you going to do it? simon > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3201.patch, LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. 
> So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218.patch first sketch still some nocommits - this patch includes the latest patch from LUCENE-3201 which made the CFS part of directory. This patch adds write support to the CompoundFileDirectory. The CFWriter tries to write files directly to the CFS if possible like when no other file is currently open for writing it opens a stream directly on the CFS. Yet, this change also adds a new file to the CFS (.cfe) which only holds the entry table which makes all seeks unneeded (plays better with AppendingCodec). I currently don't use it during indexing since we decided after flush if we use CFS or not. Yet this might change with this optimization but I will leave this to another issue. > Make CFS appendable > - > > Key: LUCENE-3218 > URL: https://issues.apache.org/jira/browse/LUCENE-3218 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: 4.0 > > Attachments: LUCENE-3218.patch > > > Currently CFS is created once all files are written during a flush / merge. > Once on disk the files are copied into the CFS format which is basically a > unnecessary for some of the files. We can at any time write at least one file > directly into the CFS which can save a reasonable amount of IO. For instance > stored fields could be written directly during indexing and during a Codec > Flush one of the written files can be appended directly. This optimization is > a nice sideeffect for lucene indexing itself but more important for DocValues > and LUCENE-3216 we could transparently pack per field files into a single > file only for docvalues without changing any code once LUCENE-3216 is > resolved. -- This message is automatically generated by JIRA. 
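As a rough illustration of the append-only layout described in the comment above (data written sequentially, entry table kept separately), consider the sketch below. It is not the patch's CFWriter: AppendOnlyCompoundWriter is a hypothetical class, the file names are examples, and in-memory buffers stand in for the data file and the separate .cfe entry file.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical append-only compound writer; data and entry table are kept apart. */
final class AppendOnlyCompoundWriter {
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  // name -> {offset, length}; this map plays the role of the separate entry table.
  private final Map<String, long[]> entries = new LinkedHashMap<>();

  /** Appends a file's bytes at the current end; never seeks back to patch a header. */
  void addFile(String name, byte[] content) {
    long offset = data.size();
    data.write(content, 0, content.length);
    entries.put(name, new long[] {offset, content.length});
  }

  byte[] data() {
    return data.toByteArray();
  }

  Map<String, long[]> entryTable() {
    return entries;
  }
}
```

Since offsets are only recorded after each append and stored outside the data stream, the writer never has to go back and fill in a header, which is the property that matters for append-only storage.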
[jira] [Commented] (SOLR-2609) Allow arbitrary bbox lat-lon, not limited to circle
[ https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052151#comment-13052151 ] Zac Smith commented on SOLR-2609: - Thanks David, I have updated this to be a feature request. > Allow arbitrary bbox lat-lon, not limited to circle > --- > > Key: SOLR-2609 > URL: https://issues.apache.org/jira/browse/SOLR-2609 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 3.1 >Reporter: Zac Smith > Labels: spatialsearch > > The Spatial Search documentation states that you can create your own bounding > box using a range query: > "Since the LatLonType field also supports field queries and range queries, > one can manually create their own bounding box rather than using bbox: > ...&q=*:*&fq=store:[45,-94 TO 46,-93]" > This works unless your range covers an area where longitude goes from 180 to > -180. For instance I want all items in the longitude range of > 178 to -177 which of course gives no results (it is not a valid numeric > range). It's not really surprising that this doesn't work as it is just a > standard range query with no spatial filters being applied. > UPDATE > Updated issue to be an enhancement, title changed. > Desired functionality is for bbox to accept coordinate parameters for an > arbitrary size bounding box. The bbox should take into account the prime > meridians, in particular the 180th meridian. > Documentation also needs to be updated to remove incorrect query example. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2609) Allow arbitrary bbox lat-lon, not limited to circle
[ https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zac Smith updated SOLR-2609: Description: The Spatial Search documentation states that you can create your own bounding box using a range query: "Since the LatLonType field also supports field queries and range queries, one can manually create their own bounding box rather than using bbox: ...&q=*:*&fq=store:[45,-94 TO 46,-93]" This works unless your range covers an area where longitude goes from 180 to -180. For instance I want all items in the longitude range of 178 to -177 which of course gives no results (it is not a valid numeric range). It's not really surprising that this doesn't work as it is just a standard range query with no spatial filters being applied. UPDATE Updated issue to be an enhancement, title changed. Desired functionality is for bbox to accept coordinate parameters for an arbitrary size bounding box. The bbox should take into account the prime meridians, in particular the 180th meridian. Documentation also needs to be updated to remove incorrect query example. was: The Spatial Search documentation states that you can create your own bounding box using a range query: "Since the LatLonType field also supports field queries and range queries, one can manually create their own bounding box rather than using bbox: ...&q=*:*&fq=store:[45,-94 TO 46,-93]" This works unless your range covers an area where longitude goes from 180 to -180. For instance I want all items in the longitude range of 178 to -177 which of course gives no results (it is not a valid numeric range). It's not really surprising that this doesn't work as it is just a standard range query with no spatial filters being applied. I am wondering if this is just an issue with the documentation and there is another way that this should be done? Please advise if more details are needed. 
Issue Type: Improvement (was: Bug) Summary: Allow arbitrary bbox lat-lon, not limited to circle (was: Coordinate range queries do not work with Spatial Solr) > Allow arbitrary bbox lat-lon, not limited to circle > --- > > Key: SOLR-2609 > URL: https://issues.apache.org/jira/browse/SOLR-2609 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 3.1 >Reporter: Zac Smith > Labels: spatialsearch > > The Spatial Search documentation states that you can create your own bounding > box using a range query: > "Since the LatLonType field also supports field queries and range queries, > one can manually create their own bounding box rather than using bbox: > ...&q=*:*&fq=store:[45,-94 TO 46,-93]" > This works unless your range covers an area where longitude goes from 180 to > -180. For instance I want all items in the longitude range of > 178 to -177 which of course gives no results (it is not a valid numeric > range). It's not really surprising that this doesn't work as it is just a > standard range query with no spatial filters being applied. > UPDATE > Updated issue to be an enhancement, title changed. > Desired functionality is for bbox to accept coordinate parameters for an > arbitrary size bounding box. The bbox should take into account the prime > meridians, in particular the 180th meridian. > Documentation also needs to be updated to remove incorrect query example. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
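The failing case (longitude 178 to -177) comes down to a containment test that has to treat a box crossing the 180th meridian as two longitude intervals. A minimal sketch follows, assuming a hypothetical LatLonBox class that is not part of Solr; the argument order mirrors the lower-left / upper-right corners used in the range query above.

```java
/** Hypothetical lat/lon bounding box; not Solr's API. */
final class LatLonBox {
  final double minLat;
  final double westLon;
  final double maxLat;
  final double eastLon;

  /** Corners given as (lower-left lat, lon) and (upper-right lat, lon). */
  LatLonBox(double minLat, double westLon, double maxLat, double eastLon) {
    this.minLat = minLat;
    this.westLon = westLon;
    this.maxLat = maxLat;
    this.eastLon = eastLon;
  }

  boolean contains(double lat, double lon) {
    if (lat < minLat || lat > maxLat) {
      return false;
    }
    if (westLon <= eastLon) {
      // Ordinary box: a single numeric range works.
      return lon >= westLon && lon <= eastLon;
    }
    // Box crosses the 180th meridian: accept [westLon, 180] or [-180, eastLon].
    return lon >= westLon || lon <= eastLon;
  }
}
```

With new LatLonBox(45, 178, 46, -177), a point at longitude 179 or -178 is inside the box, whereas a plain numeric range from 178 to -177 matches nothing.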
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052137#comment-13052137 ] Simon Willnauer commented on LUCENE-3219: - chris, patch looks good... some minor comments: * I wonder if a parser could hold a Type so we could get rid of the if (parser instanceof FieldCache.$Parser) ? * in SearchWithSortTask I wonder if you could simply call Type.valueOf(typeString.toUpperCase()); - the less code the better :) overall looks good simon > Change SortField types to an Enum > - > > Key: LUCENE-3219 > URL: https://issues.apache.org/jira/browse/LUCENE-3219 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch > > > When updating my SOLR-2533 patch, one issue was that the int value I had > given my new type had been used by another change in the mean time. Since we > don't use these fields in a bitset kind of way, we can convert them to an > enum. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
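Simon's second suggestion can be sketched as follows. The Type constants listed here are an assumed subset chosen for illustration, and SortTypeParser is a hypothetical helper, not the SearchWithSortTask code.

```java
import java.util.Locale;

final class SortTypeParser {
  /** Hypothetical subset of sort types; the real enum's constants may differ. */
  enum Type { SCORE, DOC, STRING, INT, FLOAT, LONG, DOUBLE }

  /** Maps a case-insensitive name such as "int" to its enum constant. */
  static Type parse(String typeString) {
    // Locale.ROOT avoids locale-specific case mapping (e.g. the Turkish dotless i).
    return Type.valueOf(typeString.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(parse("int"));
    System.out.println(parse("String"));
  }
}
```

An unknown name makes Enum.valueOf throw IllegalArgumentException, so a caller that wants a friendlier message would need to catch it. Unlike hand-assigned int constants, two patches adding enum constants cannot silently claim the same value, which is the collision the issue description mentions.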
[jira] [Commented] (SOLR-2609) Coordinate range queries do not work with Spatial Solr
[ https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052136#comment-13052136 ] David Smiley commented on SOLR-2609: Yes, this should be a feature request for "Allow arbitrary bbox lat-lon, not limited to circle". Under the hood, I recall the first order of business is resolving the point-radius to a bounding box. At that point the special prime-meridian logic is handled. It seems it would not be hard to make a patch that adds new parameters for explicit lat-lon bbox params. > Coordinate range queries do not work with Spatial Solr > -- > > Key: SOLR-2609 > URL: https://issues.apache.org/jira/browse/SOLR-2609 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1 >Reporter: Zac Smith > Labels: spatialsearch > > The Spatial Search documentation states that you can create your own bounding > box using a range query: > "Since the LatLonType field also supports field queries and range queries, > one can manually create their own bounding box rather than using bbox: > ...&q=*:*&fq=store:[45,-94 TO 46,-93]" > This works unless your range covers an area where longitude goes from 180 to > -180. For instance I want all items in the longitude range of > 178 to -177 which of course gives no results (it is not a valid numeric > range). It's not really surprising that this doesn't work as it is just a > standard range query with no spatial filters being applied. > I am wondering if this is just an issue with the documentation and there is > another way that this should be done? Please advise if more details are > needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2524) Adding grouping to Solr 3x
[ https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052114#comment-13052114 ] Michael McCandless commented on SOLR-2524: -- bq. Question: Does this support the option of getting facet counts after grouping? I am getting lost in all the issues I don't think it does. For that we need LUCENE-3097, which I think (?) is close. > Adding grouping to Solr 3x > -- > > Key: SOLR-2524 > URL: https://issues.apache.org/jira/browse/SOLR-2524 > Project: Solr > Issue Type: New Feature >Reporter: Martijn van Groningen >Assignee: Martijn van Groningen > Fix For: 3.3 > > Attachments: SOLR-2524.patch, SOLR-2524.patch, SOLR-2524.patch, > SOLR-2524.patch, SOLR-2524.patch, SOLR-2524.patch > > > Grouping was recently added to Lucene 3x. See LUCENE-1421 for more > information. > I think it would be nice if we expose this functionality also to the Solr > users that are bound to a 3.x version. > The grouping feature added to Lucene is currently a subset of the > functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping > by function / query. > The work involved getting the grouping contrib to work on Solr 3x is > acceptable. I have it more or less running here. It supports the response > format and request parameters (expect: group.query and group.func) described > in the FieldCollapse page on the Solr wiki. > I think it would be great if this is included in the Solr 3.2 release. Many > people are using grouping as patch now and this would help them a lot. Any > thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved SOLR-236. - Resolution: Duplicate Resolving this long issue as a duplicate of SOLR-2524, which brings grouping (finally!) to Solr 3.x via the new (factored out from Solr's trunk grouping impl then backported to 3.x) grouping module. > Field collapsing > > > Key: SOLR-236 > URL: https://issues.apache.org/jira/browse/SOLR-236 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.3 >Reporter: Emmanuel Keller >Assignee: Shalin Shekhar Mangar > Fix For: 3.3 > > Attachments: DocSetScoreCollector.java, > NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, > SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, > SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, > SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, > SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, > SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, > SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, > collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, > collapsing-patch-to-1.3.0-ivan_2.patch, > collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, > field-collapse-4-with-solrj.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, > field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, > 
field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, > field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, > field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, > quasidistributed.additional.patch, solr-236.patch > > > This patch include a new feature called "Field collapsing". > "Used in order to collapse a group of results with similar value for a given > field to a single entry in the result set. Site collapsing is a special case > of this, where all results for a given web site is collapsed into one or two > entries in the result set, typically with an associated "more documents from > this site" link. See also Duplicate detection." > http://www.fastsearch.com/glossary.aspx?m=48&amid=299 > The implementation add 3 new query parameters (SolrParams): > "collapse.field" to choose the field used to group results > "collapse.type" normal (default value) or adjacent > "collapse.max" to select how many continuous results are allowed before > collapsing > TODO (in progress): > - More documentation (on source code) > - Test cases > Two patches: > - "field_collapsing.patch" for current development version > - "field_collapsing_1.1.0.patch" for Solr-1.1.0 > P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene 3.3 release soon?
+1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir wrote: > i was planning on doing an RC in a few weeks actually. > > we have a lot of good stuff in there today already, however i wanted > to give a few weeks for the grouping stuff to run on hudson. > > On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer > wrote: >> I would say within the next 3 month. >> >> Thoughts? >> >> On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček wrote: >>> Hi, >>> How soon can we expect official Lucene 3.3 release? >>> Best regards, >>> Lukas >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2609) Coordinate range queries do not work with Spatial Solr
[ https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052093#comment-13052093 ] Zac Smith commented on SOLR-2609: - It would be really great if there was support for creating arbitrary bounding boxes that do work over the 180th meridian. Should this be changed from a bug to a feature request to that end? > Coordinate range queries do not work with Spatial Solr > -- > > Key: SOLR-2609 > URL: https://issues.apache.org/jira/browse/SOLR-2609 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1 >Reporter: Zac Smith > Labels: spatialsearch > > The Spatial Search documentation states that you can create your own bounding > box using a range query: > "Since the LatLonType field also supports field queries and range queries, > one can manually create their own bounding box rather than using bbox: > ...&q=*:*&fq=store:[45,-94 TO 46,-93]" > This works unless your range covers an area where longitude goes from 180 to > -180. For instance I want all items in the longitude range of > 178 to -177 which of course gives no results (it is not a valid numeric > range). It's not really surprising that this doesn't work as it is just a > standard range query with no spatial filters being applied. > I am wondering if this is just an issue with the documentation and there is > another way that this should be done? Please advise if more details are > needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3219: --- Attachment: LUCENE-3219.patch Even better patch, with the CHANGES entry corrected. > Change SortField types to an Enum > - > > Key: LUCENE-3219 > URL: https://issues.apache.org/jira/browse/LUCENE-3219 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch > > > When updating my SOLR-2533 patch, one issue was that the int value I had > given my new type had been used by another change in the meantime. Since we > don't use these fields in a bitset kind of way, we can convert them to an > enum. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3219: --- Attachment: LUCENE-3219.patch Patch updated to trunk. Compiles and tests pass. I intend to commit in the next day or so. > Change SortField types to an Enum > - > > Key: LUCENE-3219 > URL: https://issues.apache.org/jira/browse/LUCENE-3219 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Chris Male >Priority: Minor > Attachments: LUCENE-3219.patch, LUCENE-3219.patch > > > When updating my SOLR-2533 patch, one issue was that the int value I had > given my new type had been used by another change in the mean time. Since we > don't use these fields in a bitset kind of way, we can convert them to an > enum. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052052#comment-13052052 ] Robert Muir commented on LUCENE-3220: - one last thing, can we do 'numberOfFieldTokens' instead of noFieldTokens? then I think we can commit this as a step, should make things a lot easier for experimentation, if you are new to lucene it will make life much easier. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Oh, sorry, how lame of me :( Actually I am working now on a different machine than the one I usually do, so that's why I made those mistakes. Anyhow, I have fixed them. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052032#comment-13052032 ] Robert Muir commented on LUCENE-3220: - oh two more nitpicky comments: * can you update the patch to use two-spaces instead of tabs? if you use eclipse, you can download this and configure this as your default codestyle: http://people.apache.org/~rmuir/Eclipse-Lucene-Codestyle.xml * can you also remove the @author? For legal reasons (i think actually for your protection!) we omit these from new files. * it might be a good idea to use the tag @lucene.experimental also for new classes: this is a template that 'ant-javadocs' replaces with "WARNING: This API is experimental and might change in incompatible ways in the next release." to tell users that its very new and not to expect precise backwards compatibility. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052029#comment-13052029 ] Robert Muir commented on LUCENE-3220: - bq. I'll put a nocommit there for the time being, and if no sims use it, I'll just remove it from the Stats. Terrier has it, though, so I guess there should be at least one method that depends on it. I've never seen one that did... I don't imagine us ever implementing this efficiently given that we support incremental indexing. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch, LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052025#comment-13052025 ] David Mark Nemeskey commented on LUCENE-3220: - * I was wondering about that too -- actually docNo is a mistake, it should have been noDocs or noOfDocs anyway, but I guess I'll just go with numberOfDocuments. * I'll put a nocommit there for the time being, and if no sims use it, I'll just remove it from the Stats. Terrier has it, though, so I guess there should be at least one method that depends on it. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052024#comment-13052024 ] Robert Muir commented on LUCENE-2548: - is there any reason to keep Term.createTerm() after we do this? seems useless after interning is removed. > Remove all interning of field names from flex API > - > > Key: LUCENE-2548 > URL: https://issues.apache.org/jira/browse/LUCENE-2548 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-2548.patch > > > In previous versions of Lucene, interning of fields was important to minimize > string comparison cost when iterating TermEnums, to detect changes in field > name. As we separated field names from terms in flex, no query compares field > names anymore, so the whole performance problematic interning can be removed. > I will start with doing this, but we need to carefully review some places > e.g. in preflex codec. > Maybe before this issue we should remove the Term class completely. :-) > Robert? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052019#comment-13052019 ] Robert Muir commented on LUCENE-3220: - a few comments (it generally looks close to me): * maybe we should use 'numberOfDocuments' instead of 'docNo' and same with 'numberOfFieldTokens'? this might make the naming more clear * i'm worried about 'uniqueTermCount', do you know of which implementations require this? this number is not accurate if the index has more than one segment. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2611) Typos in /example solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2611. --- Resolution: Fixed Fix Version/s: 3.3 Thank you Eric! > Typos in /example solrconfig.xml > > > Key: SOLR-2611 > URL: https://issues.apache.org/jira/browse/SOLR-2611 > Project: Solr > Issue Type: Improvement > Components: documentation >Affects Versions: 3.2 >Reporter: Eric Pugh >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: typos.patch > > > I noticed many typos have crept into the example app's Solrconfig.xml. I > will attach a patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2611) Typos in /example solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-2611: Attachment: typos.patch > Typos in /example solrconfig.xml > > > Key: SOLR-2611 > URL: https://issues.apache.org/jira/browse/SOLR-2611 > Project: Solr > Issue Type: Improvement > Components: documentation >Affects Versions: 3.2 >Reporter: Eric Pugh >Priority: Minor > Fix For: 4.0 > > Attachments: typos.patch > > > I noticed many typos have crept into the example app's Solrconfig.xml. I > will attach a patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2611) Typos in /example solrconfig.xml
Typos in /example solrconfig.xml Key: SOLR-2611 URL: https://issues.apache.org/jira/browse/SOLR-2611 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 3.2 Reporter: Eric Pugh Priority: Minor Fix For: 4.0 I noticed many typos have crept into the example app's Solrconfig.xml. I will attach a patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052004#comment-13052004 ] James Dyer commented on SOLR-2382: -- Noble, I appreciate your interest in this issue! I could easily move BerkleyBackedCache to its own issue. This would remove any difficulty in dealing with the Sleepycat License. We would still want to maintain the SortedMapBackedCache, however. Otherwise we would lose all caching ability (it would break CachedSqlEntityProcessor). In any case, if your goal is to break this issue into more manageable chunks, just offloading BerkleyBackedCache might not be enough. I had considered breaking this up into possibly 3 parts because I realize this is a huge patch. But the functionality is all designed to work together and it would have been more work for me, etc. Let me know what you want me to do. I would love to see this integrated with a GA release someday. I think this would have broad application and a lot of real-world use cases. (& we depend on it here...) > DIH Cache Improvements > -- > > Key: SOLR-2382 > URL: https://issues.apache.org/jira/browse/SOLR-2382 > Project: Solr > Issue Type: New Feature > Components: contrib - DataImportHandler >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, > SOLR-2382.patch, SOLR-2382.patch > > > Functionality: > 1. Provide a pluggable caching framework for DIH so that users can choose a > cache implementation that best suits their data and application. > > 2. Provide a means to temporarily cache a child Entity's data without > needing to create a special cached implementation of the Entity Processor > (such as CachedSqlEntityProcessor). > > 3. Provide a means to write the final (root entity) DIH output to a cache > rather than to Solr. Then provide a way for a subsequent DIH call to use the > cache as an Entity input. 
Also provide the ability to do delta updates on > such persistent caches. > > 4. Provide the ability to partition data across multiple caches that can > then be fed back into DIH and indexed either to varying Solr Shards, or to > the same Core in parallel. > Use Cases: > 1. We needed a flexible & scalable way to temporarily cache child-entity > data prior to joining to parent entities. > - Using SqlEntityProcessor with Child Entities can cause an "n+1 select" > problem. > - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching > mechanism and does not scale. > - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). > > 2. We needed the ability to gather data from long-running entities by a > process that runs separate from our main indexing process. > > 3. We wanted the ability to do a delta import of only the entities that > changed. > - Lucene/Solr requires entire documents to be re-indexed, even if only a > few fields changed. > - Our data comes from 50+ complex sql queries and/or flat files. > - We do not want to incur overhead re-gathering all of this data if only 1 > entity's data changed. > - Persistent DIH caches solve this problem. > > 4. We want the ability to index several documents in parallel (using 1.4.1, > which did not have the "threads" parameter). > > 5. In the future, we may need to use Shards, creating a need to easily > partition our source data into Shards. > Implementation Details: > 1. De-couple EntityProcessorBase from caching. > - Created a new interface, DIHCache & two implementations: > - SortedMapBackedCache - An in-memory cache, used as default with > CachedSqlEntityProcessor (now deprecated). > - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested > with je-4.1.6.jar >- NOTE: the existing Lucene Contrib "db" project uses je-3.3.93.jar. > I believe this may be incompatible due to Generic Usage. 
>- NOTE: I did not modify the ant script to automatically get this jar, > so to use or evaluate this patch, download bdb-je from > http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html > > 2. Allow Entity Processors to take a "cacheImpl" parameter to cause the > entity data to be cached (see EntityProcessorBase & DIHCacheProperties). > > 3. Partially De-couple SolrWriter from DocBuilder > - Created a new interface DIHWriter, & two implementations: >- SolrWriter (refactored) >- DIHCacheWriter (allows DIH to write ultimately to a Cache). > > 4. Create a new Entity Processor, DIHCacheProcessor, which reads a > persistent Cache as DIH Entity Input. > > 5. Support a "partition" parameter with both DIHCacheWriter and > DIHCacheProcessor to allow for easy partitioning of source entity data. > > 6. Change the semantics of enti
[jira] [Updated] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2548: --- Attachment: LUCENE-2548.patch Initial patch. Tests are passing, at least a few iterations (I'll beast it). There are still a few nocommits... I used PMD and findbugs to find == and != on strings, but surprisingly there are cases that these tools seem to miss. I also did various greps to try to find cases... but I'm sure I've missed some! > Remove all interning of field names from flex API > - > > Key: LUCENE-2548 > URL: https://issues.apache.org/jira/browse/LUCENE-2548 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-2548.patch > > > In previous versions of Lucene, interning of fields was important to minimize > string comparison cost when iterating TermEnums, to detect changes in field > name. As we separated field names from terms in flex, no query compares field > names anymore, so the whole performance problematic interning can be removed. > I will start with doing this, but we need to carefully review some places > e.g. in preflex codec. > Maybe before this issue we should remove the Term class completely. :-) > Robert? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
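For readers following the thread, a minimal sketch of why the hunt for == and != on strings matters once interning goes away. The class and method names are made up for illustration and are not part of the patch; the point is only that identity comparison holds between two equal field names when both happen to be interned, while equals() always holds.

```java
// Illustration only: why `==` on field names is safe only while both sides are interned.
// FieldNameComparison is a hypothetical name, not a class from Lucene or from the patch.
class FieldNameComparison {

  // Identity comparison: true only if both references point to the same String object.
  static boolean sameByIdentity(String a, String b) {
    return a == b;
  }

  // Value comparison: true whenever the character sequences match.
  static boolean sameByValue(String a, String b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    String literal = "body";            // string literals are interned by the JVM
    String built = new String("body");  // a distinct object holding the same characters

    System.out.println(sameByIdentity(literal, built));          // false
    System.out.println(sameByIdentity(literal, built.intern())); // true
    System.out.println(sameByValue(literal, built));             // true
  }
}
```

Any comparison of the first kind that is left behind after interning is removed would silently start returning false for equal field names, which is presumably why the remaining cases are worth grepping for.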
[jira] [Assigned] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2548: -- Assignee: Michael McCandless > Remove all interning of field names from flex API > - > > Key: LUCENE-2548 > URL: https://issues.apache.org/jira/browse/LUCENE-2548 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 4.0 > > > In previous versions of Lucene, interning of fields was important to minimize > string comparison cost when iterating TermEnums, to detect changes in field > name. As we separated field names from terms in flex, no query compares field > names anymore, so the whole performance problematic interning can be removed. > I will start with doing this, but we need to carefully review some places > e.g. in preflex codec. > Maybe before this issue we should remove the Term class completely. :-) > Robert? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051995#comment-13051995 ] Jan Høydahl commented on SOLR-236: -- I think you should consider the group by now included in 3_x branch (SOLR-2524 was recently committed) > Field collapsing > > > Key: SOLR-236 > URL: https://issues.apache.org/jira/browse/SOLR-236 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.3 >Reporter: Emmanuel Keller >Assignee: Shalin Shekhar Mangar > Fix For: 3.3 > > Attachments: DocSetScoreCollector.java, > NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, > SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, > SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, > SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, > SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, > SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, > SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, > SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, > collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, > collapsing-patch-to-1.3.0-ivan_2.patch, > collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, > field-collapse-4-with-solrj.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, > field-collapse-5.patch, field-collapse-5.patch, > field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, > field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, > field_collapsing_1.3.patch, 
field_collapsing_dsteigerwald.diff, > field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, > quasidistributed.additional.patch, solr-236.patch > > > This patch includes a new feature called "Field collapsing". > "Used in order to collapse a group of results with similar value for a given > field to a single entry in the result set. Site collapsing is a special case > of this, where all results for a given web site is collapsed into one or two > entries in the result set, typically with an associated "more documents from > this site" link. See also Duplicate detection." > http://www.fastsearch.com/glossary.aspx?m=48&amid=299 > The implementation adds 3 new query parameters (SolrParams): > "collapse.field" to choose the field used to group results > "collapse.type" normal (default value) or adjacent > "collapse.max" to select how many continuous results are allowed before > collapsing > TODO (in progress): > - More documentation (on source code) > - Test cases > Two patches: > - "field_collapsing.patch" for current development version > - "field_collapsing_1.1.0.patch" for Solr-1.1.0 > P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2609) Coordinate range queries do not work with Spatial Solr
[ https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051988#comment-13051988 ] David Smiley commented on SOLR-2609: I highly doubt this can be fixed, based on how it works. The documentation/wiki should be updated to note this problem. I recommend you use bbox: http://wiki.apache.org/solr/SpatialSearch#bbox_-_Bounding-box_filter Granted you cannot specify an arbitrary bounding box, only one based on a point-distance, but this may be good enough. > Coordinate range queries do not work with Spatial Solr > -- > > Key: SOLR-2609 > URL: https://issues.apache.org/jira/browse/SOLR-2609 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1 >Reporter: Zac Smith > Labels: spatialsearch > > The Spatial Search documentation states that you can create your own bounding > box using a range query: > "Since the LatLonType field also supports field queries and range queries, > one can manually create their own bounding box rather than using bbox: > ...&q=*:*&fq=store:[45,-94 TO 46,-93]" > This works unless your range covers an area where longitude goes from 180 to > -180. For instance I want all items in the longitude range of > 178 to -177 which of course gives no results (it is not a valid numeric > range). It's not really surprising that this doesn't work as it is just a > standard range query with no spatial filters being applied. > I am wondering if this is just an issue with the documentation and there is > another way that this should be done? Please advise if more details are > needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
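One possible workaround, which is not suggested in the thread and is untested against any Solr version, is to split a box that crosses the 180/-180 meridian into two ordinary ranges and OR them together. The sketch below only builds the filter strings; the class name, method names, and the assumption that Solr accepts an OR of two LatLonType range queries in one fq are all assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits a lat/lon box that crosses the dateline into two
// plain numeric ranges. It only formats strings and does not call any Solr API.
class DatelineRangeSplitter {

  // Returns one range when minLon <= maxLon, otherwise two ranges that meet at +/-180.
  static List<String> ranges(String field, double minLat, double minLon,
                             double maxLat, double maxLon) {
    List<String> out = new ArrayList<>();
    if (minLon <= maxLon) {
      out.add(range(field, minLat, minLon, maxLat, maxLon));
    } else {
      // The box wraps around: cover [minLon, 180] and [-180, maxLon] separately.
      out.add(range(field, minLat, minLon, maxLat, 180.0));
      out.add(range(field, minLat, -180.0, maxLat, maxLon));
    }
    return out;
  }

  // Formats a single range in the "field:[lat,lon TO lat,lon]" form quoted in the issue.
  static String range(String field, double minLat, double minLon,
                      double maxLat, double maxLon) {
    return String.format(Locale.ROOT, "%s:[%s,%s TO %s,%s]",
        field, minLat, minLon, maxLat, maxLon);
  }

  // Joins the ranges with OR so they can be sent as one filter query (assumed usage).
  static String toFilterQuery(String field, double minLat, double minLon,
                              double maxLat, double maxLon) {
    return String.join(" OR ", ranges(field, minLat, minLon, maxLat, maxLon));
  }

  public static void main(String[] args) {
    // Longitude 178 to -177 crosses the dateline, so two ranges are produced.
    System.out.println(toFilterQuery("store", 45, 178, 46, -177));
    // The example from the wiki does not cross it and stays a single range.
    System.out.println(toFilterQuery("store", 45, -94, 46, -93));
  }
}
```

This keeps the arbitrary-box flexibility that bbox lacks, at the cost of doing the wrap-around handling in the client.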
[jira] [Created] (LUCENE-3221) improve docvalues integration with scoring
improve docvalues integration with scoring -- Key: LUCENE-3221 URL: https://issues.apache.org/jira/browse/LUCENE-3221 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Robert Muir Fix For: flexscoring branch Currently, the flexscoring branch is limited by the fact that you can at most index one single byte per-document for scoring within Similarity. I added a simple test, showing how in your app itself you can index a per-document value (such as a boost) and then use it in scoring: http://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring/lucene/src/test/org/apache/lucene/search/TestDocValuesScoring.java However, I think we should generalize this mechanism (note, names of classes can be changed to whatever makes sense). In Similarity, instead of byte computeNorm(FieldInvertState), I think we should have void computeNorm(StatsWriter, FieldInvertState). Then a Similarity can ask the StatsWriter for instance(s), where an instance is something like a (name, type, aggregates) pair. Name would be a simple name like "boost" that the sim later uses to retrieve this docvalue. type would be something like int8/int32/varint/byte. aggregates could at first be a boolean or whatever, I think at first we should allow for the sum to be written (e.g. to provide sum and average). This would support aggregate statistics such as 'total number of tokens in index' and 'average length'. so an example of the new computeNorm or whatever we call it would be {noformat} void computeNorm(StatsWriter writer, FieldInvertState state) { writer.getReference("length", INT32, Aggregates.YES).write(state.numTokens); writer.getReference("boost", FLOAT32, Aggregates.NO).write(state.boost); ... } {noformat} So these docvalues field names that the Sim writes, I think the sim should be able to reference them with "relative" names like length and boost. Whatever we do behind the scenes is an implementation detail. 
Also for this to work, I think we need to add int8, int16, int32, ... types to docvalues, and maybe we should add hasArray()/getArray(). I think the existing compressed INTS should be kept, but maybe renamed to varint or something like that. This could still be useful, for example if someone wants to have "real document lengths" for bm25, but they don't really need a full 32-bit range, they can make the tradeoff to use packed integers and load less into ram... but that should be the sim's choice to make. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
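To make the proposed shape concrete, here is a self-contained toy version of the computeNorm idea. Nothing in it is Lucene API: StatsWriter, Reference, Type and FieldState are stand-ins invented for this sketch, the aggregates flag is a plain boolean as the description suggests it could be at first, and per-document storage is left out so that only the sum/average bookkeeping is shown.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposal. All names here are hypothetical stand-ins, not Lucene classes.
class StatsWriterSketch {

  enum Type { INT32, FLOAT32 }

  // One named per-document value; keeps a running sum only when aggregate is true.
  static final class Reference {
    final String name;
    final Type type;          // recorded but not enforced in this sketch
    final boolean aggregate;
    double sum;
    long count;

    Reference(String name, Type type, boolean aggregate) {
      this.name = name;
      this.type = type;
      this.aggregate = aggregate;
    }

    // A real implementation would also store the value for the current document.
    void write(double value) {
      count++;
      if (aggregate) {
        sum += value;
      }
    }

    // Average over all written values; 0.0 when nothing was written or aggregation is off.
    double average() {
      return count == 0 ? 0.0 : sum / count;
    }
  }

  // Hands out references by "relative" name, creating them on first use.
  static final class StatsWriter {
    private final Map<String, Reference> refs = new HashMap<>();

    Reference getReference(String name, Type type, boolean aggregate) {
      return refs.computeIfAbsent(name, n -> new Reference(n, type, aggregate));
    }
  }

  // Minimal stand-in for FieldInvertState with only the two fields used below.
  static final class FieldState {
    final int numTokens;
    final float boost;

    FieldState(int numTokens, float boost) {
      this.numTokens = numTokens;
      this.boost = boost;
    }
  }

  // Mirrors the example in the issue description.
  static void computeNorm(StatsWriter writer, FieldState state) {
    writer.getReference("length", Type.INT32, true).write(state.numTokens);
    writer.getReference("boost", Type.FLOAT32, false).write(state.boost);
  }

  public static void main(String[] args) {
    StatsWriter writer = new StatsWriter();
    computeNorm(writer, new FieldState(10, 1.0f));
    computeNorm(writer, new FieldState(20, 2.0f));
    computeNorm(writer, new FieldState(30, 1.0f));

    Reference length = writer.getReference("length", Type.INT32, true);
    System.out.println(length.sum);       // total number of tokens: 60.0
    System.out.println(length.average()); // average length: 20.0
  }
}
```

The aggregate flag is what would let a similarity such as BM25 read the average length back at search time without a separate pass over the index.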
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2793: Attachment: LUCENE-2793.patch For the record - I went through the latest patch and added some nocommits where needed. I will take this patch and commit it to the branch. We should now work on that branch to fix all the remaining issues. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. 
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasyStats object added. > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Attachments: LUCENE-3220.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased
[ https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051965#comment-13051965 ] Mike Sokolov commented on SOLR-219: --- Fair enough - And by the way +1 on all this - I hated having to hack QueryParser just to prevent stop words getting stripped from phrases. "The the" and "The who" were problematic :) > Determine if prefix, wildcard, fuzzy queries should be lowercased > - > > Key: SOLR-219 > URL: https://issues.apache.org/jira/browse/SOLR-219 > Project: Solr > Issue Type: Improvement >Reporter: Yonik Seeley >Priority: Minor > Fix For: 3.3 > > Attachments: lowercase_prefix.patch, wildcardlowercase.patch > > > Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy > queries on fields with respect to lowercasing or not. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2533) Improve API of ValueSource & FunctionQuery SortField weighting
[ https://issues.apache.org/jira/browse/SOLR-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male resolved SOLR-2533. -- Resolution: Fixed Committed revision 1137612 > Improve API of ValueSource & FunctionQuery SortField weighting > -- > > Key: SOLR-2533 > URL: https://issues.apache.org/jira/browse/SOLR-2533 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Chris Male >Assignee: Chris Male > Attachments: SOLR-2533.patch, SOLR-2533.patch, SOLR-2533.patch, > SOLR-2533.patch, SOLR-2533.patch > > > Started from LUCENE-2883: Support for sorting by ValueSource and > FunctionQueries is done through ValueSource#getSort and the > ValueSourceSortField. In order to support VSs containing other Queries, its > necessary to allow the Querys to be weighted by an IndexSearcher. Currently > this is handled by having ValueSourceSortField implement SolrSortField. In > Solr's SolrIndexSearcher, SortFields implementing SolrSortField are then > weighted before the Sort is used. > Sorting by FunctionQuery and ValueSource are invaluable and will become > available to all Lucene users in LUCENE-2883. But in order to do so, we need > to remove the coupling of this functionality to Solr, and make it more > standard. > Any and all thoughts about how to do this are appreciated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: KStem custom lexicons configuration possible?
Hi Robert, I think the difference between KStem and other stemmers (at least those that I am aware of, like snowball or porter) is that KStem is expected to produce real, valid words, and thus other filtering (for example synonym expansion) can be applied to the tokens after stemming more easily. I am not sure if this is the case with the other stemmers available in Lucene. Also, my impression from reading the original paper by Robert Krovetz was that the possibility to fine-tune lexicons is of practical value. That is why I was expecting the KStem API to support this as well. Well, maybe a combination of KStem with the Override filter (but applied AFTER stemming) would work too in this case :-) Regards, Lukas On Mon, Jun 20, 2011 at 2:32 PM, Robert Muir wrote: > On Mon, Jun 20, 2011 at 8:23 AM, Lukáš Vlček > wrote: > > Hi Robert, > > this sounds interesting I will look at it in more detail. > > However, I do not think this is really a general solution. If I > understand > > StemmerOverrideFilter correctly (from a quick glance) it rely on the fact > > that you *know* exact term (the key in the map) in advance. In other > words > > if I wanted to "fix" some term produced by Kstem filter I would have to > know > > what is the product of the stemming in advance. Now, this means that if I > > switch to snowball or porter or other stemmer instead of KStem or simply > > update something else in the filtering chain then I am in trouble. Also > if I > > understand correctly the original KStem implementation it can still get > > updates to lexicons which means that once these updates are ported to > Java > > implementation it can again result in problem with existing override > filter > > setup. > > More generally, is there any reason why lexicons are not configurable in > > Because we have StemmerOverrideFilter and KeywordMarkerFilter. 
> > look at the source code to Kstem: it uses maps and sets of exceptions, > this is what these filters provide in a general way > (StemmerOverrideFilter being the map, and KeywordMarkerFilter being > the set). > > we added these to work across the board with all lucene stemmers for > this reason. > > I don't understand your concerns at all to be honest, they make no > sense to me. If we "updated" kstem or any other algorithm: it would > break whatever you are doing either way. A hashmap is a hashmap. > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051957#comment-13051957 ] Robert Muir commented on SOLR-2452: --- by the way, obviously since you have been doing all the work here, i don't want you to read this as me questioning/objecting to the change, just trying to maybe help save you some sanity... if you don't mind dealing with the merging I would just say go for it. > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2202) Money FieldType
[ https://issues.apache.org/jira/browse/SOLR-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051955#comment-13051955 ] Jan Høydahl commented on SOLR-2202: --- Any interest in reviving this and working towards committing a first version? > Money FieldType > --- > > Key: SOLR-2202 > URL: https://issues.apache.org/jira/browse/SOLR-2202 > Project: Solr > Issue Type: New Feature > Components: Schema and Analysis >Affects Versions: 1.5 >Reporter: Greg Fodor > Attachments: SOLR-2022-solr-3.patch, SOLR-2202-lucene-1.patch, > SOLR-2202-solr-1.patch, SOLR-2202-solr-2.patch, SOLR-2202-solr-4.patch, > SOLR-2202-solr-5.patch, SOLR-2202-solr-6.patch, SOLR-2202-solr-7.patch, > SOLR-2202-solr-8.patch, SOLR-2202-solr-9.patch > > > Attached please find patches to add support for monetary values to > Solr/Lucene with query-time currency conversion. The following features are > supported: > - Point queries (ex: "price:4.00USD") > - Range queries (ex: "price:[$5.00 TO $10.00]") > - Sorting. > - Currency parsing by either currency code or symbol. > - Symmetric & Asymmetric exchange rates. (Asymmetric exchange rates are > useful if there are fees associated with exchanging the currency.) > At indexing time, money fields can be indexed in a native currency. For > example, if a product on an e-commerce site is listed in Euros, indexing the > price field as "10.00EUR" will index it appropriately. By altering the > currency.xml file, the sorting and querying against Solr can take into > account fluctuations in currency exchange rates without having to re-index > the documents. > The new "money" field type is a polyfield which indexes two fields, one which > contains the amount of the value and another which contains the currency code > or symbol. The currency metadata (names, symbols, codes, and exchange rates) > are expected to be in an xml file which is pointed to by the field type > declaration in the schema.xml. 
> The current patch is factored such that Money utility functions and > configuration metadata lie in Lucene (see MoneyUtil and CurrencyConfig), > while the MoneyType and MoneyValueSource lie in Solr. This was meant to > mirror the work being done on the spatial field types. > This patch has not yet been deployed to production but will be getting used > to power the international search capabilities of the search engine at Etsy. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
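The query-time conversion with asymmetric exchange rates described in SOLR-2202 can be illustrated with a small, self-contained sketch. This is not the code from the attached patches: the class `MoneySketch`, its methods, and the sample rates are all hypothetical, and it only handles values with a trailing three-letter currency code (not symbols such as "$").

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the MoneyUtil/CurrencyConfig classes from the patch. */
public class MoneySketch {
    // Directional rates keyed as "FROM>TO"; storing each direction separately
    // is what allows asymmetric rates (e.g. to model exchange fees).
    private final Map<String, BigDecimal> rates = new HashMap<>();

    void addRate(String from, String to, String rate) {
        rates.put(from + ">" + to, new BigDecimal(rate));
    }

    /** Splits a value such as "10.00EUR" into { "10.00", "EUR" }. */
    static String[] parse(String value) {
        int split = value.length() - 3;
        return new String[] { value.substring(0, split), value.substring(split) };
    }

    /** Converts a value such as "10.00EUR" into the target currency, rounded to 2 decimals. */
    BigDecimal convert(String value, String target) {
        String[] parts = parse(value);
        BigDecimal amount = new BigDecimal(parts[0]);
        if (parts[1].equals(target)) {
            return amount.setScale(2, RoundingMode.HALF_EVEN);
        }
        BigDecimal rate = rates.get(parts[1] + ">" + target);
        if (rate == null) {
            throw new IllegalArgumentException("no rate for " + parts[1] + ">" + target);
        }
        return amount.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        MoneySketch m = new MoneySketch();
        m.addRate("EUR", "USD", "1.40"); // made-up rates, deliberately asymmetric
        m.addRate("USD", "EUR", "0.70");
        System.out.println(m.convert("10.00EUR", "USD"));
        System.out.println(m.convert("14.00USD", "EUR"));
    }
}
```

Because the rates live outside the indexed values, changing a rate changes the converted amounts without touching the stored "10.00EUR", which is the property the issue description relies on to avoid re-indexing.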
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051953#comment-13051953 ] Robert Muir commented on SOLR-2452: --- ok, i was just curious, sounds like something that could possibly be dealt with later. I think i said it before too, I find it confusing the way these directories all depend upon each other today and how each one is not its own 'subproject' of the build (that basically acts like a contrib or module itself and states its dependencies). So I would *really* like to see this fixed. However, I think I would recommend thinking about when you want to make the change: it will make merging code up to this branch nearly impossible... is it holding back other changes or is this a final step? > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2605) CoreAdminHandler, different Output while 'defaultCoreName' is specified
[ https://issues.apache.org/jira/browse/SOLR-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-2605: -- Fix Version/s: 4.0 > CoreAdminHandler, different Output while 'defaultCoreName' is specified > --- > > Key: SOLR-2605 > URL: https://issues.apache.org/jira/browse/SOLR-2605 > Project: Solr > Issue Type: Improvement > Components: web gui >Reporter: Stefan Matheis (steffkes) >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2399-admin-cores-default.xml, > SOLR-2399-admin-cores.xml > > > The attached XML-Files show the little difference between a defined > {{defaultCoreName}}-Attribute and a non existing one. > Actually the new admin ui checks for an core with empty name to set single- / > multicore-settings .. it's a quick change to count the number of defined > cores instead. > But, will it be possible, to get the core-name (again)? One of both > attributes would be enough, if that makes a difference :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2605) CoreAdminHandler, different Output while 'defaultCoreName' is specified
[ https://issues.apache.org/jira/browse/SOLR-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051950#comment-13051950 ] Mark Miller commented on SOLR-2605: --- Indeed - this has always been a bit ugly. It was kind of ease over the best approach at the time, if I remember right. > CoreAdminHandler, different Output while 'defaultCoreName' is specified > --- > > Key: SOLR-2605 > URL: https://issues.apache.org/jira/browse/SOLR-2605 > Project: Solr > Issue Type: Improvement > Components: web gui >Reporter: Stefan Matheis (steffkes) >Priority: Minor > Attachments: SOLR-2399-admin-cores-default.xml, > SOLR-2399-admin-cores.xml > > > The attached XML-Files show the little difference between a defined > {{defaultCoreName}}-Attribute and a non existing one. > Actually the new admin ui checks for an core with empty name to set single- / > multicore-settings .. it's a quick change to count the number of defined > cores instead. > But, will it be possible, to get the core-name (again)? One of both > attributes would be enough, if that makes a difference :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: KStem custom lexicons configuration possible?
On Mon, Jun 20, 2011 at 8:23 AM, Lukáš Vlček wrote: > Hi Robert, > this sounds interesting I will look at it in more detail. > However, I do not think this is really a general solution. If I understand > StemmerOverrideFilter correctly (from a quick glance) it rely on the fact > that you *know* exact term (the key in the map) in advance. In other words > if I wanted to "fix" some term produced by Kstem filter I would have to know > what is the product of the stemming in advance. Now, this means that if I > switch to snowball or porter or other stemmer instead of KStem or simply > update something else in the filtering chain then I am in trouble. Also if I > understand correctly the original KStem implementation it can still get > updates to lexicons which means that once these updates are ported to Java > implementation it can again result in problem with existing override filter > setup. > More generally, is there any reason why lexicons are not configurable in Because we have StemmerOverrideFilter and KeywordMarkerFilter. look at the source code to Kstem: it uses maps and sets of exceptions, this is what these filters provide in a general way (StemmerOverrideFilter being the map, and KeywordMarkerFilter being the set). we added these to work across the board with all lucene stemmers for this reason. I don't understand your concerns at all to be honest, they make no sense to me. If we "updated" kstem or any other algorithm: it would break whatever you are doing either way. A hashmap is a hashmap. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
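To make the "map" and "set" roles from the reply above concrete, here is a minimal conceptual sketch. It deliberately does not use the Lucene classes: `OverrideSketch`, `toyStem`, and `analyze` are hypothetical names, and the toy suffix stripper only stands in for a real stemmer such as KStem. The point is the order of checks: an explicit override wins, a protected term is left alone, and everything else goes to the stemmer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Conceptual sketch only; not the Lucene StemmerOverrideFilter/KeywordMarkerFilter API. */
public class OverrideSketch {

    /** Stand-in for a real stemmer: strips a trailing "ed" or "s". Purely illustrative. */
    static String toyStem(String term) {
        if (term.endsWith("ed") && term.length() > 4) {
            return term.substring(0, term.length() - 2);
        }
        if (term.endsWith("s") && term.length() > 3) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    static String analyze(String term, Map<String, String> overrides, Set<String> protectedTerms) {
        String forced = overrides.get(term);
        if (forced != null) {
            return forced;              // the "map": an explicit stem for this exact term
        }
        if (protectedTerms.contains(term)) {
            return term;                // the "set": the term is never stemmed
        }
        return toyStem(term);           // everything else goes through the stemmer
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("transactions", "transaction");
        Set<String> protectedTerms = new HashSet<>(Arrays.asList("connector"));

        for (String t : new String[] {"transactions", "connector", "connected", "lexicons"}) {
            System.out.println(t + " -> " + analyze(t, overrides, protectedTerms));
        }
    }
}
```

Both lookups are keyed on the input term, before any stemming happens, so the entries do not depend on what a particular stemmer would have produced.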
[jira] [Commented] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
[ https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051948#comment-13051948 ] Mark Miller commented on SOLR-2610: --- +1 > Add an option to delete index through CoreAdmin UNLOAD action > - > > Key: SOLR-2610 > URL: https://issues.apache.org/jira/browse/SOLR-2610 > Project: Solr > Issue Type: Improvement > Components: multicore >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Minor > Fix For: 3.3, 4.0 > > > Right now, one can unload a Solr Core but the index files are left behind and > consume disk space. We should have an option to delete the index when > unloading a core. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: KStem custom lexicons configuration possible?
Hi Robert, this sounds interesting; I will look at it in more detail. However, I do not think this is really a general solution. If I understand StemmerOverrideFilter correctly (from a quick glance), it relies on the fact that you *know* the exact term (the key in the map) in advance. In other words, if I wanted to "fix" some term produced by the KStem filter, I would have to know what the product of the stemming is in advance. Now, this means that if I switch to snowball or porter or another stemmer instead of KStem, or simply update something else in the filtering chain, then I am in trouble. Also, if I understand the original KStem implementation correctly, it can still get updates to its lexicons, which means that once these updates are ported to the Java implementation, they can again cause problems with an existing override filter setup. More generally, is there any reason why lexicons are not configurable in the KStem filter? Regards, Lukas On Mon, Jun 20, 2011 at 1:38 PM, Robert Muir wrote: > On Mon, Jun 20, 2011 at 7:19 AM, Lukáš Vlček > wrote: > > Having an option to modify internal lexicons I would be able to adapt the > > KStem to work better for specific text corpora. > > What do you think? > > please use StemmerOverrideFilter for this! it works with all stemmers, > including this one. > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[jira] [Created] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
Add an option to delete index through CoreAdmin UNLOAD action - Key: SOLR-2610 URL: https://issues.apache.org/jira/browse/SOLR-2610 Project: Solr Issue Type: Improvement Components: multicore Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 3.3, 4.0 Right now, one can unload a Solr Core but the index files are left behind and consume disk space. We should have an option to delete the index when unloading a core. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051943#comment-13051943 ] Steven Rowe commented on SOLR-2452: --- {quote} bq. but solrj tests depend on core tests Curious why this is? some base classes that could be moved into test-framework instead? {quote} At a minimum, {{o.a.s.client.solrj.SolrJettyTestBase}} (likely should be moved to another package, given that Solr core {{o.a.s.servlet.\*CacheHeaderTest\*}} tests extend this class) and {{o.a.s.util.ExternalPaths}}. > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Error :dataimport handler is not request Handler, help
*Hi All Thank you every body , it now works , and I can work with solr 4.0 the solution was the following , as mentioned before by Jeffrey Chang , I tried a clean solr environment , so I removed all data import jar files from solr class paths , and commented the directives in solrconfig.xml but kept only on and this is the I run solr without any jar so an exception raised to tell that dataimport is not exist ClassNotFoundException then I turn off the server and put the jar in the example/solr/lib directory after this final step I fired Solr , and now It works fine now Thanks guys ...I really Thank you * On Mon, Jun 20, 2011 at 2:56 PM, Muhannad wrote: > I tried this and removed all dataimport jars , but only kept one on lib > sirectory in Solr instance but the same error exists , I never faced this > problem before , could it because I have a non-stable version of Solr 4.0? > > > 2011/6/20 Jeffrey Chang > >> Hi, >> >> I've encountered a similar issue before. >> >> The problem for me was the Classloader that loaded DataImportHandler class >> is not the same as the one loading the SolrRequestHandler class. >> >> Trace... >> >> In SolrCore.java (3.1 source) >> <-- >> line 459: createInstance(className, SolrRequestHandler.class, "Request >> Handler") >> <-- >> line: 423: clazz = getResourceLoader().findClass(className); >> <-- >> line: 424: if (cast != null && !cast.isAssignableFrom(clazz)) >> >> This evaluation will fail since clazz is not loaded by the same >> classloader as cast. >> >> What I did was to make sure that the dataimport jars are NOT in the >> classpath and not loaded by other classloaders but from the path specified >> in solrconfig.xml. This will ensure that the dataimport classes are loaded >> by the same classloader. >> >> Not sure if this is the same issue you're encountering, I hope this helps. 
>> >> Thanks, >> Jeff >> >> On Mon, Jun 20, 2011 at 2:36 PM, Muhannad wrote: >> >>> Yes , I just tried it , and this works for Solr 1.4 I am currently >>> working on , but when I tried 3.1 or 4.0 >>> the same error appears ,I know that the war file no more contains jar >>> files related to dataimport and logging functionality , I put all requested >>> files in class path , and I am sure it loads them as the server starts , but >>> I guess the problem is that it doesn't recognise dataimportHandler as a >>> RequestHandler >>> I really stuck , and confused!!! >>> >>> On Mon, Jun 20, 2011 at 3:14 AM, Bill Bell wrote: >>> Did you try adding something like this to solrconfig.xml ? >>> regex="apache-solr-dataimporthandler-.*\.jar" /> >>> class="org.apache.solr.handler.dataimport.DataImportHandler"> db-data-config.xml From: Muhannad Reply-To: Date: Sun, 19 Jun 2011 23:42:45 +0300 To: Subject: Re: Error :dataimport handler is not request Handler, help I have tried many things , same problem still , any help? On Sun, Jun 19, 2011 at 9:00 PM, Muhannad wrote: > Hi All , I am really stuck in this problem , I am using solr to > index some tables in database and I followed these steps to achieve my > goal > 1- added the following section to solrconfig.xml name="/dataimport" > class="org.apache.solr.handler.dataimport.DataImportHandler"> > > data-config.xml > > > > *2- added apache-solr-dataimporthandler.jar to lib/ directory (include > path) > every thing goes nice !!! for now , till I fire the server > the following error appears , Please I need You help urgently !!! > > ===Error message== > * HTTP ERROR 500 > > Problem accessing /solr/. Reason: > > Severe errors in solr configuration. > > Check your log files for more detailed information on what may be wrong. 
> > - > org.apache.solr.common.SolrException: Error Instantiating Request > Handler, org.apache.solr.handler.dataimport.DataImportHandler is not a > org.apache.solr.request.SolrRequestHandler > at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:396) > at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:431) > at > org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:158) > at org.apache.solr.core.SolrCore.(SolrCore.java:513) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:653) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:406) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:291) > at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:240) > at > org.apache.solr.servlet.SolrDispatchFilter.init(Solr
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051939#comment-13051939 ] Robert Muir commented on SOLR-2452: --- {quote} but solrj tests depend on core tests {quote} Curious why this is? some base classes that could be moved into test-framework instead? > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3174) Similarity.Stats class for term & collection statistics
[ https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3174. - Resolution: Fixed thanks David! > Similarity.Stats class for term & collection statistics > --- > > Key: LUCENE-3174 > URL: https://issues.apache.org/jira/browse/LUCENE-3174 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey >Priority: Minor > Fix For: flexscoring branch > > Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, > LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, > LUCENE-3174_normalize_boost.patch > > > In order to support ranking methods besides TF-IDF, we need to make the > statistics they need available. These statistics could be computed in > computeWeight (soon to become computeStats) and stored in a separate object > for easy access. Since this object will be used solely by subclasses of > Similarity, it should be implemented as a static inner class, i.e. > Similarity.Stats. > There are two ways this could be implemented: > - as a single Similarity.Stats class, reused by all ranking algorithms. In > this case, this class would have a member field for all statistics; > - as a hierarchy of Stats classes, one for each ranking algorithm. Each > subclass would define only the statistics needed for the ranking algorithm. > In the second case, the Stats class in DefaultSimilarity would have a single > field, idf, while the one in e.g. BM25Similarity would have idf and average > field/document length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
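The second option in LUCENE-3174 (a hierarchy of Stats classes) might look roughly like the sketch below. The class names `StatsSketch`, `TfIdfStats`, and `Bm25Stats` are hypothetical and are not taken from the committed patch. The idf formulas are the commonly cited TF-IDF and BM25 variants and are used here as assumptions, only to give each subclass something to compute.

```java
/** Hypothetical sketch of a per-algorithm Stats hierarchy; not the committed Lucene code. */
public class StatsSketch {

    /** Empty base type; each ranking model adds only the statistics it needs. */
    static abstract class Stats { }

    /** TF-IDF style model: a single field, idf. */
    static final class TfIdfStats extends Stats {
        final double idf;

        TfIdfStats(long docFreq, long numDocs) {
            // Assumed formula: 1 + ln(numDocs / (docFreq + 1))
            this.idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        }
    }

    /** BM25 style model: idf plus the average field length. */
    static final class Bm25Stats extends Stats {
        final double idf;
        final double avgFieldLength;

        Bm25Stats(long docFreq, long numDocs, long totalFieldTokens) {
            // Assumed formula: ln(1 + (N - n + 0.5) / (n + 0.5))
            this.idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
            this.avgFieldLength = (double) totalFieldTokens / numDocs;
        }
    }

    public static void main(String[] args) {
        TfIdfStats tfidf = new TfIdfStats(9, 10);
        Bm25Stats bm25 = new Bm25Stats(9, 10, 1000);
        System.out.println("tf-idf idf = " + tfidf.idf);
        System.out.println("bm25 idf = " + bm25.idf + ", avg length = " + bm25.avgFieldLength);
    }
}
```

The trade-off against the single-class option is the usual one: the hierarchy keeps each Stats object small and specific, at the cost of one extra class per ranking model.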
Re: KStem custom lexicons configuration possible?
On Mon, Jun 20, 2011 at 7:19 AM, Lukáš Vlček wrote: > Having an option to modify internal lexicons I would be able to adapt the > KStem to work better for specific text corpora. > What do you think? please use StemmerOverrideFilter for this! it works with all stemmers, including this one. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene 3.3 release soon?
i was planning on doing an RC in a few weeks actually. we have a lot of good stuff in there today already, however i wanted to give a few weeks for the grouping stuff to run on hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer wrote: > I would say within the next 3 month. > > Thoughts? > > On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček wrote: >> Hi, >> How soon can we expect official Lucene 3.3 release? >> Best regards, >> Lukas > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: KStem custom lexicons configuration possible?
Maybe I should show some examples where I think custom configuration can be useful. Let me give you two: 1) As of now, KStem conflates both "connector" and "connected" to the same term, "connect". 2) Conversely, it does not conflate "transaction" and "transactions" to the same term. With an option to modify the internal lexicons, I would be able to adapt KStem to work better for specific text corpora. What do you think? Regards, Lukas On Mon, Jun 20, 2011 at 12:55 PM, Lukáš Vlček wrote: > Hi, > > Is there any API in KStem filter for lexicons configuration? > > As far as I understand the original code works in such a way that lexicons > are loaded from files at startup (see > http://lexicalresearch.com/kstem-doc.txt). The author (Robert Krovetz) > names possibility to modify lexicons among advantages of KStem compared to > other stemmers. > > Do people not need it? Would it be a useful addition for KStem filter to > allow custom lexicon configurations in its API? > > Regards, > Lukas > > Note: Big kudos to all who participated in bringing KStem into Lucene! >
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8939 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8939/ 2 tests failed. REGRESSION: org.apache.lucene.search.TestComplexExplanations.testCSQ4 Error Message: org.apache.lucene.search.TestComplexExplanations.testCSQ4: Insane FieldCache usage(s) found expected:<0> but was:<1> Stack Trace: junit.framework.AssertionFailedError: org.apache.lucene.search.TestComplexExplanations.testCSQ4: Insane FieldCache usage(s) found expected:<0> but was:<1> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333) at org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:716) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:620) at org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:67) at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:43) REGRESSION: org.apache.lucene.search.function.TestFieldScoreQuery.testRankInt Error Message: org.apache.lucene.search.function.TestFieldScoreQuery.testRankInt: Insane FieldCache usage(s) found expected:<0> but was:<1> Stack Trace: junit.framework.AssertionFailedError: org.apache.lucene.search.function.TestFieldScoreQuery.testRankInt: Insane FieldCache usage(s) found expected:<0> but was:<1> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333) at org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:716) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:620) Build Log (for compile errors): [...truncated 3283 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci
[ https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051929#comment-13051929 ] Marko Bonaci commented on SOLR-2305: Hi Bill, I had difficulties setting up the project in Eclipse, and although I managed it in the end, I think the patch file won't be usable (due to the many build path changes I made). All you have to do to incorporate DIHScheduler is to follow the instructions I posted here: http://wiki.apache.org/solr/DataImportHandler#Scheduling If you run into any kind of problem, feel free to post the question here and I'll try to respond promptly. Thank you. > DataImportScheduler - Marko Bonaci > --- > > Key: SOLR-2305 > URL: https://issues.apache.org/jira/browse/SOLR-2305 > Project: Solr > Issue Type: New Feature >Affects Versions: 4.0 >Reporter: Bill Bell > Fix For: 4.0 > > > Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I > cannot find a JIRA ticket for it? > http://wiki.apache.org/solr/DataImportHandler > Do we have a ticket so the code can be tracked? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051928#comment-13051928 ] Chris Male commented on SOLR-2452: -- I think my comments can be addressed later on maybe and shouldn't stop these improvements from going forward so +1 > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051923#comment-13051923 ] Chris Male commented on SOLR-2452: -- Hmmm I hadn't considered the issue of SolrJ being used for distributed search. > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
KStem custom lexicons configuration possible?
Hi, Is there any API in the KStem filter for configuring lexicons? As far as I understand, the original code loads its lexicons from files at startup (see http://lexicalresearch.com/kstem-doc.txt). The author (Robert Krovetz) lists the ability to modify the lexicons among KStem's advantages over other stemmers. Do people not need this? Would it be useful for the KStem filter to allow custom lexicon configuration through its API? Regards, Lukas Note: Big kudos to all who participated in bringing KStem into Lucene!
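To make the question concrete, here is a minimal sketch of what a caller-supplied lexicon could mean for a dictionary-checked stemmer. It is not the KStem filter's API and not Krovetz's algorithm: the class name LexiconStemmerSketch, its constructor, and the tiny suffix list are all hypothetical, chosen only to show a lexicon being passed in rather than loaded from fixed files.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical illustration only: a stemmer that accepts a stripped form
 * solely when that form appears in a lexicon supplied by the caller.
 */
final class LexiconStemmerSketch {
    // Toy suffix list (an assumption); a real stemmer handles far more cases.
    private static final String[] SUFFIXES = {"ing", "es", "ed", "s"};

    private final Set<String> lexicon;

    LexiconStemmerSketch(Set<String> lexicon) {
        // Defensive copy so later changes by the caller do not leak in.
        this.lexicon = new HashSet<>(lexicon);
    }

    String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (lexicon.contains(w)) {
            return w; // already a known word, leave it alone
        }
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                String candidate = w.substring(0, w.length() - suffix.length());
                if (lexicon.contains(candidate)) {
                    return candidate; // stripped form is a lexicon word
                }
            }
        }
        return w; // no lexicon-backed stem found
    }
}
```

With a lexicon containing "index", stem("indexes") gives "index"; a word such as "buses" is left untouched until "bus" is added to the lexicon, which is the kind of behaviour change a configurable lexicon would allow.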
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Issue Type: Sub-task (was: New Feature) Parent: LUCENE-2959 > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: Sub-task > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done:
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051920#comment-13051920 ] Steven Rowe commented on SOLR-2452: --- bq. Can we address the packaging or is that out of scope of this work? What did you have in mind? > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > It's slow, cumbersome, and messy, and makes it hard for us to improve things.
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Description: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ Done: was: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * `EasyStats`: contains all statistics that might be relevant for a ranking algorithm * `EasySimilarity`: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ Done: > Implement various ranking models as Similarities > > > Key: LUCENE-3220 > URL: https://issues.apache.org/jira/browse/LUCENE-3220 > Project: Lucene - Java > Issue Type: New Feature > Components: core/search >Affects Versions: flexscoring branch >Reporter: David Mark Nemeskey >Assignee: David Mark Nemeskey > Labels: gsoc > Original Estimate: 336h > Remaining Estimate: 336h > > With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we > can finally work on implementing the standard ranking models. Currently DFR, > BM25 and LM are on the menu. > TODO: > * {{EasyStats}}: contains all statistics that might be relevant for a > ranking algorithm > * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the > DocScorers and as much implementation detail as possible > * _BM25_: the current "mock" implementation might be OK > * _LM_ > * _DFR_ > Done:
Re: Lucene 3.3 release soon?
That is fine; I just wanted to know when the KStem filter will be part of a stable release. On Mon, Jun 20, 2011 at 10:59 AM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > I would say within the next 3 months. > > Thoughts? > > On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček > wrote: > > Hi, > > How soon can we expect the official Lucene 3.3 release? > > Best regards, > > Lukas
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051919#comment-13051919 ] Steven Rowe commented on SOLR-2452: --- bq. SolrJ [...] is just a client library That's not all it is; on 8/18/2010 on #lucene IRC, yonik wrote: bq. solrj used to not be included in the war, but solr core uses solrj for distributed search > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > It's slow, cumbersome, and messy, and makes it hard for us to improve things.
[jira] [Created] (LUCENE-3220) Implement various ranking models as Similarities
Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * `EasyStats`: contains all statistics that might be relevant for a ranking algorithm * `EasySimilarity`: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current "mock" implementation might be OK * _LM_ * _DFR_ Done:
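As a rough illustration of the split described in the TODO list (one object holding statistics, one base class hiding the plumbing, concrete models on top), the following sketch may help. It is not the code from the flexscoring branch: StatsSketch, SimilaritySketch and Bm25Sketch are hypothetical names, the fields and the score(...) signature are assumptions, and the formula is one common BM25 variant with a non-negative idf.

```java
/** Hypothetical stand-in for the EasyStats idea: every statistic a model might need. */
final class StatsSketch {
    final long numberOfDocuments; // documents in the collection
    final long docFreq;           // documents containing the term
    final double avgFieldLength;  // average field length in the collection
    final double docLength;       // length of the field in this document
    final double termFreq;        // occurrences of the term in this document

    StatsSketch(long numberOfDocuments, long docFreq,
                double avgFieldLength, double docLength, double termFreq) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.avgFieldLength = avgFieldLength;
        this.docLength = docLength;
        this.termFreq = termFreq;
    }
}

/** Hypothetical stand-in for the EasySimilarity idea: subclasses only supply a formula. */
abstract class SimilaritySketch {
    abstract double score(StatsSketch stats);
}

/** One common BM25 variant, written against the sketch classes above. */
final class Bm25Sketch extends SimilaritySketch {
    private final double k1; // term-frequency saturation
    private final double b;  // strength of length normalization

    Bm25Sketch(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    double score(StatsSketch s) {
        // idf with "1 +" inside the log so the value never goes negative
        double idf = Math.log(1.0
                + (s.numberOfDocuments - s.docFreq + 0.5) / (s.docFreq + 0.5));
        // longer-than-average documents get a larger denominator
        double norm = k1 * (1.0 - b + b * s.docLength / s.avgFieldLength);
        return idf * s.termFreq * (k1 + 1.0) / (s.termFreq + norm);
    }
}
```

Under this layout an LM or DFR model would be another small subclass of SimilaritySketch reading the same statistics, which appears to be the point of hiding the DocScorers behind a common ancestor.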
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051916#comment-13051916 ] Chris Male commented on SOLR-2452: -- I see. Can we address the packaging, or is that out of scope for this work? > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > Attachments: SOLR-2452-post-reshuffling.patch, > SOLR-2452.dir.reshuffle.sh > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > It's slow, cumbersome, and messy, and makes it hard for us to improve things.