date:20110620


 [ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3223:
---

Fix Version/s: 4.0

> SearchWithSortTask ignores sorting by Doc
> -
>
> Key: LUCENE-3223
> URL: https://issues.apache.org/jira/browse/LUCENE-3223
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch
>
>
> During my work in LUCENE-3912, I found the following code:
> {code}
> if (field.equals("doc")) {
> sortField0 = SortField.FIELD_DOC;
> } if (field.equals("score")) {
> sortField0 = SortField.FIELD_SCORE;
> } ...
> {code}
> This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
> much about this code, this seems like a valid setting and obviously just a 
> bug.
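
The effect of the stray `if` can be reproduced in isolation. The sketch below is a simplified stand-in, not the actual SearchWithSortTask code: plain strings replace the SortField constants, the class and method names are hypothetical, and the catch-all `else` branch is an assumption about how the elided part of the chain ends.

```java
// Simplified stand-in for the field-name dispatch quoted above.
// Strings replace the real SortField objects so the sketch runs without Lucene.
final class SortChainDemo {

    // Buggy shape: the "doc" branch is a separate if statement, so the chain
    // starting at "score" still runs and its final else overwrites the result.
    static String buggy(String field) {
        String sortField = null;
        if (field.equals("doc")) {
            sortField = "FIELD_DOC";
        } if (field.equals("score")) {
            sortField = "FIELD_SCORE";
        } else {
            sortField = "custom:" + field; // assumed catch-all branch
        }
        return sortField;
    }

    // Fixed shape: a single if / else-if chain, so exactly one branch runs.
    static String fixed(String field) {
        String sortField;
        if (field.equals("doc")) {
            sortField = "FIELD_DOC";
        } else if (field.equals("score")) {
            sortField = "FIELD_SCORE";
        } else {
            sortField = "custom:" + field; // assumed catch-all branch
        }
        return sortField;
    }

    public static void main(String[] args) {
        System.out.println("buggy(doc) = " + buggy("doc"));
        System.out.println("fixed(doc) = " + fixed("doc"));
    }
}
```

Under these assumptions the buggy variant treats "doc" as a custom field name, while the fixed variant keeps the doc-order setting.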

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc


 [ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3223.


Resolution: Fixed

Committed revision 1137882.

> SearchWithSortTask ignores sorting by Doc
> -
>
> Key: LUCENE-3223
> URL: https://issues.apache.org/jira/browse/LUCENE-3223
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch
>
>
> During my work in LUCENE-3912, I found the following code:
> {code}
> if (field.equals("doc")) {
> sortField0 = SortField.FIELD_DOC;
> } if (field.equals("score")) {
> sortField0 = SortField.FIELD_SCORE;
> } ...
> {code}
> This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
> much about this code, this seems like a valid setting and obviously just a 
> bug.


[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-20 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052380#comment-13052380
 ] 

Dawid Weiss commented on LUCENE-2341:
-

I'll take a look at the differences between Morfologik and Morfeusz right now, 
actually. I'll post the results once I have something.

> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.


[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum


[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052377#comment-13052377
 ] 

Chris Male commented on LUCENE-3219:


You'll have to guide me on the backwards compat issue since this is a break due 
to the fields being public and some methods changing from returning int to 
returning SortField.Type.

> Change SortField types to an Enum
> -
>
> Key: LUCENE-3219
> URL: https://issues.apache.org/jira/browse/LUCENE-3219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
> LUCENE-3219.patch
>
>
> When updating my SOLR-2533 patch, one issue was that the int value I had 
> given my new type had been used by another change in the mean time.  Since we 
> don't use these fields in a bitset kind of way, we can convert them to an 
> enum.
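
The collision described above is easy to picture with a small sketch. The constant names and numeric values below are hypothetical and are not taken from SortField; they only illustrate why hand-numbered int constants can clash silently while enum constants cannot.

```java
// Hypothetical illustration of int type constants versus an enum.
final class SortTypeDemo {

    // Before: two patches written independently both pick the "next free"
    // number. The compiler accepts this, and the two types become
    // indistinguishable at runtime.
    static final int BYTES_ID = 12;        // assumed value from one patch
    static final int MY_NEW_TYPE_ID = 12;  // assumed value from another patch

    // After: enum constants are distinct objects, so no numbering is needed
    // and a duplicate name is a compile-time error.
    enum Type { SCORE, DOC, STRING, INT, BYTES, MY_NEW_TYPE }

    static String describe(Type type) {
        switch (type) {
            case SCORE: return "relevance";
            case DOC:   return "index order";
            default:    return "field value (" + type + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println("ints clash: " + (BYTES_ID == MY_NEW_TYPE_ID));
        System.out.println("enums clash: " + (Type.BYTES == Type.MY_NEW_TYPE));
        System.out.println(describe(Type.DOC));
    }
}
```

Because the values are not combined as bit flags, nothing is lost by giving up the numeric representation.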


[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-20 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052376#comment-13052376
 ] 

Dawid Weiss commented on LUCENE-2341:
-

Thanks for the contribution, Michał. 

Robert: the dictionary is licensed under MPL or CC-SA (to be selected by the 
user depending on one's needs). Do you know which one is preferable over 
another?

Michał: there is also another (much larger) dictionary that has been released 
recently and comes from the Morfeusz project. 
http://sgjp.pl/morfeusz/dopobrania.html This dictionary is actually licensed 
under BSD license, so no legal worries at all. Both dictionaries are nearly 
identical (they differ slightly in their convention of morphosyntactic 
annotations) and Morfeusz's dictionary could be compiled into an automaton for 
use with Morfologik.

Which way should we go? What do you think?

> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.


[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum


[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052368#comment-13052368
 ] 

Simon Willnauer commented on LUCENE-3219:
-

Looks good to me. BTW, should we backport those changes?

> Change SortField types to an Enum
> -
>
> Key: LUCENE-3219
> URL: https://issues.apache.org/jira/browse/LUCENE-3219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
> LUCENE-3219.patch
>
>
> When updating my SOLR-2533 patch, one issue was that the int value I had 
> given my new type had been used by another change in the mean time.  Since we 
> don't use these fields in a bitset kind of way, we can convert them to an 
> enum.


[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc


[ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052362#comment-13052362
 ] 

Simon Willnauer commented on LUCENE-3223:
-

bq. Simple patch fixing the problem. Do I need a CHANGES entry for trivial 
things like this?
looks good, I don't think we need a changes entry for this. go ahead and commit!

> SearchWithSortTask ignores sorting by Doc
> -
>
> Key: LUCENE-3223
> URL: https://issues.apache.org/jira/browse/LUCENE-3223
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch
>
>
> During my work in LUCENE-3912, I found the following code:
> {code}
> if (field.equals("doc")) {
> sortField0 = SortField.FIELD_DOC;
> } if (field.equals("score")) {
> sortField0 = SortField.FIELD_SCORE;
> } ...
> {code}
> This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
> much about this code, this seems like a valid setting and obviously just a 
> bug.


Re: Lucene 3.3 release soon?

2011-06-20 Thread Bill Bell

+1 wait for grouping post facet counts... Go Martijn v Groningen !!

On 6/20/11 12:03 PM, "Michael McCandless" 
wrote:

>+1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
>3.2.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir  wrote:
>> i was planning on doing an RC in a few weeks actually.
>>
>> we have a lot of good stuff in there today already, however i wanted
>> to give a few weeks for the grouping stuff to run on hudson.
>>
>> On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
>>  wrote:
>>> I would say within the next 3 months.
>>>
>>> Thoughts?
>>>
>>> On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček 
>>>wrote:
>>>> Hi,
>>>> How soon can we expect official Lucene 3.3 release?
>>>> Best regards,
>>>> Lukas




[jira] [Commented] (SOLR-1967) New Native PHP Response Writer Class

2011-06-20 Thread Israel Ekpo (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052342#comment-13052342
 ] 

Israel Ekpo commented on SOLR-1967:
---

To use the 'json' response writer in lieu of phpnative, see documentation for 
SolrClient::__construct()

http://www.php.net/manual/en/solrclient.construct.php

> New Native PHP Response Writer Class
> 
>
> Key: SOLR-1967
> URL: https://issues.apache.org/jira/browse/SOLR-1967
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - php, Response Writers
>Affects Versions: 1.4
>Reporter: Israel Ekpo
>  Labels: php, response, solrclient, writer
> Fix For: 3.3
>
> Attachments: phpnative.tar.gz, phpnativeresponsewriter.jar
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Hi Solr users,
> If you are using Apache Solr via PHP, I have some good news for you.
> There is a new response writer for the PHP native extension, currently 
> available as a plugin.
> This new feature adds a new response writer class to the 
> org.apache.solr.request package.
> This class is used by the PHP Native Solr Client driver to prepare the query 
> response from Solr.
> This response writer allows you to configure the way the data is serialized 
> for the PHP client.
> You can use your own class name and you can also control how the properties 
> are serialized as well.
> The formatting of the response data is very similar to the way it is 
> currently done by the PECL extension on the client side.
> The only difference now is that this serialization is happening on the server 
> side instead.
> You will find this new response writer particularly useful when dealing with 
> responses for 
> - highlighting
> - admin threads responses
> - more like this responses
> to mention just a few
> You can pass the "objectClassName" request parameter to specify the class 
> name to be used for serializing objects. 
> Please note that the class must be available on the client side to avoid a 
> PHP_Incomplete_Object error during the unserialization process.
> You can also pass in the "objectPropertiesStorageMode" request parameter with 
> either a 0 (independent properties) or a 1 (combined properties).
> These parameters can also be passed as a named list when loading the response 
> writer in the solrconfig.xml file
> Having this control allows you to create custom objects which gives the 
> flexibility of implementing custom __get methods, ArrayAccess, Traversable 
> and Iterator interfaces on the PHP client side.
> Until this class is incorporated into Solr, you simply have to copy the jar 
> file containing this plugin into your lib directory under $SOLR_HOME.
> The jar file is available here and so is the source code.
> Then set up the configuration as shown below and then restart your servlet 
> container.
> Below is an example configuration in solrconfig.xml
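> <queryResponseWriter name="phpnative"
>  class="org.apache.solr.request.PHPNativeResponseWriter">
> <str name="objectClassName">SolrObject</str>
> <int name="objectPropertiesStorageMode">0</int>
> </queryResponseWriter>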
> Below is an example implementation on the PHP client side.
> Support for specifying custom response writers will be available starting 
> from the 0.9.11 version of the PECL extension for Solr currently available 
> here
> http://pecl.php.net/package/solr
> Here is an example of how to use the new response writer with the PHP client.
> <?php
> class SolrClass
> {
>     public $_properties = array();
>     // Fall back to the combined $_properties array for unknown names.
>     public function __get($property_name) {
>         if (property_exists($this, $property_name)) { return $this->$property_name; }
>         else if (isset($this->_properties[$property_name])) { return
>             $this->_properties[$property_name]; }
>         return null;
>     }
> }
> $options = array
> (
>     'hostname' => 'localhost',
>     'port' => 8983,
>     'path' => '/solr/'
> );
> $client = new SolrClient($options);
> $client->setResponseWriter("phpnative");
> $response = $client->ping();
> $query = new SolrQuery();
> $query->setQuery("*:*");
> $query->set("objectClassName", "SolrClass");
> $query->set("objectPropertiesStorageMode", 1);
> $response = $client->query($query);
> $resp = $response->getResponse();
> ?>
> 
> Documentation of the changes to the PECL extension is available here
> http://docs.php.net/manual/en/solrclient.construct.php
> http://docs.php.net/manual/en/solrclient.setresponsewriter.php
> Please contact me at ie...@php.net, if you have any questions or comments.


[jira] [Closed] (SOLR-1967) New Native PHP Response Writer Class

2011-06-20 Thread Israel Ekpo (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Israel Ekpo closed SOLR-1967.
-

Resolution: Won't Fix

The latest version of the PECL extension now supports JSON response writer 
which should be easier to use without additional configuration.



[jira] [Assigned] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc


 [ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male reassigned LUCENE-3223:
--

Assignee: Chris Male

> SearchWithSortTask ignores sorting by Doc
> -
>
> Key: LUCENE-3223
> URL: https://issues.apache.org/jira/browse/LUCENE-3223
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch
>
>
> During my work in LUCENE-3912, I found the following code:
> {code}
> if (field.equals("doc")) {
> sortField0 = SortField.FIELD_DOC;
> } if (field.equals("score")) {
> sortField0 = SortField.FIELD_SCORE;
> } ...
> {code}
> This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
> much about this code, this seems like a valid setting and obviously just a 
> bug.


[jira] [Assigned] (LUCENE-3219) Change SortField types to an Enum


 [ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male reassigned LUCENE-3219:
--

Assignee: Chris Male

> Change SortField types to an Enum
> -
>
> Key: LUCENE-3219
> URL: https://issues.apache.org/jira/browse/LUCENE-3219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Chris Male
>Assignee: Chris Male
>Priority: Minor
> Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
> LUCENE-3219.patch
>
>
> When updating my SOLR-2533 patch, one issue was that the int value I had 
> given my new type had been used by another change in the mean time.  Since we 
> don't use these fields in a bitset kind of way, we can convert them to an 
> enum.


[jira] [Updated] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc


 [ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3223:
---

Attachment: LUCENE-3223.patch

Simple patch fixing the problem.  Do I need a CHANGES entry for trivial things 
like this?

> SearchWithSortTask ignores sorting by Doc
> -
>
> Key: LUCENE-3223
> URL: https://issues.apache.org/jira/browse/LUCENE-3223
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch
>
>
> During my work in LUCENE-3912, I found the following code:
> {code}
> if (field.equals("doc")) {
> sortField0 = SortField.FIELD_DOC;
> } if (field.equals("score")) {
> sortField0 = SortField.FIELD_SCORE;
> } ...
> {code}
> This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
> much about this code, this seems like a valid setting and obviously just a 
> bug.


[jira] [Updated] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc


 [ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3223:
---

Attachment: LUCENE-3223-test.patch

Test demonstrating error.

> SearchWithSortTask ignores sorting by Doc
> -
>
> Key: LUCENE-3223
> URL: https://issues.apache.org/jira/browse/LUCENE-3223
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3223-test.patch
>
>
> During my work in LUCENE-3912, I found the following code:
> {code}
> if (field.equals("doc")) {
> sortField0 = SortField.FIELD_DOC;
> } if (field.equals("score")) {
> sortField0 = SortField.FIELD_SCORE;
> } ...
> {code}
> This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
> much about this code, this seems like a valid setting and obviously just a 
> bug.


[jira] [Created] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc

SearchWithSortTask ignores sorting by Doc
-

 Key: LUCENE-3223
 URL: https://issues.apache.org/jira/browse/LUCENE-3223
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Chris Male
Priority: Minor


During my work in LUCENE-3912, I found the following code:

{code}
if (field.equals("doc")) {
sortField0 = SortField.FIELD_DOC;
} if (field.equals("score")) {
sortField0 = SortField.FIELD_SCORE;
} ...
{code}

This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
much about this code, this seems like a valid setting and obviously just a bug.


[jira] [Updated] (LUCENE-3219) Change SortField types to an Enum


 [ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3219:
---

Attachment: LUCENE-3219.patch

Updated patch to incorporate Simon's suggestions:

- SearchWithSortTask now uses SortField.Type.valueOf().  This changes the 
exception thrown to an IllegalArgumentException.
- I haven't added Type to FieldCache.Parser since the constructor in SortField 
that accepts Parsers is deprecated and you can pull the Type from the 
CachedArrayCreator which is the preferred way of creating a SortField.  I did 
exploit this to reduce the code in the instanceof comparisons.
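
The exception change mentioned in the first point follows from how Java enums resolve names. The sketch below uses a hypothetical stand-in enum, not the real SortField.Type, and the upper-casing step is an assumption about how a user-supplied name might be normalised before the lookup.

```java
import java.util.Locale;

// Hypothetical stand-in for SortField.Type; the constant list is illustrative.
final class SortTypeLookup {
    enum Type { SCORE, DOC, STRING, INT, LONG, FLOAT, DOUBLE }

    // Enum.valueOf matches constant names exactly, so normalise the case first.
    // An unknown name makes valueOf throw IllegalArgumentException.
    static Type parse(String name) {
        return Type.valueOf(name.toUpperCase(Locale.ROOT));
    }

    // Convenience wrapper that turns the exception into a boolean.
    static boolean isKnown(String name) {
        try {
            parse(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("int"));
        System.out.println(isKnown("bogus"));
    }
}
```

Callers that previously caught a different exception type for unknown sort types would need to handle IllegalArgumentException instead.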

> Change SortField types to an Enum
> -
>
> Key: LUCENE-3219
> URL: https://issues.apache.org/jira/browse/LUCENE-3219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
> LUCENE-3219.patch
>
>
> When updating my SOLR-2533 patch, one issue was that the int value I had 
> given my new type had been used by another change in the mean time.  Since we 
> don't use these fields in a bitset kind of way, we can convert them to an 
> enum.


[jira] [Updated] (LUCENE-1768) NumericRange support for new query parser

2011-06-20 Thread Vinicius Barros (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinicius Barros updated LUCENE-1768:


Attachment: week4.patch

This patch includes the builder for numeric range queries. This week I intend 
to start writing junits.

> NumericRange support for new query parser
> -
>
> Key: LUCENE-1768
> URL: https://issues.apache.org/jira/browse/LUCENE-1768
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/queryparser
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Adriano Crestani
>  Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: week1.patch, week2.patch, week3.patch, week4.patch
>
>
> It would be good to specify some type of "schema" for the query parser in 
> the future, so that it can automatically create a NumericRangeQuery for the 
> different numeric types. It would then be possible to index a numeric value 
> (double, float, long, int) using NumericField; the query parser would know 
> which type of field this is and so correctly create a NumericRangeQuery 
> for strings like "[1.567..*]" or "(1.787..19.5]".
> There is currently no way to determine from the index whether a field is 
> numeric, so the user will have to configure the FieldConfig objects in the 
> ConfigHandler. Once this is done, it will not be that difficult to implement 
> the rest.
> The only difference from the current handling of RangeQuery is then the 
> instantiation of the correct Query type and the conversion of the entered 
> numeric values (a simple Number.valueOf(...) conversion of the user-entered 
> numbers). Everything else is identical; NumericRangeQuery also supports the 
> MTQ rewrite modes (as it is an MTQ).
> Another thing is a change in Date semantics. There are some strange flags in 
> the current parser that tell it how to handle dates.
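
To make the range syntax in the description concrete, here is a minimal sketch of how strings such as "[1.567..*]" or "(1.787..19.5]" could be split into bounds. It is not the query parser's implementation: the class name, the fixed use of double, and the reading of `[`/`]` as inclusive and `(`/`)` as exclusive are assumptions made for illustration.

```java
// Hypothetical parser for the range strings quoted in the description.
final class NumericRangeSketch {
    final Double lower;            // null means unbounded ("*")
    final Double upper;            // null means unbounded ("*")
    final boolean lowerInclusive;  // '[' is read as inclusive, '(' as exclusive
    final boolean upperInclusive;  // ']' is read as inclusive, ')' as exclusive

    private NumericRangeSketch(Double lower, Double upper,
                               boolean lowerInclusive, boolean upperInclusive) {
        this.lower = lower;
        this.upper = upper;
        this.lowerInclusive = lowerInclusive;
        this.upperInclusive = upperInclusive;
    }

    static NumericRangeSketch parse(String text) {
        String s = text.trim();
        if (s.length() < 2) {
            throw new IllegalArgumentException("Not a range: " + text);
        }
        char open = s.charAt(0);
        char close = s.charAt(s.length() - 1);
        if ((open != '[' && open != '(') || (close != ']' && close != ')')) {
            throw new IllegalArgumentException("Bad range delimiters: " + text);
        }
        String body = s.substring(1, s.length() - 1);
        int sep = body.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("Missing '..' separator: " + text);
        }
        return new NumericRangeSketch(bound(body.substring(0, sep)),
                                      bound(body.substring(sep + 2)),
                                      open == '[', close == ']');
    }

    // "*" marks an open end; anything else must parse as a number.
    private static Double bound(String token) {
        String t = token.trim();
        return t.equals("*") ? null : Double.valueOf(t);
    }

    boolean contains(double value) {
        if (lower != null && (lowerInclusive ? value < lower : value <= lower)) {
            return false;
        }
        if (upper != null && (upperInclusive ? value > upper : value >= upper)) {
            return false;
        }
        return true;
    }
}
```

A real integration would pick the numeric type from the per-field configuration rather than assuming double, which is the part the issue proposes to add.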


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-20 Thread Ryan McKinley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052273#comment-13052273
 ] 

Ryan McKinley commented on SOLR-2399:
-

Right now this only works with 4.0.

Once all the kinks are worked out -- and the things it depends on are ported to 
3.x, this will likely also get ported to 3.x

> Solr Admin Interface, reworked
> --
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
> SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, 
> SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
> SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
> Interface.* [Based on this 
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> *Features:*
> * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
> * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
> * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
> * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
> SOLR-2400)
> * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
> * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
> * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
> * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
> * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
> * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
> ** Stub (using static data)
> Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
> I've quickly created a Github-Repository (Just for me, to keep track of the 
> changes)
> » https://github.com/steffkes/solr-admin


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-20 Thread Young Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052268#comment-13052268
 ] 

Young Kim commented on SOLR-2399:
-

Out of curiosity, is this compatible with 3.2.0 or just 4.0? I've followed 
the instructions (with 3.2.0 of course), and I keep hitting a build failure.

> Solr Admin Interface, reworked
> --
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
> SOLR-2399-110606.patch, SOLR-2399-admin-interface.patch, 
> SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, 
> SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
> Interface.* [Based on this 
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> *Features:*
> * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
> * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
> * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
> * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
> SOLR-2400)
> * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
> * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
> * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
> * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
> * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
> * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
> ** Stub (using static data)
> Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
> I've quickly created a Github-Repository (Just for me, to keep track of the 
> changes)
> » https://github.com/steffkes/solr-admin


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052252#comment-13052252
 ] 

Robert Muir commented on SOLR-2452:
---

bq. So I hope to have this issue resolved this week.

Really? That's awesome!

Worst case, some of those top-level targets could literally be 'put back', 
probably with minimal modifications.
My idea in temporarily nuking them was to try to start over, extending Lucene's 
build system, as otherwise I got lost in all the XML.

> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Commented] (LUCENE-2341) explore morfologik integration


[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052251#comment-13052251
 ] 

Robert Muir commented on LUCENE-2341:
-

Sorry, in my second comment I was confusing this with the stuff you have for 
the morfologik jar itself, which is correct :)

What I should have said was: I think we should include this information in the 
top-level modules/analysis/LICENSE.txt and modules/analysis/NOTICE.txt





> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052250#comment-13052250
 ] 

Steven Rowe commented on SOLR-2452:
---

bq. However, I think I would recommend thinking about when you want to make the 
change: it will make merging code up to this branch nearly impossible... is it 
holding back other changes or is this a final step?

It's not a final step.  All of the targets you removed need to be put back (I 
counted 40 or so).  But I think this will be a minor amount of work 
comparatively.

I think for the moment I'll keep iterating on the patch, rather than committing 
it to the branch, to minimize merge costs, until I have all of the Solr targets 
re-implemented.  I don't think it'll take too long, maybe another day or two.

Once that's done, I'll commit the moves/copies from the shell script and the 
patch, then generate a full patch for review.  Assuming there are no objections 
then, I plan to commit within a day or so to minimize merge costs.

So I hope to have this issue resolved this week.


> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Updated] (SOLR-2452) rewrite solr build system


 [ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2452:
--

Fix Version/s: 4.0

> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Assigned] (SOLR-2452) rewrite solr build system


 [ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned SOLR-2452:
-

Assignee: Steven Rowe

> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Commented] (LUCENE-2341) explore morfologik integration


[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052246#comment-13052246
 ] 

Robert Muir commented on LUCENE-2341:
-

Hi Michał,

This patch looks great!

I took a quick glance, here are a couple suggestions:
* In the MorfologikFilter, I think we should implement reset(), first calling 
the superclass reset(), then clearing the stemsAcc list. This ensures that all 
of the filter's state is cleared before it is reused. Under normal operations, 
this should not be necessary, but some consumers in Lucene (e.g. 
LimitTokenCountFilter, and some similar code in the Highlighter), will only 
partially consume up to some point, then suddenly stop. By clearing this list 
in reset() we ensure that there is no chance any leftover stems will appear in 
the next stream.
* Because the data is licensed under MPL, I think we should explicitly list a 
hyperlink, if possible, to the source code used in the NOTICE.txt. I saw you 
included some wording in LICENSE.txt, but I think this should only say 'XYZ 
data is under this license', with the actual MPL license text. In the 
NOTICE.txt we should link to the source code, I think... there is some more 
information on this under the section Category B: Reciprocal Licenses at 
http://www.apache.org/legal/3party.html
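
A minimal, stand-alone sketch of the reset() suggestion above. It deliberately
does not use Lucene's TokenFilter API: ToyStemFilter, setInput() and
lookupStems() are hypothetical stand-ins, and only the stemsAcc name is taken
from the comment. It shows how a consumer that stops early leaves a stem
buffered, and how clearing the buffer in reset() keeps it out of the next
stream.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for a stemming filter that may emit several stems per token. */
class ToyStemFilter {
    private Iterator<String> input;
    // Stems of the current token that have not been returned yet.
    private final Deque<String> stemsAcc = new ArrayDeque<>();

    /** Points the filter at a new token stream (hypothetical helper). */
    void setInput(List<String> tokens) {
        this.input = tokens.iterator();
    }

    /** Returns the next stem, or null when the stream is exhausted. */
    String next() {
        if (stemsAcc.isEmpty()) {
            if (input == null || !input.hasNext()) {
                return null;
            }
            stemsAcc.addAll(lookupStems(input.next()));
        }
        return stemsAcc.poll();
    }

    /** Clears per-stream state so leftover stems cannot leak into the next stream. */
    void reset() {
        stemsAcc.clear();
    }

    // Stub dictionary lookup: every token yields two fake stems.
    private static List<String> lookupStems(String token) {
        return Arrays.asList(token + "#1", token + "#2");
    }
}
```

If reset() did not clear stemsAcc, a consumer that read only the first stem of
one stream would see the second stem of that old token at the start of the
next stream.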


> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.


[jira] [Updated] (LUCENE-2341) explore morfologik integration

2011-06-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Dybizbański updated LUCENE-2341:
---

Attachment: morfologik-stemming-1.5.0.jar
LUCENE-2341.diff

Hi

This patch introduces a stemming filter and analyzer that use the [Morfologik 
library|http://morfologik.blogspot.com], developed by Dawid Weiss and Marcin 
Miłkowski.
Tokens are stemmed by Morfologik with a dictionary, and the current distribution 
provides a dictionary for the Polish language.

The MorfologikFilter yields one or more terms for each token. Each of those 
terms is given the same position in the index.

I'm attaching a binary distribution of the library 
(morfologik-stemming-1.5.0.jar), which needs to be placed in the 
modules/analysis/morfologik/lib/ subdirectory.
It is also available as a [Maven 
artifact|http://mvnrepository.com/artifact/org.carrot2/morfologik-stemming/1.5.0].

The library is BSD-licensed and the dictionary uses data from [Polish dictionary 
for aspell/ispell/myspell (SJP.PL)|http://www.sjp.pl/slownik/en/], which is 
licensed under GPL, LGPL, MPL and CC SA licenses.

This is my first contribution to the Lucene project, so please be forgiving :)
Thanks to Dawid for help.

Regards,
  Michał


> explore morfologik integration
> --
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option 
> for users.


[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API


[ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052237#comment-13052237
 ] 

Michael McCandless commented on LUCENE-2548:


Whoops, my comment was just saying that both the == and != cases weren't 
always caught by PMD/findbugs.  But maybe I somehow messed up running them!

> Remove all interning of field names from flex API
> -
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2548.patch, LUCENE-2548.patch
>
>
> In previous versions of Lucene, interning of fields was important to minimize 
> string comparison cost when iterating TermEnums, to detect changes in field 
> name. As we separated field names from terms in flex, no query compares field 
> names anymore, so the whole performance problematic interning can be removed. 
> I will start with doing this, but we need to carefully review some places 
> e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) 
> Robert?


[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API

2011-06-20 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052236#comment-13052236
 ] 

Uwe Schindler commented on LUCENE-2548:
---

bq. Can you explain shortly what "Unable to render embedded object: File" has 
to do with interning?

That was just a JIRA formatting issue in Mike's comment I was referring to.

> Remove all interning of field names from flex API
> -
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2548.patch, LUCENE-2548.patch
>
>
> In previous versions of Lucene, interning of fields was important to minimize 
> string comparison cost when iterating TermEnums, to detect changes in field 
> name. As we separated field names from terms in flex, no query compares field 
> names anymore, so the whole performance problematic interning can be removed. 
> I will start with doing this, but we need to carefully review some places 
> e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) 
> Robert?


[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-20 Thread Mark Harwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052223#comment-13052223
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. prevSetBit is called for each child doc

You could call nextSetBit on the first child to know the "safe" range of child 
docs attributable to the same parent but you would be taking a gamble that this 
was worth the call i.e. there were many possible children per parent to be 
tested.

bq. It uses 2 passes if you also want to collect child docs per parent

I tend to work with distributed indexes, so it involves a 2-pass operation 
anyway: one pass to find the best parents across the multiple shards first, 
then the PerParentLimitedQuery to ensure we only pay the retrieval costs for 
those parents that make the final cut.

bq. I think it should use a PQ to find the lowest child to evict per parent doc?

Careful object reuse would need to be factored in to avoid excessive GC: each 
parent would fill a PQ full of child-match object instances that could/should 
be reused in assessing the next parent.
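
A minimal sketch of the nextSetBit idea above, using java.util.BitSet rather
than Lucene's own bit set classes. It assumes parent docs are flagged in a bit
set and that each parent's doc id precedes its children (inferred from the use
of prevSetBit); ParentRangeSketch and blockOf() are hypothetical names.

```java
import java.util.BitSet;

class ParentRangeSketch {
    /**
     * Returns {parentDoc, endExclusive} for the block containing childDoc.
     * Every doc id in (parentDoc, endExclusive) belongs to the same parent,
     * so the parent lookup does not have to be repeated inside that range.
     */
    static int[] blockOf(BitSet parents, int childDoc, int maxDoc) {
        int parent = parents.previousSetBit(childDoc); // nearest parent at or before the child
        int next = parents.nextSetBit(childDoc);       // next parent after the child, or -1
        int end = (next == -1) ? maxDoc : next;
        return new int[] { parent, end };
    }
}
```

Whether the extra nextSetBit call pays off depends, as noted above, on how
many children per parent are actually tested.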



> Nested Document query support
> -
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 3.0.2
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
> LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene


[jira] [Commented] (LUCENE-3218) Make CFS appendable


[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1305#comment-1305
 ] 

Michael McCandless commented on LUCENE-3218:


Patch looks cool!

So the CFW will take the first output opened against it and let it write
directly into the "actual" CFS file, and then if another file is
opened while that first one is still open, the 2nd file will write to
a separate file and then will copy in on close.  We may want to delegate
the separate files too?  So that on close they copy themselves into
the CFS and remove the original?  This way IW won't have to separately
create the CFS in the end.
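
A toy, in-memory sketch of the behaviour described above; it is not the patch
and not Lucene's CompoundFileWriter. ToyCompoundWriter, Entry, createOutput()
and the pending list are hypothetical, and deferring the copy of a separate
entry until no direct writer is open is an assumption made here so that bytes
of different entries never interleave.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of an appendable compound file. */
class ToyCompoundWriter {
    private final ByteArrayOutputStream compound = new ByteArrayOutputStream();
    private final List<byte[]> pending = new ArrayList<>(); // closed separate entries awaiting copy
    private Entry currentOutput; // entry currently writing straight into the compound data
    private int openEntries;

    /** Opens an entry; it writes directly only if no other direct entry is open. */
    Entry createOutput() {
        boolean direct = (currentOutput == null);
        Entry e = new Entry(direct);
        if (direct) {
            currentOutput = e;
        }
        openEntries++;
        return e;
    }

    /** Returns the compound bytes; all entries must have been closed first. */
    byte[] close() {
        if (openEntries != 0) {
            throw new IllegalStateException("entries still open: " + openEntries);
        }
        return compound.toByteArray();
    }

    private void flushPending() {
        for (byte[] data : pending) {
            compound.write(data, 0, data.length);
        }
        pending.clear();
    }

    final class Entry {
        private final ByteArrayOutputStream separate; // null when writing directly
        private boolean closed;

        private Entry(boolean direct) {
            this.separate = direct ? null : new ByteArrayOutputStream();
        }

        void write(byte[] data) {
            if (closed) {
                throw new IllegalStateException("entry already closed");
            }
            (separate == null ? compound : separate).write(data, 0, data.length);
        }

        void close() {
            if (closed) {
                return;
            }
            closed = true;
            openEntries--;
            if (separate == null) {
                currentOutput = null; // free the direct slot
            } else {
                pending.add(separate.toByteArray());
            }
            // Copy buffered entries only while nobody is appending directly.
            if (currentOutput == null) {
                flushPending();
            }
        }
    }
}
```

In this model only the separately buffered entries pay the extra copy, which is
why the order in which files are opened matters for the IO saved.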

Somehow we need IW to add the biggest sub-file first...

s/compund/compound

CFW.close should assert currentOutput != null (and, if we delegate sep
entries, that they are also all closed)?

You might need to sync the CompoundFileWriter.this.currentOutput test
/ setting to null?  Though... Lucene is always single threaded in
writing files for the same segment, today anyway.

Can we make a separate createCompoundOutput?  (I.e., instead of passing
OpenMode to openCompoundInput).  And: I'm assuming a given compound
output can only be opened once, appended to / separate files copied
into, closed and then never opened again for writing?  (I.e., still
"write once" at the file level).


> Make CFS appendable  
> -
>
> Key: LUCENE-3218
> URL: https://issues.apache.org/jira/browse/LUCENE-3218
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3218.patch
>
>
> Currently the CFS is created once all files are written during a flush / 
> merge. Once on disk, the files are copied into the CFS format, which is 
> basically unnecessary for some of the files. We can at any time write at 
> least one file directly into the CFS, which can save a reasonable amount of 
> IO. For instance, stored fields could be written directly during indexing, 
> and during a Codec flush one of the written files can be appended directly. 
> This optimization is a nice side effect for Lucene indexing itself, but more 
> importantly, for DocValues and LUCENE-3216 we could transparently pack 
> per-field files into a single file only for docvalues without changing any 
> code once LUCENE-3216 is resolved.


[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8953 - Failure

2011-06-20 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8953/

8 tests failed.
FAILED:  org.apache.lucene.util.automaton.TestMinimize.testElements

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.
at java.lang.Thread.run(Thread.java:636)


REGRESSION:  org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-107: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/7/test5658056595tmp/_g_5.pyl (Too many open files in system)

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-107: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/7/test5658056595tmp/_g_5.pyl (Too many open files in system)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/7/test5658056595tmp/_g_5.pyl (Too many open files in system)
at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)


REGRESSION:  org.apache.lucene.index.TestStressIndexing2.testRandomIWReader

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test4023440248tmp/_c_1.tib (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test4023440248tmp/_c_1.tib (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:416)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293)
at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:375)
at org.apache.lucene.index.codecs.BlockTermsWriter.<init>(BlockTermsWriter.java:75)
at org.apache.lucene.index.codecs.mockrandom.MockRandomCodec.fieldsConsumer(MockRandomCodec.java:226)
at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:73)
at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:61)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:565)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3466)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3110)
at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1877)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1872)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1868)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:287)
at org.apache.lucene.index.TestStressIndexing2.testRandomIWReader(TestStressIndexing2.java:67)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)


REGRESSION:  org.apache.lucene.search.TestBooleanOr.testElements

Error Message:
org/apache/lucene/search/MatchAllDocsQuery$MatchAllScorer

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/search/MatchAllDocsQuery$MatchAllScorer
at org.apache.lucene.search.MatchAllDocsQuery.createWeight(MatchAllDocsQuery.java:153)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676)
at org.apache.lucene.search.QueryWrapperFilter.getDocIdSet(QueryWrapperFilter.java:55)
at org.apache.lucene.index.BufferedDeletesStream.applyQueryDeletes(BufferedDeletesStream.java:441)
at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:281)
at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2836)
at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2827)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2803)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2772)
at org.apache.lu

[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API

2011-06-20 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052215#comment-13052215
 ] 

Uwe Schindler commented on LUCENE-2548:
---

Hi Mike,

patch looks great, thanks for doing this hard work :-) PreFlexCodec looks fine, 
see no problems there. Lucene code iterating TermsEnums was successfully 
cleaned up (the lovely MTQs) from T.createTerm and equals added at some places.

I cannot check if there are comparisons missing. I wonder why PMD/Findbugs 
does not find all occurrences; maybe because some SuppressWarnings are 
also hiding those occurrences? Can you explain shortly what "Unable to render 
embedded object: File" has to do with interning?
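
For reference, a small sketch of why the == / != comparisons matter once field
names are no longer interned. FieldNameCompareSketch and its two methods are
hypothetical names for illustration, not code from the patch.

```java
class FieldNameCompareSketch {
    /** Identity comparison: only reliable if both strings were interned. */
    static boolean sameByIdentity(String a, String b) {
        return a == b;
    }

    /** Value comparison: works whether or not the strings are interned. */
    static boolean sameByValue(String a, String b) {
        return a.equals(b);
    }
}
```

A field name built at runtime (for example with new String("title")) is equal
to the literal "title" by value but not by identity, so any leftover ==
comparison would silently stop matching once interning is removed.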

Solr code is fine, I expected more to change. Some places in Solr still seem 
to use some "placeholder" terms (called idTerm and other names). We should 
maybe check if they are only field names in reality?

GREAT WORK! I AM SO HAPPY, dumdidumm...!

> Remove all interning of field names from flex API
> -
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2548.patch, LUCENE-2548.patch
>
>
> In previous versions of Lucene, interning of fields was important to minimize 
> string comparison cost when iterating TermEnums, to detect changes in field 
> name. As we separated field names from terms in flex, no query compares field 
> names anymore, so the whole performance problematic interning can be removed. 
> I will start with doing this, but we need to carefully review some places 
> e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) 
> Robert?


[jira] [Commented] (LUCENE-2454) Nested Document query support


[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052194#comment-13052194
 ] 

Michael McCandless commented on LUCENE-2454:


bq. Would modules/grouping meanwhile be a better place for this than 
lucene/contrib/queries?

I think modules/join is the right place?  When we factor out Solr's
generic join impl it can go there too...

I have some concerns about the current approach here (this is why I
opened LUCENE-3171):

  * prevSetBit is called for each child doc, which is an O(N^2) cost
(N = number of child docs for one parent) I think?  Admittedly,
"typically" N is probably small...

  * It uses 2 passes if you also want to collect child docs per
parent

  * PerParentLimitedQuery is also O(N^2) cost, both on insert of a new
child and on popping the child docs per group: I think it should
use a PQ to find the lowest child to evict per parent doc?

  * I think "typically" an app will want to collect the top N groups
(parent docs and their children), so it's more efficient to gather
those top N and only in the end sort each set of children
per-parent?  (This is similar to how the 2nd pass grouping collector
works).

  * PerParentLimitedQuery only supports relevance sort w/in each
parent.

  * You don't get the parent/child structure back, from
PerParentLimitedQuery (but now we have TopGroups which is a great
match for representing each parent and its children).

If you always only use PerParentLimitedQuery on the top parents from
the first pass, eg you AND/filter it against those parent docs, then
the O(N^2) cost is less severe since it'll have a small constant in
front, but since it's a Query I imagine users will use it w/o that
filter, which is bad... I think using a TopN Collector is a better match
here.
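
A minimal sketch of the PQ idea from the list above, using
java.util.PriorityQueue as a bounded min-heap instead of Lucene's own
PriorityQueue class. TopChildrenSketch and topN() are hypothetical names, and
plain float scores stand in for child matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopChildrenSketch {
    /** Keeps the n highest scores of one parent, evicting the lowest in O(log n). */
    static List<Float> topN(float[] childScores, int n) {
        List<Float> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the head is always the lowest score currently kept.
        PriorityQueue<Float> pq = new PriorityQueue<>();
        for (float score : childScores) {
            if (pq.size() < n) {
                pq.add(score);
            } else if (score > pq.peek()) {
                pq.poll(); // evict the current lowest child
                pq.add(score);
            }
        }
        result.addAll(pq);
        // Sort only once at the end, best score first.
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }
}
```

Each insert costs O(log n) rather than a scan over the kept children. The
sketch boxes every score, so it does not address the object-reuse concern
raised elsewhere in this thread.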


> Nested Document query support
> -
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 3.0.2
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
> LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene


[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API

2011-06-20 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052184#comment-13052184
 ] 

Uwe Schindler commented on LUCENE-2548:
---

Yupee Juhee. I was on a business trip the whole day. Insane! Will review soon!

> Remove all interning of field names from flex API
> -
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2548.patch, LUCENE-2548.patch
>
>
> In previous versions of Lucene, interning of field names was important to minimize
> string comparison cost when iterating TermEnums, to detect changes in field
> name. As we separated field names from terms in flex, no query compares field
> names anymore, so the performance-problematic interning can be removed entirely.
> I will start with doing this, but we need to carefully review some places,
> e.g. in the preflex codec.
> Maybe before this issue we should remove the Term class completely. :-)
> Robert?
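
A small illustration of why interning mattered for field-name comparison, as a hedged sketch in plain Java. FieldNameCompare and its methods are hypothetical names made up for this note and are not part of Lucene. Identity comparison (==) is cheap but only correct when every name has been interned; equals() is correct for any strings.

```java
// Illustrative only: FieldNameCompare is a hypothetical class, not part of Lucene.
final class FieldNameCompare {
    // Identity comparison: only safe when both names are known to be interned.
    static boolean sameFieldInterned(String a, String b) {
        return a == b;
    }

    // Value comparison: correct for any strings, no interning required.
    static boolean sameField(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "title";             // string literals are interned by the JVM
        String built = new String("title");   // distinct object with equal contents
        System.out.println(sameFieldInterned(literal, built));          // false
        System.out.println(sameFieldInterned(literal, built.intern())); // true
        System.out.println(sameField(literal, built));                  // true
    }
}
```

Once nothing compares field names by identity, the intern() calls buy nothing, which appears to be the point of the issue.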


[jira] [Updated] (LUCENE-2548) Remove all interning of field names from flex API


 [ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2548:
---

Attachment: LUCENE-2548.patch

I agree -- I removed createTerm!

And fixed the nocommits

Beast chewed on this for a while and didn't hit any failures except various 
Solr tests that still intermittently fail... I think it's ready!



REMINDER: Participation Requested: Survey about Open-Source Software Development

2011-06-20 Thread Jeffrey Carver

Hi,

Apologies for any inconvenience and thank you to those who have already
completed the survey. We will keep the survey open for another couple of
weeks. But, we do hope you will consider responding to the email request
below (sent 2 weeks ago).

Thanks,

Dr. Jeffrey Carver
Assistant Professor
University of Alabama
(v) 205-348-9829  (f) 205-348-0219
http://www.cs.ua.edu/~carver

-Original Message-
From: Jeffrey Carver [mailto:opensourcesur...@cs.ua.edu] 
Sent: Monday, June 13, 2011 11:27 AM
To: 'dev@lucene.apache.org'
Subject: Participation Requested: Survey about Open-Source Software
Development

Hi,

Drs. Jeffrey Carver, Rosanna Guadagno, Debra McCallum, and Mr. Amiangshu
Bosu,  University of Alabama, and Dr. Lorin Hochstein, University of
Southern California, are conducting a survey of open-source software
developers. This survey seeks to understand how developers on distributed,
virtual teams, like open-source projects, interact with each other to
accomplish their tasks. You must be at least 19 years of age to complete the
survey. The survey should take approximately 15 minutes to complete.

If you are actively participating as a developer, please consider completing
our survey.
 
Here is the link to the survey:   http://goo.gl/HQnux

We apologize for any inconvenience, and for any duplicate copies of this
email you may receive. This survey has been approved by The University of Alabama IRB board.

Thanks,

Dr. Jeffrey Carver
Assistant Professor
University of Alabama
(v) 205-348-9829  (f) 205-348-0219
http://www.cs.ua.edu/~carver




[jira] [Created] (LUCENE-3222) Buffered deletes under count RAM

Buffered deletes under count RAM


 Key: LUCENE-3222
 URL: https://issues.apache.org/jira/browse/LUCENE-3222
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0


I found this while working on LUCENE-2548: when we freeze the deletes (create 
FrozenBufferedDeletes), when we set the bytesUsed we are failing to account for 
RAM required for the term bytes (and now term field).
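
To make the under-counting concrete, here is a hedged sketch in plain Java. DeleteRamEstimate, its methods and the 32-byte per-entry constant are all assumptions invented for illustration; they are not the FrozenBufferedDeletes code or Lucene's real constants. The sketch only contrasts charging a fixed cost per delete term with also charging the variable-length term bytes and field name.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch; the names and the per-entry constant are assumptions, not Lucene's values.
final class DeleteRamEstimate {
    // Assumed fixed overhead per buffered delete term (object headers, references).
    static final long BYTES_PER_ENTRY = 32;

    // Under-counting version: only the fixed overhead is charged.
    static long fixedOnly(List<String[]> terms) {
        return BYTES_PER_ENTRY * terms.size();
    }

    // Also charges the variable-length parts: UTF-8 term bytes and field name chars.
    // Each element of terms is a {field, text} pair.
    static long withTermBytes(List<String[]> terms) {
        long total = 0;
        for (String[] fieldAndText : terms) {
            String field = fieldAndText[0];
            String text = fieldAndText[1];
            total += BYTES_PER_ENTRY;
            total += text.getBytes(StandardCharsets.UTF_8).length; // term bytes
            total += 2L * field.length();                           // field name, assuming 2 bytes per char
        }
        return total;
    }
}
```

The gap between the two estimates grows with term length, so long delete terms would be where the under-count hurts most.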


[jira] [Commented] (LUCENE-3201) improved compound file handling


[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052159#comment-13052159
 ] 

Robert Muir commented on LUCENE-3201:
-

I didn't commit because I didn't measure any performance improvements from the 
patch (this frustrated me).
Also, I didn't address Uwe's last comment...

In general, I was thinking that this would be a good performance win, but it 
isn't. So we should consider it from a refactoring perspective only.


> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements. I see the following
> problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for
> directories like mmap.
> * it seeks on every readInternal
> * it's not possible for a directory to override or improve the handling of
> compound files.
> For example: it seems if you were impl'ing this thing from scratch, you would
> just wrap the II directly (not extend BufferedIndexInput),
> add compound file offset X to seek() calls, and override length(). But of
> course, then you couldn't always throw read past EOF when you should,
> as a user could read into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory
> could return an IndexInput that simply mmaps the 'slice' of the CFS file.
> Its underlying ByteBuffer naturally does bounds checks already, so it
> wouldn't need to be buffered, and wouldn't even need to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize
> how compound files are handled. The simplest
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact that compound files are implemented as a
> Directory and transparent, at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it
> expert+internal+experimental or whatever.
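
The 'slice' idea from the description can be sketched with a plain java.nio.ByteBuffer standing in for the mmapped CFS file. This is only an illustration under that assumption: SliceInput is a hypothetical class, not Lucene's IndexInput or anything from the patch. It shows that a sliced buffer needs no offset arithmetic in seek() and already refuses to read past the end of its sub-file.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a "slice" view over one sub-file of a compound file.
// SliceInput is not a Lucene class; a ByteBuffer stands in for the mmapped CFS file.
final class SliceInput {
    private final ByteBuffer slice; // covers only [offset, offset + length) of the parent

    SliceInput(ByteBuffer compoundFile, int offset, int length) {
        ByteBuffer dup = compoundFile.duplicate(); // independent position/limit, shared bytes
        dup.clear();                               // position = 0, limit = capacity
        dup.limit(offset + length);
        dup.position(offset);
        this.slice = dup.slice();                  // position 0 now maps to `offset`
    }

    long length() {
        return slice.capacity();
    }

    void seek(int pos) {
        slice.position(pos); // no offset added; throws IllegalArgumentException if out of range
    }

    byte readByte() {
        return slice.get();  // throws BufferUnderflowException at the end of the slice
    }
}
```

Because the bounds come from the buffer itself, a reader of one sub-file cannot silently run into the next one, which is the failure mode the description worries about for a naive wrapper.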


[jira] [Commented] (LUCENE-3201) improved compound file handling


[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052156#comment-13052156
 ] 

Simon Willnauer commented on LUCENE-3201:
-

this seems ready to commit... I think we should get that in so I can take it 
further on LUCENE-3218

Robert, is it OK for you if I commit this, or are you going to do it?

simon



[jira] [Updated] (LUCENE-3218) Make CFS appendable


 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3218:


Attachment: LUCENE-3218.patch

First sketch, still with some nocommits. This patch includes the latest patch from 
LUCENE-3201, which made the CFS part of directory, and adds write support 
to the CompoundFileDirectory. The CFWriter tries to write files directly to the 
CFS when possible: if no other file is currently open for writing, it opens 
a stream directly on the CFS. This change also adds a new file to the CFS 
(.cfe) which only holds the entry table, which makes all seeks unneeded (plays 
better with AppendingCodec).

I currently don't use it during indexing since we decide after flush whether we 
use CFS or not. This might change with this optimization, but I will leave that 
to another issue.
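
A rough sketch of why a separate entry table removes the need to seek, written in plain Java with in-memory streams. AppendOnlyCompound is a hypothetical class for illustration, not the CFWriter from the patch, and the on-disk layout of the real .cfe file is not shown here. Sub-files are appended back to back, and their offsets and lengths are recorded on the side instead of being patched into a header afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; AppendOnlyCompound is not the CFWriter from the patch.
final class AppendOnlyCompound {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream(); // stands in for the data file
    private final Map<String, long[]> entries = new LinkedHashMap<>();      // name -> {offset, length}

    // Appends one sub-file; only forward writes happen, so no seek is needed.
    void append(String name, byte[] contents) {
        long offset = data.size();
        data.write(contents, 0, contents.length);
        entries.put(name, new long[] {offset, contents.length});
    }

    // The entry table would be written to its own file once all sub-files are appended.
    Map<String, long[]> entryTable() {
        return entries;
    }

    // Reads a sub-file back using only the entry table.
    byte[] read(String name) {
        long[] e = entries.get(name);
        byte[] out = new byte[(int) e[1]];
        System.arraycopy(data.toByteArray(), (int) e[0], out, 0, out.length);
        return out;
    }
}
```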



> Make CFS appendable  
> -
>
> Key: LUCENE-3218
> URL: https://issues.apache.org/jira/browse/LUCENE-3218
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3218.patch
>
>
> Currently CFS is created once all files are written during a flush / merge.
> Once on disk, the files are copied into the CFS format, which is basically
> unnecessary for some of the files. We can at any time write at least one file
> directly into the CFS, which can save a reasonable amount of IO. For instance,
> stored fields could be written directly during indexing, and during a Codec
> flush one of the written files can be appended directly. This optimization is
> a nice side effect for Lucene indexing itself, but more important for DocValues
> and LUCENE-3216: we could transparently pack per-field files into a single
> file only for docvalues without changing any code once LUCENE-3216 is
> resolved.


[jira] [Commented] (SOLR-2609) Allow arbitrary bbox lat-lon, not limited to circle

2011-06-20 Thread Zac Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052151#comment-13052151
 ] 

Zac Smith commented on SOLR-2609:
-

Thanks David, I have updated this to be a feature request.

> Allow arbitrary bbox lat-lon, not limited to circle
> ---
>
> Key: SOLR-2609
> URL: https://issues.apache.org/jira/browse/SOLR-2609
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 3.1
>Reporter: Zac Smith
>  Labels: spatialsearch
>
> The Spatial Search documentation states that you can create your own bounding 
> box using a range query:
> "Since the LatLonType field also supports field queries and range queries, 
> one can manually create their own bounding box rather than using bbox: 
> ...&q=*:*&fq=store:[45,-94 TO 46,-93]"
> This works unless your range covers an area where longitude goes from 180 to 
> -180. For instance I want all items in the longitude range of 
> 178 to -177 which of course gives no results (it is not a valid numeric 
> range). It's not really surprising that this doesn't work as it is just a 
> standard range query with no spatial filters being applied.
> UPDATE
> Updated issue to be an enhancement, title changed.
> Desired functionality is for bbox to accept coordinate parameters for an 
> arbitrary size bounding box. The bbox should take into account the prime 
> meridians, in particular the 180th meridian.
> Documentation also needs to be updated to remove incorrect query example.
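
The wraparound problem in the description can be sketched in plain Java. LonRanges is a hypothetical helper, not a Solr API and not what any eventual patch does. It assumes longitudes in [-180, 180] and treats a box whose west edge is numerically greater than its east edge (178 to -177) as crossing the 180th meridian, splitting it into two ordinary numeric ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Solr API: splits a longitude interval at the 180th meridian.
final class LonRanges {
    // Returns one {min, max} range, or two when west > east (the box wraps past 180/-180).
    static List<double[]> split(double westLon, double eastLon) {
        List<double[]> ranges = new ArrayList<>();
        if (westLon <= eastLon) {
            ranges.add(new double[] {westLon, eastLon});
        } else {
            ranges.add(new double[] {westLon, 180.0});
            ranges.add(new double[] {-180.0, eastLon});
        }
        return ranges;
    }

    // True if lon falls inside the (possibly wrapping) interval.
    static boolean contains(double westLon, double eastLon, double lon) {
        for (double[] r : split(westLon, eastLon)) {
            if (lon >= r[0] && lon <= r[1]) {
                return true;
            }
        }
        return false;
    }
}
```

Each resulting range is a valid numeric range on its own, so in principle two range filters could be combined with OR; whether that is practical in a given Solr setup is not tested here.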


[jira] [Updated] (SOLR-2609) Allow arbitrary bbox lat-lon, not limited to circle

2011-06-20 Thread Zac Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zac Smith updated SOLR-2609:


Description: 
The Spatial Search documentation states that you can create your own bounding 
box using a range query:
"Since the LatLonType field also supports field queries and range queries, one 
can manually create their own bounding box rather than using bbox: 
...&q=*:*&fq=store:[45,-94 TO 46,-93]"

This works unless your range covers an area where longitude goes from 180 to 
-180. For instance I want all items in the longitude range of 
178 to -177 which of course gives no results (it is not a valid numeric range). 
It's not really surprising that this doesn't work as it is just a standard 
range query with no spatial filters being applied.

UPDATE
Updated issue to be an enhancement, title changed.

Desired functionality is for bbox to accept coordinate parameters for an 
arbitrary size bounding box. The bbox should take into account the prime 
meridians, in particular the 180th meridian.
Documentation also needs to be updated to remove incorrect query example.

  was:
The Spatial Search documentation states that you can create your own bounding 
box using a range query:
"Since the LatLonType field also supports field queries and range queries, one 
can manually create their own bounding box rather than using bbox: 
...&q=*:*&fq=store:[45,-94 TO 46,-93]"

This works unless your range covers an area where longitude goes from 180 to 
-180. For instance I want all items in the longitude range of 
178 to -177 which of course gives no results (it is not a valid numeric range). 
It's not really surprising that this doesn't work as it is just a standard 
range query with no spatial filters being applied.

I am wondering if this is just an issue with the documentation and there is 
another way that this should be done? Please advise if more details are needed.

 Issue Type: Improvement  (was: Bug)
Summary: Allow arbitrary bbox lat-lon, not limited to circle  (was: 
Coordinate range queries do not work with Spatial Solr)



[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum


[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052137#comment-13052137
 ] 

Simon Willnauer commented on LUCENE-3219:
-

chris, patch looks good...

some minor comments:

* I wonder if a parser could hold a Type, so we could get rid of the 
if (parser instanceof FieldCache.$Parser) checks?
* In SearchWithSortTask, I wonder if you could simply call 
Type.valueOf(typeString.toUpperCase()); the less code the better :)
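
The second suggestion can be sketched as follows. The Type enum here is a stand-in with assumed constant names, not the enum from the patch, and SortTypeParse is a hypothetical wrapper. Passing Locale.ROOT is an addition beyond the suggestion: without it, toUpperCase() depends on the default locale, and under a Turkish locale "int" would not upper-case to "INT".

```java
import java.util.Locale;

// Hypothetical sketch: this Type enum is a stand-in, not the enum from the patch.
final class SortTypeParse {
    enum Type { SCORE, DOC, STRING, INT, FLOAT, LONG, DOUBLE }

    // Maps a user-supplied name such as "int" onto the matching constant.
    // Enum.valueOf throws IllegalArgumentException for unknown names.
    static Type parse(String typeString) {
        return Type.valueOf(typeString.toUpperCase(Locale.ROOT));
    }
}
```

The trade-off is that unknown names now surface as IllegalArgumentException rather than falling through an if/else chain, so callers may want to catch it and report the bad setting.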

overall looks good

simon

> Change SortField types to an Enum
> -
>
> Key: LUCENE-3219
> URL: https://issues.apache.org/jira/browse/LUCENE-3219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch
>
>
> When updating my SOLR-2533 patch, one issue was that the int value I had 
> given my new type had been used by another change in the mean time.  Since we 
> don't use these fields in a bitset kind of way, we can convert them to an 
> enum.


[jira] [Commented] (SOLR-2609) Coordinate range queries do not work with Spatial Solr

2011-06-20 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052136#comment-13052136
 ] 

David Smiley commented on SOLR-2609:


Yes, this should be a feature request for "Allow arbitrary bbox lat-lon, not 
limited to circle".  Under the hood, I recall the first order of business is 
resolving the point-radius to a bounding box. At that point the special 
prime-meridian logic is handled. It seems it would not be hard to make a patch 
that adds new parameters for explicit lat-lon bbox params.

> Coordinate range queries do not work with Spatial Solr
> --
>
> Key: SOLR-2609
> URL: https://issues.apache.org/jira/browse/SOLR-2609
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.1
>Reporter: Zac Smith
>  Labels: spatialsearch
>
> The Spatial Search documentation states that you can create your own bounding 
> box using a range query:
> "Since the LatLonType field also supports field queries and range queries, 
> one can manually create their own bounding box rather than using bbox: 
> ...&q=*:*&fq=store:[45,-94 TO 46,-93]"
> This works unless your range covers an area where longitude goes from 180 to 
> -180. For instance I want all items in the longitude range of 
> 178 to -177 which of course gives no results (it is not a valid numeric 
> range). It's not really surprising that this doesn't work as it is just a 
> standard range query with no spatial filters being applied.
> I am wondering if this is just an issue with the documentation and there is 
> another way that this should be done? Please advise if more details are 
> needed.


[jira] [Commented] (SOLR-2524) Adding grouping to Solr 3x


[ 
https://issues.apache.org/jira/browse/SOLR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052114#comment-13052114
 ] 

Michael McCandless commented on SOLR-2524:
--

bq. Question: Does this support the option of getting facet counts after 
grouping? I am getting lost in all the issues

I don't think it does.  For that we need LUCENE-3097, which I think (?) is 
close.

> Adding grouping to Solr 3x
> --
>
> Key: SOLR-2524
> URL: https://issues.apache.org/jira/browse/SOLR-2524
> Project: Solr
>  Issue Type: New Feature
>Reporter: Martijn van Groningen
>Assignee: Martijn van Groningen
> Fix For: 3.3
>
> Attachments: SOLR-2524.patch, SOLR-2524.patch, SOLR-2524.patch, 
> SOLR-2524.patch, SOLR-2524.patch, SOLR-2524.patch
>
>
> Grouping was recently added to Lucene 3x. See LUCENE-1421 for more 
> information.
> I think it would be nice if we expose this functionality also to the Solr 
> users that are bound to a 3.x version.
> The grouping feature added to Lucene is currently a subset of the 
> functionality that Solr 4.0-trunk offers. Mainly it doesn't support grouping 
> by function / query.
> The work involved getting the grouping contrib to work on Solr 3x is 
> acceptable. I have it more or less running here. It supports the response 
> format and request parameters (except: group.query and group.func) described 
> in the FieldCollapse page on the Solr wiki.
> I think it would be great if this is included in the Solr 3.2 release. Many 
> people are using grouping as patch now and this would help them a lot. Any 
> thoughts?


[jira] [Resolved] (SOLR-236) Field collapsing


 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved SOLR-236.
-

Resolution: Duplicate

Resolving this long issue as a duplicate of SOLR-2524, which brings grouping 
(finally!) to Solr 3.x via the new (factored out from Solr's trunk grouping 
impl then backported to 3.x) grouping module.

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: 3.3
>
> Attachments: DocSetScoreCollector.java, 
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
> SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, 
> SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, 
> collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, 
> collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> quasidistributed.additional.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
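
One possible reading of the three parameters, sketched in plain Java. CollapseSketch is hypothetical and is not the patch's implementation; in particular, the interpretation of "collapse.max" as "keep at most this many results per value" and of "adjacent" as "count only within consecutive runs" is an assumption. The input list stands for the collapse.field value of each result, in ranked order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the idea, not the patch's implementation.
final class CollapseSketch {
    // Keeps at most `max` results per field value. With adjacent == true the count
    // restarts whenever the value changes, so only consecutive runs are collapsed.
    static List<String> collapse(List<String> fieldValues, int max, boolean adjacent) {
        List<String> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        String previous = null;
        for (String value : fieldValues) {
            if (adjacent && !value.equals(previous)) {
                seen.clear(); // a new run starts
            }
            int count = seen.merge(value, 1, Integer::sum);
            if (count <= max) {
                kept.add(value);
            }
            previous = value;
        }
        return kept;
    }
}
```

For the values a, a, a, b, a with max 1, this keeps a, b in normal mode and a, b, a in adjacent mode.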


Re: Lucene 3.3 release soon?

2011-06-20 Thread Michael McCandless

+1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir  wrote:
> i was planning on doing an RC in a few weeks actually.
>
> we have a lot of good stuff in there today already, however i wanted
> to give a few weeks for the grouping stuff to run on hudson.
>
> On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
>  wrote:
>> I would say within the next 3 months.
>>
>> Thoughts?
>>
>> On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček  wrote:
>>> Hi,
>>> How soon can we expect official Lucene 3.3 release?
>>> Best regards,
>>> Lukas
>>

[jira] [Commented] (SOLR-2609) Coordinate range queries do not work with Spatial Solr

2011-06-20 Thread Zac Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052093#comment-13052093
 ] 

Zac Smith commented on SOLR-2609:
-

It would be really great if there was support for creating arbitrary bounding 
boxes that do work over the 180th meridian.
Should this be changed from a bug to a feature request to that end?



[jira] [Updated] (LUCENE-3219) Change SortField types to an Enum


 [ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3219:
---

Attachment: LUCENE-3219.patch

Even better patch, with the CHANGES entry corrected.



[jira] [Updated] (LUCENE-3219) Change SortField types to an Enum


 [ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3219:
---

Attachment: LUCENE-3219.patch

Patch updated to trunk.  Compiles and tests pass.  

I intend to commit in the next day or so.

> Change SortField types to an Enum
> -
>
> Key: LUCENE-3219
> URL: https://issues.apache.org/jira/browse/LUCENE-3219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3219.patch, LUCENE-3219.patch
>
>
> When updating my SOLR-2533 patch, one issue was that the int value I had 
> given my new type had been used by another change in the mean time.  Since we 
> don't use these fields in a bitset kind of way, we can convert them to an 
> enum.


[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities


[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052052#comment-13052052
 ] 

Robert Muir commented on LUCENE-3220:
-

one last thing, can we do 'numberOfFieldTokens' instead of noFieldTokens? 

then I think we can commit this as a step, should make things a lot easier for 
experimentation, if you are new to lucene it will make life much easier.


> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities


 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Oh, sorry, how lame of me :( Actually I am now working on a different machine 
than the one I usually use, which is why I made those mistakes. Anyhow, I have 
fixed them.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities


[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052032#comment-13052032
 ] 

Robert Muir commented on LUCENE-3220:
-

oh three more nitpicky comments: 
* can you update the patch to use two-spaces instead of tabs? if you use 
eclipse, you can download this and configure this as your default codestyle: 
http://people.apache.org/~rmuir/Eclipse-Lucene-Codestyle.xml
* can you also remove the @author? For legal reasons (i think actually for your 
protection!) we omit these from new files.
* it might be a good idea to use the tag @lucene.experimental also for new 
classes: this is a template that 'ant-javadocs' replaces with "WARNING: This 
API is experimental and might change in incompatible ways in the next release." 
to tell users that it's very new and not to expect precise backwards 
compatibility.
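As a rough illustration of the three points above, a new class following these conventions might look like the sketch below. The class and its fields are hypothetical (they only borrow the naming discussed in this thread) and are not part of the patch; the only thing taken from the comment is the style: two-space indentation, no @author tag, and the @lucene.experimental tag in the class javadoc.

```java
/**
 * Hypothetical holder for collection statistics used by a ranking model.
 *
 * @lucene.experimental
 */
public class ExampleStats {
  private final long numberOfDocuments;
  private final long numberOfFieldTokens;

  public ExampleStats(long numberOfDocuments, long numberOfFieldTokens) {
    this.numberOfDocuments = numberOfDocuments;
    this.numberOfFieldTokens = numberOfFieldTokens;
  }

  /** Returns the average number of tokens per document, or 0 for an empty index. */
  public double averageFieldLength() {
    if (numberOfDocuments == 0) {
      return 0.0;
    }
    return numberOfFieldTokens / (double) numberOfDocuments;
  }

  public static void main(String[] args) {
    System.out.println(new ExampleStats(10, 250).averageFieldLength());
  }
}
```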


> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities


[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052029#comment-13052029
 ] 

Robert Muir commented on LUCENE-3220:
-

bq. I'll put a nocommit there for the time being, and if no sims use it, I'll 
just remove it from the Stats. Terrier has it, though, so I guess there should 
be at least one method that depends on it.

I've never seen one that did... I don't imagine us ever implementing this 
efficiently given that we support incremental indexing.


> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities


 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch, LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities


[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052025#comment-13052025
 ] 

David Mark Nemeskey commented on LUCENE-3220:
-

 * I was wondering about that too -- actually docNo is a mistake, it should 
have been noDocs or noOfDocs anyway, but I guess I'll just go with 
numberOfDocuments.
 * I'll put a nocommit there for the time being, and if no sims use it, I'll 
just remove it from the Stats. Terrier has it, though, so I guess there should 
be at least one method that depends on it.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Commented] (LUCENE-2548) Remove all interning of field names from flex API


[ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052024#comment-13052024
 ] 

Robert Muir commented on LUCENE-2548:
-

is there any reason to keep Term.createTerm() after we do this? seems useless 
after interning is removed.

> Remove all interning of field names from flex API
> -
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2548.patch
>
>
> In previous versions of Lucene, interning of fields was important to minimize 
> string comparison cost when iterating TermEnums, to detect changes in field 
> name. As we separated field names from terms in flex, no query compares field 
> names anymore, so the whole performance problematic interning can be removed. 
> I will start with doing this, but we need to carefully review some places 
> e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) 
> Robert?


[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities


[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052019#comment-13052019
 ] 

Robert Muir commented on LUCENE-3220:
-

a few comments (it generally looks close to me):
* maybe we should use 'numberOfDocuments' instead of 'docNo' and same with 
'numberOfFieldTokens'? this might make the naming more clear
* i'm worried about 'uniqueTermCount', do you know of which implementations 
require this? this number is not accurate if the index has more than one 
segment.
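One plausible reading of the multi-segment concern, sketched with plain Java sets rather than any Lucene API: if each segment only knows its own distinct terms, adding up the per-segment counts counts shared terms more than once, and the exact figure needs the term sets to be merged. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueTermCountSketch {
  /** Sums the per-segment distinct term counts; terms shared between segments are counted repeatedly. */
  public static int sumOfPerSegmentCounts(List<Set<String>> segments) {
    int total = 0;
    for (Set<String> segment : segments) {
      total += segment.size();
    }
    return total;
  }

  /** Computes the exact distinct term count by merging all segments' term sets. */
  public static int exactUniqueCount(List<Set<String>> segments) {
    Set<String> all = new HashSet<>();
    for (Set<String> segment : segments) {
      all.addAll(segment);
    }
    return all.size();
  }

  public static void main(String[] args) {
    // Two toy "segments" that share the term "lucene".
    Set<String> first = new HashSet<>(Arrays.asList("apache", "lucene", "search"));
    Set<String> second = new HashSet<>(Arrays.asList("lucene", "solr"));
    List<Set<String>> segments = Arrays.asList(first, second);
    System.out.println(sumOfPerSegmentCounts(segments)); // 3 + 2
    System.out.println(exactUniqueCount(segments));      // shared term counted once
  }
}
```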


> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Resolved] (SOLR-2611) Typos in /example solrconfig.xml


 [ 
https://issues.apache.org/jira/browse/SOLR-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2611.
---

   Resolution: Fixed
Fix Version/s: 3.3

Thank you Eric!

> Typos in /example solrconfig.xml
> 
>
> Key: SOLR-2611
> URL: https://issues.apache.org/jira/browse/SOLR-2611
> Project: Solr
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.2
>Reporter: Eric Pugh
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: typos.patch
>
>
> I noticed many typos have crept into the example app's Solrconfig.xml.  I 
> will attach a patch.


[jira] [Updated] (SOLR-2611) Typos in /example solrconfig.xml

2011-06-20 Thread Eric Pugh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Pugh updated SOLR-2611:


Attachment: typos.patch

> Typos in /example solrconfig.xml
> 
>
> Key: SOLR-2611
> URL: https://issues.apache.org/jira/browse/SOLR-2611
> Project: Solr
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.2
>Reporter: Eric Pugh
>Priority: Minor
> Fix For: 4.0
>
> Attachments: typos.patch
>
>
> I noticed many typos have crept into the example app's Solrconfig.xml.  I 
> will attach a patch.


[jira] [Created] (SOLR-2611) Typos in /example solrconfig.xml

2011-06-20 Thread Eric Pugh (JIRA)

Typos in /example solrconfig.xml


 Key: SOLR-2611
 URL: https://issues.apache.org/jira/browse/SOLR-2611
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.2
Reporter: Eric Pugh
Priority: Minor
 Fix For: 4.0


I noticed many typos have crept into the example app's Solrconfig.xml.  I will 
attach a patch.


[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-06-20 Thread James Dyer (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052004#comment-13052004
 ] 

James Dyer commented on SOLR-2382:
--

Noble,

I appreciate your interest in this issue!  I could easily move 
BerkleyBackedCache to its own issue.  This would remove any difficulty in 
dealing with the Sleepycat License.  We would still want to maintain the 
SortedMapBackedCache, however.  Otherwise we would lose all caching ability (it 
would break CachedSqlEntityProcessor).

In any case, if your goal is to break this issue into more manageable chunks, 
just offloading BerkleyBackedCache might not be enough.  I had considered 
breaking this up into possibly 3 parts because I realize this is a huge patch.  
But the functionality is all designed to work together and it would have been 
more work for me, etc.

Let me know what you want me to do.  I would love to see this integrated with a 
GA release someday.  I think this would have broad application and a lot of 
real-world use cases.  (& we depend on it here...)

> DIH Cache Improvements
> --
>
> Key: SOLR-2382
> URL: https://issues.apache.org/jira/browse/SOLR-2382
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
> SOLR-2382.patch, SOLR-2382.patch
>
>
> Functionality:
>  1. Provide a pluggable caching framework for DIH so that users can choose a 
> cache implementation that best suits their data and application.
>  
>  2. Provide a means to temporarily cache a child Entity's data without 
> needing to create a special cached implementation of the Entity Processor 
> (such as CachedSqlEntityProcessor).
>  
>  3. Provide a means to write the final (root entity) DIH output to a cache 
> rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
> cache as an Entity input.  Also provide the ability to do delta updates on 
> such persistent caches.
>  
>  4. Provide the ability to partition data across multiple caches that can 
> then be fed back into DIH and indexed either to varying Solr Shards, or to 
> the same Core in parallel.
> Use Cases:
>  1. We needed a flexible & scalable way to temporarily cache child-entity 
> data prior to joining to parent entities.
>   - Using SqlEntityProcessor with Child Entities can cause an "n+1 select" 
> problem.
>   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
> mechanism and does not scale.
>   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
>  
>  2. We needed the ability to gather data from long-running entities by a 
> process that runs separate from our main indexing process.
>   
>  3. We wanted the ability to do a delta import of only the entities that 
> changed.
>   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
> few fields changed.
>   - Our data comes from 50+ complex sql queries and/or flat files.
>   - We do not want to incur overhead re-gathering all of this data if only 1 
> entity's data changed.
>   - Persistent DIH caches solve this problem.
>   
>  4. We want the ability to index several documents in parallel (using 1.4.1, 
> which did not have the "threads" parameter).
>  
>  5. In the future, we may need to use Shards, creating a need to easily 
> partition our source data into Shards.
> Implementation Details:
>  1. De-couple EntityProcessorBase from caching.  
>   - Created a new interface, DIHCache & two implementations:  
> - SortedMapBackedCache - An in-memory cache, used as default with 
> CachedSqlEntityProcessor (now deprecated).
> - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
> with je-4.1.6.jar
>- NOTE: the existing Lucene Contrib "db" project uses je-3.3.93.jar.  
> I believe this may be incompatible due to Generic Usage.
>- NOTE: I did not modify the ant script to automatically get this jar, 
> so to use or evaluate this patch, download bdb-je from 
> http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
>  
>  2. Allow Entity Processors to take a "cacheImpl" parameter to cause the 
> entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
>  
>  3. Partially De-couple SolrWriter from DocBuilder
>   - Created a new interface DIHWriter, & two implementations:
>- SolrWriter (refactored)
>- DIHCacheWriter (allows DIH to write ultimately to a Cache).
>
>  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
> persistent Cache as DIH Entity Input.
>  
>  5. Support a "partition" parameter with both DIHCacheWriter and 
> DIHCacheProcessor to allow for easy partitioning of source entity data.
>  
>  6. Change the semantics of enti

[jira] [Updated] (LUCENE-2548) Remove all interning of field names from flex API


 [ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2548:
---

Attachment: LUCENE-2548.patch

Initial patch.

Tests are passing, at least for a few iterations (I'll beast it).  There are still 
a few nocommits...

I used PMD and findbugs to find == and != on strings, but surprisingly there 
are cases that these tools seem to miss.  I also did various greps to try to 
find cases... but I'm sure I've missed some!
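For context, the pattern being hunted looks like the sketch below: identity comparison of field names only works while both strings are interned, so every such comparison has to become equals() once interning is removed. The class and method names are hypothetical and the snippet is plain Java, not code from the patch.

```java
public class FieldNameCompareSketch {
  /** Identity comparison: only reliable if both strings are interned. */
  public static boolean sameFieldByIdentity(String a, String b) {
    return a == b;
  }

  /** Value comparison: correct whether or not the strings are interned. */
  public static boolean sameFieldByEquals(String a, String b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    String literal = "title";           // string literals are interned by the JVM
    String built = new String("title"); // a distinct, non-interned object with the same contents

    System.out.println(sameFieldByIdentity(literal, built));          // false
    System.out.println(sameFieldByIdentity(literal, built.intern())); // true
    System.out.println(sameFieldByEquals(literal, built));            // true
  }
}
```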

> Remove all interning of field names from flex API
> -
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2548.patch
>
>
> In previous versions of Lucene, interning of fields was important to minimize 
> string comparison cost when iterating TermEnums, to detect changes in field 
> name. As we separated field names from terms in flex, no query compares field 
> names anymore, so the whole performance problematic interning can be removed. 
> I will start with doing this, but we need to carefully review some places 
> e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) 
> Robert?


[jira] [Assigned] (LUCENE-2548) Remove all interning of field names from flex API


 [ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2548:
--

Assignee: Michael McCandless

> Remove all interning of field names from flex API
> -
>
> Key: LUCENE-2548
> URL: https://issues.apache.org/jira/browse/LUCENE-2548
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> In previous versions of Lucene, interning of fields was important to minimize 
> string comparison cost when iterating TermEnums, to detect changes in field 
> name. As we separated field names from terms in flex, no query compares field 
> names anymore, so the whole performance problematic interning can be removed. 
> I will start with doing this, but we need to carefully review some places 
> e.g. in preflex codec.
> Maybe before this issue we should remove the Term class completely. :-) 
> Robert?


[jira] [Commented] (SOLR-236) Field collapsing

2011-06-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051995#comment-13051995
 ] 

Jan Høydahl commented on SOLR-236:
--

I think you should consider the group-by support now included in the 3_x branch 
(SOLR-2524 was recently committed).

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: 3.3
>
> Attachments: DocSetScoreCollector.java, 
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
> SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, 
> SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, 
> collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, 
> collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> quasidistributed.additional.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)


[jira] [Commented] (SOLR-2609) Coordinate range queries do not work with Spatial Solr

2011-06-20 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051988#comment-13051988
 ] 

David Smiley commented on SOLR-2609:


I highly doubt this can be fixed, based on how it works. The documentation/wiki 
should be updated to note this problem.

I recommend you use bbox: 
http://wiki.apache.org/solr/SpatialSearch#bbox_-_Bounding-box_filter
Granted, you cannot specify an arbitrary bounding box, only one based on a 
point-distance, but this may be good enough.
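Separately from the bbox suggestion, one client-side workaround that is sometimes used for boxes crossing the 180/-180 line is to split the longitude span into two ordinary ranges and OR the resulting filters together. The sketch below only builds the range strings in the syntax quoted in the issue; it is not from this thread, the class and method names are hypothetical, and whether Solr handles the combined filter as intended for LatLonType has not been verified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DatelineRangeSketch {
  /**
   * Builds one range clause for a normal box, or two clauses when the box
   * crosses the antimeridian (west longitude greater than east longitude).
   */
  public static List<String> rangeFilters(String field, double minLat, double maxLat,
                                          double westLon, double eastLon) {
    List<String> filters = new ArrayList<>();
    if (westLon <= eastLon) {
      filters.add(range(field, minLat, westLon, maxLat, eastLon));
    } else {
      // Split at the antimeridian: [west, 180] and [-180, east].
      filters.add(range(field, minLat, westLon, maxLat, 180.0));
      filters.add(range(field, minLat, -180.0, maxLat, eastLon));
    }
    return filters;
  }

  private static String range(String field, double minLat, double minLon,
                              double maxLat, double maxLon) {
    return String.format(Locale.ROOT, "%s:[%s,%s TO %s,%s]",
        field, minLat, minLon, maxLat, maxLon);
  }

  public static void main(String[] args) {
    // Longitude 178 to -177, as in the issue description.
    for (String filter : rangeFilters("store", 45, 46, 178, -177)) {
      System.out.println(filter);
    }
  }
}
```

The caller would join the two clauses with OR inside a single fq parameter.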

> Coordinate range queries do not work with Spatial Solr
> --
>
> Key: SOLR-2609
> URL: https://issues.apache.org/jira/browse/SOLR-2609
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.1
>Reporter: Zac Smith
>  Labels: spatialsearch
>
> The Spatial Search documentation states that you can create your own bounding 
> box using a range query:
> "Since the LatLonType field also supports field queries and range queries, 
> one can manually create their own bounding box rather than using bbox: 
> ...&q=*:*&fq=store:[45,-94 TO 46,-93]"
> This works unless your range covers an area where longitude goes from 180 to 
> -180. For instance I want all items in the longitude range of 
> 178 to -177 which of course gives no results (it is not a valid numeric 
> range). It's not really surprising that this doesn't work as it is just a 
> standard range query with no spatial filters being applied.
> I am wondering if this is just an issue with the documentation and there is 
> another way that this should be done? Please advise if more details are 
> needed.


[jira] [Created] (LUCENE-3221) improve docvalues integration with scoring

improve docvalues integration with scoring
--

Key: LUCENE-3221
URL: https://issues.apache.org/jira/browse/LUCENE-3221
Project: Lucene - Java
Issue Type: New Feature
Components: core/index
Reporter: Robert Muir
Fix For: flexscoring branch

Currently, the flexscoring branch is limited by the fact that you can at most
index one single byte per-document for scoring within Similarity.

I added a simple test, showing how in your app itself you can index a
per-document value (such as a boost) and then use it in scoring:
http://svn.apache.org/repos/asf/lucene/dev/branches/flexscoring/lucene/src/test/org/apache/lucene/search/TestDocValuesScoring.java

However, I think we should generalize this mechanism (note, names of classes
can be changed to whatever makes sense).
In Similarity, instead of byte computeNorm(FieldInvertState), I think we should
have void computeNorm(StatsWriter, FieldInvertState).

Then a Similarity can ask the StatsWriter for instance(s), where an instance is
something like a (name, type, aggregates) pair.
Name would be a simple name like "boost" that the sim later uses to retrieve
this docvalue. type would be something like int8/int32/varint/byte.
aggregates could at first be a boolean or whatever, I think at first we should
allow for the sum to be written (e.g. to provide sum and average).
This would support aggregate statistics such as 'total number of tokens in
index' and 'average length'.

so an example of the new computeNorm or whatever we call it would be
{noformat}
void computeNorm(StatsWriter writer, FieldInvertState state) {
  writer.getReference("length", INT32, Aggregates.YES).write(state.numTokens);
  writer.getReference("boost", FLOAT32, Aggregates.NO).write(state.boost);
  ...
}
{noformat}

So these docvalues field names that the Sim writes, I think the sim should be
able to reference them with "relative" names like length and boost.
Whatever we do behind the scenes is an implementation detail.
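
To make the proposal above a bit more concrete, here is a minimal sketch in
plain Java. Everything in it (StatsWriter, Reference, Type, Aggregates, and the
cut-down FieldInvertState) is a hypothetical stand-in written for illustration,
not code from the flexscoring branch. It only shows the idea of a similarity
asking for named, typed per-document values and optionally keeping a running
sum so that an average can be derived later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StatsWriterSketch {
    enum Type { INT8, INT32, FLOAT32, VARINT }
    enum Aggregates { YES, NO }

    /** One named per-document value stream, optionally keeping a running sum. */
    static final class Reference {
        final String name;
        final Type type;
        final Aggregates aggregates;
        final List<Number> values = new ArrayList<>();
        double sum;

        Reference(String name, Type type, Aggregates aggregates) {
            this.name = name;
            this.type = type;
            this.aggregates = aggregates;
        }

        /** Records the value for the next document; updates the sum only if asked to. */
        void write(Number value) {
            values.add(value);
            if (aggregates == Aggregates.YES) {
                sum += value.doubleValue();
            }
        }

        /** Average over all documents written so far; needs Aggregates.YES. */
        double average() {
            if (aggregates == Aggregates.NO || values.isEmpty()) {
                throw new IllegalStateException("no aggregate kept for " + name);
            }
            return sum / values.size();
        }
    }

    /** Hands out one Reference per "relative" name such as "length" or "boost". */
    static final class StatsWriter {
        private final Map<String, Reference> refs = new LinkedHashMap<>();

        Reference getReference(String name, Type type, Aggregates aggregates) {
            return refs.computeIfAbsent(name, n -> new Reference(n, type, aggregates));
        }
    }

    /** Stand-in for FieldInvertState with only the two fields used in the example. */
    static final class FieldInvertState {
        final int numTokens;
        final float boost;

        FieldInvertState(int numTokens, float boost) {
            this.numTokens = numTokens;
            this.boost = boost;
        }
    }

    /** Same shape as the computeNorm example in the issue text. */
    static void computeNorm(StatsWriter writer, FieldInvertState state) {
        writer.getReference("length", Type.INT32, Aggregates.YES).write(state.numTokens);
        writer.getReference("boost", Type.FLOAT32, Aggregates.NO).write(state.boost);
    }
}
```

Calling computeNorm once per document and then reading sum and average() on the
"length" reference gives the 'total number of tokens' and 'average length'
statistics mentioned above.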

Also for this to work, I think we need to add int8, int16, int32, ... types to
docvalues, and maybe we should add hasArray()/getArray(). I think
the existing compressed INTS should be kept, but maybe renamed to varint or
something like that. This could still be useful, for example if someone
wants to have "real document lengths" for bm25, but they don't really need a
full 32-bit range, they can make the tradeoff to use packed integers
and load less into ram... but that should be the sim's choice to make.
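
The RAM tradeoff in the last paragraph can be estimated with a few lines of
arithmetic. The sketch below is only an illustration: bitsRequired and bytesFor
are helper names made up here, and the numbers ignore any header or block
overhead a real packed-integer format would add.

```java
class PackedWidthSketch {
    /** Number of bits needed to represent every value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Approximate bytes needed to hold one value per document at the given width. */
    static long bytesFor(long docCount, int bitsPerValue) {
        return (docCount * bitsPerValue + 7) / 8;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        int packedBits = bitsRequired(1000); // longest document assumed to be 1000 tokens
        System.out.println("packed: " + bytesFor(docs, packedBits) + " bytes");
        System.out.println("int32:  " + bytesFor(docs, 32) + " bytes");
    }
}
```

With document lengths capped at 1000 tokens, 10 bits per document are enough,
so one million documents need roughly 1.25 MB instead of 4 MB for full 32-bit
values.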


[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext


 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2793:


Attachment: LUCENE-2793.patch

For the record - I went through the latest patch and added some nocommits where 
needed. I will take this patch and commit it to the branch. We should now work 
on that branch to fix all the remaining issues.

> Directory createOutput and openInput should take an IOContext
> -
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Varun Thacker
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching 
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold 
> the buffer size, but then could hold other flags like DIRECT (bypass OS's 
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would 
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for 
> merging is not then used for searching, and vice versa.  Really, it's only 
> all the open file handles that need to be different -- we could in theory 
> share del docs, norms, etc, if that were somehow possible.
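
A rough sketch of what the description above asks for, in plain Java. The class
and constant names (IOContext, Flag, SEARCH, MERGE) and the buffer sizes are
assumptions made up for illustration; the real API is whatever ends up in the
patch on the branch.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

class IOContextSketch {
    /** Hints a Directory implementation may honor or ignore. */
    enum Flag { DIRECT, SEQUENTIAL }

    static final class IOContext {
        // Buffer sizes are illustrative assumptions, not values taken from the patch.
        static final IOContext SEARCH = new IOContext(1024, EnumSet.noneOf(Flag.class));
        static final IOContext MERGE =
                new IOContext(4096, EnumSet.of(Flag.DIRECT, Flag.SEQUENTIAL));

        final int readBufferSize;
        final Set<Flag> flags;

        IOContext(int readBufferSize, Set<Flag> flags) {
            this.readBufferSize = readBufferSize;
            EnumSet<Flag> copy = flags.isEmpty()
                    ? EnumSet.noneOf(Flag.class)
                    : EnumSet.copyOf(flags);
            this.flags = Collections.unmodifiableSet(copy);
        }
    }

    /** Hypothetical stand-in for Directory.openInput(name, context). */
    static String openInput(String fileName, IOContext context) {
        return fileName + " buffer=" + context.readBufferSize + " flags=" + context.flags;
    }

    public static void main(String[] args) {
        System.out.println(openInput("_0.frq", IOContext.SEARCH));
        System.out.println(openInput("_0.frq", IOContext.MERGE));
    }
}
```

The point of the design is that the caller states its intent (searching or
merging) once, and the Directory decides what to do with the buffer size and
the DIRECT/SEQUENTIAL hints.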


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities


 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

EasyStats object added.

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
> Attachments: LUCENE-3220.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:
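
As an illustration of how a statistics object feeds a ranking model, here is a
small BM25 sketch in plain Java. The Stats class below is a made-up stand-in,
not the EasyStats from the attached patch, and the formula is the common
textbook form of BM25, which may differ in detail from what the branch ends up
with.

```java
class Bm25Sketch {
    /** Collection and term statistics a BM25 scorer needs. */
    static final class Stats {
        final long docCount;        // number of documents in the collection
        final long docFreq;         // number of documents containing the term
        final double avgFieldLength;

        Stats(long docCount, long docFreq, double avgFieldLength) {
            this.docCount = docCount;
            this.docFreq = docFreq;
            this.avgFieldLength = avgFieldLength;
        }
    }

    /** idf = ln(1 + (N - n + 0.5) / (n + 0.5)), which is never negative. */
    static double idf(Stats s) {
        return Math.log(1.0 + (s.docCount - s.docFreq + 0.5) / (s.docFreq + 0.5));
    }

    /** score = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len / avgLen)). */
    static double score(Stats s, double tf, double fieldLength, double k1, double b) {
        double lengthNorm = k1 * (1.0 - b + b * fieldLength / s.avgFieldLength);
        return idf(s) * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        Stats stats = new Stats(1000, 10, 100.0);
        System.out.println(score(stats, 2.0, 50.0, 1.2, 0.75));
        System.out.println(score(stats, 2.0, 200.0, 1.2, 0.75));
    }
}
```

With the usual parameters k1 = 1.2 and b = 0.75, a shorter document scores
higher than a longer one for the same term frequency, and setting b = 0
switches length normalization off.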


[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-06-20 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051965#comment-13051965
 ] 

Mike Sokolov commented on SOLR-219:
---

Fair enough - And by the way +1 on all this - I hated having to hack 
QueryParser just to prevent stop words getting stripped from phrases.  "The 
the" and "The who" were problematic :) 

> Determine if prefix, wildcard, fuzzy queries should be lowercased
> -
>
> Key: SOLR-219
> URL: https://issues.apache.org/jira/browse/SOLR-219
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>Priority: Minor
> Fix For: 3.3
>
> Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
>
> Solr should be able to "do the right thing" when doing prefix/wildcard/fuzzy 
> queries on fields with respect to lowercasing or not.


[jira] [Resolved] (SOLR-2533) Improve API of ValueSource & FunctionQuery SortField weighting


 [ 
https://issues.apache.org/jira/browse/SOLR-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved SOLR-2533.
--

Resolution: Fixed

Committed revision 1137612

> Improve API of ValueSource & FunctionQuery SortField weighting
> --
>
> Key: SOLR-2533
> URL: https://issues.apache.org/jira/browse/SOLR-2533
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Chris Male
>Assignee: Chris Male
> Attachments: SOLR-2533.patch, SOLR-2533.patch, SOLR-2533.patch, 
> SOLR-2533.patch, SOLR-2533.patch
>
>
> Started from LUCENE-2883: Support for sorting by ValueSource and 
> FunctionQueries is done through ValueSource#getSort and the 
> ValueSourceSortField.  In order to support VSs containing other Queries, it's 
> necessary to allow the Queries to be weighted by an IndexSearcher.  Currently 
> this is handled by having ValueSourceSortField implement SolrSortField.  In 
> Solr's SolrIndexSearcher, SortFields implementing SolrSortField are then 
> weighted before the Sort is used.
> Sorting by FunctionQuery and ValueSource are invaluable and will become 
> available to all Lucene users in LUCENE-2883.  But in order to do so, we need 
> to remove the coupling of this functionality to Solr, and make it more 
> standard.
> Any and all thoughts about how to do this are appreciated.


Re: KStem custom lexicons configuration possible?

Hi Robert,

I think the difference between KStem and other stemmers (at least those that
I am aware of, like Snowball or Porter) is that KStem is expected to produce
real, valid words, and thus other filtering (for example synonym expansion)
can be applied to the tokens after stemming more easily. I am not sure if this
is the case with the other stemmers available in Lucene.

Also, my impression from reading the original paper by Robert Krovetz was
that the possibility to fine-tune lexicons is practical. That is why I was
expecting the KStem API to support this as well.

Well, maybe a combination of KStem with an override filter (but applied AFTER
stemming) would work too in this case :-)

Regards,
Lukas

On Mon, Jun 20, 2011 at 2:32 PM, Robert Muir  wrote:

> On Mon, Jun 20, 2011 at 8:23 AM, Lukáš Vlček wrote:
> > Hi Robert,
> > this sounds interesting I will look at it in more detail.
> > However, I do not think this is really a general solution. If I understand
> > StemmerOverrideFilter correctly (from a quick glance) it relies on the fact
> > that you *know* exact term (the key in the map) in advance. In other words
> > if I wanted to "fix" some term produced by Kstem filter I would have to know
> > what is the product of the stemming in advance. Now, this means that if I
> > switch to snowball or porter or other stemmer instead of KStem or simply
> > update something else in the filtering chain then I am in trouble. Also if I
> > understand correctly the original KStem implementation it can still get
> > updates to lexicons which means that once these updates are ported to Java
> > implementation it can again result in problem with existing override filter
> > setup.
> > More generally, is there any reason why lexicons are not configurable in
> > KStem filter?
>
> Because we have StemmerOverrideFilter and KeywordMarkerFilter.
>
> look at the source code to Kstem: it uses maps and sets of exceptions,
> this is what these filters provide in a general way
> (StemmerOverrideFilter being the map, and KeywordMarkerFilter being
> the set).
>
> we added these to work across the board with all lucene stemmers for
> this reason.
>
> I don't understand your concerns at all to be honest, they make no
> sense to me. If we "updated" kstem or any other algorithm: it would
> break whatever you are doing either way. A hashmap is a hashmap.
>

[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051957#comment-13051957
 ] 

Robert Muir commented on SOLR-2452:
---

by the way, obviously since you have been doing all the work here, i don't want 
you to read this as me questioning/objecting to the change, just trying to 
maybe help save you some sanity... if you don't mind dealing with the merging I 
would just say go for it.

> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Commented] (SOLR-2202) Money FieldType

2011-06-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051955#comment-13051955
 ] 

Jan Høydahl commented on SOLR-2202:
---

Any interest in reviving this and working towards committing a first version?

> Money FieldType
> ---
>
> Key: SOLR-2202
> URL: https://issues.apache.org/jira/browse/SOLR-2202
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 1.5
>Reporter: Greg Fodor
> Attachments: SOLR-2022-solr-3.patch, SOLR-2202-lucene-1.patch, 
> SOLR-2202-solr-1.patch, SOLR-2202-solr-2.patch, SOLR-2202-solr-4.patch, 
> SOLR-2202-solr-5.patch, SOLR-2202-solr-6.patch, SOLR-2202-solr-7.patch, 
> SOLR-2202-solr-8.patch, SOLR-2202-solr-9.patch
>
>
> Attached please find patches to add support for monetary values to 
> Solr/Lucene with query-time currency conversion. The following features are 
> supported:
> - Point queries (ex: "price:4.00USD")
> - Range queries (ex: "price:[$5.00 TO $10.00]")
> - Sorting.
> - Currency parsing by either currency code or symbol.
> - Symmetric & Asymmetric exchange rates. (Asymmetric exchange rates are 
> useful if there are fees associated with exchanging the currency.)
> At indexing time, money fields can be indexed in a native currency. For 
> example, if a product on an e-commerce site is listed in Euros, indexing the 
> price field as "10.00EUR" will index it appropriately. By altering the 
> currency.xml file, the sorting and querying against Solr can take into 
> account fluctuations in currency exchange rates without having to re-index 
> the documents.
> The new "money" field type is a polyfield which indexes two fields, one which 
> contains the amount of the value and another which contains the currency code 
> or symbol. The currency metadata (names, symbols, codes, and exchange rates) 
> are expected to be in an xml file which is pointed to by the field type 
> declaration in the schema.xml.
> The current patch is factored such that Money utility functions and 
> configuration metadata lie in Lucene (see MoneyUtil and CurrencyConfig), 
> while the MoneyType and MoneyValueSource lie in Solr. This was meant to 
> mirror the work being done on the spatial field types.
> This patch has not yet been deployed to production but will be getting used 
> to power the international search capabilities of the search engine at Etsy.
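
To illustrate the query-time conversion with asymmetric rates described above,
here is a small self-contained sketch. The Money and Rates classes, the
"amount followed by a three-letter code" parsing, the two-decimal rounding and
the example rates are all assumptions made for this illustration; they are not
the MoneyUtil or CurrencyConfig classes from the attached patches.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

class MoneySketch {
    static final class Money {
        final BigDecimal amount;
        final String currency;

        Money(BigDecimal amount, String currency) {
            this.amount = amount;
            this.currency = currency;
        }
    }

    /** Parses values such as "10.00EUR": a decimal amount followed by a 3-letter code. */
    static Money parse(String text) {
        String t = text.trim();
        if (t.length() < 4) {
            throw new IllegalArgumentException("expected <amount><CODE>: " + text);
        }
        String code = t.substring(t.length() - 3);
        BigDecimal amount = new BigDecimal(t.substring(0, t.length() - 3));
        return new Money(amount, code);
    }

    /** Directed rate table: the rate for A to B is stored separately from B to A. */
    static final class Rates {
        private final Map<String, BigDecimal> rates = new HashMap<>();

        void put(String from, String to, String rate) {
            rates.put(from + ">" + to, new BigDecimal(rate));
        }

        Money convert(Money money, String to) {
            if (money.currency.equals(to)) {
                return money;
            }
            BigDecimal rate = rates.get(money.currency + ">" + to);
            if (rate == null) {
                throw new IllegalArgumentException("no rate " + money.currency + ">" + to);
            }
            // Two decimal places is an assumption; not every currency uses them.
            BigDecimal converted = money.amount.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
            return new Money(converted, to);
        }
    }
}
```

Because only the rate table changes when exchange rates move, the indexed
"10.00EUR" value stays as it is and nothing needs to be re-indexed. Storing
EUR>USD and USD>EUR separately is what allows the asymmetric case, where a
round trip loses money to fees.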


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051953#comment-13051953
 ] 

Robert Muir commented on SOLR-2452:
---

ok, i was just curious, sounds like something that could possibly be dealt with 
later.

I think i said it before too, I find it confusing the way these directories all 
depend upon each other today and how each one is not its own 'subproject' of 
the build (that basically acts like a contrib or module itself and states its 
dependencies). So I would *really* like to see this fixed.

However, I think I would recommend thinking about when you want to make the 
change: it will make merging code up to this branch nearly impossible... is it 
holding back other changes or is this a final step?


> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Updated] (SOLR-2605) CoreAdminHandler, different Output while 'defaultCoreName' is specified

2011-06-20 Thread Mark Miller (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2605:
--

Fix Version/s: 4.0

> CoreAdminHandler, different Output while 'defaultCoreName' is specified
> ---
>
> Key: SOLR-2605
> URL: https://issues.apache.org/jira/browse/SOLR-2605
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2399-admin-cores-default.xml, 
> SOLR-2399-admin-cores.xml
>
>
> The attached XML-Files show the little difference between a defined 
> {{defaultCoreName}}-Attribute and a non existing one.
> Actually the new admin ui checks for a core with empty name to set single- / 
> multicore-settings .. it's a quick change to count the number of defined 
> cores instead.
> But, will it be possible, to get the core-name (again)? One of both 
> attributes would be enough, if that makes a difference :)


[jira] [Commented] (SOLR-2605) CoreAdminHandler, different Output while 'defaultCoreName' is specified

2011-06-20 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051950#comment-13051950
 ] 

Mark Miller commented on SOLR-2605:
---

Indeed - this has always been a bit ugly. It was kind of a case of ease over 
the best approach at the time, if I remember right.

> CoreAdminHandler, different Output while 'defaultCoreName' is specified
> ---
>
> Key: SOLR-2605
> URL: https://issues.apache.org/jira/browse/SOLR-2605
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Minor
> Attachments: SOLR-2399-admin-cores-default.xml, 
> SOLR-2399-admin-cores.xml
>
>
> The attached XML-Files show the little difference between a defined 
> {{defaultCoreName}}-Attribute and a non existing one.
> Actually the new admin ui checks for a core with empty name to set single- / 
> multicore-settings .. it's a quick change to count the number of defined 
> cores instead.
> But, will it be possible, to get the core-name (again)? One of both 
> attributes would be enough, if that makes a difference :)


Re: KStem custom lexicons configuration possible?

2011-06-20 Thread Robert Muir

On Mon, Jun 20, 2011 at 8:23 AM, Lukáš Vlček  wrote:
> Hi Robert,
> this sounds interesting I will look at it in more detail.
> However, I do not think this is really a general solution. If I understand
> StemmerOverrideFilter correctly (from a quick glance) it relies on the fact
> that you *know* exact term (the key in the map) in advance. In other words
> if I wanted to "fix" some term produced by Kstem filter I would have to know
> what is the product of the stemming in advance. Now, this means that if I
> switch to snowball or porter or other stemmer instead of KStem or simply
> update something else in the filtering chain then I am in trouble. Also if I
> understand correctly the original KStem implementation it can still get
> updates to lexicons which means that once these updates are ported to Java
> implementation it can again result in problem with existing override filter
> setup.
> More generally, is there any reason why lexicons are not configurable in
> KStem filter?

Because we have StemmerOverrideFilter and KeywordMarkerFilter.

look at the source code to Kstem: it uses maps and sets of exceptions,
this is what these filters provide in a general way
(StemmerOverrideFilter being the map, and KeywordMarkerFilter being
the set).

we added these to work across the board with all lucene stemmers for
this reason.

I don't understand your concerns at all to be honest, they make no
sense to me. If we "updated" kstem or any other algorithm: it would
break whatever you are doing either way. A hashmap is a hashmap.
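
The "map plus set" idea can be sketched in a few lines of plain Java. This is
not the Lucene filter API: stem, toyStemmer and the example entries are made up
for illustration, and the toy stemmer only strips a trailing "s". The point is
that both the override map and the keyword set are keyed by the input term, so
they are consulted before the stemmer runs, whichever stemmer that is.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

class StemExceptionsSketch {
    /**
     * Applies exceptions first, then falls back to the stemmer.
     * overrides: term -> forced stem (the "map" of exceptions)
     * keywords:  terms that must be left untouched (the "set" of exceptions)
     */
    static String stem(String term,
                       Map<String, String> overrides,
                       Set<String> keywords,
                       UnaryOperator<String> stemmer) {
        String forced = overrides.get(term);
        if (forced != null) {
            return forced;
        }
        if (keywords.contains(term)) {
            return term;
        }
        return stemmer.apply(term);
    }

    /** Toy stemmer for the demo only: strips one trailing "s". Not KStem. */
    static String toyStemmer(String term) {
        return term.length() > 3 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = Map.of("mice", "mouse");
        Set<String> keywords = Set.of("paris");
        for (String term : new String[] {"mice", "paris", "cats"}) {
            System.out.println(term + " -> "
                    + stem(term, overrides, keywords, StemExceptionsSketch::toyStemmer));
        }
    }
}
```

Swapping toyStemmer for another stemmer leaves the two exception tables valid,
since neither depends on what the stemmer would have produced.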


[jira] [Commented] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-20 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051948#comment-13051948
 ] 

Mark Miller commented on SOLR-2610:
---

+1

> Add an option to delete index through CoreAdmin UNLOAD action
> -
>
> Key: SOLR-2610
> URL: https://issues.apache.org/jira/browse/SOLR-2610
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 3.3, 4.0
>
>
> Right now, one can unload a Solr Core but the index files are left behind and 
> consume disk space. We should have an option to delete the index when 
> unloading a core.


Re: KStem custom lexicons configuration possible?

Hi Robert,

this sounds interesting I will look at it in more detail.

However, I do not think this is really a general solution. If I understand
StemmerOverrideFilter correctly (from a quick glance) it relies on the fact
that you *know* exact term (the key in the map) in advance. In other words
if I wanted to "fix" some term produced by Kstem filter I would have to know
what is the product of the stemming in advance. Now, this means that if I
switch to snowball or porter or other stemmer instead of KStem or simply
update something else in the filtering chain then I am in trouble. Also if I
understand correctly the original KStem implementation it can still get
updates to lexicons which means that once these updates are ported to Java
implementation it can again result in problem with existing override filter
setup.

More generally, is there any reason why lexicons are not configurable in
KStem filter?

Regards,
Lukas

On Mon, Jun 20, 2011 at 1:38 PM, Robert Muir  wrote:

> On Mon, Jun 20, 2011 at 7:19 AM, Lukáš Vlček 
> wrote:
> > Having an option to modify internal lexicons I would be able to adapt the
> > KStem to work better for specific text corpora.
> > What do you think?
>
> please use StemmerOverrideFilter for this! it works with all stemmers,
> including this one.
>

[jira] [Created] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-20 Thread Shalin Shekhar Mangar (JIRA)

Add an option to delete index through CoreAdmin UNLOAD action
-

 Key: SOLR-2610
 URL: https://issues.apache.org/jira/browse/SOLR-2610
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0


Right now, one can unload a Solr Core but the index files are left behind and 
consume disk space. We should have an option to delete the index when unloading 
a core.
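
A minimal sketch of the requested behaviour, assuming a boolean option (called
deleteIndex here purely for illustration) that is off by default. UnloadSketch
and its methods are hypothetical and do not touch any Solr class; the code only
shows the "remove the index directory after unloading" step using java.nio.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnloadSketch {
    /** Unloads a core; removes its index directory only when deleteIndex is true. */
    static void unload(Path indexDir, boolean deleteIndex) {
        // A real implementation would first close the core and release its files.
        if (!deleteIndex || !Files.exists(indexDir)) {
            return;
        }
        try {
            List<Path> deepestFirst;
            try (Stream<Path> paths = Files.walk(indexDir)) {
                // Reverse order lists children before their parent directories.
                deepestFirst = paths.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
            }
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small demo on a temp directory: {kept without the flag, gone with the flag}. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("core-index");
            Files.write(dir.resolve("segments_1"), new byte[] {1, 2, 3});
            unload(dir, false);
            boolean keptByDefault = Files.exists(dir);
            unload(dir, true);
            boolean goneAfterDelete = !Files.exists(dir);
            return new boolean[] {keptByDefault, goneAfterDelete};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the option off by default preserves today's behaviour, where unloading
a core leaves its files on disk.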


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051943#comment-13051943
 ] 

Steven Rowe commented on SOLR-2452:
---

{quote}
bq. but solrj tests depend on core tests

Curious why this is? some base classes that could be moved into test-framework 
instead?
{quote}

At a minimum, {{o.a.s.client.solrj.SolrJettyTestBase}} (likely should be moved 
to another package, given that Solr core {{o.a.s.servlet.\*CacheHeaderTest\*}} 
tests extend this class) and {{o.a.s.util.ExternalPaths}}.


> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


Re: Error :dataimport handler is not request Handler, help

2011-06-20 Thread Muhannad

Hi All,
Thank you everybody, it works now, and I can work with Solr 4.0.
The solution was the following.
As mentioned before by Jeffrey Chang, I tried a clean Solr environment, so
I removed all dataimport jar files from the Solr class paths and commented out
the directives in solrconfig.xml, keeping only one (its text did not survive
in this copy of the message).

I ran Solr without any jar, so an exception (ClassNotFoundException) was
raised to say that dataimport does not exist.
Then I turned off the server and put the jar in the example/solr/lib directory.
After this final step I started Solr, and it works fine now.
Thanks guys, I really thank you.

On Mon, Jun 20, 2011 at 2:56 PM, Muhannad  wrote:

> I tried this and removed all dataimport jars, but only kept one in the lib
> directory of the Solr instance, but the same error exists. I never faced this
> problem before. Could it be because I have a non-stable version of Solr 4.0?
>
>
> 2011/6/20 Jeffrey Chang 
>
>> Hi,
>>
>> I've encountered a similar issue before.
>>
>> The problem for me was the Classloader that loaded DataImportHandler class
>> is not the same as the one loading the SolrRequestHandler class.
>>
>> Trace...
>>
>> In SolrCore.java (3.1 source)
>> <--
>> line 459: createInstance(className, SolrRequestHandler.class, "Request
>> Handler")
>> <--
>> line: 423: clazz = getResourceLoader().findClass(className);
>> <--
>> line: 424: if (cast != null && !cast.isAssignableFrom(clazz))
>>
>> This evaluation will fail since clazz is not loaded by the same
>> classloader as cast.
>>
>> What I did was to make sure that the dataimport jars are NOT in the
>> classpath and not loaded by other classloaders but from the path specified
>> in solrconfig.xml. This will ensure that the dataimport classes are loaded
>> by the same classloader.
>>
>> Not sure if this is the same issue you're encountering, I hope this helps.
>>
>> Thanks,
>> Jeff
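
The effect Jeff describes can be reproduced outside Solr. The sketch below is
an illustration only: Handler and ImportHandler are made-up stand-ins for
SolrRequestHandler and DataImportHandler, and IsolatingLoader is a toy
classloader that defines its own copies of them instead of delegating to its
parent. It assumes the compiled .class files are reachable as resources from
the loader that loaded the demo class, which is the normal case when running
from a directory or jar.

```java
import java.io.IOException;
import java.io.InputStream;

class ClassLoaderMismatchDemo {
    /** Stand-in for the request handler interface. */
    interface Handler {}

    /** Stand-in for the handler implementation that lives in a separate jar. */
    static class ImportHandler implements Handler {}

    /** Defines its own copies of the demo's nested classes instead of delegating. */
    static class IsolatingLoader extends ClassLoader {
        private static final String PREFIX = ClassLoaderMismatchDemo.class.getName() + "$";

        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.startsWith(PREFIX)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Returns {same-loader check, cross-loader check} for Handler.isAssignableFrom. */
    static boolean[] check() {
        try {
            ClassLoader app = ClassLoaderMismatchDemo.class.getClassLoader();
            Class<?> isolated = new IsolatingLoader(app).loadClass(ImportHandler.class.getName());
            return new boolean[] {
                Handler.class.isAssignableFrom(ImportHandler.class),
                Handler.class.isAssignableFrom(isolated)
            };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = check();
        System.out.println("same loader:  " + result[0]);
        System.out.println("other loader: " + result[1]);
    }
}
```

The second check fails even though the class name is identical, because the
isolated copy of ImportHandler implements the isolated copy of Handler, not the
one the caller holds. That is the same shape as the "is not a
org.apache.solr.request.SolrRequestHandler" error in this thread.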
>>
>> On Mon, Jun 20, 2011 at 2:36 PM, Muhannad  wrote:
>>
>>> Yes, I just tried it, and this works for Solr 1.4, which I am currently
>>> working on, but when I tried 3.1 or 4.0
>>> the same error appears. I know that the war file no longer contains the jar
>>> files related to dataimport and logging functionality. I put all requested
>>> files in the class path, and I am sure it loads them as the server starts, but
>>> I guess the problem is that it doesn't recognise DataImportHandler as a
>>> RequestHandler.
>>> I am really stuck, and confused!!!
>>>
>>>   On Mon, Jun 20, 2011 at 3:14 AM, Bill Bell wrote:
>>>
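 Did you try adding something like this to solrconfig.xml?

 <lib dir="..." regex="apache-solr-dataimporthandler-.*\.jar" />

 <requestHandler name="/dataimport"
     class="org.apache.solr.handler.dataimport.DataImportHandler">
   <lst name="defaults">
     <str name="config">db-data-config.xml</str>
   </lst>
 </requestHandler>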



  From: Muhannad 
 Reply-To: 
 Date: Sun, 19 Jun 2011 23:42:45 +0300

 To: 
 Subject: Re: Error :dataimport handler is not request Handler, help

 I have tried many things , same problem still , any  help?

 On Sun, Jun 19, 2011 at 9:00 PM, Muhannad  wrote:

> Hi All, I am really stuck on this problem. I am using Solr to
> index some tables in a database, and I followed these steps to achieve my
> goal:
> 1- added the following section to solrconfig.xml:
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>   </lst>
> </requestHandler>
>
> 2- added apache-solr-dataimporthandler.jar to the lib/ directory (include
> path).
> Everything went fine until I started the server, when
> the following error appeared. Please, I need your help urgently!
>
> ===Error message==
> HTTP ERROR 500
>
> Problem accessing /solr/. Reason:
>
> Severe errors in solr configuration.
>
> Check your log files for more detailed information on what may be wrong.
>
> -
> org.apache.solr.common.SolrException: Error Instantiating Request Handler, org.apache.solr.handler.dataimport.DataImportHandler is not a org.apache.solr.request.SolrRequestHandler
>   at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:396)
>   at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:431)
>   at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:158)
>   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:513)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:653)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:406)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:291)
>   at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:240)
>   at org.apache.solr.servlet.SolrDispatchFilter.init(Solr

[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051939#comment-13051939
 ] 

Robert Muir commented on SOLR-2452:
---

{quote}
but solrj tests depend on core tests
{quote}

Curious why this is? some base classes that could be moved into test-framework 
instead?


> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Resolved] (LUCENE-3174) Similarity.Stats class for term & collection statistics


 [ 
https://issues.apache.org/jira/browse/LUCENE-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3174.
-

Resolution: Fixed

thanks David!

> Similarity.Stats class for term & collection statistics
> ---
>
> Key: LUCENE-3174
> URL: https://issues.apache.org/jira/browse/LUCENE-3174
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>Priority: Minor
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
> LUCENE-3174.patch, LUCENE-3174.patch, LUCENE-3174.patch, 
> LUCENE-3174_normalize_boost.patch
>
>
> In order to support ranking methods besides TF-IDF, we need to make the 
> statistics they need available. These statistics could be computed in 
> computeWeight (soon to become computeStats) and stored in a separate object 
> for easy access. Since this object will be used solely by subclasses of 
> Similarity, it should be implemented as a static inner class, i.e. 
> Similarity.Stats.
> There are two ways this could be implemented:
> - as a single Similarity.Stats class, reused by all ranking algorithms. In 
> this case, this class would have a member field for all statistics;
> - as a hierarchy of Stats classes, one for each ranking algorithm. Each 
> subclass would define only the statistics needed for the ranking algorithm.
> In the second case, the Stats class in DefaultSimilarity would have a single 
> field, idf, while the one in e.g. BM25Similarity would have idf and average 
> field/document length.
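
The second option described above (one Stats subclass per ranking algorithm) might look roughly like the following. This is only a sketch under stated assumptions: the class names, constructors, and the shared idf formula are illustrative and are not taken from the LUCENE-3174 patches.

```java
// Hypothetical sketch of the "hierarchy of Stats classes" option.
// Class and field names are illustrative, not taken from the LUCENE-3174 patches.
class StatsHierarchySketch {

    /** Common parent, so a Similarity can hand back any per-algorithm stats object. */
    static abstract class Stats {
    }

    /** TF-IDF style ranking: idf is the only statistic carried. */
    static class TfIdfStats extends Stats {
        final float idf;

        TfIdfStats(long docFreq, long numDocs) {
            // Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
            this.idf = (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }
    }

    /** BM25 style ranking: idf plus the average field length. */
    static class BM25Stats extends Stats {
        final float idf;
        final float avgFieldLength;

        BM25Stats(long docFreq, long numDocs, long sumFieldLength) {
            // Assumption: reuses the idf formula above for brevity;
            // a real BM25 implementation would likely use its own variant.
            this.idf = (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
            this.avgFieldLength = sumFieldLength / (float) numDocs;
        }
    }

    public static void main(String[] args) {
        TfIdfStats tfidf = new TfIdfStats(9, 10);
        BM25Stats bm25 = new BM25Stats(9, 10, 50);
        System.out.println("tf-idf idf = " + tfidf.idf);
        System.out.println("bm25 avgFieldLength = " + bm25.avgFieldLength);
    }
}
```

With the single-class option, every field would live in one Stats class instead, and each ranking algorithm would simply ignore the fields it does not need.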


Re: KStem custom lexicons configuration possible?

2011-06-20 Thread Robert Muir

On Mon, Jun 20, 2011 at 7:19 AM, Lukáš Vlček  wrote:
> With an option to modify the internal lexicons, I would be able to adapt
> KStem to work better for specific text corpora.
> What do you think?

please use StemmerOverrideFilter for this! it works with all stemmers,
including this one.
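
The idea behind an override step can be sketched in plain Java: consult a small dictionary first, and only fall back to the stemmer when no override exists. This is a minimal illustration, not the Lucene API. The `toyStemmer` below is a hypothetical stand-in that is hard-coded to mimic the behaviour reported in this thread, and the override entries are made up for the example. As far as I understand it, the real StemmerOverrideFilter achieves the same effect by marking overridden tokens as keywords so that a later stemmer skips them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Plain-Java illustration of "override first, stem otherwise".
// Nothing here is Lucene code; all names are hypothetical.
class StemOverrideSketch {

    /** Returns the dictionary entry for the term if present, else the stemmer's output. */
    static String stem(String term, Map<String, String> overrides, UnaryOperator<String> stemmer) {
        String fixed = overrides.get(term);
        return fixed != null ? fixed : stemmer.apply(term);
    }

    /** Toy stand-in for a real stemmer: strips a trailing "ed" or "or". Not KStem. */
    static String toyStemmer(String term) {
        if (term.endsWith("ed") || term.endsWith("or")) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("connector", "connector");      // keep this word as-is
        overrides.put("transactions", "transaction"); // force the singular form

        for (String term : new String[] {"connector", "connected", "transactions"}) {
            System.out.println(term + " -> " + stem(term, overrides, StemOverrideSketch::toyStemmer));
        }
    }
}
```

Because the dictionary is consulted before the stemmer, the same approach works regardless of which stemmer sits behind it.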


Re: Lucene 3.3 release soon?

2011-06-20 Thread Robert Muir

i was planning on doing an RC in a few weeks actually.

we have a lot of good stuff in there today already, however i wanted
to give a few weeks for the grouping stuff to run on hudson.

On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
 wrote:
> I would say within the next 3 months.
>
> Thoughts?
>
> On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček  wrote:
>> Hi,
>> How soon can we expect official Lucene 3.3 release?
>> Best regards,
>> Lukas


Re: KStem custom lexicons configuration possible?

Maybe I should show where I think custom configuration can be useful. Here
are two examples:

1) As of now, KStem conflates both "connector" and "connected" to the same
term "connect".
2) Conversely, it does not conflate "transaction" and "transactions" to the
same term.

With an option to modify the internal lexicons, I would be able to adapt
KStem to work better for specific text corpora.

What do you think?

Regards,
Lukas

On Mon, Jun 20, 2011 at 12:55 PM, Lukáš Vlček  wrote:

> Hi,
>
> Is there any API in KStem filter for lexicons configuration?
>
> As far as I understand the original code works in such a way that lexicons
> are loaded from files at startup (see
> http://lexicalresearch.com/kstem-doc.txt). The author (Robert Krovetz)
> lists the ability to modify lexicons among the advantages of KStem compared
> to other stemmers.
>
> Do people not need it? Would it be a useful addition for KStem filter to
> allow custom lexicon configurations in its API?
>
> Regards,
> Lukas
>
> Note: Big kudos to all who participated in bringing KStem into Lucene!
>

[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8939 - Still Failing

2011-06-20 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8939/

2 tests failed.
REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testCSQ4

Error Message:
org.apache.lucene.search.TestComplexExplanations.testCSQ4: Insane FieldCache 
usage(s) found expected:<0> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: 
org.apache.lucene.search.TestComplexExplanations.testCSQ4: Insane FieldCache 
usage(s) found expected:<0> but was:<1>
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
at 
org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:716)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:620)
at 
org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:67)
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:43)


REGRESSION:  org.apache.lucene.search.function.TestFieldScoreQuery.testRankInt

Error Message:
org.apache.lucene.search.function.TestFieldScoreQuery.testRankInt: Insane 
FieldCache usage(s) found expected:<0> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: 
org.apache.lucene.search.function.TestFieldScoreQuery.testRankInt: Insane 
FieldCache usage(s) found expected:<0> but was:<1>
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1415)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1333)
at 
org.apache.lucene.util.LuceneTestCase.assertSaneFieldCaches(LuceneTestCase.java:716)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:620)




Build Log (for compile errors):
[...truncated 3283 lines...]




[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-06-20 Thread Marko Bonaci (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051929#comment-13051929
 ] 

Marko Bonaci commented on SOLR-2305:


Hi Bill,
I had difficulties setting up the project in Eclipse, and although I managed 
it in the end, I think the patch file won't be usable (due to the many build 
path changes I made).

All you have to do to incorporate DIHScheduler is to follow the instructions I 
posted here:
http://wiki.apache.org/solr/DataImportHandler#Scheduling

If you run into any kind of problem feel free to post the question here and 
I'll try to respond promptly.

Thank you.

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Bill Bell
> Fix For: 4.0
>
>
> Marko Bonaci has updated the wiki page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it.
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051928#comment-13051928
 ] 

Chris Male commented on SOLR-2452:
--

I think my comments can be addressed later and shouldn't stop these 
improvements from going forward, so +1.

> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051923#comment-13051923
 ] 

Chris Male commented on SOLR-2452:
--

Hmmm I hadn't considered the issue of SolrJ being used for distributed search. 

> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


KStem custom lexicons configuration possible?

Hi,

Is there any API in KStem filter for lexicons configuration?

As far as I understand the original code works in such a way that lexicons
are loaded from files at startup (see
http://lexicalresearch.com/kstem-doc.txt). The author (Robert Krovetz) lists
the ability to modify lexicons among the advantages of KStem compared to
other stemmers.

Do people not need it? Would it be a useful addition for KStem filter to
allow custom lexicon configurations in its API?

Regards,
Lukas

Note: Big kudos to all who participated in bringing KStem into Lucene!

[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities


 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Issue Type: Sub-task  (was: New Feature)
Parent: LUCENE-2959

> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051920#comment-13051920
 ] 

Steven Rowe commented on SOLR-2452:
---

bq. Can we address the packaging or is that out of scope of this work?

What did you have in mind?



> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities


 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Description: 
With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

TODO:
 * {{EasyStats}}: contains all statistics that might be relevant for a ranking 
algorithm
 * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_

Done:

  was:
With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

TODO:
 * `EasyStats`: contains all statistics that might be relevant for a ranking 
algorithm
 * `EasySimilarity`: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_

Done:


> Implement various ranking models as Similarities
> 
>
> Key: LUCENE-3220
> URL: https://issues.apache.org/jira/browse/LUCENE-3220
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: flexscoring branch
>Reporter: David Mark Nemeskey
>Assignee: David Mark Nemeskey
>  Labels: gsoc
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
> can finally work on implementing the standard ranking models. Currently DFR, 
> BM25 and LM are on the menu.
> TODO:
>  * {{EasyStats}}: contains all statistics that might be relevant for a 
> ranking algorithm
>  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
> DocScorers and as much implementation detail as possible
>  * _BM25_: the current "mock" implementation might be OK
>  * _LM_
>  * _DFR_
> Done:


Re: Lucene 3.3 release soon?

That is fine, I just wanted to know when the KStem filter will be part of
a stable release.

On Mon, Jun 20, 2011 at 10:59 AM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:

> I would say within the next 3 months.
>
> Thoughts?
>
> On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček 
> wrote:
> > Hi,
> > How soon can we expect official Lucene 3.3 release?
> > Best regards,
> > Lukas

[jira] [Commented] (SOLR-2452) rewrite solr build system


[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051919#comment-13051919
 ] 

Steven Rowe commented on SOLR-2452:
---

bq. SolrJ [...] is just a client library

That's not all it is; on 8/18/2010 on #lucene IRC, yonik wrote:

bq. solrj used to not be included in the war, but solr core uses solrj for 
distributed search


> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
> Attachments: SOLR-2452-post-reshuffling.patch, 
> SOLR-2452.dir.reshuffle.sh
>
>
> As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.


[jira] [Created] (LUCENE-3220) Implement various ranking models as Similarities

Implement various ranking models as Similarities


 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey


With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
can finally work on implementing the standard ranking models. Currently DFR, 
BM25 and LM are on the menu.

TODO:
 * `EasyStats`: contains all statistics that might be relevant for a ranking 
algorithm
 * `EasySimilarity`: the ancestor of all the other similarities. Hides the 
DocScorers and as much implementation detail as possible
 * _BM25_: the current "mock" implementation might be OK
 * _LM_
 * _DFR_
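
To make the intended split concrete, here is a rough sketch of how an EasyStats / EasySimilarity pair could fit together, with BM25 as one subclass. Only the two class names come from the TODO list above. Every field, method signature, and constant below is an assumption made for illustration, and the scoring formula is the textbook Okapi BM25 with the commonly used k1 = 1.2 and b = 0.75, not necessarily what the flexscoring branch implements.

```java
// Hypothetical sketch; not the flexscoring branch code.
class EasySimilaritySketch {

    /** Assumed shape of EasyStats: every statistic a ranking model might need. */
    static class EasyStats {
        final long numberOfDocuments;
        final long docFreq;
        final double avgFieldLength;

        EasyStats(long numberOfDocuments, long docFreq, double avgFieldLength) {
            this.numberOfDocuments = numberOfDocuments;
            this.docFreq = docFreq;
            this.avgFieldLength = avgFieldLength;
        }
    }

    /** Assumed shape of EasySimilarity: subclasses only supply a scoring formula. */
    static abstract class EasySimilarity {
        abstract float score(EasyStats stats, double freq, double docLength);
    }

    /** Textbook Okapi BM25. */
    static class BM25 extends EasySimilarity {
        final double k1 = 1.2;
        final double b = 0.75;

        @Override
        float score(EasyStats s, double freq, double docLength) {
            // idf variant that stays positive for very common terms.
            double idf = Math.log(1 + (s.numberOfDocuments - s.docFreq + 0.5) / (s.docFreq + 0.5));
            // Length normalization: longer-than-average documents are penalized.
            double norm = k1 * (1 - b + b * docLength / s.avgFieldLength);
            return (float) (idf * freq * (k1 + 1) / (freq + norm));
        }
    }

    public static void main(String[] args) {
        EasyStats stats = new EasyStats(10, 1, 5.0);
        EasySimilarity sim = new BM25();
        System.out.println("score = " + sim.score(stats, 1, 5.0));
    }
}
```

Under this layout, adding LM or DFR would mean adding another subclass with its own score method, while the statistics gathering stays in one place.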

Done:


[jira] [Commented] (SOLR-2452) rewrite solr build system