[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc

2011-06-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052362#comment-13052362
 ] 

Simon Willnauer commented on LUCENE-3223:
-

bq. Simple patch fixing the problem. Do I need a CHANGES entry for trivial 
things like this?
Looks good. I don't think we need a CHANGES entry for this. Go ahead and commit!

 SearchWithSortTask ignores sorting by Doc
 -

 Key: LUCENE-3223
 URL: https://issues.apache.org/jira/browse/LUCENE-3223
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch


 During my work in LUCENE-3912, I found the following code:
 {code}
 if (field.equals("doc")) {
   sortField0 = SortField.FIELD_DOC;
 } if (field.equals("score")) { // no "else" before this "if", so "doc" still reaches the branches below
   sortField0 = SortField.FIELD_SCORE;
 } ...
 {code}
 This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
 much about this code, this seems like a valid setting and obviously just a 
 bug.
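To show the failure mode concretely, here is a small standalone sketch that mirrors the branching above. `SortSpecParser` and `SortKind` are hypothetical stand-ins, not Lucene's `SortField`, and the final `else` stands in for the elided `...` branches. The corrected version chains the tests with `else if`. That is presumably the shape of the fix, but this code is not taken from the committed patch.

```java
/** Hypothetical stand-in for the sort-spec parsing in SearchWithSortTask; not Lucene code. */
final class SortSpecParser {

    /** Stand-ins for SortField.FIELD_DOC, SortField.FIELD_SCORE and "sort by a named field". */
    enum SortKind { DOC, SCORE, FIELD }

    // Buggy shape from the issue: the second "if" starts a new, independent chain.
    static SortKind parseBuggy(String field) {
        SortKind kind = null;
        if (field.equals("doc")) {
            kind = SortKind.DOC;
        } if (field.equals("score")) {
            kind = SortKind.SCORE;
        } else {
            kind = SortKind.FIELD; // also runs for "doc", overwriting DOC
        }
        return kind;
    }

    // Corrected shape: a single if / else-if / else chain, so exactly one branch runs.
    static SortKind parseFixed(String field) {
        if (field.equals("doc")) {
            return SortKind.DOC;
        } else if (field.equals("score")) {
            return SortKind.SCORE;
        } else {
            return SortKind.FIELD;
        }
    }
}
```

With the buggy version, `parseBuggy("doc")` yields `FIELD`, which is exactly the "FIELD_DOC is ignored" symptom. The fixed version yields `DOC`.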

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum

2011-06-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052368#comment-13052368
 ] 

Simon Willnauer commented on LUCENE-3219:
-

Looks good to me. BTW, should we backport these changes?

 Change SortField types to an Enum
 -

 Key: LUCENE-3219
 URL: https://issues.apache.org/jira/browse/LUCENE-3219
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
 LUCENE-3219.patch


 When updating my SOLR-2533 patch, one issue was that the int value I had 
 given my new type had been used by another change in the meantime.  Since we 
 don't use these fields in a bitset kind of way, we can convert them to an 
 enum.
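Here is a minimal illustration of the motivation. It uses hypothetical names, not Lucene's actual `SortField` source. With `int` codes, two concurrent patches can claim the same number and the code still compiles. Enum constants, by contrast, are distinct objects, so there is no number to collide on.

```java
import java.util.Locale;

/** Hypothetical illustration; not Lucene's SortField source. */
final class SortTypes {

    // Old style: plain int codes. Two independent patches can both claim 9,
    // and nothing stops the collision from compiling.
    static final int LEGACY_STRING = 3;
    static final int LEGACY_CUSTOM_A = 9; // added by one patch
    static final int LEGACY_CUSTOM_B = 9; // added concurrently by another patch

    // New style: each constant is its own object, so there is nothing to collide on.
    enum Type { SCORE, DOC, STRING, INT, FLOAT, CUSTOM_A, CUSTOM_B }

    static String describe(Type type) {
        switch (type) {
            case SCORE: return "relevance";
            case DOC:   return "index order";
            default:    return "field value (" + type.name().toLowerCase(Locale.ROOT) + ")";
        }
    }
}
```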




[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052376#comment-13052376
 ] 

Dawid Weiss commented on LUCENE-2341:
-

Thanks for the contribution, Michał. 

Robert: the dictionary is licensed under MPL or CC-SA (the user may choose 
either, depending on their needs). Do you know which one is preferable?

Michał: there is also another (much larger) dictionary that has been released 
recently and comes from the Morfeusz project. 
http://sgjp.pl/morfeusz/dopobrania.html This dictionary is actually licensed 
under a BSD license, so no legal worries at all. Both dictionaries are nearly 
identical (they differ slightly in their convention of morphosyntactic 
annotations) and Morfeusz's dictionary could be compiled into an automaton for 
use with Morfologik.

Which way should we go? What do you think?

 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.




[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum

2011-06-21 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052377#comment-13052377
 ] 

Chris Male commented on LUCENE-3219:


You'll have to guide me on the backwards compat issue since this is a break due 
to the fields being public and some methods changing from returning int to 
returning SortField.Type.
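The break Chris describes can be seen in a small sketch that uses hypothetical class and method names. A caller compiled against an `int`-returning getter stops compiling once that getter returns an enum. One possible backwards layer, of the kind Uwe considers costly, keeps a deprecated `int` accessor that maps enum values back to the old codes. This is an assumption about how such a layer could look, not what Lucene did.

```java
/** Hypothetical sketch of a backwards-compatibility layer; not Lucene's code. */
final class CompatSortField {

    /** New-style type. */
    enum Type { SCORE, DOC, STRING }

    // Old-style public int codes, retained only so existing callers keep compiling.
    @Deprecated static final int LEGACY_SCORE = 0;
    @Deprecated static final int LEGACY_DOC = 1;
    @Deprecated static final int LEGACY_STRING = 3;

    private final Type type;

    CompatSortField(Type type) {
        this.type = type;
    }

    /** New accessor returning the enum. */
    Type getSortType() {
        return type;
    }

    /** Old accessor: changing its return type to Type would break callers, so it maps back. */
    @Deprecated
    int getType() {
        switch (type) {
            case SCORE:  return LEGACY_SCORE;
            case DOC:    return LEGACY_DOC;
            case STRING: return LEGACY_STRING;
            default:     throw new AssertionError("unmapped type: " + type);
        }
    }
}
```

Every public constant and method touched by the change would need a shim like this, which is why such layers get large.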

 Change SortField types to an Enum
 -

 Key: LUCENE-3219
 URL: https://issues.apache.org/jira/browse/LUCENE-3219
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
 LUCENE-3219.patch


 When updating my SOLR-2533 patch, one issue was that the int value I had 
 given my new type had been used by another change in the meantime.  Since we 
 don't use these fields in a bitset kind of way, we can convert them to an 
 enum.




[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052380#comment-13052380
 ] 

Dawid Weiss commented on LUCENE-2341:
-

I'll take a look at the differences between Morfologik and Morfeusz right now, 
actually. I'll post the results once I have something.

 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.




[jira] [Resolved] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc

2011-06-21 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-3223.


Resolution: Fixed

Committed revision 1137882.

 SearchWithSortTask ignores sorting by Doc
 -

 Key: LUCENE-3223
 URL: https://issues.apache.org/jira/browse/LUCENE-3223
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch


 During my work in LUCENE-3912, I found the following code:
 {code}
 if (field.equals("doc")) {
   sortField0 = SortField.FIELD_DOC;
 } if (field.equals("score")) { // no "else" before this "if", so "doc" still reaches the branches below
   sortField0 = SortField.FIELD_SCORE;
 } ...
 {code}
 This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
 much about this code, this seems like a valid setting and obviously just a 
 bug.




[jira] [Updated] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc

2011-06-21 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3223:
---

Fix Version/s: 4.0

 SearchWithSortTask ignores sorting by Doc
 -

 Key: LUCENE-3223
 URL: https://issues.apache.org/jira/browse/LUCENE-3223
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch


 During my work in LUCENE-3912, I found the following code:
 {code}
 if (field.equals("doc")) {
   sortField0 = SortField.FIELD_DOC;
 } if (field.equals("score")) { // no "else" before this "if", so "doc" still reaches the branches below
   sortField0 = SortField.FIELD_SCORE;
 } ...
 {code}
 This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
 much about this code, this seems like a valid setting and obviously just a 
 bug.




[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc

2011-06-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052388#comment-13052388
 ] 

Uwe Schindler commented on LUCENE-3223:
---

Thanks, nice catch!

 SearchWithSortTask ignores sorting by Doc
 -

 Key: LUCENE-3223
 URL: https://issues.apache.org/jira/browse/LUCENE-3223
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch


 During my work in LUCENE-3912, I found the following code:
 {code}
 if (field.equals("doc")) {
   sortField0 = SortField.FIELD_DOC;
 } if (field.equals("score")) { // no "else" before this "if", so "doc" still reaches the branches below
   sortField0 = SortField.FIELD_SCORE;
 } ...
 {code}
 This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
 much about this code, this seems like a valid setting and obviously just a 
 bug.




[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum

2011-06-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052390#comment-13052390
 ] 

Uwe Schindler commented on LUCENE-3219:
---

At the end of the day, I am sure I will vote to leave it as it is in 3.x!

SortField is heavily used in Lucene client code, and backwards-compatibility 
breaks without a very sophisticated backwards layer are horrible to handle. It 
can be done, but I don't think it's worth the work just for code beauty.

 Change SortField types to an Enum
 -

 Key: LUCENE-3219
 URL: https://issues.apache.org/jira/browse/LUCENE-3219
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
 LUCENE-3219.patch


 When updating my SOLR-2533 patch, one issue was that the int value I had 
 given my new type had been used by another change in the meantime.  Since we 
 don't use these fields in a bitset kind of way, we can convert them to an 
 enum.




[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-06-21 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052394#comment-13052394
 ] 

Noble Paul commented on SOLR-2382:
--

At least the BDB-based cache will have to go into a separate issue.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
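Item 4 does not say how rows are assigned to partitions. A common approach, assumed here, is to hash each entity's key modulo the partition count. `Partitioner` is a hypothetical helper, and the patch's `partition` parameter may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: route rows to N partitions by key hash. Not the patch's code. */
final class Partitioner {

    /** floorMod keeps the result in [0, numPartitions) even for negative hash codes. */
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0");
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Splits keys into buckets; each bucket could feed a separate shard or indexing thread. */
    static Map<Integer, List<String>> split(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            buckets.computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>()).add(key);
        }
        return buckets;
    }
}
```

Because the assignment depends only on the key, a later delta update for the same entity lands in the same partition.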
 Use Cases:
  1. We needed a flexible and scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex SQL queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
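The n+1 select problem in use case 1 arises when every parent row triggers its own child query. A cache avoids this by reading all child rows once, grouping them by join key, and answering each parent with a hash lookup. Below is a minimal sketch with hypothetical names (`CachedJoin`, `ChildRow`), not DIH's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a cached child-entity join; not DIH's implementation. */
final class CachedJoin {

    /** A child row: the parent key it joins on, plus a value. */
    static final class ChildRow {
        final String parentId;
        final String value;

        ChildRow(String parentId, String value) {
            this.parentId = parentId;
            this.value = value;
        }
    }

    private final Map<String, List<String>> byParent = new HashMap<>();

    /** One pass over all child rows (a single query or file read) replaces n per-parent queries. */
    CachedJoin(Iterable<ChildRow> allChildren) {
        for (ChildRow row : allChildren) {
            byParent.computeIfAbsent(row.parentId, k -> new ArrayList<>()).add(row.value);
        }
    }

    /** Hash lookup per parent instead of one database round trip per parent. */
    List<String> childrenOf(String parentId) {
        return byParent.getOrDefault(parentId, Collections.emptyList());
    }
}
```

The same idea works for non-SQL inputs such as flat files, since the constructor only needs an iterable of rows.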
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache, and two implementations:
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
 I believe this may be incompatible due to Generic Usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase and DIHCacheProperties).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface, DIHWriter, and two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, it was being called on each iteration of 
 DocBuilder.buildDocument().
   - Now it does one-time cleanup tasks (like closing or deleting a 
 disk-backed cache) once the entity processor is completed.
   - The only out-of-the-box entity processor that previously implemented 
 destroy() was LineEntityProcessor, so this is not a very invasive change.
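A pluggable cache along the lines of implementation detail 1 could be sketched as an interface plus an in-memory implementation backed by a `SortedMap`. `EntityCache` and its methods are hypothetical and do not reproduce the patch's `DIHCache` API. A disk-backed variant (for example, one built on Berkeley DB JE) would implement the same contract, and `close()` plays the role of the one-time cleanup described for `destroy()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical cache contract; DIH's real DIHCache interface may differ. */
interface EntityCache {
    /** Adds a row, indexed by the value of keyField. */
    void add(Map<String, Object> row, String keyField);

    /** Returns all rows stored under key, or an empty list. */
    List<Map<String, Object>> get(String key);

    /** One-time cleanup once the entity processor is done. */
    void close();
}

/** In-memory implementation: rows grouped by key, keys kept in sorted order. */
final class SortedMapEntityCache implements EntityCache {
    private final SortedMap<String, List<Map<String, Object>>> rows = new TreeMap<>();

    @Override
    public void add(Map<String, Object> row, String keyField) {
        Object key = row.get(keyField);
        if (key == null) {
            throw new IllegalArgumentException("row has no value for key field " + keyField);
        }
        rows.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(row);
    }

    @Override
    public List<Map<String, Object>> get(String key) {
        return rows.getOrDefault(key, Collections.emptyList());
    }

    @Override
    public void close() {
        rows.clear();
    }
}
```

Code written against `EntityCache` does not care which implementation it gets, which is the point of making the cache pluggable.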
 General Notes:
 We are near completion in converting our search functionality from a legacy 
 search engine to Solr.  However, I found that DIH did not support caching to 
 the level of our prior product's data import utility.  In order to get our 
 data into Solr, I created these caching enhancements.  Because I believe this 
 has broad application, and because we would like this feature to be supported 
 by the Community, I have front-ported this, enhanced, to Trunk.  I have also 
 added unit tests and verified that all 

[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052398#comment-13052398
 ] 

Jan Høydahl commented on SOLR-2598:
---

Planning for this to be my second commit to Lucene :) What do you think?

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also, we should include 
 a few more books.




[jira] [Commented] (SOLR-2489) Remove old lucene.apache.org/solr/who page

2011-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052399#comment-13052399
 ] 

Jan Høydahl commented on SOLR-2489:
---

I plan to delete this old defunct page and commit shortly. Agree?

 Remove old lucene.apache.org/solr/who page
 --

 Key: SOLR-2489
 URL: https://issues.apache.org/jira/browse/SOLR-2489
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1, 3.2
Reporter: Jan Høydahl
Priority: Minor
 Fix For: 3.3


 In the distribution, docs/who.html is outdated: it refers to the old Solr committers 
 list at http://lucene.apache.org/solr/who
 The fix would be to simply delete the old page.




[jira] [Assigned] (SOLR-2489) Remove old lucene.apache.org/solr/who page

2011-06-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-2489:
-

Assignee: Jan Høydahl

 Remove old lucene.apache.org/solr/who page
 --

 Key: SOLR-2489
 URL: https://issues.apache.org/jira/browse/SOLR-2489
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1, 3.2
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3


 In the distribution, docs/who.html is outdated: it refers to the old Solr committers 
 list at http://lucene.apache.org/solr/who
 The fix would be to simply delete the old page.




[jira] [Assigned] (SOLR-2599) FieldCopy Update Processor

2011-06-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-2599:
-

Assignee: Jan Høydahl

 FieldCopy Update Processor
 --

 Key: SOLR-2599
 URL: https://issues.apache.org/jira/browse/SOLR-2599
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl

 We need an UpdateProcessor that can copy and move fields.
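To make the copy/move distinction concrete, here is a minimal sketch over a plain `Map`-based, multi-valued document. It does not use Solr's `UpdateRequestProcessor` API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical field copy/move over a multi-valued document; not Solr's API. */
final class FieldOps {

    /** Appends the source values to the target field; the source is left intact. */
    static void copy(Map<String, List<Object>> doc, String source, String target) {
        List<Object> values = doc.get(source);
        if (values == null) {
            return; // nothing to copy
        }
        doc.computeIfAbsent(target, k -> new ArrayList<>()).addAll(values);
    }

    /** Like copy, but removes the source field afterwards. */
    static void move(Map<String, List<Object>> doc, String source, String target) {
        if (source.equals(target)) {
            return; // moving a field onto itself is a no-op
        }
        copy(doc, source, target);
        doc.remove(source);
    }
}
```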




[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052400#comment-13052400
 ] 

Jan Høydahl commented on SOLR-2487:
---

Any objections to parameterizing the build as Hoss suggests?

 Do not include slf4j-jdk14 jar in WAR
 -

 Key: SOLR-2487
 URL: https://issues.apache.org/jira/browse/SOLR-2487
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
  Labels: logging, slf4j

 I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
 newbies get up and running. But I find myself re-packaging the war for every 
 customer when adapting to their choice of logger framework, which is 
 counter-productive.
 It would be sufficient to have the jdk-logging binding in example/lib so that 
 the example and tutorial still work out of the box, but as soon as you deploy solr.war 
 to production, you're forced to explicitly decide which logging framework to use.




[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum

2011-06-21 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052402#comment-13052402
 ] 

Chris Male commented on LUCENE-3219:


For the reasons described above, I think it's best we don't backport this 
change.

Uwe, is the work here compatible with what you had planned in LUCENE-3192?  If 
so, I'll go ahead and commit this.

 Change SortField types to an Enum
 -

 Key: LUCENE-3219
 URL: https://issues.apache.org/jira/browse/LUCENE-3219
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
 LUCENE-3219.patch


 When updating my SOLR-2533 patch, one issue was that the int value I had 
 given my new type had been used by another change in the meantime.  Since we 
 don't use these fields in a bitset kind of way, we can convert them to an 
 enum.




[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum

2011-06-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052404#comment-13052404
 ] 

Uwe Schindler commented on LUCENE-3219:
---

Just commit this; the other issue is quite unrelated, I just had the same idea.

 Change SortField types to an Enum
 -

 Key: LUCENE-3219
 URL: https://issues.apache.org/jira/browse/LUCENE-3219
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, 
 LUCENE-3219.patch


 When updating my SOLR-2533 patch, one issue was that the int value I had 
 given my new type had been used by another change in the meantime.  Since we 
 don't use these fields in a bitset kind of way, we can convert them to an 
 enum.




[jira] [Assigned] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-06-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-2458:
-

Assignee: Jan Høydahl

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: post.jar
 Fix For: 3.3

 Attachments: SOLR-2458.patch, SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 The problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML request handler, such as CSV.
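One format-independent alternative is to leave the posted body untouched and request the commit through the update URL, since Solr's update handlers accept a `commit=true` request parameter. This is an assumption, not taken from the attached patch. `CommitUrls.commitUrl` is a hypothetical helper that only builds the URL.

```java
/** Hypothetical helper: request a commit via the URL instead of appending <commit/> to the body. */
final class CommitUrls {

    /** Appends commit=true with the right separator, whether or not a query string already exists. */
    static String commitUrl(String updateUrl) {
        String separator = updateUrl.contains("?") ? "&" : "?";
        return updateUrl + separator + "commit=true";
    }
}
```

For example, after posting a CSV file to `http://localhost:8983/solr/update/csv`, the tool could send an empty request to `commitUrl(...)` on the same handler instead of writing `<commit/>` into the CSV stream.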




[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052407#comment-13052407
 ] 

Jan Høydahl commented on SOLR-2458:
---

Has anyone got around to inspecting this patch? I'd like to get this into 3.3.

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: post.jar
 Fix For: 3.3

 Attachments: SOLR-2458.patch, SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 The problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML request handler, such as CSV.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052409#comment-13052409
 ] 

Paul Elschot commented on LUCENE-2454:
--

This overlaps with the BlockJoinQuery of LUCENE-3171; this issue might even be 
closed as a duplicate of that one. Which one is preferred?

On using prev/nextSetBit in a safe range, this safe range starts with the 
parent and ends with the largest known child. A variant of prevSetBit could 
take this largest known child as an argument to limit its search, and then from 
the return value one has either a new parent, or one is certain that the 
current parent is the right one. This would also limit the worst case number of 
inspected bits for the group to the group size.
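The bounded variant Paul describes could look roughly like the following sketch over a raw `long[]` bit array. This is not `OpenBitSet`'s API; `prevSetBit(words, index, lowerExclusive)` is a hypothetical signature. It returns the highest set bit in `(lowerExclusive, index]`. When the limit is the largest known child, a result of `-1` means no new parent lies in that range, so the current parent still applies, and the scan never reads below the limit.

```java
/** Hypothetical bounded backward scan over a long[] bit set; not OpenBitSet's API. */
final class BoundedBits {

    /**
     * Returns the highest set bit p with lowerExclusive < p <= index, or -1 if none.
     * Assumes 0 <= index < words.length * 64.
     */
    static int prevSetBit(long[] words, int index, int lowerExclusive) {
        if (index <= lowerExclusive) {
            return -1;
        }
        int wordIndex = index >>> 6;
        // Keep only bits 0..(index % 64) of the starting word.
        long word = words[wordIndex] & (-1L >>> (63 - (index & 63)));
        while (true) {
            if (word != 0) {
                int bit = (wordIndex << 6) + 63 - Long.numberOfLeadingZeros(word);
                return bit > lowerExclusive ? bit : -1;
            }
            wordIndex--;
            // Stop once the whole previous word lies at or below the limit.
            if (wordIndex < 0 || (wordIndex << 6) + 63 <= lowerExclusive) {
                return -1;
            }
            word = words[wordIndex];
        }
    }
}
```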

With or without that variant, I think it would be good to add a remark in the 
javadocs about the possible inefficiency of the use of OpenBitSet for larger 
group sizes. When the typical group size gets a lot bigger than the number of 
bits in a long, another implementation might be faster. Such a remark in the 
javadocs would allow us to wait for someone to come along with bigger group 
sizes and a real performance problem here.



 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display

2011-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052410#comment-13052410
 ] 

Jan Høydahl commented on SOLR-2383:
---

3.3 will support the [from TO to} syntax, right? Let's try to get this into 
3.3. Grant?

 Velocity: Generalize range and date facet display
 -

 Key: SOLR-2383
 URL: https://issues.apache.org/jira/browse/SOLR-2383
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
  Labels: facet, range, velocity
 Fix For: 3.3

 Attachments: SOLR-2383-branch_32.patch, SOLR-2383.patch, 
 SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch


 The Velocity (/browse) GUI has a hardcoded price range facet and a hardcoded 
 manufacturedate_dt date facet. We need a general solution that works for any 
 facet.range and facet.date.




[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052421#comment-13052421
 ] 

Dawid Weiss commented on LUCENE-2341:
-

I did some analyses on both dictionaries.
{noformat}
Number of lines (distinct surface forms):

  3.662.366 morfologik.utf8
  5.086.141 sgjp.utf8

Distinct words (not in both):

  2.729.334 unique.utf8

  - upper/lower case (morfologik has upper case forms, morfeusz only lower case surface forms)

acerze
Acerze

  - very rare or jargon;

abszminka
abszytowałem
acetobakteria
acetarsolowi
niebombiasto
hakatystce
hakatystycznościach
warzże

  - differences in spelling;

abelard
abélard

  - acronyms and super-short stuff

aap
aar

Distinct normalized (lowercase):

  2.564.366 lowered.utf8

  Most of these are very infrequent words or inflection forms. There are minor
  differences or missing surface forms in both dictionaries, as here
  (mz - morfeusz, mk - morfologik):

mz hakersko
mz hakerskość
mz hakerskości
mz hakerskością
mz hakerskościach
mz hakerskościami
mz hakerskościom
mk hakerstw
mk hakerstwa
...
mk hakowałyśmy
mk hakowań
mk hakowaniach
mk hakowaniami
mk hakowaniom
mz hakowatość
mz hakowatości
mz hakowatością
mz hakowatościach
mz hakowatościami
mz hakowatościom
{noformat}
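The counts above come down to set arithmetic over surface forms: forms present in exactly one dictionary, then the same comparison after lower-casing. A minimal sketch in plain Java, assuming each dictionary has already been loaded into a set of surface forms (the loading step and the `DictionaryDiff` class are hypothetical):

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper comparing the surface forms of two dictionaries. */
final class DictionaryDiff {
    private DictionaryDiff() {}

    /** Symmetric difference: forms present in exactly one of the two sets (sorted). */
    static Set<String> notInBoth(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        result.removeAll(common);
        return result;
    }

    /** Lower-cases every form with Polish casing rules before comparison. */
    static Set<String> lowered(Set<String> forms) {
        Locale polish = Locale.forLanguageTag("pl-PL");
        Set<String> result = new HashSet<>();
        for (String form : forms) {
            result.add(form.toLowerCase(polish));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> morfologik = Set.of("Acerze", "hakerstw", "abelard");
        Set<String> morfeusz = Set.of("acerze", "hakersko", "ab\u00e9lard");
        // Case differences count as distinct forms in the raw comparison...
        System.out.println(notInBoth(morfologik, morfeusz));
        // ...but disappear once both sides are normalized.
        System.out.println(notInBoth(lowered(morfologik), lowered(morfeusz)));
    }
}
```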

So... the conclusion is pretty consistent with Zipf's law: the two dictionaries 
have fairly different coverage, even though both are quite large. We don't have a 
frequency dictionary for Polish, but I assume most of these surface forms are 
purely theoretical and occur super-rarely in practice. That said, I think we 
should use BOTH dictionaries -- after all, there's no harm done if we overdo the 
lemmatization process a little bit, is there?

So... my proposal would be this: I'll integrate Morfeusz's dictionary in 
Morfologik (as an alternative dictionary one can load and use). 

Eventually it would probably be sensible to limit the automaton for use in 
Lucene to store surface forms and lemmas only (no POS tags) and merge both 
dictionaries into a single automaton... but this can be a future improvement.



 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.


[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052423#comment-13052423
 ] 

Dawid Weiss commented on LUCENE-2341:
-

One note wrt patch: I would use an explicit pointer over a list of returned 
WordData entries instead of adding them to a local list:

private List<WordData> stemsAcc = new ArrayList<WordData>();

Right now you're shifting the internal array on each call unnecessarily (just 
increase an int ptr instead):

+  termAtt.setEmpty().append(stemsAcc.remove(0).getStem().toString());

getStem() should also be enough since it's a CharSequence, right? No need for 
an intermediate String.
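The suggestion, sketched in plain Java: keep the stems for the current token and walk them with an index instead of calling `remove(0)`, which shifts the backing array on every call. The `StemBuffer` class is hypothetical, and a `StringBuilder` stands in for the patch's `termAtt`; since `append(CharSequence)` takes the stem directly, no intermediate `String` is built.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer of stems for the current token, consumed by index. */
final class StemBuffer {
    private final List<CharSequence> stems = new ArrayList<>();
    private int next = 0; // index of the next stem to emit

    /** Replaces the buffered stems with the stemmer's result for a new token. */
    void reset(List<? extends CharSequence> newStems) {
        stems.clear();
        stems.addAll(newStems);
        next = 0;
    }

    boolean hasNext() {
        return next < stems.size();
    }

    /** Copies the next stem into the term buffer: O(1), no array shifting. */
    void emitNext(StringBuilder term) {
        term.setLength(0);              // plays the role of termAtt.setEmpty()
        term.append(stems.get(next++)); // append(CharSequence): no toString() needed
    }

    public static void main(String[] args) {
        StemBuffer buffer = new StemBuffer();
        buffer.reset(List.of("haker", "hakerstwo"));
        StringBuilder term = new StringBuilder();
        while (buffer.hasNext()) {
            buffer.emitNext(term);
            System.out.println(term);
        }
    }
}
```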

 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.


[jira] [Updated] (SOLR-1431) CommComponent abstracted

2011-06-21 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1431:
-

Attachment: SOLR-1431.patch

This time, a factory is used to create the shardHandler:
{code:xml}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- other params go here -->
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeOut">1000</int>
    <int name="connTimeOut">5000</int>
  </shardHandlerFactory>
</requestHandler>
{code}
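One way such a factory could consume that configuration, as a hypothetical sketch rather than the code in SOLR-1431.patch: the `ShardHandler`/`ShardHandlerFactory` interfaces, the `init(Map)` method and the zero defaults are assumptions; only the class name `HttpShardHandlerFactory` and the `socketTimeOut`/`connTimeOut` parameter names come from the snippet.

```java
import java.util.Map;

/** Hypothetical per-request handler that fans a query out to the shards. */
interface ShardHandler {
    int socketTimeoutMillis();
    int connTimeoutMillis();
}

/** Hypothetical factory, configured once from the <shardHandlerFactory> element. */
interface ShardHandlerFactory {
    void init(Map<String, Integer> args);
    ShardHandler getShardHandler();
}

final class HttpShardHandlerFactory implements ShardHandlerFactory {
    // Assumed defaults, used when a parameter is missing from the config.
    private int socketTimeout = 0;
    private int connTimeout = 0;

    @Override
    public void init(Map<String, Integer> args) {
        socketTimeout = args.getOrDefault("socketTimeOut", socketTimeout);
        connTimeout = args.getOrDefault("connTimeOut", connTimeout);
    }

    @Override
    public ShardHandler getShardHandler() {
        // Each request gets a lightweight handler carrying the configured values.
        final int socket = socketTimeout;
        final int conn = connTimeout;
        return new ShardHandler() {
            @Override public int socketTimeoutMillis() { return socket; }
            @Override public int connTimeoutMillis() { return conn; }
        };
    }
}
```

The design point is that parsing happens once in `init`, while `getShardHandler` is cheap enough to call per request.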

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch


 We'll abstract CommComponent in this issue.


[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052436#comment-13052436
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. This overlaps with the BlockJoinQuery of LUCENE-3171, this issue might even 
be closed as duplicate of that one. Which one is preferred?

We need to look at the likely use cases. 2454 was created to serve a use case 
which I expect to be a very common pattern, and I'm not sure LUCENE-3171 
satisfies this need. Apps commonly need to return a selection of both matching 
and non-matching children along with the best parents. Why? The rationale is 
very similar to the way highlighting returns a summary of text: it doesn't just 
return the matched words, it also returns surrounding text as useful context 
when displaying results to users. However, some texts can be very large, and 
there's a need to limit how much context is brought back.
If we apply this logic to 2454, we can see that for the top parents it is common 
to also want some non-matching children (e.g. for a resume, return a person's 
employment history, not just the employments that matched the original search), 
but it is also necessary to summarize some parents' histories (e.g. the 
contractor who listed a gazillion positions in his employment history needs 
summarizing). A common pattern is for solutions to ask for the best 11 children 
for the best parents and display only 10. That way the app knows that certain 
parents have more data available (i.e. those with 11 matches) and can offer a 
"more" button to retrieve the extra children for parents of interest. 2454 
satisfies this use case as follows:
# Use a NestedDocumentQuery to get the best parents, with the child criteria expressed as a MUST clause
# Use a PerParentLimitedQuery to get a selection of children per top parent, where the children MUST belong to a top parent (tested using the primary key), and use the child criteria again, this time as a SHOULD clause, to relevance-rank the selection of children returned
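The "ask for 11, show 10" trick is independent of either patch and can be sketched on its own. The `ChildPage` class and its `page` method are hypothetical; the input is assumed to be one parent's children already ranked by relevance (for instance, the output of the second, SHOULD-ranked query).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-parent result: children to display plus a "more" flag. */
final class ChildPage {
    final List<String> shown;
    final boolean hasMore;

    ChildPage(List<String> shown, boolean hasMore) {
        this.shown = shown;
        this.hasMore = hasMore;
    }

    /**
     * Fetches at most displayLimit + 1 ranked children; the one extra child is
     * only used to decide whether a "more" button should be offered.
     */
    static ChildPage page(List<String> rankedChildren, int displayLimit) {
        int fetchLimit = displayLimit + 1;
        List<String> fetched =
            rankedChildren.subList(0, Math.min(fetchLimit, rankedChildren.size()));
        boolean hasMore = fetched.size() > displayLimit;
        List<String> shown =
            new ArrayList<>(fetched.subList(0, Math.min(displayLimit, fetched.size())));
        return new ChildPage(shown, hasMore);
    }

    public static void main(String[] args) {
        ChildPage p = page(List.of("job1", "job2", "job3"), 2);
        System.out.println(p.shown + " more=" + p.hasMore);
    }
}
```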

It's worth considering this sort of use case carefully before making any code 
decisions.



 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene


[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052446#comment-13052446
 ] 

Michael McCandless commented on LUCENE-3223:


Shouldn't this be backported to 3.x too?

 SearchWithSortTask ignores sorting by Doc
 -

 Key: LUCENE-3223
 URL: https://issues.apache.org/jira/browse/LUCENE-3223
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Chris Male
Assignee: Chris Male
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch


 During my work in LUCENE-3912, I found the following code:
 {code}
 if (field.equals("doc")) {
 sortField0 = SortField.FIELD_DOC;
 } if (field.equals("score")) {
 sortField0 = SortField.FIELD_SCORE;
 } ...
 {code}
 This means the setting of SortField.FIELD_DOC is ignored.  While I don't know 
 much about this code, this seems like a valid setting and obviously just a 
 bug.
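The fix is to chain the comparisons with `else if`, so a later branch can no longer overwrite the `doc` choice. A self-contained sketch of both shapes: the `SortKind` enum is a stand-in for Lucene's `SortField` constants, and falling through to a plain field sort is an assumption about what the elided `...` branch does.

```java
/** Stand-in for the possible sort choices; not Lucene's SortField. */
enum SortKind { DOC, SCORE, FIELD }

final class SortSpecParser {
    private SortSpecParser() {}

    /** Fixed version: one else-if chain, so exactly one branch runs. */
    static SortKind parse(String field) {
        if (field.equals("doc")) {
            return SortKind.DOC;
        } else if (field.equals("score")) {
            return SortKind.SCORE;
        } else {
            return SortKind.FIELD; // assumed: any other name sorts by that field
        }
    }

    /** The shape from the report: the second "if" is not chained to the first. */
    static SortKind parseBuggy(String field) {
        SortKind kind;
        if (field.equals("doc")) {
            kind = SortKind.DOC;
        }
        if (field.equals("score")) { // missing "else": this is the bug
            kind = SortKind.SCORE;
        } else {
            kind = SortKind.FIELD;   // overwrites DOC when field is "doc"
        }
        return kind;
    }
}
```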


[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052451#comment-13052451
 ] 

Robert Muir commented on LUCENE-2341:
-

{quote}
Eventually it would probably be sensible to limit the automaton for use in 
Lucene to store surface forms and lemmas only (no POS tags) and merge both 
dictionaries into a single automaton... but this can be a future improvement.
{quote}

Or alternatively, you could expose the POS tags for each stem to Lucene. The 
easiest way would be to put them into TypeAttribute (a string), but you could 
make your own strongly-typed attribute if that's a better fit.

This could be useful for downstream processing.


 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.


[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-06-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052456#comment-13052456
 ] 

Robert Muir commented on SOLR-2487:
---

Without knowing anything about logging, I just want to say it's a bit scary
to parameterize the build in any way:
* how are the different possibilities going to be tested?
* are all possibilities supported, or is only the default/tested parameter the 
one we officially support?


 Do not include slf4j-jdk14 jar in WAR
 -

 Key: SOLR-2487
 URL: https://issues.apache.org/jira/browse/SOLR-2487
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
  Labels: logging, slf4j

 I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
 newbies get up and running. But I find myself re-packaging the war for every 
 customer when adapting to their choice of logger framework, which is 
 counter-productive.
 It would be sufficient to have the jdk-logging binding in example/lib to let 
 the example and tutorial still work OOTB but as soon as you deploy solr.war 
 to production you're forced to explicitly decide what logging to use.


[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052459#comment-13052459
 ] 

Michael McCandless commented on LUCENE-2454:


{quote}
bq. It uses 2 passes if you also want to collect child docs per parent

I tend to work with distributed indexes so it involves a 2 pass op anyway - one 
to understand best parents across the multiple shards first then the 
perparentlimitedquery to ensure we only pay the retrieve costs for those 
parents that make the final cut.
{quote}

The distributed case can still be done in a single pass using LUCENE-3171,
i.e. each shard returns its top groups and then they are merged at the
front end.  This should be substantially faster than doing a 2nd pass out
to all shards.

Also, we now have TopDocs.merge/TopGroups.merge to support this use
case.
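Because each shard already returns its hits sorted, the front end only needs a k-way merge. A simplified sketch of that idea with a `PriorityQueue`, merging per-shard lists by descending score; the `Hit` and `ShardMerger` classes are hypothetical, and this is not the `TopDocs.merge` implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit: shard index, shard-local doc id and score. */
final class Hit {
    final int shard;
    final int doc;
    final float score;

    Hit(int shard, int doc, float score) {
        this.shard = shard;
        this.doc = doc;
        this.score = score;
    }
}

final class ShardMerger {
    private ShardMerger() {}

    /** Merges per-shard lists, each sorted by descending score, into the global top n. */
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        // Queue entries are {shard, position}; the best current head is polled first.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> {
            Hit x = perShard.get(a[0]).get(a[1]);
            Hit y = perShard.get(b[0]).get(b[1]);
            int byScore = Float.compare(y.score, x.score);             // higher score first
            return byScore != 0 ? byScore : Integer.compare(x.shard, y.shard); // tie-break
        });
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !heads.isEmpty()) {
            int[] head = heads.poll();
            merged.add(perShard.get(head[0]).get(head[1]));
            int nextPos = head[1] + 1;
            if (nextPos < perShard.get(head[0]).size()) {
                heads.add(new int[] {head[0], nextPos}); // advance within that shard
            }
        }
        return merged;
    }
}
```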

bq. This overlaps with the BlockJoinQuery of LUCENE-3171, this issue might even 
be closed as duplicate of that one. Which one is preferred?

I think they are likely dups of one another and I agree we need to
make sure all important use cases are covered.

bq. Apps commonly need to return a selection of both matching and non-matching 
children along with the best parents.

LUCENE-3171 can do this as well, with the same approach as here, i.e.
doing 2 passes with two different child queries.

However, I think for both this issue and for LUCENE-3171, this means
each child doc must have the parent's PK indexed against it, right?
I.e., for that 2nd query you need some way to return all child docs
under any of the top parents, so the child query is "parentID MUST be
in XX, YY, ZZ" and "childDoc SHOULD XYZ".

In fact, we could make this a single-pass capability with LUCENE-3171,
and without requiring each child doc to index its parent PK, i.e. also
pull and sort all other non-matching children under any top parent,
because collection within each parent is done when you retrieve the
TopGroups, but this can be a later enhancement.


 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene


[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-06-21 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3218:


Attachment: LUCENE-3218.patch

next iteration - seems close. 

* moved CFW to o.a.l.store and made package private.
* added createCompoundOutput to Directory instead of passing OpenMode
* added write support to CompoundFileDirectory
* Separately written files are appended during close if possible (i.e. if no other file 
is currently being written directly to the CF). If the CF is locked, the append happens 
once the file being written directly is closed.
* IW uses Directory methods only, addFile has been converted to Directory#copy


One thing that still bugs me is the setAbortCheck on CFDirectory. I wonder 
if we can solve that differently; any ideas?
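The append rule in the notes above (a separately written file goes into the compound file when it is closed, or later if another file is being written directly into the compound file at that moment) can be modelled in memory. Everything here (`CompoundSketch`, `DirectWriter`, the method names) is hypothetical and only illustrates the offset bookkeeping and deferred appends; it is not the `CompoundFileDirectory` API.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of an appendable compound file. */
final class CompoundSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    /** File name -> {offset, length} inside the compound data. */
    private final Map<String, long[]> entries = new LinkedHashMap<>();
    /** Separately written files waiting for the direct writer to close. */
    private final Deque<Map.Entry<String, byte[]>> pending = new ArrayDeque<>();
    private boolean directWriterOpen = false;

    /** Streams one file straight into the compound data; only one at a time. */
    final class DirectWriter implements AutoCloseable {
        private final String name;
        private final long start = data.size();

        private DirectWriter(String name) {
            this.name = name;
        }

        void write(byte[] bytes) {
            data.write(bytes, 0, bytes.length);
        }

        @Override
        public void close() {
            entries.put(name, new long[] {start, data.size() - start});
            directWriterOpen = false;
            // The compound data is free again: append everything that was deferred.
            while (!pending.isEmpty()) {
                Map.Entry<String, byte[]> e = pending.poll();
                append(e.getKey(), e.getValue());
            }
        }
    }

    DirectWriter openDirect(String name) {
        if (directWriterOpen) {
            throw new IllegalStateException("another file is already written directly");
        }
        directWriterOpen = true;
        return new DirectWriter(name);
    }

    /** Called when a separately written file is closed: append now or defer. */
    void addSeparate(String name, byte[] contents) {
        if (directWriterOpen) {
            pending.add(Map.entry(name, contents));
        } else {
            append(name, contents);
        }
    }

    private void append(String name, byte[] contents) {
        long offset = data.size();
        data.write(contents, 0, contents.length);
        entries.put(name, new long[] {offset, contents.length});
    }

    /** Returns {offset, length} for a file, or null if it is not in the compound file yet. */
    long[] entry(String name) {
        return entries.get(name);
    }
}
```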


 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch


 Currently CFS is created once all files are written during a flush / merge. 
 Once on disk, the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can at any time write at least one file 
 directly into the CFS, which can save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a Codec 
 Flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but more importantly, for DocValues 
 and LUCENE-3216 we could transparently pack per-field files into a single 
 file only for DocValues, without changing any code once LUCENE-3216 is 
 resolved.


[jira] [Updated] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-21 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2610:


Attachment: SOLR-2610.patch

Patch adds a boolean deleteIndex parameter to core unload action.

There is a close hook interface in SolrCore but it is called before the update 
handler and searcher(s) are closed so it cannot be used to delete the index.

Changes:
* Changes the CloseHook interface to an abstract class with a 
preClose(SolrCore) and a postClose(SolrCore) method
* Changed the usage of CloseHook in ReplicationHandler, SolrCoreTest
* CoreAdminHandler adds a closehook on receiving an unload action with 
deleteIndex=true
* Added tests for the new param

Since the CloseHook is used very sparingly, I think it is fine to change it to 
an abstract class but if people feel strongly against it, we can find another 
way.
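A sketch of the described change: `CloseHook` becomes an abstract class with `preClose` and `postClose`, and the unload path registers a hook whose `postClose` deletes the index once nothing holds it open. `Core` is a hypothetical stand-in for `SolrCore`, `DeleteIndexHook` is hypothetical, and making both callbacks abstract is an assumption; the deletion itself uses `java.nio.file`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Sketch of CloseHook as an abstract class with callbacks around shutdown. */
abstract class CloseHook {
    /** Called before the update handler and searchers are closed. */
    public abstract void preClose(Core core);

    /** Called after they are closed, when index files are no longer in use. */
    public abstract void postClose(Core core);
}

/** Hypothetical stand-in for SolrCore. */
final class Core {
    final Path indexDir;
    private final List<CloseHook> hooks = new ArrayList<>();
    boolean componentsOpen = true;

    Core(Path indexDir) {
        this.indexDir = indexDir;
    }

    void addCloseHook(CloseHook hook) {
        hooks.add(hook);
    }

    void close() {
        for (CloseHook hook : hooks) {
            hook.preClose(this);
        }
        componentsOpen = false; // update handler and searchers would be closed here
        for (CloseHook hook : hooks) {
            hook.postClose(this);
        }
    }
}

/** Hypothetical hook registered by an UNLOAD request with deleteIndex=true. */
final class DeleteIndexHook extends CloseHook {
    @Override
    public void preClose(Core core) {
        // Nothing to do yet: the index is still open.
    }

    @Override
    public void postClose(Core core) {
        // Reverse path order deletes files before the directories that contain them.
        try (Stream<Path> paths = Files.walk(core.indexDir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Running the deletion in `postClose` rather than `preClose` matters because, as noted above, the existing hook fires before the update handler and searchers are closed.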

 Add an option to delete index through CoreAdmin UNLOAD action
 -

 Key: SOLR-2610
 URL: https://issues.apache.org/jira/browse/SOLR-2610
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2610.patch


 Right now, one can unload a Solr Core but the index files are left behind and 
 consume disk space. We should have an option to delete the index when 
 unloading a core.


[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052483#comment-13052483
 ] 

Dawid Weiss commented on LUCENE-2341:
-

I've just published morfologik 1.5.2, Michał. This comes with two dictionaries 
(morfologik and morfeusz) that can be used as one (fallback for missing words) 
or separately, but I would stick to using morfologik as the default dictionary 
(possibly with an option of using morfeusz?). POS tags have a different 
notation in these two resources, so mixing both is probably not a good idea.

Will you update the patch? Thanks.
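Using the two dictionaries "as one" with fallback amounts to: ask the primary dictionary, and consult the secondary only for words the primary does not know. A sketch in plain Java, where the `Lemmatizer` interface and the map-backed dictionaries are hypothetical stand-ins for the Morfologik lookup classes. It returns lemmas only, since the POS tag notations differ between the two resources.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical lookup interface: surface form -> lemmas (empty if unknown). */
interface Lemmatizer {
    List<String> lemmas(String surfaceForm);
}

/** Map-backed stand-in for a dictionary automaton. */
final class MapLemmatizer implements Lemmatizer {
    private final Map<String, List<String>> entries;

    MapLemmatizer(Map<String, List<String>> entries) {
        this.entries = entries;
    }

    @Override
    public List<String> lemmas(String surfaceForm) {
        return entries.getOrDefault(surfaceForm, List.of());
    }
}

/** Consults the secondary dictionary only for words the primary one lacks. */
final class FallbackLemmatizer implements Lemmatizer {
    private final Lemmatizer primary;
    private final Lemmatizer secondary;

    FallbackLemmatizer(Lemmatizer primary, Lemmatizer secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    public List<String> lemmas(String surfaceForm) {
        List<String> found = primary.lemmas(surfaceForm);
        return found.isEmpty() ? secondary.lemmas(surfaceForm) : found;
    }
}
```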

 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.


[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0

2011-06-21 Thread Matteo Melli (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052492#comment-13052492
 ] 

Matteo Melli commented on SOLR-2564:


Hi there,

I'm testing this functionality in my project and found what I think is a 
bug. The revision I'm working on is 1137889.

I could reproduce the bug with a really simple index (the column is of type 
solr.String):

|| Col1 ||
| 1 |
| 2 |
| 3 |

The bug appears when I run a grouping query that combines the start parameter 
(with a value greater than 0) and group.main=true:

http://localhost:8983/solr/test/select/?q=*:*&start=1&group=true&group.field=Col1&group.main=true

The error trace is:

{noformat}
Jun 21, 2011 1:32:10 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:119)
    at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:247)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:153)
    at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111)
    at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:37)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:340)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:242)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
{noformat}

The problem does not appear without group.main=true, so the bug may be 
related to that option.

PS: I was not sure whether to open a bug, since the affected version is 
still in development. Anyway, sorry for any inconvenience.

 Integrating grouping module into Solr 4.0
 -

 Key: SOLR-2564
 URL: https://issues.apache.org/jira/browse/SOLR-2564
 Project: Solr
  Issue Type: Improvement
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Blocker
 Fix For: 4.0

 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, 
 SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, 
 SOLR-2564.patch


 Since work on the grouping module is going well, I think it is time to wire 
 it up in Solr.
 Besides the current grouping features Solr provides, Solr will then also 
 support second pass caching and total count based on groups.


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Done.

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:
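For reference, BM25 (one of the models listed above) scores a term in a document from its frequency, the document's length relative to the average, and the term's IDF. A minimal sketch of the standard formula, assuming the usual defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)); the `Bm25` class is hypothetical and unrelated to the `EasySimilarity`/`EasyStats` code in the patch.

```java
/** Hypothetical BM25 scorer for a single query term. */
final class Bm25 {
    private final double k1; // term-frequency saturation
    private final double b;  // strength of document-length normalization

    Bm25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** Non-negative IDF: log(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)). */
    double score(double tf, double docLength, double avgDocLength,
                 long docCount, long docFreq) {
        double norm = k1 * (1.0 - b + b * docLength / avgDocLength);
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }
}
```

Summing `score` over the query terms gives a document's score; raising `b` penalizes long documents more, while `k1` controls how quickly repeated occurrences stop adding weight.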


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: (was: LUCENE-3220.patch)

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

Done.

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:


[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-21 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Comment: was deleted

(was: Done.)

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:


[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052513#comment-13052513
 ] 

Paul Elschot commented on LUCENE-3171:
--

BlockJoinQuery still needs hashCode/equals, and a javadoc note (as I remarked 
earlier at 2454) about the possible inefficiency of using OpenBitSet for 
larger group sizes. When the typical group size gets a lot bigger than the 
number of bits in a long, another implementation might be faster. Having this 
remark in the javadocs would allow us to wait for someone to come along with 
bigger group sizes and a real performance problem here.

I would prefer a single pass, and for now I only need the parent docs. That 
means that I have no preference between 2454 and this one.
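Queries can end up as cache keys (Solr's query result cache, for instance), so `equals` and `hashCode` must agree on every field that affects matching and scoring. A sketch of that contract for a block-join style query; `JoinQuerySketch`, its string-typed fields and its `ScoreMode` enum are hypothetical stand-ins, not the real `BlockJoinQuery` members.

```java
import java.util.Objects;

/** Hypothetical stand-in for a block-join query's identity-defining state. */
final class JoinQuerySketch {
    enum ScoreMode { NONE, AVG, MAX, TOTAL }

    private final String childQuery;   // stands in for the wrapped child Query
    private final String parentFilter; // stands in for the parent-doc filter
    private final ScoreMode scoreMode;

    JoinQuerySketch(String childQuery, String parentFilter, ScoreMode scoreMode) {
        this.childQuery = childQuery;
        this.parentFilter = parentFilter;
        this.scoreMode = scoreMode;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof JoinQuerySketch)) return false;
        JoinQuerySketch o = (JoinQuerySketch) other;
        return childQuery.equals(o.childQuery)
            && parentFilter.equals(o.parentFilter)
            && scoreMode == o.scoreMode;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals(), so equal queries hash alike.
        return Objects.hash(childQuery, parentFilter, scoreMode);
    }
}
```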


 BlockJoinQuery/Collector
 

 Key: LUCENE-3171
 URL: https://issues.apache.org/jira/browse/LUCENE-3171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Reporter: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3171.patch, LUCENE-3171.patch


 I created a single-pass Query + Collector to implement nested docs.
 The approach is similar to LUCENE-2454, in that the app must index
 documents in join order, as a block (IW.add/updateDocuments), with
 the parent doc at the end of the block, except that this impl is one
 pass.
 Once you join at indexing time, you can take any query that matches
 child docs and join it up to the parent docID space, using
 BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
 docs by provided Sort, to gather results, grouped by parent; this
 collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
 retains the child docs corresponding to each collected parent doc.
 After searching is done, you retrieve the TopGroups from a provided
 BlockJoinQuery.
 Like LUCENE-2454, this is less general than the arbitrary joins in
 Solr (SOLR-2272) or parent/child from ElasticSearch
 (https://github.com/elasticsearch/elasticsearch/issues/553), since you
 must do the join at indexing time as a doc block, but it should be
 able to handle nested joins as well as joins to multiple tables,
 though I don't yet have test cases for these.
 I put this in a new Join module (modules/join); I think as we
 refactor join impls we should put them here.
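Because each block is indexed with the parent as its last document, a child's parent is the next set bit in a bitset of parent doc ids, and the previous set bit bounds the block from below. A sketch with `java.util.BitSet`; the patch itself works with Lucene's own filter and bitset classes, and `BlockJoinSketch` and its methods are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical helpers for doc-id blocks laid out as child, child, ..., parent. */
final class BlockJoinSketch {
    private BlockJoinSketch() {}

    /** Parent of a child doc: the first parent bit at or after it (-1 if malformed). */
    static int parentOf(BitSet parentBits, int childDoc) {
        if (parentBits.get(childDoc)) {
            throw new IllegalArgumentException(childDoc + " is a parent, not a child");
        }
        return parentBits.nextSetBit(childDoc);
    }

    /** First child of the block ending at parentDoc: one past the previous parent. */
    static int firstChildOf(BitSet parentBits, int parentDoc) {
        // previousSetBit(-1) returns -1, so the first block starts at doc 0.
        return parentBits.previousSetBit(parentDoc - 1) + 1;
    }
}
```

For example, with parents at docs 2 and 6, docs 0 and 1 join to parent 2 and docs 3 to 5 join to parent 6, which is why the app must add each block atomically and in order.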


[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052572#comment-13052572
 ] 

Yonik Seeley commented on SOLR-2598:


Yeah, looks fine.

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also, we should include 
 a few more books.




[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-06-21 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3218:


Attachment: LUCENE-3218.patch

Updated patch, NOW containing all files :)

Sorry for the missing files in the last patch.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch


 Currently the CFS is created once all files are written during a flush / merge. 
 Once on disk, the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can at any time write at least one file 
 directly into the CFS, which can save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a Codec 
 flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but it is more important for DocValues 
 and LUCENE-3216: we could transparently pack per-field files into a single 
 file only for DocValues, without changing any code, once LUCENE-3216 is 
 resolved.
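A minimal sketch of the appending idea, assuming a compound file is simply one data stream plus a table of (name, offset, length) entries. This is not Lucene's actual CFS format, and AppendableCompoundSketch and its methods are hypothetical; it only shows that appending a sub-file directly records where it starts, so nothing has to be copied in afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Lucene's CompoundFileWriter or its on-disk format.
class AppendableCompoundSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // name -> {offset, length} within the compound data, in insertion order
    private final Map<String, long[]> entries = new LinkedHashMap<>();

    // Write a sub-file straight into the compound data instead of writing it
    // as a separate file and copying it in later.
    void append(String name, byte[] bytes) {
        if (entries.containsKey(name)) {
            throw new IllegalArgumentException("duplicate entry: " + name);
        }
        long offset = data.size();
        data.write(bytes, 0, bytes.length);
        entries.put(name, new long[] {offset, bytes.length});
    }

    long offsetOf(String name) { return entries.get(name)[0]; }

    long lengthOf(String name) { return entries.get(name)[1]; }

    int totalSize() { return data.size(); }
}
```

A real implementation would stream to disk rather than buffer in memory; the in-memory stream just keeps the sketch self-contained.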




[jira] [Resolved] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2598.
---

Resolution: Fixed

Committed
trunk: r1138017
3.x: r1138020

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also, we should include 
 a few more books.




[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052595#comment-13052595
 ] 

Yonik Seeley commented on SOLR-2598:


Note that if you click the All tab on JIRA, it will show your two commits 
(hence you don't need to bother to list the revisions if you don't want).

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also, we should include 
 a few more books.




[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052597#comment-13052597
 ] 

Michael McCandless commented on LUCENE-3218:


Patch looks great!

Can we name it createCompoundOutput?  It emphasizes that we are
write-once (this file shouldn't exist), and it matches createOutput.

On checkAbort... could we skip passing it to the CFW and instead call
checkAbort in the outer loops (i.e., where we .copy the files in)?
The existing CFW already checks only once per file anyway...

Maybe instead of asserts for misuse of the CFD API (e.g., no
entries, or something is still open), we should make these real
exceptions (i.e., thrown even when assertions are off)?
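A sketch of what converting those asserts into real exceptions could look like. CompoundWriterChecks and its methods are hypothetical stand-ins, not the CFW/CFD API in the patch; the point is only that an explicit check still fires when the JVM runs without -ea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual CompoundFileWriter: API misuse is
// reported with exceptions, so the checks run even when assertions are disabled.
class CompoundWriterChecks {
    private final List<String> entries = new ArrayList<>();
    private boolean outputOpen = false;

    void openOutput(String name) {
        if (outputOpen) {
            throw new IllegalStateException("previous output is still open");
        }
        entries.add(name);
        outputOpen = true;
    }

    void closeOutput() {
        outputOpen = false;
    }

    void close() {
        // An "assert !entries.isEmpty();" here would silently vanish without -ea.
        if (entries.isEmpty()) {
            throw new IllegalStateException("no entries added to compound file");
        }
        if (outputOpen) {
            throw new IllegalStateException("an entry output is still open");
        }
    }
}
```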

This comment looks stale (in CFW.java)?:
{noformat}
  // Close the output stream. Set the os to null before trying to
  // close so that if an exception occurs during the close, the
  // finally clause below will not attempt to close the stream
  // the second time.
{noformat}

openCompoundOutput needs javadoc.

CFD.createOutput's jdoc says Not Implememented but it is.

The new test cases in TestCompoundFile name their file d.csf ;) Column
stride fields live on!!  Too many TLAs...


 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch


 Currently the CFS is created once all files are written during a flush / merge. 
 Once on disk, the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can at any time write at least one file 
 directly into the CFS, which can save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a Codec 
 flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but it is more important for DocValues 
 and LUCENE-3216: we could transparently pack per-field files into a single 
 file only for DocValues, without changing any code, once LUCENE-3216 is 
 resolved.




[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-21 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052601#comment-13052601
 ] 

Peter Wolanin commented on SOLR-2462:
-

I generated a patch for 3.2 looking at the commit on branch_3x.  It looks 
somewhat different from the last patch by James.

I also just compared the trunk commit to the last patch and it doesn't match 
https://issues.apache.org/jira/secure/attachment/12481574/SOLR-2462.patch  

Did the wrong patch get committed, or was the final patch just never posted 
to this issue before the commit?

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered any time 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen, as occasionally a user will accidentally paste the URL into the 
 search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 
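To put the numbers above in perspective: with 15 suggestions for each of ~12 misspelled words, ranking every combination means building 15^12 = 129,746,337,890,625 of them. The sketch below contrasts that count with a lazy best-first enumeration that keeps only a small frontier in memory. It assumes each term's suggestions are already sorted best-first and that a combination is ranked by the sum of its suggestion indexes; CollationSketch is hypothetical, and this is a general technique, not necessarily what the committed SOLR-2462 fix does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch, not Solr's SpellPossibilityIterator.
// A combination is an int[] holding, per misspelled term, the index of the
// suggestion used (0 = that term's best suggestion). Lower summed index = better.
class CollationSketch {

    // How many combinations a "rank everything" approach would have to build.
    static long fullEnumerationSize(int correctionsPerTerm, int misspelledTerms) {
        long total = 1;
        for (int i = 0; i < misspelledTerms; i++) {
            total = Math.multiplyExact(total, (long) correctionsPerTerm); // throws on overflow
        }
        return total;
    }

    // Best-first enumeration: only the k results plus a frontier are kept in memory.
    static List<int[]> topK(int misspelledTerms, int correctionsPerTerm, int k) {
        PriorityQueue<int[]> frontier =
            new PriorityQueue<>((a, b) -> Integer.compare(sum(a), sum(b)));
        Set<String> seen = new HashSet<>();
        int[] best = new int[misspelledTerms]; // all zeros: every term's top suggestion
        frontier.add(best);
        seen.add(Arrays.toString(best));
        List<int[]> result = new ArrayList<>();
        while (!frontier.isEmpty() && result.size() < k) {
            int[] next = frontier.poll();
            result.add(next);
            // Successors: move exactly one term to its next suggestion.
            for (int t = 0; t < misspelledTerms; t++) {
                if (next[t] + 1 < correctionsPerTerm) {
                    int[] succ = next.clone();
                    succ[t]++;
                    if (seen.add(Arrays.toString(succ))) {
                        frontier.add(succ);
                    }
                }
            }
        }
        return result;
    }

    private static int sum(int[] ranks) {
        int s = 0;
        for (int r : ranks) {
            s += r;
        }
        return s;
    }
}
```

Each popped combination adds at most one successor per term, so asking for the top 5 collations over 12 terms touches on the order of 5 × 12 candidates instead of 1.3 × 10^14.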




[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title

2011-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052600#comment-13052600
 ] 

Jan Høydahl commented on SOLR-2598:
---

Ok, thanks

 exampledocs/books.json should use name instead of title
 ---

 Key: SOLR-2598
 URL: https://issues.apache.org/jira/browse/SOLR-2598
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2598.patch


 The file exampledocs/books.json currently contains two books. But they do not 
 show up in the default solr/browse interface because they use title instead 
 of name, which the Velocity template does not show. Also, we should include 
 a few more books.




[jira] [Commented] (SOLR-1750) SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp

2011-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052606#comment-13052606
 ] 

Jan Høydahl commented on SOLR-1750:
---

The /admin/stats handler is not registered by default, nor is it included in the 
example config. I had to add <requestHandler name="/admin/stats" 
class="org.apache.solr.handler.admin.SolrInfoMBeanHandler" /> to my solrconfig.xml 
to get it working.

 SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp
 -

 Key: SOLR-1750
 URL: https://issues.apache.org/jira/browse/SOLR-1750
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Trivial
 Fix For: 1.5, 3.1, 4.0

 Attachments: SOLR-1750-followup.patch, 
 SystemStatsRequestHandler.java, SystemStatsRequestHandler.java, 
 SystemStatsRequestHandler.java


 stats.jsp is cool and all, but suffers from escaping issues, and also is not 
 accessible from SolrJ or other standard Solr APIs.
 Here's a request handler that emits everything stats.jsp does.
 For now, it needs to be registered in solrconfig.xml like this:
 {code}
 <requestHandler name="/admin/stats" class="solr.SystemStatsRequestHandler" />
 {code}
 But I will register this in AdminHandlers automatically before committing.




[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-06-21 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3218:


Attachment: LUCENE-3218.patch

Final patch:
* fixed javadocs + several javadoc warnings
* renamed openCompoundOutput to createCompoundOutput
* fixed file extensions in test CSF LOL!!
* copyFileEntry now deletes files that are separately written once copied into 
the CFS.
* converted asserts to exceptions in CFW

I plan to commit this today if nobody objects.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch


 Currently the CFS is created once all files are written during a flush / merge. 
 Once on disk, the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can at any time write at least one file 
 directly into the CFS, which can save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a Codec 
 flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but it is more important for DocValues 
 and LUCENE-3216: we could transparently pack per-field files into a single 
 file only for DocValues, without changing any code, once LUCENE-3216 is 
 resolved.




[jira] [Resolved] (SOLR-2489) Remove old lucene.apache.org/solr/who page

2011-06-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2489.
---

Resolution: Fixed

 Remove old lucene.apache.org/solr/who page
 --

 Key: SOLR-2489
 URL: https://issues.apache.org/jira/browse/SOLR-2489
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1, 3.2
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 3.3


 In the distribution, docs/who.html is outdated: it refers to the old Solr committers 
 list at http://lucene.apache.org/solr/who.
 The fix would be to simply delete the old page.




[jira] [Commented] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052624#comment-13052624
 ] 

Jason Rutherglen commented on SOLR-2610:


This is good!  I had to write the same functionality into a custom Solr build 
on a project.

 Add an option to delete index through CoreAdmin UNLOAD action
 -

 Key: SOLR-2610
 URL: https://issues.apache.org/jira/browse/SOLR-2610
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2610.patch


 Right now, one can unload a Solr Core but the index files are left behind and 
 consume disk space. We should have an option to delete the index when 
 unloading a core.
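For illustration only, a sketch of the file-system side of such an option, assuming the core has already been unloaded and all of its index files are closed. IndexDirCleaner and deleteRecursively are hypothetical names; this is not Solr's CoreAdmin code and says nothing about the actual UNLOAD parameter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not Solr's code: recursively delete an index directory
// once the core that owned it has been unloaded.
class IndexDirCleaner {
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to delete
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse path order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // surface the original IOException to the caller
        }
    }
}
```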




[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-21 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052623#comment-13052623
 ] 

James Dyer commented on SOLR-2462:
--

Peter,

I reviewed Robert's commits (r1132730 to branch_3x; r1132729 to trunk), and 
they appear to match the 06/Jun/11 15:10 version of the patch.  I looked mostly 
at the change in TestSpellCheckResponse.java, which was the last tweak 
made.  Keep in mind there are a few things that were committed that aren't in 
the patch (changes.txt, etc.).  Did you have other specific discrepancies in 
mind?

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered any time 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen, as occasionally a user will accidentally paste the URL into the 
 search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 




[jira] [Resolved] (LUCENE-2548) Remove all interning of field names from flex API

2011-06-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2548.


Resolution: Fixed

Committed!  Uwe, I think I fixed all the places where we were making a 
placeholder term just to hold a field...

 Remove all interning of field names from flex API
 -

 Key: LUCENE-2548
 URL: https://issues.apache.org/jira/browse/LUCENE-2548
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2548.patch, LUCENE-2548.patch


 In previous versions of Lucene, interning of field names was important to minimize 
 string comparison cost when iterating TermEnums, to detect changes in field 
 name. As we separated field names from terms in flex, no query compares field 
 names anymore, so the performance-problematic interning can be removed entirely. 
 I will start on this, but we need to carefully review some places, 
 e.g. in the preflex codec.
 Maybe before this issue we should remove the Term class completely. :-) 
 Robert?
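A small sketch of why interning mattered: if every field name is interned once, detecting a field change while walking terms can use reference equality (==) instead of String.equals. InternSketch is hypothetical and not Lucene code; it also shows that the identity check is only correct when the strings really are interned.

```java
// Illustrative sketch, not Lucene code: detecting a field change while walking
// terms in field order.
class InternSketch {
    // Only correct if every name was interned (e.g. once, when the field was read):
    // then equal names are the same object and == is a single pointer compare.
    static int countFieldChangesByIdentity(String[] fieldPerTerm) {
        int changes = 0;
        String current = null;
        for (String field : fieldPerTerm) {
            if (field != current) {
                changes++;
                current = field;
            }
        }
        return changes;
    }

    // Works for any strings, but equals() may have to compare characters.
    static int countFieldChangesByEquals(String[] fieldPerTerm) {
        int changes = 0;
        String current = null;
        for (String field : fieldPerTerm) {
            if (!field.equals(current)) {
                changes++;
                current = field;
            }
        }
        return changes;
    }
}
```

Once field names are kept separate from term bytes, nothing needs the per-term field comparison, so the interning requirement (and its cost) can go away.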




[jira] [Updated] (LUCENE-3222) Buffered deletes under count RAM

2011-06-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3222:
---

Attachment: LUCENE-3222.patch

Simple patch; I'll commit shortly & backport.

 Buffered deletes under count RAM
 

 Key: LUCENE-3222
 URL: https://issues.apache.org/jira/browse/LUCENE-3222
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3222.patch


 I found this while working on LUCENE-2548: when we freeze the deletes (creating 
 FrozenBufferedDeletes) and set bytesUsed, we fail to account 
 for the RAM required for the term bytes (and now the term's field).
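A hypothetical sketch of this kind of accounting bug, not Lucene's FrozenBufferedDeletes code: counting only a fixed per-term overhead under-counts, while the corrected estimate also includes each term's bytes and its field name. PER_TERM_OVERHEAD and the 2-bytes-per-char estimate for the field name are assumptions for illustration.

```java
// Hypothetical sketch, not Lucene's FrozenBufferedDeletes.
class DeleteRamSketch {
    // Assumed fixed cost per buffered delete term (object headers, references).
    static final int PER_TERM_OVERHEAD = 32;

    // Under-counts: ignores how big each term actually is.
    static long bytesUsedFixedOnly(byte[][] termBytes) {
        return (long) PER_TERM_OVERHEAD * termBytes.length;
    }

    // Also counts each term's bytes and its field name
    // (assuming roughly 2 bytes per char for the field String).
    static long bytesUsed(String[] fields, byte[][] termBytes) {
        long total = 0;
        for (int i = 0; i < termBytes.length; i++) {
            total += PER_TERM_OVERHEAD + termBytes[i].length + 2L * fields[i].length();
        }
        return total;
    }
}
```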




[jira] [Resolved] (LUCENE-3201) improved compound file handling

2011-06-21 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3201.
-

Resolution: Fixed
  Assignee: Simon Willnauer

Incorporated in LUCENE-3218; I will track backporting there.

 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Simon Willnauer
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3201.patch, LUCENE-3201.patch


 Currently CompoundFileReader could use some improvements. I see the following 
 problems:
 * its CSIndexInput extends BufferedIndexInput, which is needless overhead for 
 directories like MMapDirectory.
 * it seeks on every readInternal.
 * it's not possible for a directory to override or improve the handling of 
 compound files.
 For example, if you were implementing this from scratch, it seems you would 
 just wrap the IndexInput directly (not extend BufferedIndexInput),
 add the compound file offset X to seek() calls, and override length(). But of 
 course, then you couldn't always throw "read past EOF" when you should,
 as a user could read into the next file and be left unaware.
 However, some directories could handle this better. For example, MMapDirectory 
 could return an IndexInput that simply mmaps the 'slice' of the CFS file.
 Its underlying ByteBuffer naturally does bounds checks already, so it 
 wouldn't need to be buffered, or even to add any offsets to seek(),
 as its position would just work.
 So I think we should try to refactor this so that a Directory can customize 
 how compound files are handled. The simplest case, with the least code 
 change, would be to add this to Directory.java:
 {code}
   public Directory openCompoundInput(String filename) {
     return new CompoundFileReader(this, filename);
   }
 {code}
 This is because most code depends on compound files being implemented as a 
 transparent Directory. At least then a subclass could override it...
 but the 'recursion' is a little ugly... we could still label it 
 expert+internal+experimental or whatever.
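A rough illustration of the MMapDirectory idea using plain java.nio, not Lucene's IndexInput API: a ByteBuffer slice of the compound file starts at position 0 and has its own limit, so seeking needs no offset arithmetic, and reading past the sub-file's end fails instead of silently running into the next file. CompoundSliceSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Illustrative sketch using plain java.nio, not Lucene's MMapDirectory/IndexInput.
class CompoundSliceSketch {
    // Returns a view of bytes [offset, offset + length) of the compound file.
    // The view starts at position 0 and its limit is the sub-file's length, so
    // "seeking" is just position(n), and reading past the end throws
    // java.nio.BufferUnderflowException instead of reading the next sub-file.
    static ByteBuffer slice(ByteBuffer compound, int offset, int length) {
        ByteBuffer dup = compound.duplicate(); // shares bytes, independent position/limit
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice();
    }
}
```

With a MappedByteBuffer as the compound buffer, the same slicing applies to the memory-mapped file, which is the case the issue describes.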




[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-06-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052633#comment-13052633
 ] 

Simon Willnauer commented on LUCENE-3218:
-

Committed in revision 1138063.
I will try to backport this to 3.x if possible.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch


 Currently the CFS is created once all files are written during a flush / merge. 
 Once on disk, the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can at any time write at least one file 
 directly into the CFS, which can save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a Codec 
 flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but it is more important for DocValues 
 and LUCENE-3216: we could transparently pack per-field files into a single 
 file only for DocValues, without changing any code, once LUCENE-3216 is 
 resolved.




Re: Lucene 3.3 release soon?

2011-06-21 Thread Jan Høydahl
Grouping is really worth a release! But if group count in facet is within 
reach, wait for that!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 21. juni 2011, at 05.53, Bill Bell wrote:

 +1 wait for grouping post facet counts... Go Martijn v Groningen !!
 
 On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com
 wrote:
 
 +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
 3.2.
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 
 On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
 I was planning on doing an RC in a few weeks, actually.
 
 We have a lot of good stuff in there already today; however, I wanted
 to give the grouping stuff a few weeks to run on Hudson.
 
 On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 I would say within the next 3 months.
 
 Thoughts?
 
 On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com
 wrote:
 Hi,
 How soon can we expect official Lucene 3.3 release?
 Best regards,
 Lukas
 



Re: Lucene 3.3 release soon?

2011-06-21 Thread Robert Muir
Again, I don't think any future uncommitted features should block a
release, nor should there be a shoving period where features are
shoved in.

I'll now be looking at producing an RC as quickly as possible, before
this can happen!

On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote:
 Grouping is really worth a release! But if group count in facet is within 
 reach, wait for that!

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 21. juni 2011, at 05.53, Bill Bell wrote:

 +1 wait for grouping post facet counts... Go Martijn v Groningen !!

 On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com
 wrote:

 +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
 3.2.

 Mike McCandless

 http://blog.mikemccandless.com

 On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
 I was planning on doing an RC in a few weeks, actually.

 We have a lot of good stuff in there already today; however, I wanted
 to give the grouping stuff a few weeks to run on Hudson.

 On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 I would say within the next 3 months.

 Thoughts?

 On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com
 wrote:
 Hi,
 How soon can we expect official Lucene 3.3 release?
 Best regards,
 Lukas




[jira] [Resolved] (LUCENE-3222) Buffered deletes under count RAM

2011-06-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3222.


Resolution: Fixed

 Buffered deletes under count RAM
 

 Key: LUCENE-3222
 URL: https://issues.apache.org/jira/browse/LUCENE-3222
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3222.patch


 I found this while working on LUCENE-2548: when we freeze the deletes (creating 
 FrozenBufferedDeletes) and set bytesUsed, we fail to account 
 for the RAM required for the term bytes (and now the term's field).




[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-06-21 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052643#comment-13052643
 ] 

Hoss Man commented on SOLR-2487:


"Supported" is always a vague term, but as with every other ant property in 
our build file, the default is the supported one that we test, and if you 
override a property when building from source, that's a customization and we 
won't promise that it will always work.

It's no different than if they override the javac.source property, or 
build.encoding, etc...




 Do not include slf4j-jdk14 jar in WAR
 -

 Key: SOLR-2487
 URL: https://issues.apache.org/jira/browse/SOLR-2487
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
  Labels: logging, slf4j

 I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
 newbies get up and running. But I find myself re-packaging the war for every 
 customer when adapting to their choice of logger framework, which is 
 counter-productive.
 It would be sufficient to have the jdk-logging binding in example/lib to let 
 the example and tutorial still work OOTB but as soon as you deploy solr.war 
 to production you're forced to explicitly decide what logging to use.




[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052642#comment-13052642
 ] 

Michael McCandless commented on LUCENE-3171:


bq. BlockJoinQuery still needs hashCode/equals

Woops, thanks, I'll add!

{quote}
and a javadoc note (as I remarked earlier at 2454) about the possible 
inefficiency of the use of OpenBitSet for larger group sizes. When the typical 
group size gets a lot bigger than the number of bits in a long, another 
implementation might be faster. This remark in the javadocs would allow us to 
wait for someone to come along with bigger group sizes and a real performance 
problem here.
{quote}

Hmm: do you have an improvement in mind for OpenBitSet.prevSetBit to better 
handle large groups?  Or, where is this possible inefficiency (is it something 
specific)?
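The cost being discussed shows up in how a backwards scan over 64-bit words works. The Java sketch below is an illustration only, not OpenBitSet's source. Each additional empty word it has to visit costs one more loop iteration, so walking back across a group that spans many longs touches roughly groupSize / 64 words.

```java
/** Illustrative backwards bit scan over 64-bit words (not OpenBitSet's source). */
public class PrevSetBitSketch {
    /** Returns the highest set bit at or below index, or -1 if there is none. */
    static int prevSetBit(long[] words, int index) {
        if (index < 0 || words.length == 0) {
            return -1;
        }
        int wordIdx = index >>> 6; // index / 64
        if (wordIdx >= words.length) {
            // Past the end: start from the last bit of the last word.
            wordIdx = words.length - 1;
            index = (words.length << 6) - 1;
        }
        // Keep only bits 0..(index % 64) of the starting word.
        long word = words[wordIdx] & (-1L >>> (63 - (index & 63)));
        while (true) {
            if (word != 0) {
                // Position of the highest set bit within this word.
                return (wordIdx << 6) + 63 - Long.numberOfLeadingZeros(word);
            }
            if (--wordIdx < 0) {
                return -1;
            }
            // Each additional empty word costs one more iteration.
            word = words[wordIdx];
        }
    }

    public static void main(String[] args) {
        long[] words = new long[3];
        words[0] |= 1L << 5;           // bit 5
        words[2] |= 1L << (130 - 128); // bit 130
        System.out.println(prevSetBit(words, 191)); // 130
        System.out.println(prevSetBit(words, 129)); // 5
    }
}
```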

bq. I would prefer to use single pass and for now I only need the parent docs. 
That means that I have no preference for 2454 or this one.

I wonder how often apps typically need just the parent docs vs the groups (w/ 
child docs)...

But, still this patch only calls .nextSetBit() once per group so that ought to 
be faster than LUCENE-2454, I think... hmm, unless you typically only have 1 
child match per parent.

 BlockJoinQuery/Collector
 

 Key: LUCENE-3171
 URL: https://issues.apache.org/jira/browse/LUCENE-3171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Reporter: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3171.patch, LUCENE-3171.patch


 I created a single-pass Query + Collector to implement nested docs.
 The approach is similar to LUCENE-2454, in that the app must index
 documents in join order, as a block (IW.add/updateDocuments), with
 the parent doc at the end of the block, except that this impl is one
 pass.
 Once you join at indexing time, you can take any query that matches
 child docs and join it up to the parent docID space, using
 BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
 docs by provided Sort, to gather results, grouped by parent; this
 collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
 retains the child docs corresponding to each collected parent doc.
 After searching is done, you retrieve the TopGroups from a provided
 BlockJoinQuery.
 Like LUCENE-2454, this is less general than the arbitrary joins in
 Solr (SOLR-2272) or parent/child from ElasticSearch
 (https://github.com/elasticsearch/elasticsearch/issues/553), since you
 must do the join at indexing time as a doc block, but it should be
 able to handle nested joins as well as joins to multiple tables,
 though I don't yet have test cases for these.
 I put this in a new Join module (modules/join); I think as we
 refactor join impls we should put them here.
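Because each block puts the parent after its children, a single bitset of parent docIDs is enough to navigate between a child and its parent. The minimal sketch below uses java.util.BitSet in place of Lucene's OpenBitSet. The class and method names are hypothetical, and this is not BlockJoinQuery's code.

```java
import java.util.BitSet;

/** Hypothetical sketch of navigating doc blocks where each parent follows its children. */
public class BlockJoinLayoutSketch {
    /** The parent of a child is the first parent docID at or after the child. */
    static int parentOf(BitSet parentBits, int childDoc) {
        return parentBits.nextSetBit(childDoc);
    }

    /** The block ending at parentDoc starts right after the previous parent (or at 0). */
    static int firstChildOf(BitSet parentBits, int parentDoc) {
        // previousSetBit(-1) returns -1, so the first block starts at docID 0.
        return parentBits.previousSetBit(parentDoc - 1) + 1;
    }

    public static void main(String[] args) {
        // Docs 0-2 are children of parent 3; docs 4-5 are children of parent 6.
        BitSet parentBits = new BitSet();
        parentBits.set(3);
        parentBits.set(6);
        System.out.println(parentOf(parentBits, 5));     // 6
        System.out.println(firstChildOf(parentBits, 6)); // 4
        System.out.println(firstChildOf(parentBits, 3)); // 0
    }
}
```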




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052644#comment-13052644
 ] 

Michael McCandless commented on LUCENE-2454:


bq. A variant of prevSetBit could take this largest known child as an argument 
to limit its search,

I think we should not require the app to know the max number of children per 
parent?  (Ie, we should just grow buffers, etc., on demand as we collect).

I mean, if this information is easily available we could optimize for that 
case, but for some apps it's a good amount of work to record this and update it 
so I don't think it should be a required arg when creating the 
query/collectors, even though it's tempting ;)
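Growing per-parent buffers on demand, rather than asking the app for a maximum child count, can be as simple as the hypothetical Java class below. It uses geometric growth. Lucene has its own ArrayUtil.grow helper, which this sketch does not use.

```java
import java.util.Arrays;

/** Hypothetical per-parent child buffer that grows as children are collected. */
public class GrowableChildBuffer {
    private int[] childDocs = new int[4]; // arbitrary small starting capacity
    private int size;

    void add(int childDoc) {
        if (size == childDocs.length) {
            // Grow by roughly 1.5x so repeated adds stay amortized O(1).
            childDocs = Arrays.copyOf(childDocs, childDocs.length + (childDocs.length >> 1) + 1);
        }
        childDocs[size++] = childDoc;
    }

    int size() {
        return size;
    }

    int[] toArray() {
        return Arrays.copyOf(childDocs, size);
    }

    /** Reuses the existing array for the next parent instead of reallocating. */
    void clear() {
        size = 0;
    }
}
```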

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052648#comment-13052648
 ] 

Michael McCandless commented on LUCENE-2454:


bq. A common pattern is for solutions to ask for the best 11 children for the 
best parents and display only 10 - that way the app knows that for certain 
parents there is more data available (i.e. those with 11 matches) and can offer 
a "more" button to retrieve the extra children for parents of interest

With LUCENE-3171, you should be able to just ask for 10 here, and then check if 
the TopDocs.totalHits is > 10 to decide whether to offer the "more" button.
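The two ways of deciding whether to show a "more" button can be sketched in plain Java. The method names are hypothetical. The sketch assumes, as the comment says, that each group's TopDocs.totalHits counts every matching child, not just the children returned.

```java
/** Hypothetical helpers for deciding whether a parent group needs a "more" link. */
public class MoreButtonSketch {
    /** Old pattern: request pageSize + 1 children and check whether the extra one came back. */
    static boolean moreViaExtraHit(int returnedChildren, int pageSize) {
        return returnedChildren > pageSize;
    }

    /** Pattern suggested above: request pageSize children, compare against the group's totalHits. */
    static boolean moreViaTotalHits(int totalHits, int pageSize) {
        return totalHits > pageSize;
    }

    public static void main(String[] args) {
        System.out.println(moreViaExtraHit(11, 10));  // true: an 11th child exists
        System.out.println(moreViaTotalHits(42, 10)); // true
        System.out.println(moreViaTotalHits(10, 10)); // false: everything already shown
    }
}
```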

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-06-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052649#comment-13052649
 ] 

Robert Muir commented on SOLR-2487:
---

Hoss, ok, I just was trying to figure out the expectations for testing.

Testing with a different classpath or whatever is more difficult than the other 
'non-default' or 'conditional default' parameters that we randomize in Lucene 
to solve this issue (e.g. codecs, directories, locales, mergepolicies, ...), 
that's why I mentioned it.
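The parameters listed above can be randomized inside a single JVM because they are chosen at run time, whereas a classpath is fixed when the JVM starts. A minimal sketch of the seeded-pick idea follows. It is not LuceneTestCase's actual mechanism, and the option strings are illustrative placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: pick test parameters from a seeded Random so a failure can be replayed. */
public class RandomizedParamSketch {
    /** Picks one option using the supplied Random; the same seed gives the same picks. */
    static <T> T pick(Random random, List<T> options) {
        return options.get(random.nextInt(options.size()));
    }

    public static void main(String[] args) {
        // Print the seed so a failing combination can be replayed by passing it back in.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        Random random = new Random(seed);
        String codec = pick(random, Arrays.asList("codecA", "codecB", "codecC"));
        String locale = pick(random, Arrays.asList("en_US", "de_DE", "tr_TR"));
        System.out.println("seed=" + seed + " codec=" + codec + " locale=" + locale);
    }
}
```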


 Do not include slf4j-jdk14 jar in WAR
 -

 Key: SOLR-2487
 URL: https://issues.apache.org/jira/browse/SOLR-2487
 Project: Solr
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2, 4.0
Reporter: Jan Høydahl
  Labels: logging, slf4j

 I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
 newbies get up and running. But I find myself re-packaging the war for every 
 customer when adapting to their choice of logger framework, which is 
 counter-productive.
 It would be sufficient to have the jdk-logging binding in example/lib to let 
 the example and tutorial still work OOTB but as soon as you deploy solr.war 
 to production you're forced to explicitly decide what logging to use.




[jira] [Commented] (SOLR-1750) SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp

2011-06-21 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052656#comment-13052656
 ] 

Hoss Man commented on SOLR-1750:


Jan: as stated above the registration i picked was /admin/mbeans - stats is too 
specific since the component can be used for other purposes than getting stats.

it's also not a default handler -- it's registered if you register the 
AdminHandler

Jonathan: i overlooked your comment until now.  the existing SystemInfoHandler 
isn't deprecated -- it's still very useful and provides information about the 
entire system solr is running in (the jvm, the os, etc...)

 SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp
 -

 Key: SOLR-1750
 URL: https://issues.apache.org/jira/browse/SOLR-1750
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Trivial
 Fix For: 1.5, 3.1, 4.0

 Attachments: SOLR-1750-followup.patch, 
 SystemStatsRequestHandler.java, SystemStatsRequestHandler.java, 
 SystemStatsRequestHandler.java


 stats.jsp is cool and all, but suffers from escaping issues, and also is not 
 accessible from SolrJ or other standard Solr APIs.
 Here's a request handler that emits everything stats.jsp does.
 For now, it needs to be registered in solrconfig.xml like this:
 {code}
 <requestHandler name="/admin/stats" 
 class="solr.SystemStatsRequestHandler" />
 {code}
 But will register this in AdminHandlers automatically before committing.




Re: Lucene 3.3 release soon?

2011-06-21 Thread Simon Willnauer
On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote:
 Again, I don't think any future uncommitted features should block a
 release, nor should there be a shoving period where features are
 shoved in.

+1 - release early & often!!!

simon

 I'll be now looking at producing an RC as quickly as possible before
 this can happen!

 On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote:
 Grouping is really worth a release! But if group count in facet is within 
 reach, wait for that!

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 21. June 2011, at 05.53, Bill Bell wrote:

 +1 wait for grouping post facet counts... Go Martijn v Groningen !!

 On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com
 wrote:

 +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
 3.2.

 Mike McCandless

 http://blog.mikemccandless.com

 On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
 i was planning on doing an RC in a few weeks actually.

 we have a lot of good stuff in there today already, however i wanted
 to give a few weeks for the grouping stuff to run on hudson.

 On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 I would say within the next 3 months.

 Thoughts?

 On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com
 wrote:
 Hi,
 How soon can we expect official Lucene 3.3 release?
 Best regards,
 Lukas




[jira] [Created] (LUCENE-3224) bugs in ByteArrayDataInput

2011-06-21 Thread Robert Muir (JIRA)
bugs in ByteArrayDataInput
--

 Key: LUCENE-3224
 URL: https://issues.apache.org/jira/browse/LUCENE-3224
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some things 
like readVInt will work, others will fail due to asserts).

The problem is it doesn't set things like limit in the ctor... I think the ctor 
should call reset().
Most code using this passes null to the ctor to initialize it, then uses 
reset(); instead they could just call new ByteArrayDataInput(BytesRef.EMPTY_BYTES) 
if they want to do that.
Finally, reset()'s limit looks like it should be offset + len.
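A simplified stand-in class shows the two fixes proposed above: the constructor delegates to reset(), and reset(bytes, offset, len) sets limit to offset + len. This is a hypothetical sketch for illustration, not the committed LUCENE-3224 patch. EMPTY_BYTES here is a local constant mirroring BytesRef.EMPTY_BYTES.

```java
/** Simplified stand-in illustrating the LUCENE-3224 ctor/reset fix; not Lucene's ByteArrayDataInput. */
public class SimpleByteArrayInput {
    // Local stand-in for BytesRef.EMPTY_BYTES.
    private static final byte[] EMPTY_BYTES = new byte[0];

    private byte[] bytes;
    private int pos;
    private int limit;

    public SimpleByteArrayInput(byte[] bytes) {
        reset(bytes); // ctor delegates to reset(), so limit is always initialized
    }

    public SimpleByteArrayInput() {
        this(EMPTY_BYTES); // replaces the "pass null, call reset() later" pattern
    }

    public void reset(byte[] bytes) {
        reset(bytes, 0, bytes.length);
    }

    public void reset(byte[] bytes, int offset, int len) {
        this.bytes = bytes;
        this.pos = offset;
        this.limit = offset + len; // end of the slice, not just len
    }

    public boolean eof() {
        return pos == limit;
    }

    public byte readByte() {
        return bytes[pos++];
    }
}
```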




[jira] [Commented] (LUCENE-3224) bugs in ByteArrayDataInput

2011-06-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052660#comment-13052660
 ] 

Robert Muir commented on LUCENE-3224:
-

also i think we want to assert all bounds checks in here, maybe have a 
checkBounds(int limit) called only from assert that throws "read past EOF".

this way we don't rely upon AIOOBE, we could be reading from slices and miss 
bugs in tests.
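One way to read the checkBounds suggestion, as a hypothetical Java sketch rather than the committed code: the check returns true so it can live inside an assert statement, and it compares against the slice's limit rather than the array length. That comparison is what catches a read that runs past a slice but stays inside the backing array. With assertions disabled (no -ea), the check is skipped entirely.

```java
/** Hypothetical sketch: assert-only bounds checks against a slice limit. */
public class SliceReaderSketch {
    private final byte[] bytes;
    private int pos;
    private final int limit;

    SliceReaderSketch(byte[] bytes, int offset, int len) {
        this.bytes = bytes;
        this.pos = offset;
        this.limit = offset + len;
    }

    /** Returns true so it can sit inside an assert; throws if a read would end past the slice. */
    boolean checkBounds(int newPos) {
        if (newPos > limit) {
            throw new AssertionError("read past EOF: pos=" + newPos + " limit=" + limit);
        }
        return true;
    }

    byte readByte() {
        assert checkBounds(pos + 1);
        return bytes[pos++];
    }

    void readBytes(byte[] dest, int destOffset, int len) {
        assert checkBounds(pos + len);
        System.arraycopy(bytes, pos, dest, destOffset, len);
        pos += len;
    }
}
```

Without the assert, reading one byte past this slice would silently return the neighbouring byte of the backing array instead of throwing ArrayIndexOutOfBoundsException, which is why relying on AIOOBE can miss bugs when reading from slices.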


 bugs in ByteArrayDataInput
 --

 Key: LUCENE-3224
 URL: https://issues.apache.org/jira/browse/LUCENE-3224
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir

 ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some 
 things like readVInt will work, others will fail due to asserts).
 The problem is it doesn't set things like limit in the ctor... I think the 
 ctor should call reset().
 Most code using this passes null to the ctor to initialize it, then uses 
 reset(); instead they could just call new ByteArrayDataInput(BytesRef.EMPTY_BYTES) 
 if they want to do that.
 Finally, reset()'s limit looks like it should be offset + len.




Re: managing CHANGES.txt?

2011-06-21 Thread Robert Muir
On Tue, Jun 21, 2011 at 1:09 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 But there is no way for someone looking at the CHANGES for 4.0 to know
 for certain that the bits that make up that bug fix are in the 4.0 release
 -- the fact that it's listed in 3.2's CHANGES isn't an assurance, because
 4.0 comes from a completely different line of development.


it's in the 4.0 CHANGES.txt, under the 3.2 section.




Re: Lucene 3.3 release soon?

2011-06-21 Thread johnmunir

-1 on release early & often.


Let us say you average 6-8 releases a month, this means there will be that many 
versions used by users.  Which means the amount of testing done on a release 
(by real users, in real environment) will be spread thin thus a release will 
not get the same amount of testing it otherwise would.  Not only that, more 
releases means more release specific questions.  Expect to see questions / 
issues reported and you must ask "what version are you using?" before you can 
answer.


May I suggest a scheduled release, once a quarter, near the end of a quarter?


-JM



-Original Message-
From: Simon Willnauer simon.willna...@googlemail.com
To: dev@lucene.apache.org
Sent: Tue, Jun 21, 2011 12:53 pm
Subject: Re: Lucene 3.3 release soon?


On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote:
 Again, I don't think any future uncommitted features should block a
 release, nor should there be a shoving period where features are
 shoved in.
+1 - release early & often!!!
simon

 I'll be now looking at producing an RC as quickly as possible before
 this can happen!

 On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote:
 Grouping is really worth a release! But if group count in facet is within 
reach, wait for that!

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 21. June 2011, at 05.53, Bill Bell wrote:

 +1 wait for grouping post facet counts... Go Martijn v Groningen !!

 On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com
 wrote:

 +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
 3.2.

 Mike McCandless

 http://blog.mikemccandless.com

 On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
 i was planning on doing an RC in a few weeks actually.

 we have a lot of good stuff in there today already, however i wanted
 to give a few weeks for the grouping stuff to run on hudson.

 On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 I would say within the next 3 months.

 Thoughts?

 On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com
 wrote:
 Hi,
 How soon can we expect official Lucene 3.3 release?
 Best regards,
 Lukas




[jira] [Assigned] (LUCENE-3224) bugs in ByteArrayDataInput

2011-06-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3224:
--

Assignee: Michael McCandless

 bugs in ByteArrayDataInput
 --

 Key: LUCENE-3224
 URL: https://issues.apache.org/jira/browse/LUCENE-3224
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Michael McCandless
 Attachments: LUCENE-3224.patch


 ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some 
 things like readVInt will work, others will fail due to asserts).
 The problem is it doesn't set things like limit in the ctor... I think the 
 ctor should call reset().
 Most code using this passes null to the ctor to initialize it, then uses 
 reset(); instead they could just call new ByteArrayDataInput(BytesRef.EMPTY_BYTES) 
 if they want to do that.
 Finally, reset()'s limit looks like it should be offset + len.




[jira] [Updated] (LUCENE-3224) bugs in ByteArrayDataInput

2011-06-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3224:
---

Attachment: LUCENE-3224.patch

Patch.

 bugs in ByteArrayDataInput
 --

 Key: LUCENE-3224
 URL: https://issues.apache.org/jira/browse/LUCENE-3224
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Michael McCandless
 Attachments: LUCENE-3224.patch


 ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some 
 things like readVInt will work, others will fail due to asserts).
 The problem is it doesn't set things like limit in the ctor... I think the 
 ctor should call reset().
 Most code using this passes null to the ctor to initialize it, then uses 
 reset(); instead they could just call new ByteArrayDataInput(BytesRef.EMPTY_BYTES) 
 if they want to do that.
 Finally, reset()'s limit looks like it should be offset + len.




RE: managing CHANGES.txt?

2011-06-21 Thread Steven A Rowe
Robert,

Is the CHANGES.txt policy you advocate (and police) written up in one place?  
I'm sure you'd like to not have to fix up everybody's entries

Steve

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Tuesday, June 21, 2011 1:14 PM
 To: dev@lucene.apache.org
 Subject: Re: managing CHANGES.txt?
 
 On Tue, Jun 21, 2011 at 1:09 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:
 
  But there is no way for someone looking at the CHANGES for 4.0 to know
  for certain that the bits that make up that bug fix are in the 4.0
 release
  -- the fact that it's listed in 3.2's CHANGES isn't an assurance,
 because
  4.0 comes from a completely different line of development.
 
 
 it's in the 4.0 CHANGES.txt, under the 3.2 section.
 



[jira] [Commented] (LUCENE-3224) bugs in ByteArrayDataInput

2011-06-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052679#comment-13052679
 ] 

Robert Muir commented on LUCENE-3224:
-

+1

 bugs in ByteArrayDataInput
 --

 Key: LUCENE-3224
 URL: https://issues.apache.org/jira/browse/LUCENE-3224
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Michael McCandless
 Attachments: LUCENE-3224.patch


 ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some 
 things like readVInt will work, others will fail due to asserts).
 The problem is it doesn't set things like limit in the ctor... I think the 
 ctor should call reset().
 Most code using this passes null to the ctor to initialize it, then uses 
 reset(); instead they could just call new ByteArrayDataInput(BytesRef.EMPTY_BYTES) 
 if they want to do that.
 Finally, reset()'s limit looks like it should be offset + len.




Re: managing CHANGES.txt?

2011-06-21 Thread Robert Muir
It wasn't anything i advocate, I'm just describing what it seems like
we do 99% of the time? (in my example, Uwe committed it, and I didn't
fix anything)

On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote:
 Robert,

 Is the CHANGES.txt policy you advocate (and police) written up in one place?  
 I'm sure you'd like to not have to fix up everybody's entries

 Steve

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Tuesday, June 21, 2011 1:14 PM
 To: dev@lucene.apache.org
 Subject: Re: managing CHANGES.txt?

 On Tue, Jun 21, 2011 at 1:09 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:
 
  But there is no way for someone looking at the CHANGES for 4.0 to know
  for certain that the bits that make up that bug fix are in the 4.0
 release
  -- the fact that it's listed in 3.2's CHANGES isn't an assurance,
 because
  4.0 comes from a completely different line of development.
 

 it's in the 4.0 CHANGES.txt, under the 3.2 section.




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8966 - Failure

2011-06-21 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8966/

10 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-72: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test6473964755tmp/_e_1.prx
 (Too many open files in system)

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-72:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test6473964755tmp/_e_1.prx
 (Too many open files in system)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test6473964755tmp/_e_1.prx
 (Too many open files in system)
at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)


REGRESSION:  org.apache.lucene.index.TestIndexWriterWithThreads.testImmediateDiskFullWithThreads

Error Message:
hit unexpected Throwable

Stack Trace:
junit.framework.AssertionFailedError: hit unexpected Throwable
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343)
at org.apache.lucene.index.TestIndexWriterWithThreads.testImmediateDiskFullWithThreads(TestIndexWriterWithThreads.java:140)


REGRESSION:  org.apache.lucene.index.TestStressIndexing2.testRandomIWReader

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343)
at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:605)


FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestFieldCacheRangeFilter

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test3857338582tmp/_1_1.doc
 (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test3857338582tmp/_1_1.doc
 (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:110)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:133)
at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:58)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:326)
at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:415)
at org.apache.lucene.store.Directory.openInput(Directory.java:118)
at org.apache.lucene.index.codecs.mocksep.MockSingleIntIndexInput.<init>(MockSingleIntIndexInput.java:40)
at org.apache.lucene.index.codecs.mocksep.MockSingleIntFactory.openInput(MockSingleIntFactory.java:31)
at org.apache.lucene.index.codecs.sep.IntStreamFactory.openInput(IntStreamFactory.java:28)
at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl.<init>(SepPostingsReaderImpl.java:66)
at org.apache.lucene.index.codecs.mocksep.MockSepCodec.fieldsProducer(MockSepCodec.java:95)
at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.<init>(PerFieldCodecWrapper.java:113)
at org.apache.lucene.index.PerFieldCodecWrapper.fieldsProducer(PerFieldCodecWrapper.java:189)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:88)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:640)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3450)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3119)
at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1879)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1874)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1870)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1236)
   

[jira] [Updated] (SOLR-2452) rewrite solr build system

2011-06-21 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2452:
--

Attachment: SOLR-2452.dir.reshuffle.sh
SOLR-2452-post-reshuffling.patch

This version of the shell script & patch removes Solrj's dependence on Solr 
core tests, by moving SolrJettyTestBase and ExternalPaths from Solr core to 
Solr's test-framework -- it turns out that these were the only two Solr core 
test classes that Solrj depended on.

 rewrite solr build system
 -

 Key: SOLR-2452
 URL: https://issues.apache.org/jira/browse/SOLR-2452
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Robert Muir
Assignee: Steven Rowe
 Fix For: 3.3, 4.0

 Attachments: SOLR-2452-post-reshuffling.patch, 
 SOLR-2452-post-reshuffling.patch, SOLR-2452.dir.reshuffle.sh, 
 SOLR-2452.dir.reshuffle.sh


 As discussed some in SOLR-2002 (but that issue is long and hard to follow), I 
 think we should rewrite the solr build system.
 It's slow, cumbersome, and messy, and makes it hard for us to improve things.




[jira] [Resolved] (LUCENE-3224) bugs in ByteArrayDataInput

2011-06-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3224.


   Resolution: Fixed
Fix Version/s: 4.0

 bugs in ByteArrayDataInput
 --

 Key: LUCENE-3224
 URL: https://issues.apache.org/jira/browse/LUCENE-3224
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-3224.patch


 ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some 
 things like readVInt will work, others will fail due to asserts).
 The problem is it doesn't set things like limit in the ctor... I think the 
 ctor should call reset().
 Most code using this passes null to the ctor to initialize it, then uses 
 reset(); instead they could just call new ByteArrayDataInput(BytesRef.EMPTY_BYTES) if 
 they want to do that.
 Finally, reset()'s limit looks like it should be offset + len.
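The two fixes proposed in this issue (route the byte[] constructor through reset(), and set limit to offset + len) can be sketched with a small hypothetical class. SimpleByteArrayInput, its local EMPTY_BYTES constant, and the unchecked IllegalStateException on over-read are assumptions for illustration only; this is not the code committed for LUCENE-3224, and per the issue text the real class signals misuse through asserts.

```java
/**
 * Hypothetical stand-in for ByteArrayDataInput showing the LUCENE-3224 fixes:
 * every constructor goes through reset(), and limit is offset + len.
 */
final class SimpleByteArrayInput {
    private static final byte[] EMPTY_BYTES = new byte[0]; // stand-in for BytesRef.EMPTY_BYTES

    private byte[] bytes;
    private int pos;
    private int limit;

    SimpleByteArrayInput(byte[] bytes) {
        reset(bytes); // sets pos and limit, so reads work straight after construction
    }

    /** Replaces the "pass null to the ctor, then call reset()" idiom. */
    SimpleByteArrayInput() {
        this(EMPTY_BYTES);
    }

    void reset(byte[] bytes) {
        reset(bytes, 0, bytes.length);
    }

    void reset(byte[] bytes, int offset, int len) {
        this.bytes = bytes;
        this.pos = offset;
        this.limit = offset + len; // end of the readable window, not len alone
    }

    boolean eof() {
        return pos == limit;
    }

    byte readByte() {
        if (pos >= limit) {
            throw new IllegalStateException("read past limit " + limit);
        }
        return bytes[pos++];
    }

    /** Variable-length int: 7 data bits per byte, low-order group first; a set high bit means more bytes follow. */
    int readVInt() {
        byte b = readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}
```

With reset(bytes, offset, len) the readable window is [offset, offset + len), which is the bug the last line of the issue points at: a limit of just len would cut reads short (or let them run past the window) whenever offset is non-zero.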




Re: Lucene 3.3 release soon?

2011-06-21 Thread Mark Miller
I think we might target fewer than 6-8 a month. That would be scary! I would 
guess it will be once a month at worst, and often less. Time will tell. 

You must already give version info with questions if you want decent help - 
nothing is going to change that.

- Mark


On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote:

 -1 on release early & often.
  
  
 Let us say you average 6-8 releases a month, this means there will be that 
 many versions used by users.  Which means the amount of testing done on a 
 release (by real users, in real environment) will be spread thin thus a 
 release will not get the same amount of testing it otherwise would.  Not only 
 that, more releases means more release specific questions.  Expect to see 
 questions / issues reported and you must ask what version are you using? 
 before you can answer.
  
  
 May I suggest a scheduled release, once a quarter, near the end of a quarter?
  
  
 -JM
  
  
 -Original Message-
 From: Simon Willnauer simon.willna...@googlemail.com
 To: dev@lucene.apache.org
 Sent: Tue, Jun 21, 2011 12:53 pm
 Subject: Re: Lucene 3.3 release soon?
 
 On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote:
  Again, I don't think any future uncommitted features should block a
  release, nor should there be a shoving period where features are
  shoved in.
 
 +1 - release early & often!!!
 
 simon
 
  I'll be now looking at producing an RC as quickly as possible before
  this can happen!
 
  On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote:
  Grouping is really worth a release! But if group count in facet is within
  reach, wait for that!
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  On 21. juni 2011, at 05.53, Bill Bell wrote:
 
  +1 wait for grouping post facet counts... Go Martijn v Groningen !!
 
  On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote:
 
  +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
  3.2.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
  On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
  i was planning on doing an RC in a few weeks actually.
 
  we have a lot of good stuff in there today already, however i wanted
  to give a few weeks for the grouping stuff to run on hudson.
 
  On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote:
  I would say within the next 3 month.
 
  Thoughts?
 
  On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:
  Hi,
  How soon can we expect official Lucene 3.3 release?
  Best regards,
  Lukas
 

- Mark Miller
lucidimagination.com



[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052696#comment-13052696
 ] 

Michael McCandless commented on LUCENE-2454:


bq. I think the only thing 3171 may be missing from my original use cases then 
is that I can use multiple PerParentLimitedQueries in one query to get a limit 
of children of different types e.g. for each parent resume, max 10 results from 
employment detail children and max 10 results from education background 
children.

I think LUCENE-3171 can handle this, or something very similar: the
collector tracks all of the BlockJoinQuerys involved in the top query.

So, you'd have 1 BJQ matching employment detail child docs and
another matching education bg child docs.  The BJC collects the
top parent docs, then you can retrieve separate TopGroups for each
BJQ.

In the end you have a TopGroups for the employment detail child docs
and another TopGroups for the education bg child docs.

Could that work for your use case?
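As a rough illustration of the per-parent, per-child-type limits discussed above (for example, at most 10 employment children and at most 10 education children per resume), the sketch below walks docs laid out as blocks with the parent last and keeps up to N hits per named child query for each parent. PerParentChildGroups, the string doc types, and the predicates standing in for child queries are all hypothetical; this is not the LUCENE-3171 API. It also keeps the first N children in docID order, whereas BlockJoinCollector ranks them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper: per-parent child hits, capped separately for each child query. */
final class PerParentChildGroups {

    private PerParentChildGroups() {}

    /**
     * Assumes docs are laid out in blocks with each parent as the last doc of its
     * block. Returns parent docID -> (child query name -> child docIDs), keeping at
     * most maxPerGroup children per query per parent, in docID order.
     */
    static Map<Integer, Map<String, List<Integer>>> group(
            List<String> docTypes,
            Predicate<String> isParent,
            Map<String, Predicate<String>> childQueries,
            int maxPerGroup) {
        Map<Integer, Map<String, List<Integer>>> result = new LinkedHashMap<>();
        Map<String, List<Integer>> pending = emptyGroups(childQueries);
        for (int doc = 0; doc < docTypes.size(); doc++) {
            String type = docTypes.get(doc);
            if (isParent.test(type)) {
                result.put(doc, pending); // block closed: its buffered children belong to this parent
                pending = emptyGroups(childQueries);
                continue;
            }
            for (Map.Entry<String, Predicate<String>> query : childQueries.entrySet()) {
                List<Integer> hits = pending.get(query.getKey());
                if (query.getValue().test(type) && hits.size() < maxPerGroup) {
                    hits.add(doc);
                }
            }
        }
        return result;
    }

    private static Map<String, List<Integer>> emptyGroups(Map<String, Predicate<String>> childQueries) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (String name : childQueries.keySet()) {
            groups.put(name, new ArrayList<>());
        }
        return groups;
    }
}
```

Each named child query here plays the role of one BJQ, and each per-query list plays the role of the separate TopGroups you would retrieve for it.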


 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




RE: managing CHANGES.txt?

2011-06-21 Thread Steven A Rowe
On 6/21/2011 at 1:26 PM, Robert Muir wrote:
 On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote:
  Is the CHANGES.txt policy you advocate (and police) written up in one
  place?  I'm sure you'd like to not have to fix up everybody's entries
 
 It wasn't anything i advocate, I'm just describing what it seems like
 we do 99% of the time? (in my example, Uwe committed it, and I didn't
 fix anything)

I'm confused - seems like you're disavowing the role you've been playing as 
CHANGES policeman - yet I've seen at least 10 CHANGES-policing commits within 
the last six weeks?:

http://svn.apache.org/viewvc?rev=1137361&view=rev
http://svn.apache.org/viewvc?rev=1137359&view=rev
http://svn.apache.org/viewvc?rev=1130564&view=rev
http://svn.apache.org/viewvc?rev=1128248&view=rev
http://svn.apache.org/viewvc?rev=1128247&view=rev
http://svn.apache.org/viewvc?rev=1125127&view=rev
http://svn.apache.org/viewvc?rev=1125128&view=rev
http://svn.apache.org/viewvc?rev=1125134&view=rev
http://svn.apache.org/viewvc?rev=1125135&view=rev
http://svn.apache.org/viewvc?rev=1102119&view=rev

Again, you obviously have a concrete idea of what should be done - can you 
point to a writeup?

Thanks,
Steve


Re: managing CHANGES.txt?

2011-06-21 Thread Robert Muir
On Tue, Jun 21, 2011 at 1:47 PM, Steven A Rowe sar...@syr.edu wrote:
 On 6/21/2011 at 1:26 PM, Robert Muir wrote:
 On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote:
  Is the CHANGES.txt policy you advocate (and police) written up in one
  place?  I'm sure you'd like to not have to fix up everybody's entries

 It wasn't anything i advocate, I'm just describing what it seems like
 we do 99% of the time? (in my example, Uwe committed it, and I didn't
 fix anything)

 I'm confused - seems like you're disavowing the role you've been playing as 
 CHANGES policeman - yet I've seen at least 10 CHANGES-policing commits within 
 the last six weeks?:


I do disavow this role: when CHANGES.txt is jacked up, i fix it, I
don't complain to anyone about it. I don't understand how this makes me
a policeman?




Re: managing CHANGES.txt?

2011-06-21 Thread Mark Miller

On Jun 21, 2011, at 1:47 PM, Steven A Rowe wrote:

 
 
 Again, you obviously have a concrete idea of what should be done - can you 
 point to a writeup?
 
 Thanks,
 Steve


Thank you Robert for keeping Changes pretty.

-1 to more formalization, or writeups. I've seen the opinions in the emails 
on the topic now and before. Writeups turn into more than they should be over 
time, half the time. They end up stale or over followed.

- Mark Miller
lucidimagination.com



Re: Lucene 3.3 release soon?

2011-06-21 Thread johnmunir


My bad, I meant to say a “6-8 releases a year” .. grrr!!

So let me try this again. I don't like the current plan of release early &
often because:

1) It will spread testing thin of any release because fewer real users will be 
using a release when you have too many a year.

2) release early & often is not a well defined production release. It will 
lead to undefined gaps between releases (why X.Y took N weeks, but X.Z took M 
months?). This is why I suggested a quarterly release plan (it's what FF is now 
doing)

3) Do companies jump on a Lucene release as soon as one is made?  No, they have 
a process.  With too many releases, they will now be more confused which 
releases to use; they want a release that proved itself.

--MJ



-Original Message-
From: Mark Miller markrmil...@gmail.com
To: dev@lucene.apache.org
Cc: simon.willna...@gmail.com
Sent: Tue, Jun 21, 2011 1:32 pm
Subject: Re: Lucene 3.3 release soon?


I think we might target fewer than 6-8 a month. That would be scary! I would 
guess it will be once a month at worst, and often less. Time will tell. 
You must already give version info with questions if you want decent help - 
nothing is going to change that.
- Mark

On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote:
 -1 on release early & often.
  
  
 Let us say you average 6-8 releases a month, this means there will be that 
 many versions used by users.  Which means the amount of testing done on a 
 release (by real users, in real environment) will be spread thin thus a release 
 will not get the same amount of testing it otherwise would.  Not only that, more 
 releases means more release specific questions.  Expect to see questions / 
 issues reported and you must ask what version are you using? before you can 
 answer.
  
  
 May I suggest a scheduled release, once a quarter, near the end of a quarter?
  
  
 -JM
  
  
 -Original Message-
 From: Simon Willnauer simon.willna...@googlemail.com
 To: dev@lucene.apache.org
 Sent: Tue, Jun 21, 2011 12:53 pm
 Subject: Re: Lucene 3.3 release soon?
 
 On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote:
  Again, I don't think any future uncommitted features should block a
  release, nor should there be a shoving period where features are
  shoved in.
 
 +1 - release early & often!!!
 
 simon
 
  I'll be now looking at producing an RC as quickly as possible before
  this can happen!
 
  On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote:
  Grouping is really worth a release! But if group count in facet is within
  reach, wait for that!
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  On 21. juni 2011, at 05.53, Bill Bell wrote:
 
  +1 wait for grouping post facet counts... Go Martijn v Groningen !!
 
  On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote:
 
  +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
  3.2.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
  On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
  i was planning on doing an RC in a few weeks actually.
 
  we have a lot of good stuff in there today already, however i wanted
  to give a few weeks for the grouping stuff to run on hudson.
 
  On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote:
  I would say within the next 3 month.
 
  Thoughts?
 
  On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:
  Hi,
  How soon can we expect official Lucene 3.3 release?
  Best regards,
  Lukas
 

Re: Lucene 3.3 release soon?

2011-06-21 Thread Simon Willnauer
On Tue, Jun 21, 2011 at 7:15 PM,  johnmu...@aol.com wrote:
 -1 on release early & often.


John, don't worry, we won't do 6 or 8 a month. I think we'd rather
balance it with the features / bugfixes we can deliver. I think 1
every two months is a good rough estimate.

simon

 Let us say you average 6-8 releases a month, this means there will be that
 many versions used by users.  Which means the amount of testing done on a
 release (by real users, in real environment) will be spread thin thus a
 release will not get the same amount of testing it otherwise would.  Not
 only that, more releases means more release specific questions.  Expect to
 see questions / issues reported and you must ask what version are you
 using? before you can answer.


 May I suggest a scheduled release, once a quarter, near the end of a
 quarter?


 -JM


 -Original Message-
 From: Simon Willnauer simon.willna...@googlemail.com
 To: dev@lucene.apache.org
 Sent: Tue, Jun 21, 2011 12:53 pm
 Subject: Re: Lucene 3.3 release soon?

 On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote:
 Again, I don't think any future uncommitted features should block a
 release, nor should there be a shoving period where features are
 shoved in.

 +1 - release early & often!!!

 simon

 I'll be now looking at producing an RC as quickly as possible before
 this can happen!

 On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote:
 Grouping is really worth a release! But if group count in facet is within
 reach, wait for that!

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 21. juni 2011, at 05.53, Bill Bell wrote:

 +1 wait for grouping post facet counts... Go Martijn v Groningen !!

 On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com
 wrote:

 +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
 3.2.

 Mike McCandless

 http://blog.mikemccandless.com

 On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
 i was planning on doing an RC in a few weeks actually.

 we have a lot of good stuff in there today already, however i wanted
 to give a few weeks for the grouping stuff to run on hudson.

 On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 I would say within the next 3 month.

 Thoughts?

 On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com
 wrote:
 Hi,
 How soon can we expect official Lucene 3.3 release?
 Best regards,
 Lukas




RE: managing CHANGES.txt?

2011-06-21 Thread Steven A Rowe
Mark,

Staleness is way better than digging through mail archives, guessing and 
getting it wrong, or re-invention.

Word of mouth doesn't scale.  The Lucene/Solr dev community is growing.

Where I see an opportunity to document current practice, where it is less than 
obvious what to do, I will, modulo free time of course.

Feel free to ignore my idiocy.

Steve

 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Tuesday, June 21, 2011 1:54 PM
 To: dev@lucene.apache.org
 Subject: Re: managing CHANGES.txt?
 
 On Jun 21, 2011, at 1:47 PM, Steven A Rowe wrote:
 
  Again, you obviously have a concrete idea of what should be done - can
 you point to a writeup?
 
  Thanks,
  Steve
 
 
 Thank you Robert for keeping Changes pretty.
 
 -1 to more formalization, or writeups. I've seen the opinions in the
 emails on the topic now and before. Writeups turn into more than they
 should be over time, half the time. They end up stale or over followed.
 
 - Mark Miller
 lucidimagination.com



RE: managing CHANGES.txt?

2011-06-21 Thread Steven A Rowe
On 6/21/2011 at 1:52 PM, Robert Muir wrote:
 On Tue, Jun 21, 2011 at 1:47 PM, Steven A Rowe sar...@syr.edu wrote:
  On 6/21/2011 at 1:26 PM, Robert Muir wrote:
   On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote:
Is the CHANGES.txt policy you advocate (and police) written up in
one place?  I'm sure you'd like to not have to fix up everybody's
entries
  
   It wasn't anything i advocate, I'm just describing what it seems like
   we do 99% of the time? (in my example, Uwe committed it, and I didn't
   fix anything)
 
  I'm confused - seems like you're disavowing the role you've been
  playing as CHANGES policeman - yet I've seen at least 10 CHANGES-
  policing commits within the last six weeks?:
 
 I do disavow this role: when CHANGES.txt is jacked up, i fix it, I
 don't complain to anyone about it. I don't understand how this makes me
 a policeman?

CHANGES janitor???

Echoing Mark M., thanks for scrubbing.

I was looking to make it possible for others to share the load, by publicizing 
the target.

Steve




Re: managing CHANGES.txt?

2011-06-21 Thread Mark Miller
You're more prickly than me today Steve :)

You are of course free to document anything you see fit. And I'm free to weigh 
in on my opinion about documenting :)

That's how it works indeed, and it's a beautiful system.

- Mark

On Jun 21, 2011, at 2:08 PM, Steven A Rowe wrote:

 Mark,
 
 Staleness is way better than digging through mail archives, guessing and 
 getting it wrong, or re-invention.
 
 Word of mouth doesn't scale.  The Lucene/Solr dev community is growing.
 
 Where I see an opportunity to document current practice, where it is less 
 than obvious what to do, I will, modulo free time of course.
 
 Feel free to ignore my idiocy.
 
 Steve
 
 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Tuesday, June 21, 2011 1:54 PM
 To: dev@lucene.apache.org
 Subject: Re: managing CHANGES.txt?
 
 On Jun 21, 2011, at 1:47 PM, Steven A Rowe wrote:
 
 Again, you obviously have a concrete idea of what should be done - can
 you point to a writeup?
 
 Thanks,
 Steve
 
 
 Thank you Robert for keeping Changes pretty.
 
 -1 to more formalization, or writeups. I've seen the opinions in the
 emails on the topic now and before. Writeups turn into more than they
 should be over time, half the time. They end up stale or over followed.
 
 - Mark Miller
 lucidimagination.com
 

- Mark Miller
lucidimagination.com



Re: Lucene 3.3 release soon?

2011-06-21 Thread Robert Muir
And here are the reasons why I think we should release often:

1) As far as corporations worried about stability, if they are really
that worried, they should take a look at our stable branch and how
development is done around here, and these concerned corporations
should also take a look at how testing is done on this project. But in
any case, I could care less what corporations think.

2) The way I see it, we started releasing more often about a month
ago, and we also got a bunch of new committers (5, 6, 7? what is it
exactly?) in the last month too. We have a shitload of guys committing
a shitload of good stuff, and we want even more committers to get more
momentum.  Releasing is an important part of encouraging contributors
so that they see what they do actually getting out there.

3) When I look at http://wiki.apache.org/lucene-java/ReleaseNote33 and
http://wiki.apache.org/solr/ReleaseNote33, which only list major
features, not bugfixes or anything (see CHANGES.txt for that!), it
looks solid to me. These are major search features that users want,
some of them (e.g. autocomplete and grouping stuff) have been baking
in trunk for quite some time.

4) Finally, we won't make all users or even committers happy with any
given release. That's why releases only need 3 +1 votes. That being
said, I'm talking about spinning up an RC soon, right before I go on
vacation. Sure we slipped the last one past hossman, but for this one,
it's entirely possible he comes back with 87 problems in the release.
Big deal, worst case the RC fails, and if I'm stuck sitting by the
beach fixing everything he finds and making Lucene/Solr better - well,
life could be a lot worse.

On Tue, Jun 21, 2011 at 2:01 PM,  johnmu...@aol.com wrote:

 My bad, I meant to say a “6-8 releases a year” .. grrr!!

 So let me try this again. I don't like the current plan of release early &
 often because:

 1) It will spread testing thin of any release because fewer real users will
 be using a release when you have too many a year.

 2) release early & often is not a well defined production release. It will
 lead to undefined gaps between releases (why X.Y took N weeks, but X.Z took
 M months?). This is why I suggested a quarterly release plan (it's what FF
 is now doing)

 3) Do companies jump on a Lucene release as soon as one is made?  No, they
 have a process.  With too many releases, they will now be more confused
 which releases to use; they want a release that proved itself.

 --MJ


 -Original Message-
 From: Mark Miller markrmil...@gmail.com
 To: dev@lucene.apache.org
 Cc: simon.willna...@gmail.com
 Sent: Tue, Jun 21, 2011 1:32 pm
 Subject: Re: Lucene 3.3 release soon?

 I think we might target fewer than 6-8 a month. That would be scary! I would
 guess it will be once a month at worst, and often less. Time will tell.

 You must already give version info with questions if you want decent help -
 nothing is going to change that.

 - Mark


 On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote:

 -1 on release early & often.


 Let us say you average 6-8 releases a month, this means there will be that
 many versions used by users.  Which means the amount of testing done on a
 release (by real users, in real environment) will be spread thin thus a
 release will not get the same amount of testing it otherwise would.  Not
 only that, more releases means more release specific questions.  Expect to
 see questions / issues reported and you must ask what version are you
 using? before you can answer.


 May I suggest a scheduled release, once a quarter, near the end of a
 quarter?


 -JM


 -Original Message-
 From: Simon Willnauer simon.willna...@googlemail.com
 To: dev@lucene.apache.org
 Sent: Tue, Jun 21, 2011 12:53 pm
 Subject: Re: Lucene 3.3 release soon?

 On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote:
  Again, I don't think any future uncommitted features should block a
  release, nor should there be a shoving period where features are
  shoved in.

 +1 - release early & often!!!

 simon
 
  I'll be now looking at producing an RC as quickly as possible before
  this can happen!
 
  On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote:
  Grouping is really worth a release! But if group count in facet is
  within reach, wait for that!
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  On 21. juni 2011, at 05.53, Bill Bell wrote:
 
  +1 wait for grouping post facet counts... Go Martijn v Groningen !!
 
  On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote:
 
  +1 to releasing 3.3 in a few weeks... there's a lot of new stuff
  after 3.2.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
  On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote:
  i was planning on doing an RC in a few weeks actually.
 
  we have a lot of good stuff in there today already, 

Concerning LUCENE-3079: Faceting module

2011-06-21 Thread Stefan Trcek
Hello

I can donate our facet module to the Lucene project.

The implementation relies on field cache only, no index scheme, no 
cached filters etc. It is small (about 600 lines of code in 10 
classes). I didn't measure performance, but it handles 1 million documents 
(30GB) without problems. I suppose it might fit the requirements 
described in LUCENE-3079.

The module supports
- single valued facets
- multi valued facets
- facet filters
- evaluation of facet values that would otherwise be dismissed due to other 
facet filters.

Let me explain the last point: For the user a facet query
  (color==green) AND (shape==circle OR shape==square)
may look like

Facet color
[ ] (3) red
[x] (5) green
[ ] (7) blue

Facet shape
[x] (9) circle
[ ] (4) line
[x] (2) square

The red/blue/line facet values will display even though the 
corresponding documents are not in the result set. Also there is 
support for filtered facet values with zero results, so users 
understand why they do not get results.
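One common way to produce counts like those in the color/shape example is "multi-select" faceting: each facet's counts are computed with every facet filter applied except that facet's own (OR within a facet, AND across facets), so unselected values such as red, blue and line still get counts. The sketch below assumes single-valued fields held in per-docID String arrays as a stand-in for the field cache; MultiSelectFacets and its methods are hypothetical and not necessarily how the donated module works.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical multi-select facet counter over per-docID value arrays. */
final class MultiSelectFacets {

    private final Map<String, String[]> valuesByField; // field -> value for each docID
    private final int maxDoc;

    MultiSelectFacets(Map<String, String[]> valuesByField, int maxDoc) {
        this.valuesByField = valuesByField;
        this.maxDoc = maxDoc;
    }

    /** OR within a facet, AND across facets; the selection on skipField (if any) is ignored. */
    private boolean matches(int doc, Map<String, Set<String>> selected, String skipField) {
        for (Map.Entry<String, Set<String>> filter : selected.entrySet()) {
            String field = filter.getKey();
            if (field.equals(skipField) || filter.getValue().isEmpty()) {
                continue;
            }
            if (!filter.getValue().contains(valuesByField.get(field)[doc])) {
                return false;
            }
        }
        return true;
    }

    /** Size of the actual result set, with every facet filter applied. */
    int resultCount(Map<String, Set<String>> selected) {
        int count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches(doc, selected, null)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Counts for one facet using every filter except that facet's own, so unselected
     * values report how many docs they match under the other facets' filters.
     */
    Map<String, Integer> counts(String field, Map<String, Set<String>> selected) {
        Map<String, Integer> counts = new TreeMap<>();
        String[] values = valuesByField.get(field);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches(doc, selected, field)) {
                counts.merge(values[doc], 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The linear scan over all docIDs keeps the sketch short; a real implementation would typically iterate only the docs matched by the main query and intersect with the other facets' filters.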

So how should I start? Should I prepare a patch against trunk (our module is currently based on 3.1)?

Stefan Trcek




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8967 - Still Failing

2011-06-21 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8967/

1 tests failed.
FAILED:  org.apache.solr.cloud.ZkControllerTest.testUploadToCloud

Error Message:
Could not connect to ZooKeeper 127.0.0.1:55410/solr within 1000 ms

Stack Trace:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:55410/solr within 1000 ms
at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:121)
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:69)
at org.apache.solr.cloud.ZkController.init(ZkController.java:104)
at org.apache.solr.cloud.ZkControllerTest.testUploadToCloud(ZkControllerTest.java:188)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343)




Build Log (for compile errors):
[...truncated 8538 lines...]






[jira] [Updated] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3171:
---

Attachment: LUCENE-3171.patch

Patch, adding equals and hashCode and clone to BlockJoinQuery.  Also, I now 
throw UOE from get/setBoost, stating that you should do so against the child 
query instead.
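The design in this patch (value equality and cloning that follow the wrapped child query, with boost rejected on the wrapper itself) can be sketched with hypothetical stand-in classes. SimpleQuery, TermLikeQuery and JoinLikeQuery are illustrative names loosely modeled on Lucene 3.x Query, not the actual BlockJoinQuery code, and the deep copy of the child in clone() is an assumption about what a safe clone needs.

```java
import java.util.Objects;

/** Hypothetical base query with a mutable boost, loosely modeled on Lucene 3.x Query. */
abstract class SimpleQuery implements Cloneable {
    private float boost = 1.0f;

    public float getBoost() {
        return boost;
    }

    public void setBoost(float boost) {
        this.boost = boost;
    }

    @Override
    public SimpleQuery clone() {
        try {
            return (SimpleQuery) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class implements Cloneable
        }
    }
}

/** Hypothetical leaf query; the boost takes part in equality. */
final class TermLikeQuery extends SimpleQuery {
    private final String field;
    private final String text;

    TermLikeQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TermLikeQuery)) {
            return false;
        }
        TermLikeQuery other = (TermLikeQuery) o;
        return field.equals(other.field) && text.equals(other.text)
                && Float.compare(getBoost(), other.getBoost()) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text, getBoost());
    }
}

/** Hypothetical join wrapper: the boost lives on the child query, not here. */
final class JoinLikeQuery extends SimpleQuery {
    private SimpleQuery childQuery; // reassigned only in clone()
    private final String parentFilter;

    JoinLikeQuery(SimpleQuery childQuery, String parentFilter) {
        this.childQuery = childQuery;
        this.parentFilter = parentFilter;
    }

    SimpleQuery getChildQuery() {
        return childQuery;
    }

    @Override
    public float getBoost() {
        throw new UnsupportedOperationException("get the boost from the child query instead");
    }

    @Override
    public void setBoost(float boost) {
        throw new UnsupportedOperationException("set the boost on the child query instead");
    }

    @Override
    public JoinLikeQuery clone() {
        JoinLikeQuery copy = (JoinLikeQuery) super.clone();
        copy.childQuery = childQuery.clone(); // deep copy: boosting the copy's child leaves ours alone
        return copy;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof JoinLikeQuery)) {
            return false;
        }
        JoinLikeQuery other = (JoinLikeQuery) o;
        return childQuery.equals(other.childQuery) && parentFilter.equals(other.parentFilter);
    }

    @Override
    public int hashCode() {
        return Objects.hash(childQuery, parentFilter);
    }
}
```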

 BlockJoinQuery/Collector
 

 Key: LUCENE-3171
 URL: https://issues.apache.org/jira/browse/LUCENE-3171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Reporter: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch


 I created a single-pass Query + Collector to implement nested docs.
 The approach is similar to LUCENE-2454, in that the app must index
 documents in join order, as a block (IW.add/updateDocuments), with
 the parent doc at the end of the block, except that this impl is one
 pass.
 Once you join at indexing time, you can take any query that matches
 child docs and join it up to the parent docID space, using
 BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
 docs by provided Sort, to gather results, grouped by parent; this
 collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
 retains the child docs corresponding to each collected parent doc.
 After searching is done, you retrieve the TopGroups from a provided
 BlockJoinQuery.
 Like LUCENE-2454, this is less general than the arbitrary joins in
 Solr (SOLR-2272) or parent/child from ElasticSearch
 (https://github.com/elasticsearch/elasticsearch/issues/553), since you
 must do the join at indexing time as a doc block, but it should be
 able to handle nested joins as well as joins to multiple tables,
 though I don't yet have test cases for these.
 I put this in a new Join module (modules/join); I think as we
 refactor join impls we should put them here.
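The index-time layout described above is what makes a single pass possible: because each block ends with its parent, the parent of any child docID is the first parent docID at or after it. The sketch below shows that lookup with java.util.BitSet standing in for the bits of the parent filter; BlockJoinSketch and joinToParents are hypothetical names, and the real BlockJoinQuery/BlockJoinCollector also handle scoring and parent sorting, which are omitted here.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of the child -> parent docID join used by block joins. */
final class BlockJoinSketch {

    private BlockJoinSketch() {}

    /**
     * Assumes each block was indexed as its children followed by its parent, and that
     * parentBits has a bit set for every parent docID. Returns parent docID -> matching
     * child docIDs of that block, in the order given (ascending if the input is sorted,
     * as a scorer would produce it).
     */
    static Map<Integer, List<Integer>> joinToParents(int[] matchingChildren, BitSet parentBits) {
        Map<Integer, List<Integer>> childrenByParent = new TreeMap<>();
        for (int child : matchingChildren) {
            if (parentBits.get(child)) {
                throw new IllegalArgumentException("doc " + child + " is a parent, not a child");
            }
            // The first parent at or after the child closes the child's block.
            int parent = parentBits.nextSetBit(child);
            if (parent < 0) {
                throw new IllegalStateException("doc " + child + " has no parent after it");
            }
            childrenByParent.computeIfAbsent(parent, p -> new ArrayList<>()).add(child);
        }
        return childrenByParent;
    }
}
```

The TreeMap keeps parents in docID order only to make the sketch deterministic; per the description above, BlockJoinCollector instead orders parent docs by the provided Sort.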




Re: Lucene 3.3 release soon?

2011-06-21 Thread DM Smith

On 06/21/2011 02:01 PM, johnmu...@aol.com wrote:


My bad, I meant to say a “6-8 releases a year” .. grrr!!
So let me try this again. I don't like the current plan of release 
early & often because:
1) It will spread testing of any release thin, because fewer real users 
will be using each release when you have too many a year.
I don't follow. With a release early and often rationale, there will be 
fewer changes in each release. Less to test. The testing of Lucene is 
phenomenal and improving with each release.


2) release early & often is not a well-defined production release. 
It will lead to undefined gaps between releases (why X.Y took N weeks, 
but X.Z took M months?). This is why I suggested a quarterly release 
plan (it's what FF is now doing).
I think that the pendulum needs to swing and find its natural balance. 
If there is a cost to frequent releases that is unacceptable, it will 
all balance out in the end.
3) Do companies jump on a Lucene release as soon as one is made?  No, 
they have a process.  With too many releases, they will be more 
confused about which release to use; they want a release that has proved itself.
I can't comment on how all companies do upgrades, but in my experience 
the companies I've been with don't upgrade without a business reason. 
Basically, if the current works then don't upgrade. If the new provides 
a necessary feature for a specific requirement, then determine the 
risk/cost/benefit and decide on whether to upgrade. But at the point of 
upgrade go with the current best. I don't see how there would be 
confusion until 4.0 is released.


In my specific application, upgrades to Lucene happen when my 
application has a feature release and/or a bug release in its use of 
Lucene. It just doesn't make sense to have an app release that does not 
give specific, visible benefit to end users.



--MJ


-Original Message-
From: Mark Miller markrmil...@gmail.com
To: dev@lucene.apache.org
Cc: simon.willna...@gmail.com
Sent: Tue, Jun 21, 2011 1:32 pm
Subject: Re: Lucene 3.3 release soon?

I think we might target fewer than 6-8 a month. That would be scary! I would
guess it will be once a month at worst, and often less. Time will tell.

You must already give version info with questions if you want decent help -
nothing is going to change that.

- Mark


On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote:

  -1 on release early & often.


  Let us say you average 6-8 releases a month; this means there will be that
many versions used by users, which means the amount of testing done on a
release (by real users, in a real environment) will be spread thin, so a release
will not get the same amount of testing it otherwise would.  Not only that, more
releases means more release-specific questions.  Expect to see questions /
issues reported where you must ask "what version are you using?" before you can
answer.


  May I suggest a scheduled release, once a quarter, near the end of a quarter?


  -JM


  -Original Message-
  From: Simon Willnauer simon.willna...@googlemail.com
  To: dev@lucene.apache.org
  Sent: Tue, Jun 21, 2011 12:53 pm
  Subject: Re: Lucene 3.3 release soon?

  On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com
wrote:
Again, I don't think any future uncommitted features should block a
release, nor should there be a shoving period where features are
shoved in.

  +1 - release early & often!!!

  simon
  
I'll be now looking at producing an RC as quickly as possible before
this can happen!
  
On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no
wrote:
Grouping is really worth a release! But if group count in facet is within
  reach, wait for that!
  
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

Solr Training - www.solrtraining.com

  
On 21. juni 2011, at 05.53, Bill Bell wrote:
  
+1 wait for grouping post facet counts... Go Martijn v Groningen !!
  
On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com
wrote:
  
+1 to releasing 3.3 in a few weeks... there's a lot of new stuff after
3.2.
  
Mike McCandless
  
  
http://blog.mikemccandless.com

  
On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com
wrote:
i was planning on doing an RC in a few weeks actually.
  
we have a lot of good stuff in there today already, however i wanted
to give a few weeks for the grouping stuff to run on hudson.
  
On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com
wrote:
I would say within the next 3 months.
  
Thoughts?
  
   

Re: Concerning LUCENE-3079: Faceting module

2011-06-21 Thread Yonik Seeley
On Tue, Jun 21, 2011 at 2:17 PM, Stefan Trcek wzzelfz...@abas.de wrote:
 Hello

 I can donate our faceting module to the Lucene project.

Sounds interesting Stefan!

 The implementation relies on the field cache only: no index scheme, no
 cached filters, etc. It is small (about 600 lines of code in 10
 classes). I didn't measure performance, but it handles 1 million documents
 (30GB) without problems. I suppose it might fit the requirements
 described in LUCENE-3079.
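To illustrate what "relies on field cache only" can mean in practice, here is a rough sketch of counting a single-valued facet. It assumes a hypothetical per-document ordinal array (docOrd[doc] indexes into a sorted table of facet values, with -1 meaning "no value"). This is similar in shape to the per-document arrays a field cache keeps, but the sketch does not call any Lucene API, and it is not the donated module's code. The matching documents are passed in as a java.util.BitSet.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: single-valued facet counts from field-cache-style arrays. */
final class FieldCacheFacetSketch {
    /**
     * @param terms  sorted distinct facet values (ordinal -> value)
     * @param docOrd per-document ordinal into terms, or -1 if the doc has no value
     * @param hits   docIDs that matched the query
     */
    static Map<String, Integer> count(String[] terms, int[] docOrd, BitSet hits) {
        int[] counts = new int[terms.length]; // one counter per ordinal, no per-doc objects
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            int ord = docOrd[doc];
            if (ord >= 0) {
                counts[ord]++;
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int ord = 0; ord < terms.length; ord++) {
            result.put(terms[ord], counts[ord]); // zero counts are kept as well
        }
        return result;
    }
}
```

Counting into an int[] indexed by ordinal keeps the per-query work proportional to the number of hits plus the number of distinct values. This is one plausible reason a field-cache approach can stay small and still handle large document counts.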

 The module supports
 - single valued facets
 - multi valued facets
 - facet filters
 - evaluation of facet values that would be dismissed due to other facet
 filters.

 Let me explain the last point: For the user a facet query
  (color==green) AND (shape==circle OR shape==square)
 may look like

 Facet color
 [ ] (3) red
 [x] (5) green
 [ ] (7) blue

 Facet shape
 [x] (9) circle
 [ ] (4) line
 [x] (2) square

 The red/blue/line facet values will display even though the
 corresponding documents are not in the result set.

Solr calls this multi-select faceting
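The multi-select behaviour in the example can be sketched in a few lines. For each facet field, count values over the documents that pass every selected filter except the filter on that same field. This is why red/blue stay visible with counts while green is selected. The sketch below is a toy in-memory model with hypothetical names: documents are plain maps, selections are OR-ed within a field and AND-ed across fields. It is neither Solr's implementation (Solr does this by tagging filter queries and excluding them per facet field) nor the donated module's.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of multi-select facet counting over in-memory docs. */
final class MultiSelectFacetSketch {
    /** True if doc satisfies every non-empty selection except the one on skipField. */
    static boolean passes(Map<String, String> doc, Map<String, Set<String>> selected,
                          String skipField) {
        for (Map.Entry<String, Set<String>> e : selected.entrySet()) {
            if (e.getKey().equals(skipField) || e.getValue().isEmpty()) {
                continue; // ignore this field's own filter: that is the "multi-select" part
            }
            String value = doc.get(e.getKey());
            if (value == null || !e.getValue().contains(value)) {
                return false; // AND across fields, OR within one field
            }
        }
        return true;
    }

    /** Facet counts for one field, computed with that field's own filter excluded. */
    static Map<String, Integer> counts(List<Map<String, String>> docs,
                                       Map<String, Set<String>> selected, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null && passes(doc, selected, field)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The result set itself is still computed with all filters applied. Only the counts for a given field relax that field's own selection, which also makes it possible to show values whose count under the full filter would be zero.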

 Also there is
 support for filtered facet values with zero results, so users
 understand why they do not get results.

 So how to start? Preparing a patch against trunk (currently it is 3.1)?

Yes, against trunk, which is 4.0-dev

-Yonik
http://www.lucidimagination.com




Re: managing CHANGES.txt?

2011-06-21 Thread Chris Hostetter

:  But there is no way for someone looking at the CHANGES for 4.0 to know
:  for certain that the bits that make up that bug fix are in the 4.0 release
:  -- the fact that it's listed in 3.2's CHANGES isn't an assurance, because
:  4.0 comes from a completely different line of development.
...
: its in the 4.0 CHANGES.txt, under the 3.2 section.

(sigh ... i tried to let this go, i swear i did...)

You're missing my point entirely.  yes it's in the 3.2 section but all 
that tells the user is that it was fixed on the 3x branch just prior to 
the 3.2 release -- that doesn't give users *any* info about whether that 
bug ever affected (or was fixed) on the completely and radically different 
4x branch.  There were multiple commits -- the bits are not the same.


-Hoss




[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers

2011-06-21 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052766#comment-13052766
 ] 

Hoss Man commented on SOLR-2458:


Jan: +1

 post.jar fails on non-XML updateHandlers
 

 Key: SOLR-2458
 URL: https://issues.apache.org/jira/browse/SOLR-2458
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 3.1
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: post.jar
 Fix For: 3.3

 Attachments: SOLR-2458.patch, SOLR-2458.patch


 SimplePostTool.java by default tries to issue a commit after posting.
 Problem is that it does this by appending <commit/> to the stream.
 This does not work when using a non-XML request handler, such as CSV.
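A hedged sketch of one way around the problem. Instead of appending <commit/> to the posted body, which only the XML handler understands, send the commit as its own request using Solr's commit=true URL parameter. This works regardless of the content type that was posted. The class and method names and the example URLs are hypothetical. This is not necessarily how the attached patch does it.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: commit via a separate request instead of an XML commit body. */
final class CommitUrlSketch {
    /** Appends commit=true, using '&' if the URL already carries a query string. */
    static String withCommit(String updateUrl) {
        String sep = updateUrl.indexOf('?') >= 0 ? "&" : "?";
        return updateUrl + sep + "commit=true";
    }

    /** Sends the commit as a plain GET and returns the HTTP status code. */
    static int sendCommit(String updateUrl) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(withCommit(updateUrl)).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For instance, withCommit("http://localhost:8983/solr/update/csv") yields "http://localhost:8983/solr/update/csv?commit=true". Because the commit no longer travels inside the data stream, the CSV (or any other) body can be posted unchanged.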




[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052770#comment-13052770
 ] 

Paul Elschot commented on LUCENE-3171:
--

The possible inefficiency is the same as the one for any sparsely filled 
OpenBitSet.

Another implementation (should be another issue, but since you asked...) could 
be a set of increasing integers, based on a balanced tree structure with a 
moderate fanout (e.g. 32), and all integer values relative to the minimum 
determined by the data for the pointer from the parent. The whole thing could 
be stored in one int[], the pointers would be (forward) indexes into this one 
array, and each internal node would consist of two rows of integers (one data, 
one pointers), and each row would be compressed as a frame of reference into 
the array.

This thing can implement {code}int next(int x){code} and {code}int previous(int 
x){code} easily, and an iterator over this can implement 
{code}advance(target){code} for a DocIdSetIterator, and because of the symmetry 
it can also do that in the reverse direction as needed here.
Compression at higher levels might not be necessary.

For now, there is no code for this, except for the frame of reference.
Occasionally the need for a more space-efficient filter shows up on the mailing 
lists, so if anyone wants to give this a try...
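A much-simplified sketch of the idea, assuming strictly increasing int values. Values are grouped into blocks of 32 (the fanout mentioned above). Each block stores its minimum once and keeps its entries relative to that minimum, which is the frame-of-reference idea, but here without bit packing, so no space is actually saved. A binary search over the block minimums stands in for the upper levels of the proposed balanced tree. The sketch supports next(x) and previous(x); next(target) is what a DocIdSetIterator-style advance(target) would need. The class name is hypothetical, and this is not code from the issue.

```java
import java.util.Arrays;

/** Hypothetical sketch: a sorted int set with a two-level (block) layout. */
final class BlockedIntSetSketch {
    static final int BLOCK = 32; // fanout suggested in the comment

    private final int[] mins; // minimum value of each block (the "upper level")
    private final int[] rel;  // each value stored relative to its block minimum
    private final int size;

    BlockedIntSetSketch(int[] sortedValues) {
        size = sortedValues.length;
        mins = new int[(size + BLOCK - 1) / BLOCK];
        rel = new int[size];
        for (int i = 0; i < size; i++) {
            if (i > 0 && sortedValues[i] <= sortedValues[i - 1]) {
                throw new IllegalArgumentException("values must be strictly increasing");
            }
            if (i % BLOCK == 0) {
                mins[i / BLOCK] = sortedValues[i];
            }
            rel[i] = sortedValues[i] - mins[i / BLOCK];
        }
    }

    private int valueAt(int i) {
        return mins[i / BLOCK] + rel[i];
    }

    /** Index of the last block whose minimum is <= x, or -1 if x precedes every value. */
    private int blockFor(int x) {
        int b = Arrays.binarySearch(mins, x);
        return b >= 0 ? b : -b - 2; // -b - 1 is the insertion point
    }

    /** Smallest value >= x, or -1 if there is none (what advance(x) needs). */
    int next(int x) {
        int start = Math.max(0, blockFor(x)) * BLOCK;
        for (int i = start; i < size; i++) {
            int v = valueAt(i);
            if (v >= x) {
                return v; // at most one block plus one element is scanned
            }
        }
        return -1;
    }

    /** Largest value <= x, or -1 if there is none (the reverse direction). */
    int previous(int x) {
        int b = blockFor(x);
        if (b < 0) {
            return -1;
        }
        int end = Math.min(size, (b + 1) * BLOCK);
        for (int i = end - 1; i >= b * BLOCK; i--) {
            int v = valueAt(i);
            if (v <= x) {
                return v;
            }
        }
        return -1; // not reached: block b's minimum is <= x
    }
}
```

Because lookups in both directions share the same block search, the symmetry mentioned in the comment comes almost for free. Real space savings would come from packing each block's relative values into fewer bits.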



 BlockJoinQuery/Collector
 

 Key: LUCENE-3171
 URL: https://issues.apache.org/jira/browse/LUCENE-3171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Reporter: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch


 I created a single-pass Query + Collector to implement nested docs.
 The approach is similar to LUCENE-2454, in that the app must index
 documents in join order, as a block (IW.add/updateDocuments), with
 the parent doc at the end of the block, except that this impl is one
 pass.
 Once you join at indexing time, you can take any query that matches
 child docs and join it up to the parent docID space, using
 BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
 docs by provided Sort, to gather results, grouped by parent; this
 collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
 retains the child docs corresponding to each collected parent doc.
 After searching is done, you retrieve the TopGroups from a provided
 BlockJoinQuery.
 Like LUCENE-2454, this is less general than the arbitrary joins in
 Solr (SOLR-2272) or parent/child from ElasticSearch
 (https://github.com/elasticsearch/elasticsearch/issues/553), since you
 must do the join at indexing time as a doc block, but it should be
 able to handle nested joins as well as joins to multiple tables,
 though I don't yet have test cases for these.
 I put this in a new Join module (modules/join); I think as we
 refactor join impls we should put them here.




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-06-21 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052771#comment-13052771
 ] 

Hoss Man commented on LUCENE-3130:
--

bq. A QP can already solve this issue today, simply by boosting down terms with 
positionIncrement = 0.

That assumes:
a) that every TokenFilter which might inject terms like this will always put 
the most important one first
b) that the amount of boost should be fixed

what i'm suggesting is that we make this more flexible so that people wiring 
together their apps and analyzers have an easy way to guide the query parser's 
behavior.  if we allow a well-defined attribute for this, people can have 
custom analysis that specifies arbitrary boosts in cases we may not be able to 
specifically anticipate (synonyms, entity recognition, common-word demoting, 
etc.).

bq. But I really think the implementation details of QP should remain in QP, 
the analysis chain should instead be general and describe up the text.

why don't you consider an attribute that denotes "this term is worth less than 
a typical term" a general description of the text?

bq. Otherwise, things get really confusing, e.g. what should a ShingleFilter do 
when it combines two tokens that have different BoostAttributes?

It does whatever it already does when it encounters two tokens that may have 
attributes it doesn't know about (ignore them when creating the new token, if i 
remember correctly).  Unrecognized attributes aren't a new problem.

bq. If you do what you describe, what if you then want to tweak the ranking for 
synonyms? You must reindex.

how is that any different from any other aspect of index time synonyms?  if you 
use them you *always* have to reindex when you change your synonyms.

I'm not arguing that index-time synonyms are a good idea in general, and i'm not 
arguing that this "we look for BoostAttributes on tokens" feature of the QP 
would be useful (or even a good idea) for everyone.  I'm arguing that having 
such a feature would provide an easy way for people who are already customizing 
their analysis to modify/influence the behavior of the query parser (w/o 
subclassing), in a way that could still work in conjunction with other techniques.
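To make the proposal concrete, here is a toy sketch of how a query parser might consume a per-token boost instead of hard-coding "boost down anything with positionIncrement = 0". Token is a hypothetical stand-in for a token stream that carries a float boost attribute. The output is Lucene query syntax built as a string rather than a real Query object. Nothing here is Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turn (term, posInc, boost) tokens into boosted query syntax. */
final class BoostedQuerySketch {
    /** Stand-in for a token with a position increment and a per-token boost attribute. */
    static final class Token {
        final String term;
        final int posInc;
        final float boost;

        Token(String term, int posInc, float boost) {
            this.term = term;
            this.posInc = posInc;
            this.boost = boost;
        }
    }

    /** Tokens stacked at one position (posInc == 0) become alternatives in one group. */
    static String toQuery(List<Token> tokens) {
        List<List<Token>> positions = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc > 0 || positions.isEmpty()) {
                positions.add(new ArrayList<>()); // start a new position
            }
            positions.get(positions.size() - 1).add(t);
        }
        StringBuilder sb = new StringBuilder();
        for (List<Token> group : positions) {
            if (sb.length() > 0) sb.append(' ');
            if (group.size() > 1) sb.append('(');
            for (int i = 0; i < group.size(); i++) {
                Token t = group.get(i);
                if (i > 0) sb.append(' ');
                sb.append(t.term);
                if (t.boost != 1.0f) sb.append('^').append(t.boost); // analysis-chosen boost
            }
            if (group.size() > 1) sb.append(')');
        }
        return sb.toString();
    }
}
```

For example, tokens fast (posInc 1, boost 1.0), quick (posInc 0, boost 0.5) and car (posInc 1, boost 1.0) produce "(fast quick^0.5) car". The point of the design is that the boost value comes from the analysis chain, so a filter can choose it per token rather than the parser applying one fixed penalty.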

 Use BoostAttribute in TokenFilters to denote Terms that QueryParser should 
 give lower boosts
 ---

 Key: LUCENE-3130
 URL: https://issues.apache.org/jira/browse/LUCENE-3130
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Hoss Man

 A recent thread asked if there was any way to use query-time synonyms such that 
 matches on the original term specified by the user would score higher than 
 matches on the synonym.  It occurred to me later that a float Attribute could 
 be set by the SynonymFilter in such situations, and QueryParser could use 
 that float as a boost in the resulting Query.  This would be fairly 
 straightforward for the simple synonyms => BooleanQuery case, but we'd have 
 to decide how to handle the case of synonyms with multiple terms that produce 
 MTPQ (possibly just punt for now).
 Likewise, there may be other TokenFilters that inject artificial tokens at 
 query time where it also might make sense to have a reduced boost factor...
 * SynonymFilter
 * CommonGramsFilter
 * WordDelimiterFilter
 * etc...
 In all of these cases, the amount of the boost could be configured, and for 
 back compat could default to 1.0 (or null to not set a boost at all).
 Furthermore: if we added a new BoostAttrToPayloadAttrFilter that just copied 
 the boost attribute into the payload attribute, these same filters could give 
 penalizing payloads to terms when used at index time.
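The proposed BoostAttrToPayloadAttrFilter would need some byte encoding for the boost. The sketch below is minimal. It assumes a 4-byte big-endian float per token, which the issue itself does not specify. The class and method names are hypothetical, and the sketch does not touch Lucene's token stream or payload classes. A missing payload decodes to 1.0, matching the back-compat default suggested above.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: store a per-token boost as a payload and read it back. */
final class BoostPayloadSketch {
    /** Encodes a boost as a 4-byte big-endian float (ByteBuffer's default order). */
    static byte[] encode(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    /** Decodes a payload produced by encode; absent or short payloads mean "no penalty". */
    static float decode(byte[] payload) {
        if (payload == null || payload.length < 4) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload).getFloat();
    }
}
```

A payload-aware scorer could then multiply a term's contribution by decode(payload). This is why the same filters could serve both query-time boosts and index-time penalties.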




[jira] [Issue Comment Edited] (LUCENE-3171) BlockJoinQuery/Collector

2011-06-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052770#comment-13052770
 ] 

Paul Elschot edited comment on LUCENE-3171 at 6/21/11 7:20 PM:
---

The possible inefficiency is the same as the one for any sparsely filled 
OpenBitSet.

Another implementation (should be another issue, but since you asked...) could 
be a set of increasing integers, based on a balanced tree structure with a 
moderate fanout (e.g. 32), and all integer values relative to the minimum 
determined by the data for the pointer from the parent. The whole thing could 
be stored in one int[], the pointers would be (forward) indexes into this one 
array, and each internal node would consist of two rows of integers (one data, 
one pointers), and each row would be compressed as a frame of reference into 
the array.

This thing can implement {code}int next(int x){code} and {code}int previous(int 
x){code} easily, and an iterator over this can implement 
{code}advance(target){code} for a DocIdSetIterator, and because of the symmetry 
it can also do that in the reverse direction as needed here.
Compression at higher levels might not be necessary.

For now, there is no code for this, except for the frame of reference.
Occasionally the need for a more space-efficient filter shows up on the mailing 
lists, so if anyone wants to give this a try...



  was (Author: paul.elsc...@xs4all.nl):
The possible inefficiency is the same as the one for a any sparsely filled 
OpenBitSet.

Another implementation (should be another issue, but since you asked...) could 
be a set of increasing integers, based on a balanced tree structure with a 
moderate fanout (e.g. 32), and all integer values relative to the minimum 
determined by the data for the pointer from the parent. The whole thing could 
be stored in one int[], the pointers would be (forward) indexes into this one 
array, and each internal node would consist of two rows of integers (one data, 
one pointers), and each row would be compressed as a frame of reference into 
the array.

This thing can implement {code}int next(int x){code} and {code}int previous(int 
x){code} easily, and an iterator over this can implement 
{code}advance(target){code} for a DocIdSetIterator, and because of the symmetry 
it can also do that in the reverse direction as needed here.
Compression at higher levels might not be necessary.

For now, there is code for this, except for the frame of reference.
Occasionaly the need for a more space efficient filter shows up on the mailing 
lists, so if anyone want to give this a try...


  
 BlockJoinQuery/Collector
 

 Key: LUCENE-3171
 URL: https://issues.apache.org/jira/browse/LUCENE-3171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Reporter: Michael McCandless
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch


 I created a single-pass Query + Collector to implement nested docs.
 The approach is similar to LUCENE-2454, in that the app must index
 documents in join order, as a block (IW.add/updateDocuments), with
 the parent doc at the end of the block, except that this impl is one
 pass.
 Once you join at indexing time, you can take any query that matches
 child docs and join it up to the parent docID space, using
 BlockJoinQuery.  You then use BlockJoinCollector, which sorts parent
 docs by provided Sort, to gather results, grouped by parent; this
 collector finds any BlockJoinQuerys (using Scorer.visitScorers) and
 retains the child docs corresponding to each collected parent doc.
 After searching is done, you retrieve the TopGroups from a provided
 BlockJoinQuery.
 Like LUCENE-2454, this is less general than the arbitrary joins in
 Solr (SOLR-2272) or parent/child from ElasticSearch
 (https://github.com/elasticsearch/elasticsearch/issues/553), since you
 must do the join at indexing time as a doc block, but it should be
 able to handle nested joins as well as joins to multiple tables,
 though I don't yet have test cases for these.
 I put this in a new Join module (modules/join); I think as we
 refactor join impls we should put them here.




[jira] [Created] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files

2011-06-21 Thread Shalin Shekhar Mangar (JIRA)
Add testpackage and testpackageroot conditions to clustering and 
analysis-extras build files


 Key: SOLR-2612
 URL: https://issues.apache.org/jira/browse/SOLR-2612
 Project: Solr
  Issue Type: Task
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 3.3, 4.0


Clustering and analysis-extras are the only two build files which do not have 
testpackage and testpackageroot exclusions wired into the build file.




[jira] [Updated] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files

2011-06-21 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2612:


Attachment: SOLR-2612.patch

Patch to add testpackage and testpackageroot to clustering and analysis-extras 
build files.

 Add testpackage and testpackageroot conditions to clustering and 
 analysis-extras build files
 

 Key: SOLR-2612
 URL: https://issues.apache.org/jira/browse/SOLR-2612
 Project: Solr
  Issue Type: Task
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 3.3, 4.0

 Attachments: SOLR-2612.patch


 Clustering and analysis-extras are the only two build files which do not have 
 testpackage and testpackageroot exclusions wired into the build file.




[jira] [Updated] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files

2011-06-21 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2612:


Component/s: Build

 Add testpackage and testpackageroot conditions to clustering and 
 analysis-extras build files
 

 Key: SOLR-2612
 URL: https://issues.apache.org/jira/browse/SOLR-2612
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 3.3, 4.0

 Attachments: SOLR-2612.patch


 Clustering and analysis-extras are the only two build files which do not have 
 testpackage and testpackageroot exclusions wired into the build file.



