[JENKINS] Lucene-trunk - Build # 1602 - Failure

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1602/

No tests ran.

Build Log (for compile errors):
[...truncated 8987 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2341) explore morfologik integration

2011-06-22 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053079#comment-13053079
 ] 

Dawid Weiss commented on LUCENE-2341:
-

bq. Dawid, do you think it's reasonable to optimize further and use directly a 
list returned by IStemmer.lookup (instead of copying with addAll) ? My concern 
is that (at least in current DictionaryLookup implementation) that list seems 
to be shared by distinct invocations of the lookup method, which would make the 
use of a specific IStemmer not applicable in thread-safe code.

IStemmer implementations are not thread safe anyway, so there is no problem in 
reusing that list. In fact, the returned WordData objects are reused internally 
as well, so you can't store them either (this is done to avoid GC overhead). 

So yes: I missed that, but you'll need to ensure IStemmer instances are not 
shared. This can be done in various ways (thread local, etc), but I think the 
simplest way to do it would be to instantiate PolishStemmer at the 
MorfologikFilter level. This is cheap (the dictionary is loaded once anyway). 

You can then create two constructors in the analyzer -- one with 
PolishStemmer.DICTIONARY and one with the default (I'd suggest MORFOLOGIK). 
Exposing IStemmer constructor will do more harm than good -- thinking ahead is 
good, but in this case I don't think there'll be this many people interested in 
subclassing IStemmer (if anything, they'll plug into Lucene's infrastructure 
directly).

A simple test case spawning 5 or 10 threads in a parallel executor and 
crunching stems on the same analyzer would also be nice to ensure we have 
everything correct wrt multithreading, but it's not that crucial if you don't 
have the time to write it.
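The per-thread pattern described above can be sketched with plain JDK classes. Note that `Stemmer` and `ReusingStemmer` below are stand-ins invented for this sketch, not the actual morfologik `IStemmer` or Lucene API; they only mimic the relevant property that lookup results are reused across calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadStemmerDemo {
    // Stand-in for IStemmer: stateful, hence not thread safe.
    interface Stemmer { List<String> lookup(String word); }

    static class ReusingStemmer implements Stemmer {
        private final List<String> shared = new ArrayList<>(); // reused across calls
        public List<String> lookup(String word) {
            shared.clear();
            shared.add(word.toLowerCase()); // fake "stemming" for the demo
            return shared; // caller must copy before the next lookup
        }
    }

    // One stemmer per thread, mirroring "instantiate at the filter level".
    static final ThreadLocal<Stemmer> STEMMER =
        ThreadLocal.withInitial(ReusingStemmer::new);

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        List<Future<List<String>>> results = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final String word = "Word" + i;
            // Copy the result list, since the stemmer reuses it internally.
            results.add(pool.submit(() ->
                new ArrayList<>(STEMMER.get().lookup(word))));
        }
        for (Future<List<String>> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

The same executor-based shape (N threads hammering one shared entry point) is what the suggested multithreading test would look like, with the real analyzer in place of the stand-ins.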

Thanks!

 explore morfologik integration
 --

 Key: LUCENE-2341
 URL: https://issues.apache.org/jira/browse/LUCENE-2341
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Dawid Weiss
 Attachments: LUCENE-2341.diff, LUCENE-2341.diff, 
 morfologik-stemming-1.5.0.jar


 Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer 
 available:
 http://sourceforge.net/projects/morfologik/
 This works differently than LUCENE-2298, and ideally would be another option 
 for users.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked

2011-06-22 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-2399:


Attachment: SOLR-2399-110622.patch

Okay, there we go:

{quote}On the 'java-properties' page, is the UI assuming ':' is the path 
separator? Can this use the value of path.separator to split?{quote}
Yes & yes - Done 
[[commit|https://github.com/steffkes/solr-admin/commit/abb57cacb4a8aa11e406da32ecfa0e2b3caf07be]]

bq. Should the Ping query append a random number so that it avoids HTTP cache? 
Good Idea! - Done 
[[commit|https://github.com/steffkes/solr-admin/commit/61f24c2b08e5b8ca847d197374abf1b3fbd0595a]]
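Appending a changing parameter is a standard way to keep intermediary HTTP caches from serving a stale ping response. The idea can be sketched in Java (the admin UI itself would do this in JavaScript; the `/admin/ping` path and the `_` parameter name are assumptions of this sketch, not taken from the commit):

```java
public class PingUrlDemo {
    // Append a cache-busting parameter so repeated pings bypass HTTP caches.
    static String pingUrl(String base) {
        char sep = base.contains("?") ? '&' : '?';
        return base + sep + "_=" + System.currentTimeMillis();
    }

    public static void main(String[] args) {
        // Each call produces a distinct URL, so caches see a fresh request.
        System.out.println(pingUrl("http://localhost:8983/solr/admin/ping"));
    }
}
```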


bq. Something for the wishlist... on the threads page, it would be great to 
have a button to expand (and collapse?) all the stack traces. It's hard to 
figure out which thread is doing what just from the title.
I've added a button at the top and the bottom of the table to show/hide all of 
them w/ one click 
[[commit|https://github.com/steffkes/solr-admin/commit/26378c34ecebe34ce6e80292d8fb02acacb69ead]]



The attached patch contains all git changes since our last SVN commit. Could 
you also include these images, Ryan? They won't make it into the SVN diff 
because they're binary :/ 
* https://github.com/steffkes/solr-admin/raw/master/img/ico/toolbox.png
* https://github.com/steffkes/solr-admin/raw/master/img/ico/zone.png
* 
https://github.com/steffkes/solr-admin/raw/master/img/ico/system-monitor--exclamation.png

Thanks! :)

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin




[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-06-22 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053080#comment-13053080
 ] 

Bill Bell commented on SOLR-2242:
-

Simon,

I made all those changes except for the termsList one. I think it is useful to 
have the count based on terms.

See attachment.

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).




[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field

2011-06-22 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: SOLR-2242.shard.patch

New patch ready for commit?

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, 
 SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).




[jira] [Resolved] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files

2011-06-22 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-2612.
-

Resolution: Fixed

Committed revision 1138319 on trunk and revision 1138320 on branch_3x.

 Add testpackage and testpackageroot conditions to clustering and 
 analysis-extras build files
 

 Key: SOLR-2612
 URL: https://issues.apache.org/jira/browse/SOLR-2612
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 3.3, 4.0

 Attachments: SOLR-2612.patch


 Clustering and analysis-extras are the only two build files which do not have 
 testpackage and testpackageroot exclusions wired into the build file.




[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-06-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053090#comment-13053090
 ] 

Noble Paul commented on SOLR-2382:
--

The patch does not apply on trunk

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
 Use Cases:
  1. We needed a flexible & scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache, & two implementations:  
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
 I believe this may be incompatible due to generics usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface, DIHWriter, & two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, it was being called on each iteration of 
 DocBuilder.buildDocument().
   - Now it does one-time cleanup tasks (like closing or deleting a 
 disk-backed cache) once the entity processor is completed.
   - The only out-of-the-box entity processor that previously implemented 
 destroy() was LineEntityProcessor, so this is not a very invasive change.
 General Notes:
 We are near completion in converting our search functionality from a legacy 
 search engine to Solr.  However, I found that DIH did not support caching to 
 the level of our prior product's data import utility.  In order to get our 
 data into Solr, I created these caching enhancements.  Because I believe this 
 has broad application, and because we would like this feature to be supported 
 by the Community, I have front-ported this, enhanced, to Trunk.  I have also 
 added unit tests and verified that all existing test cases 

[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-22 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053097#comment-13053097
 ] 

Ryan McKinley commented on SOLR-2399:
-

Thanks Stefan.  Added in #1138323

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin




Re: [JENKINS] Lucene-trunk - Build # 1602 - Failure

2011-06-22 Thread Chris Male
Second attempt at fixing the javadoc.  Passes for me now.

On Wed, Jun 22, 2011 at 6:29 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 Build: https://builds.apache.org/job/Lucene-trunk/1602/

 No tests ran.

 Build Log (for compile errors):
 [...truncated 8987 lines...]







-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-06-22 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053110#comment-13053110
 ] 

Ryan McKinley commented on SOLR-2399:
-

in #1138328, I added a min-width value -- this should keep things from looking 
ridiculous when it gets really small

 Solr Admin Interface, reworked
 --

 Key: SOLR-2399
 URL: https://issues.apache.org/jira/browse/SOLR-2399
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
 SOLR-2399-110606.patch, SOLR-2399-110622.patch, 
 SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
 SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
 SOLR-2399-wip-notice.patch, SOLR-2399.patch


 *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
 Interface.* [Based on this 
 [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
 *Features:*
 * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
 * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
 * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
 * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
 SOLR-2400)
 * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
 * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
 * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
 * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
 * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
 * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
 ** Stub (using static data)
 Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
 I've quickly created a Github-Repository (Just for me, to keep track of the 
 changes)
 » https://github.com/steffkes/solr-admin




[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2011-06-22 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053113#comment-13053113
 ] 

Simon Willnauer commented on SOLR-2242:
---

bq. New patch ready for commit?

Bill, I still see lots of whitespace / indentation problems in that latest 
patch. Anyway, I looked at it and I wonder if we could restructure this a 
little: we could first check if termList != null and handle all the cases 
there, and if termList == null we get the TermCountsLimit, which would remove 
the redundant getTermCountsLimit / getListedTermCounts calls. The 
termList == null case seems very easy and straightforward:
{code}
   if (termList != null) {
     NamedList<Integer> counts = getListedTermCounts(facetValue, termList);
     switch (numFacetTerms) {
     case COUNTS:
       final NamedList<Integer> resCount = new NamedList<Integer>();
       counts = resCount;
     case COUNTS_AND_VALUES:
       counts.add("numFacetTerms", counts.size());
       break;
     }
     res.add(key, counts);
   } else {
     ...
{code}

yet, it's hard to refactor this without a single test (note, there might be a 
bug). I would be really happy to see a test-case for this that tests all the 
variations.
Regarding the constants, I think the default case should be a constant too. If 
you use NamedList, can you make sure you put the right generic type on it if 
possible? Otherwise my IDE goes wild and adds warnings all over the place. In 
your case NamedList<Integer> works fine.

simon
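The bug Simon hints at may be the missing break in the COUNTS case: counts is replaced by an empty list before control falls through, so the size recorded is 0 rather than the number of listed terms. A self-contained sketch (a plain LinkedHashMap standing in for Solr's NamedList, with the same switch shape as the snippet above) makes the fall-through visible:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FallThroughDemo {
    static final int COUNTS = 0, COUNTS_AND_VALUES = 2;

    static Map<String, Integer> facet(int numFacetTerms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("19.95", 1);
        counts.put("74.99", 1);
        switch (numFacetTerms) {
        case COUNTS:
            counts = new LinkedHashMap<>(); // replaced before the size is read...
            // no break: falls through, as in the snippet above
        case COUNTS_AND_VALUES:
            counts.put("numFacetTerms", counts.size()); // 0 in the COUNTS case!
            break;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(facet(COUNTS));            // {numFacetTerms=0}
        System.out.println(facet(COUNTS_AND_VALUES)); // {19.95=1, 74.99=1, numFacetTerms=2}
    }
}
```

Reading the term counts before reassigning (or adding a break and handling COUNTS separately) would give the intended behavior; a test over all the variations, as requested above, would pin this down.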

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, 
 SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int>
     <int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int>
     <int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int>
     <int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int>
     <int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).




[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053115#comment-13053115
 ] 

Robert Muir commented on LUCENE-3226:
-

{quote}
Also, in LUCENE-2921 I plan to get rid of all those ridiculous constant names 
and track the index version at the segment level only. It will be easier, IMO, 
to have an easy to understand constant name when it comes to supporting an 
older index (or remove support for). Perhaps it's only me, but when I read 
those format constant names, I only did that when removing support for older 
indexes. Other than that, they are not very interesting ...

What Hoss reported about CheckIndex is the real problem we should handle here. 
SegmentInfo prints in its toString the code version which created it, which is 
better than seeing -9 IMO, and that should be 3.1 or 3.2. If it's a 3.2.0 
newly created index, you shouldn't see 3.1 reported from 
SegmentInfos.toString. Perhaps CheckIndex needs to be fixed to refer to 
Constants.LUCENE_MAIN_VERSION instead?

Robert, shall we reopen the issue to discuss?
{quote}

We can reopen... but the issue will always exist here; LUCENE-2921 can't solve 
this particular case since it's the segments file...


 rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
 

 Key: LUCENE-3226
 URL: https://issues.apache.org/jira/browse/LUCENE-3226
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1, 3.2
Reporter: Hoss Man
Assignee: Robert Muir
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3226.patch


 A 3.2 user recently asked if something was wrong because CheckIndex was 
 reporting his (newly built) index version as...
 {noformat}
 Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1]
 {noformat}
 It seems like there are two very confusing pieces of information here...
 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice.  All 
 other FORMAT_* constants in SegmentInfos are descriptive of the actual change 
 made, and not specific to the version when they were introduced.
 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it 
 "Lucene 3.1", which is misleading since that format is always used in 3.2 
 (and probably 3.3, etc...).  
 I suggest:
 a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION
 b) change CheckIndex so that the label for the newest format always ends 
 with " and later" (ie: "Lucene 3.1 and later") so when we release versions 
 w/o a format change we don't have to remember to manually list them in 
 CheckIndex.  when we *do* make format changes and update CheckIndex, " and 
 later" can be replaced with " to X.Y" and the new format can be added




[jira] [Commented] (SOLR-2606) Solr sort no longer works on field names with some punctuation in them

2011-06-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053121#comment-13053121
 ] 

Jan Høydahl commented on SOLR-2606:
---

Perhaps a test class producing randomized (legal) field names could be of 
use for this and other tests?
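Such a generator is easy to sketch in plain Java. The character classes below are an assumption about what "legal" means here (Solr accepts more than this); the alphabet deliberately includes '-' so generated names exercise the dash case this issue is about:

```java
import java.util.Random;

public class RandomFieldNames {
    // First character: letters and underscore; rest may add digits and '-'.
    static final String FIRST = "abcdefghijklmnopqrstuvwxyz_";
    static final String REST  = FIRST + "0123456789-";

    static String randomFieldName(Random rnd, int maxLen) {
        int len = 1 + rnd.nextInt(maxLen); // 1..maxLen characters
        StringBuilder sb = new StringBuilder();
        sb.append(FIRST.charAt(rnd.nextInt(FIRST.length())));
        for (int i = 1; i < len; i++) {
            sb.append(REST.charAt(rnd.nextInt(REST.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so failures are reproducible
        for (int i = 0; i < 5; i++) {
            System.out.println(randomFieldName(rnd, 12));
        }
    }
}
```

Seeding the Random and logging the seed on failure keeps such randomized tests reproducible.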

 Solr sort no longer works on field names with some punctuation in them
 --

 Key: SOLR-2606
 URL: https://issues.apache.org/jira/browse/SOLR-2606
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1, 3.2
 Environment: Linux
Reporter: Mitsu Hadeishi

 We just upgraded from Solr 1.4 to 3.2. For the most part the upgrade went 
 fine, however we discovered that sorting on field names with dashes in them 
 is no longer working properly. For example, the following query used to work:
 http://[our solr server]/select/?q=computer&sort=static-need-binary+asc
 and now it gives this error:
 HTTP Status 400 - undefined field static
 type Status report
 message undefined field static
 description The request sent by the client was syntactically incorrect 
 (undefined field static).
 It appears the parser for sorting has been changed so that it now tokenizes 
 differently, and assumes field names cannot have dashes in them. However, 
 field names clearly can have dashes in them. The exact same query which 
 worked fine for us in 1.4 is now breaking in 3.2. Changing the sort field to 
 use a field name that doesn't have a dash in it works just fine.




[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex

2011-06-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053123#comment-13053123
 ] 

Shai Erera commented on LUCENE-3226:


How does renaming a constant solve the CheckIndex issue? I commented on the 
constant name, and I think it should reflect the code version it applies to, 
not the feature. Because if, e.g., in the same version you add two features 
incrementally, you wouldn't change the format number twice, right? And then the 
constant name becomes meaningless again, or too complicated. It happened to me 
a while ago (can't remember the exact feature though, perhaps it was in 
TermInfos).

I mentioned LUCENE-2921 only because I intended to name the constants exactly 
that (X_Y).

I see you've already reverted the changes you made. I think the changes to 
CheckIndex could remain though, adding the "3.1+" to the string?

 rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
 

 Key: LUCENE-3226
 URL: https://issues.apache.org/jira/browse/LUCENE-3226
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1, 3.2
Reporter: Hoss Man
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3226.patch


 A 3.2 user recently asked if something was wrong because CheckIndex was 
 reporting his (newly built) index version as...
 {noformat}
 Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1]
 {noformat}
 It seems like there are two very confusing pieces of information here...
 1) the variable name SegmentInfos.FORMAT_3_1 seems like a poor choice.  All 
 other FORMAT_* constants in SegmentInfos are descriptive of the actual change 
 made, not specific to the version in which they were introduced.
 2) whatever the name of the FORMAT_* variable, CheckIndex labels it 
 "Lucene 3.1", which is misleading since that format is also used in 3.2 
 (and probably 3.3, etc...).  
 I suggest:
 a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION
 b) change CheckIndex so that the label for the newest format always ends 
 with " and later" (ie: "Lucene 3.1 and later"), so when we release versions 
 without a format change we don't have to remember to manually list them in 
 CheckIndex.  When we *do* make format changes and update CheckIndex, " and 
 later" can be replaced with " to X.Y" and the new format can be added.
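
Suggestion (b) amounts to a simple labeling rule. The following is an illustrative stand-alone sketch, not Lucene's actual CheckIndex code; the constant value and all names are invented for the example:

```java
// Sketch of suggestion (b): the newest known format is always reported as
// "Lucene X.Y and later"; older formats keep a fixed label. FORMAT_NEWEST
// and the version strings are hypothetical, not Lucene's real constants.
public class FormatLabel {
    static final int FORMAT_NEWEST = -11;   // hypothetical newest format id

    static String describe(int format, String introducedIn) {
        if (format == FORMAT_NEWEST) {
            // releases without a format change need no manual update here
            return "Lucene " + introducedIn + " and later";
        }
        return "Lucene " + introducedIn;
    }

    public static void main(String[] args) {
        System.out.println(describe(-11, "3.1"));
    }
}
```

When a new format is introduced, only the old newest entry needs its label frozen to a fixed range.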

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8982 - Failure

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8982/

All tests passed

Build Log (for compile errors):
[...truncated 16696 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Solr-trunk - Build # 1540 - Failure

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-trunk/1540/

All tests passed

Build Log (for compile errors):
[...truncated 17830 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-06-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3218:


Attachment: LUCENE-3218_3x.patch

Here is a patch against 3.x. I had to change one test in lucene/backwards and 
remove some tests from there that used the CFW / CFR.

A review would be good here!

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch


 Currently, CFS is created once all files are written during a flush / merge: 
 once on disk, the files are copied into the CFS format, which is unnecessary 
 for some of the files. We could at any time write at least one file directly 
 into the CFS, which would save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a codec 
 flush one of the written files could be appended directly. This optimization 
 is a nice side effect for Lucene indexing itself, but more important for 
 DocValues and LUCENE-3216: we could transparently pack per-field files into 
 a single file for docvalues, without changing any code, once LUCENE-3216 is 
 resolved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-22 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053142#comment-13053142
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Could that work for your use case?

Sounds like it, that's great :)
Do you think there are any efficiencies to be gained on the document-retrieval 
side if you know that the documents commonly being retrieved are physically 
nearby? I.e., an app will often retrieve a parent's fields and then those of 
its child docs, which are required to be physically adjacent to the parent's 
data. Would existing lower-level caching in the Directory or the OS mean 
there's already a good chance of finding child data in cached blocks, or 
could a change to file structures and/or document-retrieval APIs radically 
boost parent-plus-child retrieval performance?



 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-flexscoring-branch - Build # 17 - Failure

2011-06-22 Thread Apache Jenkins Server
Build: 
https://builds.apache.org/job/Lucene-Solr-tests-only-flexscoring-branch/17/

All tests passed

Build Log (for compile errors):
[...truncated 12028 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8979 - Failure

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8979/

All tests passed

Build Log (for compile errors):
[...truncated 11584 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8983 - Still Failing

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8983/

All tests passed

Build Log (for compile errors):
[...truncated 16184 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8980 - Still Failing

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8980/

All tests passed

Build Log (for compile errors):
[...truncated 12238 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-06-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3218:


Attachment: LUCENE-3218_tests.patch

Hi Simon, currently this attached patch fails... not sure why yet.

But I think we should resolve this test issue before backporting.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_tests.patch


 Currently, CFS is created once all files are written during a flush / merge: 
 once on disk, the files are copied into the CFS format, which is unnecessary 
 for some of the files. We could at any time write at least one file directly 
 into the CFS, which would save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a codec 
 flush one of the written files could be appended directly. This optimization 
 is a nice side effect for Lucene indexing itself, but more important for 
 DocValues and LUCENE-3216: we could transparently pack per-field files into 
 a single file for docvalues, without changing any code, once LUCENE-3216 is 
 resolved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-1431) CommComponent abstracted

2011-06-22 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1431.
--

Resolution: Fixed

I have committed it to trunk. We may need more iterations to clean it up.

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Jason Rutherglen
Assignee: Noble Paul
 Fix For: 4.0

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch


 We'll abstract CommComponent in this issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8984 - Still Failing

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8984/

All tests passed

Build Log (for compile errors):
[...truncated 15320 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8981 - Still Failing

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8981/

All tests passed

Build Log (for compile errors):
[...truncated 12732 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-06-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3218:


Attachment: LUCENE-3218_test_fix.patch

Thank you, Robert. This has actually been tested all along since it's in the 
base class, though it's now cleaner. The test failure came from RAMDirectory 
simply overwriting existing files; I added an explicit check for it.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


 Currently, CFS is created once all files are written during a flush / merge: 
 once on disk, the files are copied into the CFS format, which is unnecessary 
 for some of the files. We could at any time write at least one file directly 
 into the CFS, which would save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a codec 
 flush one of the written files could be appended directly. This optimization 
 is a nice side effect for Lucene indexing itself, but more important for 
 DocValues and LUCENE-3216: we could transparently pack per-field files into 
 a single file for docvalues, without changing any code, once LUCENE-3216 is 
 resolved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-22 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2610:


Attachment: SOLR-2610-branch3x.patch

Patch for branch_3x.

 Add an option to delete index through CoreAdmin UNLOAD action
 -

 Key: SOLR-2610
 URL: https://issues.apache.org/jira/browse/SOLR-2610
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2610-branch3x.patch, SOLR-2610.patch


 Right now, one can unload a Solr Core but the index files are left behind and 
 consume disk space. We should have an option to delete the index when 
 unloading a core.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053181#comment-13053181
 ] 

Robert Muir commented on LUCENE-3218:
-

Thanks Simon, I feel better now that we get our open-files-for-write tracking 
back.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


 Currently, CFS is created once all files are written during a flush / merge: 
 once on disk, the files are copied into the CFS format, which is unnecessary 
 for some of the files. We could at any time write at least one file directly 
 into the CFS, which would save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a codec 
 flush one of the written files could be appended directly. This optimization 
 is a nice side effect for Lucene indexing itself, but more important for 
 DocValues and LUCENE-3216: we could transparently pack per-field files into 
 a single file for docvalues, without changing any code, once LUCENE-3216 is 
 resolved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8985 - Still Failing

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8985/

All tests passed

Build Log (for compile errors):
[...truncated 16738 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action

2011-06-22 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-2610.
-

Resolution: Fixed

Committed revision 1138405 on trunk and 1138407 on branch_3x.

 Add an option to delete index through CoreAdmin UNLOAD action
 -

 Key: SOLR-2610
 URL: https://issues.apache.org/jira/browse/SOLR-2610
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 3.3, 4.0

 Attachments: SOLR-2610-branch3x.patch, SOLR-2610.patch


 Right now, one can unload a Solr Core but the index files are left behind and 
 consume disk space. We should have an option to delete the index when 
 unloading a core.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #159: POMs out of sync

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/159/

No tests ran.

Build Log (for compile errors):
[...truncated 7519 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-22 Thread Robert Muir (JIRA)
build should allow you (especially hudson) to refer to a local javadocs 
installation instead of downloading
---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir


Currently, we fail on all javadocs warnings.

However, you get a warning if javadoc cannot download the package-list from 
sun.com, so I think we should allow optionally setting a sysprop that uses 
linkoffline. Then we would get far fewer spurious hudson failures.

I feel like Mike opened an issue for this already, but I cannot find it.
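
One way this could be wired into an Ant build, as a sketch only: the property name below is invented, the paths follow the FreeBSD port layout mentioned later in the thread, and Ant's javadoc task does support offline links via the nested link element:

```xml
<!-- Sketch: javadocs.dir is a hypothetical sysprop/property. When set,
     javadoc resolves external links from a local package-list instead of
     downloading it from sun.com. -->
<property name="javadocs.dir" location="/usr/local/share/doc"/>
<javadoc destdir="build/docs" sourcepath="src/java">
  <link offline="true"
        href="http://download.oracle.com/javase/6/docs/api/"
        packagelistLoc="${javadocs.dir}/jdk1.6"/>
</javadoc>
```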

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-22 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-3228:
---

Assignee: Robert Muir

 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir

 Currently, we fail on all javadocs warnings.
 However, you get a warning if javadoc cannot download the package-list from 
 sun.com, so I think we should allow optionally setting a sysprop that uses 
 linkoffline. Then we would get far fewer spurious hudson failures.
 I feel like Mike opened an issue for this already, but I cannot find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053194#comment-13053194
 ] 

Robert Muir commented on LUCENE-3228:
-

As a start, I installed the two FreeBSD javadoc ports on hudson, into 
/usr/local/share/doc/jdk1.5 and jdk1.6.

I'll see if I can add the hooks to the build scripts now.


 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir

 Currently, we fail on all javadocs warnings.
 However, you get a warning if javadoc cannot download the package-list from 
 sun.com, so I think we should allow optionally setting a sysprop that uses 
 linkoffline. Then we would get far fewer spurious hudson failures.
 I feel like Mike opened an issue for this already, but I cannot find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8986 - Still Failing

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8986/

All tests passed

Build Log (for compile errors):
[...truncated 16135 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053198#comment-13053198
 ] 

Robert Muir commented on LUCENE-3228:
-

As a partial solution, I set up the 30-minute builds to just directly override 
javadoc.link (and javadoc.link.java for Solr)... for those builds we don't 
care about the actual javadoc artifacts or where the links actually point, 
only that there are no warnings.

This is in r1138418.

 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir

 Currently, we fail on all javadocs warnings.
 However, you get a warning if javadoc cannot download the package-list from 
 sun.com, so I think we should allow optionally setting a sysprop that uses 
 linkoffline. Then we would get far fewer spurious hudson failures.
 I feel like Mike opened an issue for this already, but I cannot find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053200#comment-13053200
 ] 

Robert Muir commented on LUCENE-3228:
-

I noticed also that Solr uses an online link for the JUnit javadocs... we 
should download that one and do the same, too.
I'll look at this once I see whether the sun javadocs link takes for the 
30-minute builds.

 build should allow you (especially hudson) to refer to a local javadocs 
 installation instead of downloading
 ---

 Key: LUCENE-3228
 URL: https://issues.apache.org/jira/browse/LUCENE-3228
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir

 Currently, we fail on all javadocs warnings.
 However, you get a warning if javadoc cannot download the package-list from 
 sun.com, so I think we should allow optionally setting a sysprop that uses 
 linkoffline. Then we would get far fewer spurious hudson failures.
 I feel like Mike opened an issue for this already, but I cannot find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8983 - Failure

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8983/

All tests passed

Build Log (for compile errors):
[...truncated 13538 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3229) Overlaped SpanNearQuery

2011-06-22 Thread ludovic Boutros (JIRA)
Overlaped SpanNearQuery
---

 Key: LUCENE-3229
 URL: https://issues.apache.org/jira/browse/LUCENE-3229
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1
 Environment: Windows XP, Java 1.6
Reporter: ludovic Boutros
Priority: Minor


While using span queries I think I've found a little bug.

With a document like this (from the TestNearSpansOrdered unit test):

w1 w2 w3 w4 w5

If I try to search with this span query:

spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)

the above document is returned, and I think it should not be, because 'w4' is 
not after 'w5': the two spans are not ordered, because there is an overlap.

I will add a test patch in the TestNearSpansOrdered unit test.
I will add a patch to solve this issue too.
Basically, it modifies the two docSpansOrdered functions to make sure that 
the spans do not overlap.
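
The strengthened check can be sketched in isolation. This is not the actual NearSpansOrdered code; the class, method, and span representation (a half-open [start, end) position interval) are simplified for illustration:

```java
// Simplified illustration of the proposed fix: two spans count as ordered
// only when the first ends at or before the second starts, so overlapping
// pairs (like w3..w5 followed by w4) no longer qualify as "ordered".
public class SpanOrder {
    static boolean orderedNonOverlapping(int start1, int end1,
                                         int start2, int end2) {
        return end1 <= start2;
    }
}
```

With w1..w5 at positions 0..4, the inner span w3..w5 covers [2, 5) and w4 covers [3, 4); the strengthened check rejects that pair, which matches the behavior the report expects.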




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3229) Overlaped SpanNearQuery

2011-06-22 Thread ludovic Boutros (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ludovic Boutros updated LUCENE-3229:


Attachment: SpanOverlapTestUnit.diff

Added the unit test.

 Overlaped SpanNearQuery
 ---

 Key: LUCENE-3229
 URL: https://issues.apache.org/jira/browse/LUCENE-3229
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1
 Environment: Windows XP, Java 1.6
Reporter: ludovic Boutros
Priority: Minor
 Attachments: SpanOverlapTestUnit.diff


 While using span queries I think I've found a little bug.
 With a document like this (from the TestNearSpansOrdered unit test):
 w1 w2 w3 w4 w5
 If I try to search with this span query:
 spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)
 the above document is returned, and I think it should not be, because 'w4' is 
 not after 'w5': the two spans are not ordered, because there is an overlap.
 I will add a test patch in the TestNearSpansOrdered unit test.
 I will add a patch to solve this issue too.
 Basically, it modifies the two docSpansOrdered functions to make sure that 
 the spans do not overlap.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3229) Overlaped SpanNearQuery

2011-06-22 Thread ludovic Boutros (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ludovic Boutros updated LUCENE-3229:


Attachment: SpanOverlap.diff

Added a patch.

 Overlaped SpanNearQuery
 ---

 Key: LUCENE-3229
 URL: https://issues.apache.org/jira/browse/LUCENE-3229
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1
 Environment: Windows XP, Java 1.6
Reporter: ludovic Boutros
Priority: Minor
 Attachments: SpanOverlap.diff, SpanOverlapTestUnit.diff


 While using span queries I think I've found a little bug.
 With a document like this (from the TestNearSpansOrdered unit test):
 w1 w2 w3 w4 w5
 If I try to search with this span query:
 spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)
 the above document is returned, and I think it should not be, because 'w4' is 
 not after 'w5': the two spans are not ordered, because there is an overlap.
 I will add a test patch in the TestNearSpansOrdered unit test.
 I will add a patch to solve this issue too.
 Basically, it modifies the two docSpansOrdered functions to make sure that 
 the spans do not overlap.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1298) FunctionQuery results as pseudo-fields

2011-06-22 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053212#comment-13053212
 ] 

Koji Sekiguchi commented on SOLR-1298:
--

Hi, I'm using the solr example data on trunk.

If I post q=ipod&fl=score,price, Solr returns score and price as expected.
But if I post q=ipod&fl=score,log(price), Solr returns score, the value of 
log(price), and all of the remaining fields.

 FunctionQuery results as pseudo-fields
 --

 Key: SOLR-1298
 URL: https://issues.apache.org/jira/browse/SOLR-1298
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-1298-FieldValues.patch, SOLR-1298.patch


 It would be helpful if the results of FunctionQueries could be added as 
 fields to a document. 
 Couple of options here:
 1. Run FunctionQuery as part of relevance score and add that piece to the 
 document
 2. Run the function (not really a query) during Document/Field retrieval

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8984 - Still Failing

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8984/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce

Error Message:
MockDirectoryWrapper: cannot close: there are still open files: {}

Stack Trace:
java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still 
open files: {}
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:473)
at 
org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279)
at 
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce(TestIndexWriterWithThreads.java:366)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343)




Build Log (for compile errors):
[...truncated 3264 lines...]






[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-06-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Attachment: SOLR-1979.patch

New version. Example of accepted params:

{code}
<processor 
class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <defaults>
    <str name="langid">true</str>
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
    <str name="langid.langsField">languages</str>
    <str name="langid.overwrite">false</str>
    <float name="langid.threshold">0.5</float>
    <str name="langid.whitelist">no,en,es,dk</str>
    <str name="langid.map">true</str>
    <str name="langid.map.fl">title,text</str>
    <bool name="langid.map.overwrite">false</bool>
    <bool name="langid.map.keepOrig">false</bool>
    <bool name="langid.map.individual">false</bool>
    <str name="langid.map.individual.fl"></str>
    <str name="langid.fallbackFields">meta_content_language,lang</str>
    <str name="langid.fallback">en</str>
  </defaults>
</processor>
{code}

The only mandatory parameter is langid.fl.
To enable field name mapping, set langid.map=true. It will then map field names 
for all fields in langid.fl. If the set of fields to map is different from 
langid.fl, supply langid.map.fl. Those fields will then be renamed with a 
language suffix equal to the language detected from the langid.fl fields.

If you require detecting languages separately for each field, supply 
langid.map.individual=true. The supplied fields will then be renamed according 
to detected language on an individual basis. If the set of fields to detect 
individually is different from the already supplied langid.fl or langid.map.fl, 
supply langid.map.individual.fl. The fields listed in langid.map.individual.fl 
will then be detected individually, while the rest of the mapping fields will 
be mapped according to global document language.
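The renaming rule described above (a mapped field gets a suffix equal to the detected language code) can be sketched as follows. This is an illustrative sketch only; the helper name mapFieldName is hypothetical, not part of the patch:

```java
// Hypothetical sketch of the langid.map renaming rule: a mapped field
// is renamed with a suffix equal to the detected language code.
public class LangIdMapSketch {
    static String mapFieldName(String field, String langCode) {
        return field + "_" + langCode;
    }

    public static void main(String[] args) {
        // A document whose langid.fl fields were detected as Norwegian ("no"):
        System.out.println(mapFieldName("title", "no")); // title_no
        System.out.println(mapFieldName("text", "no"));  // text_no
    }
}
```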

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml}
 <processor 
 class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
   <str name="inputFields">name,subject</str>
   <str name="outputField">language_s</str>
   <str name="idField">id</str>
   <str name="fallback">en</str>
 </processor>
 {code}
 It will then read the text from inputFields name and subject, perform 
 language identification and output the ISO code for the detected language in 
 the outputField. If no language was detected, fallback language is used.




[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-06-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-1979:
--

Description: 
Language identification from document fields, and mapping of field names to 
language-specific fields based on detected language.

Wrap the Tika LanguageIdentifier in an UpdateProcessor.

  was:
We need the ability to detect language of some random text in order to act upon 
it, such as indexing the content into language aware fields. Another usecase is 
to be able to filter/facet on language on random unstructured content.

To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
processor is configurable like this:

{code:xml}
<processor 
class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}

It will then read the text from inputFields name and subject, perform language 
identification and output the ISO code for the detected language in the 
outputField. If no language was detected, fallback language is used.


 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.




[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-06-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053227#comment-13053227
 ] 

Jan Høydahl commented on SOLR-1979:
---

One question regarding the JUnit test: I now use
{code}
assertU(commit());
{code}
How can I add update request params to this commit? To select another update 
chain from different tests, I'd like to add update params on the fly, e.g.:
{code}
assertU(commit(), update.chain=langid2);
{code}

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.




[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex

2011-06-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053237#comment-13053237
 ] 

Shai Erera commented on LUCENE-3226:


how about printing the oldest and newest segment version?

 rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
 

 Key: LUCENE-3226
 URL: https://issues.apache.org/jira/browse/LUCENE-3226
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1, 3.2
Reporter: Hoss Man
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3226.patch


 A 3.2 user recently asked if something was wrong because CheckIndex was 
 reporting his (newly built) index version as...
 {noformat}
 Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1]
 {noformat}
 It seems like there are two very confusing pieces of information here...
 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice.  All 
 other FORMAT_* constants in SegmentInfos are descriptive of the actual change 
 made, and not specific to the version when they were introduced.
 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it 
 "Lucene 3.1", which is misleading since that format is always used in 3.2 
 (and probably 3.3, etc...).  
 I suggest:
 a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION
 b) change CheckIndex so that the label for the newest format always ends 
 with " and later" (ie: "Lucene 3.1 and later") so when we release versions 
 w/o a format change we don't have to remember to manually list them in 
 CheckIndex.  when we *do* make format changes and update CheckIndex, " and 
 later" can be replaced with " to X.Y" and the new format can be added




Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8984 - Still Failing

2011-06-22 Thread Simon Willnauer
I just committed a fix for this

simon

On Wed, Jun 22, 2011 at 2:51 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8984/

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce

 Error Message:
 MockDirectoryWrapper: cannot close: there are still open files: {}

 Stack Trace:
 java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are 
 still open files: {}
        at 
 org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:473)
        at 
 org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279)
        at 
 org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce(TestIndexWriterWithThreads.java:366)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343)




 Build Log (for compile errors):
 [...truncated 3264 lines...]






[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053241#comment-13053241
 ] 

Robert Muir commented on LUCENE-3226:
-

This would be good (as we can compute it from the segments file), but we just 
have to think about how to display the case where this is null: we know it's <= 
3.0 in this case... but we don't know any more than that.

Still, we should do it, especially in 4.x when most indexes being checked by 
CheckIndex will have this filled out (except 3.0 indexes)

 rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
 

 Key: LUCENE-3226
 URL: https://issues.apache.org/jira/browse/LUCENE-3226
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1, 3.2
Reporter: Hoss Man
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3226.patch


 A 3.2 user recently asked if something was wrong because CheckIndex was 
 reporting his (newly built) index version as...
 {noformat}
 Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1]
 {noformat}
 It seems like there are two very confusing pieces of information here...
 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice.  All 
 other FORMAT_* constants in SegmentInfos are descriptive of the actual change 
 made, and not specific to the version when they were introduced.
 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it 
 "Lucene 3.1", which is misleading since that format is always used in 3.2 
 (and probably 3.3, etc...).  
 I suggest:
 a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION
 b) change CheckIndex so that the label for the newest format always ends 
 with " and later" (ie: "Lucene 3.1 and later") so when we release versions 
 w/o a format change we don't have to remember to manually list them in 
 CheckIndex.  when we *do* make format changes and update CheckIndex, " and 
 later" can be replaced with " to X.Y" and the new format can be added




[jira] [Resolved] (LUCENE-3218) Make CFS appendable

2011-06-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3218.
-

Resolution: Fixed

backported to 3.x - thanks guys

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


 Currently CFS is created once all files are written during a flush / merge. 
 Once on disk the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can at any time write at least one file 
 directly into the CFS, which can save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a Codec 
 flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but more importantly, for 
 DocValues and LUCENE-3216 we could transparently pack per-field files into a 
 single file only for docvalues without changing any code once LUCENE-3216 is 
 resolved.




[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

2011-06-22 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053245#comment-13053245
 ] 

Mike Sokolov commented on LUCENE-3080:
--

There could be a good reason though for using byte-offsets in highlighting. I 
have in mind an optimization that would pull in text from an external file or 
other source, enabling highlighting without stored fields.  For best 
performance the snippet should be pulled from the external source using random 
access to storage, but this requires byte offsets.  I think this might be a big 
win for large field values.

This could only be done if the highlighter doesn't need to perform any text 
manipulation itself, so it's not really appropriate for Highlighter, as Robert 
said, but in the case of FVH it might be possible to implement.  I'm looking at 
this, but wondering before I get too deep in if anyone can comment on the 
feasibility of using byte offsets - I'm unclear on what they get used for other 
than highlighting: would it cause problems to have a CharFilter that returns 
"corrected" offsets such that char positions in the analyzed text are 
"translated" into byte positions in the source text? 
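As a hedged illustration of the byte-offset idea (not code from any patch): random access into an external UTF-8 source needs the byte offset corresponding to a character offset, which can be computed by encoding the prefix:

```java
import java.nio.charset.StandardCharsets;

public class ByteOffsetSketch {
    // Byte offset in the UTF-8 encoding of `text` corresponding to
    // character offset `charOffset` (counted in Java chars / UTF-16 units).
    static int utf8ByteOffset(String text, int charOffset) {
        return text.substring(0, charOffset)
                   .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "héllo";
        // 'h' is 1 byte, 'é' is 2 bytes in UTF-8, so char offset 2 -> byte offset 3
        System.out.println(utf8ByteOffset(s, 2)); // 3
    }
}
```

Encoding the whole prefix is O(n) per lookup; the point is only that char and byte offsets diverge as soon as the text is non-ASCII.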

 cutover highlighter to BytesRef
 ---

 Key: LUCENE-3080
 URL: https://issues.apache.org/jira/browse/LUCENE-3080
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless

 Highlighter still uses char[] terms (consumes tokens from the analyzer as 
 char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
 trunk.




[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex

2011-06-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053248#comment-13053248
 ] 

Shai Erera commented on LUCENE-3226:


We can print "pre-3.1".

But, if somebody opened a 3.0 / 2.x index w/ 3.1+ and all segments were 
'touched' by the 3.1+ code, then their version would be "3.0" or "2.x" (i.e., 
not null). So it could be that someone opens two indexes, and CheckIndex 
reports oldVersion="pre-3.1" for one and oldVersion="2.x" for the other. I 
think it's acceptable though.

 rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
 

 Key: LUCENE-3226
 URL: https://issues.apache.org/jira/browse/LUCENE-3226
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.1, 3.2
Reporter: Hoss Man
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3226.patch


 A 3.2 user recently asked if something was wrong because CheckIndex was 
 reporting his (newly built) index version as...
 {noformat}
 Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1]
 {noformat}
 It seems like there are two very confusing pieces of information here...
 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice.  All 
 other FORMAT_* constants in SegmentInfos are descriptive of the actual change 
 made, and not specific to the version when they were introduced.
 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it 
 "Lucene 3.1", which is misleading since that format is always used in 3.2 
 (and probably 3.3, etc...).  
 I suggest:
 a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION
 b) change CheckIndex so that the label for the newest format always ends 
 with " and later" (ie: "Lucene 3.1 and later") so when we release versions 
 w/o a format change we don't have to remember to manually list them in 
 CheckIndex.  when we *do* make format changes and update CheckIndex, " and 
 later" can be replaced with " to X.Y" and the new format can be added




[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053254#comment-13053254
 ] 

Robert Muir commented on LUCENE-3080:
-

Mike, it's an interesting idea, as I think the offsets are intended to be opaque 
to the app (so you should be able to use byte offsets if you want).

There are some problems though, especially tokenfilters that muck with offsets:
NGramTokenFilter, WordDelimiterFilter, ...

In general there are assumptions here that offsets are UTF-16.
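A small example of that UTF-16 assumption (illustrative only): for text containing a supplementary character, Java's char offsets (UTF-16 code units) already disagree with codepoint counts, let alone byte offsets:

```java
public class OffsetUnitsSketch {
    public static void main(String[] args) {
        // 'a', U+1F600 (a supplementary character, 2 UTF-16 code units), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.indexOf('b'));                      // 3 (UTF-16 units)
        System.out.println(s.codePointCount(0, s.indexOf('b'))); // 2 (codepoints)
    }
}
```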

 cutover highlighter to BytesRef
 ---

 Key: LUCENE-3080
 URL: https://issues.apache.org/jira/browse/LUCENE-3080
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless

 Highlighter still uses char[] terms (consumes tokens from the analyzer as 
 char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
 trunk.




[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

2011-06-22 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053275#comment-13053275
 ] 

Mike Sokolov commented on LUCENE-3080:
--

It might be a bit more complicated?  Looks like WordDelimiterFilter, in 
generatePart and concatenate, e.g., performs computations with the offsets.  So 
it would either need to know the units of the offsets it was passed, or be given 
more than just a correctOffset() method: rather, it seems to require something 
like addCharsToOffset(offset, charOffsetIncr), where charOffsetIncr is a 
number of chars, but offset is in some unspecified unit.

 cutover highlighter to BytesRef
 ---

 Key: LUCENE-3080
 URL: https://issues.apache.org/jira/browse/LUCENE-3080
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless

 Highlighter still uses char[] terms (consumes tokens from the analyzer as 
 char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
 trunk.




[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field

2011-06-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3216:


Attachment: LUCENE-3216_floats.patch

here is a first patch that converts the floats impl to buffer values in RAM 
during indexing but write values directly during merge. all tests pass

I plan to commit this soon too. I'd rather go in small iterations here than in 
one large patch.

 Store DocValues per segment instead of per field
 

 Key: LUCENE-3216
 URL: https://issues.apache.org/jira/browse/LUCENE-3216
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3216_floats.patch


 currently we are storing docvalues per field which results in at least one 
 file per field that uses docvalues (or at most two per field per segment 
 depending on the impl.). Yet, we should try, by default, to pack docvalues into 
 a single file if possible. To enable this we need to hold all docvalues in 
 memory during indexing and write them to disk once we flush a segment. 




[jira] [Created] (LUCENE-3230) Make FSDirectory.fsync() public and static

2011-06-22 Thread Shai Erera (JIRA)
Make FSDirectory.fsync() public and static
--

 Key: LUCENE-3230
 URL: https://issues.apache.org/jira/browse/LUCENE-3230
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/store
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.3, 4.0


I find FSDirectory.fsync() (today a protected instance method) very useful as 
a utility to sync() files. I'd like to create an FSDirectory.sync() utility 
which contains the exact same impl as FSDir.fsync(), and have the latter call 
it. We can have it as part of IOUtils too, as it's a completely standalone 
utility.

I would get rid of FSDir.fsync() if it weren't protected (as if encouraging 
people to override it). I doubt anyone really overrides it (our core 
Directories don't).

Also, while reviewing the code, I noticed that if an IOE occurs, the code 
sleeps for 5 msec. If an InterruptedException occurs then, it immediately 
throws ThreadIE, completely ignoring the fact that it slept due to an IOE. 
Shouldn't we at least pass IOE.getMessage() on to the ThreadIE?

The patch is trivial, so I'd like to get some feedback before I post it.
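For context, a minimal sketch of the kind of standalone sync utility under discussion (the method name, signature, and retry count are illustrative, not the actual FSDirectory code); note how the InterruptedException path can carry the pending IOException's message instead of dropping it:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FsyncSketch {
    // Retry fsync a few times, pausing briefly after an IOException; if
    // interrupted during the pause, surface the pending IOE's message.
    static void fsync(File file, int retries) throws IOException {
        IOException lastIOE = null;
        for (int i = 0; i < retries; i++) {
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                raf.getFD().sync();
                return;
            } catch (IOException ioe) {
                lastIOE = ioe;
                try {
                    Thread.sleep(5); // the 5 msec pause mentioned above
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(
                        "interrupted while syncing: " + ioe.getMessage(), ie);
                }
            }
        }
        throw lastIOE;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("fsync-sketch", ".tmp");
        fsync(f, 5);
        System.out.println("synced");
        f.delete();
    }
}
```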




[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053281#comment-13053281
 ] 

Robert Muir commented on LUCENE-3080:
-

yes: in general I think it would be problematic, especially since most tests 
use only all-ASCII data.

Another problem on this issue is that if you want to use bytes with the 
Tokenizer analysis chain, it only takes a Reader, so you cannot assume anything 
about the original bytes or encoding (e.g. that it's UTF-8).





 cutover highlighter to BytesRef
 ---

 Key: LUCENE-3080
 URL: https://issues.apache.org/jira/browse/LUCENE-3080
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless

 Highlighter still uses char[] terms (consumes tokens from the analyzer as 
 char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
 trunk.




[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #156: POMs out of sync

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/156/

No tests ran.

Build Log (for compile errors):
[...truncated 7007 lines...]






[jira] [Commented] (LUCENE-3229) Overlaped SpanNearQuery

2011-06-22 Thread ludovic Boutros (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053286#comment-13053286
 ] 

ludovic Boutros commented on LUCENE-3229:
-

The testSpanNearUnOrdered unit test does not work anymore.

The unordered SpanNear class uses the ordering function of the ordered SpanNear 
class. Perhaps it should use its own ordering function which allows the spans 
to overlap.
I will check.
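The predicate at issue can be sketched like this (a hedged sketch, not the actual NearSpans code): strict ordering requires the first span to end at or before the second one starts, while an overlap-tolerant check only compares start positions:

```java
public class SpanOrderSketch {
    // Strictly ordered, no overlap: span1 must end before span2 begins.
    static boolean orderedNoOverlap(int end1, int start2) {
        return end1 <= start2;
    }

    // Overlap-tolerant ordering: only the start positions are compared.
    static boolean orderedAllowOverlap(int start1, int start2) {
        return start1 <= start2;
    }

    public static void main(String[] args) {
        // For "w1 w2 w3 w4 w5": spanNear(w3, w5) covers positions [2, 5)
        // and w4 covers [3, 4), so the spans overlap.
        System.out.println(orderedNoOverlap(5, 3));    // false
        System.out.println(orderedAllowOverlap(2, 3)); // true
    }
}
```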

 Overlaped SpanNearQuery
 ---

 Key: LUCENE-3229
 URL: https://issues.apache.org/jira/browse/LUCENE-3229
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1
 Environment: Windows XP, Java 1.6
Reporter: ludovic Boutros
Priority: Minor
 Attachments: SpanOverlap.diff, SpanOverlapTestUnit.diff


 While using Span queries I think I've found a little bug.
 With a document like this (from the TestNearSpansOrdered unit test) :
 w1 w2 w3 w4 w5
 If I try to search for this span query :
 spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)
 the above document is returned and I think it should not because 'w4' is not 
 after 'w5'.
 The 2 spans are not ordered, because there is an overlap.
 I will add a test patch in the TestNearSpansOrdered unit test.
 I will add a patch to solve this issue too.
 Basically it modifies the two docSpansOrdered functions to make sure that the 
 spans do not overlap.




[jira] [Created] (SOLR-2614) stats with pivot

2011-06-22 Thread pengyao (JIRA)
stats with pivot


 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
 Fix For: 4.0


 Is it possible to get stats (like Stats Component: min, max, sum, count,

missing, sumOfSquares, mean and stddev) from numeric fields inside
hierarchical facets (with more than one level, like Pivot)?

 I would like to query:
...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
 and get min, max, sum, count, etc. from numeric_field1 and
numeric_field2 from all combinations of field_x, field_y and field_z
(hierarchical values).


 Using stats.facet I get just one field at one level and using
facet.pivot I get just counts, but no stats.

 Looping in the client application to do all combinations of facet values
will be too slow because there are a lot of combinations.


 Thanks a lot!





[jira] [Updated] (SOLR-2614) stats with pivot

2011-06-22 Thread pengyao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengyao updated SOLR-2614:
--

Component/s: (was: Schema and Analysis)
   Priority: Critical  (was: Major)
Description: 
 Is it possible to get stats (like Stats Component: min, max, sum, count,

missing, sumOfSquares, mean and stddev) from numeric fields inside
hierarchical facets (with more than one level, like Pivot)?

 I would like to query:
...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
 and get min, max, sum, count, etc. from numeric_field1 and
numeric_field2 from all combinations of field_x, field_y and field_z
(hierarchical values).


 Using stats.facet I get just one field at one level and using
facet.pivot I get just counts, but no stats.

 Looping in the client application to do all combinations of facet values
will be too slow because there are a lot of combinations.


 Thanks a lot!


this is very important, because only having count values is sometimes of no use.
Please add stats with pivot in Solr 4.0.

thanks a lot

  was:
 Is it possible to get stats (like Stats Component: min, max, sum, count,

missing, sumOfSquares, mean and stddev) from numeric fields inside
hierarchical facets (with more than one level, like Pivot)?

 I would like to query:
...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
 and get min, max, sum, count, etc. from numeric_field1 and
numeric_field2 from all combinations of field_x, field_y and field_z
(hierarchical values).


 Using stats.facet I get just one field at one level and using
facet.pivot I get just counts, but no stats.

 Looping in the client application to do all combinations of facet values
will be too slow because there are a lot of combinations.


 Thanks a lot!



 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.0


  Is it possible to get stats (like Stats Component: min, max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in client application to do all combinations of facet values
 will be too slow because there are a lot of combinations.
  Thanks a lot!
 this is very important, because having only count values is sometimes not useful.
 please add stats with pivot in Solr 4.0.
 thanks a lot
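For reference, the client-side looping workaround the reporter describes could look roughly like the sketch below; the base URL, field names, and values are made up for illustration, and the point is exactly that the number of requests grows multiplicatively with the pivot combinations:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "loop in the client application" workaround: issue one
// stats request per combination of pivot field values, restricting each
// request to that pivot cell with fq filters. All names are hypothetical.
public class PivotStatsWorkaround {
    static List<String> buildStatsUrls(String base, String[] xValues, String[] yValues) {
        List<String> urls = new ArrayList<>();
        for (String x : xValues) {
            for (String y : yValues) {
                urls.add(base + "/select?q=*:*&rows=0&stats=true"
                        + "&stats.field=numeric_field1"
                        + "&fq=field_x:" + x
                        + "&fq=field_y:" + y);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = buildStatsUrls("http://localhost:8983/solr",
                new String[]{"a", "b"}, new String[]{"c"});
        // one request per (field_x, field_y) combination
        System.out.println(urls.size());
    }
}
```

With many distinct values per pivot field this produces thousands of requests, which is why a server-side stats.pivot would help.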

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

2011-06-22 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053300#comment-13053300
 ] 

Mike Sokolov commented on LUCENE-3080:
--

Yeah I knew that at some point, but stuffed it away as something to think about 
later :) There really is no way to pass byte streams into the analysis chain.  
Maybe providing a character encoding to the filter could enable it to compute 
the needed byte offsets. 
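A rough sketch of that idea, assuming the full source text and its charset are available (the helper name is mine, not Lucene API): a character offset from the analysis chain maps to a byte offset by re-encoding the prefix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: convert a char offset to a byte offset by encoding
// the prefix up to that offset and measuring its length in bytes.
public class ByteOffsets {
    static int byteOffset(String text, int charOffset, Charset cs) {
        return text.substring(0, charOffset).getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 latte";
        // In UTF-8 the 'é' takes two bytes, so char offset 4 is byte offset 5.
        System.out.println(byteOffset(s, 4, StandardCharsets.UTF_8));
    }
}
```

Re-encoding a prefix per highlight is obviously not efficient; a real filter would track byte positions incrementally, but the mapping itself is this simple only for stateless encodings.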

 cutover highlighter to BytesRef
 ---

 Key: LUCENE-3080
 URL: https://issues.apache.org/jira/browse/LUCENE-3080
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless

 Highlighter still uses char[] terms (consumes tokens from the analyzer as 
 char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
 trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-22 Thread David Mark Nemeskey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mark Nemeskey updated LUCENE-3220:


Attachment: LUCENE-3220.patch

EasySimilarity added. Lots of questions and nocommit in the code.

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3229) Overlaped SpanNearQuery

2011-06-22 Thread ludovic Boutros (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ludovic Boutros updated LUCENE-3229:


Attachment: SpanOverlap2.diff

Added a patch for the SpanNearUnOrdered class. Everything should be OK now.

 Overlaped SpanNearQuery
 ---

 Key: LUCENE-3229
 URL: https://issues.apache.org/jira/browse/LUCENE-3229
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1
 Environment: Windows XP, Java 1.6
Reporter: ludovic Boutros
Priority: Minor
 Attachments: SpanOverlap.diff, SpanOverlap2.diff, 
 SpanOverlapTestUnit.diff


 While using Span queries I think I've found a little bug.
 With a document like this (from the TestNearSpansOrdered unit test) :
 w1 w2 w3 w4 w5
 If I try to search for this span query :
 spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)
 the above document is returned and I think it should not because 'w4' is not 
 after 'w5'.
 The 2 spans are not ordered, because there is an overlap.
 I will add a test patch in the TestNearSpansOrdered unit test.
 I will add a patch to solve this issue too.
 Basically it modifies the two docSpansOrdered functions to make sure that the 
 spans do not overlap.
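The strict ordering rule argued for here boils down to a plain interval check: two spans are ordered only if the first ends at or before the start of the second, so overlapping spans are never ordered. This is an illustrative sketch, not the actual patch code:

```java
// Toy version of the ordering check (names are mine, not the patch's).
// Positions follow Lucene convention: start inclusive, end exclusive.
public class SpanOrder {
    static boolean strictlyOrdered(int start1, int end1, int start2, int end2) {
        return end1 <= start2;
    }

    public static void main(String[] args) {
        // In "w1 w2 w3 w4 w5", the w3..w5 span is [2,5) and w4 is [3,4):
        // they overlap, so the spans are not ordered and should not match.
        System.out.println(strictlyOrdered(2, 5, 3, 4)); // false
        System.out.println(strictlyOrdered(0, 2, 3, 4)); // true
    }
}
```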

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3229) Overlaped SpanNearQuery

2011-06-22 Thread ludovic Boutros (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053286#comment-13053286
 ] 

ludovic Boutros edited comment on LUCENE-3229 at 6/22/11 3:32 PM:
--

testSpanNearUnOrdered unit test does not work anymore.

The unordered SpanNear class uses the ordering function of the ordered SpanNear 
class. Perhaps, it should use its own ordering function which allows the span 
overlaps.
I will check.

  was (Author: lboutros):
testSpanNearUnOrdered unit test does not work anymore.

The unordered SpanNear class uses the ordering function of the ordered SpanNear 
class. Perhaps, it should use its own ordering function witch allows the span 
overlaps.
I will check.
  
 Overlaped SpanNearQuery
 ---

 Key: LUCENE-3229
 URL: https://issues.apache.org/jira/browse/LUCENE-3229
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1
 Environment: Windows XP, Java 1.6
Reporter: ludovic Boutros
Priority: Minor
 Attachments: SpanOverlap.diff, SpanOverlap2.diff, 
 SpanOverlapTestUnit.diff


 While using Span queries I think I've found a little bug.
 With a document like this (from the TestNearSpansOrdered unit test) :
 w1 w2 w3 w4 w5
 If I try to search for this span query :
 spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)
 the above document is returned and I think it should not because 'w4' is not 
 after 'w5'.
 The 2 spans are not ordered, because there is an overlap.
 I will add a test patch in the TestNearSpansOrdered unit test.
 I will add a patch to solve this issue too.
 Basically it modifies the two docSpansOrdered functions to make sure that the 
 spans do not overlap.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2614) stats with pivot

2011-06-22 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053310#comment-13053310
 ] 

Ryan McKinley commented on SOLR-2614:
-

not currently.

patches welcome!


 stats with pivot
 

 Key: SOLR-2614
 URL: https://issues.apache.org/jira/browse/SOLR-2614
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: pengyao
Priority: Critical
 Fix For: 4.0


  Is it possible to get stats (like Stats Component: min, max, sum, count,
 missing, sumOfSquares, mean and stddev) from numeric fields inside
 hierarchical facets (with more than one level, like Pivot)?
  I would like to query:
 ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and
 numeric_field2 from all combinations of field_x, field_y and field_z
 (hierarchical values).
  Using stats.facet I get just one field at one level and using
 facet.pivot I get just counts, but no stats.
  Looping in client application to do all combinations of facet values
 will be too slow because there are a lot of combinations.
  Thanks a lot!
 this is very important, because having only count values is sometimes not useful.
 please add stats with pivot in Solr 4.0.
 thanks a lot

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-06-22 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053313#comment-13053313
 ] 

James Dyer commented on SOLR-2382:
--

Noble,

I just updated to the latest and re-applied this patch and it worked for me.  
If you can give me specifics I'll try to dig more to see what might be going 
wrong.  Also, in case you're not on the very latest, there were some very 
recent commits from about a week ago that broke the previous versions of this 
patch (r1135954 & r1136789). This newest patch will only work on code from 
after those commits.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
 Use Cases:
  1. We needed a flexible & scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache & two implementations:  
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
 - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
  I believe this may be incompatible due to generics usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface DIHWriter, & two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, it was being called on each iteration of 
 DocBuilder.buildDocument().
   - Now it does one-time cleanup tasks (like closing or deleting a 
 disk-backed cache) once the entity processor is completed.
   - The only out-of-the-box entity processor that previously implemented 
 destroy() was LineEntityProcessor, so this is not a very invasive change.
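To illustrate point 1 above, a minimal pluggable-cache shape could look like the following. The class and method names here are hypothetical and much simplified compared to the patch's real DIHCache API; this one is in-memory, loosely in the spirit of SortedMapBackedCache:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch: a tiny entity cache mapping a join key to the list
// of child-entity rows (a row is a field-name -> value map). A disk-backed
// implementation would share the same contract.
public class SimpleEntityCache {
    private final SortedMap<String, List<Map<String, Object>>> data = new TreeMap<>();

    public void add(String key, Map<String, Object> row) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    public List<Map<String, Object>> lookup(String key) {
        return data.getOrDefault(key, Collections.emptyList());
    }

    // one-time cleanup, matching the new entity.destroy() semantics (item 6)
    public void close() {
        data.clear();
    }

    public static void main(String[] args) {
        SimpleEntityCache cache = new SimpleEntityCache();
        Map<String, Object> row = new HashMap<>();
        row.put("field", "value");
        cache.add("parentId1", row);
        System.out.println(cache.lookup("parentId1").size()); // 1
    }
}
```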
 General Notes:
 We are near completion in converting our search functionality from a legacy 
 search engine to Solr.  However, I found that DIH did not support caching 

[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053319#comment-13053319
 ] 

Robert Muir commented on LUCENE-3080:
-

Well, personally I am hesitant to introduce any encodings or bytes into our 
current analysis chain, because it's unnecessary complexity that will introduce 
bugs (at the moment, it's the user's responsibility to create the appropriate 
Reader etc).

Furthermore, not all character sets can be 'corrected' with a linear conversion 
like this: for example some actually order the text in a different direction, 
and things like that... there are many quirks to non-unicode character sets.

Maybe as a start, it would be useful to prototype some simple experiments with 
a binary analysis chain and hackup a highlighter to work with them? This way 
we would have an understanding of what the potential performance gain is.

Here's some example code for a dead simple binary analysis chain that only uses 
bytes the whole way through, you could take these ideas and prototype one with 
just all ascii-terms and split on the space byte and such:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestBinaryTerms.java
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/BinaryTokenStream.java
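As a concrete starting point for the "split on the space byte" experiment suggested above, a toy byte-level tokenizer (standalone, not Lucene's BinaryTokenStream) might look like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: treat the input purely as bytes and split tokens on
// the space byte (0x20), never materializing a char[] anywhere.
public class ByteSpaceTokenizer {
    static List<byte[]> tokenize(byte[] input) {
        List<byte[]> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length; i++) {
            if (i == input.length || input[i] == 0x20) {
                if (i > start) {                      // skip empty tokens
                    byte[] tok = new byte[i - start];
                    System.arraycopy(input, start, tok, 0, i - start);
                    tokens.add(tok);
                }
                start = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        byte[] in = "hello byte world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tokenize(in).size()); // 3
    }
}
```

Wiring such byte tokens into indexing and a hacked-up highlighter would give a first measurement of the potential performance gain.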
 


 cutover highlighter to BytesRef
 ---

 Key: LUCENE-3080
 URL: https://issues.apache.org/jira/browse/LUCENE-3080
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless

 Highlighter still uses char[] terms (consumes tokens from the analyzer as 
 char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
 trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level

2011-06-22 Thread David Smiley (JIRA)
Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
---

 Key: SOLR-2615
 URL: https://issues.apache.org/jira/browse/SOLR-2615
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: David Smiley
Priority: Minor
 Fix For: 3.3


It would be great if the LogUpdateProcessor logged each command (add, delete, 
...) at debug (Fine) level. Presently it only logs a summary of 8 commands 
and it does so at the very end.

The attached patch implements this.
* I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug 
level log happens before Solr does anything with it. It should not affect the 
ordering of the existing summary log which happens at finish(). 
* I changed UpdateRequestProcessor's static log variable to be an instance 
variable that uses the current class name. I think this makes much more sense 
since I want to be able to alter logging levels for a specific processor 
without doing it for all of them. This change did require me to tweak the 
factory's detection of the log level which avoids creating the 
LogUpdateProcessor.
* There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there 
is no schema unique field. I fixed that.

You may notice I use SLF4J's nifty log.debug("message blah {} blah", var) 
syntax, which is both performant and concise: there's no point in guarding 
the debug message with isDebugEnabled(), since debug() will internally check 
this anyway and there is no string concatenation if debug isn't enabled.
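To see why no isDebugEnabled() guard is needed, here is a toy logger that mimics SLF4J's {} substitution; it is only an illustration of the cost model, not the real MessageFormatter:

```java
// Illustrative sketch: the format string and arguments are only assembled
// when the level is enabled, so the disabled case costs one boolean check.
public class TinyLogger {
    private final boolean debugEnabled;
    private String lastMessage;   // captured for demonstration

    TinyLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    public void debug(String format, Object... args) {
        if (!debugEnabled) {
            return;               // no formatting, no concatenation
        }
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = format.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(format, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        sb.append(format.substring(from));
        lastMessage = sb.toString();
    }

    public String lastMessage() { return lastMessage; }

    public static void main(String[] args) {
        TinyLogger log = new TinyLogger(true);
        log.debug("add {} to core {}", "doc42", "core0");
        System.out.println(log.lastMessage()); // add doc42 to core core0
    }
}
```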

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level

2011-06-22 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2615:
---

Attachment: SOLR-2615_LogUpdateProcessor_debug_logging.patch

 Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE 
 level
 ---

 Key: SOLR-2615
 URL: https://issues.apache.org/jira/browse/SOLR-2615
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: David Smiley
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch


 It would be great if the LogUpdateProcessor logged each command (add, delete, 
 ...) at debug (Fine) level. Presently it only logs a summary of 8 commands 
 and it does so at the very end.
 The attached patch implements this.
 * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the 
 debug level log happens before Solr does anything with it. It should not 
 affect the ordering of the existing summary log which happens at finish(). 
 * I changed UpdateRequestProcessor's static log variable to be an instance 
 variable that uses the current class name. I think this makes much more sense 
 since I want to be able to alter logging levels for a specific processor 
 without doing it for all of them. This change did require me to tweak the 
 factory's detection of the log level which avoids creating the 
 LogUpdateProcessor.
 * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event 
 there is no schema unique field. I fixed that.
 You may notice I use SLF4J's nifty log.debug("message blah {} blah", var) 
 syntax, which is both performant and concise: there's no point in guarding 
 the debug message with isDebugEnabled(), since debug() will internally 
 check this anyway and there is no string concatenation if debug isn't 
 enabled.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2616) Include jdk14 logging configuration file

2011-06-22 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2616:
---

Attachment: SOLR-2616_jdk14logging_setup.patch

 Include jdk14 logging configuration file
 

 Key: SOLR-2616
 URL: https://issues.apache.org/jira/browse/SOLR-2616
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2616_jdk14logging_setup.patch


 The /example/ Jetty Solr configuration should include a basic logging 
 configuration file.  Looking at this wiki page: 
 http://wiki.apache.org/solr/LoggingInDefaultJettySetup  I am creating this 
 patch. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053329#comment-13053329
 ] 

Robert Muir commented on LUCENE-3220:
-

Just took a look, a few things that might help:

* yes, maxDoc does not reflect deletions, but neither do things like 
totalTermFreq or docFreq... so it's best not to worry about deletions in 
the scoring, and to be consistent and use the stats (e.g. maxDoc, not numDocs) 
that do not take deletions into account.

* for the computeStats(TermContext... termContexts), it's weird to sum the DF 
across the different terms in that case. But I honestly don't have any 
suggestions here... maybe in this case we should make an EasyPhraseStats that 
computes the EasyStats for each term, so it's not hiding anything or limiting 
anyone? You could then do an instanceof check and have a different method 
like scorePhrase() that it forwards to in case it's an EasyPhraseStats. In 
general I'm not sure how other ranking systems tend to handle this case; the 
phrase estimation for IDF in Lucene's formula is done by summing the IDFs
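The summing mentioned at the end can be sketched with the classic idf formula, log(numDocs / (docFreq + 1)) + 1; this is an illustration of the estimate, not the flexscoring branch code:

```java
// Illustrative sketch: the phrase IDF is estimated as the sum of the
// per-term IDFs, using the classic TF-IDF idf formula.
public class PhraseIdf {
    static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    static double phraseIdf(long[] docFreqs, long numDocs) {
        double sum = 0;
        for (long df : docFreqs) {
            sum += idf(df, numDocs);
        }
        return sum;
    }

    public static void main(String[] args) {
        // rarer terms contribute a larger share of the phrase estimate
        System.out.println(phraseIdf(new long[]{9, 99}, 1000));
    }
}
```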


 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3079) Faceting module

2011-06-22 Thread Stefan Trcek (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Trcek updated LUCENE-3079:
-

Attachment: LUCENE-3079.patch

 Faceting module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-22 Thread Stefan Trcek (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053361#comment-13053361
 ] 

Stefan Trcek commented on LUCENE-3079:
--

This patch was generated by git and tested to apply with
patch -p0 -i LUCENE-3079.patch --dry-run
Please bear with me if anything goes wrong.

Review starting points may be
- FacetSearcherTest.testSimpleFacetWithIndexSearcher() or
- FacetSearcher.facetCollectSearch()

Functions.java may be dropped in favor of Guava.
If you are willing to keep it I'll strip it down to the required parts.

--

The implementation relies on field cache only, no index scheme, no 
cached filters etc. It supports
- single valued facets (Facet.java)
- multi valued facets (Facet.MultiValued.java)
- facet filters (see FacetSearcher.java)
- evaluation of facet values that would otherwise be dropped due to other 
facet filters (Yonik says Solr calls this multi-select faceting),
realized by FacetSearcher.fillFacetsForGuiMode().

Let me explain the last point: For the user a facet query
  (color==green) AND (shape==circle OR shape==square)
may look like

Facet color
[ ] (3) red
[x] (5) green
[ ] (7) blue

Facet shape
[x] (9) circle
[ ] (4) line
[x] (2) square

The red/blue/line facet values will display even though the 
corresponding documents are not in the result set. Also there is 
support for filtered facet values with zero results, so users 
understand why they do not get results.
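The counting rule behind this display can be sketched as: when counting one facet's values, apply every selected filter except the filter on that facet itself, so deselected values (red, blue, line) still show their counts. A toy version, with a data model and names that are mine rather than the patch's:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch of multi-select facet counting. docs map field name
// to value per document; selections map field name to the selected values.
public class MultiSelectFacets {
    static Map<String, Integer> countFacet(
            List<Map<String, String>> docs,
            Map<String, Set<String>> selections,
            String facetField) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            boolean matches = true;
            for (Map.Entry<String, Set<String>> sel : selections.entrySet()) {
                if (sel.getKey().equals(facetField)) {
                    continue;                 // exclude this facet's own filter
                }
                if (!sel.getValue().contains(doc.get(sel.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                counts.merge(doc.get(facetField), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("color", "red", "shape", "circle"),
                Map.of("color", "green", "shape", "circle"),
                Map.of("color", "green", "shape", "line"));
        Map<String, Set<String>> sel = Map.of("color", Set.of("green"));
        // counting "color" ignores the color filter, so "red" still appears
        System.out.println(countFacet(docs, sel, "color"));
    }
}
```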


 Faceting module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
  To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge solr has because 
  it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3231) Add fixed size DocValues int variants & expose Arrays where possible

2011-06-22 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3231:


Attachment: LUCENE-3231.patch

Here is a super rough patch with nocommits (and even missing nocommits) showing 
the idea. This is heavy work in progress, though.

 Add fixed size DocValues int variants & expose Arrays where possible
 

 Key: LUCENE-3231
 URL: https://issues.apache.org/jira/browse/LUCENE-3231
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3231.patch


 Currently we only have a variable bit-packed ints implementation. For flexible 
 scoring or loading field caches it is desirable to have fixed-width int 
 implementations for 8, 16, 32 and 64 bits. 
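The difference from variable bit packing can be illustrated with a fixed 16-bit variant: every value occupies exactly two bytes, so a whole column decodes to a plain array with no per-value bit arithmetic. This is purely illustrative, not the patch's DocValues code:

```java
import java.nio.ByteBuffer;

// Illustrative sketch: fixed-width 16-bit storage. Because every entry has
// the same size, values are addressable by index and the column can be
// exposed directly as a short[].
public class FixedWidthInts {
    static byte[] pack16(short[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 2);
        for (short v : values) {
            buf.putShort(v);
        }
        return buf.array();
    }

    static short[] unpack16(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        short[] out = new short[packed.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getShort();
        }
        return out;
    }

    public static void main(String[] args) {
        short[] vals = {1, -2, 300};
        System.out.println(unpack16(pack16(vals))[2]); // 300
    }
}
```

The trade-off is the usual one: fixed widths waste bits for small values but allow direct array access, which matters for scoring and field caches.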

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2382) DIH Cache Improvements

2011-06-22 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2382:
-

Attachment: SOLR-2382.patch

Just found a little bug in SortedMapBackedCache.  This patch version includes a 
fix for it.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
 Use Cases:
  1. We needed a flexible & scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache & two implementations:  
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
 I believe this may be incompatible due to generics usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface DIHWriter, & two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, it was being called on each iteration of 
 DocBuilder.buildDocument().
   - Now it does one-time cleanup tasks (like closing or deleting a 
 disk-backed cache) once the entity processor is completed.
   - The only out-of-the-box entity processor that previously implemented 
 destroy() was LineEntityProcessor, so this is not a very invasive change.
 General Notes:
 We are near completion in converting our search functionality from a legacy 
 search engine to Solr.  However, I found that DIH did not support caching to 
 the level of our prior product's data import utility.  In order to get our 
 data into Solr, I created these caching enhancements.  Because I believe this 
 has broad application, and because we would like this feature to be supported 
 by the Community, I have front-ported this, enhanced, to Trunk.  I have also 
 added 
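The pluggable cache described in this issue could be sketched roughly as the following minimal Java interface and in-memory implementation. This is an illustrative sketch only; the interface and method names are guesses, not the actual SOLR-2382 API.

```java
import java.util.*;

public class DihCacheSketch {
    // Hypothetical pluggable cache contract, in the spirit of the
    // DIHCache interface the patch describes.
    interface DIHCache {
        void add(Object key, Map<String, Object> row);
        List<Map<String, Object>> lookup(Object key);
        void close();
    }

    // In-memory implementation in the spirit of SortedMapBackedCache:
    // multiple child-entity rows may share one parent key.
    static final class SortedMapCache implements DIHCache {
        private final SortedMap<Object, List<Map<String, Object>>> data = new TreeMap<>();

        public void add(Object key, Map<String, Object> row) {
            data.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }

        public List<Map<String, Object>> lookup(Object key) {
            return data.getOrDefault(key, Collections.emptyList());
        }

        public void close() {
            data.clear(); // a disk-backed cache would release files here
        }
    }

    public static void main(String[] args) {
        DIHCache cache = new SortedMapCache();
        cache.add("parent1", Map.of("field", "value"));
        System.out.println(cache.lookup("parent1").size()); // prints 1
        cache.close();
    }
}
```

A disk-backed implementation (like the Berkeley DB one in the patch) would plug in behind the same interface, which is what lets the entity processor stay cache-agnostic.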

[jira] [Updated] (SOLR-2382) DIH Cache Improvements

2011-06-22 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2382:
-

Attachment: SOLR-2382.patch

Sorry...that last patch included some unrelated code.  This one is correct.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch



[jira] [Updated] (SOLR-2382) DIH Cache Improvements

2011-06-22 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2382:
-

Attachment: (was: SOLR-2382.patch)

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch



[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

2011-06-22 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053432#comment-13053432
 ] 

Mike Sokolov commented on LUCENE-3080:
--

I agree it's necessary to prove there is some point to all this - I'm working 
on getting some numbers.  At the moment I'm just assuming ASCII encoding, but 
I'll take a look at the binary stuff too - thanks.
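The cutover discussed here amounts to handing the highlighter UTF-8 bytes instead of the analyzer's char[] buffer. A minimal sketch of that conversion (plain JDK code, not Lucene's actual BytesRef API):

```java
import java.nio.charset.StandardCharsets;

public class TermBytes {
    // Illustrative sketch: convert an analyzer-style char[] term buffer
    // to UTF-8 bytes, as a BytesRef-based highlighter would consume.
    static byte[] toUtf8(char[] buffer, int offset, int length) {
        return new String(buffer, offset, length).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char[] token = "naïve".toCharArray(); // 5 chars
        byte[] utf8 = toUtf8(token, 0, token.length);
        // 'ï' needs two bytes in UTF-8, so the byte length differs
        System.out.println(utf8.length); // prints 6
    }
}
```

The point of the numbers being gathered above is whether avoiding this per-term char/byte conversion is measurable; for pure-ASCII terms the byte and char lengths are equal.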

 cutover highlighter to BytesRef
 ---

 Key: LUCENE-3080
 URL: https://issues.apache.org/jira/browse/LUCENE-3080
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless

 Highlighter still uses char[] terms (consumes tokens from the analyzer as 
 char[] not as BytesRef), which is causing problems for merging SOLR-2497 to 
 trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-22 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-2793:
--

Attachment: LUCENE-2793.patch

I am not sure whether the MergeInfo used in SegmentMerger#mergeFields

I have kept most of the nocommits there even after correcting it for reference.

In MockDirectoryWrapper#crash(), to randomize IOContext, I have used either a 
READONCE, DEFAULT, or MERGE context. Is this the correct way to go?

In LuceneTestCase#newDirectory(), MockDirectoryWrapper#createOutput(), and 
MockDirectoryWrapper#openInput(), will randomizing the context here help? 

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.
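 The IOContext idea described above can be sketched as a small value class carrying per-operation I/O hints. The field and enum names below are illustrative guesses, not the final LUCENE-2793 API:

```java
public class IOContextSketch {
    enum Context { DEFAULT, MERGE, READONCE }

    // A hypothetical IOContext: an immutable bundle of hints that a
    // Directory could consult in createOutput/openInput.
    static final class IOContext {
        final Context context;
        final int readBufferSize;
        final boolean direct;     // bypass the OS buffer cache (e.g. O_DIRECT)
        final boolean sequential; // advise sequential access

        IOContext(Context context, int readBufferSize, boolean direct, boolean sequential) {
            this.context = context;
            this.readBufferSize = readBufferSize;
            this.direct = direct;
            this.sequential = sequential;
        }
    }

    public static void main(String[] args) {
        // Merging reads large sequential chunks; searching does small random reads.
        IOContext merge  = new IOContext(Context.MERGE, 4096, true, true);
        IOContext search = new IOContext(Context.DEFAULT, 1024, false, false);
        System.out.println(merge.readBufferSize > search.readBufferSize); // prints true
    }
}
```

 With such a class, a DirectIOLinuxDirectory could honor the DIRECT/SEQUENTIAL hints only for merge contexts, which is exactly the reader-pooling concern raised in the description.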

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2586) example work & logs directories needed?

2011-06-22 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053468#comment-13053468
 ] 

David Smiley commented on SOLR-2586:


So if "work" is needed (to avoid rare error conditions if a temp directory is 
used), that still leaves the question of "logs".  The only thing approaching 
use of this directory is some commented-out configuration in jetty.xml. So as 
it stands, it really isn't used. I think if someone uncomments that part of 
jetty.xml, then they can very well make the logs directory.  What I'm after 
here is a little bit of simplification for new users. I certainly don't get any 
heartburn over these directories, but if someone new sees logs and never sees 
anything go there, they might think something is wrong. And removing it is one 
less directory.  I say this after updating my Solr book, walking the users 
through the directory layout in the 1st chapter.  No big deal, but 
simplification/clarity is good.


 example work & logs directories needed?
 ---

 Key: SOLR-2586
 URL: https://issues.apache.org/jira/browse/SOLR-2586
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor

 Firstly, what prompted this issue was me wanting to use a git solr mirror but 
 finding that git's lack of empty-directory support made the "example" ant 
 task fail. This task requires examples/work to be in place so that it can 
 delete its contents. Fixing this was a simple matter of adding:
 {code:xml}
 <mkdir dir="${example}/work" /> <!-- in case not there -->
 {code}
 Right before the delete task.
 But then it occurred to me, why even have a work directory since Jetty will 
 apparently use a temp directory instead. -- try for yourself (stdout snippet):
 bq. 2011-06-11 00:51:26.177:INFO::Extract 
 file:/SmileyDev/Search/lucene-solr/solr/example/webapps/solr.war to 
 /var/folders/zo/zoQJvqc9E0076p0THiri+k+++TI/-Tmp-/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp
 On my Mac, this same directory was used for multiple runs, so somehow Jetty 
 or the VM figures out how to reuse it.
 Since this example setup isn't a *real* installation -- it's just for 
 demonstration, arguably it should not contain what it doesn't need.  
 Likewise, perhaps the empty example/logs directory should be deleted. It's 
 not used by default any way.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2586) example work & logs directories needed?

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053500#comment-13053500
 ] 

Robert Muir commented on SOLR-2586:
---

is this issue really about the git problem or about making things simpler?

If you want to make things simpler, you would be mentioning things like:
* move example-dih to contrib/dih
* remove mapping-ISOLatin1Accent.txt, we have the foldToAscii and it's confusing 
to have both
* ...

But I see you are only targeting empty directories, which cause little confusion 
at all.

 example work & logs directories needed?
 ---

 Key: SOLR-2586
 URL: https://issues.apache.org/jira/browse/SOLR-2586
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Commented] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual

2011-06-22 Thread Itamar Syn-Hershko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053511#comment-13053511
 ] 

Itamar Syn-Hershko commented on LUCENENET-426:
--

Apparently that was not enough. I hit a need to override this one too:

protected Field[] GetFields(IndexReader reader, int docId, String fieldName)

Perhaps it'd make sense to make all protected methods virtual? In Java you can 
override anything that is not final, so that would be compatible with the 
original version.
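For comparison, the Java behavior referred to here: any non-final method is overridable by default, with no `virtual` keyword needed as in C#. The class names below are illustrative, not the actual Lucene.Net classes:

```java
public class VirtualSketch {
    // Stand-in for a base fragments builder whose field lookup a
    // subclass wants to customize (hypothetical names).
    static class BaseBuilderish {
        protected String[] getFields(int docId) {
            return new String[] { "base" };
        }
        public String firstField(int docId) {
            // Dispatches to the subclass override automatically in Java.
            return getFields(docId)[0];
        }
    }

    static class CustomBuilder extends BaseBuilderish {
        @Override
        protected String[] getFields(int docId) {
            return new String[] { "custom" };
        }
    }

    public static void main(String[] args) {
        System.out.println(new CustomBuilder().firstField(1)); // prints custom
    }
}
```

In C#, the same dispatch only happens if `getFields` is declared `virtual` in the base class, which is why marking the Base* methods virtual restores the Java-compatible extension point.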

 Mark BaseFragmentsBuilder methods as virtual
 

 Key: LUCENENET-426
 URL: https://issues.apache.org/jira/browse/LUCENENET-426
 Project: Lucene.Net
  Issue Type: Improvement
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, 
 Lucene.Net 2.9.4g
Reporter: Itamar Syn-Hershko
Priority: Minor
 Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g

 Attachments: fvh.patch


 Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless 
 to have FragmentsBuilder deriving from a class named Base, since most of 
 its functionality cannot be overridden. Attached is a patch for marking the 
 important methods virtual.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-2586) example work & logs directories needed?

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053518#comment-13053518
 ] 

Robert Muir commented on SOLR-2586:
---

by the way, if you want to solve the git problem, upload a patch that adds a 
gitignore file or .keep_me hidden file or whatever, I'll even commit it, and 
I'm the biggest git-hater there is.

then, you could fix your git problem, and separately we could deal with 
simplifying the example.


 example work & logs directories needed?
 ---

 Key: SOLR-2586
 URL: https://issues.apache.org/jira/browse/SOLR-2586
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-3.x - Build # 416 - Failure

2011-06-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-3.x/416/

No tests ran.

Build Log (for compile errors):
[...truncated 10795 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-06-22 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male reassigned LUCENE-2883:
--

Assignee: Chris Male

 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: core/search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Chris Male
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2883.patch


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3232) Move MutableValues to Queries Module

2011-06-22 Thread Chris Male (JIRA)
Move MutableValues to Queries Module


 Key: LUCENE-3232
 URL: https://issues.apache.org/jira/browse/LUCENE-3232
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: Chris Male


Solr makes use of the MutableValue* series of classes to improve performance of 
grouping by FunctionQuery (I think).  As such they are used in ValueSource 
implementations.  Consequently we need to move these classes in order to move 
the ValueSources.

I'll also use this issue to establish the Queries module where the 
FunctionQueries will lie.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-06-22 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053566#comment-13053566
 ] 

Chris Male commented on LUCENE-2883:


Rather than doing all the work in this issue, I'm going to spin off a few 
subtasks and resolve this one by one.

 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: core/search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Chris Male
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2883.patch


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053580#comment-13053580
 ] 

Yonik Seeley commented on LUCENE-3079:
--

bq. if Solr needed to be so deeply intertwined with caching, schema, etc., 
other apps that want to facet will have the same needs

Sort of an aside, but not really: specific applications are much easier.  A 
lot more indirection is required in Solr and a schema is needed for pretty much 
everything.  Without the schema, a client would specify "sort=foo desc" and 
Solr would have no idea how to do that.  A specific application just does it 
because they have the knowledge of what all the fields are.  It's why people 
have gotten along just fine without a schema in Lucene thus far.  If you're 
building another Solr... yes, you need something like a schema.



 Faceting module
 

 Key: LUCENE-3079
 URL: https://issues.apache.org/jira/browse/LUCENE-3079
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3079.patch


 Faceting is a hugely important feature, available in Solr today but
 not [easily] usable by Lucene-only apps.
 We should fix this, by creating a shared faceting module.
 Ideally, we factor out Solr's faceting impl, and maybe poach/merge
 from other impls (eg Bobo browse).
 Hoss describes some important challenges we'll face in doing this
 (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here:
 {noformat}
 To look at faceting as a concrete example, there are big reasons 
 faceting works so well in Solr: Solr has total control over the 
 index, knows exactly when the index has changed to rebuild caches, has a 
 strict schema so it can make sense of field types and 
 pick faceting algos accordingly, has a multi-phase distributed search 
 approach to get exact counts efficiently across multiple shards, etc...
 (and there are still a lot of additional enhancements and improvements 
 that can be made to take even more advantage of knowledge Solr has because 
 it owns the index that no one has had time to tackle)
 {noformat}
 This is a great list of the things we face in refactoring.  It's also
 important because, if Solr needed to be so deeply intertwined with
 caching, schema, etc., other apps that want to facet will have the
 same needs and so we really have to address them in creating the
 shared module.
 I think we should get a basic faceting module started, but should not
 cut Solr over at first.  We should iterate on the module, fold in
 improvements, etc., and then, once we can fully verify that cutting
 over doesn't hurt Solr (ie lose functionality or performance) we can
 later cutover.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053581#comment-13053581
 ] 

Jason Rutherglen commented on LUCENE-3079:
--

Schemas should probably be a module that makes use of embedding the field types 
per-segment; this is something the faceting module could/should use.  I think 
this is what LUCENE-2308 is aiming for?  Though I thought there was another JIRA 
issue created by Simon for this as well.




[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-22 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053587#comment-13053587
 ] 

Chris Male commented on LUCENE-3079:


I don't think any Facet module needs to be concerned with Schemas.  Instead the 
module can expose an API which asks for the information it needs to make the 
best choices.  Solr can then provide that information based on its Schema, while 
pure Lucene users can do it however they want. 
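
The idea above can be sketched as a small callback interface. This is an illustrative sketch only, not an actual Lucene/Solr API: the names (FieldInfoProvider, FieldKind, chooseAlgo) and the algorithm labels are hypothetical, chosen just to show how a facet module could ask for field information rather than depend on a schema.

```java
// Hypothetical sketch: the facet module queries a provider for the field
// information it needs. Solr could back this with its schema; raw Lucene
// apps could supply the information directly. All names are illustrative.
public class FacetInfoDemo {
    public enum FieldKind { STRING, NUMERIC, DATE }

    /** What the facet module asks for instead of reading a schema. */
    public interface FieldInfoProvider {
        FieldKind kindOf(String field);
        boolean isMultiValued(String field);
    }

    /** Picks a (made-up) algorithm name from the provided info. */
    public static String chooseAlgo(FieldInfoProvider info, String field) {
        if (info.kindOf(field) == FieldKind.NUMERIC) return "numeric-range";
        return info.isMultiValued(field) ? "multi-valued-counts"
                                         : "single-valued-counts";
    }

    public static void main(String[] args) {
        // A pure-Lucene app hard-codes its field knowledge in the provider.
        FieldInfoProvider info = new FieldInfoProvider() {
            public FieldKind kindOf(String f) {
                return f.equals("price") ? FieldKind.NUMERIC : FieldKind.STRING;
            }
            public boolean isMultiValued(String f) { return f.equals("tags"); }
        };
        System.out.println(chooseAlgo(info, "price"));  // numeric-range
        System.out.println(chooseAlgo(info, "tags"));   // multi-valued-counts
    }
}
```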




[jira] [Commented] (LUCENE-3079) Faceting module

2011-06-22 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053591#comment-13053591
 ] 

Jason Rutherglen commented on LUCENE-3079:
--

bq. I don't think any Facet module needs to be concerned with Schemas

Right, they should be field type aware.




[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module

2011-06-22 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053593#comment-13053593
 ] 

Chris Male commented on LUCENE-3232:


Code to execute before patch:

{code}
svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function
svn move solr/src/java/org/apache/solr/search/MutableValue.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValue.java
svn move solr/src/java/org/apache/solr/search/MutableValueFloat.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueFloat.java
svn move solr/src/java/org/apache/solr/search/MutableValueBool.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueBool.java
svn move solr/src/java/org/apache/solr/search/MutableValueDate.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueDate.java
svn move solr/src/java/org/apache/solr/search/MutableValueDouble.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueDouble.java
svn move solr/src/java/org/apache/solr/search/MutableValueInt.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueInt.java
svn move solr/src/java/org/apache/solr/search/MutableValueLong.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueLong.java
svn move solr/src/java/org/apache/solr/search/MutableValueStr.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueStr.java
{code}

 Move MutableValues to Queries Module
 

 Key: LUCENE-3232
 URL: https://issues.apache.org/jira/browse/LUCENE-3232
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Reporter: Chris Male
 Fix For: 4.0


 Solr makes use of the MutableValue* series of classes to improve performance 
 of grouping by FunctionQuery (I think).  As such they are used in ValueSource 
 implementations.  Consequently we need to move these classes in order to move 
 the ValueSources.
 I'll also use this issue to establish the Queries module where the 
 FunctionQueries will lie.
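
The performance angle behind MutableValue can be sketched as follows. This is a simplified illustration of the pattern, not Solr's actual MutableValueInt class: one mutable holder object is reused per document, so grouping/ValueSource code avoids allocating a boxed value per hit. The class and field names below follow Solr's convention but the code is an assumption-laden sketch.

```java
// Illustrative sketch of the MutableValue idea (simplified from Solr's
// MutableValueInt): a single holder is refilled per document instead of
// allocating a new boxed Integer for every hit.
public class MutableIntDemo {
    public static class MutableValueInt {
        public int value;
        public boolean exists = true;  // whether the doc actually has a value

        public void copy(MutableValueInt other) {
            value = other.value;
            exists = other.exists;
        }
        @Override public int hashCode() { return value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableValueInt
                && ((MutableValueInt) o).value == value
                && ((MutableValueInt) o).exists == exists;
        }
    }

    public static void main(String[] args) {
        MutableValueInt v = new MutableValueInt();   // one reusable holder
        int[] docValues = {7, 7, 3};
        int groupChanges = 0;
        int prev = Integer.MIN_VALUE;
        for (int dv : docValues) {
            v.value = dv;                            // fill, don't allocate
            if (v.value != prev) groupChanges++;
            prev = v.value;
        }
        System.out.println(groupChanges);            // prints 2
    }
}
```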




[jira] [Updated] (LUCENE-3232) Move MutableValues to Queries Module

2011-06-22 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3232:
---

Attachment: LUCENE-3232.patch

Patch that establishes the Queries module and moves the MutableValue classes.  
Includes IntelliJ, Eclipse and Maven work.

Everything compiles and tests pass.

It'd be great if someone could review.  I'll commit in a few days.




[jira] [Commented] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level

2011-06-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053597#comment-13053597
 ] 

Yonik Seeley commented on SOLR-2615:


bq. You may notice I use SLF4J's nifty log.debug("message blah {} blah", var) 
syntax, which is both performant and concise, as there's no point in guarding 
the debug message with an isDebugEnabled() since debug() will internally check 
this anyway, and there is no string concatenation if debug isn't enabled.

I think there is still a point to caching isDebugEnabled() though.  The 
implementation most likely involves checking volatile variables, and can 
involve checking a hierarchy of loggers.  I assume the cost may be different 
for different logging implementations too.  Better to just cache if you can and 
not worry about it.
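
The trade-off being discussed can be sketched as follows. To keep it self-contained this uses a stand-in logger rather than SLF4J's real Logger (the method names mirror SLF4J's isDebugEnabled()/debug()): the {} placeholder form already avoids string concatenation when debug is off, but isDebugEnabled() itself can involve volatile reads or walking a logger hierarchy, so the suggestion is to read it once per request and reuse the cached boolean.

```java
// Sketch of caching isDebugEnabled() once per request. StandInLogger is a
// stand-in so the example runs without SLF4J; real code would hold an
// org.slf4j.Logger and call the same two methods.
public class DebugLogDemo {
    public static class StandInLogger {
        public boolean debugEnabled;
        public int formatCalls;               // how often we actually formatted
        public boolean isDebugEnabled() { return debugEnabled; }
        public void debug(String msg, Object arg) {
            // SLF4J also checks internally; format only when enabled.
            if (debugEnabled) formatCalls++;
        }
    }

    public static void processCommands(StandInLogger log, String[] cmds) {
        // Read the (potentially non-trivial) flag once, not per command.
        final boolean debug = log.isDebugEnabled();
        for (String cmd : cmds) {
            if (debug) log.debug("processing {}", cmd);
        }
    }

    public static void main(String[] args) {
        StandInLogger log = new StandInLogger();
        log.debugEnabled = true;
        processCommands(log, new String[] {"add", "delete"});
        System.out.println(log.formatCalls);  // prints 2
    }
}
```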

 Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE 
 level
 ---

 Key: SOLR-2615
 URL: https://issues.apache.org/jira/browse/SOLR-2615
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: David Smiley
Priority: Minor
 Fix For: 3.3

 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch


 It would be great if the LogUpdateProcessor logged each command (add, delete, 
 ...) at debug (Fine) level. Presently it only logs a summary of 8 commands 
 and it does so at the very end.
 The attached patch implements this.
 * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the 
 debug level log happens before Solr does anything with it. It should not 
 affect the ordering of the existing summary log which happens at finish(). 
 * I changed UpdateRequestProcessor's static log variable to be an instance 
 variable that uses the current class name. I think this makes much more sense 
 since I want to be able to alter logging levels for a specific processor 
 without doing it for all of them. This change did require me to tweak the 
 factory's detection of the log level which avoids creating the 
 LogUpdateProcessor.
 * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event 
 there is no schema unique field. I fixed that.
 You may notice I use SLF4J's nifty log.debug("message blah {} blah", var) 
 syntax, which is both performant and concise, as there's no point in guarding 
 the debug message with an isDebugEnabled() since debug() will internally 
 check this anyway, and there is no string concatenation if debug isn't 
 enabled.




[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module

2011-06-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053599#comment-13053599
 ] 

Yonik Seeley commented on LUCENE-3232:
--

These are useful beyond function queries... perhaps they should not be in the 
function module?




[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module

2011-06-22 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053600#comment-13053600
 ] 

Chris Male commented on LUCENE-3232:


I've debated this backwards and forwards.  Do they have a use case outside of 
function queries at the moment?  If so, then yeah, I'll happily put them 
somewhere else.  Otherwise I'll cross that bridge at the time.




[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module

2011-06-22 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053601#comment-13053601
 ] 

Chris Male commented on LUCENE-3232:


Actually scrap that question, I'll put them somewhere else immediately.




[jira] [Updated] (LUCENE-3232) Move MutableValues to Common Module

2011-06-22 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-3232:
---

Summary: Move MutableValues to Common Module  (was: Move MutableValues to 
Queries Module)



