[JENKINS] Lucene-trunk - Build # 1602 - Failure
Build: https://builds.apache.org/job/Lucene-trunk/1602/ No tests ran. Build Log (for compile errors): [...truncated 8987 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053079#comment-13053079 ] Dawid Weiss commented on LUCENE-2341: - bq. Dawid, do you think it's reasonable to optimize further and use directly a list returned by IStemmer.lookup (instead of copying with addAll) ? My concern is that (at least in current DictionaryLookup implementation) that list seems to be shared by distinct invocations of the lookup method, which would make the use of a specific IStemmer not applicable in thread-safe code. IStemmer implementations are not thread safe anyway, so there is no problem in reusing that list. In fact, the returned WordData objects are reused internally as well, so you can't store them either (this is done to avoid GC overhead). So yes: I missed that, but you'll need to ensure IStemmer instances are not shared. This can be done in various ways (thread local, etc), but I think the simplest way to do it would be to instantiate PolishStemmer at the MorfologikFilter level. This is cheap (the dictionary is loaded once anyway). You can then create two constructors in the analyzer -- one with PolishStemmer.DICTIONARY and one with the default (I'd suggest MORFOLOGIK). Exposing IStemmer constructor will do more harm than good -- thinking ahead is good, but in this case I don't think there'll be this many people interested in subclassing IStemmer (if anything, they'll plug into Lucene's infrastructure directly). A simple test case spawning 5 or 10 threads in a parallel executor and crunching stems on the same analyzer would also be nice to ensure we have everything correct wrt multithreading, but it's not that crucial if you don't have the time to write it. Thanks! explore morfologik integration -- Key: LUCENE-2341 URL: https://issues.apache.org/jira/browse/LUCENE-2341 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Robert Muir Assignee: Dawid Weiss Attachments: LUCENE-2341.diff, LUCENE-2341.diff, morfologik-stemming-1.5.0.jar Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available: http://sourceforge.net/projects/morfologik/ This works differently than LUCENE-2298, and ideally would be another option for users. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
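For reference, a minimal sketch of the multithreaded check Dawid suggests above (several threads crunching stems on the same analyzer and comparing results). This is not code from the patch; the analyzer instance, field name, and sample text are placeholders, and the helper works with any Analyzer under test.
{code}
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemStressCheck {
  /** Runs the same analysis on N threads and verifies every thread sees identical tokens. */
  public static void assertThreadSafe(final Analyzer analyzer, final String text, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<String>>> results = new ArrayList<Future<List<String>>>();
      for (int i = 0; i < threads; i++) {
        results.add(pool.submit(new Callable<List<String>>() {
          public List<String> call() throws Exception {
            List<String> terms = new ArrayList<String>();
            TokenStream ts = analyzer.tokenStream("field", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
              terms.add(term.toString());
            }
            ts.end();
            ts.close();
            return terms;
          }
        }));
      }
      // if the filter keeps its own IStemmer (not shared), all threads must agree
      List<String> expected = results.get(0).get();
      for (Future<List<String>> f : results) {
        if (!expected.equals(f.get())) {
          throw new AssertionError("thread-safety problem, got: " + f.get());
        }
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}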
[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-2399: Attachment: SOLR-2399-110622.patch Okay, there we go: {quote}On the 'java-properties' page, is the UI assuming ':' is the path separator? Can this use the value of path.separator to split?{quote} Yes Yes - Done [[commit|https://github.com/steffkes/solr-admin/commit/abb57cacb4a8aa11e406da32ecfa0e2b3caf07be]] bq. Should the Ping query append a random number so that it avoids HTTP cache? Good Idea! - Done [[commit|https://github.com/steffkes/solr-admin/commit/61f24c2b08e5b8ca847d197374abf1b3fbd0595a]] bq. Something for the wishlist... on the threads page, it would be great to have a button to expand (and collapse?) all the stack traces. It's hard to figure out which thread is doing what just from the title. I've added a Button at the Top and the Bottom of the Table to show/hide all of them w/ one click [[commit|https://github.com/steffkes/solr-admin/commit/26378c34ecebe34ce6e80292d8fb02acacb69ead]] Attached Patch contains all git-changes since our last SVN-Commit. Could you also include those images Ryan? They will not go into the SVN-Diff because of their binary type :/ * https://github.com/steffkes/solr-admin/raw/master/img/ico/toolbox.png * https://github.com/steffkes/solr-admin/raw/master/img/ico/zone.png * https://github.com/steffkes/solr-admin/raw/master/img/ico/system-monitor--exclamation.png Thanks! :) Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
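As an aside, the path-separator point discussed above boils down to splitting on the JVM's path.separator property (';' on Windows, ':' elsewhere) rather than hard-coding ':'. The admin UI itself is JavaScript, so this is only an illustration of the idea in plain Java, not code from the patch; the class and method names are made up.
{code}
import java.util.regex.Pattern;

public class PathSplitExample {
  /** Splits a path-like value on the platform's path separator instead of assuming ':'. */
  public static String[] splitPathLikeProperty(String value) {
    String sep = System.getProperty("path.separator"); // ";" on Windows, ":" on Unix
    return value.split(Pattern.quote(sep));
  }

  public static void main(String[] args) {
    for (String entry : splitPathLikeProperty(System.getProperty("java.class.path"))) {
      System.out.println(entry);
    }
  }
}
{code}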
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053080#comment-13053080 ] Bill Bell commented on SOLR-2242: - Simon, I made all those changes except for the termsList one. I think it is useful to have the count based on terms. See attachment. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
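For readers following along, a minimal SolrJ sketch of the first example request above, assuming the patch's facet.numFacetTerms parameter is applied to the server. The server URL and field name are illustrative; CommonsHttpSolrServer is used as the SolrJ client class of that era.
{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NamedDistinctExample {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("price");
    q.set("facet.mincount", "1");
    q.set("facet.limit", "-1");
    q.set("facet.numFacetTerms", "1"); // per the issue: 1 = return only the distinct-value count
    QueryResponse rsp = server.query(q);
    // the numFacetTerms entry appears under facet_counts/facet_fields/price
    System.out.println(rsp.getResponse().get("facet_counts"));
  }
}
{code}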
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated SOLR-2242: Attachment: SOLR-2242.shard.patch New patch ready for commit? Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files
[ https://issues.apache.org/jira/browse/SOLR-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-2612. - Resolution: Fixed Committed revision 1138319 on trunk and revision 1138320 on branch_3x. Add testpackage and testpackageroot conditions to clustering and analysis-extras build files Key: SOLR-2612 URL: https://issues.apache.org/jira/browse/SOLR-2612 Project: Solr Issue Type: Task Components: Build Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 3.3, 4.0 Attachments: SOLR-2612.patch Clustering and analysis-extras are the only two build files which do not have testpackage and testpackageroot exclusions wired into the build file. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053090#comment-13053090 ] Noble Paul commented on SOLR-2382: -- The patch does not apply on trunk DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. 
Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and verified that all existing test cases
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053097#comment-13053097 ] Ryan McKinley commented on SOLR-2399: - Thanks Stefan. Added in #1138323 Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk - Build # 1602 - Failure
Second attempt at fixing the javadoc. Passes for me now. On Wed, Jun 22, 2011 at 6:29 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-trunk/1602/ No tests ran. Build Log (for compile errors): [...truncated 8987 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053110#comment-13053110 ] Ryan McKinley commented on SOLR-2399: - in #1138328, I added a min-width value -- this should keep things from looking ridiculous when it gets really small Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053113#comment-13053113 ] Simon Willnauer commented on SOLR-2242: --- bq. New patch ready for commit? Bill, I still see lots of whitespace / indentation problems in that latest patch. Anyway I looked at it and I wonder if we could restructure this a little: we could first check if termList != null and do all the cases there, and if termList == null we get the term counts limit; that would remove all the redundant getTermCountsLimit / getListedTermCounts calls. The termList != null case seems very easy and straightforward:
{code}
if (termList != null) {
  NamedList<Integer> counts = getListedTermCounts(facetValue, termList);
  switch (numFacetTerms) {
    case COUNTS:
      final NamedList<Integer> resCount = new NamedList<Integer>();
      counts = resCount;
    case COUNTS_AND_VALUES:
      counts.add("numFacetTerms", counts.size());
      break;
  }
  res.add(key, counts);
} else {
  ...
{code}
Yet, it's hard to refactor this without a single test (note, there might be a bug). I would be really happy to see a test-case for this that tests all the variations. Regarding the constants, I think the default case should be a constant too. If you use NamedList can you make sure you put the right generic to it if possible, otherwise my IDE goes wild and adds warnings all over the place. In your case NamedList<Integer> works fine. simon Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
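A rough sketch of the kind of test Simon asks for, based only on the parameter and response format described in this issue. The schema fields (id, price) and the xpath are assumptions derived from the example output above; this is not code from any of the attached patches.
{code}
import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestNumFacetTerms extends SolrTestCaseJ4 {
  @BeforeClass
  public static void beforeTests() throws Exception {
    initCore("solrconfig.xml", "schema.xml");
  }

  @Test
  public void testDistinctValueCount() {
    assertU(adoc("id", "1", "price", "10.0"));
    assertU(adoc("id", "2", "price", "10.0"));
    assertU(adoc("id", "3", "price", "20.0"));
    assertU(commit());

    // numFacetTerms=1: only the number of distinct facet values should be returned
    assertQ(req("q", "*:*", "facet", "true", "facet.field", "price",
                "facet.mincount", "1", "facet.limit", "-1",
                "facet.numFacetTerms", "1"),
        "//lst[@name='facet_fields']/lst[@name='price']/int[@name='numFacetTerms'][.='2']");
  }
}
{code}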
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053115#comment-13053115 ] Robert Muir commented on LUCENE-3226: - {quote} Also, in LUCENE-2921 I plan to get rid of all those ridiculous constant names and track the index version at the segment level only. It will be easier, IMO, to have an easy to understand constant name when it comes to supporting an older index (or removing support for it). Perhaps it's only me, but when I read those format constant names, I only did that when removing support for older indexes. Other than that, they are not very interesting ... What Hoss reported about CheckIndex is the real problem we should handle here. SegmentInfo prints in its toString the code version which created it, which is better than seeing -9 IMO, and that should be 3.1 or 3.2. If it's a 3.2.0 newly created index, you shouldn't see 3.1 reported from SegmentInfos.toString. Perhaps CheckIndex needs to be fixed to refer to Constants.LUCENE_MAIN_VERSION instead? Robert, shall we reopen the issue to discuss? {quote} We can reopen... but the issue will always exist here, LUCENE-2921 can't solve this particular case since it's the segments file... rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Assignee: Robert Muir Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2606) Solr sort no longer works on field names with some punctuation in them
[ https://issues.apache.org/jira/browse/SOLR-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053121#comment-13053121 ] Jan Høydahl commented on SOLR-2606: --- Perhaps a test-class producing randomized (legal) field names could be of use for this and other tests? Solr sort no longer works on field names with some punctuation in them -- Key: SOLR-2606 URL: https://issues.apache.org/jira/browse/SOLR-2606 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1, 3.2 Environment: Linux Reporter: Mitsu Hadeishi We just upgraded from Solr 1.4 to 3.2. For the most part the upgrade went fine, however we discovered that sorting on field names with dashes in them is no longer working properly. For example, the following query used to work: http://[our solr server]/select/?q=computer&sort=static-need-binary+asc and now it gives this error: HTTP Status 400 - undefined field static type Status report message undefined field static description The request sent by the client was syntactically incorrect (undefined field static). It appears the parser for sorting has been changed so that it now tokenizes differently, and assumes field names cannot have dashes in them. However, field names clearly can have dashes in them. The exact same query which worked fine for us in 1.4 is now breaking in 3.2. Changing the sort field to use a field name that doesn't have a dash in it works just fine. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053123#comment-13053123 ] Shai Erera commented on LUCENE-3226: How does renaming a constant solve the CheckIndex issue? I commented on the constant name, and I think it should reflect the code version it applies to, not the feature. Because if e.g. in the same version you add two features, incrementally, you wouldn't change the format number twice, right? And then the constant name becomes meaningless again, or too complicated. It happened to me a while ago (can't remember the exact feature though, perhaps it was in TermInfos). I mentioned LUCENE-2921 only because I intended to name the constants exactly that (X_Y). I see you've already reverted the changes you made. I think that the changes to CheckIndex could remain though, adding the "3.1+" to the string? rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name of SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8982 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8982/ All tests passed Build Log (for compile errors): [...truncated 16696 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Solr-trunk - Build # 1540 - Failure
Build: https://builds.apache.org/job/Solr-trunk/1540/ All tests passed Build Log (for compile errors): [...truncated 17830 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218_3x.patch Here is a patch against 3.x. I had to change one test in lucene/backwards and remove some tests from there which used the CFW / CFR. A review would be good here! Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053142#comment-13053142 ] Mark Harwood commented on LUCENE-2454: -- bq. Could that work for your use case? Sounds like it, that's great :) Do you think there any efficiencies to be gained on the document retrieve side of things if you know that the documents commonly being retrieved are physically nearby i.e. an app will often retrieve a parent's fields and then those from child docs which are required to be physically located adjacent to the parent's data. Would existing lower-level caching in Directory or the OS mean there's already a good chance of finding child data in cached blocks or could a change to file structures and/or doc retrieve APIs radically boost parent-plus-child retrieve performance? Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-flexscoring-branch - Build # 17 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-flexscoring-branch/17/ All tests passed Build Log (for compile errors): [...truncated 12028 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8979 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8979/ All tests passed Build Log (for compile errors): [...truncated 11584 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8983 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8983/ All tests passed Build Log (for compile errors): [...truncated 16184 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8980 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8980/ All tests passed Build Log (for compile errors): [...truncated 12238 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3218: Attachment: LUCENE-3218_tests.patch Hi Simon, currently this attached patch fails... not sure why yet. But I think we should resolve this test issue before backporting. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-1431. -- Resolution: Fixed I have committed it to trunk. We may need more iterations to clean it up CommComponent abstracted Key: SOLR-1431 URL: https://issues.apache.org/jira/browse/SOLR-1431 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Noble Paul Fix For: 4.0 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8984 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8984/ All tests passed Build Log (for compile errors): [...truncated 15320 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8981 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8981/ All tests passed Build Log (for compile errors): [...truncated 12732 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218_test_fix.patch Thank you Robert. While this has actually been tested already (since it's in the base class), it's now cleaner. The test failure came from RAMDirectory simply overwriting existing files. I added an explicit check for it. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
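To make the "explicit check" concrete, here is a hypothetical illustration of the kind of overwrite guard being described: fail loudly instead of silently replacing a file that was already written. This is not the actual MockDirectoryWrapper change from the attached patch; the class, field names, and the simplified createOutput signature are all assumptions.
{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;

/** Simplified delegating helper; only createOutput is shown, not a full Directory subclass. */
class OverwriteCheckingDirectory {
  private final Directory delegate;
  private final Set<String> created = new HashSet<String>();

  OverwriteCheckingDirectory(Directory delegate) {
    this.delegate = delegate;
  }

  synchronized IndexOutput createOutput(String name) throws IOException {
    // refuse to silently overwrite a file that already exists or was already created here
    if (created.contains(name) || delegate.fileExists(name)) {
      throw new IOException("file \"" + name + "\" was already written to");
    }
    created.add(name);
    return delegate.createOutput(name);
  }
}
{code}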
[jira] [Updated] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
[ https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-2610: Attachment: SOLR-2610-branch3x.patch Patch for branch 3x Add an option to delete index through CoreAdmin UNLOAD action - Key: SOLR-2610 URL: https://issues.apache.org/jira/browse/SOLR-2610 Project: Solr Issue Type: Improvement Components: multicore Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2610-branch3x.patch, SOLR-2610.patch Right now, one can unload a Solr Core but the index files are left behind and consume disk space. We should have an option to delete the index when unloading a core. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053181#comment-13053181 ] Robert Muir commented on LUCENE-3218: - Thanks Simon, I feel better now that we get our open-files-for-write tracking back. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself but, more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8985 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8985/ All tests passed Build Log (for compile errors): [...truncated 16738 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
[ https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-2610. - Resolution: Fixed Committed revision 1138405 on trunk and 1138407 on branch_3x. Add an option to delete index through CoreAdmin UNLOAD action - Key: SOLR-2610 URL: https://issues.apache.org/jira/browse/SOLR-2610 Project: Solr Issue Type: Improvement Components: multicore Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2610-branch3x.patch, SOLR-2610.patch Right now, one can unload a Solr Core but the index files are left behind and consume disk space. We should have an option to delete the index when unloading a core. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #159: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/159/ No tests ran. Build Log (for compile errors): [...truncated 7519 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-3228: --- Assignee: Robert Muir build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053194#comment-13053194 ] Robert Muir commented on LUCENE-3228: - as a start, i installed the two freebsd ports for java doc on hudson into /usr/local/share/doc/jdk1.5 and jdk1.6 I'll see if i can add the hooks to the build scripts now build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8986 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8986/ All tests passed Build Log (for compile errors): [...truncated 16135 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053198#comment-13053198 ] Robert Muir commented on LUCENE-3228: - As a partial solution, I setup the 30 minute builds to just directly override javadoc.link (and javadoc.link.java for Solr) for our 30 minute builds... we don't care about the actual javadoc artifacts or where the links actually point to, only that there are no warnings. This is in r1138418 build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3228) build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading
[ https://issues.apache.org/jira/browse/LUCENE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053200#comment-13053200 ] Robert Muir commented on LUCENE-3228: - I noticed also that solr uses an online link for junit javadocs... we should download this one and do the same, too. I'll look at this if the link for the sun javadocs takes for the 30 minute builds. build should allow you (especially hudson) to refer to a local javadocs installation instead of downloading --- Key: LUCENE-3228 URL: https://issues.apache.org/jira/browse/LUCENE-3228 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Currently, we fail on all javadocs warnings. However, you get a warning if it cannot download the package-list from sun.com So I think we should allow you optionally set a sysprop using linkoffline. Then we would get much less hudson fake failures I feel like Mike opened an issue for this already but I cannot find it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8983 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8983/ All tests passed Build Log (for compile errors): [...truncated 13538 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3229) Overlapped SpanNearQuery
Overlapped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test): w1 w2 w3 w4 w5 If I try to search for this span query: spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned, and I think it should not be, because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basically it modifies the two docSpansOrdered functions to make sure that the spans do not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
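For clarity, the reported query written out against the span query API, reconstructed from its toString() above; this is not code from the attached patch, and the field/term names come from the TestNearSpansOrdered data.
{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class OverlapQueryExample {
  public static SpanNearQuery build() {
    // inner ordered span: w3 ... w5 with slop 1
    SpanNearQuery inner = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("field", "w3")),
        new SpanTermQuery(new Term("field", "w5")) }, 1, true);
    // outer ordered span with slop 0: the reporter argues this should NOT match
    // "w1 w2 w3 w4 w5", because w4 overlaps the inner span (w3..w5) instead of following it
    return new SpanNearQuery(new SpanQuery[] {
        inner,
        new SpanTermQuery(new Term("field", "w4")) }, 0, true);
  }
}
{code}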
[jira] [Updated] (LUCENE-3229) Overlapped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated LUCENE-3229: Attachment: SpanOverlapTestUnit.diff Added the unit test. Overlapped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test): w1 w2 w3 w4 w5 If I try to search for this span query: spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned, and I think it should not be, because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basically it modifies the two docSpansOrdered functions to make sure that the spans do not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated LUCENE-3229: Attachment: SpanOverlap.diff add a Patch. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
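For readers following the patch discussion above, a minimal sketch of the kind of ordering-plus-overlap check being described; the method name and signature here are illustrative assumptions, not the actual contents of SpanOverlap.diff:
{code}
// Illustrative sketch only: a span pair counts as "ordered" only if span 1 both
// starts before span 2 and ends at or before span 2's start (no overlap allowed).
// The real patch modifies the two docSpansOrdered(...) helpers; names here are assumptions.
static boolean spansOrderedNonOverlapping(int start1, int end1, int start2, int end2) {
  return start1 < start2 && end1 <= start2;
}
{code}
With such a check, the example above is rejected: the inner span covering w3..w5 ends after w4 begins, so the two spans overlap and the document no longer matches.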
[jira] [Commented] (SOLR-1298) FunctionQuery results as pseudo-fields
[ https://issues.apache.org/jira/browse/SOLR-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053212#comment-13053212 ] Koji Sekiguchi commented on SOLR-1298: -- Hi, I'm using the solr example data on trunk. If I post q=ipod&fl=score,price, Solr returns score and price as expected. But if I post q=ipod&fl=score,log(price), Solr returns score, the value of log(price) and all the rest of the fields. FunctionQuery results as pseudo-fields -- Key: SOLR-1298 URL: https://issues.apache.org/jira/browse/SOLR-1298 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Yonik Seeley Priority: Minor Fix For: 4.0 Attachments: SOLR-1298-FieldValues.patch, SOLR-1298.patch It would be helpful if the results of FunctionQueries could be added as fields to a document. A couple of options here: 1. Run the FunctionQuery as part of the relevance score and add that piece to the document 2. Run the function (not really a query) during Document/Field retrieval -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8984 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8984/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce Error Message: MockDirectoryWrapper: cannot close: there are still open files: {} Stack Trace: java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {} at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:473) at org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279) at org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce(TestIndexWriterWithThreads.java:366) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343) Build Log (for compile errors): [...truncated 3264 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Attachment: SOLR-1979.patch New version. Example of accepted params:
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <defaults>
    <str name="langid">true</str>
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
    <str name="langid.langsField">languages</str>
    <str name="langid.overwrite">false</str>
    <float name="langid.threshold">0.5</float>
    <str name="langid.whitelist">no,en,es,dk</str>
    <str name="langid.map">true</str>
    <str name="langid.map.fl">title,text</str>
    <bool name="langid.map.overwrite">false</bool>
    <bool name="langid.map.keepOrig">false</bool>
    <bool name="langid.map.individual">false</bool>
    <str name="langid.map.individual.fl"></str>
    <str name="langid.fallbackFields">meta_content_language,lang</str>
    <str name="langid.fallback">en</str>
  </defaults>
</processor>
{code}
The only mandatory parameter is langid.fl. To enable field name mapping, set langid.map=true. It will then map field names for all fields in langid.fl. If the set of fields to map is different from langid.fl, supply langid.map.fl. Those fields will then be renamed with a language suffix equal to the language detected from the langid.fl fields. If you require detecting languages separately for each field, supply langid.map.individual=true. The supplied fields will then be renamed according to the detected language on an individual basis. If the set of fields to detect individually is different from the already supplied langid.fl or langid.map.fl, supply langid.map.individual.fl. The fields listed in langid.map.individual.fl will then be detected individually, while the rest of the mapping fields will be mapped according to the global document language. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this:
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}
It will then read the text from the inputFields name and subject, perform language identification and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
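To make the mapping behaviour described above concrete, a small before/after illustration; the field values and the detected language are made up for the example, and the suffix convention follows the description above (mapped fields get the detected ISO code appended):
{noformat}
Input document (langid.fl=title,subject,text,keywords; langid.map.fl=title,text):
  title = "Hvor mange land finnes det?"
  text  = "Dette er bare et eksempel ..."

After the processor, with document language detected as "no":
  language_s = no            (langid.langField)
  title_no   = "Hvor mange land finnes det?"
  text_no    = "Dette er bare et eksempel ..."
{noformat}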
[jira] [Updated] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-1979: -- Description: Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. was: We need the ability to detect language of some random text in order to act upon it, such as indexing the content into language aware fields. Another usecase is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this: {code:xml} processor class=org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory str name=inputFieldsname,subject/str str name=outputFieldlanguage_s/str str name=idFieldid/str str name=fallbacken/str /processor {code} It will then read the text from inputFields name and subject, perform language identification and output the ISO code for the detected language in the outputField. If no language was detected, fallback language is used. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053227#comment-13053227 ] Jan Høydahl commented on SOLR-1979: --- One question regarding the JUnit test: I now use {code} assertU(commit()); {code} How can I add update request params to this commit? To select another update chain from different tests, I'd like to add update params on the fly, e.g.: {code} assertU(commit(), update.chain=langid2); {code} Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053237#comment-13053237 ] Shai Erera commented on LUCENE-3226: how about printing the oldest and newest segment version? rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8984 - Still Failing
I just committed a fix for this simon On Wed, Jun 22, 2011 at 2:51 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8984/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce Error Message: MockDirectoryWrapper: cannot close: there are still open files: {} Stack Trace: java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {} at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:473) at org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279) at org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringAbortWithThreadsOnlyOnce(TestIndexWriterWithThreads.java:366) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343) Build Log (for compile errors): [...truncated 3264 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053241#comment-13053241 ] Robert Muir commented on LUCENE-3226: - This would be good (as we can compute it from the segments file), but we just have to think about how to display the case where this is null: we know it's <= 3.0 in this case... but we don't know any more than that? Still, we should do it, especially in 4.x when most indexes being checkIndexed will have this filled out (except 3.0 indexes). rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name SegmentInfos.FORMAT_3_1 seems like a poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it "Lucene 3.1", which is misleading since that format is always used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with "and later" (ie: "Lucene 3.1 and later") so when we release versions w/o a format change we don't have to remember to manually list them in CheckIndex. When we *do* make format changes and update CheckIndex, "and later" can be replaced with "to X.Y" and the new format can be added. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3218. - Resolution: Fixed backported to 3.x - thanks guys Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053245#comment-13053245 ] Mike Sokolov commented on LUCENE-3080: -- There could be a good reason though for using byte-offsets in highlighting. I have in mind an optimization that would pull in text from an external file or other source, enabling highlighting without stored fields. For best performance the snippet should be pulled from the external source using random access to storage, but this requires byte offsets. I think this might be a big win for large field values. This could only be done if the highlighter doesn't need to perform any text manipulation itself, so it's not really appropriate for Highlighter, as Robert said, but in the case of FVH it might be possible to implement. I'm looking at this, but wondering before I get too deep in if anyone can comment on the feasibility of using byte offsets - I'm unclear on what they get used for other than highlighting: would it cause problems to have a CharFilter that returns corrected offsets such that char positions in the analyzed text are translated into byte positions in the source text? cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
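A rough sketch of the random-access retrieval Mike describes above, assuming byte offsets are available and the original text lives in an external UTF-8 file; the file path and method name are illustrative only, not part of any patch:
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Pull a snippet straight from the external source using byte offsets,
// without requiring the field to be stored in the index.
static String readSnippet(String path, long startByte, int lengthBytes) throws IOException {
  try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
    byte[] buf = new byte[lengthBytes];
    raf.seek(startByte);   // random access: no need to scan from the start of the file
    raf.readFully(buf);
    return new String(buf, StandardCharsets.UTF_8);
  }
}
{code}
This only works if the indexed offsets are byte offsets into the source, which is exactly the question raised above about whether a CharFilter could be made to report them.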
[jira] [Commented] (LUCENE-3226) rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex
[ https://issues.apache.org/jira/browse/LUCENE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053248#comment-13053248 ] Shai Erera commented on LUCENE-3226: We can print pre-3.1. But, if somebody opened a 3.0 / 2.x index w/ 3.1+ and all segments were 'touched' by the 3.1+ code, then their version would be 3.0 or 2.x (i.e., not null). So it could be that someone opens two indexes, and CheckIndex reports oldVersion=pre-3.1 for one and oldVersion=2.x for the other. I think it's acceptable though. rename SegmentInfos.FORMAT_3_1 and improve description in CheckIndex Key: LUCENE-3226 URL: https://issues.apache.org/jira/browse/LUCENE-3226 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.1, 3.2 Reporter: Hoss Man Fix For: 3.3, 4.0 Attachments: LUCENE-3226.patch A 3.2 user recently asked if something was wrong because CheckIndex was reporting his (newly built) index version as... {noformat} Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] {noformat} It seems like there are two very confusing pieces of information here... 1) the variable name of SegmentInfos.FORMAT_3_1 seems like poor choice. All other FORMAT_* constants in SegmentInfos are descriptive of the actual change made, and not specific to the version when they were introduced. 2) whatever the name of the FORMAT_* variable, CheckIndex is labeling it Lucene 3.1, which is missleading since that format is alwasy used in 3.2 (and probably 3.3, etc...). I suggest: a) rename FORMAT_3_1 to something like FORMAT_SEGMENT_RECORDS_VERSION b) change CheckIndex so that the label for the newest format always ends with and later (ie: Lucene 3.1 and later) so when we release versions w/o a format change we don't have to remember to manual list them in CheckIndex. when we *do* make format changes and update CheckIndex and later can be replaced with to X.Y and the new format can be added -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053254#comment-13053254 ] Robert Muir commented on LUCENE-3080: - Mike, its an interesting idea, as I think the offsets are intended to be opaque to the app (so you should be able to use byte offsets if you want). There are some problems though, especially tokenfilters that muck with offsets: NGramTokenFilter, WordDelimiterFilter, ... In general there are assumptions here that offsets are utf16. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053275#comment-13053275 ] Mike Sokolov commented on LUCENE-3080: -- It might be a bit more complicated? Looks like WordDelimiterFilter, in generatePart and concatenate, eg, performs computation with the offsets. So it would either need to know the units of the offsets it was passed, or be given more than just a correctOffset() method: rather it seems to require something like addCharsToOffset (offset, charOffsetIncr), where charOffsetIncr is a number of chars, but offset is in some unspecified unit. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
[ https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3216: Attachment: LUCENE-3216_floats.patch here is a first patch that converts the floats impl to buffer values in ram during indexing but writes values directly during merge. all tests pass I plan to commit this soon too. Rather go small iterations here instead of a large patch. Store DocValues per segment instead of per field Key: LUCENE-3216 URL: https://issues.apache.org/jira/browse/LUCENE-3216 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3216_floats.patch currently we are storing docvalues per field which results in at least one file per field that uses docvalues (or at most two per field per segment depending on the impl.). Yet, we should try to by default pack docvalues into a single file if possible. To enable this we need to hold all docvalues in memory during indexing and write them to disk once we flush a segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3230) Make FSDirectory.fsync() public and static
Make FSDirectory.fsync() public and static -- Key: LUCENE-3230 URL: https://issues.apache.org/jira/browse/LUCENE-3230 Project: Lucene - Java Issue Type: New Feature Components: core/store Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.3, 4.0 I find FSDirectory.fsync() (today a protected instance method) very useful as a utility to sync() files. I'd like to create an FSDirectory.sync() utility which contains the exact same impl as FSDir.fsync(), and have the latter call it. We can make it part of IOUtils too, as it's a completely standalone utility. I would get rid of FSDir.fsync() if it weren't protected (as if encouraging people to override it). I doubt anyone really overrides it (our core Directories don't). Also, while reviewing the code, I noticed that if an IOE occurs, the code sleeps for 5 msec. If an InterruptedException occurs then, it immediately throws ThreadIE, completely ignoring the fact that it slept due to the IOE. Shouldn't we at least pass IOE.getMessage() on to ThreadIE? The patch is trivial, so I'd like to get some feedback before I post it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
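For context, the behavior being discussed is essentially a retry loop around FileDescriptor.sync(); a simplified sketch of what a standalone IOUtils-style helper could look like (written from memory of the general pattern, not the actual FSDirectory code or the proposed patch):
{code}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Simplified sketch of a standalone fsync utility: retry a few times on IOException,
// syncing the file descriptor so the OS flushes the file to stable storage.
static void fsync(File file) throws IOException {
  IOException lastIOE = null;
  for (int retry = 0; retry < 5; retry++) {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
      raf.getFD().sync();
      return;
    } catch (FileNotFoundException fnfe) {
      throw fnfe;                 // nothing to sync
    } catch (IOException ioe) {
      lastIOE = ioe;
      try {
        Thread.sleep(5);          // the 5 msec pause mentioned above
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        // keep the original IOException's message, as suggested in the issue
        throw new RuntimeException("interrupted while syncing: " + ioe.getMessage(), ie);
      }
    }
  }
  throw lastIOE;
}
{code}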
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053281#comment-13053281 ] Robert Muir commented on LUCENE-3080: - yes: in general I think it would be problematic, especially since most tests use only all-ascii data. Another problem on this issue is that if you want to use bytes, but with the Tokenizer-analysis-chain, it only takes Reader, so you cannot assume anything about the original bytes or encoding (e.g. that its UTF-8 for example). cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
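As a starting point for the kind of prototype suggested above, a toy byte-level splitter that works purely on bytes (splitting on the space byte) without going through Reader at all; this is deliberately independent of Lucene's attribute API and is only meant to show the shape of an all-ascii, byte-based chain:
{code}
import java.util.ArrayList;
import java.util.List;

// Toy byte-level "tokenizer": splits an ASCII byte[] on the space byte (0x20) and
// records each token as a (start, end) byte range into the original array,
// so the ranges double as byte offsets a highlighter could use.
static List<int[]> splitOnSpaceByte(byte[] text) {
  List<int[]> ranges = new ArrayList<int[]>();
  int start = 0;
  for (int i = 0; i <= text.length; i++) {
    if (i == text.length || text[i] == 0x20) {
      if (i > start) {
        ranges.add(new int[] { start, i });
      }
      start = i + 1;
    }
  }
  return ranges;
}
{code}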
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #156: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/156/ No tests ran. Build Log (for compile errors): [...truncated 7007 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053286#comment-13053286 ] ludovic Boutros commented on LUCENE-3229: - testSpanNearUnOrdered unit test does not work anymore. The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function witch allows the span overlaps. I will check. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2614) stats with pivot
stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: Schema and Analysis, SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Fix For: 4.0 Is it possible to get stats (like the Stats Component: min, max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 for all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level, and using facet.pivot I get just counts, but no stats. Looping in the client application to do all combinations of facet values would be too slow because there are a lot of combinations. Thanks a lot! -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2614) stats with pivot
[ https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengyao updated SOLR-2614: -- Component/s: (was: Schema and Analysis) Priority: Critical (was: Major) Description: Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! this is very import,because only counts value,it's no use for sometimes. please add stats with pivot in solr 4.0 thanks a lot was: Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Priority: Critical Fix For: 4.0 Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! this is very import,because only counts value,it's no use for sometimes. please add stats with pivot in solr 4.0 thanks a lot -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053300#comment-13053300 ] Mike Sokolov commented on LUCENE-3080: -- Yeah I knew that at some point, but stuffed it away as something to think about later :) There really is no way to pass byte streams into the analysis chain. Maybe providing a character encoding to the filter could enable it to compute the needed byte offsets. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasySimilarity added. Lots of questions and nocommit in the code. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated LUCENE-3229: Attachment: SpanOverlap2.diff add a patch for the SpanNearUnOrdered class. Everything should be ok now. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlap2.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3229) Overlaped SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053286#comment-13053286 ] ludovic Boutros edited comment on LUCENE-3229 at 6/22/11 3:32 PM: -- testSpanNearUnOrdered unit test does not work anymore. The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function which allows the span overlaps. I will check. was (Author: lboutros): testSpanNearUnOrdered unit test does not work anymore. The unordered SpanNear class uses the ordering function of the ordered SpanNear class. Perhaps, it should use its own ordering function witch allows the span overlaps. I will check. Overlaped SpanNearQuery --- Key: LUCENE-3229 URL: https://issues.apache.org/jira/browse/LUCENE-3229 Project: Lucene - Java Issue Type: Bug Components: core/search Affects Versions: 3.1 Environment: Windows XP, Java 1.6 Reporter: ludovic Boutros Priority: Minor Attachments: SpanOverlap.diff, SpanOverlap2.diff, SpanOverlapTestUnit.diff While using Span queries I think I've found a little bug. With a document like this (from the TestNearSpansOrdered unit test) : w1 w2 w3 w4 w5 If I try to search for this span query : spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true) the above document is returned and I think it should not because 'w4' is not after 'w5'. The 2 spans are not ordered, because there is an overlap. I will add a test patch in the TestNearSpansOrdered unit test. I will add a patch to solve this issue too. Basicaly it modifies the two docSpansOrdered functions to make sure that the spans does not overlap. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2614) stats with pivot
[ https://issues.apache.org/jira/browse/SOLR-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053310#comment-13053310 ] Ryan McKinley commented on SOLR-2614: - not currently. patches welcome! stats with pivot Key: SOLR-2614 URL: https://issues.apache.org/jira/browse/SOLR-2614 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.0 Reporter: pengyao Priority: Critical Fix For: 4.0 Is it possible to get stats (like Stats Component: min ,max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like Pivot)? I would like to query: ...?q=*:*version=2.2start=0rows=0stats=truestats.field=numeric_field1stats.field=numeric_field2stats.pivot=field_x,field_y,field_z and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 from all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level and using facet.pivot I get just counts, but no stats. Looping in client application to do all combinations of facets values will be to slow because there is a lot of combinations. Thanks a lot! this is very import,because only counts value,it's no use for sometimes. please add stats with pivot in solr 4.0 thanks a lot -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053313#comment-13053313 ] James Dyer commented on SOLR-2382: -- Noble, I just updated to the latest and re-applied this patch and it worked for me. If you can give me specifics I'll try to dig more to see what might be going wrong. Also, in case you're not on the very latest, there were some very recent commits from about a week ago that broke the previous versions of this patch (r1135954 r1136789). This newest patch will only work on code from after those commits. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. 
Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it is does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntitiyProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching
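To give a feel for the pluggable caching framework described above, a hypothetical sketch of what such a DIH cache contract might look like; the method names and signatures here are guesses for illustration and are not taken from the SOLR-2382 patch:
{code}
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of a pluggable DIH cache contract (names are illustrative only).
// Implementations could be an in-memory sorted map or a disk-backed store such as
// Berkeley DB JE, along the lines of the patch notes above.
interface DIHCacheSketch {
  void open(Map<String, Object> initProps);          // create or attach to the cache
  void add(Map<String, Object> row);                 // cache one entity row
  Iterator<Map<String, Object>> lookup(Object key);  // rows matching a join key
  void flush();                                      // persist pending writes, if any
  void close();                                      // per-run cleanup
  void destroy();                                    // one-time cleanup, e.g. delete a disk-backed cache
}
{code}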
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053319#comment-13053319 ] Robert Muir commented on LUCENE-3080: - Well, personally i am hesitant to introduce any encodings or bytes into our current analysis chain, because its unnecessary complexity that will introduce bugs (at the moment, its the users responsibility to create the appropriate Reader etc). Furthermore, not all character sets can be 'corrected' with a linear conversion like this: for example some actually order the text in a different direction, and things like that... there are many quirks to non-unicode character sets. Maybe as a start, it would be useful to prototype some simple experiments with a binary analysis chain and hackup a highlighter to work with them? This way we would have an understanding of what the potential performance gain is. Here's some example code for a dead simple binary analysis chain that only uses bytes the whole way through, you could take these ideas and prototype one with just all ascii-terms and split on the space byte and such: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestBinaryTerms.java http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/BinaryTokenStream.java cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level --- Key: SOLR-2615 URL: https://issues.apache.org/jira/browse/SOLR-2615 Project: Solr Issue Type: Improvement Components: update Reporter: David Smiley Priority: Minor Fix For: 3.3 It would be great if the LogUpdateProcessor logged each command (add, delete, ...) at debug (FINE) level. Presently it only logs a summary of 8 commands and it does so at the very end. The attached patch implements this. * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug-level log happens before Solr does anything with it. It should not affect the ordering of the existing summary log, which happens at finish(). * I changed UpdateRequestProcessor's static log variable to be an instance variable that uses the current class name. I think this makes much more sense since I want to be able to alter logging levels for a specific processor without doing it for all of them. This change did require me to tweak the factory's detection of the log level which avoids creating the LogUpdateProcessor. * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there is no schema unique field. I fixed that. You may notice I use SLF4J's nifty log.debug("message blah {} blah", var) syntax, which is both performant and concise, as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this anyway and there is no string concatenation if debug isn't enabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
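The parameterized SLF4J style mentioned at the end is worth spelling out; a small, generic example (not code from the patch) showing why an isDebugEnabled() guard is unnecessary for simple messages:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingStyleExample {
  // instance (per-class) logger, in the spirit of the change described above
  private final Logger log = LoggerFactory.getLogger(getClass());

  void logAdd(String printableId) {
    // No string concatenation happens unless DEBUG is enabled; debug() checks the
    // level internally, so an explicit isDebugEnabled() guard adds nothing here.
    log.debug("add {}", printableId);

    // The guard is only useful when building the argument itself is expensive:
    if (log.isDebugEnabled()) {
      log.debug("add {}", expensiveToString(printableId));
    }
  }

  private String expensiveToString(String id) {
    return "[" + id + "]";   // stand-in for a costly computation
  }
}
{code}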
[jira] [Updated] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
[ https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2615: --- Attachment: SOLR-2615_LogUpdateProcessor_debug_logging.patch Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level --- Key: SOLR-2615 URL: https://issues.apache.org/jira/browse/SOLR-2615 Project: Solr Issue Type: Improvement Components: update Reporter: David Smiley Priority: Minor Fix For: 3.3 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch It would be great if the LogUpdateProcessor logged each command (add, delete, ...) at debug (Fine) level. Presently it only logs a summary of 8 commands and it does so at the very end. The attached patch implements this. * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug level log happens before Solr does anything with it. It should not affect the ordering of the existing summary log which happens at finish(). * I changed UpdateRequestProcessor's static log variable to be an instance variable that uses the current class name. I think this makes much more sense since I want to be able to alter logging levels for a specific processor without doing it for all of them. This change did require me to tweak the factory's detection of the log level which avoids creating the LogUpdateProcessor. * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there is no schema unique field. I fixed that. You may notice I use SLF4J's nifty log.debug(message blah {} blah, var) syntax, which is both performant and concise as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this any way and there is no string concatenation if debug isn't enabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2616) Include jdk14 logging configuration file
[ https://issues.apache.org/jira/browse/SOLR-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2616: --- Attachment: SOLR-2616_jdk14logging_setup.patch Include jdk14 logging configuration file Key: SOLR-2616 URL: https://issues.apache.org/jira/browse/SOLR-2616 Project: Solr Issue Type: Improvement Reporter: David Smiley Priority: Minor Fix For: 3.3 Attachments: SOLR-2616_jdk14logging_setup.patch The /example/ Jetty Solr configuration should include a basic logging configuration file. Looking at this wiki page: http://wiki.apache.org/solr/LoggingInDefaultJettySetup I am creating this patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053329#comment-13053329 ] Robert Muir commented on LUCENE-3220: - Just took a look, a few things that might help: * yes the maxdoc does not reflect deletions, but neither does things like totalTermFreq or docFreq either... so its best to not worry about deletions in the scoring and to be consistent and use the stats (e.g. maxDoc, not numDocs) that do not take deletions into account. * for the computeStats(TermContext... termContexts) its wierd to sum the DF across the different terms in the case? But i don't honestly have any suggestions here... maybe in this case we should make a EasyPhraseStats that computes the EasyStats for each term, so its not hiding anything or limiting anyone? and you could then do an instanceof check and have a different method like scorePhrase() that it forwards to in case its an EasyPhraseStats? In general i'm not sure how other ranking systems tend to handle this case, the phrase estimation for IDF in lucene's formula is done by summing the IDFs Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
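For reference, the summing Robert mentions at the end, Lucene's classic way of estimating a phrase's IDF, is just the sum of the per-term IDF values; a small arithmetic sketch using the familiar DefaultSimilarity-style formula, shown for illustration and not taken from the flexscoring branch:
{code}
// Classic-style phrase IDF estimate: sum the per-term IDF values, where
// idf(term) = 1 + ln(numDocs / (docFreq + 1)), as in Lucene's DefaultSimilarity.
static float phraseIdf(long numDocs, long[] docFreqs) {
  float sum = 0f;
  for (long df : docFreqs) {
    sum += 1f + (float) Math.log((double) numDocs / (df + 1));
  }
  return sum;
}
{code}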
[jira] [Updated] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Trcek updated LUCENE-3079: - Attachment: LUCENE-3079.patch Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053361#comment-13053361 ] Stefan Trcek commented on LUCENE-3079: -- This patch was generated by git and tested to apply with patch -p0 -i LUCENE-3079.patch --dry-run Be patient if anything went wrong. Review starting points may be - FacetSearcherTest.testSimpleFacetWithIndexSearcher() or - FacetSearcher.facetCollectSearch() Functions.java may be dismissed in favor of Guava. If you are willing to keep it I'll strip it down to the required parts. -- The implementation relies on field cache only, no index scheme, no cached filters etc. It supports - single valued facets (Facet.java) - multi valued facets (Facet.MultiValued.java) - facet filters (see FacetSearcher.java) - evaluation of facet values that would dismiss due to other facet filters (Yonik says Solr calls this multi-select faceting). (realized by FacetSearcher.fillFacetsForGuiMode()) Let me explain the last point: For the user a facet query (color==green) AND (shape==circle OR shape==square) may look like Facet color [ ] (3) red [x] (5) green [ ] (7) blue Facet shape [x] (9) circle [ ] (4) line [x] (2) square The red/blue/line facet values will display even though the corresponding documents are not in the result set. Also there is support for filtered facet values with zero results, so users understand why they do not get results. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
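The multi-select counting Stefan describes amounts to this rule: when counting values for one facet, apply every selected filter except that facet's own, so filtered-out values (red, blue, line) still show counts. A small self-contained sketch of the rule follows; nothing here is from the attached patch, which works against Lucene's field cache rather than plain maps:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MultiSelectFacetSketch {
  /**
   * Counts values of facetField over the documents, honouring every selected
   * filter except the one on facetField itself ("multi-select" behaviour).
   */
  Map<String, Integer> countFacet(String facetField,
                                  Map<String, Set<String>> selected,
                                  List<Map<String, String>> docs) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (Map<String, String> doc : docs) {
      boolean passesOtherFilters = true;
      for (Map.Entry<String, Set<String>> filter : selected.entrySet()) {
        if (filter.getKey().equals(facetField)) {
          continue;                     // exclude this facet's own filter
        }
        if (!filter.getValue().isEmpty()
            && !filter.getValue().contains(doc.get(filter.getKey()))) {
          passesOtherFilters = false;
          break;
        }
      }
      if (passesOtherFilters) {
        String value = doc.get(facetField);
        Integer old = counts.get(value);
        counts.put(value, old == null ? 1 : old + 1);
      }
    }
    return counts;
  }
}
{code}

With color=green selected, the shape facet is counted against only the color filter, which is why a value like line still gets a count even though the user has not ticked it.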
[jira] [Updated] (LUCENE-3231) Add fixed size DocValues int variants expose Arrays where possible
[ https://issues.apache.org/jira/browse/LUCENE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3231: Attachment: LUCENE-3231.patch Here is a super rough patch with nocommits (and even missing nocommits) showing the idea. This is heavy work in progress, though. Add fixed size DocValues int variants expose Arrays where possible Key: LUCENE-3231 URL: https://issues.apache.org/jira/browse/LUCENE-3231 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3231.patch Currently we only have a variable bit-packed ints implementation. For flexible scoring or loading field caches it is desirable to have fixed int implementations for 8, 16, 32 and 64 bits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
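Not from the attached patch, just a sketch of why fixed widths are attractive: a fixed 8/16/32/64-bit store can expose its backing array directly (handy for field-cache-style access), which variable bit-packing cannot do. Names and shapes below are made up for illustration:

{code:java}
// Illustrative only; not the patch's API.
interface IntValuesSketch {
  long get(int docID);
}

class FixedInt32ValuesSketch implements IntValuesSketch {
  private final int[] values;          // exactly 32 bits per document

  FixedInt32ValuesSketch(int maxDoc) {
    this.values = new int[maxDoc];
  }

  void set(int docID, int value) {
    values[docID] = value;
  }

  public long get(int docID) {
    return values[docID];
  }

  /** Direct access to the backing array, avoiding a method call per document. */
  public int[] getArray() {
    return values;
  }
}
{code}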
[jira] [Updated] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2382: - Attachment: SOLR-2382.patch Just found a little bug in SortedMapBackedCache. This patch version includes a fix for it. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. 
Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added
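For orientation, the pluggable cache described above might look roughly like the following. This is only a guess at the shape of the DIHCache interface based on the description; the attached patch is authoritative and will differ in detail:

{code:java}
import java.util.Iterator;
import java.util.Map;

// Guessed shape only; the real DIHCache interface lives in the attached patch.
interface DIHCacheSketch {
  void open(Map<String, Object> initProps);          // e.g. cache name, key field, storage location
  void add(Map<String, Object> row);                 // cache one entity row
  Iterator<Map<String, Object>> lookup(Object key);  // rows matching a join key (child-entity case)
  Iterator<Map<String, Object>> iterator();          // full scan, for reading a cache back in as entity input
  void delete(Object key);                           // needed for delta updates on persistent caches
  void flush();
  void close();                                      // release handles, e.g. a disk-backed store
  void destroy();                                    // one-time cleanup, matching the new entity.destroy() semantics
}
{code}

Per the description, an entity would opt in by naming an implementation (SortedMapBackedCache or BerkleyBackedCache) via the cacheImpl parameter rather than by switching to a special cached entity processor.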
[jira] [Updated] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2382: - Attachment: SOLR-2382.patch Sorry...that last patch included some unrelated code. This one is correct. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. 
Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it is does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntitiyProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and
[jira] [Updated] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2382: - Attachment: (was: SOLR-2382.patch) DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. - Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface DIHWriter, two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. 
Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it is does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed. - The only out-of-the-box entity processor that previously implemented destroy() was LineEntitiyProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and verified that all existing test cases pass. I believe this patch
[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef
[ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053432#comment-13053432 ] Mike Sokolov commented on LUCENE-3080: -- I agree it's necessary to prove there is some point to all this - I'm working on getting some numbers. At the moment I'm just assuming ASCII encoding, but I'll take a look at the binary stuff too - thanks. cutover highlighter to BytesRef --- Key: LUCENE-3080 URL: https://issues.apache.org/jira/browse/LUCENE-3080 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Reporter: Michael McCandless Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2793: -- Attachment: LUCENE-2793.patch I am not sure whether the MergeInfo used in SegmentMerger#mergeFields I have kept most of the nocommits there even after correcting it for reference. In MockDirectoryWrapper#crash() to randomize IOContext I have used either a READONCE or DEFAULT or Merge context. Is this the correct way to go? In LuceneTeseCase#newDirectory(), MockDirectoryWrapper#createOutput(), MockDirectoryWrapper#openInput() will randomizing the context here help? Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice/versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
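For readers following along, the core of the change under discussion is that Directory.createOutput and Directory.openInput gain a context argument describing the intended use. The sketch below only illustrates that shape with made-up fields; it is not the patch's actual IOContext/MergeInfo definition:

{code:java}
// Illustrative shape only; not the patch's actual classes.
class IOContextSketch {
  enum Usage { DEFAULT, READ, READONCE, MERGE, FLUSH }

  final Usage usage;
  final int readBufferSize;

  IOContextSketch(Usage usage, int readBufferSize) {
    this.usage = usage;
    this.readBufferSize = readBufferSize;
  }
}

// A Directory implementation could then pick, say, a larger buffer or
// direct/sequential I/O hints when usage == MERGE, and a small buffer for READONCE:
//   IndexInput openInput(String name, IOContextSketch context) throws IOException;
//   IndexOutput createOutput(String name, IOContextSketch context) throws IOException;
{code}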
[jira] [Commented] (SOLR-2586) example work & logs directories needed?
[ https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053468#comment-13053468 ] David Smiley commented on SOLR-2586: So if work is needed (to avoid rare error conditions if a temp directory is used), that still leaves the question of logs. The only thing approaching use of this directory is some commented-out configuration in jetty.xml. So as it stands, it really isn't used. I think if someone uncomments that part of jetty.xml, then they can very well make the logs directory. What I'm after here is a little bit of simplification for new users. I certainly don't get any heartburn over these directories, but if someone new sees logs and never sees anything go there, they might think something is wrong. And removing it is one less directory. I say this after updating my Solr book, walking the users through the directory layout in the 1st chapter. No big deal, but simplification/clarity is good. example work & logs directories needed? --- Key: SOLR-2586 URL: https://issues.apache.org/jira/browse/SOLR-2586 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Firstly, what prompted this issue was me wanting to use a git Solr mirror but finding that git's lack of empty-directory support made the example ant task fail. This task requires examples/work to be in place so that it can delete its contents. Fixing this was a simple matter of adding: {code:xml} <mkdir dir="${example}/work"/> <!-- in case not there --> {code} right before the delete task. But then it occurred to me, why even have a work directory since Jetty will apparently use a temp directory instead. -- try for yourself (stdout snippet): bq. 2011-06-11 00:51:26.177:INFO::Extract file:/SmileyDev/Search/lucene-solr/solr/example/webapps/solr.war to /var/folders/zo/zoQJvqc9E0076p0THiri+k+++TI/-Tmp-/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp On my Mac, this same directory was used for multiple runs, so somehow Jetty or the VM figures out how to reuse it. Since this example setup isn't a *real* installation -- it's just for demonstration, arguably it should not contain what it doesn't need. Likewise, perhaps the empty example/logs directory should be deleted. It's not used by default anyway. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2586) example work & logs directories needed?
[ https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053500#comment-13053500 ] Robert Muir commented on SOLR-2586: --- is this issue really about the git problem or about making things simpler? If you want to make things simpler, you would be mentioning things like: * move example-dih to contrib/dih * remove mapping-ISOLatin1Accent.txt, we have the foldToAscii and its confusing to have both * ... But i see you only targeting empty directories, which cause little confusion at all. example work logs directories needed? --- Key: SOLR-2586 URL: https://issues.apache.org/jira/browse/SOLR-2586 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Firstly, what prompted this issue was me wanting to use a git solr mirror but finding that git's lack of empty-directory support made the example ant task fail. This task requires examples/work to be in place so that it can delete its contents. Fixing this was a simple matter of adding: {code:xml} mkdir dir=${example}/work /!-- in case not there -- {code} Right before the delete task. But then it occurred to me, why even have a work directory since Jetty will apparently use a temp directory instead. -- try for yourself (stdout snippet): bq. 2011-06-11 00:51:26.177:INFO::Extract file:/SmileyDev/Search/lucene-solr/solr/example/webapps/solr.war to /var/folders/zo/zoQJvqc9E0076p0THiri+k+++TI/-Tmp-/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp On my Mac, this same directory was used for multiple runs, so somehow Jetty or the VM figures out how to reuse it. Since this example setup isn't a *real* installation -- it's just for demonstration, arguably it should not contain what it doesn't need. Likewise, perhaps the empty example/logs directory should be deleted. It's not used by default any way. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Commented] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual
[ https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053511#comment-13053511 ] Itamar Syn-Hershko commented on LUCENENET-426: -- Apparently that was not enough. I hit a need to override this one too: protected Field[] GetFields(IndexReader reader, int docId, String fieldName) Perhaps it'd make sense to make all protected virtual? In Java you can override anything that is not final, so that would be compatible with the original version. Mark BaseFragmentsBuilder methods as virtual Key: LUCENENET-426 URL: https://issues.apache.org/jira/browse/LUCENENET-426 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, Lucene.Net 2.9.4g Reporter: Itamar Syn-Hershko Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: fvh.patch Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless to have FragmentsBuilder deriving from a class named Base, since most of its functionality cannot be overridden. Attached is a patch for marking the important methods virtual. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-2586) example work & logs directories needed?
[ https://issues.apache.org/jira/browse/SOLR-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053518#comment-13053518 ] Robert Muir commented on SOLR-2586: --- by the way, if you want to solve the git problem, upload a patch that adds a gitignore file or .keep_me hidden file or whatever, I'll even commit it, and I'm the biggest git-hater there is. then, you could fix your git problem, and separately we could deal with simplifying the example. example work logs directories needed? --- Key: SOLR-2586 URL: https://issues.apache.org/jira/browse/SOLR-2586 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Firstly, what prompted this issue was me wanting to use a git solr mirror but finding that git's lack of empty-directory support made the example ant task fail. This task requires examples/work to be in place so that it can delete its contents. Fixing this was a simple matter of adding: {code:xml} mkdir dir=${example}/work /!-- in case not there -- {code} Right before the delete task. But then it occurred to me, why even have a work directory since Jetty will apparently use a temp directory instead. -- try for yourself (stdout snippet): bq. 2011-06-11 00:51:26.177:INFO::Extract file:/SmileyDev/Search/lucene-solr/solr/example/webapps/solr.war to /var/folders/zo/zoQJvqc9E0076p0THiri+k+++TI/-Tmp-/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp On my Mac, this same directory was used for multiple runs, so somehow Jetty or the VM figures out how to reuse it. Since this example setup isn't a *real* installation -- it's just for demonstration, arguably it should not contain what it doesn't need. Likewise, perhaps the empty example/logs directory should be deleted. It's not used by default any way. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-3.x - Build # 416 - Failure
Build: https://builds.apache.org/job/Lucene-3.x/416/ No tests ran. Build Log (for compile errors): [...truncated 10795 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules
[ https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male reassigned LUCENE-2883: -- Assignee: Chris Male Consolidate Solr Lucene FunctionQuery into modules - Key: LUCENE-2883 URL: https://issues.apache.org/jira/browse/LUCENE-2883 Project: Lucene - Java Issue Type: Task Components: core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Chris Male Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2883.patch Spin-off from the [dev list | http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3232) Move MutableValues to Queries Module
Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Reporter: Chris Male Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
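As background, the MutableValue* pattern referred to above is roughly the following: a reusable holder that grouping and function-query code can fill per document, avoiding a fresh object allocation per hit. This is a simplified sketch, not the actual Solr source:

{code:java}
// Simplified sketch of the MutableValue idea; the real classes have more
// operations (duplicate, equalsSameType, compareSameType, hashCode, ...).
abstract class MutableValueSketch {
  boolean exists = true;

  abstract void copy(MutableValueSketch source);
  abstract Object toObject();
}

class MutableValueIntSketch extends MutableValueSketch {
  int value;

  void copy(MutableValueSketch source) {
    MutableValueIntSketch s = (MutableValueIntSketch) source;
    value = s.value;
    exists = s.exists;
  }

  Object toObject() {
    return exists ? Integer.valueOf(value) : null;
  }
}
{code}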
[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules
[ https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053566#comment-13053566 ] Chris Male commented on LUCENE-2883: Rather than doing all the work in this issue, I'm going to spin off a few subtasks and resolve this one by one. Consolidate Solr Lucene FunctionQuery into modules - Key: LUCENE-2883 URL: https://issues.apache.org/jira/browse/LUCENE-2883 Project: Lucene - Java Issue Type: Task Components: core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Chris Male Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2883.patch Spin-off from the [dev list | http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053580#comment-13053580 ] Yonik Seeley commented on LUCENE-3079: -- bq. if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs Sort of an aside, but not really specific applications are much easier. A lot more indirection is required in Solr and a schema is needed for pretty much everything. Without the schema, a client would specify sort=foo desc and Solr would have no idea how to do that. A specific application just does it because they have the knowledge of what all the fields are. It's why people have gotten along just fine without a schema in Lucene thus far. If you're building another Solr... yes, you need something like a schema. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
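Yonik's point in code form: a Lucene-only application already knows its field types, so it states the sort type itself instead of deriving it from a schema. A sketch against the 3.x-era API (minor details vary by version):

{code:java}
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

class ExplicitSortExample {
  // The application knows "foo" holds ints, so it says so directly; Solr has to
  // consult its schema to turn a request like sort=foo desc into the same call.
  TopDocs searchSortedByFoo(IndexSearcher searcher) throws IOException {
    return searcher.search(new MatchAllDocsQuery(), 10,
        new Sort(new SortField("foo", SortField.INT, true)));   // true = descending
  }
}
{code}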
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053581#comment-13053581 ] Jason Rutherglen commented on LUCENE-3079: -- Schemas should probably be a module that makes use of embedding the field types per-segment, this is something the faceting module could/should use. I think is what LUCENE-2308 is aiming for? Though I thought there was another Jira issue created by Simon for this as well. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053587#comment-13053587 ] Chris Male commented on LUCENE-3079: I don't think any Facet module needs to be concerned with Schemas. Instead the module can expose an API which asks for the information it needs to make the best choices. Solr can then provide that information based on its Schema, pure Lucene users can do it however they want. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3079) Faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053591#comment-13053591 ] Jason Rutherglen commented on LUCENE-3079: -- bq. I don't think any Facet module needs to be concerned with Schemas Right, they should be field type aware. Facetiing module Key: LUCENE-3079 URL: https://issues.apache.org/jira/browse/LUCENE-3079 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-3079.patch Faceting is a hugely important feature, available in Solr today but not [easily] usable by Lucene-only apps. We should fix this, by creating a shared faceting module. Ideally, we factor out Solr's faceting impl, and maybe poach/merge from other impls (eg Bobo browse). Hoss describes some important challenges we'll face in doing this (http://markmail.org/message/5w35c2fr4zkiwsz6), copied here: {noformat} To look at faceting as a concrete example, there are big the reasons faceting works so well in Solr: Solr has total control over the index, knows exactly when the index has changed to rebuild caches, has a strict schema so it can make sense of field types and pick faceting algos accordingly, has multi-phase distributed search approach to get exact counts efficiently across multiple shards, etc... (and there are still a lot of additional enhancements and improvements that can be made to take even more advantage of knowledge solr has because it owns the index that we no one has had time to tackle) {noformat} This is a great list of the things we face in refactoring. It's also important because, if Solr needed to be so deeply intertwined with caching, schema, etc., other apps that want to facet will have the same needs and so we really have to address them in creating the shared module. I think we should get a basic faceting module started, but should not cut Solr over at first. We should iterate on the module, fold in improvements, etc., and then, once we can fully verify that cutting over doesn't hurt Solr (ie lose functionality or performance) we can later cutover. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053593#comment-13053593 ] Chris Male commented on LUCENE-3232: Code to execute before patch: {code} svn mkdir --parents modules/queries/src/java/org/apache/lucene/queries/function svn move solr/src/java/org/apache/solr/search/MutableValue.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValue.java svn move solr/src/java/org/apache/solr/search/MutableValueFloat.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueFloat.java svn move solr/src/java/org/apache/solr/search/MutableValueBool.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueBool.java svn move solr/src/java/org/apache/solr/search/MutableValueDate.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueDate.java svn move solr/src/java/org/apache/solr/search/MutableValueDouble.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueDouble.java svn move solr/src/java/org/apache/solr/search/MutableValueInt.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueInt.java svn move solr/src/java/org/apache/solr/search/MutableValueLong.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueLong.java svn move solr/src/java/org/apache/solr/search/MutableValueStr.java modules/queries/src/java/org/apache/lucene/queries/function/MutableValueStr.java {code} Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3232: --- Attachment: LUCENE-3232.patch Patch that establishes the Queries module and moves the MutableValue classes. Includes intellij, eclipse and maven work. Everything compiles and tests pass. It'd be great if someone could review. I'll commit in a few days. Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
[ https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053597#comment-13053597 ] Yonik Seeley commented on SOLR-2615: bq. You may notice I use SLF4J's nifty log.debug(message blah {} blah, var) syntax, which is both performant and concise as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this any way and there is no string concatenation if debug isn't enabled. I think there is still a point to caching isDebugEnabled() though. The implementation most likely involves checking volatile variables, and can involve checking a hierarchy of loggers. I assume the cost may be different for different logging implementations too. Better to just cache if you can and not worry about it. Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level --- Key: SOLR-2615 URL: https://issues.apache.org/jira/browse/SOLR-2615 Project: Solr Issue Type: Improvement Components: update Reporter: David Smiley Priority: Minor Fix For: 3.3 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch It would be great if the LogUpdateProcessor logged each command (add, delete, ...) at debug (Fine) level. Presently it only logs a summary of 8 commands and it does so at the very end. The attached patch implements this. * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug level log happens before Solr does anything with it. It should not affect the ordering of the existing summary log which happens at finish(). * I changed UpdateRequestProcessor's static log variable to be an instance variable that uses the current class name. I think this makes much more sense since I want to be able to alter logging levels for a specific processor without doing it for all of them. This change did require me to tweak the factory's detection of the log level which avoids creating the LogUpdateProcessor. * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there is no schema unique field. I fixed that. You may notice I use SLF4J's nifty log.debug(message blah {} blah, var) syntax, which is both performant and concise as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this any way and there is no string concatenation if debug isn't enabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
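Caching the flag as Yonik suggests is a one-liner; a minimal sketch (class, field, and method names made up, not from the patch):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class CachedDebugFlagExample {
  private static final Logger log = LoggerFactory.getLogger(CachedDebugFlagExample.class);

  // Read the level once per processor instance instead of letting every
  // log.debug() call re-check volatile state or walk the logger hierarchy.
  private final boolean debugEnabled = log.isDebugEnabled();

  void logDelete(String id) {
    if (debugEnabled) {
      log.debug("delete {}", id);
    }
  }
}
{code}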
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053599#comment-13053599 ] Yonik Seeley commented on LUCENE-3232: -- These are useful beyond function queries... perhaps they should not be in the function module? Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053600#comment-13053600 ] Chris Male commented on LUCENE-3232: I've debated this backwards and forwards. Do they have a use case out of function queries at the moment? If so then yeah I'll happily put them somewhere else. Otherwise I'll cross that bridge at the time. Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3232) Move MutableValues to Queries Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053601#comment-13053601 ] Chris Male commented on LUCENE-3232: Actually scrap that question, I'll put them somewhere else immediately. Move MutableValues to Queries Module Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3232) Move MutableValues to Common Module
[ https://issues.apache.org/jira/browse/LUCENE-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3232: --- Summary: Move MutableValues to Common Module (was: Move MutableValues to Queries Module) Move MutableValues to Common Module --- Key: LUCENE-3232 URL: https://issues.apache.org/jira/browse/LUCENE-3232 Project: Lucene - Java Issue Type: Sub-task Components: core/search Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-3232.patch Solr makes use of the MutableValue* series of classes to improve performance of grouping by FunctionQuery (I think). As such they are used in ValueSource implementations. Consequently we need to move these classes in order to move the ValueSources. I'll also use this issue to establish the Queries module where the FunctionQueries will lie. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org