RE: [Lucene.Net] Merging 3.0.3 into Trunk
Jira and then just submit your own patch imo Sent from my Windows Phone From: Stefan Bodewig Sent: 3/1/2012 7:23 AM To: lucene-net-...@incubator.apache.org Subject: Re: [Lucene.Net] Merging 3.0.3 into Trunk On 2012-02-29, Stefan Bodewig wrote: On 2012-02-28, Christopher Currens wrote: Alright, it's done! 3.0.3 is now merged in with Trunk! I'll see to running RAT and looking at the line-ends over the next few days so we can get them fixed once and not run into it with the release. I went for EOLs first and there are 621 files outside of lib and doc that need to be fixed. What I have now is not just a patch (of more than 200k lines), but also a list of 621 files that need their svn:eol-style property to be set. I can create a JIRA ticket for that attaching my patch and the list of files to fix or - since I technically am a committer - could just commit my cleaned up workspace as is (plus JIRA ticket that I'd open and close myself). What would you prefer? RAT doesn't really make sense before the line feeds are correct (I've seen quite a few files without license headers by manual inspection). Stefan
Re: [Lucene.Net] FW: trouble getting cms content to work correctly
Is it safe to add content to the site through the CMS again? Anything to be wary of or not explicitly do? On Wed, Feb 15, 2012 at 6:30 PM, Prescott Nasser geobmx...@hotmail.com wrote: Took all day, but Joe was there babysitting and correcting things for us. Basically there is a bug in svn 1.6.17, which the CMS is based on, that is making our commits a pain at the moment. Once that gets upgraded it should be relatively smooth sailing. It won't help us, though, if we are still planning on updating massive amounts of documentation on a regular basis. Thanks Joe, I can't thank you enough for the help today. ~Prescott Date: Wed, 15 Feb 2012 14:49:48 -0800 From: joe_schae...@yahoo.com Subject: Re: trouble getting cms content to work correctly To: geobmx...@hotmail.com After some testing it appears that this performance bug is fixed in svn 1.7, but the CMS is currently running 1.6.17. I hope to have the host upgraded within the next 30 days or so, but for now I still recommend using the script. From: Prescott Nasser geobmx...@hotmail.com To: joe_schae...@yahoo.com Sent: Wednesday, February 15, 2012 5:28 PM Subject: RE: trouble getting cms content to work correctly Alright - sounds good. Thanks again! ~P Date: Wed, 15 Feb 2012 14:25:45 -0800 From: joe_schae...@yahoo.com Subject: Re: trouble getting cms content to work correctly To: geobmx...@hotmail.com I'm having some svn people look at the merge issues. Right now all I can suggest is that you publish using the publish.pl script on people.apache.org. It's taking me about 10 min total to carry that out, which is certainly too long given the nature of the changes it's merging, but I'll let you know what I find out. From: Prescott Nasser geobmx...@hotmail.com To: joe_schae...@yahoo.com Sent: Wednesday, February 15, 2012 5:13 PM Subject: RE: trouble getting cms content to work correctly It's butt ugly - all in one directory, 8206 files.
I'd prefer a more natural docs structure, but that's how it gets generated ~P Date: Wed, 15 Feb 2012 14:10:47 -0800 From: joe_schae...@yahoo.com Subject: Re: trouble getting cms content to work correctly To: geobmx...@hotmail.com Ok lemme kill it and use the publish.pl script on people to see if I can get it to work right. Just curious tho - about how many files do you have within that docs dir - all in one dir I presume? From: Prescott Nasser geobmx...@hotmail.com To: joe_schae...@yahoo.com Sent: Wednesday, February 15, 2012 5:08 PM Subject: RE: trouble getting cms content to work correctly I'm thinking still merge funk Date: Wed, 15 Feb 2012 14:05:21 -0800 From: joe_schae...@yahoo.com Subject: Re: trouble getting cms content to work correctly To: geobmx...@hotmail.com Looks like it just completed. Hmm, go ahead and publish and let's try this one more time. From: Joe Schaefer joe_schae...@yahoo.com To: Prescott Nasser geobmx...@hotmail.com Sent: Wednesday, February 15, 2012 5:02 PM Subject: Re: trouble getting cms content to work correctly Yeah, more merge funk. Leave it running for now, but don't take any further action until you hear from me. From: Prescott Nasser geobmx...@hotmail.com To: joe_schae...@yahoo.com Sent: Wednesday, February 15, 2012 4:59 PM Subject: RE: trouble getting cms content to work correctly I hate to be the bearer of bad news... still taking days to publish (I'm not sure if there is a merge error or not) - let me know and I'll kill this quick Date: Wed, 15 Feb 2012 13:54:52 -0800 From: joe_schae...@yahoo.com Subject: Re: trouble getting cms content to work correctly To: geobmx...@hotmail.com Yeah, try out the webgui and edit/commit/publish a minor change. It should take you no more than a minute or so total. From: Prescott Nasser geobmx...@hotmail.com To: joe_schae...@yahoo.com Sent: Wednesday, February 15, 2012 4:52 PM Subject: RE: trouble getting cms content to work correctly Man, that sounds like a tool full of awesome!
Ok - so for the moment no new docs; a simple edit should be quick? Date: Wed, 15 Feb 2012 13:48:40 -0800 From: joe_schae...@yahoo.com Subject: Re: trouble getting cms content to work correctly To: geobmx...@hotmail.com Ok, it's now fixed and your site should work as expected at this point. I had to redact the lazy_publish feature and reserve it for admins only, because you don't actually have permission to completely remove your publication site from svn, and that's not something I can offer without providing you and every other committer with the ability to nuke each other's entire sites. From: Joe Schaefer joe_schae...@yahoo.com To: Prescott Nasser geobmx...@hotmail.com Sent: Wednesday, February 15, 2012 4:20 PM Subject: Re: trouble getting cms content to work correctly Yeah, well, it's probably timing out somewhere along the line. I'm looking into your svn tree now to see if I can figure out what's
RE: [Lucene.Net] FW: trouble getting cms content to work correctly
I'm told it should only take about 10 minutes to publish now (as opposed to ~2.5 hours before). No harm in trying ;) ~P Date: Thu, 1 Mar 2012 21:47:49 -0500 From: mhern...@wickedsoftware.net To: lucene-net-dev@lucene.apache.org Subject: Re: [Lucene.Net] FW: trouble getting cms content to work correctly Is it safe to add content to the site through the CMS again? Anything to be wary of or not explicitly do?
[Lucene.Net] [jira] [Resolved] (LUCENENET-473) Fix linefeeds in more than 600 files
[ https://issues.apache.org/jira/browse/LUCENENET-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Bodewig resolved LUCENENET-473. -- Resolution: Fixed fixed with svn revision 1296052 Fix linefeeds in more than 600 files Key: LUCENENET-473 URL: https://issues.apache.org/jira/browse/LUCENENET-473 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 3.0.3 Reporter: Stefan Bodewig Assignee: Stefan Bodewig Fix For: Lucene.Net 3.0.3 There are more than 600 files which need the svn:eol-style property set to native and a few that should rather be LF or CRLF. Many files contain inconsistent line-ends. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
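The mechanical part of a cleanup like this is easy to script. As a rough sketch (not the actual patch attached to the ticket; the helper name is invented for illustration), normalizing line ends amounts to collapsing CRLF and lone CR to one style, with `svn propset svn:eol-style native <file>` then handling the property side for each file on the list:

```python
def normalize_eols(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every line ending (CRLF, lone CR, or LF) as `eol`.

    This mirrors what svn does once svn:eol-style is set: the repository
    copy is stored with LF and working copies get the requested ending.
    """
    # Collapse the two-byte CRLF first so the lone-CR pass cannot split it,
    # then map remaining CRs to LF, then expand to the requested style.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", eol)


# A file with inconsistent endings, as described in LUCENENET-473:
mixed = b"line one\r\nline two\rline three\n"
print(normalize_eols(mixed))            # b'line one\nline two\nline three\n'
print(normalize_eols(mixed, b"\r\n"))   # b'line one\r\nline two\r\nline three\r\n'
```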
RE: [Lucene.Net] Merging 3.0.3 into Trunk
Thanks Stefan! From: bode...@apache.org To: lucene-net-...@incubator.apache.org Date: Fri, 2 Mar 2012 06:23:01 +0100 Subject: Re: [Lucene.Net] Merging 3.0.3 into Trunk On 2012-03-01, Christopher Currens wrote: I agree with Prescott. Make a patch for that sucker! :) Done Stefan
RE: [Lucene.Net] Official stance on API changes between major versions
Andy, I appreciate your coppers. Everyone is really quiet. It seems you want us to move forward, I want us to move forward, and Chris is actually holding back work because of this - let's go for the breaking changes. ~P Date: Thu, 1 Mar 2012 23:11:47 + From: andy.p...@gmail.com To: lucene-net-dev@lucene.apache.org Subject: Re: [Lucene.Net] Official stance on API changes between major versions I am not a committer but my company makes extensive use of Lucene.net. So here's my two pennies... I understand that there is a commendable motivation to be gentle with api changes - wanting to give plenty of warning by obsoleting methods. Several points. First is that there is a change to the major version number. Users should expect changes to the api. Next, when this project was restarted last year the stated direction was to get caught up with the Java version and also to move towards a more dotnet style interface. The discussions on the list do occasionally get bogged down in this kind of to and fro. A coach my sports team used once said something along these lines: if the team can't choose then no one has made a convincing argument. So make a choice, any choice, and just get on with it. If it turns out to be the wrong choice then at least you've learnt something. This is software. It's changeable. My bias is that I want what's in V4 (codecs, NRT etc). I'm willing to take some pain if it means this project can accelerate. I would imagine that most serious uses of Lucene would be hidden within a service or at least isolated in some way, not dotted around all over the application. This is what isolation is for: to protect components from change. The impact of even fairly major api changes should be quite localised and refactorable. Intimidating, yes. More than a bit scary, of course. But worth it for getting the newer bits. By all means be professional, make proposals, have some discussion. But please let's not be too conservative, too timid. 2.9.4g is a good release.
We've been using it since shortly after it seemed stable. If there are users that need some stability then they should be advised to stick with g for a while. Now that that is done, a hearty thank you for the work on both the code and the Apache process. My vote would be for some more radical changes to be allowed. Let's get through 3.0.3 and on to 3.5 and 4.0. Let's get to one of the original goals, which is functional parity with Java, and let's be bold with some of the dotnet modifications (note that being bold does not mean that one is reckless). I'm sure that some will say: yeah, great sentiment, now send some patches. I agree. I have sent some very minor patches previously and it frustrates me that my company has not contributed more. We have just taken on a lot more people so I hope that we can be more active with Lucene.net soon. --Andy On 28 February 2012 18:17, Christopher Currens currens.ch...@gmail.com wrote: I *really* don't mean to be a bother to anyone, but I'd like to continue work on this. I feel that until I can get a better sense of how the group feels about this, I can't make much progress. Perhaps this radio silence is just because this email thread got lost among the others. On Fri, Feb 24, 2012 at 6:50 PM, Prescott Nasser geobmx...@hotmail.com wrote: I'm not against breaking compatibility when changing the version number to a new major (2 -> 3). I'm not sure how others feel. Matching Java access modifiers seems like the right move. That said, what if we mark obsolete in 3.0.3 and when we make the jump to 4.0 wipe them out? In my head we shouldn't spend too much time cleaning up 3.0.3 aside from bug fixes if we're just going to swap it for 4.0 in the near future. There has to be a break at some point, and making it with a major release is the best place to make it.
Sent from my Windows Phone From: Christopher Currens Sent: 2/24/2012 2:45 PM To: lucene-net-dev@lucene.apache.org Subject: [Lucene.Net] Official stance on API changes between major versions A bit of background about what I've been doing lately on the project. Because we've now confirmed that the .NET 3.0.3 branch is a completed port of the Java 3.0.3 version, I've been spending time trying to work on some of the bugs and improvements that are assigned to this version. There wasn't any real discussion about the actual features; I just created some (based on mailing list discussions) and assigned them to the 3.0.3 release. The improvements I've been working on lately are ones that have bugged me specifically since I've started using Lucene.NET. I've worked on https://issues.apache.org/jira/browse/LUCENENET-468 and https://issues.apache.org/jira/browse/LUCENENET-470 so far. LUCENENET-470 is pretty much completed; all of the classes that implemented
[jira] [Updated] (SOLR-3162) Continue work on new admin UI
[ https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3162: Attachment: SOLR-3162.patch Updated Patch, contains: * edismax Options on Query-Tab * Check if System-Information on Dashboard is available * Fixed Param-Handling on Dataimport * Autoload™ Functionality on Schema-Browser * Dummy Debug-Option on Cloud-Tab Continue work on new admin UI - Key: SOLR-3162 URL: https://issues.apache.org/jira/browse/SOLR-3162 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-3162-index.png, SOLR-3162-schema-browser.png, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch There have been more improvements to how the new UI works, but the current open bugs are getting hard to keep straight. This is the new catch-all JIRA for continued improvements. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219906#comment-13219906 ] Tommaso Teofili commented on SOLR-3013: --- thanks Steven, now fixing Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters.
[jira] [Commented] (SOLR-2608) TestReplicationHandler is flakey
[ https://issues.apache.org/jira/browse/SOLR-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219909#comment-13219909 ] Sami Siren commented on SOLR-2608: -- I am also seeing this test fail quite often. The stacktrace is now different: {code} 23987 T1101 oasc.SolrException.log SEVERE SnapPull failed :org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1388) at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:505) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:348) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:298) at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:163) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.util.concurrent.RejectedExecutionException at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92) at 
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1356) ... 13 more 25051 T3 oasu.ConcurrentLRUCache.finalize SEVERE ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! 40748 T1120 oasc.SolrException.log SEVERE SnapPull failed :org.apache.solr.common.SolrException: Unable to download _7_1.del completely. Downloaded 0!=92 at org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1081) at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:961) at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:587) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:322) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:298) at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:179) {code} TestReplicationHandler is flakey Key: SOLR-2608 URL: https://issues.apache.org/jira/browse/SOLR-2608 Project: Solr Issue Type: Bug Reporter: selckin I've been running some while(1) tests on trunk, and TestReplicationHandler is very flakey it fails about every 10th run. 
Probably not a bug, but the test not waiting correctly {code} [junit] Testsuite: org.apache.solr.handler.TestReplicationHandler [junit] Testcase: org.apache.solr.handler.TestReplicationHandler: FAILED [junit] ERROR: SolrIndexSearcher opens=48 closes=47 [junit] junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=48 closes=47 [junit] at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:131) [junit] at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:74) [junit] [junit] [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 40.772 sec [junit] [junit] - Standard Error - [junit] 19-Jun-2011 21:26:44 org.apache.solr.handler.SnapPuller fetchLatestIndex [junit] SEVERE: Master at: http://localhost:51817/solr/replication is not available. Index fetch failed. Exception: Connection refused [junit] 19-Jun-2011 21:26:49 org.apache.solr.common.SolrException log [junit] SEVERE: java.util.concurrent.RejectedExecutionException [junit] at
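The root `RejectedExecutionException` in the trace above is the generic symptom of submitting work to an executor that has already been shut down - here, apparently, the SnapPull thread asking a closing core for a searcher. Java's ThreadPoolExecutor and Python's concurrent.futures pool behave the same way; a minimal Python sketch of that race (the lambdas and variable names are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)
fut = pool.submit(lambda: "warmed")   # accepted while the pool is running
assert fut.result() == "warmed"

pool.shutdown(wait=True)              # executor torn down, e.g. at test teardown

try:
    pool.submit(lambda: "too late")   # a background thread racing the shutdown
    rejected = False
except RuntimeError:                  # Java throws RejectedExecutionException here
    rejected = True
print(rejected)  # True
```

This is why the failure looks like a test-lifecycle problem rather than a replication bug: the replication thread outlives the core it is pulling into.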
[jira] [Commented] (SOLR-3162) Continue work on new admin UI
[ https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219911#comment-13219911 ] Stefan Matheis (steffkes) commented on SOLR-3162: - Sami, yepp also noticed that .. already fixed after taking the screenshot :) Erick, Are the double Quotes still there? The Patch should remove all {{replaceAll}} usages, so the raw content should be visible right now. I'm not completely sure which sources are used for the cloud-tab and the {{/admin/file}} Handler, so it may give you different output ;o Cloud-Tree Expander still not working? Even w/ the latest Patch? Just to be sure, cleared the Browser-Cache?
[jira] [Issue Comment Edited] (SOLR-3162) Continue work on new admin UI
[ https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219911#comment-13219911 ] Stefan Matheis (steffkes) edited comment on SOLR-3162 at 3/1/12 8:34 AM: - Sami, yepp also noticed that .. already fixed after taking the screenshot :) Erick, Are the double Quotes still there? The Patch should remove all {{replaceAll}} usages, so the raw content should be visible right now. I'm not completely sure which sources are used for the cloud-tab and the {{/admin/file}} Handler, so it may give you different output ;o Cloud-Tree Expanding still not working? Even w/ the latest Patch? Just to be sure, cleared the Browser-Cache? was (Author: steffkes): Sami, yepp also noticed that .. already fixed after taking the screenshot :) Erick, Are the double Quotes still there? The Patch should remove all {{replaceAll}} usages, so the raw content should be visible right know. I'm not completely sure which sources are used for the cloud-tab and the {{/admin/file}} Handler, so it maybe give you different output ;o Cloud-Tree Expander still not working? Even w/ the latest Patch? Just to be sure, cleared the Browser-Cache?
Re: Welcome Stefan Matheis
Congrats, Welcome. On Thu, Mar 1, 2012 at 10:04 AM, Ryan McKinley ryan...@gmail.com wrote: I'm pleased to announce that Stefan Matheis has joined our ranks as a committer. He has given the solr admin UI some much needed love. It now looks like it belongs in 2012! Stefan, it is tradition that you introduce yourself with a brief bio. Your SVN access should be ready to go. Welcome! -- Chris Male | Software Developer | DutchWorks | www.dutchworks.nl
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1845 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1845/ 1 tests failed. REGRESSION: org.apache.solr.cloud.RecoveryZkTest.testDistribSearch Error Message: java.lang.AssertionError: Some threads threw uncaught exceptions! Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:788) at org.apache.lucene.util.LuceneTestCase.access$1100(LuceneTestCase.java:138) at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:612) at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:511) at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:573) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:816) at org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:760) Build Log (for compile errors): [...truncated 9846 lines...]
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219939#comment-13219939 ] Olivier Dutrieux commented on SOLR-2438: I tried the 3.6-SNAPSHOT yesterday and noticed something strange: ||raw query||parsed query||comment|| |name:LéCTROD\*|name:lectrod\*|fill good| |name:\*LéCTROD|name:lectrod|{color:red} that removes the wildcard !!!{color} | |name:\*LéCTROD\*|name:lectrod|{color:red} that removes all wildcards !!!{color} | I would like to know: is it normal that when the wildcard is in the first position of the raw query, the wildcard is removed in the parsed query?
{code:title=schema.xml|borderStyle=solid}
<types>
  <fieldtype name="text_fr" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>
</types>
<fields>
  <field name="name" type="text_fr" indexed="true" stored="true" multiValued="true"/>
</fields>
{code}
Duto Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Fix For: 3.6, 4.0 Attachments: SOLR-2438-3x.patch, SOLR-2438-3x.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438_3x.patch This patch adds support to allow case-insensitive queries on wildcard
searches for configured TextField field types. This patch extends the excellent work done by Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue.
[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219940#comment-13219940 ] Wenca Petr commented on SOLR-3011: -- Hi, I've just applied this patch and it solved my problem with multithreaded indexing from SQL using a Berkeley-backed cache, which was opened x times (once for each thread) but closed by only one thread, so it remained open. After the patch, the cache is opened only once and properly closed, but each thread seems to index all documents. If I have 5000 documents and 4 threads, then full import says: Added/Updated: 2 documents. DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5, 4.0 Reporter: Mikhail Khludnev Priority: Minor Fix For: 4.0 Attachments: SOLR-3011.patch, SOLR-3011.patch The current DIH design is not thread safe. See the last comments at SOLR-2382 and SOLR-2947. I'm going to provide a patch that makes the DIH core thread-safe. Mostly it's the SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
ConjunctionScorer.doNext() overstays?
Due to the odd behaviour of a custom Scorer of mine I discovered ConjunctionScorer.doNext() could loop indefinitely. It does not bail out as soon as any scorer.advance() call it makes reports back NO_MORE_DOCS. Is there not a performance optimisation to be gained in exiting as soon as this happens? At this stage I cannot see any point in continuing to advance other scorers - a quick look at TermScorer suggests that any questionable calls made by ConjunctionScorer to advance to NO_MORE_DOCS receive no special treatment and disk will be hit as a consequence. I added an extra condition to the while loop in the 3.5 source: while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) { and the JUnit tests passed. I haven't been able to benchmark performance improvements but it looks like it would be sensible to make the change anyway. Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
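The proposed guard can be sketched on a toy model of the loop. ToyScorer and the simplified doNext() below are illustrative stand-ins for the real Lucene Scorer and ConjunctionScorer classes, not the actual 3.5 code:

```java
/** Toy iterator over a sorted doc-id list, standing in for a Lucene Scorer. */
class ToyScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = 0;
    int advanceCalls = 0; // counts how often advance() is invoked

    ToyScorer(int... docs) { this.docs = docs; }

    int docID() { return idx < docs.length ? docs[idx] : NO_MORE_DOCS; }

    /** Skips to the first doc >= target, like Scorer.advance(). */
    int advance(int target) {
        advanceCalls++;
        while (idx < docs.length && docs[idx] < target) idx++;
        return docID();
    }
}

/** Simplified model of ConjunctionScorer.doNext() with the proposed extra guard. */
class ToyConjunction {
    static int doNext(ToyScorer[] scorers, int doc) {
        int first = 0;
        ToyScorer firstScorer;
        // Proposed change: bail out as soon as any sub-scorer reports
        // NO_MORE_DOCS instead of asking the remaining scorers to advance to it.
        while (doc != ToyScorer.NO_MORE_DOCS
                && (firstScorer = scorers[first]).docID() < doc) {
            doc = firstScorer.advance(doc);
            first = (first == scorers.length - 1) ? 0 : first + 1;
        }
        return doc;
    }
}
```

With the guard, once one scorer is exhausted the other scorers receive no advance(NO_MORE_DOCS) calls at all, which is the disk access the mail is worried about.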
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12569 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12569/ All tests passed Build Log (for compile errors): [...truncated 14641 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Solr-trunk - Build # 1779 - Failure
Build: https://builds.apache.org/job/Solr-trunk/1779/ 1 tests failed. REGRESSION: org.apache.solr.TestDistributedSearch.testDistribSearch Error Message: java.lang.AssertionError: Some threads threw uncaught exceptions! Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:788) at org.apache.lucene.util.LuceneTestCase.access$1100(LuceneTestCase.java:138) at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:612) at org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:511) at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:573) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:816) at org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:760) Build Log (for compile errors): [...truncated 10431 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: ThreadPool threads leaking to suite scope.
1) initialize threads eagerly; use ThreadPoolExecutor and call prestartAllCoreThreads. this could be applied to LTC on the trunk. I did this but threads still leak out from unclosed readers created by LTC#newSearcher. I don't know why, but this isn't called -- r.addReaderClosedListener(new ReaderClosedListener() { @Override public void onClose(IndexReader reader) { shutdownExecutorService(ex); } }); Clues? Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
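For point 1, ThreadPoolExecutor.prestartAllCoreThreads() is a standard java.util.concurrent API; a minimal sketch of the eager-initialization idea (the EagerPool helper name is mine, not from LTC):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Eagerly starts a fixed pool's core threads so they exist at setup time,
 *  not lazily in the middle of a test where they would look like leaks. */
class EagerPool {
    static ThreadPoolExecutor create(int coreThreads) {
        ThreadPoolExecutor ex = new ThreadPoolExecutor(
                coreThreads, coreThreads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        ex.prestartAllCoreThreads(); // threads exist before any task is submitted
        return ex;
    }
}
```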
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12570 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12570/ All tests passed Build Log (for compile errors): [...truncated 14657 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: ThreadPool threads leaking to suite scope.
I think the problem in newSearcher is that sometimes the reader is wrapped. If it's wrapped, only the underlying reader is closed, not the wrapper. But the listener is added to the wrapper. We should add the listener to the original inner reader. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Dawid Weiss [mailto:dawid.we...@gmail.com] Sent: Thursday, March 01, 2012 11:51 AM To: dev@lucene.apache.org Subject: Re: ThreadPool threads leaking to suite scope. 1) initialize threads eagerly; use ThreadPoolExecutor and call prestartAllCoreThreads. this could be applied to LTC on the trunk. I did this but threads still leak out from unclosed readers created by LTC#newSearcher. I don't know why, but this isn't called -- r.addReaderClosedListener(new ReaderClosedListener() { @Override public void onClose(IndexReader reader) { shutdownExecutorService(ex); } }); Clues? Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
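The wrapper problem described here can be reproduced on a toy model. ToyReader and ToyWrapper are illustrative stand-ins for IndexReader and its wrappers, not the real Lucene classes: a close listener registered on the wrapper never fires when only the inner reader is closed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for IndexReader: fires its own close listeners on close(). */
class ToyReader {
    interface ClosedListener { void onClose(ToyReader reader); }
    private final List<ClosedListener> listeners = new ArrayList<>();
    void addReaderClosedListener(ClosedListener l) { listeners.add(l); }
    void close() {
        for (ClosedListener l : listeners) l.onClose(this);
    }
}

/** Toy wrapper around another reader. If test code closes the inner reader
 *  directly, listeners registered on the wrapper are bypassed -- the
 *  suspected source of the executor leak. */
class ToyWrapper extends ToyReader {
    final ToyReader in;
    ToyWrapper(ToyReader in) { this.in = in; }
}
```

Registering the listener on the original inner reader (here exposed as wrapper.in) instead of on the wrapper makes it fire when the inner reader is closed, which is the fix suggested above.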
[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219955#comment-13219955 ] CP commented on SOLR-3177: -- This feature is also necessary when using multi-select range facets with facet.range, to get the min and max of a field for setting facet.range.start and facet.range.end. Excluding tagged filter in StatsComponent - Key: SOLR-3177 URL: https://issues.apache.org/jira/browse/SOLR-3177 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 3.5, 3.6 Reporter: Mark Schoy Priority: Minor Labels: localparams, stats, statscomponent It would be useful to exclude the effects of some fq params from the set of documents used to compute stats -- similar to how you can exclude tagged filters when generating facet counts... https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters So that it's possible to do something like this... http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price If you want to create a price slider this is very useful because then you can filter the price ([1 TO 20]) and nevertheless get the lower and upper bounds of the unfiltered price (min=0, max=100): {noformat} |-[---]--| $0 $1 $20$100 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser
[ https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219958#comment-13219958 ] Shalu Singh commented on SOLR-2703: --- Hi Ahmet, I am trying to apply SOLR-2703.patch to Solr 3.5 (checked out from the SVN branches) to get the surround parser, but it is not working after applying the patch. Do you know how to apply it? Add support for the Lucene Surround Parser -- Key: SOLR-2703 URL: https://issues.apache.org/jira/browse/SOLR-2703 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.0 Reporter: Simon Rosenthal Assignee: Erik Hatcher Priority: Minor Fix For: 4.0 Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch The Lucene/contrib surround parser provides support for span queries. This issue adds a Solr plugin for this parser. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: ThreadPool threads leaking to suite scope.
I don't know how to fix it, Uwe, but I know there's definitely something not all right with it because threads just keep accumulating (as new searchers are created). I've pushed a static seed for which this is repeatable; this is a heavily worked-on branch but it may lead you to how to fix this: git clone git://github.com/dweiss/lucene_solr.git git checkout 935e1e9e9a350d6b35b23c4545caf78e82b42747 try to run TestPhraseQuery (you'll need -ea in Eclipse). Dawid On Thu, Mar 1, 2012 at 11:55 AM, Uwe Schindler u...@thetaphi.de wrote: I think the problem in newSearcher ist hat sometimes the reader is wrapped. If its wrapped, the underlying reader is only closed, not the wrapper. But the listener is added to the wrapper. We should add the listener to the original inner reader. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Dawid Weiss [mailto:dawid.we...@gmail.com] Sent: Thursday, March 01, 2012 11:51 AM To: dev@lucene.apache.org Subject: Re: ThreadPool threads leaking to suite scope. 1) initialize threads eagerly; use ThreadPoolExecutor and call prestartAllCoreThreads. this could be applied to LTC on the trunk. I did this but threads still leak out from unclosed readers created by LTC#newSearcher. I don't know why, but this isn't called -- r.addReaderClosedListener(new ReaderClosedListener() { @Override public void onClose(IndexReader reader) { shutdownExecutorService(ex); } }); Clues? Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219961#comment-13219961 ] Mikhail Khludnev commented on SOLR-3011: Petr, your feedback is much appreciated. By how much is your full indexing time reduced with multithreading enabled? Please be aware that you are at risk of SOLR-2804. DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5, 4.0 Reporter: Mikhail Khludnev Priority: Minor Fix For: 4.0 Attachments: SOLR-3011.patch, SOLR-3011.patch The current DIH design is not thread safe. See the last comments at SOLR-2382 and SOLR-2947. I'm going to provide a patch that makes the DIH core thread-safe. Mostly it's the SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219962#comment-13219962 ] Tommaso Teofili commented on SOLR-3013: --- it should be ok now. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: ThreadPool threads leaking to suite scope.
Hi, Yeah it's strange. I checked the code: It either does wrap and runs single-thread searches or it does *not* wrap and runs several threads. So theoretically it should work correctly... We have to check, if all IndexReaders are correctly closed and the listeners are called. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: dawid.we...@gmail.com [mailto:dawid.we...@gmail.com] On Behalf Of Dawid Weiss Sent: Thursday, March 01, 2012 12:02 PM To: dev@lucene.apache.org Subject: Re: ThreadPool threads leaking to suite scope. I don't know how to fix it, Uwe, but I know there's definitely something not all right with it because threads just keep accumulating (as new searchers are created). I've pushed a static seed for which this is repeatable; this is a heavily worked-on branch but it may lead you to how to fix this: git clone git://github.com/dweiss/lucene_solr.git git checkout 935e1e9e9a350d6b35b23c4545caf78e82b42747 try to run TestPhraseQuery (you'll need -ea in Eclipse). Dawid On Thu, Mar 1, 2012 at 11:55 AM, Uwe Schindler u...@thetaphi.de wrote: I think the problem in newSearcher ist hat sometimes the reader is wrapped. If its wrapped, the underlying reader is only closed, not the wrapper. But the listener is added to the wrapper. We should add the listener to the original inner reader. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Dawid Weiss [mailto:dawid.we...@gmail.com] Sent: Thursday, March 01, 2012 11:51 AM To: dev@lucene.apache.org Subject: Re: ThreadPool threads leaking to suite scope. 1) initialize threads eagerly; use ThreadPoolExecutor and call prestartAllCoreThreads. this could be applied to LTC on the trunk. I did this but threads still leak out from unclosed readers created by LTC#newSearcher. 
I don't know why, but this isn't called -- r.addReaderClosedListener(new ReaderClosedListener() { @Override public void onClose(IndexReader reader) { shutdownExecutorService(ex); } }); Clues? Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still Failing
Hi Tommaso: Can you check the javadocs-warnings? We have now 15 of them and this fails the Jenkins builds...: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/console Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] Sent: Thursday, March 01, 2012 8:59 AM To: dev@lucene.apache.org Subject: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still Failing Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/ All tests passed Build Log (for compile errors): [...truncated 14777 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
[ https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-3821: --- Assignee: Doron Cohen SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds. --- Key: LUCENE-3821 URL: https://issues.apache.org/jira/browse/LUCENE-3821 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.5, 4.0 Reporter: Naomi Dushay Assignee: Doron Cohen Attachments: LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml The general bug is a case where a phrase with no slop is found, but if you add slop it's not. I committed a test today (TestSloppyPhraseQuery2) that actually triggers this case; Jenkins just hasn't had enough time to chew on it. ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtests.iter=100 is enough to make it fail on trunk or 3.x -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still Failing
Hi Uwe, I just checked in the fix for that, should be ok now. Tommaso 2012/3/1 Uwe Schindler u...@thetaphi.de Hi Tommaso: Can you check the javadocs-warnings? We have now 15 of them and this fails the Jenkins builds...: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/console Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] Sent: Thursday, March 01, 2012 8:59 AM To: dev@lucene.apache.org Subject: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still Failing Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/ All tests passed Build Log (for compile errors): [...truncated 14777 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1846/ All tests passed Build Log (for compile errors): [...truncated 14441 lines...] check-misc-uptodate: jar-misc: check-spatial-uptodate: jar-spatial: check-grouping-uptodate: jar-grouping: check-queries-uptodate: jar-queries: check-queryparser-uptodate: jar-queryparser: prep-lucene-jars: common.init: compile-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [javac] Compiling 1 source file to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/core/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] 1 warning compile-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java [javac] Compiling 8 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: error: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: error: package org.apache.lucene.analysis.uima does not 
exist [javac] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: error: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProvider; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: error: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: error: cannot find symbol [javac] private AEProvider aeProvider; [javac] ^ [javac] symbol: class AEProvider [javac] location: class UIMAUpdateRequestProcessor [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44: error: cannot find symbol [javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, input); [javac]^ [javac] symbol: class UIMAAnnotationsTokenizer [javac] location: class UIMAAnnotationsTokenizerFactory [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46: error: cannot find symbol [javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath, input); [javac]^ [javac] symbol: class UIMATypeAwareAnnotationsTokenizer [javac] location: class UIMATypeAwareAnnotationsTokenizerFactory [javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64: error: cannot find symbol [javac] aeProvider = AEProviderFactory.getInstance().getAEProvider(solrCore.getName(), [javac] ^ [javac] symbol: variable AEProviderFactory [javac] location: class UIMAUpdateRequestProcessor [javac] 8 errors [...truncated 15 lines...] - To
Re: ConjunctionScorer.doNext() overstays?
I got round to some benchmarking of this change on Wikipedia content which shows a small improvement: http://goo.gl/60wJG Aside from the small performance gain to be had, it just feels more logical if ConjunctionScorer does not issue sub scorers with a request to advance to NO_MORE_DOCS. - Original Message - From: mark harwood markharw...@yahoo.co.uk To: dev@lucene.apache.org dev@lucene.apache.org Cc: Sent: Thursday, 1 March 2012, 9:39 Subject: ConjunctionScorer.doNext() overstays? Due to the odd behaviour of a custom Scorer of mine I discovered ConjunctionScorer.doNext() could loop indefinitely. It does not bail out as soon as any scorer.advance() call it makes reports back NO_MORE_DOCS. Is there not a performance optimisation to be gained in exiting as soon as this happens? At this stage I cannot see any point in continuing to advance other scorers - a quick look at TermScorer suggests that any questionable calls made by ConjunctionScorer to advance to NO_MORE_DOCS receive no special treatment and disk will be hit as a consequence. I added an extra condition to the while loop in the 3.5 source: while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) { and the JUnit tests passed. I haven't been able to benchmark performance improvements but it looks like it would be sensible to make the change anyway. Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-3836: -- Attachment: LUCENE-3836.patch Patch that implements the change. If there are no objections I'd like to commit this soon. Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12571 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12571/ All tests passed Build Log (for compile errors): [...truncated 12267 lines...] check-memory-uptodate: jar-memory: check-misc-uptodate: jar-misc: check-spatial-uptodate: jar-spatial: check-grouping-uptodate: jar-grouping: check-queries-uptodate: jar-queries: check-queryparser-uptodate: jar-queryparser: prep-lucene-jars: common.init: compile-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [javac] Compiling 1 source file to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/core/classes/java compile-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java [javac] Compiling 8 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer; [javac] ^ [javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProvider; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: cannot find symbol [javac] symbol : class AEProvider [javac] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor [javac] private AEProvider aeProvider; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44: cannot find symbol [javac] symbol : class UIMAAnnotationsTokenizer [javac] location: class org.apache.solr.uima.analysis.UIMAAnnotationsTokenizerFactory [javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, input); [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46: cannot find symbol [javac] symbol : class UIMATypeAwareAnnotationsTokenizer [javac] location: class org.apache.solr.uima.analysis.UIMATypeAwareAnnotationsTokenizerFactory [javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath, input); [javac]^ [javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64: cannot find symbol [javac] symbol : variable AEProviderFactory [javac] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor [javac] aeProvider = AEProviderFactory.getInstance().getAEProvider(solrCore.getName(), [javac] ^ [javac] 8 errors [...truncated 14 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3767) Explore streaming Viterbi search in Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen reassigned LUCENE-3767: -- Assignee: Christian Moen (was: Michael McCandless) Explore streaming Viterbi search in Kuromoji Key: LUCENE-3767 URL: https://issues.apache.org/jira/browse/LUCENE-3767 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Michael McCandless Assignee: Christian Moen Fix For: 3.6, 4.0 Attachments: LUCENE-3767.patch, LUCENE-3767.patch, LUCENE-3767.patch, LUCENE-3767.patch, LUCENE-3767.patch, SolrXml-5498.xml, compound_diffs.txt I've been playing with the idea of changing the Kuromoji viterbi search to be 2 passes (intersect, backtrace) instead of 4 passes (break into sentences, intersect, score, backtrace)... this is very much a work in progress, so I'm just getting my current state up. It's got tons of nocommits, doesn't properly handle the user dict nor extended modes yet, etc. One thing I'm playing with is to add a double backtrace for the long compound tokens, ie, instead of penalizing these tokens so that shorter tokens are picked, leave the scores unchanged but on backtrace take that penalty and use it as a threshold for a 2nd best segmentation... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220003#comment-13220003 ] Robert Muir commented on LUCENE-3836: - I think this change is OK: I just want to mention that avoiding SegmentReadState was definitely intentional... well most of my issues are really based on SegmentWriteState, but I think the whole concept is broken, see below: SegmentWriteState is bad news, for many codec APIs they would be underpopulated, or even have bogus data! For example, what would be SegmentWriteState.numDocs for StoredFieldsWriter? I understand that at a glance having foo(A) where A has A.B and A.C and A.D seems simpler than foo(B, C), but I think it's confusing to pass A at all if there is an A.D that's somehow bogus, invalid, etc. In that case it's actually much clearer to pass B and C directly... personally I think we should revisit these 'argument holder' APIs and likely remove them completely. Because of that: for most codec APIs I avoided SegmentWriteState and also SegmentReadState (for symmetry). Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
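Robert's objection to "argument holder" parameters can be illustrated with a toy sketch. All names below are hypothetical stand-ins for illustration, not the real Lucene classes:

```java
// Hypothetical illustration of the "argument holder" problem: when a
// field of the holder is meaningless for some consumers, passing the
// whole holder hides which inputs are actually used.
class SegmentWriteStateLike {           // stand-in holder, not the real class
    final String segmentName;
    final int numDocs;                  // bogus/unknown for some consumers
    SegmentWriteStateLike(String segmentName, int numDocs) {
        this.segmentName = segmentName;
        this.numDocs = numDocs;
    }
}

class Writers {
    // Holder style: the caller cannot tell that numDocs is ignored here.
    static String openViaHolder(SegmentWriteStateLike state) {
        return "writer for " + state.segmentName;
    }

    // Explicit style: only the inputs that matter appear in the signature.
    static String openExplicit(String segmentName) {
        return "writer for " + segmentName;
    }
}
```

Both calls produce the same result, but the explicit signature documents its real dependencies, which is the clarity Robert argues for.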
[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index
[ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Spencer updated SOLR-3185: --- Description: Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&$2"/> was: Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&$2"/> PatternReplaceCharFilterFactory can't replace with ampersands in index -- Key: SOLR-3185 URL: https://issues.apache.org/jira/browse/SOLR-3185 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 3.5 Reporter: Mike Spencer Priority: Minor Labels: PatternReplaceCharFilter, regex Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&$2"/> -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index
[ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Spencer updated SOLR-3185: --- Description: Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&amp;$2"/> was: Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&$2"/> PatternReplaceCharFilterFactory can't replace with ampersands in index -- Key: SOLR-3185 URL: https://issues.apache.org/jira/browse/SOLR-3185 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 3.5 Reporter: Mike Spencer Priority: Minor Labels: PatternReplaceCharFilter, regex Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&amp;$2"/> -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
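For reference, the intent of the (archive-mangled) charFilter in the issue above can be sketched with plain java.util.regex. The pattern and replacement here are my reconstruction of the stripped text, and note that in an actual schema.xml a literal & inside the replacement attribute would itself have to be written as the entity &amp; for the XML to be well-formed, which is one plausible source of the double-escaping seen in the indexed values:

```java
import java.util.regex.Pattern;

// Sketch of the replacement the charFilter is meant to perform.
// Assumed reconstruction of the mangled attributes:
//   pattern="(^\w)\s[&]\s(\w)"   replacement="$1&$2"
public class AmpersandReplace {
    static final Pattern P = Pattern.compile("(^\\w)\\s[&]\\s(\\w)");

    // Collapses "A & B" to "A&B", which is what the reporter expected
    // to see in the index.
    static String collapse(String in) {
        return P.matcher(in).replaceAll("$1&$2");
    }

    public static void main(String[] args) {
        System.out.println(collapse("A & B")); // prints A&B
    }
}
```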
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220014#comment-13220014 ] Erick Erickson commented on SOLR-2438: -- Duto: A couple of things. First, in the future could you post this kind of usage question to the users list? See: http://lucene.apache.org/solr/discussion.html. No big deal, but that way more people see the discussion and benefit. But to your question: Have you enabled leading wildcards? See the ReversedWildcardFilterFactory. Leading wildcards need some special handling because in the simple case, finding them means you have to examine every term in the field, which can be very expensive. Second, you could get away with just using one analyzer since they're all the same, as <analyzer> . . . </analyzer>. If no 'type=...' is specified, then the index, query and multiterm chains all use that analyzer definition. I doubt this issue is related to this JIRA, I think it's just the normal leading wildcard issues. Here's a discussion of this in some detail if you haven't seen it yet: http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ Erick Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Fix For: 3.6, 4.0 Attachments: SOLR-2438-3x.patch, SOLR-2438-3x.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438_3x.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done by Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
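Erick's shorthand can be shown with a minimal schema.xml fragment. This fieldType is a hypothetical example, not the reporter's actual configuration: with no type attribute on <analyzer>, the same chain is used for indexing, querying, and multiterm (wildcard) analysis. ReversedWildcardFilterFactory, by contrast, would belong only in a separate index-time analyzer, since reversed terms must not be produced at query time.

```xml
<!-- Hypothetical fieldType illustrating the single-analyzer shorthand:
     with no type="..." attribute, this one analyzer applies to the
     index, query, and multiterm chains alike. -->
<fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```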
[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug
[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220016#comment-13220016 ] Wenca Petr commented on SOLR-3011: -- Hi Mikhail, I know about 2804, I solved it by disabling logging as someone advised (I think). Without multithreading I was able to index about 15k documents per minute, with 4 threads an average of about 45k per minute. After applying your patch it seems to me that it fell to 30k per minute. But the number of processed documents is wrong. I have 50k documents to be indexed. I start a full dump, it processes about 44k documents during the first minute, but it continues past 50k to a total of 200k processed, with a decreasing number of docs per minute and a total time of more than 7 minutes. After the commit the index contains 50k documents, which is right. DIH MultiThreaded bug - Key: SOLR-3011 URL: https://issues.apache.org/jira/browse/SOLR-3011 Project: Solr Issue Type: Sub-task Components: contrib - DataImportHandler Affects Versions: 3.5, 4.0 Reporter: Mikhail Khludnev Priority: Minor Fix For: 4.0 Attachments: SOLR-3011.patch, SOLR-3011.patch current DIH design is not thread safe. see last comments at SOLR-2382 and SOLR-2947. I'm going to provide the patch that makes DIH core threadsafe. Mostly it's a SOLR-2947 patch from 28th Dec. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: ConjunctionScorer.doNext() overstays?
Hmm, the tradeoff is an added per-hit check (doc != NO_MORE_DOCS), vs the one-time cost at the end of calling advance(NO_MORE_DOCS) for each sub-clause? I think in general this isn't a good tradeoff? Ie what about the case where we AND high-freq, and similarly freq'd, terms together? Then, the per-hit check will at some point dominate? It's valid to pass NO_MORE_DOCS to DocsEnum.advance. Mike McCandless http://blog.mikemccandless.com On Thu, Mar 1, 2012 at 7:22 AM, mark harwood markharw...@yahoo.co.uk wrote: I got round to some benchmarking of this change on Wikipedia content which shows a small improvement: http://goo.gl/60wJG Aside from the small performance gain to be had, it just feels more logical if ConjunctionScorer does not issue sub scorers with a request to advance to NO_MORE_DOCS. - Original Message - From: mark harwood markharw...@yahoo.co.uk To: dev@lucene.apache.org dev@lucene.apache.org Cc: Sent: Thursday, 1 March 2012, 9:39 Subject: ConjunctionScorer.doNext() overstays? Due to the odd behaviour of a custom Scorer of mine I discovered ConjunctionScorer.doNext() could loop indefinitely. It does not bail out as soon as any scorer.advance() call it makes reports back NO_MORE_DOCS. Is there not a performance optimisation to be gained in exiting as soon as this happens? At this stage I cannot see any point in continuing to advance other scorers - a quick look at TermScorer suggests that any questionable calls made by ConjunctionScorer to advance to NO_MORE_DOCS receives no special treatment and disk will be hit as a consequence. I added an extra condition to the while loop on the 3.5 source: while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) { and Junit tests passed. I haven't been able to benchmark performance improvements but it looks like it would be sensible to make the change anyway. 
Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
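Mark's proposed change (the list archive stripped the && and < operators from his quoted condition) can be sketched with a self-contained toy model. FakeScorer and the doc-ID lists below are illustrative stand-ins, not Lucene's real Scorer API:

```java
// Toy model of the 3.5-era ConjunctionScorer.doNext() round-robin, with
// the proposed "doc != NO_MORE_DOCS" early exit: stop advancing the
// remaining sub-scorers once any one of them is exhausted.
public class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal stand-in for a postings-backed Scorer over sorted doc IDs.
    static class FakeScorer {
        final int[] docs;
        int idx = -1;
        FakeScorer(int... docs) { this.docs = docs; }
        int docID() {
            return idx < 0 ? -1 : idx >= docs.length ? NO_MORE_DOCS : docs[idx];
        }
        int advance(int target) {
            // Move forward until we reach a doc >= target or run out.
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
    }

    // Round-robin: advance whichever scorer is behind until all scorers
    // agree on one doc, or (with the new check) any scorer is exhausted.
    static int doNext(FakeScorer[] scorers, int doc) {
        int first = 0;
        FakeScorer firstScorer;
        while (doc != NO_MORE_DOCS
               && (firstScorer = scorers[first]).docID() < doc) {
            doc = firstScorer.advance(doc);
            first = first == scorers.length - 1 ? 0 : first + 1;
        }
        return doc;
    }

    public static void main(String[] args) {
        FakeScorer a = new FakeScorer(1, 5, 9);
        FakeScorer b = new FakeScorer(5, 9, 12);
        a.advance(0);
        b.advance(0);
        // Start from the max of the initial doc IDs, as the real scorer does.
        System.out.println(doNext(new FakeScorer[]{a, b}, 5)); // prints 5
    }
}
```

With the extra check, a scorer such as (1) conjoined with (5, 9) returns NO_MORE_DOCS as soon as the first list is exhausted, without issuing advance(NO_MORE_DOCS) to the second scorer.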
Re: ConjunctionScorer.doNext() overstays?
I would have assumed the many int comparisons would cost less than the superfluous disk accesses? (I bow to your considerable experience in this area!) What is the worst-case scenario on added disk reads? Could it be as bad as numberOfSegments x numberOfOtherscorers before the query winds up? On the index I tried, it looked like an improvement - the spreadsheet I linked to has the source for the benchmark on a second worksheet if you want to give it a whirl on a different dataset. - Original Message - From: Michael McCandless luc...@mikemccandless.com To: dev@lucene.apache.org; mark harwood markharw...@yahoo.co.uk Cc: Sent: Thursday, 1 March 2012, 13:31 Subject: Re: ConjunctionScorer.doNext() overstays? Hmm, the tradeoff is an added per-hit check (doc != NO_MORE_DOCS), vs the one-time cost at the end of calling advance(NO_MORE_DOCS) for each sub-clause? I think in general this isn't a good tradeoff? Ie what about the case where we and high-freq, and similarly freq'd, terms together? Then, the per-hit check will at some point dominate? It's valid to pass NO_MORE_DOCS to DocsEnum.advance. Mike McCandless http://blog.mikemccandless.com On Thu, Mar 1, 2012 at 7:22 AM, mark harwood markharw...@yahoo.co.uk wrote: I got round to some benchmarking of this change on Wikipedia content which shows a small improvement: http://goo.gl/60wJG Aside from the small performance gain to be had, it just feels more logical if ConjunctionScorer does not issue sub scorers with a request to advance to NO_MORE_DOCS. - Original Message - From: mark harwood markharw...@yahoo.co.uk To: dev@lucene.apache.org dev@lucene.apache.org Cc: Sent: Thursday, 1 March 2012, 9:39 Subject: ConjunctionScorer.doNext() overstays? Due to the odd behaviour of a custom Scorer of mine I discovered ConjunctionScorer.doNext() could loop indefinitely. It does not bail out as soon as any scorer.advance() call it makes reports back NO_MORE_DOCS. 
Is there not a performance optimisation to be gained in exiting as soon as this happens? At this stage I cannot see any point in continuing to advance other scorers - a quick look at TermScorer suggests that any questionable calls made by ConjunctionScorer to advance to NO_MORE_DOCS receives no special treatment and disk will be hit as a consequence. I added an extra condition to the while loop on the 3.5 source: while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) { and Junit tests passed. I haven't been able to benchmark performance improvements but it looks like it would be sensible to make the change anyway. Cheers, Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3181) New Admin UI, allow user to somehow cut/paste all the old Zookeeper info.
[ https://issues.apache.org/jira/browse/SOLR-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3181: Attachment: SOLR-3181.patch Hm, what about something like that? We could allow {{?dump=true}} as Param for the ZookeeperServlet, reuse {{printZnode()}} which is already used for showing the Details (Yes, the Output actually contains escaped quotes, because the Change from SOLR-3162 is pending) New Admin UI, allow user to somehow cut/paste all the old Zookeeper info. --- Key: SOLR-3181 URL: https://issues.apache.org/jira/browse/SOLR-3181 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Environment: n/a Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Attachments: SOLR-3181.patch When tracking down issues with ZK, the devs ask about various bits of data from the cloud pages. It would be convenient to be able to just capture all the data from the old /solr/admin/zookeeper.jsp page in the admin interface to be able to send it to anyone debugging the info. Perhaps just a "get debug info for Apache". Or even more cool, "copy debug info to clipboard" if that's possible. Is this just the raw data that the cloud view is manipulating? It doesn't have to be pretty although indentation would be nice. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3174) Visualize Cluster State
[ https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220053#comment-13220053 ] Stefan Matheis (steffkes) commented on SOLR-3174: - I'll try to launch a small Cloud on my local VMWare and build an example w/ each of these libraries .. so we'll see which fits our requirements best - will need your input on this, for sure ; Visualize Cluster State --- Key: SOLR-3174 URL: https://issues.apache.org/jira/browse/SOLR-3174 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley It would be great to visualize the cluster state in the new UI. See Mark's wish: https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220058#comment-13220058 ] Michael McCandless commented on LUCENE-3836: I agree catch-all argument holder classes are dangerous... they can bloat over time and probably lead to bugs... Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing
Tomasso, it looks like the solr/contrib/uima/ build is broken? -Original Message- From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] Sent: Thursday, March 01, 2012 7:16 AM To: dev@lucene.apache.org Subject: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1846/ All tests passed Build Log (for compile errors): [...truncated 14441 lines...] check-misc-uptodate: jar-misc: check-spatial-uptodate: jar-spatial: check-grouping-uptodate: jar-grouping: check-queries-uptodate: jar-queries: check-queryparser-uptodate: jar-queryparser: prep-lucene-jars: common.init: compile-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [javac] Compiling 1 source file to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/core/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] 1 warning compile-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. 
[echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java [javac] Compiling 8 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: error: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: error: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: error: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProvider; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: error: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: error: cannot find symbol [javac] 
private AEProvider aeProvider; [javac] ^ [javac] symbol: class AEProvider [javac] location: class UIMAUpdateRequestProcessor [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44: error: cannot find symbol [javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, input); [javac]^ [javac] symbol: class UIMAAnnotationsTokenizer [javac] location: class UIMAAnnotationsTokenizerFactory [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46: error: cannot find symbol [javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath, input); [javac]^ [javac] symbol: class UIMATypeAwareAnnotationsTokenizer [javac] location: class UIMATypeAwareAnnotationsTokenizerFactory [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64: error: cannot find symbol [javac] aeProvider =
Re: ConjunctionScorer.doNext() overstays?
On Thu, Mar 1, 2012 at 8:49 AM, mark harwood markharw...@yahoo.co.uk wrote: I would have assumed the many int comparisons would cost less than the superfluous disk accesses? (I bow to your considerable experience in this area!) What is the worst-case scenario on added disk reads? Could it be as bad as numberOfSegments x numberOfOtherscorers before the query winds up? Well, it depends -- the disk access is a one-time thing but the added per-hit check is per-hit. At some point it'll cross over... I think likely the advance(NO_MORE_DOCS) will not usually hit disk: our skipper impl fully pre-buffers (in RAM) the top skip lists I think? Even if we do go to disk it's likely the OS pre-cached those bytes in its IO buffer. On the index I tried, it looked like an improvement - the spreadsheet I linked to has the source for the benchmark on a second worksheet if you want to give it a whirl on a different dataset. Maybe try it on a more balanced case? Ie, N high-freq terms whose freq is close-ish? And on slow queries (I think the results in your spreadsheet are very fast queries right? The slowest one was ~0.95 msec per query, if I'm reading it right?). In general I think not slowing down the worst-case queries is much more important than speeding up the super-fast queries. Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3173) Database semantics - insert and update
[ https://issues.apache.org/jira/browse/SOLR-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220062#comment-13220062 ] Per Steffensen commented on SOLR-3173: -- Believe we will be able to use _version_ if: a) There is a realtime way of getting the _version_ corresponding to a given id (or whatever you use as uniqueKey). Let's call this getRealtimeVersion(id) b) The _version_ for a given id returned by getRealtimeVersion(id) never changes unless changes have been made to the document with that id (created, updated or deleted) c) That getRealtimeVersion(id) will immediately return that new _version_ as soon as a change has been made - no soft- or hard-commit necessary. Well that is the realtime part :-) d) I will always get a negative number (hopefully always -1) from getRealtimeVersion(id) when calling with an id, where there is no corresponding document in the solr-core. No matter if there has never been such a document or if it has been there but has been deleted. Can you please confirm or correct me on the above bullets, Yonik. It would also be very helpful if you would provide the code for getRealtimeVersion(id), assuming that I am in the DirectUpdateHandler2. Thanks a lot! Guess this version-checking stuff is only necessary on primary (or master or whatever you call it) shards and not on replica (or slave). How do I know in DirectUpdateHandler2 if I am primary/master- or replica/slave-shard? Regret a little bit the idea about different URLs stated in comment above. Guess I would just like to state info about the wanted semantics in the query in some other way. I guess it would be nice with a semantics URL-param with the possible values db-insert, db-update, db-update-version-checked and classic-solr-update: - semantics=db-insert: Index document doc if and only if getRealtimeVersion(doc.id) returns -1. 
Else return DocumentAlreadyExists error - semantics=db-update: Replace existing document if it exists, else return DocumentDoesNotExist error - semantics=db-update-version-checked: As db-update but if _version_ on the provided document does not correspond to existing getRealtimeVersion(doc.id) return VersionConflict error - semantics=classic-solr-update: Do exactly as update does today in Solr. classic-solr-update will be used if semantics is not specified in the update request - it is the default. In solrconfig.xml you will be able to change the default semantics plus provide a list of semantics that are not allowed. Regards, Per Steffensen Database semantics - insert and update -- Key: SOLR-3173 URL: https://issues.apache.org/jira/browse/SOLR-3173 Project: Solr Issue Type: New Feature Components: update Affects Versions: 3.5 Environment: All Reporter: Per Steffensen Assignee: Per Steffensen Labels: RDBMS, insert, nosql, uniqueKey, update Fix For: 4.0 Original Estimate: 168h Remaining Estimate: 168h In order to increase the ability of Solr to be used as a NoSql database (lots of concurrent inserts, updates, deletes and queries in the entire lifetime of the index) instead of just a search index (first: everything indexed (in one thread), after: only queries), I would like Solr to support the following features inspired by RDBMSs and other NoSql databases. * Given a solr-core with a schema containing a uniqueKey-field uniqueField and a document Dold, when trying to INSERT a new document Dnew where Dold.uniqueField is equal to Dnew.uniqueField, then I want a DocumentAlreadyExists error. If no such document Dold exists I want Dnew indexed into the solr-core. 
* Given a solr-core with a schema containing a uniqueKey-field uniqueField and a document Dold, when trying to UPDATE a document Dnew where Dold.uniqueField is equal to Dnew.uniqueField, I want Dold deleted from and Dnew added to the index (just as it is today). If no such document Dold exists I want nothing to happen (Dnew is not added to the index). The essence of this issue is to be able to state your intent (insert or update) and have slightly different semantics (from each other and from the existing update) depending on your intent. The functionality provided by this issue is only really meaningful when you run with updateLog activated. This issue might be solved more or less at the same time as SOLR-3178, and only one single SVN patch might be given to cover both issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To
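The per-semantics checks proposed in the comment above can be sketched as a small dispatch. The semantics values, the getRealtimeVersion contract (a negative number, -1, for missing documents), and the error names come from the comment; the class, enum, and method here are purely illustrative, not Solr API.

```java
// Hypothetical sketch of the proposed "semantics" URL-param dispatch.
// Only the semantics names, the -1-for-missing convention, and the error
// names come from the proposal; everything else is illustrative.
public class UpdateSemantics {
    public static final long NOT_FOUND = -1;

    public enum Semantics { DB_INSERT, DB_UPDATE, DB_UPDATE_VERSION_CHECKED, CLASSIC_SOLR_UPDATE }

    /** Returns the error name to report, or null if the update may proceed. */
    public static String check(Semantics semantics, long existingVersion, long providedVersion) {
        switch (semantics) {
            case DB_INSERT:                 // only index if no document exists yet
                return existingVersion == NOT_FOUND ? null : "DocumentAlreadyExists";
            case DB_UPDATE:                 // only replace an existing document
                return existingVersion == NOT_FOUND ? "DocumentDoesNotExist" : null;
            case DB_UPDATE_VERSION_CHECKED: // replace only if the caller saw the current version
                if (existingVersion == NOT_FOUND) return "DocumentDoesNotExist";
                return existingVersion == providedVersion ? null : "VersionConflict";
            default:                        // classic-solr-update: today's behavior, always proceed
                return null;
        }
    }
}
```

A handler would call check() with the result of getRealtimeVersion(doc.id) before indexing, rejecting the request when a non-null error name comes back.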
Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing
I removed too many lines inside its build.xml in r1295508 commit, I'm working to fix it. Tommaso 2012/3/1 Steven A Rowe sar...@syr.edu Tomasso, it looks like the solr/contrib/uima/ build is broken? -Original Message- From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] Sent: Thursday, March 01, 2012 7:16 AM To: dev@lucene.apache.org Subject: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1846/ All tests passed Build Log (for compile errors): [...truncated 14441 lines...] check-misc-uptodate: jar-misc: check-spatial-uptodate: jar-spatial: check-grouping-uptodate: jar-grouping: check-queries-uptodate: jar-queries: check-queryparser-uptodate: jar-queryparser: prep-lucene-jars: common.init: compile-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [javac] Compiling 1 source file to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/core/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] 1 warning compile-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. 
[echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java [javac] Compiling 8 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: error: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: error: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: error: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProvider; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: error: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: error: cannot find symbol [javac] 
private AEProvider aeProvider; [javac] ^ [javac] symbol: class AEProvider [javac] location: class UIMAUpdateRequestProcessor [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44: error: cannot find symbol [javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, input); [javac]^ [javac] symbol: class UIMAAnnotationsTokenizer [javac] location: class UIMAAnnotationsTokenizerFactory [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46: error: cannot find symbol [javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath, input); [javac]^ [javac] symbol: class UIMATypeAwareAnnotationsTokenizer [javac] location: class UIMATypeAwareAnnotationsTokenizerFactory [javac]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12572 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12572/ All tests passed Build Log (for compile errors): [...truncated 12207 lines...] check-memory-uptodate: jar-memory: check-misc-uptodate: jar-misc: check-spatial-uptodate: jar-spatial: check-grouping-uptodate: jar-grouping: check-queries-uptodate: jar-queries: check-queryparser-uptodate: jar-queryparser: prep-lucene-jars: common.init: compile-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [javac] Compiling 1 source file to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/core/classes/java compile-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java [javac] Compiling 8 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21: package org.apache.lucene.analysis.uima does not exist [javac] import org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer; [javac] ^ [javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProvider; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27: package org.apache.lucene.analysis.uima.ae does not exist [javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51: cannot find symbol [javac] symbol : class AEProvider [javac] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor [javac] private AEProvider aeProvider; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44: cannot find symbol [javac] symbol : class UIMAAnnotationsTokenizer [javac] location: class org.apache.solr.uima.analysis.UIMAAnnotationsTokenizerFactory [javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, input); [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46: cannot find symbol [javac] symbol : class UIMATypeAwareAnnotationsTokenizer [javac] location: class org.apache.solr.uima.analysis.UIMATypeAwareAnnotationsTokenizerFactory [javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath, input); [javac]^ [javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64: cannot find symbol [javac] symbol : variable AEProviderFactory [javac] location: class org.apache.solr.uima.processor.UIMAUpdateRequestProcessor [javac] aeProvider = AEProviderFactory.getInstance().getAEProvider(solrCore.getName(), [javac] ^ [javac] 8 errors [...truncated 14 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220071#comment-13220071 ] Andrzej Bialecki commented on LUCENE-3836: --- I hear you .. SegmentWriteState is bad, I agree. But the argument about SegmentWriteState is not really applicable to SegmentReadState - write state is mutable and can change under your feet whereas SegmentReadState is immutable, created once in SegmentReader and used only for initialization of format readers. On the other hand, if we insist that we always pass individual arguments around then providing some additional segment-global context to format readers requires changing method signatures (adding arguments). The background for this issue is that I started looking at updateable fields, where updates are put in a segment (or reader) of its own and they provide an overlay for the main segment, with a special codec magic to pull and remap data from the overlay as the main data is accessed. However, in order to do that I need to provide this data when format readers are initialized. I can't do this when creating a Codec instance (Codec is automatically instantiated) or when creating Codec.*Format(), because format instances are usually shared as well. So the idea I had in mind was to use SegmentReaderState uniformly, and put this overlay data in SegmentReadState so that it's passed down to formats during format readers' creation. I'm open to other ideas... :) Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. 
This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
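The trade-off discussed above, a single immutable read-state object versus individual method arguments, can be sketched minimally. The field names are illustrative and this is not Lucene's actual SegmentReadState; the point is that an extra piece of per-segment context (such as an updates overlay) can travel inside the state object without any reader-factory signature changing.

```java
// Minimal sketch of an immutable per-segment read state. Illustrative only;
// not org.apache.lucene.index.SegmentReadState.
import java.util.Collections;
import java.util.Map;

public final class ReadStateSketch {
    public final String segmentName;
    public final int docCount;
    // Extension point: additional per-segment context (e.g. an updates
    // overlay) rides along without touching any method signatures.
    public final Map<String, Object> context;

    public ReadStateSketch(String segmentName, int docCount, Map<String, Object> context) {
        this.segmentName = segmentName;
        this.docCount = docCount;
        this.context = Collections.unmodifiableMap(context); // immutable, unlike a write state
    }
}
```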
[jira] [Commented] (SOLR-3174) Visualize Cluster State
[ https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220076#comment-13220076 ] Mark Miller commented on SOLR-3174: --- If you are on a unix machine, then in /solr/cloud-dev you could just run solrcloud-start.sh and it starts up a 2 shard, 4 node cluster automatically. Unfortunately, no windows bat files currently :( Visualize Cluster State --- Key: SOLR-3174 URL: https://issues.apache.org/jira/browse/SOLR-3174 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley It would be great to visualize the cluster state in the new UI. See Mark's wish: https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220082#comment-13220082 ] Michael McCandless commented on LUCENE-3836: {quote} The background for this issue is that I started looking at updateable fields, where updates are put in a segment (or reader) of its own and they provide an overlay for the main segment, with a special codec magic to pull and remap data from the overlay as the main data is accessed. However, in order to do that I need to provide this data when format readers are initialized. I can't do this when creating a Codec instance (Codec is automatically instantiated) or when creating Codec.*Format(), because format instances are usually shared as well. {quote} Sweet! Couldn't the stacking/overlaying live above codec? Eg, the codec thinks it's reading 3 segments, but really the code above knows there's 1 base segment and 2 stacked on top? Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220093#comment-13220093 ] Andrzej Bialecki commented on LUCENE-3836: --- I think this could work, too - I would instantiate the overlay data in SegmentReader, and then I'd create the overlay codec's format readers in SegmentReader, using the original format readers plus the overlay data. I'll try this approach ... I'll create a separate issue to discuss this. Let's close this as won't fix for now. Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] Merging 3.0.3 into Trunk
On 2012-02-29, Stefan Bodewig wrote: On 2012-02-28, Christopher Currens wrote: Alright, it's done! 3.0.3 is now merged in with Trunk! I'll see to running RAT and looking at the line-ends over the next few days so we can get them fixed once and not run into it with the release. I went for EOLs first and there are 621 files outside of lib and doc that need to be fixed. What I have now is not just a patch (of more than 200k lines), but also a list of 621 files that need their svn:eol-style property to be set. I can create a JIRA ticket for that attaching my patch and the list of files to fix or - since I technically am a committer - could just commit my cleaned up workspace as is (plus JIRA ticket that I'd open and close myself). What would you prefer? RAT doesn't really make sense before the line feeds are correct (I've seen quite a few files without license headers by manual inspection). Stefan
[jira] [Issue Comment Edited] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220093#comment-13220093 ] Andrzej Bialecki edited comment on LUCENE-3836 at 3/1/12 3:22 PM: --- I think this could work, too - I would instantiate the overlay data in SegmentReader, and then I'd create the overlay codec's format readers in SegmentReader, using the original format readers plus the overlay data. I'll try this approach ... I'll create a separate issue to discuss this. (The reason I'm doing this at the codec level is that I wanted to avoid heavy mods to SegmentReader, and it's easier to visualize how this data is re-mapped and stacked at the level of fairly simple codec APIs). Let's close this as won't fix for now. was (Author: ab): I think this could work, too - I would instantiate the overlay data in SegmentReader, and then I'd create the overlay codec's format readers in SegmentReader, using the original format readers plus the overlay data. I'll try this approach ... I'll create a separate issue to discuss this. Let's close this as won't fix for now. Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220105#comment-13220105 ] Robert Muir commented on LUCENE-3836: - {quote} (The reason I'm doing this at the codec level is that I wanted to avoid heavy mods to SegmentReader, and it's easier to visualize how this data is re-mapped and stacked at the level of fairly simple codec APIs). {quote} But SegmentReader is fairly simple these days, its just basically a pointer to a core (SegmentCoreReaders) + deletes. Maybe it should stay the same, but instead we could have a StackedReader (perhaps a bad name), that points to multiple cores + deletes + mask files or whatever it needs and returns masked enums over the underlying Enums itself (e.g. combining enums from the underlying impls, passing masks down as Bits, and such). SegmentReader would stay as-is. Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
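The "passing masks down as Bits" idea above can be illustrated with a toy view that ANDs a base segment's live docs with an update mask, so masked-away documents look exactly like deletions. The Bits interface here is a local stand-in, not org.apache.lucene.util.Bits, and the whole class is an illustration of the idea, not Lucene code.

```java
// Toy sketch: overlay-masked documents exposed as ordinary "deletions".
public class MaskedLiveDocs {
    /** Minimal stand-in for a bit-set view; not org.apache.lucene.util.Bits. */
    public interface Bits { boolean get(int index); int length(); }

    /** Wraps a boolean[] as Bits. */
    public static Bits of(final boolean[] a) {
        return new Bits() {
            public boolean get(int i) { return a[i]; }
            public int length() { return a.length; }
        };
    }

    /** A doc is live only if live in the base segment AND not masked by an update overlay. */
    public static Bits mask(final Bits baseLive, final boolean[] updated) {
        return new Bits() {
            public boolean get(int i) { return baseLive.get(i) && !updated[i]; }
            public int length() { return baseLive.length(); }
        };
    }
}
```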
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #407: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/407/ No tests ran. Build Log (for compile errors): [...truncated 20066 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3837) A modest proposal for updateable fields
A modest proposal for updateable fields --- Key: LUCENE-3837 URL: https://issues.apache.org/jira/browse/LUCENE-3837 Project: Lucene - Java Issue Type: New Feature Components: core/index Affects Versions: 4.0 Reporter: Andrzej Bialecki I'd like to propose a simple design for implementing updateable fields in Lucene. This design has some limitations, so I'm not claiming it will be appropriate for every use case, and it's obvious it has some performance consequences, but at least it's a start... This proposal uses a concept of overlays or stacked updates, where the original data is not removed but instead it's overlaid with the new data. I propose to reuse as much of the existing APIs as possible, and represent updates as an IndexReader. Updates to documents in a specific segment would be collected in an overlay index specific to that segment, i.e. there would be as many overlay indexes as there are segments in the primary index. A field update would be represented as a new document in the overlay index. The document would consist of just the updated fields, plus a field that records the id in the primary segment of the document affected by the update. These updates would be processed as usual via secondary IndexWriter-s, as many as there are primary segments, so the same analysis chains would be used, the same field types, etc. On opening a segment with updates the SegmentReader (see also LUCENE-3836) would check for the presence of the overlay index, and if so it would open it first (as an AtomicReader? or it would open individual codec format readers? perhaps it should load the whole thing into memory?), and it would construct an in-memory map between the primary's docId-s and the overlay's docId-s. And finally it would wrap the original format readers with overlay readers, initialized also with the id map. 
Now, when consumers of the 4D API would ask for specific data, the overlay readers would first re-map the primary's docId to the overlay's docId, and check whether overlay data exists for that docId and this type of data (e.g. postings, stored fields, vectors) and return this data instead of the original. Otherwise they would return the original data. One obvious performance issue with this approach is that the sequential access to primary data would translate into random access to the overlay data. This could be solved by sorting the overlay index so that at least the overlay ids increase monotonically as primary ids do. Updates to the primary index would be handled as usual, i.e. segment merges (since the segments with updates would pretend to have no overlays) would just work as usual, only the overlay index would have to be deleted once the primary segment is deleted after merge. Updates to existing documents that already had some fields updated would again be handled as usual, only underneath they would open an IndexWriter on the overlay index for a specific segment. That's the broad idea. Feel free to pipe in - I started some coding at the codec level but got stuck using the approach in LUCENE-3836. The approach that uses a modified SegmentReader seems more promising. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
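The in-memory primary-to-overlay docId map described in the proposal might be sketched like this. The names are illustrative only, and a real implementation would likely prefer a more compact structure than a HashMap (e.g. sorted parallel arrays, matching the monotonic-id sorting mentioned above) to keep lookups cheap during sequential access.

```java
// Hypothetical sketch of the overlay id map: if an overlay document exists
// for a primary docId, read from the overlay; otherwise fall back to the
// primary segment. Illustrative names, not Lucene code.
import java.util.HashMap;
import java.util.Map;

public class OverlayIdMap {
    public static final int NO_OVERLAY = -1;
    private final Map<Integer, Integer> primaryToOverlay = new HashMap<>();

    /** Records that primary doc primaryId has updated fields stored as overlay doc overlayId. */
    public void put(int primaryId, int overlayId) { primaryToOverlay.put(primaryId, overlayId); }

    /** Overlay docId holding updates for a primary docId, or NO_OVERLAY if the primary data is current. */
    public int overlayId(int primaryId) { return primaryToOverlay.getOrDefault(primaryId, NO_OVERLAY); }
}
```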
[jira] [Resolved] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState
[ https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved LUCENE-3836. --- Resolution: Won't Fix Thanks for the insightful comments - this looks promising. I opened LUCENE-3837 to discuss a broader design for updateable fields. Most Codec.*Format().*Reader() methods should use SegmentReadState -- Key: LUCENE-3836 URL: https://issues.apache.org/jira/browse/LUCENE-3836 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 4.0 Attachments: LUCENE-3836.patch Codec formats API for opening readers is inconsistent - sometimes it uses SegmentReadState, in other cases it uses individual arguments that are already available via SegmentReadState. This complicates extending the API, e.g. if additional per-segment state would need to be passed to the readers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: ConjunctionScorer.doNext() overstays?
Fair points. I've tried several differently sized indexes and blends of query term frequencies now and the results swing only marginally between the 2 implementations. Sometimes the exit-early logic is marginally faster and other times marginally slower. Using a larger index seemed to reduce the improvement I had seen in my initial results. So overall, not a clear improvement and not worth bothering with because, as you suggest, various disk caching strategies probably mitigate the cost of the added reads. Based on your comments re the added int comparison cost in that hot loop it made me think that the abstract DocIdSetIterator.docID() method call could be questioned on that basis too? It looks like all DocIdSetIterator subclasses maintain a doc variable mutated elsewhere in advance() and next() calls and docID() is meant to be idempotent, so presumably a shared variable in the base class could avoid a docID() method invocation? Anyhoo, the profiler did not show that method up as any sort of hotspot so I don't think it's an issue. Thanks, Mike. - Original Message - From: Michael McCandless luc...@mikemccandless.com To: dev@lucene.apache.org; mark harwood markharw...@yahoo.co.uk Cc: Sent: Thursday, 1 March 2012, 14:18 Subject: Re: ConjunctionScorer.doNext() overstays? On Thu, Mar 1, 2012 at 8:49 AM, mark harwood markharw...@yahoo.co.uk wrote: I would have assumed the many int comparisons would cost less than the superfluous disk accesses? (I bow to your considerable experience in this area!) What is the worst-case scenario on added disk reads? Could it be as bad as numberOfSegments x numberOfOtherscorers before the query winds up? Well, it depends -- the disk access is a one-time thing but the added per-hit check is per-hit. At some point it'll cross over... I think likely the advance(NO_MORE_DOCS) will not usually hit disk: our skipper impl fully pre-buffers (in RAM) the top skip lists I think? 
Even if we do go to disk it's likely the OS pre-cached those bytes in its IO buffer. On the index I tried, it looked like an improvement - the spreadsheet I linked to has the source for the benchmark on a second worksheet if you want to give it a whirl on a different dataset. Maybe try it on a more balanced case? Ie, N high-freq terms whose freq is close-ish? And on slow queries (I think the results in your spreadsheet are very fast queries right? The slowest one was ~0.95 msec per query, if I'm reading it right?). In general I think not slowing down the worst-case queries is much more important that speeding up the super-fast queries. Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
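For reference, the doNext() leapfrog that this thread benchmarks can be reduced to a toy intersection over sorted doc-id arrays: each iterator is advanced to the current candidate doc until all of them agree on it. This is an illustration of the algorithm only, not Lucene's ConjunctionScorer implementation, and it omits the exit-early variant under discussion.

```java
// Toy leapfrog conjunction over sorted, duplicate-free doc-id lists,
// mirroring the shape of ConjunctionScorer.doNext(). Illustrative only.
import java.util.ArrayList;
import java.util.List;

public class Conjunction {
    /** Returns the doc ids present in every input list. */
    public static List<Integer> intersect(int[][] its) {
        List<Integer> hits = new ArrayList<>();
        if (its.length == 0 || its[0].length == 0) return hits;
        int[] pos = new int[its.length]; // current position in each list
        int doc = its[0][0];             // candidate doc id
        while (true) {
            boolean agreed = true;
            for (int i = 0; i < its.length; i++) {
                // advance iterator i to the first doc >= candidate
                while (pos[i] < its[i].length && its[i][pos[i]] < doc) pos[i]++;
                if (pos[i] == its[i].length) return hits; // one iterator exhausted: done
                if (its[i][pos[i]] > doc) {               // overshot: new candidate, restart round
                    doc = its[i][pos[i]];
                    agreed = false;
                    break;
                }
            }
            if (agreed) { hits.add(doc); doc++; } // all matched: emit and move past it
        }
    }
}
```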
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220163#comment-13220163 ] Robert Muir commented on LUCENE-3837: - Some concerns about scoring: # the stats problem: maybe we should allow overlay readers to just return -1 for docFreq? I don't like the situation today where the preflex codec doesn't implement all the stats (the whole -1 situation and 'optional' stats is frustrating), but I think it's worse to return out-of-bounds stuff, e.g. where docFreq > maxDoc. I think totalTermFreq is safe to just sum up though (it's wrong, but not out of bounds), and the similarity could use this safely to compute an expected IDF instead. Still, this part will be messy; unlike the newer stats in 4.0, lots of code I think expects that docFreq is always supported. Another possibility that I think I like more is to treat this conceptually just like deletes in every way, so all stats are supported but maxDoc is wrong (it includes masked-away documents); then nothing is out of bounds. So in this case we would add maxDoc(field), which is only used for scoring. For a normal reader this just returns maxDoc() as implemented today. # the norms problem: although norms are implemented as docValues, currently all similarities assume that getArray()/hasArray() is implemented... but here I'm not sure that would be the case? We should probably measure whether the method call really even hurts; in general it's a burden on the codec, I think, to require that norms actually be representable as an array (maybe other use cases would want other data structures for less RAM)... we could solve both of these issues separately and independently if we decide what we want to do. 
A modest proposal for updateable fields --- Key: LUCENE-3837 URL: https://issues.apache.org/jira/browse/LUCENE-3837 Project: Lucene - Java Issue Type: New Feature Components: core/index Affects Versions: 4.0 Reporter: Andrzej Bialecki I'd like to propose a simple design for implementing updateable fields in Lucene. This design has some limitations, so I'm not claiming it will be appropriate for every use case, and it's obvious it has some performance consequences, but at least it's a start... This proposal uses a concept of overlays or stacked updates, where the original data is not removed but instead it's overlaid with the new data. I propose to reuse as much of the existing APIs as possible, and represent updates as an IndexReader. Updates to documents in a specific segment would be collected in an overlay index specific to that segment, i.e. there would be as many overlay indexes as there are segments in the primary index. A field update would be represented as a new document in the overlay index. The document would consist of just the updated fields, plus a field that records the id in the primary segment of the document affected by the update. These updates would be processed as usual via secondary IndexWriter-s, as many as there are primary segments, so the same analysis chains would be used, the same field types, etc. On opening a segment with updates the SegmentReader (see also LUCENE-3836) would check for the presence of the overlay index, and if so it would open it first (as an AtomicReader? or it would open individual codec format readers? perhaps it should load the whole thing into memory?), and it would construct an in-memory map between the primary's docId-s and the overlay's docId-s. And finally it would wrap the original format readers with overlay readers, initialized also with the id map. 
Now, when consumers of the 4D API would ask for specific data, the overlay readers would first re-map the primary's docId to the overlay's docId, and check whether overlay data exists for that docId and this type of data (e.g. postings, stored fields, vectors) and return this data instead of the original. Otherwise they would return the original data. One obvious performance issue with this approach is that the sequential access to primary data would translate into random access to the overlay data. This could be solved by sorting the overlay index so that at least the overlay ids increase monotonically as primary ids do. Updates to the primary index would be handled as usual, i.e. segment merges (since the segments with updates would pretend to have no overlays) would just work as usual, only the overlay index would have to be deleted once the primary segment is deleted after merge. Updates to the existing documents that already had some fields updated would be again handled as usual, only underneath they would open an IndexWriter on the overlay index for a specific segment. That's the broad idea. Feel free to pipe in - I started some coding at the codec level but got stuck using the approach in LUCENE-3836. The approach that uses a modified SegmentReader seems more promising. 
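The per-docId overlay dispatch described above can be sketched roughly as follows. This is a toy illustration with hypothetical names, not actual Lucene API: re-map the primary docID to an overlay docID on each access, and fall through to the primary data when no update exists.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the overlay lookup described in the proposal:
 * map a primary docID to an overlay docID (built when the segment is
 * opened); a miss means the doc has no update and the primary data
 * should be served unchanged.
 */
class OverlayResolver {
    // primary docID -> overlay docID, populated at segment-open time
    private final Map<Integer, Integer> primaryToOverlay = new HashMap<>();

    void addUpdate(int primaryDocId, int overlayDocId) {
        primaryToOverlay.put(primaryDocId, overlayDocId);
    }

    /** Returns the overlay docID, or -1 if the doc has no update. */
    int overlayDocId(int primaryDocId) {
        Integer overlay = primaryToOverlay.get(primaryDocId);
        return overlay == null ? -1 : overlay;
    }
}
```

An overlay reader would call `overlayDocId` before each postings/stored-fields/vectors access and only consult the overlay index on a hit.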
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220194#comment-13220194 ] Andrzej Bialecki commented on LUCENE-3837: --- Ad 1. I don't think it's such a big deal, we already return approximate stats (too high counts) in presence of deletes. I think we should go all the way, at least initially, and ignore stats from an overlay completely, unless the data is present only in the overlay - e.g. for terms not present in the main index. Ad 2. I think that if getArray() is supported then on the first call we have to roll in all updates to the main array created from the primary. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: ConjunctionScorer.doNext() overstays?
On Thu, Mar 1, 2012 at 11:55 AM, mark harwood markharw...@yahoo.co.uk wrote: Based on your comments re the added int comparison cost in that hot loop, it made me think that the abstract DocIdSetIterator.docID() method call could be questioned on that basis too? It looks like all DocIdSetIterator subclasses maintain a doc variable mutated elsewhere in advance() and next() calls, and docID() is meant to be idempotent, so presumably a shared variable in the base class could avoid a docID() method invocation? Anyhoo, the profiler did not show that method up as any sort of hotspot so I don't think it's an issue. Maybe we could explore that? I'm not sure about hotspot implications though... (vs private int accessible only via getter). Ideally, consumers of DISI should hold onto the int docID returned from next/advance and use that... (ie, don't call docID() again, unless it's too hard to hold onto the returned doc). Thanks, Mike. Thank you! Keep the ideas coming :) Mike McCandless http://blog.mikemccandless.com
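Mike's advice above - hold the int returned by next/advance instead of re-calling docID() in the hot loop - can be illustrated with a minimal stand-in iterator. The class below is a toy, not the real DocIdSetIterator; only the method names echo it.

```java
/**
 * Toy stand-in for a doc-ID iterator, illustrating the recommended
 * consumption pattern: keep the int that nextDoc() returned rather
 * than invoking a separate docID() getter per hit.
 */
class IntArrayDisi {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;  // must be in increasing order
    private int upto = -1;

    IntArrayDisi(int[] docs) {
        this.docs = docs;
    }

    /** Advances and returns the next docID, or NO_MORE_DOCS. */
    int nextDoc() {
        upto++;
        return upto < docs.length ? docs[upto] : NO_MORE_DOCS;
    }
}
```

A consumer loop then looks like `for (int doc = disi.nextDoc(); doc != IntArrayDisi.NO_MORE_DOCS; doc = disi.nextDoc()) { ... }`, with no per-hit getter call.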
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220207#comment-13220207 ] Michael McCandless commented on LUCENE-3837: Could we use the actual docID (ie same docID as the base segment)? This way we wouldn't need the (possibly large) int[] to remap on each access. I guess for postings this is OK (we can pass PostingsFormat any docIDs), but for eg stored fields, term vectors, doc values, it's not (they can't handle sparse docIDs). Also, can't we directly write the stacked segments ourselves? (Ie, within a single IW). We'd need to extend SegmentInfo(s) to record which segments stack on which, and fix MP to understand stacking (and aggressively target the stacks). 
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220208#comment-13220208 ] Robert Muir commented on LUCENE-3837: - {quote} Ad 1. I don't think it's such a big deal, we already return approximate stats (too high counts) in presence of deletes. I think we should go all the way, at least initially, and ignore stats from an overlay completely, unless the data is present only in the overlay - e.g. for terms not present in the main index. {quote} I disagree: it may not be a big deal for DefaultSimilarity, but it's important for other scoring implementations. Initially it's extremely important we get this stuff right before committing anything! Large problems can result when the statistics are inconsistent with what is 'discovered' in the docs enum. This is because many scoring models expect certain relationships to hold true: such as a single doc's tf value won't exceed totalTermFreq. We had to do significant work already to ensure consistency, though in some cases the problems could not totally be solved (BasicModelD, BasicModelP, BasicModelBE+NormalizationH3, etc) and we had to unfortunately resort to only leaving warnings in the javadocs. I'm fairly certain in all cases we avoid things like NaN or negative scores, but when the function 'inverts relevance' that is awful too. So I think we need a consistent model for stats: that's why I lean towards maxDoc(field), which is consistent in every way with how we handle deletes, and it won't yield any surprises. 
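Robert's "inverts relevance" point can be seen with the textbook IDF formula: once docFreq exceeds maxDoc (an out-of-bounds statistic), log(maxDoc/docFreq) goes negative, so matching the term lowers the score instead of raising it. A toy illustration of the hazard, not Lucene's actual Similarity code:

```java
/**
 * Toy illustration of the out-of-bounds stats hazard: with the
 * textbook IDF = ln(maxDoc / docFreq), a docFreq larger than maxDoc
 * (impossible for consistent stats, possible for naive overlay sums)
 * yields a negative IDF, which "inverts relevance".
 */
class IdfSketch {
    static double idf(int maxDoc, int docFreq) {
        return Math.log((double) maxDoc / docFreq);
    }
}
```

With consistent stats (docFreq <= maxDoc) the value stays non-negative; this is why a consistent model such as the proposed maxDoc(field) avoids surprises.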
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220211#comment-13220211 ] Michael McCandless commented on LUCENE-3837: I think for scoring the wrong-yet-consistent stats approach is good? (Just like deletes.) So an update would affect scoring (eg on update the field now has 4 occurrences of python vs only 1 occurrence before, so now it gets a better score), but the scoring will not precisely match the scores I'd get from a full re-index instead of an update. 
Re: Welcome Stefan Matheis
Welcome, Stefan! Your UI work is definitely much appreciated and very nice looking. Erik On Feb 29, 2012, at 16:04 , Ryan McKinley wrote: I'm pleased to announce that Stefan Matheis has joined our ranks as a committer. He has given the Solr admin UI some much needed love. It now looks like it belongs in 2012! Stefan, it is tradition that you introduce yourself with a brief bio. Your SVN access should be ready to go. Welcome!
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220300#comment-13220300 ] Andrzej Bialecki commented on LUCENE-3837: --- That was my point, we should be able to come up with estimates that yield slightly wrong yet consistent stats. I don't know the details of the new similarities, so it's up to you Robert to come up with suggestions :) 
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220315#comment-13220315 ] Andrzej Bialecki commented on LUCENE-3837: --- bq. Could we use the actual docID (ie same docID as the base segment)? Updates may arrive out of order, so the updates will naturally get different internal IDs (also, if you wanted to use the same ids they would have gaps). I don't know if various parts of Lucene can handle out-of-order ids coming from iterators? If we wanted to match the ids early then we would have to sort them, a la IndexSorter, on every flush and on every merge, which seems too costly. So, a re-mapping structure seems like a decent compromise. Yes, it could be large - we could put artificial limits on the number of updates before we do a merge. bq. Also, can't we directly write the stacked segments ourselves? (Ie, within a single IW). I don't know, it didn't seem likely to me - AFAIK IW operates on a single segment before flushing it? And updates could refer to docs outside the current segment. 
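As a sketch, the re-mapping structure discussed in this thread could be kept as a pair of parallel arrays sorted by primary docID, so a sequential scan over primary ids visits monotonically increasing overlay ids and each lookup is a binary search rather than a hash probe per document. Hypothetical code, not part of any Lucene API:

```java
import java.util.Arrays;

/**
 * Hypothetical compact remap: parallel arrays sorted by primary
 * docID. Sequential primary access yields monotonically increasing
 * overlay ids (assuming the overlay index was sorted accordingly),
 * and memory cost is two ints per updated document.
 */
class SortedDocIdMap {
    private final int[] primaryIds;  // sorted ascending
    private final int[] overlayIds;  // parallel to primaryIds

    SortedDocIdMap(int[] primaryIds, int[] overlayIds) {
        this.primaryIds = primaryIds;
        this.overlayIds = overlayIds;
    }

    /** Overlay docID for the given primary docID, or -1 if no update. */
    int lookup(int primaryDocId) {
        int idx = Arrays.binarySearch(primaryIds, primaryDocId);
        return idx >= 0 ? overlayIds[idx] : -1;
    }
}
```

For very large update counts, the artificial limit mentioned above (forcing a merge) would bound the size of these arrays.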
[jira] [Issue Comment Edited] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220315#comment-13220315 ] Andrzej Bialecki edited comment on LUCENE-3837 at 3/1/12 8:17 PM: --- bq. Could we use the actual docID (ie same docID as the base segment)? Updates may arrive out of order, so the updates will naturally get different internal IDs (also, if you wanted to use the same ids they would have gaps). I don't know if various parts of Lucene can handle out of order ids coming from iterators? If we wanted to match the ids early then we would have to sort them, a la IndexSorter, on every flush and on every merge, which seems too costly. So, a re-mapping structure seems like a decent compromise. Yes, it could be large - we could put artificial limits on the number of updates before we force a merge. bq. Also, can't we directly write the stacked segments ourselves? (Ie, within a single IW). I don't know, it didn't seem likely to me - AFAIK IW operates on a single segment before flushing it? And updates could refer to docs outside the current segment. was (Author: ab): bq. Could we use the actual docID (ie same docID as the base segment)? Updates may arrive out of order, so the updates will naturally get different internal IDs (also, if you wanted to use the same ids they would have gaps). I don't know if various parts of Lucene can handle out of order ids coming from iterators? If we wanted to match the ids early then we would have to sort them, a la IndexSorter, on every flush and on every merge, which seems too costly. So, a re-mapping structure seems like a decent compromise. Yes, it could be large - we could put artificial limits on the number of updates before we do a merge. bq. Also, can't we directly write the stacked segments ourselves? (Ie, within a single IW). I don't know, it didn't seem likely to me - AFAIK IW operates on a single segment before flushing it? 
And updates could refer to docs outside the current segment. A modest proposal for updateable fields --- Key: LUCENE-3837 URL: https://issues.apache.org/jira/browse/LUCENE-3837 Project: Lucene - Java Issue Type: New Feature Components: core/index Affects Versions: 4.0 Reporter: Andrzej Bialecki I'd like to propose a simple design for implementing updateable fields in Lucene. This design has some limitations, so I'm not claiming it will be appropriate for every use case, and it's obvious it has some performance consequences, but at least it's a start... This proposal uses a concept of overlays or stacked updates, where the original data is not removed but instead it's overlaid with the new data. I propose to reuse as much of the existing APIs as possible, and represent updates as an IndexReader. Updates to documents in a specific segment would be collected in an overlay index specific to that segment, i.e. there would be as many overlay indexes as there are segments in the primary index. A field update would be represented as a new document in the overlay index. The document would consist of just the updated fields, plus a field that records the id in the primary segment of the document affected by the update. These updates would be processed as usual via secondary IndexWriter-s, as many as there are primary segments, so the same analysis chains would be used, the same field types, etc. On opening a segment with updates the SegmentReader (see also LUCENE-3836) would check for the presence of the overlay index, and if so it would open it first (as an AtomicReader? or it would open individual codec format readers? perhaps it should load the whole thing into memory?), and it would construct an in-memory map between the primary's docId-s and the overlay's docId-s. And finally it would wrap the original format readers with overlay readers, initialized also with the id map. 
Now, when consumers of the 4D API would ask for specific data, the overlay readers would first re-map the primary's docId to the overlay's docId, and check whether overlay data exists for that docId and this type of data (e.g. postings, stored fields, vectors) and return this data instead of the original. Otherwise they would return the original data. One obvious performance issue with this approach is that the sequential access to primary data would translate into random access to the overlay data. This could be solved by sorting the overlay index so that at least the overlay ids increase monotonically as primary ids do. Updates to the primary index would be handled as usual; segment merges (since the segments with updates would pretend to have no overlays) would just work as usual, only the overlay index would have to be deleted once the primary segment is deleted after merge. Updates to the existing documents that already had some fields updated would be again handled as usual, only underneath they would open an IndexWriter on the overlay index for a specific segment. That's the broad idea. Feel free to pipe in - I started some coding at the codec level but got stuck using the approach in LUCENE-3836. The approach that uses a modified SegmentReader seems more promising.
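The id-remapping idea described above can be sketched as a tiny standalone structure (a hypothetical sketch; OverlayIdMap and its methods are illustrative names, not Lucene APIs): the overlay reader consults a primary-to-overlay map, and a miss means the document has no update, so the primary segment's data is returned.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only -- OverlayIdMap is not a Lucene class.
public class OverlayIdMap {
    private final Map<Integer, Integer> primaryToOverlay = new HashMap<>();

    /** Record that overlayDoc carries the updated fields for primaryDoc. */
    public void put(int primaryDoc, int overlayDoc) {
        primaryToOverlay.put(primaryDoc, overlayDoc);
    }

    /** Returns the overlay docId, or -1 when the document has no update
     *  (the wrapping reader then falls back to the primary data). */
    public int remap(int primaryDoc) {
        return primaryToOverlay.getOrDefault(primaryDoc, -1);
    }
}
```

A real implementation would likely prefer a packed, sorted int structure over a HashMap, particularly if the overlay index is sorted so that overlay ids increase monotonically with primary ids, as the description suggests.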
[jira] [Created] (SOLR-3188) New admin page: Enable Polling button disappears after disabling polling and reloading page
New admin page: Enable Polling button disappears after disabling polling and reloading page - Key: SOLR-3188 URL: https://issues.apache.org/jira/browse/SOLR-3188 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0 Reporter: Neil Hooey Priority: Minor When you go to this URL on a slave: http://localhost:8983/solr/#/singlecore/replication And click the Disable Polling button, you see a red bar that says invalid_master. I'm not sure why I get this red bar, as I haven't tested it outside of my own installation, but it seems normal. If you then refresh the page, the Replicate Now and Enable Polling buttons disappear. It seems like their generation is being interrupted by the invalid_master error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2020) HttpComponentsSolrServer
[ https://issues.apache.org/jira/browse/SOLR-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated SOLR-2020: - Attachment: SOLR-2020.patch This patch completes the conversion. All tests pass but there's still some cleanup work to do + couple of places where I cut corners. HttpComponentsSolrServer Key: SOLR-2020 URL: https://issues.apache.org/jira/browse/SOLR-2020 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4.1 Environment: Any Reporter: Chantal Ackermann Priority: Minor Attachments: HttpComponentsSolrServer.java, HttpComponentsSolrServerTest.java, SOLR-2020-HttpSolrServer.patch, SOLR-2020.patch Implementation of SolrServer that uses the Apache Http Components framework. Http Components (http://hc.apache.org/) is the successor of Commons HttpClient and thus HttpComponentsSolrServer would be a successor of CommonsHttpSolrServer, in the future.
Re: ConjunctionScorer.doNext() overstays?
Ideally, consumers of DISI should hold onto the int docID returned from next/advance and use that... (ie, don't call docID() again, unless it's too hard to hold onto the returned doc). Yes, I remember raising that way back when: https://issues.apache.org/jira/browse/LUCENE-584?focusedCommentId=12564415&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12564415 Back then Mike B raised the issue of backwards compatibility so I don't know if the 4.0 release presents the opportunity to revisit that idea.
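The recommended consumption pattern can be shown with a minimal stand-in for the iterator (a sketch: Disi here only mimics the shape of Lucene's DocIdSetIterator contract, it is not the real class): hold the int returned by nextDoc() in a local variable instead of asking the iterator for its current doc again.

```java
// Minimal stand-in for the DocIdSetIterator contract (sketch only).
interface Disi {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
}

public class DisiLoop {
    /** Counts docs while holding the returned id in a local variable,
     *  never re-asking the iterator for its current doc. */
    public static int countDocs(Disi disi) {
        int count = 0;
        for (int doc = disi.nextDoc(); doc != Disi.NO_MORE_DOCS; doc = disi.nextDoc()) {
            count++;
        }
        return count;
    }
}
```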
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220387#comment-13220387 ] Shai Erera commented on LUCENE-3837: Andrzej, this brings back [old memories|http://mail-archives.apache.org/mod_mbox/lucene-dev/201004.mbox/%3cu2s786fde51004250432gd50bec64m9b2f6ee6dd495...@mail.gmail.com%3E] :-). The core difference in your proposal is that the updates are processed in a separate index, and that at runtime we use a PQ to match documents and collapse all the updates, right? And these updates will be reflected in the main index on segment merges, right? I personally prefer a more integrated solution than one that's based on matching PQs, but since I barely did something with my proposal for 2 years, I guess that your progress is better than no progress at all. One comment -- when the updates are collapsed, they may not just simply 'replace' what exists before them. I could see an update to a document which adds a stored field, and therefore if I'll call IndexReader.document(i), I'd expect to see that stored field with all the ones that existed before it. At the time I felt that modifying Lucene to add stacked segments is way too complicated, and the indexing internals kept changing by the day. But now Codecs seem to be very stable, and trunk's code changes have relaxed, so perhaps it'll be worthwhile taking a second look at that proposal? (but only if you feel like it) A modest proposal for updateable fields --- Key: LUCENE-3837 URL: https://issues.apache.org/jira/browse/LUCENE-3837 Project: Lucene - Java Issue Type: New Feature Components: core/index Affects Versions: 4.0 Reporter: Andrzej Bialecki
[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index
[ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220421#comment-13220421 ] Dawid Weiss commented on SOLR-3185: --- Are there any other filters in the chain? Because PatternReplaceCharFilterFactory itself doesn't replace any html entities so it'd be weird. Also, can you quote the XML verbatim? If you have this: {noformat} <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&amp;$2" /> {noformat} then indeed the replaced value will be: {noformat} $1&$2 {noformat} PatternReplaceCharFilterFactory can't replace with ampersands in index -- Key: SOLR-3185 URL: https://issues.apache.org/jira/browse/SOLR-3185 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 3.5 Reporter: Mike Spencer Priority: Minor Labels: PatternReplaceCharFilter, regex Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis will give the expected result of 'A&B'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&]\s(\w)" replacement="$1&amp;$2" />
[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields
[ https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220420#comment-13220420 ] Andrzej Bialecki commented on LUCENE-3837: --- bq. I guess that your progress is better than no progress at all. That's my perspective too, and it's reflected in the title of this issue... I remember your description and in fact my proposal is somewhat similar. It does not use PQs, but indeed it merges updates on the fly, at the cost of keeping a static map of primary-secondary ids and random seeking in the secondary index to retrieve matching data. Please check the description above. And then once a segment merge is executed the overlay data will be integrated into the main data, because the merge process will pull in this mix of new and old without being aware of it - it will be hidden by Codec's read formats. Codec abstractions are great for this kind of manipulations. bq. One comment – when the updates are collapsed, they may not just simply 'replace' what exists before them. Right, old data will be returned if not overlaid by new data, meaning that e.g. old stored field values will be returned for all other fields except the updated field, and for that field the data from the overlay will be returned. A modest proposal for updateable fields --- Key: LUCENE-3837 URL: https://issues.apache.org/jira/browse/LUCENE-3837 Project: Lucene - Java Issue Type: New Feature Components: core/index Affects Versions: 4.0 Reporter: Andrzej Bialecki
Re: [Lucene.Net] Merging 3.0.3 into Trunk
I agree with Prescott. Make a patch for that sucker! :) Thanks, Christopher On Thu, Mar 1, 2012 at 9:57 AM, Prescott Nasser geobmx...@hotmail.comwrote: Jira and then just submit your own patch imo Sent from my Windows Phone From: Stefan Bodewig Sent: 3/1/2012 7:23 AM To: lucene-net-...@incubator.apache.org Subject: Re: [Lucene.Net] Merging 3.0.3 into Trunk On 2012-02-29, Stefan Bodewig wrote: On 2012-02-28, Christopher Currens wrote: Alright, it's done! 3.0.3 is now merged in with Trunk! I'll see to running RAT and looking at the line-ends over the next few days so we can get them fixed once and not run into it with the release. I went for EOLs first and there are 621 files outside of lib and doc that need to be fixed. What I have now is not just a patch (of more than 200k lines), but also a list of 621 files that need their svn:eol-style property to be set. I can create a JIRA ticket for that attaching my patch and the list of files to fix or - since I technically am a committer - could just commit my cleaned up workspace as is (plus JIRA ticket that I'd open and close myself). What would you prefer? RAT doesn't really make sense before the line feeds are correct (I've seen quite a few files without license headers by manual inspection). Stefan
Re: toString on Thread
On Thu, Mar 1, 2012 at 5:20 PM, Dawid Weiss dawid.we...@gmail.com wrote: Overriding toString on a Thread is not a good idea. Can I remove it or at least make it simpler in ConcurrentMergeScheduler? This override caused a fantastic deadlock -- an interesting possibility I didn't think of -- again, when dumping threads (for the exception string) Thread.toString was invoked from what I thought was an isolated monitor (and it was); only toString had its own monitors underneath and here's what happened (simplified): Ouch! Now I've got to go think if we've done anything like that in Solr... -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: toString on Thread
Scary! I think remove it? Though it is nice to see what segments are being merged by the thread... but this risk is awful. App can turn on IW's infoStream to see it too... Mike McCandless http://blog.mikemccandless.com On Thu, Mar 1, 2012 at 5:20 PM, Dawid Weiss dawid.we...@gmail.com wrote: Overriding toString on a Thread is not a good idea. Can I remove it or at least make it simpler in ConcurrentMergeScheduler? This override caused a fantastic deadlock -- an interesting possibility I didn't think of -- again, when dumping threads (for the exception string) Thread.toString was invoked from what I thought was an isolated monitor (and it was); only toString had its own monitors underneath and here's what happened (simplified):

Lucene Merge Thread #1:
  at org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:3764)
  - waiting to lock L5 (a org.apache.lucene.index.IndexWriter)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.toString(ConcurrentMergeScheduler.java:499)
  ...
  at org.apache.lucene.util.LuceneTestCase.getRandom(LuceneTestCase.java:276)
  at org.apache.lucene.index.TestTransactions.access$100(TestTransactions.java:33)
  at org.apache.lucene.index.TestTransactions$RandomFailure.eval(TestTransactions.java:40)
  at org.apache.lucene.store.MockDirectoryWrapper.maybeThrowDeterministicException(MockDirectoryWrapper.java:688)
  - locked L4 (a org.apache.lucene.store.MockDirectoryWrapper)
  at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:415)
  - locked L4 (a org.apache.lucene.store.MockDirectoryWrapper)
  at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosWriter.write(Lucene40FieldInfosWriter.java:56)
  at org.apache.lucene.index.SegmentMerger.mergeFieldInfos(SegmentMerger.java:194)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3623)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3257)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)

Lucene Merge Thread #0:
  at org.apache.lucene.store.MockDirectoryWrapper.listAll(MockDirectoryWrapper.java:695)
  - waiting to lock L4 (a org.apache.lucene.store.MockDirectoryWrapper)
  at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:345)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3272)
  - locked L5 (a org.apache.lucene.index.IndexWriter)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)

A classic, isn't it? 
Dawid
Re: toString on Thread
Ouch! Now I've got to go think if we've done anything like that in Solr... Yeah... I honestly never thought about such a possibility and I don't think any sane person would ;) I think this qualifies as a hack similar to the solution to this puzzler: http://wouter.coekaerts.be/2012/puzzle-clowns Dawid
Re: toString on Thread
Though it is nice to see what segments are being merged by the thread... but this risk is awful. App can turn on IW's infoStream to see it too... This could be possible by updating a volatile string somewhere and only exposing it in toString override, but I don't know if this is worth the effort. Volatile will impose an additional happens-before, etc. Dawid
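The volatile-string idea above would look roughly like this (a sketch; this MergeThread and its field name are illustrative, not the actual ConcurrentMergeScheduler code): the merge loop publishes a description with a plain volatile write, and toString() only reads that field, so a thread dump calling toString() can never block on the IndexWriter monitor.

```java
// Sketch of the "volatile string" idea -- not the real ConcurrentMergeScheduler.
public class MergeThread extends Thread {
    private volatile String currentMerge = "idle";

    /** Called by the merge loop; a volatile write, no monitor acquired. */
    public void setCurrentMerge(String description) {
        currentMerge = description;
    }

    @Override
    public String toString() {
        // Only a volatile read: no lock is taken here, so this method
        // cannot participate in a lock cycle during a thread dump.
        return "MergeThread[" + currentMerge + "]";
    }
}
```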
Re: toString on Thread
Now I've got to go think if we've done anything like that in Solr... I did a quick check via Eclipse's Java search and it seems nothing else overrides Thread#toString() or Thread#getName. Can't guarantee anything, but 99% sure we're safe from this one. Dawid
[jira] [Resolved] (SOLR-3017) Allow edismax stopword filter factory implementation to be specified
[ https://issues.apache.org/jira/browse/SOLR-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Dodsworth resolved SOLR-3017. - Resolution: Fixed Yonik's fix resolves this. Much appreciated. Allow edismax stopword filter factory implementation to be specified Key: SOLR-3017 URL: https://issues.apache.org/jira/browse/SOLR-3017 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Michael Dodsworth Priority: Minor Fix For: 4.0 Attachments: SOLR-3017-without-guava-alternative.patch, SOLR-3017.patch, SOLR-3017.patch, edismax_stop_filter_factory.patch Currently, the edismax query parser assumes that stopword filtering is being done by StopFilter: the removal of the stop filter is performed by looking for an instance of 'StopFilterFactory' (hard-coded) within the associated field's analysis chain. We'd like to be able to use our own stop filters whilst keeping the edismax stopword removal goodness. The supplied patch allows the stopword filter factory class to be supplied as a param, stopwordFilterClassName. If no value is given, the default (StopFilterFactory) is used. Another option I looked into was to extend StopFilterFactory to create our own filter. Unfortunately, StopFilterFactory's 'create' method returns StopFilter, not TokenStream. StopFilter is also final.
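The lookup the patch describes can be sketched as follows (a hypothetical helper: only the stopwordFilterClassName parameter name comes from the issue text, everything else is illustrative): instead of hard-coding StopFilterFactory, compare each factory in the field's analysis chain against a configurable class name, falling back to the default when none is supplied.

```java
import java.util.List;

// Illustrative sketch, not Solr's actual edismax code.
public class StopFactoryFinder {
    static final String DEFAULT_FACTORY = "org.apache.solr.analysis.StopFilterFactory";

    /** Returns the index of the configured stop filter factory in the
     *  analysis chain, or -1 if it is not present. */
    public static int findStopFactory(List<?> chain, String stopwordFilterClassName) {
        String target = (stopwordFilterClassName != null) ? stopwordFilterClassName : DEFAULT_FACTORY;
        for (int i = 0; i < chain.size(); i++) {
            if (chain.get(i).getClass().getName().equals(target)) {
                return i;
            }
        }
        return -1;
    }
}
```

Matching by class name rather than by `instanceof` is what makes a user-supplied filter factory usable without it having to extend the (final-returning) StopFilterFactory.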
Re: toString on Thread
On Thu, Mar 1, 2012 at 5:30 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Ouch! Now I've got to go think if we've done anything like that in Solr... Yeah... I honestly never thought about such possibility and I don't think any sane person would ;) I think this qualifies as a hack similar to the solution to this puzzler: http://wouter.coekaerts.be/2012/puzzle-clowns Cute! Hadn't seen that. My first thought to a solution matched the first comment on the solutions page (no spoilers!). I missed the more elegant official solution. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: [Lucene.Net] Official stance on API changes between major versions
I am not a committer but my company makes extensive use of Lucene.Net, so here's my two pennies... I understand that there is a commendable motivation to be gentle with API changes, wanting to give plenty of warning by obsoleting methods. Several points. First is that there is a change to the major version number: users should expect changes to the API. Next, when this project was restarted last year the stated direction was to get caught up with the Java version and also to move towards a more .NET-style interface. The discussions on the list do occasionally get bogged down in this kind of to and fro. A coach my sports team used once said something along these lines: if the team can't choose, then no one has made a convincing argument. So make a choice, any choice, and just get on with it. If it turns out to be the wrong choice then at least you've learnt something. This is software. It's changeable.

My bias is that I want what's in v4 (codecs, NRT etc). I'm willing to take some pain if it means this project can accelerate. I would imagine that most serious uses of Lucene would be hidden within a service or at least isolated in some way, not dotted around all over the application. This is what isolation is for, to protect components from change. The impact of even fairly major API changes should be quite localised and refactorable. Intimidating, yes. More than a bit scary, of course. But worth it for getting the newer bits. By all means be professional, make proposals, have some discussion. But please let's not be too conservative, too timid.

2.9.4g is a good release. We've been using it since shortly after it seemed stable. If there are users that need some stability then they should be advised to stick with g for a while. Now that that is done, a hearty thank you for the work on both the code and the Apache process. My vote would be for some more radical changes to be allowed. Let's get through 3.0.3 and on to 3.5 and 4.0. 
Let's get to one of the original goals, which is functional parity with Java, and let's be bold with some of the .NET modifications (note that being bold does not mean that one is reckless). I'm sure that some will say: yeah, great sentiment, now send some patches. I agree. I have sent some very minor patches previously and it frustrates me that my company has not contributed more. We have just taken on a lot more people so I hope that we can be more active with Lucene.Net soon. --Andy

On 28 February 2012 18:17, Christopher Currens currens.ch...@gmail.com wrote: I *really* don't mean to be a bother to anyone, but I'd like to continue work on this. I feel that until I can get a better sense of how the group feels about this, I can't make much progress. Perhaps this radio silence is just because this email thread got lost among the others. On Fri, Feb 24, 2012 at 6:50 PM, Prescott Nasser geobmx...@hotmail.com wrote: I'm not against breaking compatibility when changing the version number to a new major (2 to 3). I'm not sure how others feel. Matching Java access modifiers seems like the right move. That said, what if we mark obsolete in 3.0.3 and when we make the jump to 4.0 wipe them out? In my head we shouldn't spend too much time cleaning up 3.0.3 aside from bug fixes if we're just going to swap it for 4.0 in the near future. There has to be a break at some point, and making it with a major release is the best place to make it. Sent from my Windows Phone From: Christopher Currens Sent: 2/24/2012 2:45 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] Official stance on API changes between major versions A bit of background about what I've been doing lately on the project. Because we've now confirmed that the .NET 3.0.3 branch is a completed port of the Java 3.0.3 version, I've been spending time trying to work on some of the bugs and improvements that are assigned to this version. 
There wasn't any real discussion about the actual features, I just created some (based on mailing list discussions) and assigned them to the 3.0.3 release. The improvements I've been working on lately are ones that have bugged me specifically since I started using Lucene.NET. I've worked on https://issues.apache.org/jira/browse/LUCENENET-468 and https://issues.apache.org/jira/browse/LUCENENET-470 so far. LUCENENET-470 is pretty much completed: all of the classes that implemented Closeable now implement IDisposable, having a public void Dispose() and/or protected virtual void Dispose(bool disposing), depending on whether the class is sealed or not. What is left to do on that issue is to make sure that all of the tests are a) overriding the protected dispose method as needed and b) actually calling Dispose or are in a using statement. I've done quite a bit of work on LUCENENET-468 as well, though it is going far slower than 470, because there's a lot more
[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index
[ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Spencer updated SOLR-3185: --- Description: Using solr.PatternReplaceCharFilterFactory to replace {noformat}A B{noformat} with {noformat}AB{noformat} will result in {noformat}Aamp;B{noformat} being indexed. Query analysis will give the expected result of {noformat}AB{noformat}. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: {noformat} charFilter class=solr.PatternReplaceCharFilterFactory pattern=(^\w)\s[amp;]\s(\w) replacement=$1amp;$2 / {noformat} was: Using solr.PatternReplaceCharFilterFactory to replace 'A B' (no quotes) with 'AB' (no spaces) will result in 'Aamp;amp;B' being indexed. Query analysis will give the expected result of 'AB'. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. This is the affected charFilter: charFilter class=solr.PatternReplaceCharFilterFactory pattern=(^\w)\s[amp;]\s(\w) replacement=$1amp;amp;$2 / PatternReplaceCharFilterFactory can't replace with ampersands in index -- Key: SOLR-3185 URL: https://issues.apache.org/jira/browse/SOLR-3185 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 3.5 Reporter: Mike Spencer Priority: Minor Labels: PatternReplaceCharFilter, regex Using solr.PatternReplaceCharFilterFactory to replace {noformat}A B{noformat} with {noformat}AB{noformat} will result in {noformat}Aamp;B{noformat} being indexed. Query analysis will give the expected result of {noformat}AB{noformat}. I examined the index with both standalone Luke and the schema browser field and the index value is incorrect in both tools. 
This is the affected charFilter: {noformat} <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^\w)\s[&amp;]\s(\w)" replacement="$1&amp;$2" /> {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index
[ https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220605#comment-13220605 ] Mike Spencer commented on SOLR-3185: Sorry, I had improper formatting before. Because the XML configuration has to escape ampersands, I have to use the &amp; entity instead of the bare character. Solr reads the configuration fine but writes the entity literally instead of outputting the ampersand character.
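Stripped of the XML escaping layer the comment describes, the intended replacement can be checked with plain Java regex. A standalone sketch (not the actual CharFilter code path):

```java
import java.util.regex.Pattern;

public class AmpersandReplaceDemo {
    public static void main(String[] args) {
        // The pattern from the report; in plain Java no entity escaping is
        // needed, so [&] and the & in the replacement are written directly.
        Pattern p = Pattern.compile("(^\\w)\\s[&]\\s(\\w)");
        String replaced = p.matcher("A & B").replaceAll("$1&$2");
        System.out.println(replaced); // A&B -- the value the index should contain
    }
}
```

If the same replacement run inside PatternReplaceCharFilterFactory instead yields the literal text A&amp;B, the entity is being passed through unexpanded somewhere between the schema parser and the filter.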
[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module
[ https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220614#comment-13220614 ] Ryan McKinley commented on LUCENE-3795: --- OK, I think the branch is ready to go. The one thing I don't like is that the spatial4j.jar gets included twice, once in the module's 'lib' directory and again in the solr lib directory. I could not figure out how to have the solr build compile and distribute this one Replace spatial contrib module with LSP's spatial-lucene module --- Key: LUCENE-3795 URL: https://issues.apache.org/jira/browse/LUCENE-3795 Project: Lucene - Java Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.0 I propose that Lucene's spatial contrib module be replaced with the spatial-lucene module within Lucene Spatial Playground (LSP). LSP has been in development for approximately 1 year by David Smiley, Ryan McKinley, and Chris Male, and we feel it is ready. LSP is here: http://code.google.com/p/lucene-spatial-playground/ and the spatial-lucene module is intuitively in svn/trunk/spatial-lucene/. I'll add more comments to prevent the issue description from being too long.
[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module
[ https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220688#comment-13220688 ] David Smiley commented on LUCENE-3795: -- For those following along here, the former spatial-base module portion of this code is now an ASL-licensed third-party jar dependency, Spatial4j: http://spatial4j.com. Basically half of LSP now lives there under this new name; the other half is here as the new Lucene spatial module. I agree that the branch looks ready to be merged into trunk.
[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220696#comment-13220696 ] Shalu Singh commented on SOLR-3060: --- Hi Ahmet, I am trying to apply SOLR-2703.patch to Solr 3.5 (checked out from the SVN branches) to get the surround parser, but it is not working after applying the patch. Do you know how to apply it? add highlighter support to SurroundQParserPlugin - Key: SOLR-3060 URL: https://issues.apache.org/jira/browse/SOLR-3060 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Ahmet Arslan Priority: Minor Fix For: 4.0 Attachments: SOLR-3060.patch, SOLR-3060.patch Highlighter does not recognize SrndQuery family. http://search-lucene.com/m/FuDsU1sTjgM http://search-lucene.com/m/wD8c11gNTb61
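On applying a JIRA patch: patches attached to these issues are usually generated with svn diff from the checkout root, so (assuming SOLR-2703.patch was made that way) running patch -p0 from the root of the 3.5 branch checkout is the usual workflow. The sketch below demonstrates the mechanics with a throwaway file standing in for the real patch:

```shell
# Demo of the usual JIRA patch workflow, with a throwaway file and patch
# standing in for the real branch checkout and SOLR-2703.patch.
printf 'hello\n' > demo.txt

cat > demo.patch <<'EOF'
--- demo.txt
+++ demo.txt
@@ -1 +1 @@
-hello
+goodbye
EOF

# -p0 keeps the svn-diff paths as-is; --dry-run checks that the patch
# applies cleanly before touching any files
patch -p0 --dry-run < demo.patch
patch -p0 < demo.patch
cat demo.txt   # now contains: goodbye
```

If --dry-run reports failed hunks, the patch was likely made against a different revision than the one checked out.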