Re: [VOTE] Release Lucene/Solr 5.5.5 RC2
SUCCESS! [1:15:56.228143] +1 Thanks! Sanne On 20 October 2017 at 16:28, Steve Rowe <sar...@gmail.com> wrote: > Please vote for release candidate 2 for Lucene/Solr 5.5.5 > > The artifacts can be downloaded from: > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.5-RC2-revb3441673c21c83762035dc21d3827ad16aa17b68 > > You can run the smoke tester directly with this command: > > python3 -u dev-tools/scripts/smokeTestRelease.py \ > https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.5-RC2-revb3441673c21c83762035dc21d3827ad16aa17b68 > > Here's my +1 > SUCCESS! [0:53:51.570213] > > -- > Steve > www.lucidworks.com > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #263: Backporting of SOLR-11477 on branch_5_5
Github user Sanne closed the pull request at: https://github.com/apache/lucene-solr/pull/263
[GitHub] lucene-solr issue #263: Backporting of SOLR-11477 on branch_5_5
Github user Sanne commented on the issue: https://github.com/apache/lucene-solr/pull/263 Thanks @sarowe ! closing
[GitHub] lucene-solr pull request #263: Backporting of SOLR-11477 on branch_5_5
GitHub user Sanne opened a pull request: https://github.com/apache/lucene-solr/pull/263 Backporting of SOLR-11477 on branch_5_5 This is an adaptation of last week's security fix SOLR-11477, by Michael Stepankin, Olga Barinova, Uwe Schindler and Christine Poerschke (aka @cpoerschke @uschindler ), to the 5_5 branch. The main differences from the original patch are that lambdas can't be used on this branch, and some of the newer testing helpers aren't available. In the CHANGES file I wasn't sure how to name this; I've opted to call it "version 5.5.6". Maybe I should simply omit the version? HTH You can merge this pull request into a Git repository by running: $ git pull https://github.com/Sanne/lucene-solr SOLR-11477-on-5_5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/263.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #263 commit 590dca88dedc44242d155d476b1e4dca99a25f12 Author: Christine Poerschke <cpoersc...@apache.org> Date: 2017-10-13T11:46:58Z SOLR-11477: Disallow resolving of external entities in Lucene queryparser/xml/CoreParser and SolrCoreParser (defType=xmlparser or {!xmlparser ...}) by default. (Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
[jira] [Commented] (LUCENE-6989) Implement MMapDirectory unmapping for coming Java 9 changes
[ https://issues.apache.org/jira/browse/LUCENE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752649#comment-15752649 ] Sanne Grinovero commented on LUCENE-6989: - We have more recent releases of Hibernate Search using Lucene 5.5.x, but we typically aim to support older releases as well for some reasonable time; it just so happens that Lucene 5.3 isn't that old yet from our perspective. While I constantly work to motivate people to move to the latest release, for many users Lucene 5.3 is working just great. The OSS communities we target typically don't expect API changes in a maintenance release, and we happen to (proudly) expose Lucene as public API, as I believe that hiding it all under some wrapping layer would not be as powerful. Exposing Lucene as public API implies I can't really update my Lucene dependency by more than a micro (bugfix) release when doing a micro/bugfix release myself: people are used to a Lucene major/minor update only happening in a Hibernate Search major/minor update. Of course if that's not feasible, we might have to advise that those older releases won't be compatible with Java 9; that's a possible outcome, I guess we'll see whether the final Java 9 release makes this doable. 
See you at FOSDEM, hopefully with my colleague Andrew Haley as well ;-) > Implement MMapDirectory unmapping for coming Java 9 changes > --- > > Key: LUCENE-6989 > URL: https://issues.apache.org/jira/browse/LUCENE-6989 > Project: Lucene - Core > Issue Type: Task > Components: core/store > Reporter: Uwe Schindler > Assignee: Uwe Schindler > Labels: Java9 > Fix For: 6.0, 6.4 > > Attachments: LUCENE-6989-disable5x.patch, > LUCENE-6989-disable5x.patch, LUCENE-6989-fixbuild148.patch, > LUCENE-6989-v2.patch, LUCENE-6989-v3-post-b148.patch, LUCENE-6989.patch, > LUCENE-6989.patch, LUCENE-6989.patch, LUCENE-6989.patch > > > Originally, the sun.misc.Cleaner interface was declared as "critical API" in > [JEP 260|http://openjdk.java.net/jeps/260] > Unfortunately the decision was changed in favor of an officially supported > {{java.lang.ref.Cleaner}} API. A side effect of this change is to move all > existing {{sun.misc.Cleaner}} APIs into a non-exported package. This causes > our forceful unmapping to no longer work: we can still get the cleaner > instance via reflection, but trying to invoke it will throw one of the new > Jigsaw RuntimeExceptions because it is completely inaccessible. There are also no changes in the garbage > collector, so the underlying problem still exists. > For more information see this [mailing list > thread|http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-January/thread.html#38243]. > This commit will likely be done, making our unmapping efforts no longer > work. Alan Bateman is aware of this issue and will open a new issue at > OpenJDK to allow forceful unmapping without using the now private > sun.misc.Cleaner. The idea is to let the internal class sun.misc.Cleaner > implement the Runnable interface, so we can simply cast to Runnable and call > the run() method to unmap. The code would then work. This will lead to minor > changes in our unmapper in MMapDirectory: an instanceof check and a cast if > possible. > I opened this issue to keep track and implement the changes as soon as > possible, so people will have working unmapping when Java 9 comes out. > Current Lucene versions will no longer work with Java 9. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
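The workaround described in the issue — obtain the buffer's cleaner reflectively and, on Java 9, check whether it implements Runnable — can be sketched roughly as follows. This is an illustrative approximation, not Lucene's actual MMapDirectory code; the class and method names here are hypothetical:

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Hypothetical sketch of the unmapping hack discussed in LUCENE-6989.
// Not the real MMapDirectory code: it only illustrates the
// instanceof-Runnable idea described above.
public class UnmapSketch {

  /** Tries to free a direct/mapped buffer eagerly; returns false when the
   *  cleaner is unreachable (e.g. blocked by the Java 9 module system). */
  public static boolean tryUnmap(ByteBuffer buffer) {
    try {
      // DirectByteBuffer exposes a cleaner() accessor
      Method cleanerAccessor = buffer.getClass().getMethod("cleaner");
      cleanerAccessor.setAccessible(true);
      Object cleaner = cleanerAccessor.invoke(buffer);
      if (cleaner instanceof Runnable) {
        // The change proposed for Java 9: sun.misc.Cleaner implements Runnable
        ((Runnable) cleaner).run();
        return true;
      }
      // Pre-Java-9 path: sun.misc.Cleaner has a public clean() method
      Method clean = cleaner.getClass().getMethod("clean");
      clean.setAccessible(true);
      clean.invoke(cleaner);
      return true;
    } catch (Throwable t) {
      // Inaccessible on this JVM: fall back to waiting for the GC to unmap
      return false;
    }
  }

  public static void main(String[] args) {
    // Whether the direct-buffer path succeeds depends on the JVM version;
    // a heap buffer has no cleaner, so that call always returns false.
    System.out.println("direct: " + tryUnmap(ByteBuffer.allocateDirect(16)));
    System.out.println("heap:   " + tryUnmap(ByteBuffer.allocate(16)));
  }
}
```

On Java 8 the reflective path works; on Java 9+ without `--add-opens` the catch block is taken and the method degrades gracefully, which is exactly the failure mode the issue describes.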
[jira] [Commented] (LUCENE-6989) Implement MMapDirectory unmapping for coming Java 9 changes
[ https://issues.apache.org/jira/browse/LUCENE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659554#comment-15659554 ] Sanne Grinovero commented on LUCENE-6989: - Hi all, is there an update on this? I see several patches were committed, and the Hotspot issue JDK-8150436 is marked as resolved, yet this issue is not. I'm particularly interested in the backport to 5.5 (ideally even to 5.3); if someone could guide me I'll try to help with it. Thanks! > Implement MMapDirectory unmapping for coming Java 9 changes > --- > > Key: LUCENE-6989 > URL: https://issues.apache.org/jira/browse/LUCENE-6989 > Project: Lucene - Core > Issue Type: Task > Components: core/store > Reporter: Uwe Schindler > Assignee: Uwe Schindler > Labels: Java9 > Fix For: 6.0 > > Attachments: LUCENE-6989-disable5x.patch, > LUCENE-6989-disable5x.patch, LUCENE-6989-v2.patch, LUCENE-6989.patch, > LUCENE-6989.patch, LUCENE-6989.patch, LUCENE-6989.patch > > > Originally, the sun.misc.Cleaner interface was declared as "critical API" in > [JEP 260|http://openjdk.java.net/jeps/260] > Unfortunately the decision was changed in favor of an officially supported > {{java.lang.ref.Cleaner}} API. A side effect of this change is to move all > existing {{sun.misc.Cleaner}} APIs into a non-exported package. This causes > our forceful unmapping to no longer work: we can still get the cleaner > instance via reflection, but trying to invoke it will throw one of the new > Jigsaw RuntimeExceptions because it is completely inaccessible. There are also no changes in the garbage > collector, so the underlying problem still exists. > For more information see this [mailing list > thread|http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-January/thread.html#38243]. > This commit will likely be done, making our unmapping efforts no longer > work. Alan Bateman is aware of this issue and will open a new issue at > OpenJDK to allow forceful unmapping without using the now private > sun.misc.Cleaner. The idea is to let the internal class sun.misc.Cleaner > implement the Runnable interface, so we can simply cast to Runnable and call > the run() method to unmap. The code would then work. This will lead to minor > changes in our unmapper in MMapDirectory: an instanceof check and a cast if > possible. > I opened this issue to keep track and implement the changes as soon as > possible, so people will have working unmapping when Java 9 comes out. > Current Lucene versions will no longer work with Java 9. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Release Lucene/Solr 5.5.2 RC2
+1 [from the Hibernate Search integration testsuite] On 22 June 2016 at 06:35, Shalin Shekhar Mangar wrote: > +1 > > SUCCESS! [2:19:37.075305] > > On Tue, Jun 21, 2016 at 10:18 PM, Steve Rowe wrote: >> >> Please vote for release candidate 2 for Lucene/Solr 5.5.2 >> >> The artifacts can be downloaded from: >> >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.2-RC2-rev8e5d40b22a3968df065dfc078ef81cbb031f0e4a/ >> >> You can run the smoke tester directly with this command: >> >> python3 -u dev-tools/scripts/smokeTestRelease.py \ >> >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.2-RC2-rev8e5d40b22a3968df065dfc078ef81cbb031f0e4a/ >> >> +1 from me - Docs, changes and javadocs look good, and smoke tester says: >> SUCCESS! [0:32:02.113685] >> >> -- >> Steve >> www.lucidworks.com >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > > > -- > Regards, > Shalin Shekhar Mangar.
Re: Congratulations to the new Lucene/Solr PMC Chair, Tommaso Teofili
That's great, congratulations!
[jira] [Commented] (LUCENE-7058) Add getters for the properties of several Query implementations
[ https://issues.apache.org/jira/browse/LUCENE-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177928#comment-15177928 ] Sanne Grinovero commented on LUCENE-7058: - Great, that's very handy! Thanks all for the speedy reviews and merge. > Add getters for the properties of several Query implementations > --- > > Key: LUCENE-7058 > URL: https://issues.apache.org/jira/browse/LUCENE-7058 > Project: Lucene - Core > Issue Type: Improvement > Components: core/query/scoring > Reporter: Guillaume Smet > Assignee: Alan Woodward > Labels: patch > Fix For: 6.0 > > Attachments: query-getters-v01.00.diff > > > Hi! > At Hibernate Search, we are currently working on an Elasticsearch backend > (aside from the existing Lucene backend). > As part of this effort, to provide a smooth migration path, we need to be > able to rewrite the Lucene queries as Elasticsearch queries. We know it will > be neither perfect nor comprehensive but we want it to be the best possible > experience. > It works well in many cases but several implementations of Query don't have > the necessary getters to be able to extract the information from the Query. > The attached patch adds getters to several implementations of Query. It would > be nice if it could be applied. > Any chance it could be applied to the next point release too? (probably not > but I'd better ask). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Release Lucene/Solr 5.3.2-RC1
Since the release has passed, could we please get further fixes into future micro versions? I'm looking forward to Lucene 5.3.2 for the other fixes it already brings. On 15 January 2016 at 17:03, Yonik Seeley wrote: > On Fri, Jan 15, 2016 at 11:34 AM, Erick Erickson > wrote: >> Anshum: >> >> I really hate to ask, but do we know whether >> >> https://issues.apache.org/jira/browse/SOLR-8496 >> (Facet search count numbers are falsified by older document versions) >> >> is a problem in 5.3.2? It's in 5.4.1 and we don't yet know when >> it was introduced. > > At this point, my guess is that it was caused by LUCENE-6553, > which was committed in 5.3 > > -Yonik > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org >
Re: Moving Lucene / Solr from SVN to Git
Thanks for finally switching, I have been looking forward to this. I've been doing release management and generally helping with the switch from SVN to Git for the Hibernate project in the past 5 years, so I'm happy to share hints and tips from our experience there. Feel free to ask me for help on IRC or emails if you get stuck: we love Lucene, wouldn't want you to slow down ;) One crucial concept: it might be obvious - although sometimes it's not when people have been using SVN for a long time - but when you have a local Git clone of a project, you can experiment a lot and play with Git to see what would happen. As long as you don't push changes, you can experiment branching, merging and rebasing without making your experiments affect anyone else. Always create a new branch first, so you can play with the experimental branch and maybe nuke it if you get lost, and start over. So when reading the tutorials and references, don't be afraid to type commands and check the results. Thanks, Sanne On 11 January 2016 at 09:33, Jan Høydahl <jan@cominvent.com> wrote: > All discussion in the Github PR is captured in JIRA if the two are linked, > see https://issues.apache.org/jira/browse/SOLR-8166 as an example > If they are not linked, comments go to the dev list. > So we can keep it as today - allow people to choose freely to use patches > and/or PRs. > NOTE: We should always create JIRA for PR’s that we want to fix. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > >> 11. jan. 2016 kl. 09.13 skrev Dawid Weiss <dawid.we...@gmail.com>: >> >> Remember github is an external service, if it vanishes so would all >> comments and discussion. I'd stick with Jira, at least for the time >> being (until people get more familiar with git). Not everything at >> once. 
>> >> Dawid >> >> On Mon, Jan 11, 2016 at 9:11 AM, Shai Erera <ser...@gmail.com> wrote: >>> I think it will be nice if we integrate a code review tool into our >>> workflow, such as Gerrit maybe (even Github pull requests are good), instead >>> of the patch workflow with JIRA. >>> >>> But I agree we don't have to change that, not at start at least. The move to >>> git will allow those who want it, to use the code review tool on Github (via >>> pull requests). >>> >>> Shai >>> >>> On Mon, Jan 11, 2016 at 5:27 AM Mark Miller <markrmil...@gmail.com> wrote: >>>> >>>> I don't think there is a current plan to change how we do business. Just a >>>> change in where the master copy is hosted. >>>> >>>> We already have JIRA, dev, commit procedures, and integration with GitHub >>>> pull requests. All that will stay the same. No need to overthink it. >>>> >>>> - Mark >>>> >>>> On Sun, Jan 10, 2016 at 4:18 PM Jack Krupansky <jack.krupan...@gmail.com> >>>> wrote: >>>>> >>>>> Will anybody be able to create a pull request and then only committers >>>>> perform the merge operation? (I presume so, but... just for clarity, >>>>> especially for those not git-savvy yet.) >>>>> >>>>> Would patches still be added to Jira requests, or simply a link to a pull >>>>> request? (Again, I presume the latter, but the details of "submitting a >>>>> patch" should be clearly documented.) >>>>> >>>>> Then there is the matter of code review and whether to encourage comments >>>>> in Jira. Comments can be made on pull requests, but should some external >>>>> tool like reviewable.io be encouraged? >>>>> >>>>> -- Jack Krupansky >>>>> >>>>> On Sat, Jan 9, 2016 at 4:54 PM, Mark Miller <markrmil...@gmail.com> >>>>> wrote: >>>>>> >>>>>> We have done almost all of the work necessary for a move and I have >>>>>> filed an issue with INFRA. >>>>>> >>>>>> LUCENE-6937: Migrate Lucene project from SVN to Git. 
>>>>>> https://issues.apache.org/jira/browse/LUCENE-6937 >>>>>> >>>>>> INFRA-11056: Migrate Lucene project from SVN to Git. >>>>>> https://issues.apache.org/jira/browse/INFRA-11056 >>>>>> >>>>>> Everyone knows about rebase and linear history right ;) >>>>>> >>>>>> - Mark >>>>>> -- >>>>>> - Mark >>>>>> about.me/markrmiller >>>>> >>>>> >>>> -- >>>> - Mark >>>> about.me/markrmiller >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
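The "branch first, experiment, nuke it if you get lost" workflow described above can be sketched with a throwaway local repository (assuming git is installed; the directory and branch names here are arbitrary):

```shell
set -e
# Scratch repository standing in for a local clone of the project
rm -rf /tmp/git-sandbox && mkdir /tmp/git-sandbox && cd /tmp/git-sandbox
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial state"
# Branch first: nothing here is visible to anyone until you push
git checkout -q -b experiment
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "try a risky merge/rebase here"
# Got lost? Go back to the previous branch and nuke the experiment,
# then start over -- nobody else is affected
git checkout -q -
git branch -D experiment
```

The same pattern applies to merges and rebases: run them on the experimental branch, inspect the result, and delete the branch if it went wrong.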
[jira] [Commented] (LUCENE-6909) Improve concurrency for FacetsConfig
[ https://issues.apache.org/jira/browse/LUCENE-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026929#comment-15026929 ] Sanne Grinovero commented on LUCENE-6909: - Hi [~mikemccand]! Thanks for checking. Yes, of course that first changed line is not required. I just felt it was useful to make it explicit to the reader that these are concurrent maps. Just a matter of style, feel free to revert that if it doesn't fit the Lucene style? Or should I provide an alternative patch? > Improve concurrency for FacetsConfig > > > Key: LUCENE-6909 > URL: https://issues.apache.org/jira/browse/LUCENE-6909 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: 5.3 >Reporter: Sanne Grinovero >Priority: Trivial > Attachments: > 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch > > > The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a > single instance across multiple threads, yet the current synchronization > model is too strict as it doesn't allow for concurrent read operations. > I'll attach a trivial patch which removes the contention point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6909) Improve concurrency for FacetsConfig
[ https://issues.apache.org/jira/browse/LUCENE-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027261#comment-15027261 ] Sanne Grinovero commented on LUCENE-6909: - Thanks! > Improve concurrency for FacetsConfig > > > Key: LUCENE-6909 > URL: https://issues.apache.org/jira/browse/LUCENE-6909 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: 5.3 >Reporter: Sanne Grinovero >Priority: Trivial > Fix For: Trunk, 5.5 > > Attachments: > 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch > > > The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a > single instance across multiple threads, yet the current synchronization > model is too strict as it doesn't allow for concurrent read operations. > I'll attach a trivial patch which removes the contention point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6909) Improve concurrency for FacetsConfig
Sanne Grinovero created LUCENE-6909: --- Summary: Improve concurrency for FacetsConfig Key: LUCENE-6909 URL: https://issues.apache.org/jira/browse/LUCENE-6909 Project: Lucene - Core Issue Type: Improvement Components: core/other Affects Versions: 5.3 Reporter: Sanne Grinovero Priority: Trivial The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a single instance across multiple threads, yet the current synchronization model is too strict as it doesn't allow for concurrent read operations. I'll attach a trivial patch which removes the contention point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Optimisations proposal for FacetsConfig
Thanks Erick! It's done, it was as trivial as deleting a single word: https://issues.apache.org/jira/browse/LUCENE-6909 Sanne On 24 November 2015 at 18:10, Erick Erickson <erickerick...@gmail.com> wrote: > Sanne: > > Sure, please open a JIRA and add a patch. You'll need to create a user > ID on the JIRA system, but that's a "self-serve" option. > > Best, > Erick > > On Mon, Nov 23, 2015 at 8:21 AM, Sanne Grinovero > <sanne.grinov...@gmail.com> wrote: >> Hello all, >> I was looking into the source code for >> org.apache.lucene.facet.FacetsConfig as it's being highlighted as an >> hotspot of allocations during a performance analysis session. >> >> Our code was allocating a new instance of FacetsConfig for each >> Document being built; there are several maps being allocated by such >> an instance, both as instance fields and on the hot path of method >> "#build(Document doc)". >> >> My understanding from reading the code is that it's designed to be >> multi-threaded, probably to reuse one instance for a single index? >> >> That would resolve my issue with allocations at instance level, and >> probably also the maps being allocated within the build method as the >> JVM seems to be smart enough to skip those; at least that's my >> impression with a quick experiment. >> >> However reusing this single instance across all threads would become a >> contention point as all getters to read the field configurations are >> synchronized. >> Since the maps being read are actually safe ConcurrentMap instances, I >> see no reason for the "synchronized", so really it just boils down to >> a trivial patch to remove those on the reader methods. >> >> May I open a JIRA and propose a patch for that? >> >> As a second step, I'd also like to see if the build method could be >> short-circuited for a quick return: in case there are no faceted >> fields would be great to just return with the input document right >> away. 
>> >> Thanks, >> Sanne >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6909) Improve concurrency for FacetsConfig
[ https://issues.apache.org/jira/browse/LUCENE-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanne Grinovero updated LUCENE-6909: Attachment: 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch Trivial patch. The synchronization isn't needed on `getDimConfig` because it reads from a ConcurrentMap. Synchronization is still needed on the setters, but that's not a performance concern: the expected usage pattern is to configure the fields once and then mostly read from the shared instance. > Improve concurrency for FacetsConfig > > > Key: LUCENE-6909 > URL: https://issues.apache.org/jira/browse/LUCENE-6909 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other > Affects Versions: 5.3 > Reporter: Sanne Grinovero > Priority: Trivial > Attachments: > 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch > > > The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a > single instance across multiple threads, yet the current synchronization > model is too strict as it doesn't allow for concurrent read operations. > I'll attach a trivial patch which removes the contention point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Optimisations proposal for FacetsConfig
Hello all, I was looking into the source code of org.apache.lucene.facet.FacetsConfig as it's being highlighted as a hotspot of allocations during a performance analysis session. Our code was allocating a new instance of FacetsConfig for each Document being built; there are several maps being allocated by such an instance, both as instance fields and on the hot path of the method "#build(Document doc)". My understanding from reading the code is that it's designed to be multi-threaded, probably so one instance can be reused for a single index? That would resolve my issue with allocations at instance level, and probably also with the maps being allocated within the build method, as the JVM seems to be smart enough to skip those; at least that's my impression from a quick experiment. However, reusing this single instance across all threads would become a contention point, as all getters that read the field configurations are synchronized. Since the maps being read are actually safe ConcurrentMap instances, I see no reason for the "synchronized", so it really boils down to a trivial patch removing those on the reader methods. May I open a JIRA and propose a patch for that? As a second step, I'd also like to see if the build method could be short-circuited for a quick return: in case there are no faceted fields it would be great to just return the input document right away. Thanks, Sanne
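The change proposed above — dropping synchronized from getters whose backing maps are already ConcurrentMaps — can be illustrated with a small self-contained sketch. This is a hypothetical stand-in class, not the actual FacetsConfig source:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for FacetsConfig, illustrating only the LUCENE-6909 idea:
// synchronized setters, lock-free readers over a ConcurrentMap.
public class DimConfigRegistry {

  public static final class DimConfig {
    public final boolean multiValued;
    public DimConfig(boolean multiValued) { this.multiValued = multiValued; }
  }

  private static final DimConfig DEFAULT = new DimConfig(false);

  // ConcurrentHashMap already guarantees thread-safe reads and writes
  private final Map<String, DimConfig> configs = new ConcurrentHashMap<>();

  // Setters may stay synchronized: configuration happens once, up front,
  // so contention here doesn't matter.
  public synchronized void setMultiValued(String dim, boolean multiValued) {
    configs.put(dim, new DimConfig(multiValued));
  }

  // Reader deliberately NOT synchronized: the ConcurrentMap makes the read
  // safe, so many indexing threads can call this without contending.
  public DimConfig getDimConfig(String dim) {
    DimConfig config = configs.get(dim);
    return config != null ? config : DEFAULT;
  }

  public static void main(String[] args) {
    DimConfigRegistry registry = new DimConfigRegistry();
    registry.setMultiValued("authors", true);
    System.out.println(registry.getDimConfig("authors").multiValued); // true
    System.out.println(registry.getDimConfig("unknown").multiValued); // false
  }
}
```

Making DimConfig immutable means a reader can never observe a half-written entry, which is what lets the getter run without any lock.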
[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs
[ https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608407#comment-14608407 ] Sanne Grinovero commented on LUCENE-6212: - Hi Adrien, thanks for replying! Yes, I agree with you that _in general_ this could be abused, and I understand the caveats, but I would still like to do it. Since Lucene is a library for developers rather than an end-user product, I would prefer it to give me a bit more flexibility. Remove IndexWriter's per-document analyzer add/updateDocument APIs -- Key: LUCENE-6212 URL: https://issues.apache.org/jira/browse/LUCENE-6212 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 5.1, Trunk Attachments: LUCENE-6212.patch IndexWriter already takes an analyzer up-front (via IndexWriterConfig), but it also allows you to specify a different one for each add/updateDocument. I think this is quite dangerous/trappy since it means you can easily index tokens for that document that don't match at search-time based on the search-time analyzer. I think we should remove this trap in 5.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs
[ https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608156#comment-14608156 ] Sanne Grinovero commented on LUCENE-6212: - Hello, I understand there are good reasons to prevent this for the average user, but I would beg you to restore the functionality for those who know what they are doing. There are perfectly valid use cases for using a different Analyzer at query time than at indexing time: for example, when handling synonyms at indexing time you don't need to apply the substitutions again at query time. Beyond synonyms, it's also possible to have text from different sources which has been pre-processed in different ways, so it needs to be tokenized differently to get consistent output. I love the idea of Lucene becoming more strict about consistent schema choices, but I would hope we could stick to field types and encoding, while Analyzer mappings could keep a bit more flexibility? Would you accept a patch to overload {code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? extends IndexableField>){code} with the expert version: {code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? extends IndexableField>, Analyzer overrideAnalyzer){code} ? That would greatly help me migrate to Lucene 5. My alternatives are to close/open the IndexWriter for each Analyzer change, but that would have a significant performance impact; I'd rather cheat and pass an Analyzer instance which is mutable, even if that would prevent me from using the IndexWriter concurrently. 
Remove IndexWriter's per-document analyzer add/updateDocument APIs -- Key: LUCENE-6212 URL: https://issues.apache.org/jira/browse/LUCENE-6212 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 5.1, Trunk Attachments: LUCENE-6212.patch IndexWriter already takes an analyzer up-front (via IndexWriterConfig), but it also allows you to specify a different one for each add/updateDocument. I think this is quite dangerous/trappy since it means you can easily index tokens for that document that don't match at search-time based on the search-time analyzer. I think we should remove this trap in 5.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Anshum Gupta to the PMC
Congratulations Anshum! Regards, Sanne
[jira] [Commented] (LUCENE-5569) Rename AtomicReader to LeafReader
[ https://issues.apache.org/jira/browse/LUCENE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291720#comment-14291720 ] Sanne Grinovero commented on LUCENE-5569: - As a heavy Lucene consumer I probably have no right at all to complain :) But now that the time has come to test the candidate release for 5.0, let me share some facts: - This change caused some ~600 compile errors in our codebase - My personal opinion is that {{AtomicReader}} was a very good name; please take it as a statement that such names are quite a personal choice and someone just needs to make a call (and stick to it!). Indeed it's not a major blocker, but as [~ysee...@gmail.com] wisely puts it, I'd wish the bar against API changes was higher, especially when there isn't a really good reason. Rename AtomicReader to LeafReader - Key: LUCENE-5569 URL: https://issues.apache.org/jira/browse/LUCENE-5569 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Ryan Ernst Priority: Blocker Fix For: 5.0 Attachments: LUCENE-5569.patch, LUCENE-5569.patch See LUCENE-5527 for more context: several of us seem to prefer {{Leaf}} to {{Atomic}}. Speaking from my experience, I was a bit confused in the beginning that this thing is named {{AtomicReader}}, since {{Atomic}} is otherwise used in Java in the context of concurrency. So maybe renaming it to {{Leaf}} would help remove this confusion and also carry the information that these readers are used as leaves of top-level readers? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Move trunk to Java 8
Unrelated to the vote, but since it came up: Oracle isn't the only large corporation to employ highly skilled committers on the OpenJDK project. Oracle decides support of its own builds on its own terms, but Red Hat for example supports the JVM for much longer for its customers, and being an OSS-friendly company, any patch which might get developed will be included in some publicly available maintenance branch. Disclaimer: I work for Red Hat but I'm just mentioning it as someone passionate about Lucene; I happen to have an idea of the intentions of my colleagues working on the OpenJDK project, but I am not representing my employer on this matter: I just wanted to point out that after Oracle ends support for Java 7, nobody is forced to move away from it, nor to pay money. In fact in my experience it's very common to find users of older Lucene versions on much older JVMs, often on JVM builds supported by other vendors, and I don't expect this to change. HTH -- Sanne On 12 September 2014 20:31, Chris Hostetter hossman_luc...@fucit.org wrote: : That is bogus for an open source project. I won't have such updates, : how can i support such a java version, users that run into trouble? : And this does happen often. : I don't think i should have to pay money and become a paying customer : to Oracle to support lucene. I didn't say you should. I in fact said almost the exact opposite: that we shouldn't let commercial versions of the JDK have any bearing on our decision 1) Benson made a reasonable statement that there are many large organizations of the sort that use Lucene/Solr that will not be moving to 8 for years yet 2) you said: I don't buy for years yet. ... implying that such organizations will *have* to upgrade before then because there won't be *free* releases of java. 
3) I tried to point out 2 things: a) we shouldn't let the EOL cycle of *one* commercial vendor have any bearing on our policy of support -- particularly since the reference implementation is an open source project. b) that your argument against Benson's claims seemed misleading: just because Oracle is EOLing doesn't mean people won't be using OpenJDK; even if they are using Oracle's JDK, if they are large commercial organizations they might pay Oracle to keep using it for a long time. -Hoss http://www.lucidworks.com/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists
[ https://issues.apache.org/jira/browse/LUCENE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133804#comment-14133804 ] Sanne Grinovero commented on LUCENE-5541: - Thanks Michael, [~gustavonalle] from the Infinispan team was able to reproduce it and figure out how this relates to {{File.exists}}; https://issues.jboss.org/browse/ISPN-2981 is now being resolved, it seems this wasn't a bug in Lucene but a very subtle race condition in the Infinispan Directory for Lucene, so affecting the Directory for Lucene 4 as well. For the record ISPN-2981 would only trigger if all following conditions are met: - A Merge is writing concurrently to a thread using an IndexWriter for doing new writes - The node in the cluster happens to not be the primary owner for a specific entry (so it would be impossible on single node tests, unlikely on small clusters) - High load (or rather: low write load would make it unlikely) FileExistsCachingDirectory, to work around unreliable File.exists - Key: LUCENE-5541 URL: https://issues.apache.org/jira/browse/LUCENE-5541 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Michael McCandless Attachments: LUCENE-5541.patch File.exists is a dangerous method in Java, because if there is a low-level IOException (permission denied, out of file handles, etc.) the method can return false when it should return true. Fortunately, as of Lucene 4.x, we rely much less on File.exists, because we track which files the codec components created, and we know those files then exist. But, unfortunately, going from 3.0.x to 3.6.x, we increased our reliance on File.exists, e.g. when creating CFS we check File.exists on each sub-file before trying to add it, and I have a customer corruption case where apparently a transient low level IOE caused File.exists to incorrectly return false for one of the sub-files. 
It results in corruption like this: {noformat} java.io.FileNotFoundException: No sub-file with id .fnm found (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx]) org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157) org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146) org.apache.lucene.index.FieldInfos.init(FieldInfos.java:71) org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212) org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228) org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1161) {noformat} I think typically local file systems don't often hit such low level errors, but if you have an index on a remote filesystem, where network hiccups can cause problems, it's more likely. As a simple workaround, I created a basic Directory delegator that holds a Set of all created but not deleted files, and short-circuits fileExists to return true if the file is in that set. I don't plan to commit this: we aren't doing bug-fix releases on 3.6.x anymore (it's very old by now), and this problem is already fixed in 4.x (by reducing our reliance on File.exists), but I wanted to post the code here in case others hit it. It looks like it was hit e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and https://issues.jboss.org/browse/ISPN-2981 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
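The workaround described above can be reduced to a small, Lucene-free sketch: track every file we created and have not yet deleted, and short-circuit the unreliable File.exists check for those names. The class and method names here are hypothetical; the actual LUCENE-5541 patch wraps Lucene's Directory API instead.

```java
import java.io.File;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the "created but not deleted" bookkeeping that makes
// existence checks independent of transient filesystem errors.
class FileExistsCache {
    private final Set<String> created = ConcurrentHashMap.newKeySet();

    void onCreate(String name) { created.add(name); }

    void onDelete(String name) { created.remove(name); }

    // If we created the file ourselves, trust our own bookkeeping and skip
    // the filesystem check, which can spuriously return false on a
    // transient IOException (permission denied, file-handle exhaustion, ...).
    boolean fileExists(String name, File dir) {
        return created.contains(name) || new File(dir, name).exists();
    }
}
```

The key property is that a low-level error can only make the delegated check pessimistic for files we did not create ourselves; files tracked in the set always report as existing.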
[jira] [Commented] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists
[ https://issues.apache.org/jira/browse/LUCENE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130172#comment-14130172 ] Sanne Grinovero commented on LUCENE-5541: - Hi [~mikemccand], I think I'm hitting this issue, still using Lucene 3.6.2. Your comments are much appreciated, but I'm not understanding how {{File.exists}} is related to the exception, when it is being thrown by the {{CompoundFileReader}}? In fact these tests were run with compound files disabled, so I'd love to put a breakpoint in the IndexWriter code where it decided this segment needed to be wrapped in a {{CompoundFileReader}}; however it seems I can't easily reproduce the same error. In case we're able to reproduce it again I would like to provide a patch, even if I understand there won't be more releases. FileExistsCachingDirectory, to work around unreliable File.exists - Key: LUCENE-5541 URL: https://issues.apache.org/jira/browse/LUCENE-5541 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 4.6.1 RC1
+1 Run integration tests with: - Hibernate Search - Infinispan (indexing/searching entries with Lucene) - Infinispan (storing indexes from Lucene) All perfect, great job! (For long we've been stuck on Lucene 3.x but that's finally resolved) Sanne On 18 January 2014 01:51, Steve Rowe sar...@gmail.com wrote: +1 Smoke tester says: SUCCESS! [1:03:14.565590] Changes, docs and javadocs look good. Steve On Jan 17, 2014, at 9:13 AM, Mark Miller markrmil...@gmail.com wrote: Please vote to release the following artifacts: http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1559132/ Here is my +1. -- - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: The Old Git Discussion
+1 David and Mark. Like Lajos, I too have - very sadly - not contributed as much as I'd want to Lucene, but having followed this thread with interest for a while, I hope my contribution is well received. I do sympathize with all the problems which have been highlighted about Git, as I had the same impression 3 years ago when all our projects (Hibernate) were moved to Git, and I was the skeptical one back then. I suffered from it for a couple of weeks, while I was pointlessly trying to map my previous SVN workflow onto Git... until I realized that that was the main crux of my pain with it. I really do have to admit I was just stubborn and had grown into bad habits; I'm extremely happy we moved now... and yes - no offence - but to an outsider you all look like you're carving code on a stone wall with stone axes. Sparing you all the details of what I did wrong and how exactly it should be used, the point is really the huge flexibility and a better model for the problem it solves. On this thread I've seen several problems being pointed out about Git, but while I'd be happy to chat about each single one, for the sake of brevity my impression is just confusion by people who are trying to use it as if it were an alias for svn. To put it boldly, you're missing the point :-) If you need details, feel free to ask here or contact me on IRC: I'm afraid my email is too long already. It would be good to see some negative points from someone who actually used it for a significant time. For my part, for example, I don't like the complexity of handling merges; but then again we also use fast-forward only; considering that, maybe I've never actually understood how a merge should be done - as I've never practiced it. Please take it as an example of how you don't need to learn all its details and still get huge benefits from it: in 47 releases, over 3 years, ~100 contributors have been happily collaborating; we developed a workflow which suits us best and never ever needed to do a merge. 
And yes I confirm it feels very odd for an occasional contributor that you guys still work by attaching patch files to JIRA. - Sanne On 8 January 2014 00:45, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: +1, Mark. Git isn't perfect; I sympathize with the annoyances pointed out by Rob et. all. But I think we would be better off for it -- a net win considering the upsides. In the end I'd love to track changes via branches (which includes forks people make to add changes), not with attaching patch files to an issue tracker. The way we do things here sucks for collaboration and it's a higher bar for people to get involved than it can and should be. ~ David Mark Miller-3 wrote I don’t really buy the fad argument, but as I’ve said, I’m willing to wait a little longer for others to catch on. I try and follow the stats and reports and articles on this pretty closely. As I mentioned early in the thread, by all appearances, the shift from SVN to GIT looks much like the shift from CVS to SVN. This was not a fad change, nor is the next mass movement likely to be. Just like no one starts a project on CVS anymore, we are almost already to the point where new projects start exclusive on GIT - especially open source. I’m happy to sit back and watch the trend continue though. The number of GIT users in the committee and among the committers only grows every time the discussion comes up. If this was 2009, 2010, 2011 … who knows, perhaps I would buy some fad argument. But it just doesn’t jive in 2014. - Mark - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/The-Old-Git-Discussion-tp4109193p4110109.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0 Take 3
+1 On 20 November 2013 18:00, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Tommaso 2013/11/20 Jan Høydahl jan@cominvent.com +1 Happy smoketester on Mac -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 19. nov. 2013 kl. 15:11 skrev Simon Willnauer simon.willna...@gmail.com: Please vote for the third Release Candidate for Lucene/Solr 4.6.0 (don't be irritated by that this is RC4 I build on that I didn't put up for vote) This RC includes some additional fixes related to Changes.html that were committed in the last days like SOLR-5397 as well as: SOLR-5464: Add option to ConcurrentSolrServer to stream pure delete requests. SOLR-5465: SolrCmdDistributor retry logic has a concurrency race bug. SOLR-5452: Do not attempt to proxy internal update requests. you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC4-rev1543363/ or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC4-rev1543363/ 1543363 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/tree/upgrade_lucene_4_6 Smoketester said: SUCCESS! [1:08:00.010026] here is my +1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Indexing file with security problem
To be honest I am not familiar with ManifoldCF, so I won't say if Hibernate Search is better or not, but it would definitely not be too hard with Hibernate Search: 1) You annotate with @Indexed the entity referring to your PostgreSQL table containing the metadata; with @TikaBridge you point it to the external resource containing the document. Returning database ids is the default behaviour. http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e4244 2) This is a bit more complex, but I don't think any more complex than it would be with other technologies: you should encode some information in the index, then define a parametric filter on that. http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#query-filter 3) Not sure, sorry. But the automatic indexing triggers happen as soon as you store the metadata, so maybe that is good enough? Looks interesting! Sanne - Hibernate Search team On 27 June 2013 03:14, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I would start from ManifoldCF - it may save you some work. Otis Solr ElasticSearch Support http://sematext.com/ On Jun 26, 2013 5:01 PM, lukasw lukas...@gmail.com wrote: Hello, I'll try to briefly describe my problem and task. My name is Lukas and I am a Java developer; my task is to create a search engine for different types of files (only text file types): pdf, word, odf, xml but not html. I have a little experience with Lucene: about a year ago I wrote a simple full-text search using Lucene and Hibernate Search. That was a simple project. But now I have a very difficult search task. We are using Java 1.7 and GlassFish 3 and I have to concentrate only on the server side, not the client UI. Here are my three major problems: 1) All files are stored on a WebDAV server, but information about the file (name, id, file type etc.) is stored in a database (PostgreSQL), so when creating the index I need to use both. As a result of a query I need to return only the file ids from the database. 
To summarize: the content of a file is stored on the server but information about the file is stored in the database, so we must retrieve both. 2) The second problem is that each file has a level of secrecy, and the major problem is that this level is calculated dynamically. When calculating the level of security for a file we consider several properties. The static properties are the file's location and the folder in which the file is, but there is also dynamic information: user profiles, user roles and departments. So when user Maggie is logged in she can search only the files test.pdf, test2.doc etc., but if user Stev is logged in he has different profiles than Maggie, so he can only search some phrase in the files broken.pdf, mybook.odt, test2.doc etc. I think that when for example a user searches the phrase lucene +solr we search in all indexed documents and filter the result afterwards. But I think that solution is not very efficient. What if the result contains 100 files - do we then filter each file step by step? I do not see any other solution. Maybe you can help me: do Lucene or Solr have a mechanism to help here? 3) The last problem is that some files are encrypted. So those files must be indexed only once, before encryption! But I think that if we index secured files we get a security issue, because every word from the file is tokenized. I have no idea how to secure Lucene documents and the index datastore - is it possible? Also I have the question whether I need to use Solr for my search engine, or use only Lucene and write my own search engine? So as you can see I do not have a problem with indexing or searching, but with secured files and file security levels. Thanks for any hints and time you spend for me. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
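For point 1) above, a hedged entity sketch against the Hibernate Search 4.3 annotation API referenced in the linked docs (class and property names are hypothetical): the metadata row is indexed, and @TikaBridge extracts text from the external resource that the annotated property points to.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TikaBridge;

// Hypothetical mapping of the PostgreSQL metadata table: @Indexed makes the
// entity searchable, and the database id is what queries return by default.
@Entity
@Indexed
public class DocumentMetadata {

    @Id
    Long id; // the database id returned as the search result

    @Field
    String fileName;

    // Path/URI of the document on the WebDAV server; at (re)indexing time
    // Tika parses the referenced content into this index field.
    @Field
    @TikaBridge
    String contentLocation;
}
```

A full-text query on such an entity then yields managed DocumentMetadata instances (or just their ids via a projection), which matches the requirement of returning only database ids.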
Infinispan JGroups migrating to Apache License
Hello all, as some of you already know the Infinispan project includes several integration points with the Apache Lucene project, including a Directory implementation, but so far we have had a separate community because of the license incompatibility. I'm very happy to announce that both Infinispan and its dependency JGroups are going to move to the Apache License, as you can see from the following blog posts: http://infinispan.blogspot.co.uk/2013/05/infinispan-to-adopt-apache-software.html http://belaban.blogspot.ch/2013/05/jgroups-to-investigate-adopting-apache.html I hope this will benefit both projects and allow more people to use both. # What's Infinispan? It's an in-memory key/value store geared to fast data rather than very large data, with Dynamo-inspired consistent hashing to combine the reliability and resources of multiple machines. It does not support eventual consistency but supports transactions, including XA. When data gets too large to be handled in JVM heap it can swap over to different storage engines, e.g. Cassandra, HBase, MongoDB, JDBC, cloud storage, ... [there is much more but for the sake of brevity I expect this to be most useful to Lucene developers] # What's the state of the Infinispan / Lucene Directory? Basically it stores the segments in the distributed cache: so it provides a quick storage engine, real-time replication without NFS trouble, and optionally integration with transactions. This is working quite well, and - depending on your needs and configuration options - it might be faster than FSDirectory or RAMDirectory. In all fairness it's not easy to beat the efficiency of FSDirectory when it's in memory-mapping mode: it might be faster in some cases, more or less significantly, but I think the real difference is in the scalability options and the flexibility in architectures. It is generally faster than the RAMDirectory, especially under contention. 
Support for Lucene 4 was just added recently, so while I think it would be great to have custom Codecs for it, that isn't done yet: for now it just stores the byte[] chunks of the segments. This is not a replacement for Solr or ElasticSearch: it provides just a storage component; it does not solve - among others - the problem of distributed writers. It is used by Hibernate Search. Regards, Sanne - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
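As a rough illustration of how this Directory is wired up, here is a hedged sketch in approximately the Infinispan 5.x style: the builder method and cache names are from memory and should be checked against the Infinispan documentation, not taken as the definitive API.

```java
import org.apache.lucene.store.Directory;
import org.infinispan.Cache;
import org.infinispan.lucene.directory.DirectoryBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class InfinispanDirectoryExample {
    public static void main(String[] args) throws Exception {
        // Cache names and config file are illustrative assumptions.
        DefaultCacheManager cacheManager = new DefaultCacheManager("infinispan.xml");
        Cache<?, ?> metadata = cacheManager.getCache("lucene-metadata");
        Cache<?, ?> chunks = cacheManager.getCache("lucene-chunks");
        Cache<?, ?> locks = cacheManager.getCache("lucene-locks");

        // The index segments are stored as chunked entries in the caches,
        // so they get replicated/distributed like any other cached data.
        Directory dir = DirectoryBuilder
                .newDirectoryInstance(metadata, chunks, locks, "my-index")
                .create();

        // 'dir' can now be handed to IndexWriter / IndexReader like any
        // other Lucene Directory implementation.
    }
}
```

The design point worth noting is that clustering concerns (replication, consistency, eviction to secondary stores) live entirely in the cache configuration, not in the Lucene-facing API.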
Re: VOTE: release 3.6.2
+1 tested the Maven artifacts with the testsuites from Infinispan and Hibernate Search On 21 December 2012 13:22, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 Tommaso 2012/12/21 Simon Willnauer simon.willna...@gmail.com same here +1 On Fri, Dec 21, 2012 at 1:11 PM, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: Besides the mentioned jdoc warnings the smoke tester ran fine. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Source Control
If Lucene was moved to use GIT, I would love that. Not going into details now, but having used GIT for two years on other open source projects I'm pretty sure that it makes collaboration significantly easier. We use GitHub, but the star is GIT: GitHub makes it easier for non-power users and is great to have but after you get used to the command line git it's outrageously useful and I don't actually use the github webinterface any more (but it's nice that occasional contributors can). Being very flexible indeed often the problem might be to find an agreement on some consistent work flow, but that's never been a blocker in our case as each user is free to use what he prefers on his personal repository. Highly recommended! Sanne On 28 October 2012 16:58, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Different toys for different boys. Everyone will have his or her favorite workflow, it'll be impossible to find a consensus here. As for me, I've tasted cvs, svn, git and other version control systems and I must say git is the one I like the most, although there were a good few cursing moments along the way. As for legal -- the maven team had to go through the same process, I don't think the checkbox (or its absence) was a problem. Dawid P.S. If anybody knows the equivalent of git add -A . (that also stages removed files) in svn I'd really like to know ;) On Sun, Oct 28, 2012 at 5:49 PM, Adrien Grand jpou...@gmail.com wrote: Hi Uwe, On Sun, Oct 28, 2012 at 5:25 PM, Uwe Schindler u...@thetaphi.de wrote: I don't want to use GIT; HG was horrible, too! Why don't you like them? -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: ideas for alphas/betas?
On 7 March 2012 15:42, Tommaso Teofili tommaso.teof...@gmail.com wrote: 2012/3/7 Robert Muir rcm...@gmail.com On Tue, Mar 6, 2012 at 1:42 AM, Shai Erera ser...@gmail.com wrote: I agree. Maybe we should also tag issues as 4.0-alpha, 4.0-beta in JIRA? For 4.0-alpha we'll tag all the issues that are expected to change the index format, and 4.0-beta all the issues that require API changes? I have no opinion on the actual JIRA tagging, but I think Hoss has a good point that it would be better if we looked at alphas/betas as real releases... ideally our first alpha release would be exactly the same as our real 4.0 release, but we are just being realistic and at the same time marking some caveats so that users know its a big scary change. So I'm not sure we should intentionally try to delay/bucket any issues to alpha or beta, I think we should try to make it great from the start... these 'guarantees' are just to help increase adoption and testing. +1, as also Simon was saying let's go fixing the blockers and start working on the alpha release process. It's of course very cool if you could start by make it great from the start, but that would take more time I would rather be realistic and start providing some tags in quick iterations. Even if it has known issues, that's acceptable for an Alpha release but at least you start getting more feedback, especially on the API which you obviously don't want to alter significantly just before the final. Regards, Sanne - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
+1 All tests on all Lucene-using projects I contribute to pass without any change needed (a sure sign I should add more...). Once more, great work and thanks so much to everyone involved. Sanne On 11 September 2011 16:11, Robert Muir rcm...@gmail.com wrote: +1, thanks for creating this release candidate. On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless luc...@mikemccandless.com wrote: Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)
+1 Sanne 2011/6/27 Michael McCandless luc...@mikemccandless.com: +1 Mike McCandless http://blog.mikemccandless.com On Mon, Jun 27, 2011 at 1:38 PM, Simon Willnauer simon.willna...@googlemail.com wrote: This issue has been discussed on various occasions and lately on LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239) The main reasons for this have been discussed on the issue but let me put them out here too: - Lack of testing on Jenkins with Java 5 - Java 5 end of lifetime is reached a long time ago so Java 5 is totally unmaintained which means for us that bugs have to either be hacked around, tests disabled, warnings placed, but some things simply cannot be fixed... we cannot actually support something that is no longer maintained: we do find JRE bugs (http://wiki.apache.org/lucene-java/SunJavaBugs) and its important that bugs actually get fixed: cannot do everything with hacks.\ - due to Java 5 we legitimate performance hits like 20% slower grouping speed. For reference please read through the issue mentioned above. A lot of the committers seem to be on the same page here to drop Java 5 support so I am calling out an official vote. all Lucene 3.x releases will remain with Java 5 support this vote is for trunk only. Here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] release 3.3 (take two)
+1 All tests are fine on both Infinispan and Hibernate Search. While I understand that often APIs needed changes, I'm very happy to state that for the first time three major releases are fully API compatible! (As far as tested on these projects, Lucene versions 3.1.0, 3.2.0 and 3.3.0 are drop-in compatible replacements) Regards, Sanne 2011/6/26 Steven A Rowe sar...@syr.edu: +1 I looked at the differences, and then just ran tests on the Solr and Lucene source tarballs. Steve -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Sunday, June 26, 2011 11:12 AM To: dev@lucene.apache.org Subject: [VOTE] release 3.3 (take two) Artifacts here: http://s.apache.org/lusolr330rc1 working release notes here: http://wiki.apache.org/lucene-java/ReleaseNote33 http://wiki.apache.org/solr/ReleaseNote33 To see the changes between the previous release candidate (rc0): svn diff -r 1139028:1139775 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3 Here is my +1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Code Freeze on realtime_search branch
Hello, this is totally awesome! Does it imply we don't need the IndexWriter lock anymore? And hence that people sharing the Lucene Directory across multiple JVMs can both write at the same time? I had intended to *try* removing such limitations this summer, but if this is the case I will spend my time testing this carefully instead; or, if some kind of locking is still required, I'd appreciate some pointers so that I'll be able to remove them. Regards, Sanne

2011/4/29 Simon Willnauer simon.willna...@googlemail.com: Hey folks, LUCENE-3023 aims to land the considerably large DocumentsWriterPerThread (DWPT) refactoring on trunk. During the last weeks we have put lots of effort into cleaning the code up, fixing javadocs and running tests locally as well as on Jenkins. We have reached the point where we are able to create a final patch for review and land this exciting refactoring on trunk very soon. I committed the CHANGES.TXT entry (also appended below) a couple of minutes ago, so from now on we freeze the branch for final review (Robert, can you create a new final patch and upload it to LUCENE-3023?). Any comments should go to [1] or as a reply to this email. If no blocker comes up we plan to reintegrate the branch and commit it to trunk early next week. For those who want some background on what DWPT does, read [2]. Note: this change will not change the index file format, so there is no need to reindex for trunk users. Yet, I will send a heads-up next week with an overview of what has changed. Simon

[1] https://issues.apache.org/jira/browse/LUCENE-3023
[2] http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/

* LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from DocumentsWriterPerThread:

- IndexWriter now uses a DocumentsWriter per thread when indexing documents. Each DocumentsWriterPerThread indexes documents in its own private segment, and the in-memory segments are no longer merged on flush. Instead, each segment is separately flushed to disk and subsequently merged with normal segment merging.

- DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a FlushPolicy. When a DWPT is flushed, a fresh DWPT is swapped in so that indexing may continue concurrently with flushing. The selected DWPT flushes all its RAM-resident documents to disk. Note: segment flushes don't flush all RAM-resident documents but only the documents private to the DWPT selected for flushing.

- Flushing is now controlled by FlushPolicy, which is called for every add, update or delete on IndexWriter. By default DWPTs are flushed either on maxBufferedDocs per DWPT or on the global active used memory. Once the active memory exceeds ramBufferSizeMB, only the largest DWPT is selected for flushing, and the memory used by this DWPT is subtracted from the active memory and added to a flushing memory pool, which can lead to temporarily higher memory usage due to ongoing indexing.

- IndexWriter can now utilize a ramBufferSize larger than 2048 MB. Each DWPT can address up to 2048 MB of memory, such that the ramBufferSize is now bounded by the max number of DWPTs available in the used DocumentsWriterPerThreadPool. IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if the application can use all available DWPTs. To prevent a DWPT from exhausting its address space, IndexWriter will forcefully flush a DWPT if its hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled via IndexWriterConfig and defaults to 1945 MB. Since IndexWriter flushes DWPTs concurrently, not all memory is released immediately. Applications should still use a ramBufferSize significantly lower than the JVM's available heap memory, since under high load multiple flushing DWPTs can consume substantial transient memory when IO performance is slow relative to the indexing rate.

- IndexWriter#commit now doesn't block concurrent indexing while flushing all 'currently' RAM-resident documents to disk. Yet, flushes that occur while a full flush is running are queued and will happen after all DWPTs involved in the full flush are done flushing. Applications using multiple threads during indexing that trigger a full flush (e.g. call commit() or open a new NRT reader) can use significantly more transient memory.

- IndexWriter#addDocument and IndexWriter#updateDocument can block indexing threads if the number of active plus the number of flushing DWPTs exceeds a safety limit. By default this happens if 2 * the max number of available thread states (DWPTPool) is exceeded. This safety limit prevents applications from exhausting their available memory if flushing can't keep up with concurrently indexing threads.

- IndexWriter only applies and flushes deletes if the maxBufferedDelTerms limit is reached during indexing. No segment flushes
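The flush selection described in the CHANGES entry above (once global active memory exceeds the RAM budget, pick only the single largest DWPT and move it to the flushing pool) can be sketched as a toy model. This is illustrative only: the class and method names below are made up for the example, and Lucene's actual FlushPolicy machinery is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of "flush the largest DWPT when the global budget is exceeded".
// Not Lucene's implementation; names are illustrative.
public class FlushPolicySketch {
    static class Dwpt {
        final int id;
        long ramBytes; // memory held by this thread's private segment buffer
        Dwpt(int id, long ramBytes) { this.id = id; this.ramBytes = ramBytes; }
    }

    /** Returns the DWPT to flush, or null if the active memory is within budget. */
    static Dwpt selectForFlush(List<Dwpt> active, long ramBudgetBytes) {
        long used = active.stream().mapToLong(d -> d.ramBytes).sum();
        if (used <= ramBudgetBytes) {
            return null; // under budget: keep buffering
        }
        // Over budget: select only the largest buffer; the others keep indexing.
        return active.stream()
                .max(Comparator.comparingLong(d -> d.ramBytes))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Dwpt> pool = new ArrayList<>();
        pool.add(new Dwpt(0, 10));
        pool.add(new Dwpt(1, 40));
        pool.add(new Dwpt(2, 25));
        // 75 bytes active vs. a 60-byte budget: the largest (id 1) is selected.
        Dwpt victim = selectForFlush(pool, 60);
        System.out.println(victim == null ? "none" : "flush dwpt " + victim.id);
    }
}
```

Once the victim is "flushing", its memory would be subtracted from the active pool and tracked separately, which is why the entry warns about temporarily higher total memory usage.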
Re: Code Freeze on realtime_search branch
2011/4/29 Michael McCandless luc...@mikemccandless.com: Sorry, but, no :) So feel free to keep working towards removing this limitation!! This change makes IndexWriter's flush (where it writes the added documents in RAM to disk as a new segment) fully concurrent, so that while one segment is being flushed (which could take a longish time, e.g. on a slowish IO system), other threads are now free to continue indexing (where they were blocked before). On computers with substantial CPU concurrency, and fast enough IO systems, this change should give a big increase in indexing throughput. That said, I do think this change is a step towards what you seek (allowing multiple IndexWriters, even in separate JVMs maybe on separate computers, to write into an index at once). Mike

Thank you for clarifying this; maybe I don't even need to remove the locking if I can run some of those participant threads on the remote nodes. I'll keep you updated, but unfortunately can't start working on it sooner. Sanne

http://blog.mikemccandless.com
Re: IndexReader.indexExists declares throwing IOE, but never does
2011/3/21 Earwin Burrfoot ear...@gmail.com: Technically, there's a big difference between "I checked, and there was no index" and "I was unable to check the disk because the file system went BANG!". So the proper behaviour is to return false or IOE (on proper occasion)? +1 to throw the exception when proper to do so. Otherwise please keep the throws declaration so that you won't break public APIs if this changes implementation. On Mon, Mar 21, 2011 at 13:53, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Mar 21, 2011 at 12:52 AM, Shai Erera ser...@gmail.com wrote: Can we remove the declaration? The method never throws IOE, but instead catches it and returns false. I think it's reasonable that such a method will not throw exceptions. +1 -- Mike http://blog.mikemccandless.com -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: IndexReader.indexExists declares throwing IOE, but never does
2011/3/21 Shai Erera ser...@gmail.com: So the proper behaviour is to return false or IOE (on proper occasion)? I don't object to it, as I think it's reasonable (as today we may be hiding some info from the app). However, given that today we never throw IOE, and that if we start doing so we'll change runtime behavior, I lean towards keeping the method simple and removing the throws declaration. Well, it's either we change the impl to throw IOE, or remove the declaration altogether. Changing the impl to throw IOE on proper occasion might be problematic -- IndexNotFoundException is thrown when an empty index directory was given; however, by its javadocs, it can also indicate the index is corrupted. Perhaps the jdocs are wrong and it's thrown only if the index directory is empty, or no segments files are found. If that's the case, then we should change its javadocs. Otherwise, it will be difficult to know whether the INFE indicates an empty directory, for which you'll want to return false, or a corrupt index, for which you'll want to throw the exception. Besides, I consider this method almost like File.exists(), which doesn't throw an exception. If indexExists() returns false, the app can decide to investigate further by trying to open an IndexReader or read the SegmentInfos. But the API as-is needs to be simple IMO. Good points, I withdraw my previous objection :) Otherwise please keep the throws declaration so that you won't break public APIs if this changes implementation. Removing the throws declaration doesn't break apps. In the worst case, they'll have a catch block which is redundant? Yes, you wouldn't do any harm now, but if you release it without, and then figure out you actually need to add it back in the future, people might have code which is not handling it.
I'm looking into Lucene 3.0.3, and there the IOException *is* actually needed; I'm not sure what was changed in the version this is referring to, but as it used to throw it (and need it), I think it's quite possible this need is not so remote. Regards, Sanne
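The two API shapes debated in this thread can be contrasted in a small sketch. This is illustrative only: these are not Lucene's actual methods, just stand-ins showing the File.exists()-style variant (which swallows IO problems and reports false) versus the strict variant (which distinguishes "no index" from "could not check" by propagating the IOException).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustrative contrast of the two designs discussed in the thread;
// not Lucene's indexExists() implementation.
public class IndexExistsSketch {
    /** File.exists()-style: any IO problem is collapsed into "no index". */
    static boolean indexExistsQuiet(Path dir) {
        try {
            return indexExistsStrict(dir);
        } catch (IOException e) {
            return false; // "I was unable to check" becomes indistinguishable from "false"
        }
    }

    /** Strict variant: "no index" returns false, "could not check" throws. */
    static boolean indexExistsStrict(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) { // throws if dir is unreadable/missing
            return files.anyMatch(p -> p.getFileName().toString().startsWith("segments"));
        }
    }
}
```

The quiet variant keeps the API simple, as Shai argues; the cost, as Earwin notes, is that a disk that "went BANG!" reads exactly like an empty directory to the caller.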
Re: Index File
Hi, 2011/3/21 soheila dehghanzadeh sally...@gmail.com: Hi All, I have created an index folder, and I tried to open the .cfs, .cfx, .gen and segments files with Notepad, but they are unreadable. I want to see their structure for my sample directory which I have passed to IndexFiles. I have read http://lucene.apache.org/java/3_0_3/fileformats.html and I know what an index should contain. Is there any way to see the created index? Yes, use the force :) http://code.google.com/p/luke/ Regards, Sanne. Thanks in advance. Peace. -Soheila D.
Re: [VOTE] Lucene and Solr 3.1 release candidate
2011/3/14 slaava slaav...@gmail.com: Hi, when could we expect the final 3.1 version with a Maven repository? We need some functionality included in 3.1 and I don't know if I have to wait or create my own Maven project from the sources... Hi, I hope soon as well; in the meantime we've been testing with the release candidate repositories:

<profile>
  <id>lucene-staging-rmuir</id>
  <repositories>
    <repository>
      <id>lucene-staging-repository-rmuir</id>
      <name>Lucene testing repo</name>
      <url>http://people.apache.org/~rmuir/staging_area/lucene-solr-3.1RC0-rev1078688/lucene-3.1RC0/maven/</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>never</updatePolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
      </snapshots>
    </repository>
    <repository>
      <id>solr-staging-repository-rmuir</id>
      <name>Solr testing repo</name>
      <url>http://people.apache.org/~rmuir/staging_area/lucene-solr-3.1RC0-rev1078688/solr-3.1RC0/maven</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>never</updatePolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
      </snapshots>
    </repository>
  </repositories>
</profile>

Use it with care, as the artifacts are marked with the same identifiers the final release will have, so you might end up polluting your local caches with this: make sure you delete all copies when the real one is released. -- Sanne - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene and Solr 3.1 release candidate
Hello, the lucene-solr-grandparent pom [1] file mentions a Jetty version 6.1.26-patched-JETTY-1340 which is not available in the repositories where I would expect it. Do I need to enable some additional repository? This seems related to SOLR-2381. I think that for people using Solr as their dependency via Maven this is a blocker; of course not everyone uses it, so I've no strong opinions about this, but I thought to let you know. Personally I'd depend on the released version of Jetty and document that this bug is not fixed until Jetty version XY is released; alternatively, I'd keep the pom as-is, but instructions and warnings in the release notes would be very welcome. (I couldn't find a CHANGES.html for Solr?) Regards, Sanne [1] http://people.apache.org/~rmuir/staging_area/lucene-solr-3.1RC0-rev1078688/lucene-3.1RC0/maven/org/apache/lucene/lucene-solr-grandparent/3.1.0/lucene-solr-grandparent-3.1.0.pom 2011/3/8 Shai Erera ser...@gmail.com: I found what seems to be a glitch in StopFilter's ctors -- the boolean 'enablePosInc' was removed from the ctors and users now have to use the setter instead. However, the ctors do default to 'true' if the passed-in Version is onOrAfter(29). All of FilteringTokenFilter's sub-classes include the enablePosIncr in their ctors, including FilteringTF itself. Therefore I assume the parameter was mistakenly dropped from StopFilter's ctors. Also, the @deprecated text doesn't mention how I should enable/disable it, and reading the source code doesn't help either, since the setter/getter are in FilteringTF. Also, LengthFilter has a deprecated ctor, but the class was added on Nov 16 and I don't see it in 3.0.3. So perhaps we can remove that ctor (and add a @since tag to the class)? I don't know if these two warrant a new RC but I think they are important to fix. Shai On Mon, Mar 7, 2011 at 5:52 PM, Smiley, David W. dsmi...@mitre.org wrote: So https://issues.apache.org/jira/browse/SOLR-2405 didn't make it in yesterday (apparently it didn't)?
:-( Darn... maybe I shouldn't have waited for a committer to agree with the issue. I would have had it in Saturday. ~ David Smiley On Mar 7, 2011, at 1:32 AM, Robert Muir wrote: Hi all, I have posted a release candidate for both Lucene 3.1 and Solr 3.1, both from revision 1078688 of http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/ Thanks for all your help! Please test them and give your votes, the tentative release date for both versions is Sunday, March 13th, 2011. Only votes from Lucene PMC are binding, but everyone is welcome to check the release candidates and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast. The release candidates are produced in parallel because in 2010 we merged the development of Lucene and Solr in order to produce higher quality releases. While we voted to reserve the right to release Lucene by itself, in my opinion we should definitely try to avoid this unless absolutely necessary, as it would ultimately cause more work and complication: instead it would be far easier to just fix whatever issues are discovered and respin both releases again. Because of this, I ask that you cast a single vote to cover both releases. If the vote succeeds, both sets of artifacts can go their separate ways to the different websites. Artifacts are located here: http://s.apache.org/solrcene31rc0 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene and Solr 3.1 release candidate
2011/3/8 Steven A Rowe sar...@syr.edu: Hi Sanne, Solr (and some Lucene modules) have several non-Mavenized dependencies. To work around this, the Maven build has a profile called "bootstrap". If you check out the source (or use the source distribution) you can place all non-Mavenized dependencies in your local repository as follows (from the top-level directory containing lucene, solr, etc.):

  ant get-maven-poms
  mvn -N -P bootstrap install

Maybe there should also be a way to deploy these to an internal repository? Steve

Hi Steve, thank you for the answer. I'm not personally worried, as I'm unaffected by this issue; I just thought to let the list know, so core developers can evaluate how urgent it is. I'm not sold on the "several non-Mavenized dependencies" argument: if I adjust my pom locally to refer to a released Jetty version I have no other build nor test issues, so this should be the only such artifact, unless you refer to some other optional dependency. Also, I used to depend on Solr via Maven in the past without issues - so it looks to me that this is going to break expectations, as it worked properly before. I'm totally fine with it as long as you're all aware of it and making a conscious decision; I don't think waiting for a Jetty release is a reasonable option, but I'd add at least a warning in the release notes. Regards, Sanne
Re: wind down for 3.1?
Hello all, Is there any update on the 3.1 status? I'm really looking forward to it :) Regards, Sanne 2011/2/16 Chris Hostetter hossman_luc...@fucit.org: : 1. javadocs warnings/errors: this is a constant battle, it's worth : considering if the build should actually fail if you get one of these, : in my opinion if we can do this we really should. it's frustrating to For a brief period we did, and then we rolled it back... https://issues.apache.org/jira/browse/LUCENE-875 : 2. introducing new compiler warnings: another problem just being left : for someone else to clean up later, another constant losing battle. : 99% of the time (for non-autogenerated code) the warnings are : useful... in my opinion we should not commit patches that create new : warnings. It's hard to spot new compiler warnings when there are already so many... if we can get down to 0 then we can add hacks to make the build fail if someone adds 1, but until then we have an uphill battle. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: wind down for 3.1?
2011/3/3 Robert Muir rcm...@gmail.com: On Thu, Mar 3, 2011 at 7:43 AM, Sanne Grinovero sanne.grinov...@gmail.com wrote: Hello all, Is there any update on the 3.1 status? I'm really looking forward to it :) Yes, we are currently in the feature freeze, but it seems to be coming into shape. I'm planning on creating the release branch this weekend and getting our first RC out Sunday (Steven Rowe volunteered to help with the Maven side, thanks!). If you want to help, for example you can check out the Lucene code from http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/ then run 'ant clean dist dist-src' and inspect the artifacts it puts in the dist/ folder and report any problems. If everyone waits until we build an RC before reviewing how things look and reporting problems, it's going to significantly slow down the release process, as generating RCs for both Lucene and Solr at the moment is nontrivial (which is why Steven and I have set aside this day to try to build RC1; if the vote doesn't pass it might be weeks before we have the time to build RC2). Cheers, thanks a lot. I'm definitely testing it often, and will report anything weird. I can't say about Solr though, as we use Lucene mostly. Sanne - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes
DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes - Key: LUCENE-2585 URL: https://issues.apache.org/jira/browse/LUCENE-2585 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 3.0.2, 3.0.1, 3.0, 2.9.3 Reporter: Sanne Grinovero Fix For: 2.9.4, 3.0.3, 3.1 I could reproduce the issue several times but only by running long and stressfull benchmarks, the high number of files is likely part of the scenario. All tests run on local disk, using ext3. Sample stacktrace: {noformat}java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName: files: _2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii _ux.fnm _3g7.fdx _4bb.tii _4bj.prx _uy.fdx _3g7.prx _2l7.frq _2la.fdt _3ge.nrm _2l6.prx _1py.fdx _3g6.nrm _v0.prx _4bi.tii _2l2.tis _v2.fdx _2l3.nrm _2l8.fnm _4bg.tis _2la.tis _uu.fdx _3g6.fdx _1q3.frq _2la.frq _4bb.tis _3gb.tii _1pz.tis _2lb.nrm _4lm.nrm _3g9.tii _v0.fdt _2l5.fnm _v2.prx _4ll.tii _4bd.nrm _2l7.fnm _2l4.nrm _1q2.tis _3gb.fdx _4bh.fdx _1pz.nrm _ux.fdx _ux.tii _1q6.nrm _3gf.fdx _4lk.fdt _3gd.nrm _v3.fnm _3g8.prx _1q2.nrm _4bh.prx _1q0.frq _ux.fdt _1q7.fdt _4bb.fnm _4bf.nrm _4bc.nrm _3gb.fdt _4bh.fnm _2l5.tis _1pz.fnm _1py.fnm _3gc.fnm _2l2.prx _2l4.frq _3gc.fdt _ux.tis _1q3.prx _2l7.fdx _4bj.nrm _4bj.fdx _4bi.tis _3g9.prx _1q4.prx _v3.fdt _1q3.fdx _2l9.fdt _4bh.tis _3gb.nrm _v2.nrm _3gd.tii _2l7.nrm _2lb.tii _4lm.tis _3ga.fdx _1pz.fdt _3g7.fnm _2l3.fnm _4lk.fnm _uz.fnm _2l2.frq _4bd.fdx _1q2.fdt _3g7.tis _4bi.frq _4bj.frq _2l7.prx _ux.prx _3gd.fnm _1q4.fdt _1q1.fdt _v1.fnm _1py.nrm _3gf.nrm _4be.fdt _1q3.tii _1q1.prx _2l3.fdt _4lk.frq _2l4.fdx _4bd.fnm _uw.frq _3g8.fdx _2l6.tii _1q5.frq _1q5.tis _3g8.nrm _uw.nrm _v0.tii _v2.fdt _2l7.fdt _v0.tis _uy.tii _3ge.tii _v1.tii _3gb.tis _4lm.fdx _4bc.fnm _2lb.frq _2l6.fnm _3g6.tii _3ge.prx _uu.frq _1pz.fdx _1q2.fnm _4bi.prx _3gc.frq _2l9.tis _3ge.fdt 
_uy.fdt _4ll.fnm _3gc.prx _1q7.tii _2l5.nrm _uy.nrm _uv.frq _1q6.frq _4ba.tis _3g9.tis _4be.nrm _4bi.fnm _ux.frq _1q1.fnm _v0.fnm _2l4.fnm _4ba.fnm _4be.tis _uz.prx _1q6.fdx _uw.tii _2l6.nrm _1pz.prx _2l7.tis _1q7.fdx _2l9.tii _4lk.tii _uz.frq _3g8.frq _4bb.prx _1q5.tii _1q5.prx _v2.frq _4bc.tii _1q7.prx _v2.tii _2lb.tis _4bi.fdt _uv.nrm _2l2.fnm _4bd.tii _1q7.tis _4bg.fnm _3ga.frq _uu.fnm _2l9.fnm _3ga.fnm _uw.fnm _1pz.frq _1q1.fdx _3ge.fdx _2l3.prx _3ga.nrm _uv.fdt _4bb.nrm _1q7.fnm _uv.tis _3gb.fnm _2l6.tis _1pz.tii _uy.fnm _3gf.fdt _3gc.nrm _4bf.tis _1q5.fnm _uu.tis _4bh.tii _2l5.fdt _1q6.tii _4bc.tis _3gc.tii _3g9.fnm _2l6.fdt _4bj.fnm _uu.tii _v3.frq _3g9.fdx _v0.nrm _2l7.tii _1q0.fdt _3ge.fnm _4bf.fdt _1q6.prx _uz.nrm _4bi.fdx _3gf.fnm _4lm.frq _v0.fdx _4ba.fdt _1py.tii _4bf.tii _uw.fdx _2l5.frq _3g9.nrm _v1.fdt _uw.fdt _4bd.frq _4bg.prx _3gd.tis _1q4.tis _2l9.nrm _2la.nrm _v3.tii _4bf.prx _1q1.nrm _4ba.tii _3gd.fdx _1q4.tii _4lm.tii _3ga.tis _4bf.fnm write.lock _2l8.prx _2l8.fdt segments.gen _2lb.fnm _2l4.fdt _1q2.prx _4be.fnm _3gf.prx _2l6.fdx _3g6.fnm _4bb.fdt _4bd.tis _4lk.nrm _2l5.fdx _2la.tii _4bd.prx _4ln.fnm _3gf.tis _4ba.nrm _v3.prx _uv.prx _1q3.fnm _3ga.tii _uz.tii _3g9.frq _v0.frq _3ge.tis _3g6.tis _4ln.prx _3g7.tii _3g8.fdt _3g7.nrm _3ga.prx _2l2.fdx _2l8.fdx _4ba.prx _1py.frq _uz.fdx _2l3.tii _3g6.prx _v3.fdx _1q6.fdt _v1.nrm _2l2.tii _1q0.tis _4ba.fdx _4be.tii _4ba.frq _4ll.fdt _4bh.nrm _4lm.fdt _1q7.frq _4lk.tis _4bc.frq _1q6.fnm _3g7.frq _uw.tis _3g8.tis _2l9.fdx _2l4.tii _1q4.fdx _4be.prx _1q3.nrm _1q0.tii _1q0.fnm _v3.nrm _1py.tis _3g9.fdt _4bh.fdt _4ll.nrm _4lk.prx _3gd.prx _1q3.tis _1q2.tii _2l2.nrm _3gd.fdt _2l3.fdx _3g6.fdt _3gd.frq _1q1.tis _4bb.fdx _1q2.frq _1q3.fdt _v1.tis _2l8.frq _3gc.fdx _1q1.frq _4bg.frq _4bb.frq _2la.fdx _2l9.frq _uy.tis _uy.prx _4bg.fdx _3gb.prx _uy.frq _1q2.fdx _4lm.prx _2la.prx _2l4.prx _4bg.fdt _4be.frq _1q7.nrm _2l5.prx _4bf.frq _v1.prx _4bd.fdt _2l9.prx _1q6.tis _3g8.fnm _4ln.tis _2l3.tis _4bc.fdx _2lb.prx 
_3gb.frq _3gf.frq _2la.fnm _3ga.fdt _uz.tis _4bg.nrm _uv.tii _4bg.tii _3g8.tii _4ll.frq _uv.fnm _2l8.tis _2l8.nrm _2l2.fdt _4bj.tis _4lk.fdx _uw.prx _4bc.prx _4bj.fdt _4be.fdx _1q4.frq _uu.fdt _1q1.tii _2l5.tii _2lb.fdt _4bh.frq _3ge.frq _1py.prx _1q5.nrm _v1.fdx _3g7.fdt _4ln.fdt _1q4.nrm _1py.fdt _3gc.tis _4ll.prx _v3.tis _4bf.fdx _1q5.fdx _1q0.prx _4bi.nrm _4ll.tis _2l4.tis _3gf.tii _v2.fnm _uu.nrm _1q0.nrm _4lm.fnm _uu.prx _2l6.frq _4ln.nrm _ux.nrm _3g6.frq _1q5.fdt _4bj.tii _2lb.fdx _uv.fdx _v1.frq at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:634) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:517) at org.apache.lucene.index.SegmentInfos.read
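The race reported in LUCENE-2585 is transient: a reader lists the directory while a writer is rewriting the segments file, and the read fails with FileNotFoundException even though the index is fine a moment later. A generic retry wrapper for this kind of "listed but not yet visible" race can be sketched as follows. This is illustrative only: Lucene's actual SegmentInfos.FindSegmentsFile retry logic is considerably more elaborate, and the names below are made up for the example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Illustrative retry wrapper for a read that can transiently fail while a
// concurrent writer is swapping segments files. Not Lucene's FindSegmentsFile.
public class RetrySketch {
    interface IoCall<T> { T run() throws IOException; }

    /** Retries the call on FileNotFoundException, rethrowing after maxAttempts. */
    static <T> T withRetries(IoCall<T> call, int maxAttempts) throws IOException {
        FileNotFoundException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (FileNotFoundException e) {
                last = e; // segments file may be mid-rewrite; try again
            }
        }
        if (last != null) throw last;
        throw new IOException("maxAttempts must be positive");
    }
}
```

A retry alone cannot fully close the window (the writer may still be mid-commit on the last attempt), which is why the real fix needed changes inside the segments-file discovery logic rather than at the caller.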
[jira] Commented: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes
[ https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895027#action_12895027 ] Sanne Grinovero commented on LUCENE-2585: - I'm going to see if I can contribute a patch myself, but I don't think I'll be able to provide a unit test.
[jira] Commented: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes
[ https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895059#action_12895059 ] Sanne Grinovero commented on LUCENE-2585: - Sure, the test is fully open source; the directory implementation based on Infinispan is hosted as a submodule of Infinispan: http://anonsvn.jboss.org/repos/infinispan/branches/4.1.x/lucene-directory/ The test is org.infinispan.lucene.profiling.PerformanceCompareStressTest; it is included in the default test suite but disabled in Maven's configuration, so you should run it manually: mvn clean test -Dtest=PerformanceCompareStressTest (running it requires the jboss.org repositories to be enabled in the Maven settings). To describe it at a higher level: there are 5 index-reading threads calling reopen() before each search, 2 threads writing to the index, and 1 additional thread acting as a coordinator and asserting that readers find what they expect to see in the index. Exactly the same test scenario is then applied in sequence to RAMDirectory (no issues), NIOFSDirectory, and 4 differently configured Infinispan directories. Only the FSDirectory is affected by the issue, and it can never complete the full hour of stress testing successfully, while all other implementations behave fine. IndexWriter is set to MaxMergeDocs(5000) and setUseCompoundFile(false); the issue is revealed both with SerialMergeScheduler and with the default merge scheduler. During the last execution the test managed to perform 22,192,006 searches and 26,875 writes during the hour, but the benchmark is invalidated as one thread was killed by the exception. If you deem it useful I'd be happy to contribute a similar testcase to Lucene, but I assume you won't be excited about having such a long-running test. Open to ideas for building a simpler one. 
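The thread layout Sanne describes (5 readers reopening before each search, 2 writers, plus a coordinator) can be compressed into a Lucene-free sketch. All class and method names below are invented for illustration; the AtomicLong merely stands in for shared index state, and the coordinator thread of the real PerformanceCompareStressTest is omitted:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class StressShape {

    // Runs the reader/writer mix for roughly `millis` ms and returns {searches, writes}.
    public static long[] run(long millis) throws InterruptedException {
        final AtomicLong index = new AtomicLong();      // stand-in for the index state
        final AtomicLong searches = new AtomicLong();
        final AtomicLong writes = new AtomicLong();
        final long deadline = System.currentTimeMillis() + millis;
        ExecutorService pool = Executors.newFixedThreadPool(7);
        for (int i = 0; i < 5; i++) {                   // 5 reader threads
            pool.submit(new Runnable() {
                public void run() {
                    while (System.currentTimeMillis() < deadline) {
                        long snapshot = index.get();    // "reopen" before each search
                        if (snapshot >= 0) {
                            searches.incrementAndGet();
                        }
                    }
                }
            });
        }
        for (int i = 0; i < 2; i++) {                   // 2 writer threads
            pool.submit(new Runnable() {
                public void run() {
                    while (System.currentTimeMillis() < deadline) {
                        index.incrementAndGet();
                        writes.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new long[] { searches.get(), writes.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] counts = run(200);
        System.out.println("searches=" + counts[0] + " writes=" + counts[1]);
    }
}
```

The real test runs this shape for an hour per Directory implementation; the sketch only shows why read counts dwarf write counts (readers spin on cheap snapshot checks while writers mutate state).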
[jira] Updated: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes
[ https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanne Grinovero updated LUCENE-2585: Description: I could reproduce the issue several times but only by running long and stressfull benchmarks, the high number of files is likely part of the scenario. All tests run on local disk, using ext3. Sample stacktrace: {noformat}java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName: files: _2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii _ux.fnm _3g7.fdx 
_uw.prx _4bc.prx _4bj.fdt _4be.fdx _1q4.frq _uu.fdt _1q1.tii _2l5.tii _2lb.fdt _4bh.frq _3ge.frq _1py.prx _1q5.nrm _v1.fdx _3g7.fdt _4ln.fdt _1q4.nrm _1py.fdt _3gc.tis _4ll.prx _v3.tis _4bf.fdx _1q5.fdx _1q0.prx _4bi.nrm _4ll.tis _2l4.tis _3gf.tii _v2.fnm _uu.nrm _1q0.nrm _4lm.fnm _uu.prx _2l6.frq _4ln.nrm _ux.nrm _3g6.frq _1q5.fdt _4bj.tii _2lb.fdx _uv.fdx _v1.frq at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:634) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:517) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:306) at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:408) at org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:797) at org.apache.lucene.index.DirectoryReader.doReopenNoWriter(DirectoryReader.java:407
[jira] Issue Comment Edited: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes
[ https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895059#action_12895059 ] Sanne Grinovero edited comment on LUCENE-2585 at 8/3/10 6:26 PM: - Sure, the test is fully open source; the directory implementation based on Infinispan is hosted as a submodule of Infinispan: http://anonsvn.jboss.org/repos/infinispan/branches/4.1.x/lucene-directory/ The test is org.infinispan.lucene.profiling.PerformanceCompareStressTest; it is included in the default test suite but disabled in Maven's configuration, so you should run it manually: mvn clean test -Dtest=PerformanceCompareStressTest (running it requires the jboss.org repositories to be enabled in the Maven settings). To describe it at a higher level: there are 5 index-reading threads calling reopen() before each search, 2 threads writing to the index, and 1 additional thread acting as a coordinator and asserting that readers find what they expect to see in the index. Exactly the same test scenario is then applied in sequence to RAMDirectory (no issues), NIOFSDirectory, and 4 differently configured Infinispan directories. Only the FSDirectory is affected by the issue, and it can never complete the full hour of stress testing successfully, while all other implementations behave fine. IndexWriter is set to MaxMergeDocs(5000) and setUseCompoundFile(false); the issue is revealed both with SerialMergeScheduler and with the default merge scheduler. During the last execution the test managed to perform 22,192,006 searches and 26,875 writes before hitting the exceptional case. If you deem it useful I'd be happy to contribute a similar testcase to Lucene, but I assume you won't be excited about having such a long-running test. Open to ideas for building a simpler one. 
Re: Proposal about Version API relaxation
Hello, I think some compatibility breaks should really be accepted, otherwise these requirements are going to kill the technological advancement: the effort in backwards compatibility will grow and become more time-consuming and harder every day. A major release won't happen every day, likely not even every year, so it seems acceptable to have milestones defining compatibility boundaries: you need to be able to reset the complexity curve occasionally. Backporting a feature would benefit from being merged in the correct testsuite, and avoid the explosion of this matrix-like backwards compatibility test suite. BTW the current testsuite is likely covering all kinds of combinations which nobody is actually using or caring about. Also, if I were to discover a nice improvement in an Analyzer, and you were telling me that to contribute it I would have to face this amount of complexity... I would think twice before trying; honestly the current requirements are scary. +1 Sanne 2010/4/15 Earwin Burrfoot ear...@gmail.com: I'd like to remind that Mike's proposal has stable branches. We can branch off preflex trunk right now and wrap it up as 3.1. Current trunk is declared as future 4.0 and all backcompat cruft is removed from it. If some new features/bugfixes appear in trunk, and they don't break stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3, etc Thus, devs are free to work without back-compat burden, bleeding edge users get their blood, conservative users get their stability + a subset of new features from stable branches. On Thu, Apr 15, 2010 at 22:02, DM Smith dmsmith...@gmail.com wrote: On 04/15/2010 01:50 PM, Earwin Burrfoot wrote: First, the index format. IMHO, it is a good thing for a major release to be able to read the prior major release's index. And the ability to convert it to the current format via optimize is also good. Whatever is decided on this thread should take this seriously. Optimize is a bad way to convert to current. 1. 
conversion is not guaranteed, optimizing already optimized index is a noop 2. it merges all your segments. if you use BalancedSegmentMergePolicy, that destroys your segment size distribution Dedicated upgrade tool (available both from command-line and programmatically) is a good way to convert to current. 1. conversion happens exactly when you need it, conversion happens for sure, no additional checks needed 2. it should leave all your segments as is, only changing their format It is my observation, though possibly not correct, that core only has rudimentary analysis capabilities, handling English very well. To handle other languages well contrib/analyzers is required. Until recently it did not get much love. There have been many bw compat breaking changes (though w/ version one can probably get the prior behavior). IMHO, most of contrib/analyzers should be core. My guess is that most non-trivial applications will use contrib/analyzers. I counter - most non-trivial applications will use their own analyzers. The more modules - the merrier. You can choose precisely what you need. By and large an analyzer is a simple wrapper for a tokenizer and some filters. Are you suggesting that most non-trivial apps write their own tokenizers and filters? I'd find that hard to believe. For example, I don't know enough Chinese, Farsi, Arabic, Polish, ... to come up with anything better than what Lucene has to tokenize, stem or filter these. Our user base are those with ancient, underpowered laptops in 3-rd world countries. On those machines it might take 10 minutes to create an index and during that time the machine is fairly unresponsive. There is no opportunity to do it in the background. Major Lucene releases (feature-wise, not version-wise) happen like once in a year, or year-and-a-half. Is it that hard for your users to wait ten minutes once a year? I said that was for one index. Multiply that times the number of books available (300+) and yes, it is too much to ask. 
Even if a small subset is indexed, say 30, that's around 5 hours of waiting. Under consideration is the frequency of breakage. Some are suggesting a greater frequency than yearly. DM -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Proposal about Version API relaxation
+1 on the Analyzers split, but I would like to point out that it's not very different from having a non-final static version field. Just a much better solution, as you keep your code manageable. 2010/4/15 Grant Ingersoll gsing...@apache.org: On Apr 15, 2010, at 4:21 PM, Shai Erera wrote: +1 on the Analyzers as well. Earwin, I think I don't mind if we introduce migrate() elsewhere rather than on IW. What I meant to say is that if we stick w/ index format back-compat and ongoing migration, then such a method would be useful on IW for customers to call to ensure they're on the latest version. But if the majority here agree w/ a standalone tool, then I'm ok if it sits elsewhere. Grant, I'm all for 'just doing it and see what happens'. But I think we need to at least decide what we're going to do so it's clear to everyone. Because I'd like to know, if I'm about to propose an index format change, whether I need to build a migration tool or not. Actually, I'd like to know if people like Robert (basically those who have no problem reindexing and don't understand the fuss around it) will want to change the index format - can I count on them being asked to provide such a tool? That's to me a policy we should decide on ... whatever the consequences. As I said, we should strive for index compatibility, but even in the past, we said we did, but the implications weren't always clear. I think index compatibility is very important. I've seen plenty of times where reindexing is not possible. But even then, you still have the option of testing to find out whether you can update or not. If you can't update, then don't until you can figure out how to do it. FWIW, I think our approach is much more proactive than see what happens. I'd argue that, in the past, our approach was see what happens, only the seeing didn't happen until after the release! 
-Grant - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Having a default constructor in Analyzers
2010/2/8 Robert Muir rcm...@gmail.com: [snip] how would this work when the Query analyzer differs from the Index analyzer? For example, using commongrams in Solr means you use a different Query analyzer from the Index analyzer, and there are some other use cases even in Solr (synonym expansion and things like that) [snip] They are two different Analyzer types, but I assume they want to use the same value for Version, right? The same version which was used to build the rest of the index. Regards, Sanne - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Having a default constructor in Analyzers
Hi Uwe, yes, Hibernate is definitely recommending the Solr way for normal and power users, but we're also taking care of beginners trying it out for the first time: it should just work out of the box for a simple POC. In those cases an Analyzer is defined as the global analyzer (used for all cases where you're not overriding the default); it used to be possible to specify a single Analyzer by fully qualified name, to be used globally, or one per index. Of course this is far from the flexibility needed for most real-world applications, but it keeps things simple for beginners taking a first look at introducing Lucene; so for these cases I don't care much about the Version used, though of course it's important that they can later pin it down. To be compatible I'll have to change the loader so it looks for a default constructor or a single-parameter Version constructor, which should be good enough to accommodate the simple goal; I'll read the Version from a configuration parameter, probably nailing down the Version to the current latest and/or reading my own environment parameter. I agree about the factory strategy; in fact it's been on HSEARCH-457 since right before my emails here; I asked here to check we could keep it simple :-) Thanks all, Sanne 2010/2/8 Uwe Schindler u...@thetaphi.de: Simon: Sanne, I would recommend building a Factory pattern around your Analyzers / TokenStreams similar to what Solr does. That way you can load your own default-ctor interface via reflection and obtain your analyzers from those factories. That makes more sense anyway as you only load the factory via reflection and not the analyzers. As far as I see, Hibernate uses Solr Factories. On the other hand, instead of creating your own SolrAnalyzer you can also use a standard one from Lucene (you can do this in Solr, too): http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#analyzer In my opinion, the Factory pattern is ok for own analyzer definitions. 
For reusing standard analyzers like StandardAnalyzer or TurkishAnalyzer, the ideal case is to use the reflection code I proposed before. This code works for all language-based analyzers having a standard ctor or a Version ctor. Solr will also handle this reflection-based instantiation with an optional Version parameter in the future, too (Erik Hatcher pointed that out to me when working on SOLR-1677: Another comment on this... Solr supports using an Analyzer also, but only ones with zero-arg constructors. It would be nice if this Version support also allowed for Analyzers (say SmartChineseAnalyzer) to be used directly. I don't think this patch accounts for this case, does it?). As Hibernate also uses the factory pattern for custom analyzers, as soon as https://issues.apache.org/jira/browse/SOLR-1677 is in, the version problem for those should be solved, too (as you can specify the parameter to each component). But Hibernate should also think about a global default Version (like Solr via CoreAware or the like) that is used as a default param to all Tokenizers/TokenFilters and when reflection-based Analyzer subclass instantiation is used. By the way, Hibernate's reuse of Solr's schema is one of Hoss's arguments not to make it CoreAware. Uwe - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
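The factory approach discussed above can be sketched with stand-in types: the factory keeps the trivial no-arg constructor (so it can be loaded by name from configuration), while the analyzer keeps its Version constructor and gets the version injected by the container. Every name here is illustrative, not Solr's or Hibernate Search's actual factory SPI:

```java
public class FactoryPattern {

    // Stand-ins for org.apache.lucene.util.Version and the Analyzer base type.
    public enum Version { LUCENE_29, LUCENE_30 }
    public interface Analyzer {}

    // The factory is what configuration instantiates reflectively.
    public interface AnalyzerFactory {
        Analyzer create(Version matchVersion);
    }

    public static class WhitespaceAnalyzer implements Analyzer {
        final Version matchVersion;
        WhitespaceAnalyzer(Version v) { this.matchVersion = v; }
    }

    public static class WhitespaceAnalyzerFactory implements AnalyzerFactory {
        public WhitespaceAnalyzerFactory() {}           // reflection-friendly no-arg ctor
        public Analyzer create(Version matchVersion) {  // version supplied by the container
            return new WhitespaceAnalyzer(matchVersion);
        }
    }

    // Configuration names the factory class; the container supplies the Version.
    public static Analyzer fromConfig(String factoryClass, Version v) throws Exception {
        AnalyzerFactory f = (AnalyzerFactory)
                Class.forName(factoryClass).getDeclaredConstructor().newInstance();
        return f.create(v);
    }

    public static void main(String[] args) throws Exception {
        Analyzer a = fromConfig("FactoryPattern$WhitespaceAnalyzerFactory", Version.LUCENE_30);
        System.out.println(a.getClass().getSimpleName());
    }
}
```

The design point is that only the factory needs a reflection-friendly constructor; analyzers themselves stay free to demand whatever arguments they need.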
Having a default constructor in Analyzers
Hello, I've seen that some core Analyzers are now missing a default constructor; this is preventing many applications from configuring/loading Analyzers by reflection, a common approach when Analyzers are chosen in configuration files. Would it be possible to add, for example, a constructor like public StandardAnalyzer() { this(Version.LUCENE_CURRENT); } ? Of course more advanced use cases would need to pass parameters, but please make the advanced usage optional; I have now seen more than a single project break because of this (and revert to an older Lucene). Regards, Sanne - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
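The breakage being reported can be shown with a minimal stand-alone sketch; the analyzer classes below are stand-ins, not real Lucene classes. Loaders that instantiate a class purely by its fully qualified name rely on a default constructor existing:

```java
import java.lang.reflect.Constructor;

public class ReflectiveLoading {

    // Stand-in for an analyzer that kept its no-arg constructor.
    public static class LegacyAnalyzer {
        public LegacyAnalyzer() {}
    }

    // Stand-in for an analyzer that now requires a version argument.
    public static class VersionedAnalyzer {
        final String matchVersion;
        public VersionedAnalyzer(String matchVersion) { this.matchVersion = matchVersion; }
    }

    // What configuration-driven loaders typically do: instantiate by
    // fully qualified class name through the default constructor.
    public static Object load(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        Constructor<?> ctor = clazz.getDeclaredConstructor(); // fails if no default ctor
        return ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(load("ReflectiveLoading$LegacyAnalyzer").getClass().getSimpleName());
        try {
            load("ReflectiveLoading$VersionedAnalyzer");
        } catch (NoSuchMethodException expected) {
            System.out.println("no default constructor on VersionedAnalyzer");
        }
    }
}
```

Removing the no-arg constructor turns the second `load` call into a `NoSuchMethodException` at runtime, which is exactly the failure mode configuration-file-driven applications hit.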
Re: Having a default constructor in Analyzers
Thanks for all the quick answers; finding the ctor having only a Version parameter is fine for me; I had noticed this frequent pattern but didn't understand it was a general rule. So can I assume this is an implicit contract for all Analyzers, to have either an empty ctor or a single-parameter ctor of type Version? I know about the dangers of using LUCENE_CURRENT, but rebuilding the index is not always something you need to avoid. Having LUCENE_CURRENT is for example useful for me to test Hibernate Search against the current Lucene on the classpath, without having to rebuild the code. Thanks for all the help, Sanne 2010/2/7 Robert Muir rcm...@gmail.com: I propose we remove LUCENE_CURRENT completely, as soon as TEST_VERSION is done. On Sun, Feb 7, 2010 at 12:53 PM, Uwe Schindler u...@thetaphi.de wrote: Hi Sanne, Exactly that usage we want to prevent. Using Version.LUCENE_CURRENT is the worst thing you can do if you want to later update your Lucene version and do not want to reindex all your indexes (see javadocs). It is easy to modify your application to create analyzers even from config files using the reflection way. Just find a constructor taking Version and call newInstance() on it, not directly on the Class. It's just one line of code more. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Sanne Grinovero [mailto:sanne.grinov...@gmail.com] Sent: Sunday, February 07, 2010 6:33 PM To: java-dev@lucene.apache.org Subject: Having a default constructor in Analyzers Hello, I've seen that some core Analyzers are now missing a default constructor; this is preventing many applications from configuring/loading Analyzers by reflection, a common approach when Analyzers are chosen in configuration files. Would it be possible to add, for example, a constructor like public StandardAnalyzer() { this(Version.LUCENE_CURRENT); } ? 
Of course more advanced use cases would need to pass parameters, but please make the advanced usage optional; I have now seen more than a single project break because of this (and revert to an older Lucene). Regards, Sanne -- Robert Muir rcm...@gmail.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
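Uwe's "just one line of code more" suggestion, looking up the Version-taking constructor instead of calling newInstance() directly on the Class, looks roughly like this. The Version enum and analyzer class here are stand-ins for the real org.apache.lucene.util.Version and analyzer types:

```java
import java.lang.reflect.Constructor;

public class VersionCtorLoading {

    // Stand-in for org.apache.lucene.util.Version.
    public enum Version { LUCENE_29, LUCENE_30 }

    // Stand-in analyzer exposing only the Version constructor.
    public static class MyAnalyzer {
        public final Version matchVersion;
        public MyAnalyzer(Version matchVersion) { this.matchVersion = matchVersion; }
    }

    // "One line of code more" than clazz.newInstance(): find the ctor
    // taking Version and invoke it with the configured value.
    public static Object newAnalyzer(String className, Version v) throws Exception {
        Class<?> clazz = Class.forName(className);
        Constructor<?> ctor = clazz.getConstructor(Version.class);
        return ctor.newInstance(v);
    }

    public static void main(String[] args) throws Exception {
        MyAnalyzer a = (MyAnalyzer) newAnalyzer("VersionCtorLoading$MyAnalyzer", Version.LUCENE_30);
        System.out.println(a.matchVersion);
    }
}
```

A loader wanting to support both shapes can first try `getConstructor(Version.class)` and fall back to the no-arg constructor, which is essentially what Sanne describes doing in Hibernate Search's loader.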
Re: Having a default constructor in Analyzers
Does it make sense to use different values across the same application? Obviously in the unlikely case you want to treat different indexes in a different way, but does it make sense when working all on the same index? If not, why not introduce a value like Version.BY_ENVIRONMENT which is statically initialized to be one of the other values, reading from an environment parameter? So you get the latest at first deploy, and can then keep compatibility as long as you need, even when updating Lucene. This way I could still have the safety of pinning down a specific version and yet avoid rebuilding the app when changing it. Of course the default would be LUCENE_CURRENT, so that people trying out Lucene get all features out of the box, and warn about setting it (maybe log a warning when not set). Also, wouldn't it make sense to be able to read the recommended version from the Index? I'd like the hypothetical AnalyzerFactory to find out what it needs to build by getting information from the relevant IndexReader; so in case I have two indexes using different versions I won't make mistakes. (For a query on index A I'm creating a QueryParser, so let's ask the index which kind of QueryParser I should use...) Just some ideas, forgive me if I misunderstood this usage (I should avoid writing late at night...) Regards, Sanne 2010/2/7 Simon Willnauer simon.willna...@googlemail.com: On Sun, Feb 7, 2010 at 8:38 PM, Robert Muir rcm...@gmail.com wrote: Simon, can you explain how removing CURRENT makes it harder for users to upgrade? If you mean for the case of people that always re-index all documents when upgrading the Lucene jar, then this makes sense to me. That is what I was alluding to! Not much of a deal, though; most IDEs let you upgrade via refactoring easily and we can document this too. Yet we won't have a drop-in upgrade anymore though. I guess as a step we can at least deprecate this thing and strongly discourage its use, please see the patch at LUCENE-2080. 
Not to pick on Sanne, but his wording about "Of course more advanced use cases would need to pass parameters but please make the advanced usage optional" really caused me to rethink CURRENT, because CURRENT itself should be the advanced use case!!!

On Sun, Feb 7, 2010 at 2:34 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Sanne, I would recommend building a Factory pattern around your Analyzers / TokenStreams similar to what Solr does. That way you can load your own default-ctor interface via reflection and obtain your analyzers from those factories. That makes more sense anyway, as you only load the factory via reflection and not the analyzers. @Robert: I don't know if removing LUCENE_CURRENT is the way to go. On the one hand it would make our lives easier over time, but it would make it harder for our users to upgrade. I would totally agree that for upgrade safety it would be much better to enforce an explicit version number so upgrading can be done step by step. Yet, if we deprecate LUCENE_CURRENT people will use it for at least the next 3 to 5 years (until 4.0) anyway :) simon

On Sun, Feb 7, 2010 at 8:17 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: Thanks for all the quick answers; finding the ctor having only a Version parameter is fine for me, I had noticed this frequent pattern but didn't understand that it was a general rule. So can I assume this is an implicit contract for all Analyzers, to have either an empty ctor or a single parameter of type Version? I know about the dangers of using LUCENE_CURRENT, but rebuilding the index is not always something you need to avoid. Having LUCENE_CURRENT is for example useful for me to test Hibernate Search against the current Lucene on the classpath, without having to rebuild the code. thanks for all help, Sanne

2010/2/7 Robert Muir rcm...@gmail.com: I propose we remove LUCENE_CURRENT completely, as soon as TEST_VERSION is done.
On Sun, Feb 7, 2010 at 12:53 PM, Uwe Schindler u...@thetaphi.de wrote: Hi Sanne, Exactly that usage is what we want to prevent. Using Version.LUCENE_CURRENT is the worst thing you can do if you want to later update your Lucene version and do not want to reindex all your indexes (see javadocs). It is easy to modify your application to create analyzers even from config files using the reflection way. Just find a constructor taking Version and call newInstance() on it, not directly on the Class. It's just one line of code more. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Sanne Grinovero [mailto:sanne.grinov...@gmail.com] Sent: Sunday, February 07, 2010 6:33 PM To: java-dev@lucene.apache.org Subject: Having a default constructor in Analyzers Hello, I've seen that some core Analyzers are now missing a default constructor
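The reflection approach Uwe describes (find the constructor taking Version, then call newInstance() on the constructor rather than the Class) can be sketched as follows. Note that `Version` and `MyAnalyzer` here are simplified stand-ins so the sketch is self-contained, not Lucene's real classes:

```java
import java.lang.reflect.Constructor;

public class AnalyzerFactoryDemo {
    // Stand-in for org.apache.lucene.util.Version (assumption, not the real enum)
    public enum Version { LUCENE_24, LUCENE_29, LUCENE_30 }

    // Stand-in analyzer following the single-(Version)-argument ctor convention
    public static class MyAnalyzer {
        final Version matchVersion;
        public MyAnalyzer(Version v) { this.matchVersion = v; }
    }

    // Uwe's "one line more": locate the (Version) ctor, then instantiate on it
    public static Object newAnalyzer(Class<?> clazz, Version v) throws Exception {
        Constructor<?> ctor = clazz.getConstructor(Version.class);
        return ctor.newInstance(v);
    }

    public static void main(String[] args) throws Exception {
        MyAnalyzer a = (MyAnalyzer) newAnalyzer(MyAnalyzer.class, Version.LUCENE_30);
        System.out.println(a.matchVersion);  // LUCENE_30
    }
}
```

The class name could come from a config file just as before; only the constructor lookup changes compared to `Class.newInstance()`.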
Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
thanks for the heads-up, this is good to know. I've updated http://wiki.apache.org/lucene-java/AvailableLockFactories which I recently created as a guide to help in choosing between the different LockFactories. I believe the Native LockFactory is very useful; I wouldn't consider this a bug nor consider discouraging its use, people just need to be informed of the behavior and know that no LockFactory impl is good for all cases. Adding some lines to its javadoc seems appropriate. Regards, Sanne

2010/1/20 Chris Hostetter hossman_luc...@fucit.org: : At a minimum, shouldn't NativeFSLock.obtain() be checking for : OverlappingFileLockException and treating that as a failure to acquire the : lock? ... : Perhaps - that should make it work in more cases - but in my simple : testing it's not 100% reliable. ... : File locks are held on behalf of the entire Java virtual machine. : * They are not suitable for controlling access to a file by multiple : * threads within the same virtual machine. ...Grrr so where does that leave us? Yonik's added comment was that native isn't recommended when running multiple webapps in the same container. In truth, native *can* work when running multiple webapps in the same container, just as long as those containers don't reference the same data dirs. I'm worried that we should recommend people avoid native altogether because even if you are only running one webapp, it seems like a reload of that app could trigger some similar bad behavior. So what/how should we document all of this? -Hoss
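Hoss's suggestion, treating OverlappingFileLockException as "failed to acquire" rather than an error, can be sketched with plain java.nio. This is an illustrative stand-alone sketch of the idea, not Lucene's actual NativeFSLock code:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class NativeLockSketch {
    // Try to acquire a native file lock. Returns null both when another
    // process holds it (tryLock returns null) and when another channel in
    // this same JVM holds it (tryLock throws OverlappingFileLockException),
    // since file locks are held on behalf of the entire JVM.
    public static FileLock obtain(File lockFile) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileChannel channel = raf.getChannel();
        try {
            return channel.tryLock();   // null: held by another process
        } catch (OverlappingFileLockException e) {
            return null;                // held elsewhere in this same JVM
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("write", ".lock");
        FileLock first = obtain(f);
        FileLock second = obtain(f);    // same JVM, second channel: null
        System.out.println(first != null && second == null);
    }
}
```

This also illustrates the javadoc caveat quoted above: the second acquisition fails even from the very same thread, which is exactly the two-webapps-one-data-dir scenario in miniature.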
Re: Lucene memory consumption
Hello Frederic, I'm CCing java-dev@lucene.apache.org as Michael McCandless has been very helpful on IRC in discussing the ThreadLocal implication, and it would be nice if you could provide first-hand information. There's a good reading to start from at http://issues.apache.org/jira/browse/LUCENE-1383 Basically your proposal has a problem: when you close the ThreadLocal it's only going to clean up the resources stored by the current thread, not by others; setting the reference to null also won't help. Quoting the TLocal source's comment: * However, since reference queues are not * used, stale entries are guaranteed to be removed only when * the table starts running out of space.

About your issues:

1. A ThreadLocal object should normally be a singleton used as key to the thread map. Here it is repeatedly created and destroyed! It's only built in the constructor, and destroyed on close. So its lifecycle is linked to the Analyzer / FieldCache using it, probably a long time, or the appropriate time to clean things up.

2. Setting t = null; is not affecting the garbage collection of the ThreadLocal map since t is the key (hard ref) of the thread map. Well, t is unfortunately being reused as a variable name: t = null; is clearing the reference to the threadlocal, which really is the key of the map used by the threadlocal and referenced by the current Thread instance, and TLocal uses weak *keys*, not values (and the key is the TLocal itself).

3. There is no call to t.remove() which will really clean the Map entry. You could add one, but it would only clean up the garbage from the current thread, so it's ok but not enough. The current impl is making sure all stuff is collected by wrapping it all in weak values. Actually some stuff is not collected: the WeakReferences themselves, but pointing to going-to-be-collected stuff. These WeakReferences are going to be removed when the TLocal table is full, and should be harmless (?).
As you pointed out, since Lucene 3 it's releasing what is possible to release eagerly, but it's only a slight optimization: you still need the weak/hardref trick to clean the other values.

4. A ThreadLocal Map is already a WeakReference for the value. No, it's on the keys: a collected ThreadLocal will be cleaned up, eventually :-/

5. Leaving objects on a ThreadLocal after it is out of your control is bad practice. Another task may reuse the Thread and find dirty objects there. Agree, but with weak values it's not a big issue. Also it's not meant to be used by the faint-hearted; only people writing their own Analyzer could get this wrong :)

6. We found (in all our tests) the hardRef Map to be completely unnecessary in Lucene 2.4.1, but here I'm lacking more in-depth knowledge of the lifecycle of the objects added to this CloseableThreadLocal. Well, as it's being used as a cache, functionality will be the same but performance should be worse. AFAIK all TokenFilters are able to rebuild what they need when get() returns null; you might have a problem in the unlikely case of org.apache.lucene.util.CloseableThreadLocal:68 having the assertion fail, but again not affecting functionality (assuming assertions are disabled). A vanilla ThreadLocal is obviously faster than this, but then we end up reverting LUCENE-1383 and so introducing more pressure on the GC.

It would be very interesting to find out why your implementation is performing better. Maybe because in your case Analyzers are used by one thread at a time, and so you're not leaking memory? Could you tell more about this to lucene-dev directly? Regards, Sanne

2010/1/6 Frederic Simon fr...@jfrog.org: Thanks Emmanuel, Yes the main issue is that the hardRef map in this class was forcing all the objects to go to the Old generation space in the JVM GC, instead of staying at a ThreadLocal level. So, all objects put in the CloseableThreadLocal were GC'ed only on full GC.
On heavy Lucene usage, it generated around 500Mb of heap every 5 secs until full GC kicks in. Our problem is that we rely a lot on SoftReference for our cache and so this Lucene behavior is really bad for us (Customer feedback: http://old.nabble.com/What's-the-memory-requirements-for-2.1.3--to27026622.html#a27026622 ). With my class all objects stay in young gen and so the performance boost for us was huge. The issues with the class: A ThreadLocal object should normally be a singleton used as key to the thread map. Here it is repeatedly created and destroyed! Setting t = null; is not affecting the garbage collection of the ThreadLocal map since t is the key (hard ref) of the thread map. There is no call to t.remove() which will really clean the Map entry. A ThreadLocal Map is already a WeakReference for the value. Leaving objects on a ThreadLocal after it is out of your control is bad practice. Another task may reuse the Thread and find dirty objects there. We found (in all our tests) the hardRef Map
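The weak-value trick debated in this thread (the LUCENE-1383 CloseableThreadLocal design: store only a WeakReference through the ThreadLocal, keep the hard reference per thread in a side map, and drop all hard refs on close) can be sketched like this. This is a simplified stand-alone sketch of the idea, not Lucene's actual source:

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of the CloseableThreadLocal idea: the ThreadLocal holds only a
// WeakReference, while a side map keeps the hard reference per thread.
// close() clears the hard refs, so values from *all* threads become
// collectable immediately, without waiting for the ThreadLocal table
// to purge its stale entries.
public class CloseableThreadLocalSketch<T> {
    private ThreadLocal<WeakReference<T>> t = new ThreadLocal<>();
    private final Map<Thread, T> hardRefs =
        Collections.synchronizedMap(new WeakHashMap<Thread, T>());

    public T get() {
        WeakReference<T> ref = (t == null) ? null : t.get();
        return ref == null ? null : ref.get();
    }

    public void set(T value) {
        t.set(new WeakReference<>(value));
        hardRefs.put(Thread.currentThread(), value);
    }

    public void close() {
        hardRefs.clear();  // drop hard refs for every thread, not just ours
        t = null;          // the weak values can now be GC'd
    }

    public static void main(String[] args) {
        CloseableThreadLocalSketch<String> tl = new CloseableThreadLocalSketch<>();
        tl.set("analyzer state");
        System.out.println(tl.get());  // analyzer state
        tl.close();
        System.out.println(tl.get());  // null
    }
}
```

Frederic's observation still applies to this shape: the hard refs keep the values strongly reachable until close() (or until their owning Thread dies, via the WeakHashMap key), which is exactly the old-generation pressure he measured.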
Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts
A common error I see is that people assume the IndexWriter is not threadsafe, and open several different instances. You should use just one IndexWriter, keep it open and flush periodically (not commit at each add operation), and read the Lucene wiki pages about the IndexWriter settings like ramBufferSize. That way there's only one lock and no contention from different threads. There's an explanation of the fastest design I could get here: http://in.relation.to/Bloggers/HibernateSearch32FastIndexRebuild It describes the procedure used by Hibernate Search for rebuilding the Lucene index from a Hibernate-mapped database. While I recommend it as reading for newcomers, I'd also appreciate feedback and comments from Lucene experts and developers :-) Regards, Sanne

2010/1/14 Michael McCandless luc...@mikemccandless.com: Calling commit after every addition will drastically slow down your indexing throughput and concurrency (commit is internally synchronized), but should not create lock timeouts, unless you are also opening a new IndexWriter for every addition? Mike

On Thu, Jan 14, 2010 at 12:15 PM, jchang jchangkihat...@gmail.com wrote: With only 10 concurrent consumers, I do get lock problems. However, I am calling commit() at the end of each addition. Could I expect better concurrency without timeouts if I did not commit as often? -- View this message in context: http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-lock-timeouts-tp27136743p27164797.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
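The pattern recommended above, one long-lived writer shared by all threads with periodic commits instead of a commit per add, looks roughly like this. The `Writer` interface is a stand-in for Lucene's IndexWriter so the sketch is self-contained; only the threading/commit structure is the point:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// One shared writer plus a periodic committer thread, instead of one
// writer (and one commit) per add. "Writer" stands in for IndexWriter.
public class SharedWriterSketch {
    interface Writer { void addDocument(String doc); void commit(); }

    static class CountingWriter implements Writer {
        final AtomicInteger adds = new AtomicInteger();
        final AtomicInteger commits = new AtomicInteger();
        public void addDocument(String doc) { adds.incrementAndGet(); }
        public void commit() { commits.incrementAndGet(); }
    }

    public static void main(String[] args) throws Exception {
        CountingWriter writer = new CountingWriter();  // the single shared instance
        ScheduledExecutorService committer = Executors.newSingleThreadScheduledExecutor();
        // flush periodically rather than on every add
        committer.scheduleAtFixedRate(writer::commit, 100, 100, TimeUnit.MILLISECONDS);

        Runnable producer = () -> { for (int i = 0; i < 1000; i++) writer.addDocument("doc"); };
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) (threads[i] = new Thread(producer)).start();
        for (Thread th : threads) th.join();

        committer.shutdown();
        committer.awaitTermination(1, TimeUnit.SECONDS);
        writer.commit();  // final flush before closing
        System.out.println(writer.adds.get());  // 10000
    }
}
```

With a real IndexWriter the producers would call addDocument concurrently on the shared instance, and the commit interval (together with ramBufferSize) becomes a tuning knob instead of a per-operation cost.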
Re: update doc by query
Then I wouldn't need it and can still improve performance by using periodic commits, nice! Thanks for explaining this, Sanne

On Mon, Jan 11, 2010 at 10:57 AM, Michael McCandless luc...@mikemccandless.com wrote: On Sun, Jan 10, 2010 at 6:13 PM, Sanne Grinovero s.grinov...@sourcesense.com wrote: Even if it's not strictly needed anymore, could it improve performance? I think there should be no real performance gains/losses one way or another. The current updateDocument call basically boils down to delete then add. Right now I need to use commit() right after this dual operation to make sure no reader is ever going to miss it You don't need to use commit() right after -- you can use commit any time later and both the delete and the add will be present. but if it was atomic I could have avoided the commit and just trust that at some time later it will be auto-committed: the exact moment would be out of my control, but even so the view on the index wouldn't have a chance to miss some documents. Lucene no longer auto-commits -- your app completely controls when to commit, so, I think the atomic-ness is unnecessary? Mike

-- Sanne Grinovero http://in.relation.to/Bloggers/Sanne Sourcesense - making sense of Open Source: http://www.sourcesense.com
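Mike's point that updateDocument "basically boils down to delete then add" can be illustrated over a toy in-memory index keyed by a single term. This is only a sketch of the semantics, not Lucene code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// updateDocument(term, doc) ~ deleteDocuments(term) followed by
// addDocument(doc), sketched over a toy "index" keyed by one term.
public class UpdateByTermSketch {
    final Map<String, List<String>> index = new HashMap<>();

    void addDocument(String term, String doc) {
        index.computeIfAbsent(term, k -> new ArrayList<>()).add(doc);
    }

    void deleteDocuments(String term) { index.remove(term); }

    // what the atomic update call boils down to
    void updateDocument(String term, String doc) {
        deleteDocuments(term);
        addDocument(term, doc);
    }

    public static void main(String[] args) {
        UpdateByTermSketch w = new UpdateByTermSketch();
        w.addDocument("id:1", "v1");
        w.updateDocument("id:1", "v2");
        System.out.println(w.index.get("id:1"));  // [v2]
    }
}
```

Since readers only see changes at commit points, doing the delete and the add as two separate calls is equivalent from a reader's perspective as long as no commit lands between them, which is exactly the thread's conclusion.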
Re: update doc by query
If the demand is the problem: I would really love that. In most scenarios a single term is not enough to identify a Document: I need at least two, so I usually use remove-by-query first and then add again. This sometimes needs some application-level lock to make the changes consistent. Sanne

2010/1/10 Mark Miller markrmil...@gmail.com: Any reason we don't offer update doc by query along with term? It's easy enough to implement in the same manner - is there some sort of gotcha with this, or is it just because there has been no demand yet? -- - Mark http://www.lucidimagination.com
Re: update doc by query
Even if it's not strictly needed anymore, could it improve performance? Right now I need to use commit() right after this dual operation to make sure no reader is ever going to miss it, but if it was atomic I could have avoided the commit and just trust that at some time later it will be auto-committed: the exact moment would be out of my control, but even so the view on the index wouldn't have a chance to miss some documents. Regards, Sanne

On Sun, Jan 10, 2010 at 10:04 PM, Michael McCandless luc...@mikemccandless.com wrote: I think there's no particular demand... But: why not just separately delete by query, then add? Back when IW had autoCommit=true, it was compelling to have an atomic update, but now with only autoCommit=false, the app has full control over visibility to readers, so do we even need update-by-term anymore? Mike

On Sun, Jan 10, 2010 at 2:13 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: If the demand is the problem: I would really love that: in most scenarios a single term is not enough to identify a Document: I need at least two, so I usually use remove-by-query first and then add again. This sometimes needs some application-level lock to make the changes consistent. Sanne 2010/1/10 Mark Miller markrmil...@gmail.com: Any reason we don't offer update doc by query along with term? It's easy enough to implement in the same manner - is there some sort of gotcha with this, or is it just because there has been no demand yet?
-- - Mark http://www.lucidimagination.com

-- Sanne Grinovero http://in.relation.to/Bloggers/Sanne Sourcesense - making sense of Open Source: http://www.sourcesense.com
Re: nightly build deploy to Maven repositories
I would be happy with 3.0.1-SNAPSHOT too, that will also fix my problem. Will I have to wait for the next release before I can share my patches? Best Regards, Sanne Grinovero

2009/12/3 Sanne Grinovero sanne.grinov...@gmail.com: Hello, I need to depend on some recently committed bugfixes from Lucene's 2.9 branch in other OSS projects, using Maven2 for dependency management. Are there snapshots uploaded somewhere regularly? Could Hudson do that? Looking into Hudson it appears that it regularly builds trunk; wouldn't it be a good idea to have it also verify the 2.9 branch while it's actively updated? Regards, Sanne
Re: Lucene 2.4.1 src .zip issue
Hello Erik, I just downloaded it from: http://archive.apache.org/dist/lucene/java/lucene-2.4.1-src.zip Size: 5.9 MB (6134777 bytes) I'm having no errors, using UnZip 6.00 of 20 April 2009, by Debian, on Debian 64bit. If you're downloading from the same source, could you try again? Best Regards, Sanne Grinovero

2009/12/10 Erik Hatcher erik.hatc...@gmail.com: I was doing some research on past releases of Lucene and downloaded the archived 2.4.1 src .zip and got this: ~/Downloads: unzip lucene-2.4.1-src.zip Archive: lucene-2.4.1-src.zip End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive. unzip: cannot find zipfile directory in one of lucene-2.4.1-src.zip or lucene-2.4.1-src.zip.zip, and cannot find lucene-2.4.1-src.zip.ZIP, period. Yikes! Anyone else have issues with it? Or anomalous to my download? Erik
Re: Upgrading Lucene jars
I'm not using Embedded Solr directly; I've seen several projects depending on Lucene as a Maven artifact that also include a dependency on some Solr module as a general utility, for example to use some Solr analysers. Let's say you had Lucene 2.4.1: when adding solr-analysers version 1.3.0 to the mix it appears to work well in testing, until the classloading order changes in an application server and you find out that Maven has added the solr-lucene-core artifact too, which looks fine unless you know what's in there. The poor developer could have a hard time finding out that he has two artifacts with different identifiers and different jar names containing the same code at different versions, after noticing some undefined field or method. I've learnt the lesson so I don't speak to help myself, but I think it would be an improvement and make life easier for others; Maven should take care of this but it's actually giving a false feeling of confidence in this case. Regards, Sanne

2009/12/9 Shalin Shekhar Mangar shalinman...@gmail.com: On Wed, Dec 9, 2009 at 3:33 PM, Sanne Grinovero sanne.grinov...@gmail.comwrote: Why is Solr not depending directly on Lucene but repackaging the same classes? Solr does depend on Lucene jars. We strive to package officially released Lucene artifacts but sometimes the release schedules of Lucene and Solr are different enough that we build and package Lucene jars ourselves. The CHANGES.txt in the Solr distribution has the version of Lucene used in that distribution. For example, Solr 1.4 released with Lucene 2.9.1 Solr 1.4 has already released and we are free to upgrade Lucene jars in trunk to any version we desire for further development. Sorry I've probably missed some important discussion. Whatever the reason for this decision, is it still a good reason?
This gets new users in a hell of trouble sometimes, as some applications introduce Solr after having Lucene already on the classpath and it's not immediately obvious that differently named jars contain same named classes. Are you using Embedded Solr? Otherwise the Lucene jars are in the solr.war's WEB-INF/lib directory and there is no chance of a conflict. -- Regards, Shalin Shekhar Mangar.
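One pragmatic guard against the duplicate-classes problem described above is to exclude the repackaged Lucene jar explicitly in the consuming project's POM. A hedged sketch: the exact groupId/version coordinates of that era are illustrative here and may not match the published artifacts, only the exclusion mechanism is the point:

```xml
<dependency>
  <!-- coordinates are illustrative, check the actual published artifacts -->
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-analyzers</artifactId>
  <version>1.3.0</version>
  <exclusions>
    <!-- keep the repackaged Lucene classes out, so only the real
         lucene-core dependency is ever on the classpath -->
    <exclusion>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-lucene-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree` on the project is the quickest way to spot whether two artifacts carrying the same packages have crept in transitively.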
Re: Upgrading Lucene jars
Why is Solr not depending directly on Lucene but repackaging the same classes? Sorry I've probably missed some important discussion. Whatever the reason for this decision, is it still a good reason? This gets new users in a hell of trouble sometimes, as some applications introduce Solr after having Lucene already on the classpath and it's not immediately obvious that differently named jars contain same named classes. Could this be a good timeframe to change this? Regards, Sanne 2009/12/8 Koji Sekiguchi k...@r.email.ne.jp: Shalin Shekhar Mangar wrote: I need to upgrade contrib-spellcheck jar for SOLR-785. Should I go ahead and upgrade all Lucene jars to the latest 2.9 branch code? +1. Koji -- http://www.rondhuit.com/en/
[jira] Commented: (LUCENE-2095) Document not guaranteed to be found after write and commit
[ https://issues.apache.org/jira/browse/LUCENE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785943#action_12785943 ] Sanne Grinovero commented on LUCENE-2095: - Thanks a lot Michael, this makes my distributed testing reliable again :-) I see you didn't apply my testcase; do you think it's not needed to have such a test? If you need, I could change it as you wish.

Document not guaranteed to be found after write and commit -- Key: LUCENE-2095 URL: https://issues.apache.org/jira/browse/LUCENE-2095 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.4.1, 2.9.1 Environment: Linux 64bit Reporter: Sanne Grinovero Assignee: Michael McCandless Fix For: 2.9.2, 3.0.1, 3.1 Attachments: LUCENE-2095.patch, lucene-stresstest.patch

Following the same email on the developer list: I developed a stress test to assert that a new document containing a specific term X is always found after a commit on the IndexWriter. This works most of the time, but it fails under load on rare occasions. I'm testing with 40 threads, both with a SerialMergeScheduler and a ConcurrentMergeScheduler, all sharing a common IndexWriter. The attached testcase is using a RAMDirectory only, but I verified a FSDirectory behaves in the same way, so I don't believe it's the Directory implementation or the MergeScheduler. This test is slow, so I don't consider it a functional or unit test. It might give false positives: it doesn't always fail; sorry, I couldn't find out how to make it more likely to happen, besides scheduling it to run for a longer time. I verified this affects versions 2.4.1 and 2.9.1;

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
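The shape of the stress test described in the issue, many threads sharing one writer, each adding a document with a unique term, committing, and asserting a reader sees it, can be sketched as follows. The "index" here is a toy in-memory stand-in, not Lucene; only the test structure is illustrated:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the LUCENE-2095 test shape: after commit(), a fresh reader
// view must contain every document added before that commit.
public class CommitVisibilitySketch {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private volatile Set<String> committed = Collections.emptySet();

    void addDocument(String term) { pending.add(term); }
    synchronized void commit() { committed = new HashSet<>(pending); }
    boolean readerSees(String term) { return committed.contains(term); }

    public static void main(String[] args) throws Exception {
        CommitVisibilitySketch index = new CommitVisibilitySketch();
        ExecutorService pool = Executors.newFixedThreadPool(40);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            final String term = "doc-" + i;
            results.add(pool.submit(() -> {
                index.addDocument(term);
                index.commit();
                return index.readerSees(term);  // must always hold after commit
            }));
        }
        boolean ok = true;
        for (Future<Boolean> r : results) ok &= r.get();
        pool.shutdown();
        System.out.println(ok);
    }
}
```

In the real test the assertion after commit is a search for term X on a freshly reopened IndexReader; the bug was that under load the search could occasionally come back empty.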
nightly build deploy to Maven repositories
Hello, I need to depend on some recently committed bugfixes from Lucene's 2.9 branch in other OSS projects, using Maven2 for dependency management. Are there snapshots uploaded somewhere regularly? Could Hudson do that? Looking into Hudson it appears that it regularly builds trunk; wouldn't it be a good idea to have it also verify the 2.9 branch while it's actively updated? Regards, Sanne
Re: Solr 1.5 - The Cloud Edition!
Hello Yonik, that's very interesting. I've been working for some time on the Infinispan-based Lucene Directory; have you seen my announcement on dev-lucene? I didn't dare to cross-post. Again the link: http://www.jboss.org/community/wiki/InfinispanasaDirectoryforLucene It's an implementation to distribute the index to a dynamic cluster, and Infinispan enables autodiscovery even on the cloud. I'm only focusing on the Directory and LockManager; the Directory is working but yes, it still needs some polishing and profiling, while I can trust the LockFactory, having survived some good stress tests and performing well already. I didn't know about ZooKeeper, and in my plans the index sharding would have been transparent to Lucene; it shouldn't be hard to have non-transparent sharding on top of it if you need that, and the low-level distribution is totally configurable. It's also nice that it can scale down to zero nodes, persisting the in-memory distributed state to something else (some plugins provided, like JDBC or S3 stores). Regards, Sanne

2009/12/4 Yonik Seeley yo...@lucidimagination.com: I hereby dub Solr 1.5 The Cloud Edition! (of course anyone else may also dub it anything else they so choose ;-) There are lots of prototypes and great work floating around that aim to increase the practical scalability and ease of cluster management of Solr. I did some brainstorming myself on how we could use ZooKeeper on the flight to ApacheCon US last month, and had a number of discussions with various people while there.
I'm going over those notes and adding some stuff to a new wiki page: http://wiki.apache.org/solr/SolrCloud Of course the main issue is at https://issues.apache.org/jira/browse/SOLR-1277 And there is already another wiki page http://wiki.apache.org/solr/ZooKeeperIntegration I started a new page for myself because I'm not sure we're all in sync yet and didn't want to get into competitive editing :-) Anyway, I think this is going to be a big enough issue with potentially a ton of discussion, and we should perhaps use the mailing lists for general design discussions rather than forcing everything into a single JIRA issue (which doesn't deal well with huge threads). -Yonik http://www.lucidimagination.com
Re: Socket and file locks
Hello, I'm glad you appreciate it; I've added the wiki page here: http://wiki.apache.org/lucene-java/AvailableLockFactories I purposely avoided copy-pasting the full javadocs of each implementation, as that would become out-of-date or too specific to some version; I limited myself to writing some words to highlight the differences as a quick overview of what is available. Hope you like it, I'm open to suggestions. Regards, Sanne

2009/11/29 Michael McCandless luc...@mikemccandless.com: This looks great! Maybe it makes most sense to create a wiki page (http://wiki.apache.org/lucene-java) for interesting LockFactory implementations/tradeoffs, and add this there? Mike

On Sat, Nov 28, 2009 at 9:26 AM, Sanne Grinovero sanne.grinov...@gmail.com wrote: Hello, Together with the Infinispan Directory we developed such a LockFactory; I'd be more than happy if you wanted to add some pointers to it in the Lucene documentation/readme. This depends on Infinispan for multiple-machine communication (JGroups, indirectly) but it's not required to use an Infinispan Directory: you could combine it with a Directory impl of your choice. This was tested with the LockVerifyServer mentioned by Michael McCandless and also with some other tests inspired by it (in-VM for lower-delay coordination and verify, while the LockFactory was forced to use real network communication). While this is a technology preview and performance of the Directory code is still unknown, I believe the LockFactory was the most tested component. Free to download and inspect (LGPL): http://anonsvn.jboss.org/repos/infinispan/trunk/lucene-directory/ Regards, Sanne

2009/11/27 Michael McCandless luc...@mikemccandless.com: I think a LockFactory for Lucene that implemented the ideas you and Marvin are discussing in LUCENE-1877, and/or the approach you implemented in the H2 DB, would be a useful addition to Lucene!
For many apps, the simple LockFactory impls suffice, but for apps where multiple machines can become the writer, it gets hairy. Having an always-correct Lock impl for these apps would be great. Note that Lucene has some basic tools (in oal.store) for asserting that a LockFactory is correct (see LockVerifyServer), so it's a useful way to test that things are working from Lucene's standpoint. Mike

On Fri, Nov 27, 2009 at 9:23 AM, Thomas Mueller thomas.tom.muel...@gmail.com wrote: Hi, I'm wondering if you are interested in automatically releasing the write lock. See also my comments on https://issues.apache.org/jira/browse/LUCENE-1877 - I thought it's a problem worth solving, because it's also in the Lucene FAQ list at http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_purpose_of_write.lock_file.2C_when_is_it_used.2C_and_by_which_classes.3F Unfortunately there seems to be no solution that 'always works', but delegating the task and responsibility to the application / to the user is problematic as well. For example, a user of the H2 database (which supports Lucene fulltext indexing) suggested to automatically remove the write.lock file whenever the file is there: http://code.google.com/p/h2database/issues/detail?id=141 - sounds a bit dangerous in my view. So, if you are interested to solve the problem, then maybe I can help. If not, then I will not bother you any longer :-) Regards, Thomas

shouldn't active code like that live in the application layer? Why? You can all but guarantee that polling will work at the app layer The application layer may also run with low priority. In operating systems, it's usually the lower layers that have more 'rights' (priority), and not the higher levels (I'm not saying it should be like that in Java). I just think the application layer should not have to deal with write locks or removing write locks. by the time the original process realizes that it doesn't hold the lock anymore, the damage could already have been done.
Yes, I'm not sure how to best avoid that (with any design). Asking the application layer or the user whether the lock file can be removed is probably more dangerous than trying the best in Lucene. Standby / hibernate: the question is, if the machine's process is currently not running, does the process still hold the lock? I think no, because the machine might as well be turned off. How to detect whether the machine is turned off versus in hibernate mode? I guess that's a problem for all mechanisms (socket / file lock / background thread). When a hibernated process wakes up again, it thinks it owns the lock. Even if the process checks before each write, it is unsafe: if (isStillLocked()) { write(); } The process could wake up after isStillLocked() but before write(). One protection is: the second process (the one that breaks the lock) would need to work on a copy of the data instead of the original file (it could delete / truncate the original file after creating a copy). On Windows, renaming
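The check-then-act race Thomas describes (`if (isStillLocked()) { write(); }` with a possible suspension between the check and the write) can be sketched in-process with a generation token that a lock-breaker bumps, and with the validation done inside the same critical section as the write. This only models the structure within one JVM; it does not protect against a truly separate process, which is why the thread lands on copy-on-break as the real safeguard:

```java
import java.util.concurrent.atomic.AtomicLong;

// Why "if (isStillLocked()) write();" is unsafe as two separate steps,
// and one mitigation: validate a lease token *inside* the same critical
// section as the write. A lock-breaker bumps the generation, which
// invalidates every token handed out earlier. Illustrative only.
public class LockLeaseSketch {
    private final AtomicLong generation = new AtomicLong();
    private final Object mutex = new Object();

    long acquire() { return generation.incrementAndGet(); }   // my lease token
    void breakLock() { generation.incrementAndGet(); }        // invalidates older tokens

    // check and write atomically with respect to in-process lock breakers
    boolean guardedWrite(long myToken, Runnable write) {
        synchronized (mutex) {
            if (generation.get() != myToken) return false;    // lock was broken
            write.run();
            return true;
        }
    }

    public static void main(String[] args) {
        LockLeaseSketch lock = new LockLeaseSketch();
        long token = lock.acquire();
        StringBuilder out = new StringBuilder();
        System.out.println(lock.guardedWrite(token, () -> out.append("w1")));  // true
        lock.breakLock();  // e.g. another node decided this writer was dead
        System.out.println(lock.guardedWrite(token, () -> out.append("w2")));  // false
    }
}
```

Even with this shape, a writer could still be suspended mid-write after a successful validation; only having the lock-breaker work on a copy of the data, as suggested above, protects the original file in that case.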
Re: Socket and file locks
Hello, Together with the Infinispan Directory we developed such a LockFactory; I'd be more than happy if you wanted to add some pointers to it in the Lucene documentation/readme. This depends on Infinispan for multiple-machine communication (JGroups, indirectly) but it's not required to use an Infinispan Directory: you could combine it with a Directory impl of your choice. This was tested with the LockVerifyServer mentioned by Michael McCandless and also with some other tests inspired by it (in-VM for lower-delay coordination and verify, while the LockFactory was forced to use real network communication). While this is a technology preview and performance of the Directory code is still unknown, I believe the LockFactory was the most tested component. Free to download and inspect (LGPL): http://anonsvn.jboss.org/repos/infinispan/trunk/lucene-directory/ Regards, Sanne

2009/11/27 Michael McCandless luc...@mikemccandless.com: I think a LockFactory for Lucene that implemented the ideas you and Marvin are discussing in LUCENE-1877, and/or the approach you implemented in the H2 DB, would be a useful addition to Lucene! For many apps, the simple LockFactory impls suffice, but for apps where multiple machines can become the writer, it gets hairy. Having an always-correct Lock impl for these apps would be great. Note that Lucene has some basic tools (in oal.store) for asserting that a LockFactory is correct (see LockVerifyServer), so it's a useful way to test that things are working from Lucene's standpoint. Mike

On Fri, Nov 27, 2009 at 9:23 AM, Thomas Mueller thomas.tom.muel...@gmail.com wrote: Hi, I'm wondering if you are interested in automatically releasing the write lock.
See also my comments on https://issues.apache.org/jira/browse/LUCENE-1877 - I thought it's a problem worth solving, because it's also in the Lucene FAQ list at http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_purpose_of_write.lock_file.2C_when_is_it_used.2C_and_by_which_classes.3F Unfortunately there seems to be no solution that 'always works', but delegating the task and responsibility to the application / to the user is problematic as well. For example, a user of the H2 database (which supports Lucene fulltext indexing) suggested automatically removing the write.lock file whenever the file is there: http://code.google.com/p/h2database/issues/detail?id=141 - that sounds a bit dangerous in my view. So, if you are interested in solving the problem, then maybe I can help. If not, then I will not bother you any longer :-) Regards, Thomas shouldn't active code like that live in the application layer? Why? You can all but guarantee that polling will work at the app layer The application layer may also run with low priority. In operating systems, it's usually the lower layers that have more 'rights' (priority), and not the higher levels (I'm not saying it should be like that in Java). I just think the application layer should not have to deal with write locks or removing write locks. by the time the original process realizes that it doesn't hold the lock anymore, the damage could already have been done. Yes, I'm not sure how to best avoid that (with any design). Asking the application layer or the user whether the lock file can be removed is probably more dangerous than doing the best we can in Lucene. Standby / hibernate: the question is, if the machine's process is currently not running, does the process still hold the lock? I think not, because the machine might as well be turned off. How do we detect whether the machine is turned off versus in hibernate mode? I guess that's a problem for all mechanisms (socket / file lock / background thread). 
When a hibernated process wakes up again, it thinks it still owns the lock. Even if the process checks before each write, it is unsafe: if (isStillLocked()) { write(); } The process could be suspended after isStillLocked() but before write(). One protection is: the second process (the one that breaks the lock) would need to work on a copy of the data instead of the original file (it could delete / truncate the original file after creating a copy). On Windows, renaming the file might work (not sure); on Linux you probably need to copy the content to a new file. That way, the awoken process can only destroy inactive data. The question is: do we need to solve this problem? How big is the risk? Instead of solving this problem completely, you could detect it after the fact without much overhead, and throw an exception saying: data may be corrupt now. PID: With the PID, you could check if the process still runs. Or it could be another process with the same PID (is that possible?), or the same PID but a different machine (when using a network share). It's probably safer if you can communicate with the lock owner (using TCP/IP, or over the file system by deleting/creating a file). Unique id: The easiest solution is to use a UUID
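The check-then-write race discussed above can be sketched in code. Below is a minimal, hypothetical Java illustration (not Lucene code; FencedLock, acquire and tryWrite are invented names) of the "fencing token" idea: every time the lock is broken and re-acquired, a generation counter advances, so a process that hibernated and woke up holds a stale token and its write is refused. Note this is only safe here because everything runs in one JVM; across machines the storage layer itself would have to validate the token atomically with the write.

```java
import java.util.concurrent.atomic.AtomicLong;

final class FencedLock {
    // Monotonically increasing "fencing token" generation.
    private final AtomicLong generation = new AtomicLong();

    /** Acquire (or break and re-acquire) the lock; returns a fencing token. */
    long acquire() {
        return generation.incrementAndGet();
    }

    /** Apply the write only if the caller still holds the newest token. */
    boolean tryWrite(long token, Runnable write) {
        if (token != generation.get()) {
            return false; // stale owner: the lock was broken while we slept
        }
        write.run();
        return true;
    }
}

public class FencedLockDemo {
    public static void main(String[] args) {
        FencedLock lock = new FencedLock();
        long first = lock.acquire();  // original process takes the lock
        long second = lock.acquire(); // second process breaks and re-takes it
        System.out.println(lock.tryWrite(first, () -> {}));  // false: stale token
        System.out.println(lock.tryWrite(second, () -> {})); // true: current owner
    }
}
```

The same race exists between the token check and `write.run()`, which is why a real deployment must push the token check into the store that performs the write.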
StressTest: Document not guaranteed to be found after write and commit
Hello, I developed a stress test to assert that a new document containing a specific term X is always found after a commit on the IndexWriter. This works most of the time, but it fails under load in rare occasions. I'm testing with 40 threads, both with a SerialMergeScheduler and a ConcurrentMergeScheduler, all sharing a common IndexWriter. The attached testcase uses a RAMDirectory only, but I verified that an FSDirectory behaves in the same way, so I don't believe it's the Directory implementation or the MergeScheduler. This test is slow, so I don't consider it a functional or unit test. It might give false positives: it doesn't always fail; sorry, I couldn't find out how to make it more likely to happen, besides scheduling it to run for a longer time. Could someone please try it, and suggest whether my test is wrong or whether I should open a new issue? The patch applies to 2.9.1; I've experienced the same behavior on 2.4.1. Best regards, Sanne Grinovero P.S. congratulations on the release of 3.0.0 :-) lucene-stresstest.patch Description: Binary data - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
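For readers who want the shape of such a stress test without the Lucene dependency, here is a stdlib-only analogue (all names and counts are illustrative; a concurrent set stands in for the shared IndexWriter, so it does not reproduce the Lucene bug): each writer thread indexes a unique term, "commits", and immediately asserts the term is visible to a search.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VisibilityStressDemo {
    /** Runs the write-then-search loop; returns the number of indexed terms. */
    static int run(int threads, int docsPerThread) throws InterruptedException {
        Set<String> index = ConcurrentHashMap.newKeySet(); // stand-in for the shared IndexWriter
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                try {
                    for (int i = 0; i < docsPerThread; i++) {
                        String term = "doc-" + id + "-" + i;
                        index.add(term);             // writer.addDocument(...); writer.commit();
                        if (!index.contains(term)) { // a searcher must find term X right away
                            throw new AssertionError("term not found after commit: " + term);
                        }
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return index.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(40, 1000)); // prints 40000
    }
}
```

In the real test the interesting part is precisely what this analogue hides: whether the IndexSearcher reopened after commit() actually sees the segment the writer just flushed.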
[jira] Created: (LUCENE-2095) Document not guaranteed to be found after write and commit
Document not guaranteed to be found after write and commit -- Key: LUCENE-2095 URL: https://issues.apache.org/jira/browse/LUCENE-2095 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.4.1 Environment: Linux 64bit Reporter: Sanne Grinovero after same email on developer list: I developed a stress test to assert that a new document containing a specific term X is always found after a commit on the IndexWriter. This works most of the time, but it fails under load in rare occasions. I'm testing with 40 Threads, both with a SerialMergeScheduler and a ConcurrentMergeScheduler, all sharing a common IndexWriter. Attached testcase is using a RAMDirectory only, but I verified a FSDirectory behaves in the same way so I don't believe it's the Directory implementation or the MergeScheduler. This test is slow, so I don't consider it a functional or unit test. It might give false positives: it doesn't always fail, sorry I couldn't find out how to make it more likely to happen, besides scheduling it to run for a longer time. I tested this to affect versions 2.4.1 and 2.9.1; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2095) Document not guaranteed to be found after write and commit
[ https://issues.apache.org/jira/browse/LUCENE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanne Grinovero updated LUCENE-2095: Attachment: lucene-stresstest.patch attaching the testcase, apply to version 2.9.1. It's slow, please be patient. Document not guaranteed to be found after write and commit -- Key: LUCENE-2095 URL: https://issues.apache.org/jira/browse/LUCENE-2095 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.4.1, 2.9.1 Environment: Linux 64bit Reporter: Sanne Grinovero Attachments: lucene-stresstest.patch after same email on developer list: I developed a stress test to assert that a new document containing a specific term X is always found after a commit on the IndexWriter. This works most of the time, but it fails under load in rare occasions. I'm testing with 40 Threads, both with a SerialMergeScheduler and a ConcurrentMergeScheduler, all sharing a common IndexWriter. Attached testcase is using a RAMDirectory only, but I verified a FSDirectory behaves in the same way so I don't believe it's the Directory implementation or the MergeScheduler. This test is slow, so I don't consider it a functional or unit test. It might give false positives: it doesn't always fail, sorry I couldn't find out how to make it more likely to happen, besides scheduling it to run for a longer time. I tested this to affect versions 2.4.1 and 2.9.1; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: StressTest: Document not guaranteed to be found after write and commit
thanks a lot for looking into it. It's opened: https://issues.apache.org/jira/browse/LUCENE-2095 Besides this being expected behavior after a commit(), I need this to be able to assert state consistency on the distributed Directory under load: any suggestions for a temporary workaround? I am thinking about a statistical assert, like considering it fine if the error ratio stays below some threshold, but that's my last resort. Regards, Sanne 2009/11/25 Michael McCandless luc...@mikemccandless.com: Indeed I see this test failing too! On first look the test seems correct. Can you open an issue and attach this as a patch? Thanks. Mike On Wed, Nov 25, 2009 at 12:30 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: Hello, I developed a stress test to assert that a new document containing a specific term X is always found after a commit on the IndexWriter. This works most of the time, but it fails under load in rare occasions. I'm testing with 40 Threads, both with a SerialMergeScheduler and a ConcurrentMergeScheduler, all sharing a common IndexWriter. Attached testcase is using a RAMDirectory only, but I verified a FSDirectory behaves in the same way so I don't believe it's the Directory implementation or the MergeScheduler. This test is slow, so I don't consider it a functional or unit test. It might give false positives: it doesn't always fail, sorry I couldn't find out how to make it more likely to happen, besides scheduling it to run for a longer time. Could someone please try it, and suggest if my test is wrong or if I should open a new issue? The patch applies to 2.9.1, I've experienced same behavior on 2.4.1. Best regards, Sanne Grinovero P.S. 
congratulations on the release of 3.0.0 :-) - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: A new Lucene Directory available
Hi Lukas, Our reference during early design was Lucene 2.4.1, but we look forward to compatibility and new tricks. Current trunk is compatible with Lucene's trunk, but I won't close ISPN-275 until it's confirmed against a released Lucene 3.0.0: hopefully this will come before the Infinispan 4 release. Regards, Sanne On Sun, Nov 15, 2009 at 8:50 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, this sounds very interesting. Do you know which versions of Lucene are supported? Do you know if it would work with upcoming Lucene 3.0.x? https://jira.jboss.org/jira/browse/ISPN-275 Regards, Lukas http://blog.lukas-vlcek.com/ On Sun, Nov 15, 2009 at 5:33 AM, Sanne Grinovero s.grinov...@sourcesense.com wrote: Hi John, I didn't run a long-running reliable benchmark, so at the moment I can't really speak of numbers. Suggestions and help on performance testing are welcome: I guess it will shine in some situations, not necessarily all, so really choosing a correct ratio of concurrent writers/searchers, number of nodes in the cluster and resources per node will never be fair enough to compare this Directory with others. On paper the premises are good: it's all in-memory, as long as it fits: it will distribute data across nodes, and overflow to disk is supported (called passivation). A permanent store can be configured, so you could set it to periodically flush incrementally to slower storage like a database, a filesystem, or a cloud storage service. This makes it possible to avoid losing state even when all nodes are shut down. A RAMDirectory is AFAIK not recommended as you could hit memory limits and because it's basically a synchronized HashMap; Infinispan implements ConcurrentHashMap and doesn't need synchronization. Even if the data is replicated across nodes, each node has its own local cache, so when caches are warm and all segments fit in memory it should be, theoretically, the fastest Directory ever. 
The more it reads from disk, the more it will behave similarly to an FSDirectory with some buffers. As per Lucene's design, writes can happen only at one node at a time: one IndexWriter can own the lock, but IndexReaders and Searchers are not blocked, so when using this Directory it should behave exactly as if you had multiple processes sharing a local NIOFSDirectory: basically, you can't scale on writers, but you can scale near-linearly with readers, adding in more power from more machines. Besides performance, the reasons to implement this were to be able to easily add or remove processing power to a service (clouds), make it easier to share indexes across nodes, and last but not least to remove single points of failure: all data is distributed and there is no notion of a Master: services will continue running fine when any node is killed. I hope this piques your interest; sorry I couldn't provide numbers. Regards, Sanne On Sat, Nov 14, 2009 at 11:15 PM, John Wang john.w...@gmail.com wrote: HI Sanne: Very interesting! What kind of performance should we expect with this, compared to a regular FSDirectory on a local HD? Thanks -John On Sat, Nov 14, 2009 at 11:44 AM, Sanne Grinovero s.grinov...@sourcesense.com wrote: Hello all, I'm a Lucene user and fan, and I wanted to tell you that we just released a first technology preview of a distributed in-memory Directory for Lucene. The release announcement: http://infinispan.blogspot.com/2009/11/second-release-candidate-for-400.html From there you'll find links to the Wiki, to the sources, and to the issue tracker. A minimal demo is included with the sources. This was developed together with Google Summer of Code student Lukasz Moren and much support from the Infinispan and Hibernate Search teams, as we are storing the index segments on Infinispan and using its atomic distributed locks to implement a Lucene LockFactory. 
The initial idea was to contribute it directly to Lucene, but as Infinispan is an LGPL dependency we had to distribute it with Infinispan (the other way around would have introduced some legal issues); still, we hope you appreciate the effort and are interested in giving it a try. All kinds of feedback are welcome, especially on benchmarking methodologies, as I have yet to do some serious performance tests. Main code, built with Maven2: svn co http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/lucene-directory/ infinispan-directory Demo, see the Readme: svn co http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/demos/lucene-directory/ lucene-demo Best Regards, Sanne -- Sanne Grinovero Sourcesense - making sense of Open Source: http://www.sourcesense.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: A new Lucene Directory available
Hi Earwin, thanks for the insight; as I mentioned I have no proper benchmarks to back my statements, but I can see how it behaves, so absolutely I could be too optimistic. They are currently profiling Infinispan and speeding up some internals, so I'll wait for these tasks to finish before beginning testing on our part; in the meantime I'm collecting suggestions on how you think I should test it properly. Which kind of comparisons would you like to see? I'm currently working on JIRA clustering (called Scarlet), so the typical index usage pattern of that application is going to be my favorite scenario. I know about the Terracotta efforts; I agree with you and have collected much feedback about which problems were arising by talking directly with the people maintaining such systems. I even got to hear some success cases, but yes, they are scarce and there are some problems; be assured that we analyzed them carefully before deciding on this design. I'm not a Terracotta expert myself, but was helped on this by specialists. My personal opinion resulting from these talks is that Terracotta works, but is too tricky to set up and not viable when the indexes change frequently. About the RAMDirectory comparison: as you said yourself, the bytes aren't read constantly but just at index reopen, so I wouldn't be too worried about the bunch of methods, as they're executed once per segment loading; I'll improve that if possible, thanks for looking! I'm sure many parts can be improved; patches are welcome. Instances of ChunkCacheKey are not created for each single byte read but for each byte[] buffer, the size of these buffers being configurable. This was decided after observing that chunking segments into smaller pieces improved performance compared to having huge arrays of bytes, but if you like you can configure it to approach a one-key-per-segment ratio. 
Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can scale :-) Still, I take the point; I'll run some tests in single-node mode too, to compare them for fun, as the use cases are a bit different, but I'm confident I could surprise you when I have the choice of scenario. About JGroups I'm not technically prepared for a match, but I've heard different stories of business-critical clusters much bigger than 20 nodes working very well. Sure, it won't scale without a proper configuration at all levels: OS, JGroups and infrastructure. Thank you very much for your considerations, it's very appreciated. Regards, Sanne On Sun, Nov 15, 2009 at 12:39 PM, Earwin Burrfoot ear...@gmail.com wrote: Terracotta guys easy-clustered Lucene a few years ago. I'm yet to see at least one person saying it worked for him all right. This new directory ain't gonna be faster than RAMDirectory, as syncs on a map don't matter; they are taken once per opened file - once per reopen, which is not happening thousands of times a sec. Taking a glance at the code (svn trunk), it actually is much slower. I mean, compare IndexInput.readByte()s. A whole slew of code and method calls plus a ChunkCacheKey created per each byte read (violent GC rape, ring the police!) VS if, incr, array access for RAMDir. I wouldn't be too optimistic in the doesn't-fit-in-memory case VS FSDirectory either. OS paging/file-caching skills are hard to match, plus the OS file cache resides outside of the Java heap, which (as real-life experience dictates) is immensely good for your GC pauses. Now to the networking part. Infinispan is based on JGroups. Last time I saw it, it exploded under a moderate load on 20 nodes. I believe the library is still good, properly configured and for lesser loads, but not for distributing a Lucene index that is frequently updated and merged on each node of the cluster. Please excuse me if I'm overboard in places, and correct me if I am wrong. 
On Sun, Nov 15, 2009 at 07:33, Sanne Grinovero s.grinov...@sourcesense.com wrote: Hi John, I didn't run a long running reliable benchmark, so at the moment I can't really speak of numbers. Suggestions and help on performance testing are welcome: I guess it will shine in some situations, not necessarily all, so really choosing a correct ratio of concurrent writers/searches, number of nodes in the cluster and resources per node will never be fair enough to compare this Directory with others. On paper the premises are good: it's all in-memory, until it fits: it will distribute data across nodes and overflow to disk is supported (called passivation). A permanent store can be configured, so you could set it to periodically flush incrementally to slower storages like a database, a filesystem, a cloud storage service. This makes it possible to avoid losing state even when all nodes are shut down. A RAMDirectory is AFAIK not recommended as you could hit memory limits and because it's basically a synchronized HashMap; Infinispan implements ConcurrentHashMap and doesn't need synchronization. Even
Re: A new Lucene Directory available
Hi again Earwin, thank you very much for spotting the byte-reading issue; it's definitely not as I wanted it. https://jira.jboss.org/jira/browse/ISPN-276 I never tried to defend an improved updates/s ratio, just maybe compared to scheduled rsyncs :-) Our goal is to scale on queries/sec while usage semantics stay unchanged, so you can open an IndexWriter as if it were local to make updates clusterwide. Very useful to cluster the many products already using Lucene which are currently implementing exotic index-management workarounds or shared filesystems, as they weren't designed for it from the beginning as Solr was. I mentioned JIRA; have you noticed how slow it can get on larger deployments? That's because there's currently no way to deploy it clustered (besides using Terracotta), as it relies heavily on Lucene and index changes need to be applied in real time. About locking and JGroups: please switch over to infinispan-...@lists.jboss.org so you can get better answers and I don't have to spam the Lucene developers. Regards, Sanne On Sun, Nov 15, 2009 at 3:43 PM, Earwin Burrfoot ear...@gmail.com wrote: About the RAMDirectory comparison, as you said yourself the bytes aren't read constantly but just at index reopen so I wouldn't be too worried about the bunch of methods as they're executed once per segment loading; The bytes /are/ read constantly (readByte() method). I believe that is the innermost loop you can hope to find in Lucene. A RAMDirectory is AFAIK not recommended as you could hit memory limits and because it's basically a synchronized HashMap; On the other hand, just as I mentioned - the only access to said synchronized HashMap is done when you open an InputStream on a file. That, unlike readByte(), happens rarely, as InputStreams are cloned after creation as needed. As for memory limits, your unbounded local cache hits them with the same ease. 
Instances of ChunkCacheKey are not created for each single byte read but for each byte[] buffer, the size of these buffers being configurable. No, they are! :-) InfinispanIndexIO.java, rev. 1103: 120 public byte readByte() throws IOException { . 132 buffer = getChunkFromPosition(cache, fileKey, filePosition, bufferSize); . 141 } getChunkFromPosition() is called each time readByte() is invoked. It creates 1-2 instances of ChunkCacheKey. This was decided after observations that it was improving performance to chunk segments in smaller pieces rather than have huge arrays of bytes, but if you like you can configure it to approach a one-key-per-segment ratio. Locally, it's better not to chunk segments (unless you hit the 2Gb barrier). When shuffling them over the network - I can't say. Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can scale :-) I'm just following two of your initial comparisons. And the only characteristic that can be scaled with such an approach is queries/s. Index size - definitely not; updates/s - questionable. About JGroups I'm not technically prepared for a match, but I've heard of different stories of business-critical clusters much bigger than 20 nodes working very well. Sure, it won't scale without a proper configuration at all levels: OS, JGroups and infrastructure. The volume of messages travelling around, the length of GC delays vs cluster size, and the messaging mode matter. They used reliable synchronous multicasts, so - once one node starts collecting, all others wait (or worse - send retries). Another one starts collecting, then another, partially delivered messages hold threads - kaboom! How is locking handled here? With a central broker it probably can work. 
-- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 -- Sanne Grinovero Sourcesense - making sense of Open Source: http://www.sourcesense.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
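The per-byte chunk lookup criticized above suggests an obvious mitigation. Here is a hypothetical stdlib-only sketch (ChunkedReader and the plain Map stand-in are invented; this is not the Infinispan code or the actual ISPN-276 patch): cache the current chunk in the reader and hit the (possibly remote) store only when the read position crosses a chunk boundary, so no key object is allocated per byte read.

```java
import java.util.HashMap;
import java.util.Map;

final class ChunkedReader {
    private final Map<Integer, byte[]> store; // stand-in for the distributed cache
    private final int chunkSize;
    private byte[] currentChunk;              // cached chunk: no lookup per byte
    private int currentChunkIndex = -1;
    private long position;
    int fetches;                              // instrumentation for the demo

    ChunkedReader(Map<Integer, byte[]> store, int chunkSize) {
        this.store = store;
        this.chunkSize = chunkSize;
    }

    byte readByte() {
        int chunkIndex = (int) (position / chunkSize);
        if (chunkIndex != currentChunkIndex) { // fetch only on boundary crossing
            currentChunk = store.get(chunkIndex);
            currentChunkIndex = chunkIndex;
            fetches++;
        }
        return currentChunk[(int) (position++ % chunkSize)];
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> store = new HashMap<>();
        store.put(0, new byte[] {1, 2, 3, 4});
        store.put(1, new byte[] {5, 6, 7, 8});
        ChunkedReader reader = new ChunkedReader(store, 4);
        for (int i = 0; i < 8; i++) reader.readByte();
        System.out.println(reader.fetches); // prints 2: one fetch per chunk, not per byte
    }
}
```

Sequential reads of a whole file then cost one store lookup per chunk instead of one (plus key allocation) per byte, which is the hot path readByte() sits on.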
Re: Lock API, throwing IOException
Thanks a lot! This makes error management much simpler. Sanne 2009/11/1 Michael McCandless luc...@mikemccandless.com: OK, this makes sense. I'll add it. Mike On Sat, Oct 31, 2009 at 9:43 AM, Sanne Grinovero sanne.grinov...@gmail.com wrote: Hello, I'm implementing a distributed directory based on Infinispan (www.jboss.org/infinispan); while implementing org.apache.lucene.store.Lock, I was wondering why /** Returns true if the resource is currently locked. Note that one must * still call {...@link #obtain()} before using the resource. */ public abstract boolean isLocked(); does not throw an IOException as the other methods do. Could you please add it? It looks like it should be trivial, as all clients of this API already declare that they throw the same Exception. Regards, Sanne Grinovero - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Lock API, throwing IOException
Hello, I'm implementing a distributed directory based on Infinispan (www.jboss.org/infinispan) currently implementing the org.apache.lucene.store.Lock, I was wondering why is /** Returns true if the resource is currently locked. Note that one must * still call {...@link #obtain()} before using the resource. */ public abstract boolean isLocked(); not throwing an IOException as other methods do? Could you please add it? It looks like it should be trivial, as all clients of this API are already declaring to throw the same Exception. Regards, Sanne Grinovero - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
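To illustrate why the signature change matters: for any lock backed by a filesystem or a remote store, even *checking* the lock state is an I/O operation that can fail. Below is a minimal stdlib-only sketch (FileBackedLock is an invented name that mirrors the shape of org.apache.lucene.store.Lock, but is not that class and does not extend it), where isLocked() naturally needs to declare IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileBackedLock {
    private final Path lockFile;

    public FileBackedLock(Path lockFile) { this.lockFile = lockFile; }

    /** Tries to take the lock by atomically creating the lock file. */
    public boolean obtain() throws IOException {
        try {
            Files.createFile(lockFile); // create-if-absent is atomic
            return true;
        } catch (java.nio.file.FileAlreadyExistsException e) {
            return false; // somebody else holds the lock
        }
    }

    /** Checking state is itself I/O; a remote store would do a network call here. */
    public boolean isLocked() throws IOException {
        return Files.exists(lockFile);
    }

    public void release() throws IOException {
        Files.deleteIfExists(lockFile);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("lockdemo").resolve("write.lock");
        FileBackedLock lock = new FileBackedLock(file);
        System.out.println(lock.obtain()); // prints true: lock acquired
        System.out.println(lock.obtain()); // prints false: already held
        lock.release();
    }
}
```

For a distributed LockFactory the same check would go over the network, so forcing isLocked() callers to handle IOException (as obtain() and release() callers already do) is the consistent choice.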
[jira] Commented: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted
[ https://issues.apache.org/jira/browse/LUCENE-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12611572#action_12611572 ] Sanne Grinovero commented on LUCENE-1329: - Adding a readonly IndexReader would be really great, I'm contributing some code to Hibernate Search (integration of Lucene and Hibernate) and that project could really benefit from that. Remove synchronization in SegmentReader.isDeleted - Key: LUCENE-1329 URL: https://issues.apache.org/jira/browse/LUCENE-1329 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.3.1 Reporter: Jason Rutherglen Priority: Trivial Attachments: lucene-1329.patch Removes SegmentReader.isDeleted synchronization by using a volatile deletedDocs variable on Java 1.5 platforms. On Java 1.4 platforms synchronization is limited to obtaining the deletedDocs reference. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]