Re: [VOTE] Release Lucene/Solr 5.5.5 RC2

2017-10-23 Thread Sanne Grinovero
SUCCESS! [1:15:56.228143]

+1

Thanks!
Sanne


On 20 October 2017 at 16:28, Steve Rowe <sar...@gmail.com> wrote:
> Please vote for release candidate 2 for Lucene/Solr 5.5.5
>
> The artifacts can be downloaded from:
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.5-RC2-revb3441673c21c83762035dc21d3827ad16aa17b68
>
> You can run the smoke tester directly with this command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.5-RC2-revb3441673c21c83762035dc21d3827ad16aa17b68
>
> Here's my +1
> SUCCESS! [0:53:51.570213]
>
> --
> Steve
> www.lucidworks.com
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>




[GitHub] lucene-solr pull request #263: Backporting of SOLR-11477 on branch_5_5

2017-10-17 Thread Sanne
Github user Sanne closed the pull request at:

https://github.com/apache/lucene-solr/pull/263


---




[GitHub] lucene-solr issue #263: Backporting of SOLR-11477 on branch_5_5

2017-10-17 Thread Sanne
Github user Sanne commented on the issue:

https://github.com/apache/lucene-solr/pull/263
  
Thanks @sarowe ! closing


---




[GitHub] lucene-solr pull request #263: Backporting of SOLR-11477 on branch_5_5

2017-10-17 Thread Sanne
GitHub user Sanne opened a pull request:

https://github.com/apache/lucene-solr/pull/263

Backporting of SOLR-11477 on branch_5_5

This is an adaptation of last week's security fix SOLR-11477 by Michael 
Stepankin, Olga Barinova, Uwe Schindler and Christine Poerschke (aka 
@cpoerschke, @uschindler) to the 5_5 branch.

The main differences from the original patch are the inability to use lambdas 
and the absence of some of the newer-generation testing helpers.

In the CHANGES file I wasn't sure how to name this; I've opted to call it 
"version 5.5.6". Maybe I should simply omit the version?

HTH


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sanne/lucene-solr SOLR-11477-on-5_5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #263


commit 590dca88dedc44242d155d476b1e4dca99a25f12
Author: Christine Poerschke <cpoersc...@apache.org>
Date:   2017-10-13T11:46:58Z

SOLR-11477: Disallow resolving of external entities in Lucene 
queryparser/xml/CoreParser and SolrCoreParser (defType=xmlparser or {!xmlparser 
...}) by default.

(Michael Stepankin, Olga Barinova, Uwe Schindler, Christine Poerschke)
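The commit message above describes disabling resolution of external entities in the XML query parser. As a hedged illustration of the general XXE-hardening technique (this is not the actual SOLR-11477 patch, and the class name is made up for the example), a JAXP factory can be locked down like this:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class SafeXmlFactory {
    // Generic XXE hardening for a JAXP factory; illustrative only.
    public static DocumentBuilderFactory newHardenedFactory()
            throws ParserConfigurationException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Strongest protection: refuse DOCTYPE declarations entirely.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces: also disable external general/parameter entities.
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        f.setXIncludeAware(false);
        f.setExpandEntityReferences(false);
        return f;
    }
}
```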




---




[jira] [Commented] (LUCENE-6989) Implement MMapDirectory unmapping for coming Java 9 changes

2016-12-15 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752649#comment-15752649
 ] 

Sanne Grinovero commented on LUCENE-6989:
-

We have more recent releases of Hibernate Search using Lucene 5.5.x, but we 
typically aim to support older releases as well for some reasonable time. It 
just so happens that Lucene 5.3 isn't that old yet from our perspective. While I 
constantly work to motivate people to move to the latest, for many Lucene 5.3 
is working just great.

The OSS communities we target typically will not expect API changes in a 
maintenance release, and we happen to (proudly) expose Lucene as public API, as 
I believe that hiding it all under some wrapping layer would not be as 
powerful. Since we expose Lucene as public API, I can't really update my Lucene 
dependency by anything other than a micro (bugfix) release when doing a 
micro/bugfix release myself: people have come to expect that a Lucene 
major/minor update will only happen in a Hibernate Search major/minor update.

Of course, if that's not feasible, we might have to advise that those older 
releases won't be compatible with Java 9; that's a possible outcome. I guess 
we'll see whether the final Java 9 release makes this doable. See you at 
FOSDEM, hopefully with my colleague Andrew Haley as well ;-)

> Implement MMapDirectory unmapping for coming Java 9 changes
> ---
>
> Key: LUCENE-6989
> URL: https://issues.apache.org/jira/browse/LUCENE-6989
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/store
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>  Labels: Java9
> Fix For: 6.0, 6.4
>
> Attachments: LUCENE-6989-disable5x.patch, 
> LUCENE-6989-disable5x.patch, LUCENE-6989-fixbuild148.patch, 
> LUCENE-6989-v2.patch, LUCENE-6989-v3-post-b148.patch, LUCENE-6989.patch, 
> LUCENE-6989.patch, LUCENE-6989.patch, LUCENE-6989.patch
>
>
> Originally, the sun.misc.Cleaner interface was declared as "critical API" in 
> [JEP 260|http://openjdk.java.net/jeps/260].
> Unfortunately the decision was changed in favor of an officially supported 
> {{java.lang.ref.Cleaner}} API. A side effect of this change is to move all 
> existing {{sun.misc.Cleaner}} APIs into a non-exported package. This causes 
> our forceful unmapping to no longer work: we can still get the cleaner 
> instance via reflection, but trying to invoke it will throw one of the new 
> Jigsaw RuntimeExceptions because it is completely inaccessible. This will make 
> our forceful unmapping fail. There are also no changes in the garbage 
> collector, so the problem still exists.
> For more information see this [mailing list 
> thread|http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-January/thread.html#38243].
> This commit will likely be done, making our unmapping efforts no longer 
> work. Alan Bateman is aware of this issue and will open a new issue at 
> OpenJDK to allow forceful unmapping without using the now-private 
> sun.misc.Cleaner. The idea is to let the internal class sun.misc.Cleaner 
> implement the Runnable interface, so we can simply cast to Runnable and call 
> the run() method to unmap. The code would then work. This will lead to minor 
> changes in our unmapper in MMapDirectory: an instanceof check and casting if 
> possible.
> I opened this issue to keep track of and implement the changes as soon as 
> possible, so people will have working unmapping when Java 9 comes out. 
> Current Lucene versions will no longer work with Java 9.
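The quoted proposal boils down to an instanceof check and a cast. A minimal sketch of that dispatch logic (the `cleaner` argument stands in for the object obtained reflectively from a mapped buffer; this is not Lucene's actual MMapDirectory code, and the class name is made up):

```java
public class CleanerDispatch {
    // Sketch of the proposed Java 9 path: if the cleaner object
    // implements Runnable, cast and run() it to unmap; otherwise the
    // caller must fall back to the old reflective clean() invocation.
    static boolean tryUnmap(Object cleaner) {
        if (cleaner instanceof Runnable) {
            ((Runnable) cleaner).run();
            return true;
        }
        return false;
    }
}
```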



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (LUCENE-6989) Implement MMapDirectory unmapping for coming Java 9 changes

2016-11-12 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659554#comment-15659554
 ] 

Sanne Grinovero commented on LUCENE-6989:
-

Hi all, is there an update on this? I see several patches were committed, and 
the Hotspot issue JDK-8150436 is marked resolved, yet this issue is not.

I'm particularly interested in the backport to 5.5 (actually ideally to 5.3); 
if someone could guide me I'll try to help with it.

Thanks!

> Implement MMapDirectory unmapping for coming Java 9 changes
> ---
>
> Key: LUCENE-6989
> URL: https://issues.apache.org/jira/browse/LUCENE-6989
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/store
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>  Labels: Java9
> Fix For: 6.0
>
> Attachments: LUCENE-6989-disable5x.patch, 
> LUCENE-6989-disable5x.patch, LUCENE-6989-v2.patch, LUCENE-6989.patch, 
> LUCENE-6989.patch, LUCENE-6989.patch, LUCENE-6989.patch
>
>
> Originally, the sun.misc.Cleaner interface was declared as "critical API" in 
> [JEP 260|http://openjdk.java.net/jeps/260].
> Unfortunately the decision was changed in favor of an officially supported 
> {{java.lang.ref.Cleaner}} API. A side effect of this change is to move all 
> existing {{sun.misc.Cleaner}} APIs into a non-exported package. This causes 
> our forceful unmapping to no longer work: we can still get the cleaner 
> instance via reflection, but trying to invoke it will throw one of the new 
> Jigsaw RuntimeExceptions because it is completely inaccessible. This will make 
> our forceful unmapping fail. There are also no changes in the garbage 
> collector, so the problem still exists.
> For more information see this [mailing list 
> thread|http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-January/thread.html#38243].
> This commit will likely be done, making our unmapping efforts no longer 
> work. Alan Bateman is aware of this issue and will open a new issue at 
> OpenJDK to allow forceful unmapping without using the now-private 
> sun.misc.Cleaner. The idea is to let the internal class sun.misc.Cleaner 
> implement the Runnable interface, so we can simply cast to Runnable and call 
> the run() method to unmap. The code would then work. This will lead to minor 
> changes in our unmapper in MMapDirectory: an instanceof check and casting if 
> possible.
> I opened this issue to keep track of and implement the changes as soon as 
> possible, so people will have working unmapping when Java 9 comes out. 
> Current Lucene versions will no longer work with Java 9.






Re: [VOTE] Release Lucene/Solr 5.5.2 RC2

2016-06-22 Thread Sanne Grinovero
+1

[from the Hibernate Search integration testsuite]

On 22 June 2016 at 06:35, Shalin Shekhar Mangar  wrote:
> +1
>
> SUCCESS! [2:19:37.075305]
>
> On Tue, Jun 21, 2016 at 10:18 PM, Steve Rowe  wrote:
>>
>> Please vote for release candidate 2 for Lucene/Solr 5.5.2
>>
>> The artifacts can be downloaded from:
>>
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.2-RC2-rev8e5d40b22a3968df065dfc078ef81cbb031f0e4a/
>>
>> You can run the smoke tester directly with this command:
>>
>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.5.2-RC2-rev8e5d40b22a3968df065dfc078ef81cbb031f0e4a/
>>
>> +1 from me - Docs, changes and javadocs look good, and smoke tester says:
>> SUCCESS! [0:32:02.113685]
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.




Re: Congratulations to the new Lucene/Solr PMC Chair, Tommaso Teofili

2016-06-16 Thread Sanne Grinovero
That's great, congratulations!


[jira] [Commented] (LUCENE-7058) Add getters for the properties of several Query implementations

2016-03-03 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177928#comment-15177928
 ] 

Sanne Grinovero commented on LUCENE-7058:
-

Great, that's very handy! Thanks all for the speedy reviews and merge.

> Add getters for the properties of several Query implementations
> ---
>
> Key: LUCENE-7058
> URL: https://issues.apache.org/jira/browse/LUCENE-7058
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/query/scoring
>Reporter: Guillaume Smet
>Assignee: Alan Woodward
>  Labels: patch
> Fix For: 6.0
>
> Attachments: query-getters-v01.00.diff
>
>
> Hi!
> At Hibernate Search, we are currently working on an Elasticsearch backend 
> (aside from the existing Lucene backend).
> As part of this effort, to provide a smooth migration path, we need to be 
> able to rewrite the Lucene queries as Elasticsearch queries. We know it will 
> be neither perfect nor comprehensive, but we want it to be the best possible 
> experience.
> It works well in many cases but several implementations of Query don't have 
> the necessary getters to be able to extract the information from the Query.
> The attached patch adds getters to several implementations of Query. It would 
> be nice if it could be applied.
> Any chance it could be applied to the next point release too? (probably not 
> but I'd better ask).






Re: [VOTE] Release Lucene/Solr 5.3.2-RC1

2016-01-16 Thread Sanne Grinovero
Since the release vote has passed, could we please get further fixes into 
subsequent micro versions?

I'm looking forward to Lucene 5.3.2 for the other fixes it already brings.

On 15 January 2016 at 17:03, Yonik Seeley  wrote:
> On Fri, Jan 15, 2016 at 11:34 AM, Erick Erickson
>  wrote:
>> Anshum:
>>
>> I really hate to ask, but do we know whether
>>
>> https://issues.apache.org/jira/browse/SOLR-8496
>> (Facet search count numbers are falsified by older document versions)
>>
>> is a problem in 5.3.2? It's in 5.4.1 and we don't yet know when
>> it was introduced.
>
> At this point, my guess is that was caused by LUCENE-6553
> which was committed in 5.3
>
> -Yonik
>
>




Re: Moving Lucene / Solr from SVN to Git

2016-01-12 Thread Sanne Grinovero
Thanks for finally switching, I have been looking forward to this.

I've been doing release management and generally helping with the
switch from SVN to Git for the Hibernate project in the past 5 years,
so I'm happy to share hints and tips from our experience there.

Feel free to ask me for help on IRC or emails if you get stuck: we
love Lucene, wouldn't want you to slow down ;)

One crucial concept (it might be obvious, although sometimes it's not when
people have been using SVN for a long time): when you have a local Git
clone of a project, you can experiment a lot and play with Git to see what
would happen.
As long as you don't push changes, you can experiment with branching,
merging and rebasing without your experiments affecting anyone else.

Always create a new branch first, so you can play on the experimental
branch and simply nuke it if you get lost, then start over.
So when reading the tutorials and references, don't be afraid to type
commands and check the results.
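The advice above can be tried end-to-end in a scratch repository; every name here (directory, branch, commit messages) is made up for the example, and nothing is ever pushed:

```shell
set -e
scratch=$(mktemp -d)                 # throwaway repository, purely local
cd "$scratch"
git init -q
git -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "base commit"
git checkout -q -b experiment        # safe playground branch
git -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "try something scary"
git checkout -q -                    # jump back to the original branch
git branch -D experiment             # nuke the experiment, start over
```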

Thanks,
Sanne


On 11 January 2016 at 09:33, Jan Høydahl <jan@cominvent.com> wrote:
> All discussion in the Github PR is captured in JIRA if the two are linked,
> see https://issues.apache.org/jira/browse/SOLR-8166 as an example
> If they are not linked, comments go to the dev list.
> So we can keep it as today - allow people to choose freely to use patches 
> and/or PRs.
> NOTE: We should always create JIRA for PR’s that we want to fix.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>> 11. jan. 2016 kl. 09.13 skrev Dawid Weiss <dawid.we...@gmail.com>:
>>
>> Remember github is an external service, if it vanishes so would all
>> comments and discussion. I'd stick with Jira, at least for the time
>> being (until people get more familiar with git). Not everything at
>> once.
>>
>> Dawid
>>
>> On Mon, Jan 11, 2016 at 9:11 AM, Shai Erera <ser...@gmail.com> wrote:
>>> I think it will be nice if we integrate a code review tool into our
>>> workflow, such as Gerrit maybe (even Github pull requests are good), instead
>>> of the patch workflow with JIRA.
>>>
>>> But I agree we don't have to change that, not at start at least. The move to
>>> git will allow those who want it, to use the code review tool on Github (via
>>> pull requests).
>>>
>>> Shai
>>>
>>> On Mon, Jan 11, 2016 at 5:27 AM Mark Miller <markrmil...@gmail.com> wrote:
>>>>
>>>> I don't think there is a current plan to change how we do business. Just a
>>>> change in where the master copy is hosted.
>>>>
>>>> We already have JIRA, dev, commit procedures, and integration with GitHub
>>>> pull requests. All that will stay the same. No need to overthink it.
>>>>
>>>> - Mark
>>>>
>>>> On Sun, Jan 10, 2016 at 4:18 PM Jack Krupansky <jack.krupan...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Will anybody be able to create a pull request and then only committers
>>>>> perform the merge operation? (I presume so, but... just for clarity,
>>>>> especially for those not git-savvy yet.)
>>>>>
>>>>> Would patches still be added to Jira requests, or simply a link to a pull
>>>>> request? (Again, I presume the latter, but the details of "submitting a
>>>>> patch" should be clearly documented.)
>>>>>
>>>>> Then there is the matter of code review and whether to encourage comments
>>>>> in Jira. Comments can be made on pull requests, but should some external
>>>>> tool like reviewable.io be encouraged?
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Sat, Jan 9, 2016 at 4:54 PM, Mark Miller <markrmil...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> We have done almost all of the work necessary for a move and I have
>>>>>> filed an issue with INFRA.
>>>>>>
>>>>>> LUCENE-6937: Migrate Lucene project from SVN to Git.
>>>>>> https://issues.apache.org/jira/browse/LUCENE-6937
>>>>>>
>>>>>> INFRA-11056: Migrate Lucene project from SVN to Git.
>>>>>> https://issues.apache.org/jira/browse/INFRA-11056
>>>>>>
>>>>>> Everyone knows about rebase and linear history right ;)
>>>>>>
>>>>>> - Mark
>>>>>> --
>>>>>> - Mark
>>>>>> about.me/markrmiller
>>>>>
>>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
>>
>>
>
>
>




[jira] [Commented] (LUCENE-6909) Improve concurrency for FacetsConfig

2015-11-25 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026929#comment-15026929
 ] 

Sanne Grinovero commented on LUCENE-6909:
-

Hi [~mikemccand]! Thanks for checking.
Yes, of course that first changed line is not required. I just felt it was 
useful to make it explicit to the reader that these are concurrent maps. It's 
just a matter of style; feel free to revert it if it doesn't fit the Lucene 
style, or should I provide an alternative patch?

> Improve concurrency for FacetsConfig
> 
>
> Key: LUCENE-6909
> URL: https://issues.apache.org/jira/browse/LUCENE-6909
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 5.3
>Reporter: Sanne Grinovero
>Priority: Trivial
> Attachments: 
> 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch
>
>
> The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a 
> single instance across multiple threads, yet the current synchronization 
> model is too strict as it doesn't allow for concurrent read operations.
> I'll attach a trivial patch which removes the contention point.






[jira] [Commented] (LUCENE-6909) Improve concurrency for FacetsConfig

2015-11-25 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027261#comment-15027261
 ] 

Sanne Grinovero commented on LUCENE-6909:
-

Thanks!

> Improve concurrency for FacetsConfig
> 
>
> Key: LUCENE-6909
> URL: https://issues.apache.org/jira/browse/LUCENE-6909
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 5.3
>Reporter: Sanne Grinovero
>Priority: Trivial
> Fix For: Trunk, 5.5
>
> Attachments: 
> 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch
>
>
> The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a 
> single instance across multiple threads, yet the current synchronization 
> model is too strict as it doesn't allow for concurrent read operations.
> I'll attach a trivial patch which removes the contention point.






[jira] [Created] (LUCENE-6909) Improve concurrency for FacetsConfig

2015-11-24 Thread Sanne Grinovero (JIRA)
Sanne Grinovero created LUCENE-6909:
---

 Summary: Improve concurrency for FacetsConfig
 Key: LUCENE-6909
 URL: https://issues.apache.org/jira/browse/LUCENE-6909
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Affects Versions: 5.3
Reporter: Sanne Grinovero
Priority: Trivial


The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a 
single instance across multiple threads, yet the current synchronization model 
is too strict as it doesn't allow for concurrent read operations.

I'll attach a trivial patch which removes the contention point.







Re: Optimisations proposal for FacetsConfig

2015-11-24 Thread Sanne Grinovero
Thanks Erick!

It's done, it was as trivial as deleting a single word:
 https://issues.apache.org/jira/browse/LUCENE-6909

Sanne

On 24 November 2015 at 18:10, Erick Erickson <erickerick...@gmail.com> wrote:
> Sanne:
>
> Sure, please open a JIRA and add a patch. You'll need to create a user
> ID on the JIRA system, but that's a "self-serve" option.
>
> Best,
> Erick
>
> On Mon, Nov 23, 2015 at 8:21 AM, Sanne Grinovero
> <sanne.grinov...@gmail.com> wrote:
>> Hello all,
>> I was looking into the source code for
>> org.apache.lucene.facet.FacetsConfig as it's being highlighted as a
>> hotspot of allocations during a performance analysis session.
>>
>> Our code was allocating a new instance of FacetsConfig for each
>> Document being built; there are several maps being allocated by such
>> an instance, both as instance fields and on the hot path of method
>> "#build(Document doc)".
>>
>> My understanding from reading the code is that it's designed to be
>> multi-threaded, probably to reuse one instance for a single index?
>>
>> That would resolve my issue with allocations at instance level, and
>> probably also the maps being allocated within the build method as the
>> JVM seems to be smart enough to skip those; at least that's my
>> impression with a quick experiment.
>>
>> However reusing this single instance across all threads would become a
>> contention point as all getters to read the field configurations are
>> synchronized.
>> Since the maps being read are actually safe ConcurrentMap instances, I
>> see no reason for the "synchronized", so really it just boils down to
>> a trivial patch to remove those on the reader methods.
>>
>> May I open a JIRA and propose a patch for that?
>>
>> As a second step, I'd also like to see if the build method could be
>> short-circuited for a quick return: in case there are no faceted
>> fields would be great to just return with the input document right
>> away.
>>
>> Thanks,
>> Sanne
>>
>>
>
>




[jira] [Updated] (LUCENE-6909) Improve concurrency for FacetsConfig

2015-11-24 Thread Sanne Grinovero (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanne Grinovero updated LUCENE-6909:

Attachment: 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch

Trivial patch.

The synchronization isn't needed on `getDimConfig` because it's reading from a 
ConcurrentMap.

Synchronization is still needed on setters, but that's not a performance 
concern as the usage pattern is supposedly to configure the fields once and 
then reuse the instance mostly for reading.
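A minimal sketch of the pattern the patch applies (class and field names here are illustrative, not the real FacetsConfig internals): reads go straight to the ConcurrentMap without any lock, while the rarely-used setter stays synchronized.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DimConfigs {
    // Illustrative stand-in for FacetsConfig's internal map.
    private final ConcurrentMap<String, String> byDim = new ConcurrentHashMap<>();
    private static final String DEFAULT = "default";

    // Reader: no "synchronized" needed, ConcurrentHashMap reads are safe
    // and this is the hot path during document building.
    public String getDimConfig(String dim) {
        String conf = byDim.get(dim);
        return conf != null ? conf : DEFAULT;
    }

    // Writer: still synchronized; configuration happens once up front,
    // so contention here doesn't matter.
    public synchronized void putDimConfig(String dim, String conf) {
        byDim.put(dim, conf);
    }
}
```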

> Improve concurrency for FacetsConfig
> 
>
> Key: LUCENE-6909
> URL: https://issues.apache.org/jira/browse/LUCENE-6909
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: 5.3
>Reporter: Sanne Grinovero
>Priority: Trivial
> Attachments: 
> 0001-LUCENE-6909-Allow-efficient-concurrent-usage-of-a-Fa.patch
>
>
> The design of {{org.apache.lucene.facet.FacetsConfig}} encourages reuse of a 
> single instance across multiple threads, yet the current synchronization 
> model is too strict as it doesn't allow for concurrent read operations.
> I'll attach a trivial patch which removes the contention point.






Optimisations proposal for FacetsConfig

2015-11-23 Thread Sanne Grinovero
Hello all,
I was looking into the source code for
org.apache.lucene.facet.FacetsConfig as it's being highlighted as a
hotspot of allocations during a performance analysis session.

Our code was allocating a new instance of FacetsConfig for each
Document being built; there are several maps being allocated by such
an instance, both as instance fields and on the hot path of method
"#build(Document doc)".

My understanding from reading the code is that it's designed to be
multi-threaded, probably to reuse one instance for a single index?

That would resolve my issue with allocations at instance level, and
probably also the maps being allocated within the build method as the
JVM seems to be smart enough to skip those; at least that's my
impression with a quick experiment.

However reusing this single instance across all threads would become a
contention point as all getters to read the field configurations are
synchronized.
Since the maps being read are actually safe ConcurrentMap instances, I
see no reason for the "synchronized", so really it just boils down to
a trivial patch to remove those on the reader methods.

May I open a JIRA and propose a patch for that?

As a second step, I'd also like to see if the build method could be
short-circuited for a quick return: in case there are no faceted
fields would be great to just return with the input document right
away.

Thanks,
Sanne




[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs

2015-06-30 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608407#comment-14608407
 ] 

Sanne Grinovero commented on LUCENE-6212:
-

Hi Adrien, thanks for replying!
Yes, I agree with you that _in general_ this could be abused and I understand 
the caveats, but I would still like to do it. Since Lucene is a library for 
developers and not an end-user product, I would prefer that it give me a bit 
more flexibility.


> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> --
>
> Key: LUCENE-6212
> URL: https://issues.apache.org/jira/browse/LUCENE-6212
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 5.1, Trunk
>
> Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.






[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs

2015-06-30 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608156#comment-14608156
 ] 

Sanne Grinovero commented on LUCENE-6212:
-

Hello,
I understand there are good reasons to prevent this for the average user but 
I would beg you to restore the functionality for those who know what they are 
doing.

There are perfectly valid use cases for using a different Analyzer at query 
time than at indexing time; for example, when handling synonyms at indexing 
time you don't need to apply the substitutions again at query time.
Beyond synonyms, it's also possible to have text from different sources which 
has been pre-processed in different ways, so it needs to be tokenized 
differently to get a consistent output.

I love the idea of Lucene becoming more strict about consistent schema 
choices, but I would hope we could stick to field types and encoding, while 
Analyzer mappings keep a bit more flexibility?

Would you accept a patch to overload
{code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? 
extends IndexableField>){code}
with the expert version:
{code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? 
extends IndexableField>, Analyzer overrideAnalyzer){code} ?

That would greatly help me migrate to Lucene 5. My alternatives are to 
close/open the IndexWriter for each Analyzer change, but that would have a 
significant performance impact; I'd rather cheat and pass a mutable Analyzer 
instance, even though that would prevent me from using the IndexWriter 
concurrently.

> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> --
>
> Key: LUCENE-6212
> URL: https://issues.apache.org/jira/browse/LUCENE-6212
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 5.1, Trunk
>
> Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.






Re: Welcome Anshum Gupta to the PMC

2015-03-02 Thread Sanne Grinovero
Congratulations Anshum!

Regards,
Sanne




[jira] [Commented] (LUCENE-5569) Rename AtomicReader to LeafReader

2015-01-26 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291720#comment-14291720
 ] 

Sanne Grinovero commented on LUCENE-5569:
-

As a heavy Lucene consumer I probably have no right at all to complain :)

But now that the time has come to test the candidate release for 5.0, let me 
share some facts:
 - This change caused some ~600 compile errors in our codebase
 - My personal opinion being that {{AtomicReader}} was a very good name, please 
take it as a statement that such names are quite a personal choice and someone 
just needs to make a call (And stick to it!).

Indeed it's not a major blocker, but as [~ysee...@gmail.com] wisely puts it, 
I wish the bar against API changes were higher, especially when there isn't a 
really good reason.


 Rename AtomicReader to LeafReader
 -

 Key: LUCENE-5569
 URL: https://issues.apache.org/jira/browse/LUCENE-5569
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Ryan Ernst
Priority: Blocker
 Fix For: 5.0

 Attachments: LUCENE-5569.patch, LUCENE-5569.patch


 See LUCENE-5527 for more context: several of us seem to prefer {{Leaf}} to 
 {{Atomic}}.
 Talking from my experience, I was a bit confused in the beginning that this 
 thing is named {{AtomicReader}}, since {{Atomic}} is otherwise used in Java 
 in the context of concurrency. So maybe renaming it to {{Leaf}} would help 
 remove this confusion and also carry the information that these readers are 
 used as leaves of top-level readers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Move trunk to Java 8

2014-09-21 Thread Sanne Grinovero
Unrelated for the vote, but since it came up: Oracle isn't the only
large corporation to employ high calibre skilled committers on the
OpenJDK project.
Oracle decides support of its own builds on its own terms, but Red Hat
for example supports the JVM for much longer for its customers, and
being an OSS friendly company, any patch which might get developed will
be included in some publicly available maintenance branch.

Disclaimer: I work for Red Hat but I'm just mentioning this as someone
passionate about Lucene; I happen to have an idea of the intentions of
my colleagues working on the OpenJDK project, but I am not
representing my employer on this matter: I just wanted to point out that
after Oracle ends support for Java 7, nobody is forced to move away
from it, nor to pay money.

In fact, in my experience it's very common to find users of older
Lucene versions on much older JVMs, often on JVM builds supported by
other vendors, and I don't expect this to change.

HTH

-- Sanne



On 12 September 2014 20:31, Chris Hostetter hossman_luc...@fucit.org wrote:

 : That is bogus for an open source project. I won't have such updates,
 : how can i support such a java version, users that run into trouble?
 : And this does happen often.
 : I don't think i should have to pay money and become a paying customer
 : to Oracle to support lucene.

 I didn't say you should.  I in fact said almost the exact opposite: that
 we shouldn't let commercial versions of the JDK have any bearing on our
 decision



 1) Benson made a reasonable statement that There are many large
 organizations of the sort that use Lucene & Solr that will not be moving
 to 8 for years yet

 2) you said: I don't buy for years yet. ... implying that such
 organizations will *have* to upgrade before then because there won't be
 *free* releases of java.

 3) I tried to point out 2 things:

 a) we shouldn't let the EOL cycle of *one* commercial vendor have any
 bearing on our policy of support -- particularly since the reference
 implementation is an open source project.

 b) that your argument against benson's claims seemed misleading: just
 because Oracle is EOLing doesn't mean people won't be using OpenJDK; even
 if they are using Oracle's JDK, if they are large commercial organizations
 they might pay oracle to keep using it for a long time.





 -Hoss
 http://www.lucidworks.com/

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists

2014-09-15 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133804#comment-14133804
 ] 

Sanne Grinovero commented on LUCENE-5541:
-

Thanks Michael,
[~gustavonalle] from the Infinispan team was able to reproduce it and figure 
out how this relates to {{File.exists}}; 
https://issues.jboss.org/browse/ISPN-2981 is now being resolved, it seems this 
wasn't a bug in Lucene but a very subtle race condition in the Infinispan 
Directory for Lucene, so affecting the Directory for Lucene 4 as well.
For the record ISPN-2981 would only trigger if all following conditions are met:
 - A Merge is writing concurrently to a thread using an IndexWriter for doing 
new writes
 - The node in the cluster happens to not be the primary owner for a specific 
entry (so it would be impossible on single node tests, unlikely on small 
clusters)
 - High load (or rather: low write load would make it unlikely)


 FileExistsCachingDirectory, to work around unreliable File.exists
 -

 Key: LUCENE-5541
 URL: https://issues.apache.org/jira/browse/LUCENE-5541
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Michael McCandless
 Attachments: LUCENE-5541.patch


 File.exists is a dangerous method in Java, because if there is a
 low-level IOException (permission denied, out of file handles, etc.)
 the method can return false when it should return true.
 Fortunately, as of Lucene 4.x, we rely much less on File.exists,
 because we track which files the codec components created, and we know
 those files then exist.
 But, unfortunately, going from 3.0.x to 3.6.x, we increased our
 reliance on File.exists, e.g. when creating CFS we check File.exists
 on each sub-file before trying to add it, and I have a customer
 corruption case where apparently a transient low level IOE caused
 File.exists to incorrectly return false for one of the sub-files.  It
 results in corruption like this:
 {noformat}
   java.io.FileNotFoundException: No sub-file with id .fnm found 
 (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx])
   
 org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
   
 org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146)
   org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
   org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212)
   
 org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228)
  org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161)
 {noformat}
 I think typically local file systems don't often hit such low level
 errors, but if you have an index on a remote filesystem, where network
 hiccups can cause problems, it's more likely.
 As a simple workaround, I created a basic Directory delegator that
 holds a Set of all created but not deleted files, and short-circuits
 fileExists to return true if the file is in that set.
 I don't plan to commit this: we aren't doing bug-fix releases on
 3.6.x anymore (it's very old by now), and this problem is already
 fixed in 4.x (by reducing our reliance on File.exists), but I wanted
 to post the code here in case others hit it.  It looks like it was hit
 e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and
 https://issues.jboss.org/browse/ISPN-2981 
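The workaround described above ("a Set of all created but not deleted files" that short-circuits fileExists) can be sketched in a few lines of plain Java. This is an illustration of the pattern only, with hypothetical names; it is not the actual patch attached to the issue:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// A delegating "directory" that remembers which files it created and answers
// fileExists() from that set first, so a transient low-level I/O error in the
// real File.exists-style check cannot produce a false negative.
public class FileExistsCachingDir {
    private final Set<String> created = ConcurrentHashMap.newKeySet();
    private final Predicate<String> unreliableExists; // the underlying, possibly flaky, check

    public FileExistsCachingDir(Predicate<String> unreliableExists) {
        this.unreliableExists = unreliableExists;
    }

    public void createFile(String name) {
        created.add(name);
    }

    public void deleteFile(String name) {
        created.remove(name);
    }

    public boolean fileExists(String name) {
        // Short-circuit: a file we created and never deleted must exist,
        // regardless of what the underlying check says right now.
        return created.contains(name) || unreliableExists.test(name);
    }
}
```

With this delegator, a transient "false" from the underlying check cannot make a known-created sub-file appear missing, which is exactly the corruption mode quoted in the stack trace above.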



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists

2014-09-11 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130172#comment-14130172
 ] 

Sanne Grinovero commented on LUCENE-5541:
-

Hi [~mikemccand], I think I'm hitting this issue, indeed still using Lucene 
3.6.2.
Your comments are much appreciated, but I'm not understanding how 
{{File.exists}} relates to the exception, when it is being thrown by the 
{{CompoundFileReader}}?
In fact these tests were run with compound files disabled, so I'd love to put 
a breakpoint in the IndexWriter code where it decided this segment needed to 
be wrapped in a {{CompoundFileReader}}; however, it seems I can't easily 
reproduce the same error.

In case we're able to reproduce it again I would like to provide a patch, even 
if I understand there won't be more releases.

 FileExistsCachingDirectory, to work around unreliable File.exists
 -

 Key: LUCENE-5541
 URL: https://issues.apache.org/jira/browse/LUCENE-5541
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Michael McCandless
 Attachments: LUCENE-5541.patch


 File.exists is a dangerous method in Java, because if there is a
 low-level IOException (permission denied, out of file handles, etc.)
 the method can return false when it should return true.
 Fortunately, as of Lucene 4.x, we rely much less on File.exists,
 because we track which files the codec components created, and we know
 those files then exist.
 But, unfortunately, going from 3.0.x to 3.6.x, we increased our
 reliance on File.exists, e.g. when creating CFS we check File.exists
 on each sub-file before trying to add it, and I have a customer
 corruption case where apparently a transient low level IOE caused
 File.exists to incorrectly return false for one of the sub-files.  It
 results in corruption like this:
 {noformat}
   java.io.FileNotFoundException: No sub-file with id .fnm found 
 (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx])
   
 org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
   
 org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146)
   org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
   org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212)
   
 org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228)
  org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161)
 {noformat}
 I think typically local file systems don't often hit such low level
 errors, but if you have an index on a remote filesystem, where network
 hiccups can cause problems, it's more likely.
 As a simple workaround, I created a basic Directory delegator that
 holds a Set of all created but not deleted files, and short-circuits
 fileExists to return true if the file is in that set.
 I don't plan to commit this: we aren't doing bug-fix releases on
 3.6.x anymore (it's very old by now), and this problem is already
 fixed in 4.x (by reducing our reliance on File.exists), but I wanted
 to post the code here in case others hit it.  It looks like it was hit
 e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and
 https://issues.jboss.org/browse/ISPN-2981 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release Lucene/Solr 4.6.1 RC1

2014-01-18 Thread Sanne Grinovero
+1

Run integration tests with:
 - Hibernate Search
 - Infinispan (indexing/searching entries with Lucene)
 - Infinispan (storing indexes from Lucene)

All perfect, great job!
(For long we've been stuck on Lucene 3.x but that's finally resolved)

Sanne


On 18 January 2014 01:51, Steve Rowe sar...@gmail.com wrote:
 +1

 Smoke tester says: SUCCESS! [1:03:14.565590]

 Changes, docs and javadocs look good.

 Steve

 On Jan 17, 2014, at 9:13 AM, Mark Miller markrmil...@gmail.com wrote:

 Please vote to release the following artifacts:

 http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1559132/

 Here is my +1.

 --
 - Mark


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-08 Thread Sanne Grinovero
+1 David and Mark

Like Lajos, I too have, very sadly, not contributed as much as I'd
want to Lucene, but having followed this thread with interest for a
while, I hope my contribution is well received.

I do sympathize with all the problems which have been highlighted
about Git, as I had the same impression 3 years ago when all our
projects (Hibernate) were moved to Git, and I was the skeptical one
back then. I suffered from it for a couple of weeks, while I was
pointlessly trying to map my previous SVN workflow onto Git... until I
realized that that was the main crux of my pain with it. I really do
have to admit I was just stubborn and set in bad habits; I'm
extremely happy we moved now... and yes - no offence - but to an
outsider you all look like you're carving code on a stone wall with stone
axes.

Sparing you the details of what I did wrong and of how exactly it
should be used: the point is really its huge flexibility and a better
model for the problem it solves.
On this thread I've seen several problems being pointed out about
git, but while I'd be happy to chat about each single one, for the
sake of brevity my impression is mostly confusion from people who are
trying to use it as if it were an alias for svn. To put it boldly, you're
missing the point :-)
If you need details, feel free to ask here or contact me on IRC: I'm
afraid my email is too long already.

Would be good to see some negative points from someone who actually
used it for a significant time. For my part, for example, I don't like
the complexity of handling merges; but then again we also use
fast-forward only; considering that, maybe I've never actually
understood how a merge should be done, as I've never practiced it.
Please take it as an example of how you don't need to learn all its
details to still get huge benefits from it: across 47 releases, over 3
years, ~100 contributors have been happily collaborating, we
developed a workflow which suits us best, and never ever needed to do
a merge.

And yes I confirm it feels very odd for an occasional contributor that
you guys still work by attaching patch files to JIRA.

 - Sanne


On 8 January 2014 00:45, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:
 +1, Mark.

 Git isn't perfect; I sympathize with the annoyances pointed out by Rob et.
 all.  But I think we would be better off for it -- a net win considering the
 upsides.  In the end I'd love to track changes via branches (which includes
 forks people make to add changes), not with attaching patch files to an
 issue tracker.  The way we do things here sucks for collaboration and it's a
 higher bar for people to get involved than it can and should be.

 ~ David


 Mark Miller-3 wrote
 I don’t really buy the fad argument, but as I’ve said, I’m willing to wait
 a little longer for others to catch on. I try and follow the stats and
 reports and articles on this pretty closely.

 As I mentioned early in the thread, by all appearances, the shift from SVN
 to GIT looks much like the shift from CVS to SVN. This was not a fad
 change, nor is the next mass movement likely to be.

 Just like no one starts a project on CVS anymore, we are almost already to
 the point where new projects start exclusive on GIT - especially open
 source.

 I’m happy to sit back and watch the trend continue though. The number of
 GIT users in the committee and among the committers only grows every time
 the discussion comes up.

 If this was 2009, 2010, 2011 … who knows, perhaps I would buy some fad
 argument. But it just doesn’t jive in 2014.

 - Mark





 -
  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/The-Old-Git-Discussion-tp4109193p4110109.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0 Take 3

2013-11-20 Thread Sanne Grinovero
+1

On 20 November 2013 18:00, Tommaso Teofili tommaso.teof...@gmail.com wrote:
 +1

 Tommaso


 2013/11/20 Jan Høydahl jan@cominvent.com

 +1
 Happy smoketester on Mac

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 19. nov. 2013 kl. 15:11 skrev Simon Willnauer simon.willna...@gmail.com:

  Please vote for the third Release Candidate for Lucene/Solr 4.6.0
  (don't be irritated by that this is RC4 I build on that I didn't put
  up for vote)
 
  This RC includes some additional fixes related to Changes.html that
  were committed in the last days like SOLR-5397 as well as:
 
  SOLR-5464: Add option to ConcurrentSolrServer to stream pure delete
  requests.
  SOLR-5465: SolrCmdDistributor retry logic has a concurrency race bug.
  SOLR-5452: Do not attempt to proxy internal update requests.
 
  you can download it here:
 
  http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC4-rev1543363/
 
  or run the smoke tester directly with this commandline (don't forget
  to set JAVA6_HOME etc.):
 
  python3.2 -u dev-tools/scripts/smokeTestRelease.py
 
  http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC4-rev1543363/
  1543363 4.6.0 /tmp/smoke_test_4_6
 
  I integrated the RC into Elasticsearch and all tests pass:
 
  https://github.com/s1monw/elasticsearch/tree/upgrade_lucene_4_6
 
  Smoketester said: SUCCESS! [1:08:00.010026]
 
  here is my +1
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Indexing file with security problem

2013-07-04 Thread Sanne Grinovero
To be honest I am not familiar with ManifoldCF, so I won't say if
Hibernate Search is better or not, but it would definitely not be too
hard with Hibernate Search:

1) You annotate with @Indexed the entity referring to your PostgreSQL
table containing the metadata; with @TikaBridge you point it to the
external resource containing the document.

Returning database ids is the default behaviour.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e4244

2) is a bit more complex, but I don't think any more so than with other 
technologies: you should encode some information in the index, then define a 
parametric filter on that.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#query-filter
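As a plain-Java illustration of the idea behind point 2 (encode security information per document, then filter on it at query time), here is a hypothetical sketch; the names are mine, and this is not Hibernate Search's actual FullTextFilter API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RoleFilterSketch {

    // Stand-in for an indexed document: the allowed roles are stored
    // alongside the text, exactly like an extra indexed field would be.
    static final class Doc {
        final long id;
        final String text;
        final Set<String> allowedRoles;

        Doc(long id, String text, Set<String> allowedRoles) {
            this.id = id;
            this.text = text;
            this.allowedRoles = allowedRoles;
        }
    }

    // Returns only the ids of matching documents the current user may see.
    static List<Long> search(List<Doc> index, String term, Set<String> userRoles) {
        return index.stream()
                .filter(d -> d.text.contains(term))                           // the "full-text" match
                .filter(d -> !Collections.disjoint(d.allowedRoles, userRoles)) // the security filter
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> index = Arrays.asList(
                new Doc(1, "lucene solr", Collections.singleton("hr")),
                new Doc(2, "lucene intro", Collections.singleton("dev")));
        System.out.println(search(index, "lucene", Collections.singleton("dev")));
    }
}
```

Because the filter is applied while matching rather than on a finished result list, there is no separate "filter 100 files step by step" pass; a real engine applies the same idea with an indexed field plus a query-time filter.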

3) Not sure, sorry. But the automatic indexing triggers happen as soon
as you store the metadata, so maybe that is good enough?

Looks interesting!

Sanne - Hibernate Search team


On 27 June 2013 03:14, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
 Hi,

 I would start from ManifoldCF - it may save you some work.

 Otis
 Solr & ElasticSearch Support
 http://sematext.com/

 On Jun 26, 2013 5:01 PM, lukasw lukas...@gmail.com wrote:

 Hello

 I'll try to briefly describe my problem and task.
 My name is Lukas and I am a Java developer; my task is to create a search
 engine for different types of files (only text file types): pdf, word, odf,
 xml, but not html.
 I have a little experience with Lucene: about a year ago I wrote a simple
 full-text search using Lucene and Hibernate Search. That was a simple
 project, but now I have a very difficult search task.
 We are using Java 1.7 and GlassFish 3, and I have to concentrate only on
 the server side, not the client UI. These are my three major problems:

 1) All files are stored on a WebDAV server, but information about each file
 (name, id, file type etc.) is stored in a database (PostgreSQL), so when I
 create the index I need to use both. As a result of a query I need to
 return only the file id from the database. In summary: the content of a
 file is stored on the server but the information about it is in the
 database, so we must retrieve both.

 2) The secondary problem is that each file has a level of secrecy, and this
 level is calculated dynamically. When calculating the level of security for
 a file we consider several properties. The static properties are the file's
 location and the folder the file is in, but there is also dynamic
 information: user profiles, user roles and departments. So when user
 Maggie is logged in she can search only the files test.pdf, test2.doc
 etc., but if user Stev is logged in he has different profiles than Maggie,
 so he can only search for some phrase in the files broken.pdf,
 mybook.odt, test2.doc etc. I think that when, for example, a user searches
 for the phrase lucene + solr, we search in all indexed documents and
 filter the results afterwards. But I think that solution is not very
 efficient. What if the result counts 100 files: do we then filter each
 file step by step? I do not see any other solution. Maybe you can help me:
 do Lucene or Solr have a mechanism that helps here?

 3) The last problem is that some files are encrypted, so those files must
 be indexed only once, before encryption! But I think that if we index
 secured files we get a security issue, because every word from the file is
 tokenized. I have no idea how to secure Lucene documents and the index
 datastore; is it possible?


 I also wonder whether I need to use Solr for my search engine, or should I
 use only Lucene and write my own? So as you can see I do not have a problem
 with indexing or searching, but with securing files and their security
 levels.

 Thanks for any hints and time you spend for me.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Infinispan JGroups migrating to Apache License

2013-05-28 Thread Sanne Grinovero
Hello all,
as some of you already know the Infinispan project includes several
integration points with the Apache Lucene project, including a
Directory implementation, but so far we had a separate community
because of the license incompatibility.

I'm very happy to announce now that both Infinispan and its dependency
JGroups are going to move to the Apache License, as you can see from
the following blogposts:

   
http://infinispan.blogspot.co.uk/2013/05/infinispan-to-adopt-apache-software.html

   
http://belaban.blogspot.ch/2013/05/jgroups-to-investigate-adopting-apache.html

I hope this will benefit both projects and allow more people to use both.

# What's Infinispan ?

It's an in-memory Key/Value store geared to fast data rather than very
large data, with Dynamo inspired consistent hashing to combine
reliability and resources of multiple machines.
Does not support eventual consistency but supports transactions, including XA.
When data gets too large to be handled in JVM heap it can swap over
to different storage engines, i.e. Cassandra, HBase, MongoDB, JDBC,
cloud storage,  ..

[there is much more but for the sake of brevity I expect this to be
most useful to Lucene developers]

# What's this state of this Infinispan / Lucene Directory?

Basically it stores the segments in the distributed cache: so it
provides a quick storage engine, real-time replication without NFS
trouble, optionally integration with transactions.

This is working quite well, and - depending on your needs and
configuration options - it might be faster than FSDirectory or
RAMDirectory. In all fairness it's not easy to defeat the efficiency
of FSDirectory when it's in memory-mapping mode: it might happen in
some cases that it will be faster, more or less significantly, but I
think the real difference is in the scalability options and the
flexibility in architectures.
It is generally faster than the RAMDirectory, especially under contention.

Support for Lucene 4 was just added recently, so while I think it
would be great to have custom Codecs for it, that isn't done yet: for
now it just stores the byte[] chunks of the segments.
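As a rough illustration of what "stores the byte[] chunks of the segments" means, here is a minimal hypothetical in-memory sketch (not Infinispan's actual API): file contents are split into fixed-size chunks kept under distinct keys, the way a key/value cache would hold them:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for a chunked file store backed by a key/value map.
public class ChunkedFileStore {
    private final int chunkSize;
    private final Map<String, byte[]> chunks = new HashMap<>();     // key: fileName#chunkIndex
    private final Map<String, Integer> chunkCounts = new HashMap<>();

    public ChunkedFileStore(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Split the content into chunkSize-sized pieces, each under its own key,
    // so a distributed cache can spread and replicate them independently.
    public void write(String fileName, byte[] content) {
        int n = (content.length + chunkSize - 1) / chunkSize;
        for (int c = 0; c < n; c++) {
            int from = c * chunkSize;
            int to = Math.min(from + chunkSize, content.length);
            chunks.put(fileName + "#" + c, Arrays.copyOfRange(content, from, to));
        }
        chunkCounts.put(fileName, n);
    }

    // Reassemble the file by concatenating its chunks in order.
    public byte[] read(String fileName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int c = 0; c < chunkCounts.get(fileName); c++) {
            byte[] b = chunks.get(fileName + "#" + c);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }
}
```

The real Directory implementation of course adds locking, metadata and distribution concerns on top; the sketch only shows why segment files map naturally onto a key/value store.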

This is not a replacement for Solr or ElasticSearch: it provides just
a storage component; it does not solve - among others - the problem of
distributed writers. It is used by Hibernate Search.

Regards,
Sanne

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: VOTE: release 3.6.2

2012-12-21 Thread Sanne Grinovero
+1
tested the Maven artifacts with the testsuites from Infinispan and
Hibernate Search


On 21 December 2012 13:22, Tommaso Teofili tommaso.teof...@gmail.com wrote:
 +1
 Tommaso


 2012/12/21 Simon Willnauer simon.willna...@gmail.com

 same here +1

 On Fri, Dec 21, 2012 at 1:11 PM, Martijn v Groningen
 martijn.v.gronin...@gmail.com wrote:
  Besides the mentioned jdoc warnings the smoke tester ran fine.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Source Control

2012-10-28 Thread Sanne Grinovero
If Lucene was moved to use GIT, I would love that.

Not going into details now, but having used Git for two years on other
open source projects I'm pretty sure that it makes collaboration
significantly easier. We use GitHub, but the star is Git: GitHub makes
things easier for non-power users and is great to have, but once you get
used to the command-line git it's outrageously useful, and I don't
actually use the GitHub web interface any more (though it's nice that
occasional contributors can).

Being very flexible, it's true that the problem might often be finding
agreement on some consistent workflow, but that's never been a
blocker in our case, as each user is free to use whatever he prefers on
his personal repository.

Highly recommended!

Sanne



On 28 October 2012 16:58, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:
 Different toys for different boys. Everyone will have his or her
 favorite workflow, it'll be impossible to find a consensus here. As
 for me, I've tasted cvs, svn, git and other version control systems
 and I must say git is the one I like the most, although there were a
 good few cursing moments along the way.

 As for legal -- the maven team had to go through the same process, I
 don't think the checkbox (or its absence) was a problem.

 Dawid

 P.S. If anybody knows the equivalent of git add -A . (that also
 stages removed files) in svn I'd really like to know ;)

 On Sun, Oct 28, 2012 at 5:49 PM, Adrien Grand jpou...@gmail.com wrote:
 Hi Uwe,

 On Sun, Oct 28, 2012 at 5:25 PM, Uwe Schindler u...@thetaphi.de wrote:
 I don't want to use GIT; HG was horrible, too!

 Why don't you like them?

 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ideas for alphas/betas?

2012-03-07 Thread Sanne Grinovero
On 7 March 2012 15:42, Tommaso Teofili tommaso.teof...@gmail.com wrote:


 2012/3/7 Robert Muir rcm...@gmail.com

 On Tue, Mar 6, 2012 at 1:42 AM, Shai Erera ser...@gmail.com wrote:
  I agree.
 
  Maybe we should also tag issues as 4.0-alpha, 4.0-beta in JIRA? For
  4.0-alpha we'll tag all the issues that are expected to change the index
  format, and 4.0-beta all the issues that require API changes?
 

 I have no opinion on the actual JIRA tagging, but I think Hoss has a
 good point that it would be better if we looked at alphas/betas as
 real releases... ideally our first alpha release would be exactly
 the same as our real 4.0 release, but we are just being realistic and
 at the same time marking some caveats so that users know its a big
 scary change.

 So I'm not sure we should intentionally try to delay/bucket any issues
 to alpha or beta, I think we should try to make it great from the
 start... these 'guarantees' are just to help increase adoption and
 testing.


 +1, as also Simon was saying let's go fixing the blockers and start working
 on the alpha release process.


It's of course very cool if you can make it great from the start,
but that would take more time; I would rather be realistic and start
providing some tags in quick iterations.

Even if it has known issues, that's acceptable for an alpha release,
and at least you start getting more feedback, especially on the API,
which you obviously don't want to alter significantly just before the
final.

Regards,
Sanne

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Sanne Grinovero
+1
all tests on all Lucene-using projects I contribute to pass without
any change needed (a sure sign I should add more...).

Once more, great work and thank so much to everyone involved.

Sanne

On 11 September 2011 16:11, Robert Muir rcm...@gmail.com wrote:
 +1, thanks for creating this release candidate.

 On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Please vote to release the RC1 artifacts at:

  https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

 as Lucene 3.4.0 and Solr 3.4.0.

 Mike McCandless

 http://blog.mikemccandless.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





 --
 lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Drop Java 5 support for trunk (Lucene 4.0)

2011-06-27 Thread Sanne Grinovero
+1

Sanne

2011/6/27 Michael McCandless luc...@mikemccandless.com:
 +1

 Mike McCandless

 http://blog.mikemccandless.com

 On Mon, Jun 27, 2011 at 1:38 PM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 This issue has been discussed on various occasions and lately on
 LUCENE-3239 (https://issues.apache.org/jira/browse/LUCENE-3239)

 The main reasons for this have been discussed on the issue but let me
 put them out here too:

 - Lack of testing on Jenkins with Java 5
 - Java 5 reached its end of life a long time ago, so Java 5 is
 totally unmaintained, which means for us that bugs have to either be
 hacked around, tests disabled, or warnings placed, but some things simply
 cannot be fixed... we cannot actually support something that is no
 longer maintained: we do find JRE bugs
 (http://wiki.apache.org/lucene-java/SunJavaBugs) and it's important
 that bugs actually get fixed: we cannot do everything with hacks.
 - due to Java 5 we take legitimate performance hits, like 20% slower
 grouping speed.

 For reference please read through the issue mentioned above.

 A lot of the committers seem to be on the same page here to drop Java
 5 support so I am calling out an official vote.

 all Lucene 3.x releases will remain with Java 5 support this vote is
 for trunk only.


 Here is my +1

 Simon










Re: [VOTE] release 3.3 (take two)

2011-06-26 Thread Sanne Grinovero
+1

All tests are fine on both Infinispan and Hibernate Search.

While I understand that APIs often needed changes, I'm very happy to
state that for the first time three major releases are fully API
compatible!
(As far as tested on these projects, Lucene versions 3.1.0, 3.2.0,
3.3.0 are drop-in compatible replacements)

Regards,
Sanne

2011/6/26 Steven A Rowe sar...@syr.edu:
 +1

 I looked at the differences, and then just ran tests on the Solr and Lucene 
 source tarballs.

 Steve

 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Sunday, June 26, 2011 11:12 AM
 To: dev@lucene.apache.org
 Subject: [VOTE] release 3.3 (take two)

 Artifacts here:

 http://s.apache.org/lusolr330rc1

 working release notes here:

 http://wiki.apache.org/lucene-java/ReleaseNote33
 http://wiki.apache.org/solr/ReleaseNote33

 To see the changes between the previous release candidate (rc0):
 svn diff -r 1139028:1139775
 https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3

 Here is my +1







Re: Code Freeze on realtime_search branch

2011-04-29 Thread Sanne Grinovero
Hello,
this is totally awesome!

Does it imply we don't need the IndexWriter lock anymore? And hence
that people sharing the Lucene Directory across multiple JVMs can
both write at the same time?

I had intentions to *try* removing such limitations this summer, but
if this is the case I will spend my time testing this carefully
instead, or if some kind of locking is still required I'd appreciate
some pointers so that I'll be able to remove them.

Regards,
Sanne

2011/4/29 Simon Willnauer simon.willna...@googlemail.com:
 Hey folks,

 LUCENE-3023 aims to land the considerably large
 DocumentsWriterPerThread (DWPT) refactoring on trunk.
 During the last weeks we have put lots of effort into cleaning the
 code up, fixing javadocs and running tests locally
 as well as on Jenkins. We reached the point where we are able to
 create a final patch for review and land this
 exciting refactoring on trunk very soon. I committed the CHANGES.TXT
 entry (also appended below) a couple of minutes ago so from now on
 we freeze the branch for final review (Robert can you create a new
 final patch and upload to LUCENE-3023).
 Any comments should go to [1] or as a reply to this email. If there is
 no blocker coming up we plan to reintegrate the
 branch and commit it to trunk early next week. For those who want some
 background what DWPT does read: [2]

 Note: this change will not change the index file format so there is no
 need to reindex for trunk users. Yet, I will send a heads up next week
 with an
 overview of what has changed.

 Simon

 [1] https://issues.apache.org/jira/browse/LUCENE-3023
 [2] 
 http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/


 * LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
  DocumentsWriterPerThread:

  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
    Each DocumentsWriterPerThread indexes documents in its own private segment,
    and the in memory segments are no longer merged on flush.  Instead, each
    segment is separately flushed to disk and subsequently merged with normal
    segment merging.

  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
    indexing may continue concurrently with flushing.  The selected
    DWPT flushes all its RAM resident documents to disk.  Note: Segment flushes
    don't flush all RAM resident documents but only the documents private to
    the DWPT selected for flushing.

  - Flushing is now controlled by FlushPolicy that is called for every add,
    update or delete on IndexWriter. By default DWPTs are flushed either on
    maxBufferedDocs per DWPT or the global active used memory. Once the active
    memory exceeds ramBufferSizeMB only the largest DWPT is selected for
    flushing and the memory used by this DWPT is subtracted from the active
    memory and added to a flushing memory pool, which can lead to temporarily
    higher memory usage due to ongoing indexing.

  - IndexWriter can now utilize a ramBufferSize > 2048 MB. Each DWPT can address
    up to 2048 MB of memory, such that the ramBufferSize is now bounded by the max
    number of DWPTs available in the used DocumentsWriterPerThreadPool.
    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if
    the application can use all available DWPTs. To prevent a DWPT from
    exhausting its address space, IndexWriter will forcefully flush a DWPT if its
    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled
    via IndexWriterConfig and defaults to 1945 MB.
    Since IndexWriter flushes DWPTs concurrently, not all memory is released
    immediately. Applications should still use a ramBufferSize significantly
    lower than the JVM's available heap memory, since under high load multiple
    flushing DWPTs can consume substantial transient memory when IO performance
    is slow relative to the indexing rate.

  - IndexWriter#commit now doesn't block concurrent indexing while flushing all
    'currently' RAM resident documents to disk. Yet, flushes that occur while
    a full flush is running are queued and will happen after all DWPTs involved
    in the full flush are done flushing. Applications using multiple threads
    during indexing that trigger a full flush (eg call commit() or open a new
    NRT reader) can use significantly more transient memory.

  - IndexWriter#addDocument and IndexWriter#updateDocument can block indexing
    threads if the number of active + number of flushing DWPTs exceeds a
    safety limit. By default this happens if 2 * the max number of available
    thread states (DWPTPool) is exceeded. This safety limit prevents
    applications from exhausting their available memory if flushing can't keep
    up with concurrently indexing threads.

  - IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
    limit is reached during indexing. No segment flushes
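The flush-selection rule described in the CHANGES entry above (once the global active memory exceeds the budget, only the largest DWPT is flushed and its bytes move to a flushing pool, so indexing continues on the other buffers) can be sketched as a toy simulation. This is illustrative only; `ToyFlushPolicy` and its method names are invented for the sketch and are not Lucene classes:

```java
// Toy model of the DWPT flush accounting described above -- NOT Lucene code.
// Each writer thread holds a private buffer; when the summed "active" bytes
// exceed the global budget, only the largest buffer is selected for flushing
// and a fresh (empty) buffer is swapped in so indexing can continue.
import java.util.Arrays;

public class ToyFlushPolicy {
    /** Returns the index of the largest per-thread buffer. */
    static int pickLargest(long[] activeBytes) {
        int largest = 0;
        for (int i = 1; i < activeBytes.length; i++) {
            if (activeBytes[i] > activeBytes[largest]) largest = i;
        }
        return largest;
    }

    /**
     * If total active bytes exceed the budget, move the largest buffer into
     * the flushing pool and return its former size; otherwise return 0.
     */
    static long maybeFlush(long[] activeBytes, long budgetBytes) {
        long active = Arrays.stream(activeBytes).sum();
        if (active <= budgetBytes) return 0;
        int victim = pickLargest(activeBytes);
        long flushed = activeBytes[victim];
        activeBytes[victim] = 0; // fresh buffer swapped in; bytes now "flushing"
        return flushed;
    }

    public static void main(String[] args) {
        long[] buffers = {10, 40, 25};          // MB held by three DWPTs
        long flushed = maybeFlush(buffers, 64); // 75 MB active > 64 MB budget
        System.out.println("flushed " + flushed + " MB from the largest DWPT");
    }
}
```

Note how only the victim's bytes leave the active pool: the other two buffers keep indexing, which is why a full commit under concurrent load can temporarily hold much more transient memory than the configured budget.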

Re: Code Freeze on realtime_search branch

2011-04-29 Thread Sanne Grinovero
2011/4/29 Michael McCandless luc...@mikemccandless.com:
 Sorry, but, no :)

 So feel free to keep working towards removing this limitation!!

 This change makes IndexWriter's flush (where it writes the added
 documents in RAM to disk as a new segment) fully concurrent, so that
 while one segment is being flushed (which could take a longish time,
 eg on a slowish IO system), other threads are now free to continue
 indexing (where they were blocked before).  On computers with
 substantial CPU concurrency, and fast enough IO systems, this change
 should give a big increase in indexing throughput.

 That said, I do think this change is a step towards what you seek
 (allowing multiple IndexWriters, even in separate JVMs maybe on
 separate computers, to write into an index at once).

 Mike

thank you for clarifying this; maybe I don't even need to remove the
locking if I can run some of those participant threads in the remote
nodes.
I'll keep you updated, but unfortunately can't start working on it sooner.

Sanne



 http://blog.mikemccandless.com

 On Fri, Apr 29, 2011 at 2:16 PM, Sanne Grinovero
 sanne.grinov...@gmail.com wrote:
 Hello,
 this is totally awesome!

 Does it imply we don't need the IndexWriter lock anymore? And hence
 that people sharing the Lucene Directory across multiple JVMs can
 both write at the same time?

 I had intentions to *try* removing such limitations this summer, but
 if this is the case I will spend my time testing this carefully
 instead, or if some kind of locking is still required I'd appreciate
 some pointers so that I'll be able to remove them.

 Regards,
 Sanne

 2011/4/29 Simon Willnauer simon.willna...@googlemail.com:
 Hey folks,

 LUCENE-3023 aims to land the considerably large
 DocumentsWriterPerThread (DWPT) refactoring on trunk.
 During the last weeks we have put lots of effort into cleaning the
 code up, fixing javadocs and running tests locally
 as well as on Jenkins. We reached the point where we are able to
 create a final patch for review and land this
 exciting refactoring on trunk very soon. I committed the CHANGES.TXT
 entry (also appended below) a couple of minutes ago so from now on
 we freeze the branch for final review (Robert can you create a new
 final patch and upload to LUCENE-3023).
 Any comments should go to [1] or as a reply to this email. If there is
 no blocker coming up we plan to reintegrate the
 branch and commit it to trunk early next week. For those who want some
 background what DWPT does read: [2]

 Note: this change will not change the index file format so there is no
 need to reindex for trunk users. Yet, I will send a heads up next week
 with an
 overview of what has changed.

 Simon

 [1] https://issues.apache.org/jira/browse/LUCENE-3023
 [2] 
 http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/


 * LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
  DocumentsWriterPerThread:

  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
    Each DocumentsWriterPerThread indexes documents in its own private segment,
    and the in memory segments are no longer merged on flush.  Instead, each
    segment is separately flushed to disk and subsequently merged with normal
    segment merging.

  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
    indexing may continue concurrently with flushing.  The selected
    DWPT flushes all its RAM resident documents to disk.  Note: Segment flushes
    don't flush all RAM resident documents but only the documents private to
    the DWPT selected for flushing.

  - Flushing is now controlled by FlushPolicy that is called for every add,
    update or delete on IndexWriter. By default DWPTs are flushed either on
    maxBufferedDocs per DWPT or the global active used memory. Once the active
    memory exceeds ramBufferSizeMB only the largest DWPT is selected for
    flushing and the memory used by this DWPT is subtracted from the active
    memory and added to a flushing memory pool, which can lead to temporarily
    higher memory usage due to ongoing indexing.

  - IndexWriter can now utilize a ramBufferSize > 2048 MB. Each DWPT can address
    up to 2048 MB of memory, such that the ramBufferSize is now bounded by the max
    number of DWPTs available in the used DocumentsWriterPerThreadPool.
    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if
    the application can use all available DWPTs. To prevent a DWPT from
    exhausting its address space, IndexWriter will forcefully flush a DWPT if its
    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled
    via IndexWriterConfig and defaults to 1945 MB.
    Since IndexWriter flushes DWPTs concurrently, not all memory is released
    immediately. Applications should still use a ramBufferSize significantly
    lower than the JVM's available

Re: IndexReader.indexExists declares throwing IOE, but never does

2011-03-21 Thread Sanne Grinovero
2011/3/21 Earwin Burrfoot ear...@gmail.com:
 Technically, there's a big difference between "I checked, and there
 was no index" and "I was unable to check the disk because the file system
 went BANG!".
 So the proper behaviour is to return false & IOE (on proper occasion)?

+1 to throw the exception when proper to do so

Otherwise please keep the throws declaration so that you won't break
public APIs if the implementation changes.


 On Mon, Mar 21, 2011 at 13:53, Michael McCandless
 luc...@mikemccandless.com wrote:
 On Mon, Mar 21, 2011 at 12:52 AM, Shai Erera ser...@gmail.com wrote:
 Can we remove the declaration? The method never throws IOE, but instead
 catches it and returns false. I think it's reasonable that such a method
 will not throw exceptions.

 +1

 --
 Mike

 http://blog.mikemccandless.com






 --
 Kirill Zakharenko/Кирилл Захаренко
 E-Mail/Jabber: ear...@gmail.com
 Phone: +7 (495) 683-567-4
 ICQ: 104465785







Re: IndexReader.indexExists declares throwing IOE, but never does

2011-03-21 Thread Sanne Grinovero
2011/3/21 Shai Erera ser...@gmail.com:
 So the proper behaviour is to return false & IOE (on proper occasion)?

 I don't object to it, as I think it's reasonable (as today we may be hiding
 some info from the app). However, given that today we never throw IOE, and
 that if we start doing so, we'll change runtime behavior, I lean towards
 keeping the method simple and removing the throws declaration. Well, it's
 either we change the impl to throw IOE, or remove the declaration
 altogether.

 Changing the impl to throw IOE on proper occasion might be problematic --
 IndexNotFoundException is thrown when an empty index directory was given,
 however by its Javadocs, it can also indicate the index is corrupted.
 Perhaps the jdocs are wrong and it's thrown only if the index directory is
 empty, or no segments files are found. If that's the case, then we should
 change its javadocs. Otherwise, it will be difficult to know whether the
 INFE indicates an empty directory, for which you'll want to return false, or
 a corrupt index, for which you'll want to throw the exception.

 Besides, I consider this method almost like File.exists() which doesn't
 throw an exception. If indexExists() returns false, the app can decide to
 investigate further by trying to open IndexReader or read the SegmentInfos.
 But the API as-is needs to be simple IMO.

good points, I withdraw my previous objection :)


 Otherwise please keep the throws declaration so that you won't break
 public APIs if this changes implementation.

 Removing the throws declaration doesn't break apps. In the worst case,
 they'll have a catch block which is redundant?

yes, you wouldn't do any harm now, but if you release it without the
declaration and then figure out you actually need to add it back in the
future, people might have code which is not handling it.

I'm looking into Lucene 3.0.3, and there the IOException *is* actually
needed; I'm not sure what was changed in the version this is referring to,
but as it used to throw it (and need it), I think it's quite
possible this need is not so remote.

Regards,
Sanne
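The File.exists()-style contract discussed above (report absence by returning false rather than declaring a checked exception) can be sketched in isolation. This is a toy illustration, not Lucene's implementation; `toyIndexExists` and the segments-file check are invented stand-ins:

```java
import java.io.File;

public class ToyIndexExists {
    /**
     * Illustrative stand-in for IndexReader.indexExists: report whether a
     * directory contains something that looks like a segments file and,
     * mirroring File.exists(), never throw -- report failure as false.
     */
    static boolean toyIndexExists(File dir) {
        File[] files = dir.listFiles(); // null if dir is missing or unreadable
        if (files == null) return false;
        for (File f : files) {
            if (f.getName().startsWith("segments")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // A caller never needs a try/catch; absence is simply 'false'.
        System.out.println(toyIndexExists(new File("/no/such/index")));
    }
}
```

A caller that wants to distinguish "empty directory" from "unreadable disk" would then go on to open an IndexReader or read the SegmentInfos and inspect the resulting exception, which is the division of labour Shai describes.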


 Shai

 On Mon, Mar 21, 2011 at 4:12 PM, Sanne Grinovero sanne.grinov...@gmail.com
 wrote:

 2011/3/21 Earwin Burrfoot ear...@gmail.com:
  Technically, there's a big difference between "I checked, and there
  was no index" and "I was unable to check the disk because the file system
  went BANG!".
  So the proper behaviour is to return false & IOE (on proper occasion)?

 +1 to throw the exception when proper to do so

 Otherwise please keep the throws declaration so that you won't break
 public APIs if this changes implementation.

 
  On Mon, Mar 21, 2011 at 13:53, Michael McCandless
  luc...@mikemccandless.com wrote:
  On Mon, Mar 21, 2011 at 12:52 AM, Shai Erera ser...@gmail.com wrote:
  Can we remove the declaration? The method never throws IOE, but
  instead
  catches it and returns false. I think it's reasonable that such a
  method
  will not throw exceptions.
 
  +1
 
  --
  Mike
 
  http://blog.mikemccandless.com
 
 
 
 
 
 
  --
  Kirill Zakharenko/Кирилл Захаренко
  E-Mail/Jabber: ear...@gmail.com
  Phone: +7 (495) 683-567-4
  ICQ: 104465785
 
 
 








Re: Index File

2011-03-21 Thread Sanne Grinovero
Hi,

2011/3/21 soheila dehghanzadeh sally...@gmail.com:
 Hi All,
 I have created the index folder, and I tried to open the .CFS, .CFX, .GEN and
 segments files with Notepad, but they are unreadable. I want to see their
 structure for my sample directory, which I have passed to IndexFiles. I have
 read http://lucene.apache.org/java/3_0_3/fileformats.html and I know what an
 index should contain. Is there any way to see the created index?

yes, use the force :)
http://code.google.com/p/luke/

Regards,
Sanne

 thanks in advance .
 Peace.
 -Soheila D.





Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-14 Thread Sanne Grinovero
2011/3/14 slaava slaav...@gmail.com:
 Hi,
 when could we expect final 3.1 version with maven-repository? We need some
 functionality included in 3.1 and I don't know if I have to wait or create
 own maven project from sources...

Hi,
I hope soon as well, in the meantime we've been testing with the
release candidate repositories:

<profile>
  <id>lucene-staging-rmuir</id>
  <repositories>
    <repository>
      <id>lucene-staging-repository-rmuir</id>
      <name>Lucene testing repo</name>
      <url>http://people.apache.org/~rmuir/staging_area/lucene-solr-3.1RC0-rev1078688/lucene-3.1RC0/maven/</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>never</updatePolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
      </snapshots>
    </repository>
    <repository>
      <id>solr-staging-repository-rmuir</id>
      <name>Solr testing repo</name>
      <url>http://people.apache.org/~rmuir/staging_area/lucene-solr-3.1RC0-rev1078688/solr-3.1RC0/maven</url>
      <layout>default</layout>
      <releases>
        <enabled>true</enabled>
        <updatePolicy>never</updatePolicy>
      </releases>
      <snapshots>
        <enabled>false</enabled>
        <updatePolicy>never</updatePolicy>
      </snapshots>
    </repository>
  </repositories>
</profile>

Use it with care, as they are marked with the same identifiers the
final will have, so you might end up polluting your local caches with
this: make sure you delete all copies when the real one is released.
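Since the staging artifacts are published under the identifiers the final release will use, the cleanup mentioned above can be sketched as a pair of shell commands. The paths are assumptions based on the default Maven local-repository layout; adjust them to your setup:

```shell
# Purge any cached staging artifacts so the real 3.1.0 release is re-fetched.
# M2_REPO assumes the default local-repository location; override if yours
# differs. The groupId paths follow the standard Maven repository layout.
M2_REPO="${M2_REPO:-$HOME/.m2/repository}"
rm -rf "$M2_REPO/org/apache/lucene" "$M2_REPO/org/apache/solr"
echo "purged staged Lucene/Solr artifacts from $M2_REPO"
```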

-- Sanne


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/VOTE-Lucene-and-Solr-3-1-release-candidate-tp2645100p2675660.html
 Sent from the Solr - Dev mailing list archive at Nabble.com.







Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-08 Thread Sanne Grinovero
Hello,
the lucene-solr-grandparent pom [1] file mentions a jetty version
6.1.26-patched-JETTY-1340 which is not available in the repositories
where I would expect it.
Do I need to enable some additional repository?

This seems related to SOLR-2381.

I think for people using Solr as their dependency via Maven, this is a
blocker; of course not everyone uses it so I've no strong opinions
about this, but thought to let you know.
Personally I'd depend on the released version of Jetty, and document
that this bug is not fixed until Jetty version XY is released; in the
alternative, I'd keep the pom as-is, but instructions and warnings
in the release notes would be very welcome. (I couldn't find a
Changes.html for Solr?)

Regards,
Sanne

[1] 
http://people.apache.org/~rmuir/staging_area/lucene-solr-3.1RC0-rev1078688/lucene-3.1RC0/maven/org/apache/lucene/lucene-solr-grandparent/3.1.0/lucene-solr-grandparent-3.1.0.pom

2011/3/8 Shai Erera ser...@gmail.com:
 I found what seems to be a glitch in StopFilter's ctors -- the boolean
 'enablePosInc' was removed from the ctors and users now have to use the
 setter instead. However, the ctors do default to 'true' if the passed in
 Version is onOrAfter(29).

 All of FilteringTokenFilter sub-classes include the enablePosIncr in their
 ctors, including FilteringTF itself. Therefore I assume the parameter was
 mistakenly dropped from StopFilter's ctors. Also, the @deprecated text
 doesn't mention how should I enable/disable it, and reading the source code
 doesn't help either, since the setter/getter are in FilteringTF.

 Also, LengthFilter has a deprecated ctor, but the class was added on Nov 16
 and I don't see it in 3.0.3. So perhaps we can remove that ctor (and add a
 @since tag to the class)?

 I don't know if these two warrant a new RC but I think they are important to
 fix.

 Shai

 On Mon, Mar 7, 2011 at 5:52 PM, Smiley, David W. dsmi...@mitre.org wrote:

 So https://issues.apache.org/jira/browse/SOLR-2405 didn't make it in
 yesterday (apparently it didn't)? :-(  Darn... maybe I shouldn't have waited
 for a committer to agree with the issue. I would have had it in Saturday.

 ~ David Smiley

 On Mar 7, 2011, at 1:32 AM, Robert Muir wrote:

  Hi all,
 
  I have posted a release candidate for both Lucene 3.1 and Solr 3.1,
  both from revision 1078688 of
  http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/
  Thanks for all your help! Please test them and give your votes, the
  tentative release date for both versions is Sunday, March 13th, 2011.
  Only votes from Lucene PMC are binding, but everyone is welcome to
  check the release candidates and voice their approval or disapproval.
  The vote passes if at least three binding +1 votes are cast.
 
  The release candidates are produced in parallel because in 2010 we
  merged the development of Lucene and Solr in order to produce higher
  quality releases. While we voted to reserve the right to release
  Lucene by itself, in my opinion we should definitely try to avoid this
  unless absolutely necessary, as it would ultimately cause more work
  and complication: instead it would be far easier to just fix whatever
  issues are discovered and respin both releases again.
 
  Because of this, I ask that you cast a single vote to cover both
  releases. If the vote succeeds, both sets of artifacts can go their
  separate ways to the different websites.
 
  Artifacts are located here: http://s.apache.org/solrcene31rc0
 
 









Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-08 Thread Sanne Grinovero
2011/3/8 Steven A Rowe sar...@syr.edu:
 Hi Sanne,

 Solr (and some Lucene modules) have several non-Mavenized dependencies.

 To work around this, the Maven build has a profile called bootstrap.  If 
 you check out the source (or use the source distribution) you can place all 
 non-Mavenized dependencies in your local repository as follows (from the 
 top-level directory containing lucene, solr, etc.):

        ant get-maven-poms
        mvn -N -P bootstrap install

 Maybe there should also be a way to deploy these to an internal repository?

 Steve

Hi Steve,
thank you for the answer. I'm not personally worried as I'm unaffected
by this issue, just thought to let the list know, so core developers
can evaluate how urgent it is.

I'm not sold on the "several non-Mavenized dependencies" argument: if
I adjust my pom locally to refer to a released Jetty version I have no
other build nor test issues, so this should be the only such artifact,
unless you refer to some other optional dependency.

Also I used to depend on Solr in the past via maven, without issues -
so it looks to me that this is going to break expectations, as it
worked properly before.

I'm totally fine with it as long as you're all aware of it and making a
conscious decision. I don't think waiting for a Jetty release is a
reasonable option, but I'd add at least a warning in the release
notes.

Regards,
Sanne


 -Original Message-
 From: Sanne Grinovero [mailto:sanne.grinov...@gmail.com]
 Sent: Tuesday, March 08, 2011 6:44 AM
 To: dev@lucene.apache.org
 Subject: Re: [VOTE] Lucene and Solr 3.1 release candidate

 Hello,
 the lucene-solr-grandparent pom [1] file mentions a jetty version
 6.1.26-patched-JETTY-1340 which is not available in the repositories
 where I would expect it.
 Do I need to enable some additional repository?

 This seems related to SOLR-2381.

 I think for people using Solr as their dependency via Maven, this is a
 blocker; of course not everyone uses it so I've no strong opinions
 about this, but thought to let you know.
 Personally I'd depend on the released version of Jetty, and document
 that this bug is not fixed until Jetty version XY is released; in the
 alternative, I'd keep the pom as-is, but instructions and warnings
 in the release notes would be very welcome. (I couldn't find a
 Changes.html for Solr?)

 Regards,
 Sanne

 [1] http://people.apache.org/~rmuir/staging_area/lucene-solr-3.1RC0-rev1078688/lucene-3.1RC0/maven/org/apache/lucene/lucene-solr-grandparent/3.1.0/lucene-solr-grandparent-3.1.0.pom

 2011/3/8 Shai Erera ser...@gmail.com:
  I found what seems to be a glitch in StopFilter's ctors -- the boolean
  'enablePosInc' was removed from the ctors and users now have to use the
  setter instead. However, the ctors do default to 'true' if the passed in
  Version is onOrAfter(29).
 
  All of FilteringTokenFilter sub-classes include the enablePosIncr in
 their
  ctors, including FilteringTF itself. Therefore I assume the parameter
 was
  mistakenly dropped from StopFilter's ctors. Also, the @deprecated text
  doesn't mention how should I enable/disable it, and reading the source
 code
  doesn't help either, since the setter/getter are in FilteringTF.
 
  Also, LengthFilter has a deprecated ctor, but the class was added on Nov
 16
  and I don't see it in 3.0.3. So perhaps we can remove that ctor (and add
 a
  @since tag to the class)?
 
  I don't know if these two warrant a new RC but I think they are
 important to
  fix.
 
  Shai
 
  On Mon, Mar 7, 2011 at 5:52 PM, Smiley, David W. dsmi...@mitre.org
 wrote:
 
  So https://issues.apache.org/jira/browse/SOLR-2405 didn't make it in
  yesterday (apparently it didn't)? :-(  Darn... maybe I shouldn't have
 waited
  for a committer to agree with the issue. I would have had it in
 Saturday.
 
  ~ David Smiley
 
  On Mar 7, 2011, at 1:32 AM, Robert Muir wrote:
 
   Hi all,
  
   I have posted a release candidate for both Lucene 3.1 and Solr 3.1,
   both from revision 1078688 of
   http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/
   Thanks for all your help! Please test them and give your votes, the
   tentative release date for both versions is Sunday, March 13th, 2011.
   Only votes from Lucene PMC are binding, but everyone is welcome to
   check the release candidates and voice their approval or disapproval.
   The vote passes if at least three binding +1 votes are cast.
  
   The release candidates are produced in parallel because in 2010 we
   merged the development of Lucene and Solr in order to produce higher
   quality releases. While we voted to reserve the right to release
   Lucene by itself, in my opinion we should definitely try to avoid
 this
   unless absolutely necessary, as it would ultimately cause more work
   and complication: instead it would be far easier to just fix whatever
   issues are discovered and respin both releases again.
  
   Because of this, I ask that you cast a single vote to cover both
   releases. If the vote

Re: wind down for 3.1?

2011-03-03 Thread Sanne Grinovero
Hello all,
Is there any update on the 3.1 status?
I'm really looking forward to it :)

Regards,
Sanne


2011/2/16 Chris Hostetter hossman_luc...@fucit.org:

 : 1. javadocs warnings/errors: this is a constant battle, it's worth
 : considering if the build should actually fail if you get one of these;
 : in my opinion, if we can do this we really should. it's frustrating to

 for a brief period we did, and then we rolled it back...

        https://issues.apache.org/jira/browse/LUCENE-875

 : 2. introducing new compiler warnings: another problem just being left
 : for someone else to clean up later, another constant losing battle.
 : 99% of the time (for non-autogenerated code) the warnings are
 : useful... in my opinion we should not commit patches that create new
 : warnings.

 it's hard to spot new compiler warnings when there are already so many
 ... if we can get down to 0 then we can add hacks to make the build fail
 if someone adds 1, but until then we have an uphill battle.


 -Hoss







Re: wind down for 3.1?

2011-03-03 Thread Sanne Grinovero
2011/3/3 Robert Muir rcm...@gmail.com:
 On Thu, Mar 3, 2011 at 7:43 AM, Sanne Grinovero
 sanne.grinov...@gmail.com wrote:
 Hello all,
 Is there any update on the 3.1 status?
 I'm really looking forward to it :)


 Yes, we are currently in the feature freeze, but it seems to be coming into shape.

 I'm planning on creating the release branch this weekend and getting
 our first RC out Sunday (Steven Rowe volunteered to help with the
 maven side, thanks!).

 If you want to help, for example you can checkout the lucene code from
 http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/
 then you can run 'ant clean dist dist-src' and inspect the artifacts
 it puts in the dist/ folder and report any problems.

 If everyone waits until we build an RC before reviewing how things
  look and reporting problems, it's going to significantly slow down the
  release process, as generating RCs for both Lucene and Solr at the
  moment is nontrivial (which is why Steven and I have set aside
 this day to try to build RC1, if the vote doesn't pass it might be
 weeks before we have the time to build RC2).

Cheers, thanks a lot. I'm definitely testing it often, and will report
anything weird.
I can't say about Solr though as we use Lucene mostly.

Sanne




[jira] Created: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes

2010-08-03 Thread Sanne Grinovero (JIRA)
DirectoryReader.isCurrent might fail to see the segments file during concurrent 
index changes
-

 Key: LUCENE-2585
 URL: https://issues.apache.org/jira/browse/LUCENE-2585
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.2, 3.0.1, 3.0, 2.9.3
Reporter: Sanne Grinovero
 Fix For: 2.9.4, 3.0.3, 3.1


I could reproduce the issue several times, but only by running long and 
stressful benchmarks; the high number of files is likely part of the scenario.
All tests run on local disk, using ext3.

Sample stacktrace:
{noformat}java.io.FileNotFoundException: no segments* file found in 
org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName:
 files: _2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii 
_ux.fnm _3g7.fdx _4bb.tii _4bj.prx _uy.fdx _3g7.prx _2l7.frq _2la.fdt _3ge.nrm 
_2l6.prx _1py.fdx _3g6.nrm _v0.prx _4bi.tii _2l2.tis _v2.fdx _2l3.nrm _2l8.fnm 
_4bg.tis _2la.tis _uu.fdx _3g6.fdx _1q3.frq _2la.frq _4bb.tis _3gb.tii _1pz.tis 
_2lb.nrm _4lm.nrm _3g9.tii _v0.fdt _2l5.fnm _v2.prx _4ll.tii _4bd.nrm _2l7.fnm 
_2l4.nrm _1q2.tis _3gb.fdx _4bh.fdx _1pz.nrm _ux.fdx _ux.tii _1q6.nrm _3gf.fdx 
_4lk.fdt _3gd.nrm _v3.fnm _3g8.prx _1q2.nrm _4bh.prx _1q0.frq _ux.fdt _1q7.fdt 
_4bb.fnm _4bf.nrm _4bc.nrm _3gb.fdt _4bh.fnm _2l5.tis _1pz.fnm _1py.fnm 
_3gc.fnm _2l2.prx _2l4.frq _3gc.fdt _ux.tis _1q3.prx _2l7.fdx _4bj.nrm _4bj.fdx 
_4bi.tis _3g9.prx _1q4.prx _v3.fdt _1q3.fdx _2l9.fdt _4bh.tis _3gb.nrm _v2.nrm 
_3gd.tii _2l7.nrm _2lb.tii _4lm.tis _3ga.fdx _1pz.fdt _3g7.fnm _2l3.fnm 
_4lk.fnm _uz.fnm _2l2.frq _4bd.fdx _1q2.fdt _3g7.tis _4bi.frq _4bj.frq _2l7.prx 
_ux.prx _3gd.fnm _1q4.fdt _1q1.fdt _v1.fnm _1py.nrm _3gf.nrm _4be.fdt _1q3.tii 
_1q1.prx _2l3.fdt _4lk.frq _2l4.fdx _4bd.fnm _uw.frq _3g8.fdx _2l6.tii _1q5.frq 
_1q5.tis _3g8.nrm _uw.nrm _v0.tii _v2.fdt _2l7.fdt _v0.tis _uy.tii _3ge.tii 
_v1.tii _3gb.tis _4lm.fdx _4bc.fnm _2lb.frq _2l6.fnm _3g6.tii _3ge.prx _uu.frq 
_1pz.fdx _1q2.fnm _4bi.prx _3gc.frq _2l9.tis _3ge.fdt _uy.fdt _4ll.fnm _3gc.prx 
_1q7.tii _2l5.nrm _uy.nrm _uv.frq _1q6.frq _4ba.tis _3g9.tis _4be.nrm _4bi.fnm 
_ux.frq _1q1.fnm _v0.fnm _2l4.fnm _4ba.fnm _4be.tis _uz.prx _1q6.fdx _uw.tii 
_2l6.nrm _1pz.prx _2l7.tis _1q7.fdx _2l9.tii _4lk.tii _uz.frq _3g8.frq _4bb.prx 
_1q5.tii _1q5.prx _v2.frq _4bc.tii _1q7.prx _v2.tii _2lb.tis _4bi.fdt _uv.nrm 
_2l2.fnm _4bd.tii _1q7.tis _4bg.fnm _3ga.frq _uu.fnm _2l9.fnm _3ga.fnm _uw.fnm 
_1pz.frq _1q1.fdx _3ge.fdx _2l3.prx _3ga.nrm _uv.fdt _4bb.nrm _1q7.fnm _uv.tis 
_3gb.fnm _2l6.tis _1pz.tii _uy.fnm _3gf.fdt _3gc.nrm _4bf.tis _1q5.fnm _uu.tis 
_4bh.tii _2l5.fdt _1q6.tii _4bc.tis _3gc.tii _3g9.fnm _2l6.fdt _4bj.fnm _uu.tii 
_v3.frq _3g9.fdx _v0.nrm _2l7.tii _1q0.fdt _3ge.fnm _4bf.fdt _1q6.prx _uz.nrm 
_4bi.fdx _3gf.fnm _4lm.frq _v0.fdx _4ba.fdt _1py.tii _4bf.tii _uw.fdx _2l5.frq 
_3g9.nrm _v1.fdt _uw.fdt _4bd.frq _4bg.prx _3gd.tis _1q4.tis _2l9.nrm _2la.nrm 
_v3.tii _4bf.prx _1q1.nrm _4ba.tii _3gd.fdx _1q4.tii _4lm.tii _3ga.tis _4bf.fnm 
write.lock _2l8.prx _2l8.fdt segments.gen _2lb.fnm _2l4.fdt _1q2.prx _4be.fnm 
_3gf.prx _2l6.fdx _3g6.fnm _4bb.fdt _4bd.tis _4lk.nrm _2l5.fdx _2la.tii 
_4bd.prx _4ln.fnm _3gf.tis _4ba.nrm _v3.prx _uv.prx _1q3.fnm _3ga.tii _uz.tii 
_3g9.frq _v0.frq _3ge.tis _3g6.tis _4ln.prx _3g7.tii _3g8.fdt _3g7.nrm _3ga.prx 
_2l2.fdx _2l8.fdx _4ba.prx _1py.frq _uz.fdx _2l3.tii _3g6.prx _v3.fdx _1q6.fdt 
_v1.nrm _2l2.tii _1q0.tis _4ba.fdx _4be.tii _4ba.frq _4ll.fdt _4bh.nrm _4lm.fdt 
_1q7.frq _4lk.tis _4bc.frq _1q6.fnm _3g7.frq _uw.tis _3g8.tis _2l9.fdx _2l4.tii 
_1q4.fdx _4be.prx _1q3.nrm _1q0.tii _1q0.fnm _v3.nrm _1py.tis _3g9.fdt _4bh.fdt 
_4ll.nrm _4lk.prx _3gd.prx _1q3.tis _1q2.tii _2l2.nrm _3gd.fdt _2l3.fdx 
_3g6.fdt _3gd.frq _1q1.tis _4bb.fdx _1q2.frq _1q3.fdt _v1.tis _2l8.frq _3gc.fdx 
_1q1.frq _4bg.frq _4bb.frq _2la.fdx _2l9.frq _uy.tis _uy.prx _4bg.fdx _3gb.prx 
_uy.frq _1q2.fdx _4lm.prx _2la.prx _2l4.prx _4bg.fdt _4be.frq _1q7.nrm _2l5.prx 
_4bf.frq _v1.prx _4bd.fdt _2l9.prx _1q6.tis _3g8.fnm _4ln.tis _2l3.tis _4bc.fdx 
_2lb.prx _3gb.frq _3gf.frq _2la.fnm _3ga.fdt _uz.tis _4bg.nrm _uv.tii _4bg.tii 
_3g8.tii _4ll.frq _uv.fnm _2l8.tis _2l8.nrm _2l2.fdt _4bj.tis _4lk.fdx _uw.prx 
_4bc.prx _4bj.fdt _4be.fdx _1q4.frq _uu.fdt _1q1.tii _2l5.tii _2lb.fdt _4bh.frq 
_3ge.frq _1py.prx _1q5.nrm _v1.fdx _3g7.fdt _4ln.fdt _1q4.nrm _1py.fdt _3gc.tis 
_4ll.prx _v3.tis _4bf.fdx _1q5.fdx _1q0.prx _4bi.nrm _4ll.tis _2l4.tis _3gf.tii 
_v2.fnm _uu.nrm _1q0.nrm _4lm.fnm _uu.prx _2l6.frq _4ln.nrm _ux.nrm _3g6.frq 
_1q5.fdt _4bj.tii _2lb.fdx _uv.fdx _v1.frq
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:634)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:517)
at org.apache.lucene.index.SegmentInfos.read
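The "no segments* file found" error above is thrown from SegmentInfos$FindSegmentsFile, whose job is to locate the latest commit point while writers may be swapping generations underneath it. As a rough, self-contained illustration of the retry idea only (this is not Lucene's actual implementation; the method names, glob pattern, and retry count here are invented), a reader can re-list the directory a few times before giving up:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SegmentsRetryDemo {

    // Find the newest "segments_*" file, retrying a few times: a concurrent
    // commit may delete the old generation just before the new one becomes
    // visible in a directory listing.
    public static Path findSegmentsFile(Path dir, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Path newest = null;
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "segments_*")) {
                for (Path p : ds) {
                    if (newest == null || p.getFileName().toString()
                            .compareTo(newest.getFileName().toString()) > 0) {
                        newest = p;
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (newest != null) {
                return newest;                       // found a generation
            }
            try {
                Thread.sleep(10);                    // back off, then re-list
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        throw new IllegalStateException("no segments* file found in " + dir);
    }

    // Small self-test: create a directory with one generation and look it up.
    public static String demo() {
        try {
            Path dir = Files.createTempDirectory("retry-demo");
            Files.createFile(dir.resolve("segments_2"));
            return findSegmentsFile(dir, 5).getFileName().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("found " + demo());
    }
}
```

The bug report is about the race this retry loop is meant to paper over: under heavy concurrent commits, the listing can still miss every generation on all attempts.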

[jira] Commented: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes

2010-08-03 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895027#action_12895027
 ] 

Sanne Grinovero commented on LUCENE-2585:
-

I'm going to see if I can contribute a patch myself, but I don't think I'll be 
able to provide a unit test.

 DirectoryReader.isCurrent might fail to see the segments file during 
 concurrent index changes
 -

 Key: LUCENE-2585
 URL: https://issues.apache.org/jira/browse/LUCENE-2585
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.3, 3.0, 3.0.1, 3.0.2
Reporter: Sanne Grinovero
 Fix For: 2.9.4, 3.0.3, 3.1


 I could reproduce the issue several times, but only by running long and 
 stressful benchmarks; the high number of files is likely part of the 
 scenario.
 All tests run on local disk, using ext3.
 Sample stacktrace:
 {noformat}java.io.FileNotFoundException: no segments* file found in 
 org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName:
  files: _2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii 
 _ux.fnm _3g7.fdx _4bb.tii _4bj.prx _uy.fdx _3g7.prx _2l7.frq _2la.fdt 
 _3ge.nrm _2l6.prx _1py.fdx _3g6.nrm _v0.prx _4bi.tii _2l2.tis _v2.fdx 
 _2l3.nrm _2l8.fnm _4bg.tis _2la.tis _uu.fdx _3g6.fdx _1q3.frq _2la.frq 
 _4bb.tis _3gb.tii _1pz.tis _2lb.nrm _4lm.nrm _3g9.tii _v0.fdt _2l5.fnm 
 _v2.prx _4ll.tii _4bd.nrm _2l7.fnm _2l4.nrm _1q2.tis _3gb.fdx _4bh.fdx 
 _1pz.nrm _ux.fdx _ux.tii _1q6.nrm _3gf.fdx _4lk.fdt _3gd.nrm _v3.fnm _3g8.prx 
 _1q2.nrm _4bh.prx _1q0.frq _ux.fdt _1q7.fdt _4bb.fnm _4bf.nrm _4bc.nrm 
 _3gb.fdt _4bh.fnm _2l5.tis _1pz.fnm _1py.fnm _3gc.fnm _2l2.prx _2l4.frq 
 _3gc.fdt _ux.tis _1q3.prx _2l7.fdx _4bj.nrm _4bj.fdx _4bi.tis _3g9.prx 
 _1q4.prx _v3.fdt _1q3.fdx _2l9.fdt _4bh.tis _3gb.nrm _v2.nrm _3gd.tii 
 _2l7.nrm _2lb.tii _4lm.tis _3ga.fdx _1pz.fdt _3g7.fnm _2l3.fnm _4lk.fnm 
 _uz.fnm _2l2.frq _4bd.fdx _1q2.fdt _3g7.tis _4bi.frq _4bj.frq _2l7.prx 
 _ux.prx _3gd.fnm _1q4.fdt _1q1.fdt _v1.fnm _1py.nrm _3gf.nrm _4be.fdt 
 _1q3.tii _1q1.prx _2l3.fdt _4lk.frq _2l4.fdx _4bd.fnm _uw.frq _3g8.fdx 
 _2l6.tii _1q5.frq _1q5.tis _3g8.nrm _uw.nrm _v0.tii _v2.fdt _2l7.fdt _v0.tis 
 _uy.tii _3ge.tii _v1.tii _3gb.tis _4lm.fdx _4bc.fnm _2lb.frq _2l6.fnm 
 _3g6.tii _3ge.prx _uu.frq _1pz.fdx _1q2.fnm _4bi.prx _3gc.frq _2l9.tis 
 _3ge.fdt _uy.fdt _4ll.fnm _3gc.prx _1q7.tii _2l5.nrm _uy.nrm _uv.frq _1q6.frq 
 _4ba.tis _3g9.tis _4be.nrm _4bi.fnm _ux.frq _1q1.fnm _v0.fnm _2l4.fnm 
 _4ba.fnm _4be.tis _uz.prx _1q6.fdx _uw.tii _2l6.nrm _1pz.prx _2l7.tis 
 _1q7.fdx _2l9.tii _4lk.tii _uz.frq _3g8.frq _4bb.prx _1q5.tii _1q5.prx 
 _v2.frq _4bc.tii _1q7.prx _v2.tii _2lb.tis _4bi.fdt _uv.nrm _2l2.fnm _4bd.tii 
 _1q7.tis _4bg.fnm _3ga.frq _uu.fnm _2l9.fnm _3ga.fnm _uw.fnm _1pz.frq 
 _1q1.fdx _3ge.fdx _2l3.prx _3ga.nrm _uv.fdt _4bb.nrm _1q7.fnm _uv.tis 
 _3gb.fnm _2l6.tis _1pz.tii _uy.fnm _3gf.fdt _3gc.nrm _4bf.tis _1q5.fnm 
 _uu.tis _4bh.tii _2l5.fdt _1q6.tii _4bc.tis _3gc.tii _3g9.fnm _2l6.fdt 
 _4bj.fnm _uu.tii _v3.frq _3g9.fdx _v0.nrm _2l7.tii _1q0.fdt _3ge.fnm _4bf.fdt 
 _1q6.prx _uz.nrm _4bi.fdx _3gf.fnm _4lm.frq _v0.fdx _4ba.fdt _1py.tii 
 _4bf.tii _uw.fdx _2l5.frq _3g9.nrm _v1.fdt _uw.fdt _4bd.frq _4bg.prx _3gd.tis 
 _1q4.tis _2l9.nrm _2la.nrm _v3.tii _4bf.prx _1q1.nrm _4ba.tii _3gd.fdx 
 _1q4.tii _4lm.tii _3ga.tis _4bf.fnm write.lock _2l8.prx _2l8.fdt segments.gen 
 _2lb.fnm _2l4.fdt _1q2.prx _4be.fnm _3gf.prx _2l6.fdx _3g6.fnm _4bb.fdt 
 _4bd.tis _4lk.nrm _2l5.fdx _2la.tii _4bd.prx _4ln.fnm _3gf.tis _4ba.nrm 
 _v3.prx _uv.prx _1q3.fnm _3ga.tii _uz.tii _3g9.frq _v0.frq _3ge.tis _3g6.tis 
 _4ln.prx _3g7.tii _3g8.fdt _3g7.nrm _3ga.prx _2l2.fdx _2l8.fdx _4ba.prx 
 _1py.frq _uz.fdx _2l3.tii _3g6.prx _v3.fdx _1q6.fdt _v1.nrm _2l2.tii _1q0.tis 
 _4ba.fdx _4be.tii _4ba.frq _4ll.fdt _4bh.nrm _4lm.fdt _1q7.frq _4lk.tis 
 _4bc.frq _1q6.fnm _3g7.frq _uw.tis _3g8.tis _2l9.fdx _2l4.tii _1q4.fdx 
 _4be.prx _1q3.nrm _1q0.tii _1q0.fnm _v3.nrm _1py.tis _3g9.fdt _4bh.fdt 
 _4ll.nrm _4lk.prx _3gd.prx _1q3.tis _1q2.tii _2l2.nrm _3gd.fdt _2l3.fdx 
 _3g6.fdt _3gd.frq _1q1.tis _4bb.fdx _1q2.frq _1q3.fdt _v1.tis _2l8.frq 
 _3gc.fdx _1q1.frq _4bg.frq _4bb.frq _2la.fdx _2l9.frq _uy.tis _uy.prx 
 _4bg.fdx _3gb.prx _uy.frq _1q2.fdx _4lm.prx _2la.prx _2l4.prx _4bg.fdt 
 _4be.frq _1q7.nrm _2l5.prx _4bf.frq _v1.prx _4bd.fdt _2l9.prx _1q6.tis 
 _3g8.fnm _4ln.tis _2l3.tis _4bc.fdx _2lb.prx _3gb.frq _3gf.frq _2la.fnm 
 _3ga.fdt _uz.tis _4bg.nrm _uv.tii _4bg.tii _3g8.tii _4ll.frq _uv.fnm _2l8.tis 
 _2l8.nrm _2l2.fdt _4bj.tis _4lk.fdx _uw.prx _4bc.prx _4bj.fdt _4be.fdx 
 _1q4.frq _uu.fdt _1q1.tii _2l5.tii _2lb.fdt _4bh.frq _3ge.frq _1py.prx 
 _1q5.nrm _v1.fdx _3g7.fdt _4ln.fdt _1q4.nrm _1py.fdt _3gc.tis _4ll.prx 
 _v3

[jira] Commented: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes

2010-08-03 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895059#action_12895059
 ] 

Sanne Grinovero commented on LUCENE-2585:
-

sure, the test is totally open source; the directory implementation based on 
Infinispan is hosted as a submodule of Infinispan:
http://anonsvn.jboss.org/repos/infinispan/branches/4.1.x/lucene-directory/

The test is
org.infinispan.lucene.profiling.PerformanceCompareStressTest

it is included in the default test suite but disabled in Maven's configuration, 
so you should run it manually
mvn clean test -Dtest=PerformanceCompareStressTest
(running it requires the jboss.org repositories to be enabled in maven settings)

To describe it at a higher level: there are 5 index-reading threads using 
reopen() before each search, 2 threads writing to the index, 1 additional 
thread as a coordinator and asserting that readers find what they expect to see 
in the index.
Exactly the same test scenario is then applied in sequence to RAMDirectory (not 
having issues), NIOFSDirectory, and 4 differently configured Infinispan 
directories.
Only the FSDirectory is affected by the issue, and it can never complete the 
full hour of stress testing successfully, while all other implementations behave 
fine.

IndexWriter is set to MaxMergeDocs(5000) and setUseCompoundFile(false); the 
issue is revealed both when using SerialMergeScheduler and when using the default 
merger.

During the last execution the test managed to perform 22,192,006 searches and 
26,875 writes during the hour, but the benchmark was invalidated because one 
thread was killed by the exception.

If you deem it useful I'd be happy to contribute a similar test case to 
Lucene, but I assume you won't be excited about having such a long-running test. 
I'm open to ideas for building a simpler one.
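The shape of that scenario — readers "reopening" before each search, concurrent writers, and a coordinator verifying what the readers observed — can be sketched with the index replaced by a plain shared counter. This is only the threading skeleton, not the Infinispan test itself; all names are invented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class ReaderWriterStressSketch {

    // Run 'readers' threads that "reopen" (read the shared version) before
    // every "search", 'writers' threads that commit (bump the version), and
    // use the calling thread as coordinator, checking afterwards that no
    // reader ever observed the version going backwards.
    public static boolean run(int readers, int writers, int iterations) {
        final AtomicLong version = new AtomicLong();
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < writers; w++) {
            threads.add(new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    version.incrementAndGet();       // a "commit"
                }
            }));
        }
        for (int r = 0; r < readers; r++) {
            threads.add(new Thread(() -> {
                long last = 0;
                for (int i = 0; i < iterations; i++) {
                    long now = version.get();        // "reopen" before each search
                    if (now < last) {
                        failures.add(new AssertionError("version went backwards"));
                    }
                    last = now;
                }
            }));
        }

        for (Thread t : threads) t.start();
        for (Thread t : threads) {
            try {
                t.join();                            // coordinator waits, then checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return failures.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(run(5, 2, 100_000) ? "no anomalies" : "FAILED");
    }
}
```

In the real test the "reopen" step is IndexReader.reopen() against a Directory, which is where the FileNotFoundException surfaces.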


 DirectoryReader.isCurrent might fail to see the segments file during 
 concurrent index changes
 -

 Key: LUCENE-2585
 URL: https://issues.apache.org/jira/browse/LUCENE-2585
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.3, 3.0, 3.0.1, 3.0.2
Reporter: Sanne Grinovero
 Fix For: 2.9.4, 3.0.3, 3.1


 I could reproduce the issue several times, but only by running long and 
 stressful benchmarks; the high number of files is likely part of the 
 scenario.
 All tests run on local disk, using ext3.
 Sample stacktrace:
 {noformat}java.io.FileNotFoundException: no segments* file found in 
 org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName:
  files: _2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii 
 _ux.fnm _3g7.fdx _4bb.tii _4bj.prx _uy.fdx _3g7.prx _2l7.frq _2la.fdt 
 _3ge.nrm _2l6.prx _1py.fdx _3g6.nrm _v0.prx _4bi.tii _2l2.tis _v2.fdx 
 _2l3.nrm _2l8.fnm _4bg.tis _2la.tis _uu.fdx _3g6.fdx _1q3.frq _2la.frq 
 _4bb.tis _3gb.tii _1pz.tis _2lb.nrm _4lm.nrm _3g9.tii _v0.fdt _2l5.fnm 
 _v2.prx _4ll.tii _4bd.nrm _2l7.fnm _2l4.nrm _1q2.tis _3gb.fdx _4bh.fdx 
 _1pz.nrm _ux.fdx _ux.tii _1q6.nrm _3gf.fdx _4lk.fdt _3gd.nrm _v3.fnm _3g8.prx 
 _1q2.nrm _4bh.prx _1q0.frq _ux.fdt _1q7.fdt _4bb.fnm _4bf.nrm _4bc.nrm 
 _3gb.fdt _4bh.fnm _2l5.tis _1pz.fnm _1py.fnm _3gc.fnm _2l2.prx _2l4.frq 
 _3gc.fdt _ux.tis _1q3.prx _2l7.fdx _4bj.nrm _4bj.fdx _4bi.tis _3g9.prx 
 _1q4.prx _v3.fdt _1q3.fdx _2l9.fdt _4bh.tis _3gb.nrm _v2.nrm _3gd.tii 
 _2l7.nrm _2lb.tii _4lm.tis _3ga.fdx _1pz.fdt _3g7.fnm _2l3.fnm _4lk.fnm 
 _uz.fnm _2l2.frq _4bd.fdx _1q2.fdt _3g7.tis _4bi.frq _4bj.frq _2l7.prx 
 _ux.prx _3gd.fnm _1q4.fdt _1q1.fdt _v1.fnm _1py.nrm _3gf.nrm _4be.fdt 
 _1q3.tii _1q1.prx _2l3.fdt _4lk.frq _2l4.fdx _4bd.fnm _uw.frq _3g8.fdx 
 _2l6.tii _1q5.frq _1q5.tis _3g8.nrm _uw.nrm _v0.tii _v2.fdt _2l7.fdt _v0.tis 
 _uy.tii _3ge.tii _v1.tii _3gb.tis _4lm.fdx _4bc.fnm _2lb.frq _2l6.fnm 
 _3g6.tii _3ge.prx _uu.frq _1pz.fdx _1q2.fnm _4bi.prx _3gc.frq _2l9.tis 
 _3ge.fdt _uy.fdt _4ll.fnm _3gc.prx _1q7.tii _2l5.nrm _uy.nrm _uv.frq _1q6.frq 
 _4ba.tis _3g9.tis _4be.nrm _4bi.fnm _ux.frq _1q1.fnm _v0.fnm _2l4.fnm 
 _4ba.fnm _4be.tis _uz.prx _1q6.fdx _uw.tii _2l6.nrm _1pz.prx _2l7.tis 
 _1q7.fdx _2l9.tii _4lk.tii _uz.frq _3g8.frq _4bb.prx _1q5.tii _1q5.prx 
 _v2.frq _4bc.tii _1q7.prx _v2.tii _2lb.tis _4bi.fdt _uv.nrm _2l2.fnm _4bd.tii 
 _1q7.tis _4bg.fnm _3ga.frq _uu.fnm _2l9.fnm _3ga.fnm _uw.fnm _1pz.frq 
 _1q1.fdx _3ge.fdx _2l3.prx _3ga.nrm _uv.fdt _4bb.nrm _1q7.fnm _uv.tis 
 _3gb.fnm _2l6.tis _1pz.tii _uy.fnm _3gf.fdt _3gc.nrm _4bf.tis _1q5.fnm 
 _uu.tis _4bh.tii _2l5.fdt _1q6.tii _4bc.tis _3gc.tii _3g9.fnm _2l6.fdt 
 _4bj.fnm _uu.tii _v3.frq _3g9.fdx _v0.nrm _2l7.tii _1q0.fdt _3ge.fnm _4bf.fdt 
 _1q6.prx _uz.nrm _4bi.fdx _3gf.fnm _4lm.frq _v0.fdx _4ba.fdt _1py.tii 
 _4bf.tii _uw.fdx _2l5.frq _3g9.nrm _v1.fdt

[jira] Updated: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes

2010-08-03 Thread Sanne Grinovero (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanne Grinovero updated LUCENE-2585:


Description: 
I could reproduce the issue several times, but only by running long and 
stressful benchmarks; the high number of files is likely part of the scenario.
All tests run on local disk, using ext3.

Sample stacktrace:
{noformat}java.io.FileNotFoundException: no segments* file found in 
org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName:
 files:
_2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii _ux.fnm 
_3g7.fdx _4bb.tii _4bj.prx _uy.fdx _3g7.prx _2l7.frq _2la.fdt _3ge.nrm _2l6.prx 
_1py.fdx _3g6.nrm _v0.prx _4bi.tii _2l2.tis _v2.fdx _2l3.nrm _2l8.fnm _4bg.tis 
_2la.tis _uu.fdx _3g6.fdx _1q3.frq _2la.frq _4bb.tis _3gb.tii _1pz.tis 
_2lb.nrm _4lm.nrm _3g9.tii _v0.fdt _2l5.fnm _v2.prx _4ll.tii _4bd.nrm _2l7.fnm 
_2l4.nrm _1q2.tis _3gb.fdx _4bh.fdx _1pz.nrm _ux.fdx _ux.tii _1q6.nrm 
_3gf.fdx _4lk.fdt _3gd.nrm _v3.fnm _3g8.prx _1q2.nrm _4bh.prx _1q0.frq _ux.fdt 
_1q7.fdt _4bb.fnm _4bf.nrm _4bc.nrm _3gb.fdt _4bh.fnm _2l5.tis 
_1pz.fnm _1py.fnm _3gc.fnm _2l2.prx _2l4.frq _3gc.fdt _ux.tis _1q3.prx _2l7.fdx 
_4bj.nrm _4bj.fdx _4bi.tis _3g9.prx _1q4.prx _v3.fdt _1q3.fdx _2l9.fdt 
_4bh.tis _3gb.nrm _v2.nrm _3gd.tii _2l7.nrm _2lb.tii _4lm.tis _3ga.fdx _1pz.fdt 
_3g7.fnm _2l3.fnm _4lk.fnm _uz.fnm _2l2.frq _4bd.fdx _1q2.fdt _3g7.tis 
_4bi.frq _4bj.frq _2l7.prx _ux.prx _3gd.fnm _1q4.fdt _1q1.fdt _v1.fnm _1py.nrm 
_3gf.nrm _4be.fdt _1q3.tii _1q1.prx _2l3.fdt _4lk.frq _2l4.fdx _4bd.fnm 
_uw.frq _3g8.fdx _2l6.tii _1q5.frq _1q5.tis _3g8.nrm _uw.nrm _v0.tii _v2.fdt 
_2l7.fdt _v0.tis _uy.tii _3ge.tii _v1.tii _3gb.tis _4lm.fdx _4bc.fnm _2lb.frq 
_2l6.fnm _3g6.tii _3ge.prx _uu.frq _1pz.fdx _1q2.fnm _4bi.prx _3gc.frq _2l9.tis 
_3ge.fdt _uy.fdt _4ll.fnm _3gc.prx _1q7.tii _2l5.nrm _uy.nrm _uv.frq 
_1q6.frq _4ba.tis _3g9.tis _4be.nrm _4bi.fnm _ux.frq _1q1.fnm _v0.fnm _2l4.fnm 
_4ba.fnm _4be.tis _uz.prx _1q6.fdx _uw.tii _2l6.nrm _1pz.prx _2l7.tis 
_1q7.fdx _2l9.tii _4lk.tii _uz.frq _3g8.frq _4bb.prx _1q5.tii _1q5.prx _v2.frq 
_4bc.tii _1q7.prx _v2.tii _2lb.tis _4bi.fdt _uv.nrm _2l2.fnm _4bd.tii _1q7.tis 
_4bg.fnm _3ga.frq _uu.fnm _2l9.fnm _3ga.fnm _uw.fnm _1pz.frq _1q1.fdx _3ge.fdx 
_2l3.prx _3ga.nrm _uv.fdt _4bb.nrm _1q7.fnm _uv.tis _3gb.fnm 
_2l6.tis _1pz.tii _uy.fnm _3gf.fdt _3gc.nrm _4bf.tis _1q5.fnm _uu.tis _4bh.tii 
_2l5.fdt _1q6.tii _4bc.tis _3gc.tii _3g9.fnm _2l6.fdt _4bj.fnm _uu.tii _v3.frq 
_3g9.fdx _v0.nrm _2l7.tii _1q0.fdt _3ge.fnm _4bf.fdt _1q6.prx _uz.nrm _4bi.fdx 
_3gf.fnm _4lm.frq _v0.fdx _4ba.fdt _1py.tii _4bf.tii _uw.fdx _2l5.frq 
_3g9.nrm _v1.fdt _uw.fdt _4bd.frq _4bg.prx _3gd.tis _1q4.tis _2l9.nrm _2la.nrm 
_v3.tii _4bf.prx _1q1.nrm _4ba.tii _3gd.fdx _1q4.tii _4lm.tii _3ga.tis 
_4bf.fnm write.lock _2l8.prx _2l8.fdt segments.gen _2lb.fnm _2l4.fdt _1q2.prx 
_4be.fnm _3gf.prx _2l6.fdx _3g6.fnm _4bb.fdt _4bd.tis _4lk.nrm _2l5.fdx 
_2la.tii _4bd.prx _4ln.fnm _3gf.tis _4ba.nrm _v3.prx _uv.prx _1q3.fnm _3ga.tii 
_uz.tii _3g9.frq _v0.frq _3ge.tis _3g6.tis _4ln.prx _3g7.tii _3g8.fdt 
_3g7.nrm _3ga.prx _2l2.fdx _2l8.fdx _4ba.prx _1py.frq _uz.fdx _2l3.tii _3g6.prx 
_v3.fdx _1q6.fdt _v1.nrm _2l2.tii _1q0.tis _4ba.fdx _4be.tii _4ba.frq 
_4ll.fdt _4bh.nrm _4lm.fdt _1q7.frq _4lk.tis _4bc.frq _1q6.fnm _3g7.frq _uw.tis 
_3g8.tis _2l9.fdx _2l4.tii _1q4.fdx _4be.prx _1q3.nrm _1q0.tii _1q0.fnm 
_v3.nrm _1py.tis _3g9.fdt _4bh.fdt _4ll.nrm _4lk.prx _3gd.prx _1q3.tis _1q2.tii 
_2l2.nrm _3gd.fdt _2l3.fdx _3g6.fdt _3gd.frq _1q1.tis _4bb.fdx _1q2.frq 
_1q3.fdt _v1.tis _2l8.frq _3gc.fdx _1q1.frq _4bg.frq _4bb.frq _2la.fdx _2l9.frq 
_uy.tis _uy.prx _4bg.fdx _3gb.prx _uy.frq _1q2.fdx _4lm.prx _2la.prx 
_2l4.prx _4bg.fdt _4be.frq _1q7.nrm _2l5.prx _4bf.frq _v1.prx _4bd.fdt _2l9.prx 
_1q6.tis _3g8.fnm _4ln.tis _2l3.tis _4bc.fdx _2lb.prx _3gb.frq _3gf.frq 
_2la.fnm _3ga.fdt _uz.tis _4bg.nrm _uv.tii _4bg.tii _3g8.tii _4ll.frq _uv.fnm 
_2l8.tis _2l8.nrm _2l2.fdt _4bj.tis _4lk.fdx _uw.prx _4bc.prx _4bj.fdt _4be.fdx 
_1q4.frq _uu.fdt _1q1.tii _2l5.tii _2lb.fdt _4bh.frq _3ge.frq _1py.prx _1q5.nrm 
_v1.fdx _3g7.fdt _4ln.fdt _1q4.nrm _1py.fdt _3gc.tis _4ll.prx _v3.tis _4bf.fdx 
_1q5.fdx _1q0.prx _4bi.nrm _4ll.tis _2l4.tis _3gf.tii _v2.fnm _uu.nrm _1q0.nrm 
_4lm.fnm _uu.prx _2l6.frq _4ln.nrm _ux.nrm _3g6.frq _1q5.fdt _4bj.tii 
_2lb.fdx _uv.fdx _v1.frq
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:634)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:517)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:306)
at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:408)
at 
org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:797)
at 
org.apache.lucene.index.DirectoryReader.doReopenNoWriter(DirectoryReader.java:407

[jira] Issue Comment Edited: (LUCENE-2585) DirectoryReader.isCurrent might fail to see the segments file during concurrent index changes

2010-08-03 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895059#action_12895059
 ] 

Sanne Grinovero edited comment on LUCENE-2585 at 8/3/10 6:26 PM:
-

sure, the test is totally open source; the directory implementation based on 
Infinispan is hosted as a submodule of Infinispan:
http://anonsvn.jboss.org/repos/infinispan/branches/4.1.x/lucene-directory/

The test is
org.infinispan.lucene.profiling.PerformanceCompareStressTest

it is included in the default test suite but disabled in Maven's configuration, 
so you should run it manually
mvn clean test -Dtest=PerformanceCompareStressTest
(running it requires the jboss.org repositories to be enabled in maven settings)

To describe it at a higher level: there are 5 index-reading threads using 
reopen() before each search, 2 threads writing to the index, 1 additional 
thread as a coordinator and asserting that readers find what they expect to see 
in the index.
Exactly the same test scenario is then applied in sequence to RAMDirectory (not 
having issues), NIOFSDirectory, and 4 differently configured Infinispan 
directories.
Only the FSDirectory is affected by the issue, and it can never complete the 
full hour of stress testing successfully, while all other implementations behave 
fine.

IndexWriter is set to MaxMergeDocs(5000) and setUseCompoundFile(false); the 
issue is revealed both when using SerialMergeScheduler and when using the default 
merger.

During the last execution the test managed to perform 22,192,006 searches and 
26,875 writes before hitting the exceptional case.

If you deem it useful I'd be happy to contribute a similar test case to 
Lucene, but I assume you won't be excited about having such a long-running test. 
I'm open to ideas for building a simpler one.


  was (Author: sanne):
sure, the test is totally open source; the directory implementation based 
on Infinispan is hosted as a submodule of Infinispan:
http://anonsvn.jboss.org/repos/infinispan/branches/4.1.x/lucene-directory/

The test is
org.infinispan.lucene.profiling.PerformanceCompareStressTest

it is included in the default test suite but disabled in Maven's configuration, 
so you should run it manually
mvn clean test -Dtest=PerformanceCompareStressTest
(running it requires the jboss.org repositories to be enabled in maven settings)

To describe it at a higher level: there are 5 index-reading threads using 
reopen() before each search, 2 threads writing to the index, 1 additional 
thread as a coordinator and asserting that readers find what they expect to see 
in the index.
Exactly the same test scenario is then applied in sequence to RAMDirectory (not 
having issues), NIOFSDirectory, and 4 differently configured Infinispan 
directories.
Only the FSDirectory is affected by the issue, and it can never complete the 
full hour of stress testing successfully, while all other implementations behave 
fine.

IndexWriter is set to MaxMergeDocs(5000) and setUseCompoundFile(false); the 
issue is revealed both when using SerialMergeScheduler and when using the default 
merger.

During the last execution the test managed to perform 22,192,006 searches and 
26,875 writes during the hour, but the benchmark is invalidated as one thread 
was killed by the exception.

If you deem it useful I'd be happy to contribute a similar test case to 
Lucene, but I assume you won't be excited about having such a long-running test. 
I'm open to ideas for building a simpler one.

  
 DirectoryReader.isCurrent might fail to see the segments file during 
 concurrent index changes
 -

 Key: LUCENE-2585
 URL: https://issues.apache.org/jira/browse/LUCENE-2585
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.3, 3.0, 3.0.1, 3.0.2
Reporter: Sanne Grinovero
 Fix For: 2.9.4, 3.0.3, 3.1


 I could reproduce the issue several times, but only by running long and 
 stressful benchmarks; the high number of files is likely part of the 
 scenario.
 All tests run on local disk, using ext3.
 Sample stacktrace:
 {noformat}java.io.FileNotFoundException: no segments* file found in 
 org.apache.lucene.store.NIOFSDirectory@/home/sanne/infinispan-41/lucene-directory/tempIndexName:
  files:
 _2l3.frq _uz.fdt _1q4.fnm _1q0.fdx _4bc.fdt _v2.tis _4ll.fdx _2l8.tii _ux.fnm 
 _3g7.fdx _4bb.tii _4bj.prx _uy.fdx _3g7.prx _2l7.frq _2la.fdt _3ge.nrm 
 _2l6.prx 
 _1py.fdx _3g6.nrm _v0.prx _4bi.tii _2l2.tis _v2.fdx _2l3.nrm _2l8.fnm 
 _4bg.tis _2la.tis _uu.fdx _3g6.fdx _1q3.frq _2la.frq _4bb.tis _3gb.tii 
 _1pz.tis 
 _2lb.nrm _4lm.nrm _3g9.tii _v0.fdt _2l5.fnm _v2.prx _4ll.tii _4bd.nrm 
 _2l7.fnm _2l4.nrm _1q2.tis _3gb.fdx _4bh.fdx _1pz.nrm _ux.fdx _ux.tii 
 _1q6.nrm 
 _3gf.fdx _4lk.fdt _3gd.nrm _v3.fnm _3g8.prx _1q2.nrm

Re: Proposal about Version API relaxation

2010-04-15 Thread Sanne Grinovero
Hello,
I think some compatibility breaks should really be accepted; otherwise
these requirements are going to kill technological advancement:
the effort spent on backwards compatibility will grow and become more
time-consuming and harder every day.

A major release won't happen every day, likely not even every year, so
it seems acceptable to have milestones defining compatibility
boundaries: you need to be able to reset the complexity curve
occasionally.

Backporting a feature would benefit from being merged into the correct
test suite, avoiding the explosion of this matrix-like backwards-compatibility
test suite. BTW, the current test suite is likely covering
all kinds of combinations which nobody actually uses or cares
about.

Also, if I were to discover a nice improvement in an Analyzer, and you
told me that to contribute it I would have to face this
amount of complexity... I would think twice before trying; honestly, the
current requirements are scary.

+1

Sanne

2010/4/15 Earwin Burrfoot ear...@gmail.com:
 I'd like to remind that Mike's proposal has stable branches.

 We can branch off preflex trunk right now and wrap it up as 3.1.
 Current trunk is declared as future 4.0 and all backcompat cruft is
 removed from it.
 If some new features/bugfixes appear in trunk, and they don't break
 stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3,
 etc

 Thus, devs are free to work without back-compat burden, bleeding edge
 users get their blood, conservative users get their stability + a
 subset of new features from stable branches.


 On Thu, Apr 15, 2010 at 22:02, DM Smith dmsmith...@gmail.com wrote:
 On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:

 First, the index format. IMHO, it is a good thing for a major release to
 be
 able to read the prior major release's index. And the ability to convert
 it
 to the current format via optimize is also good. Whatever is decided on
 this
 thread should take this seriously.


 Optimize is a bad way to convert to current.
 1. conversion is not guaranteed; optimizing an already-optimized index is a
 no-op
 2. it merges all your segments; if you use BalancedSegmentMergePolicy,
 that destroys your segment size distribution

 A dedicated upgrade tool (available both from the command line and
 programmatically) is a good way to convert to current.
 1. conversion happens exactly when you need it, and it happens for sure; no
 additional checks needed
 2. it should leave all your segments as is, only changing their format
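The dedicated-upgrade-tool idea — visit each segment and convert it in place without merging, so the segment size distribution survives — can be sketched as a toy. Everything here is invented for illustration (the "_N.seg" naming, the one-byte format header, the class name); a real tool would work against Lucene's actual segment formats:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexUpgradeSketch {

    static final byte CURRENT_FORMAT = 2;

    // Toy standalone upgrade pass: visit every "_N.seg" file and, if its
    // first byte (the format version in this made-up layout) is older than
    // CURRENT_FORMAT, rewrite that file in place. Each segment is converted
    // individually, so the index's segment structure is preserved.
    public static int upgrade(Path indexDir) {
        int upgraded = 0;
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(indexDir, "_*.seg")) {
            for (Path seg : ds) {
                byte[] data = Files.readAllBytes(seg);
                if (data.length > 0 && data[0] < CURRENT_FORMAT) {
                    data[0] = CURRENT_FORMAT;        // rewrite the header only
                    Files.write(seg, data);
                    upgraded++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return upgraded;
    }

    // Build a two-segment toy index (one old, one current) and upgrade it.
    public static int demo() {
        try {
            Path dir = Files.createTempDirectory("upgrade-demo");
            Files.write(dir.resolve("_0.seg"), new byte[] {1, 42});  // old format
            Files.write(dir.resolve("_1.seg"), new byte[] {2, 42});  // already current
            return upgrade(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("upgraded " + demo() + " segment(s)");
    }
}
```

Note how the second run over the same directory would upgrade nothing: the conversion is idempotent, unlike optimize(), which is a no-op on an already-optimized index regardless of format.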



 It is my observation, though possibly not correct, that core only has
 rudimentary analysis capabilities, handling English very well. To handle
 other languages well contrib/analyzers is required. Until recently it
 did
 not get much love. There have been many bw compat breaking changes
 (though
 w/ version one can probably get the prior behavior). IMHO, most of
 contrib/analyzers should be core. My guess is that most non-trivial
 applications will use contrib/analyzers.


 I counter - most non-trivial applications will use their own analyzers.
 The more modules - the merrier. You can choose precisely what you need.


 By and large an analyzer is a simple wrapper for a tokenizer and some
 filters. Are you suggesting that most non-trivial apps write their own
 tokenizers and filters?

 I'd find that hard to believe. For example, I don't know enough Chinese,
 Farsi, Arabic, Polish, ... to come up with anything better than what Lucene
 has to tokenize, stem or filter these.



 Our user base is those with ancient,
 underpowered laptops in third-world countries. On those machines it might
 take 10 minutes to create an index and during that time the machine is
 fairly unresponsive. There is no opportunity to do it in the
 background.


 Major Lucene releases (feature-wise, not version-wise) happen like
 once in a year, or year-and-a-half.
 Is it that hard for your users to wait ten minutes once a year?


  I said that was for one index. Multiply that by the number of books
 available (300+) and yes, it is too much to ask. Even if a small subset is
 indexed, say 30, that's around 5 hours of waiting.

 Under consideration is the frequency of breakage. Some are suggesting a
 greater frequency than yearly.

 DM

 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org





 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
 ICQ: 104465785







Re: Proposal about Version API relaxation

2010-04-15 Thread Sanne Grinovero
+1 on the Analyzers split,
But I would like to point out that it's not very different from having a
non-final static Version field.
It's just a much better solution, as you keep your code manageable.

2010/4/15 Grant Ingersoll gsing...@apache.org:

 On Apr 15, 2010, at 4:21 PM, Shai Erera wrote:

 +1 on the Analyzers as well.

 Earwin, I think I don't mind if we introduce migrate() elsewhere rather than 
 on IW. What I meant to say is that if we stick w/ index format back-compat 
 and ongoing migration, then such a method would be useful on IW for 
 customers to call to ensure they're on the latest version.
 But if the majority here agree w/ a standalone tool, then I'm ok if it sits 
 elsewhere.

 Grant, I'm all for 'just doing it and see what happens'. But I think we need 
 to at least decide what we're going to do so it's clear to everyone. Because 
 I'd like to know if I'm about to propose an index format change, whether I 
 need to build migration tool or not. Actually, I'd like to know if people 
 like Robert (basically those who have no problem to reindex and don't 
 understand the fuss around it) will want to change the index format - can I 
 count on them to be asked to provide such tool? That's to me a policy we 
 should decide on ... whatever the consequences.

 As I said, we should strive for index compatibility. Even in the past, we 
 said we did, but the implications weren't always clear.   I think index 
 compatibility is very important.  I've seen plenty of times where reindexing 
 is not possible.  But even then, you still have the option of testing to find 
 out whether you can update or not.  If you can't update, then don't until you 
 can figure out how to do it.  FWIW, I think our approach is much more 
 proactive than "see what happens".  I'd argue that, in the past, our approach 
 was "see what happens"; only the seeing didn't happen until after the 
 release!

 -Grant






Re: Having a default constructor in Analyzers

2010-02-08 Thread Sanne Grinovero
2010/2/8 Robert Muir rcm...@gmail.com:
8 snip 8

 how would this work when the Query analyzer differs from the Index analyzer?
 For example, using commongrams in solr means you use a different Query
 analyzer from Index analyzer, and there are some other use cases even in
 solr (synonyms expansion and things like that)
[snip]

They are two different Analyzer types, but I assume they want to use
the same value for Version, right? The same version that was used to
build the rest of the index.

Regards,
Sanne

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Having a default constructor in Analyzers

2010-02-08 Thread Sanne Grinovero
Hi Uwe,
yes, Hibernate is definitely recommending the Solr way for normal and
power users, but we're also taking care of beginners trying it out for
the first time: it should just work out of the box for a simple POC. In
those cases an Analyzer is defined as a global analyzer (used for all
cases where you're not overriding the default); it used to be
possible to specify a single Analyzer by fully qualified name, to be
used globally, or one per index. Of course this is far from the
flexibility needed for most real world applications, but it keeps
things simple for beginners taking a first look at introducing Lucene;
so in these cases I don't care much about the Version used; of course
it's important that they can later pin it down.
To be compatible I'll have to change the loader so that it looks
for a default constructor or a single-parameter Version
constructor; that should be good enough to accommodate the simple goal.
I'll read the Version from a configuration parameter, probably nailing
down the Version to the current latest and/or reading my own
environment parameter.
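A minimal, self-contained sketch of such a loader. The `Version` enum and the two analyzer classes below are local stand-ins invented for illustration, not the real Lucene types; only the reflection logic is the point: prefer a constructor taking Version, and fall back to a no-arg constructor.

```java
import java.lang.reflect.Constructor;

// Stand-ins for the Lucene types, just to keep the example self-contained.
enum Version { LUCENE_24, LUCENE_29, LUCENE_30 }

class VersionedAnalyzer {
    final Version version;
    public VersionedAnalyzer(Version v) { this.version = v; }
}

class LegacyAnalyzer {
    public LegacyAnalyzer() { }  // only a default constructor
}

public class AnalyzerLoader {
    // Prefer a (Version) constructor; fall back to the no-arg one.
    static Object load(String className, Version defaultVersion) throws Exception {
        Class<?> clazz = Class.forName(className);
        try {
            Constructor<?> c = clazz.getConstructor(Version.class);
            return c.newInstance(defaultVersion);        // preferred: explicit Version
        } catch (NoSuchMethodException e) {
            return clazz.getConstructor().newInstance(); // fall back to no-arg ctor
        }
    }

    public static void main(String[] args) throws Exception {
        Object a = load("VersionedAnalyzer", Version.LUCENE_29);
        Object b = load("LegacyAnalyzer", Version.LUCENE_29);
        System.out.println(((VersionedAnalyzer) a).version); // LUCENE_29
        System.out.println(b.getClass().getSimpleName());    // LegacyAnalyzer
    }
}
```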

I agree about the factory strategy; in fact it's on HSEARCH-457 since
right before my emails here; I asked here to check we could keep it
simple :-)

Thanks all,
Sanne

2010/2/8 Uwe Schindler u...@thetaphi.de:
 Simon:
 Sanne, I would recommend you building a Factory pattern around your
 Analyzers / TokenStreams similar to what solr does. That way you can
 load your own default ctor interface via reflection and obtain your
 analyzers from those factories. That makes more sense anyway as you
 only load the factory via reflection and not the analyzers.

 As far as I see, Hibernate uses Solr Factories.  On the other hand, you can 
 instead of creating your own SolrAnalyzer also use a standard one from 
 Lucene (you can do this in Solr, too):

 http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#analyzer

 In my opinion, the Factory pattern is ok for own analyzer definitions. For 
 reusing standard analyzers like StandardAnalyzer or TurkishAnalyzer, 
 the ideal case is to use the reflection code I proposed before. This code 
 works for all language-based analyzers having a standard ctor or Version 
 ctor. Solr will also handle this reflection-based instantiation with optional 
 Version parameter in future, too (Erik Hatcher pointed that out to me, when 
 working on SOLR-1677: Another comment on this... Solr supports using an 
 Analyzer also, but only ones with zero-arg constructors. It would be nice if 
 this Version support also allowed for Analyzers (say SmartChineseAnalyzer) to 
 be used also directly. I don't think this patch accounts for this case, does 
 it?).

 As Hibernate also uses the factory pattern for custom analyzers, as soon as 
 https://issues.apache.org/jira/browse/SOLR-1677 is in, the version problem 
 for those should be solved, too (as you can specify the parameter to each 
 component). But Hibernate should also think about a global default Version 
 (like Solr via CoreAware or like that), that is used as a default param to 
 all Tokenizers/TokenFilters and when reflection-based Analyzer subclass 
 instantiation is used.

 By the way, hibernate's reuse of Solr's schema is one argument of Hoss, not 
 to make it CoreAware.

 Uwe


 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Having a default constructor in Analyzers

2010-02-07 Thread Sanne Grinovero
Hello,
I've seen that some core Analyzers are now missing a default
constructor; this prevents many applications from configuring/loading
Analyzers by reflection, which is a common use case when Analyzers
are chosen in configuration files.

Would it be possible to add, for example, a constructor like

public StandardAnalyzer() {
   this(Version.LUCENE_CURRENT);
}

?

Of course more advanced use cases would need to pass parameters but
please make the advanced usage optional; I have now seen more than a
single project break because of this (and revert to older Lucene).

Regards,
Sanne

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org




Re: Having a default constructor in Analyzers

2010-02-07 Thread Sanne Grinovero
Thanks for all the quick answers;

finding the ctor having only a Version parameter is fine for me; I had
noticed this frequent pattern but didn't realize it was a
general rule.
So can I assume this is an implicit contract for all Analyzers, to
have either an empty ctor or a single-parameter ctor of type Version?

I know about the dangers of using LUCENE_CURRENT, but rebuilding the
index is not always something you need to avoid.
Having LUCENE_CURRENT is for example useful for me to test Hibernate
Search against the current Lucene on the classpath, without having to
rebuild the code.

thanks for all help,
Sanne


2010/2/7 Robert Muir rcm...@gmail.com:
 I propose we remove LUCENE_CURRENT completely, as soon as TEST_VERSION is
 done.

 On Sun, Feb 7, 2010 at 12:53 PM, Uwe Schindler u...@thetaphi.de wrote:

 Hi Sanne,

 Exactly that usage we want to prevent. Using Version.LUCENE_CURRENT is the
 worst thing you can do if you want to later update your Lucene version and
 do not want to reindex all your indexes (see javadocs).

 It is easy to modify your application to create analyzers even from config
 files using the reflection way. Just find a constructor taking Version and
 call newInstance() on it, not directly on the Class. It's just one line of
 code more.

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

  -Original Message-
  From: Sanne Grinovero [mailto:sanne.grinov...@gmail.com]
  Sent: Sunday, February 07, 2010 6:33 PM
  To: java-dev@lucene.apache.org
  Subject: Having a default constructor in Analyzers
 
  Hello,
  I've seen that some core Analyzers are now missing a default
  constructor; this prevents many applications from configuring/loading
  Analyzers by reflection, which is a common use case when Analyzers
  are chosen in configuration files.
 
  Would it be possible to add, for example, a constructor like
 
  public StandardAnalyzer() {
     this(Version.LUCENE_CURRENT);
  }
 
  ?
 
  Of course more advanced use cases would need to pass parameters but
  please make the advanced usage optional; I have now seen more than a
  single project break because of this (and revert to older Lucene).
 
  Regards,
  Sanne
 
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org




 --
 Robert Muir
 rcm...@gmail.com


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Having a default constructor in Analyzers

2010-02-07 Thread Sanne Grinovero
Does it make sense to use different values across the same
application? Obviously in the unlikely case you want to treat
different indexes in a different way, but does it make sense when
working all on the same index?
If not, why not introduce a value like Version.BY_ENVIRONMENT which
is statically initialized to be one of the other values, reading from
an environment parameter?
So you get the latest at first deploy, and can then keep compatibility
as long as you need, even when updating Lucene.
This way I could still have the safety of pinning down a specific
version and yet avoid rebuilding the app when changing it.
Of course the default would be LUCENE_CURRENT, so that people trying
out Lucene get all features out of the box, and warn about setting it
(maybe log a warning when not set).
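A hypothetical sketch of that idea. The property name `lucene.match.version` and the local Version enum below are invented for illustration; nothing like this exists in Lucene:

```java
// Stand-in for Lucene's Version, just to keep the sketch self-contained.
enum Version { LUCENE_24, LUCENE_29, LUCENE_CURRENT }

public final class EnvironmentVersion {
    // Resolve the default match version from the environment instead of a recompile.
    static Version resolve() {
        String v = System.getProperty("lucene.match.version");
        if (v == null) {
            // Nothing configured: latest features out of the box, but warn the user
            // that index compatibility across upgrades is not pinned down.
            System.err.println("WARN: lucene.match.version not set, using LUCENE_CURRENT");
            return Version.LUCENE_CURRENT;
        }
        return Version.valueOf(v);
    }
}
```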

Also, wouldn't it make sense to be able to read the recommended
version from the Index?
I'd like to have the hypothetical AnalyzerFactory to find out what it
needs to build getting information from the relevant IndexReader; so
in the case I have two indexes using different versions I won't get
mistakes. (For a query on index A I'm creating a QueryParser, so let's
ask the index which kind of QueryParser I should use...)

just some ideas, forgive me if I misunderstood this usage (I should
avoid writing late at night...)
Regards,
Sanne



2010/2/7 Simon Willnauer simon.willna...@googlemail.com:
 On Sun, Feb 7, 2010 at 8:38 PM, Robert Muir rcm...@gmail.com wrote:
 Simon, can you explain how removing CURRENT makes it harder for users to
 upgrade? If you mean for the case of people that always re-index all
 documents when upgrading lucene jar, then this makes sense to me.
 That is what I was alluding to!
 Not much of a deal, though: most IDEs let you upgrade easily via
 refactoring, and we can document this too. Yet we won't have a drop-in
 upgrade anymore.


 I guess as a step we can at least deprecate this thing and strongly
 discourage its use, please see the patch at LUCENE-2080.

 Not to pick on Sanne, but his wording about: Of course more advanced use
 cases would need to pass parameters but please make the advanced usage
 optional, this really caused me to rethink CURRENT, because CURRENT itself
 should be the advanced use case!!!

 On Sun, Feb 7, 2010 at 2:34 PM, Simon Willnauer
 simon.willna...@googlemail.com wrote:

  Sanne, I would recommend you building a Factory pattern around your
  Analyzers / TokenStreams similar to what solr does. That way you can
  load your own default ctor interface via reflection and obtain your
  analyzers from those factories. That makes more sense anyway as you
  only load the factory via reflection and not the analyzers.

 @Robert: I don't know if removing LUCENE_CURRENT is the way to go. On
  the one hand it would make our lives easier over time, but would make it
 harder for our users to upgrade. I would totally agree that for
 upgrade safety it would be much better to enforce an explicit version
 number so upgrading can be done step by step. Yet, if we deprecate
 LUCENE_CURRENT people will use it for at least the next 3 to 5 years
 (until 4.0) anyway :)

 simon

 On Sun, Feb 7, 2010 at 8:17 PM, Sanne Grinovero
 sanne.grinov...@gmail.com wrote:
  Thanks for all the quick answers;
 
   finding the ctor having only a Version parameter is fine for me; I had
   noticed this frequent pattern but didn't realize it was a
   general rule.
   So can I assume this is an implicit contract for all Analyzers, to
   have either an empty ctor or a single-parameter ctor of type Version?
 
  I know about the dangers of using LUCENE_CURRENT, but rebuilding the
  index is not always something you need to avoid.
  Having LUCENE_CURRENT is for example useful for me to test Hibernate
   Search against the current Lucene on the classpath, without having to
  rebuild the code.
 
  thanks for all help,
  Sanne
 
 
  2010/2/7 Robert Muir rcm...@gmail.com:
  I propose we remove LUCENE_CURRENT completely, as soon as TEST_VERSION
  is
  done.
 
  On Sun, Feb 7, 2010 at 12:53 PM, Uwe Schindler u...@thetaphi.de wrote:
 
  Hi Sanne,
 
  Exactly that usage we want to prevent. Using Version.LUCENE_CURRENT is
  the
   worst thing you can do if you want to later update your Lucene
  version and
  do not want to reindex all your indexes (see javadocs).
 
  It is easy to modify your application to create analyzers even from
  config
  files using the reflection way. Just find a constructor taking Version
  and
  call newInstance() on it, not directly on the Class. It's just one
  line of
  code more.
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
   -Original Message-
   From: Sanne Grinovero [mailto:sanne.grinov...@gmail.com]
   Sent: Sunday, February 07, 2010 6:33 PM
   To: java-dev@lucene.apache.org
   Subject: Having a default constructor in Analyzers
  
   Hello,
   I've seen that some core Analyzers are now missing a default
   constructor

Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2010-01-20 Thread Sanne Grinovero
thanks for the heads-up, this is good to know.
I've updated http://wiki.apache.org/lucene-java/AvailableLockFactories
which I recently created as a guide to help in choosing between
different LockFactories.

I believe the Native LockFactory is very useful; I wouldn't consider
this a bug, nor consider discouraging its use. People just need to be
informed of the behavior and know that no LockFactory impl is good for
all cases.

Adding some lines to its javadoc seems appropriate.

Regards,
Sanne

2010/1/20 Chris Hostetter hossman_luc...@fucit.org:

 :  At a minimum, shouldn't NativeFSLock.obtain() be checking for
 :  OverlappingFileLockException and treating that as a failure to acquire the
 :  lock?
        ...
 : Perhaps - that should make it work in more cases - but in my simple
 : testing its not 100% reliable.
        ...
 : File locks are held on behalf of the entire Java virtual machine.
 :      * They are not suitable for controlling access to a file by multiple
 :      * threads within the same virtual machine.

 ...Grrr  so where does that leave us?

 Yonik's added comment was that native isn't recommended when running
 multiple webapps in the same container.  In truth, native *can*
 work when running multiple webapps in the same container, just as long as
 those containers don't reference the same data dirs

 I'm worried that we should recommend people avoid native altogether
 because even if you are only running one webapp, it seems like a reload
 of that app could trigger some similar bad behavior.

 So what/how should we document all of this?

 -Hoss




Re: Lucene memory consumption

2010-01-19 Thread Sanne Grinovero
Hello Frederic,
I'm CCing java-dev@lucene.apache.org as Michael McCandless has been
very helpful on IRC in discussing the ThreadLocal implication, and it
would be nice you could provide first-hand information.

There's a good reading to start from at
http://issues.apache.org/jira/browse/LUCENE-1383
Basically the problem with your proposal is that when you
close the ThreadLocal it's only going to clean up the resources stored
by the current thread, not by others; setting the reference to null
also won't help:
Quoting the TLocal source's comment:
* However, since reference queues are not
 * used, stale entries are guaranteed to be removed only when
 * the table starts running out of space.

About your issues:
 1. A ThreadLocal object should normally be a singleton used as key to the 
 thread map. Here it is repeatedly created and destroyed!
It's only built in the constructor, and destroyed on close. So its
lifecycle is linked to the Analyzer / FieldCache using it, probably a
long time, or the appropriate time to clean things up.

 2. Setting t = null; is not affecting the garbage collection of the 
 ThreadLocal map since t is the key (hard ref) of the thread map.
Well t is unfortunately being reused as a variable name: t = null;
is clearing the reference to the threadlocal, which really is the key
of the map used by the threadlocal and referenced by the current
Thread instance, and TLocal uses weak *Keys* not values (and the key
is the TLocal itself).

 3. There is no call to t.remove(), which would really clean the Map entry.
You could add one, but it would only clean up the garbage from the
current thread; so it's ok but not enough. The current impl is making
sure all stuff is collected by wrapping it all in weak values.
Actually some stuff is not collected: the WeakReferences themselves,
but they point to going-to-be-collected stuff. These WeakReferences are
going to be removed when the TLocal table is full, and should be
harmless (?).
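The per-thread behavior of remove() can be seen with a small self-contained demo (plain java.lang.ThreadLocal, no Lucene involved): remove() clears only the calling thread's entry, while values stored by other threads stay reachable.

```java
import java.util.concurrent.CountDownLatch;

// Demonstrates that ThreadLocal.remove() only clears the CALLING thread's entry.
public class ThreadLocalRemoveDemo {
    static final ThreadLocal<String> TL = new ThreadLocal<String>();
    static volatile String seenByWorkerAfterRemove;

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch stored = new CountDownLatch(1);
        final CountDownLatch removed = new CountDownLatch(1);

        Thread worker = new Thread(new Runnable() {
            public void run() {
                TL.set("worker-value");
                stored.countDown();
                try { removed.await(); } catch (InterruptedException ignored) { }
                // main has called remove() by now, but only on ITS OWN entry:
                seenByWorkerAfterRemove = TL.get();
            }
        });
        worker.start();
        stored.await();

        TL.set("main-value");
        TL.remove();            // clears the main thread's entry only
        removed.countDown();
        worker.join();

        System.out.println(TL.get());                  // null (main's entry removed)
        System.out.println(seenByWorkerAfterRemove);   // worker-value (still there)
    }
}
```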
As you pointed out, since Lucene 3 it's releasing what it can
release eagerly, but it's only a slight optimization: you still
need the weak/hard-ref trick to clean the other values.

 4. A ThreadLocal Map is already a WeakReference for the value.
No, it's on the keys: a collected ThreadLocal will be cleaned up for...
eventually :-/

 5. Leaving objects on a ThreadLocal after it is out of your control is bad 
 practice. Another task may reuse the Thread and find dirty objects there.
Agree, but with weak values it's not a big issue. Also it's not
meant for the faint-hearted; just people writing their own
Analyzer could get this wrong :)

 6. We found (in all our tests) the hardRef Map to be completely unnecessary 
 in Lucene 2.4.1, but here I'm lacking more in depth knowledge of the 
 lifecycle of the objects added to this CloseableThreadLocal.
Well, as it's being used as a cache, functionality will be the same
but performance should be worse. AFAIK all TokenFilters are able to
rebuild what they need when get() returns null, you might have a
problem on the unlikely case of
org.apache.lucene.util.CloseableThreadLocal:68 having the assertion
fail, but again not affecting functionality (assuming assertions are
disabled).

A vanilla ThreadLocal is obviously faster than this, but then we end
up reverting LUCENE-1383 and so introducing more pressure on the GC.

It would be very interesting to find out why your implementation is
performing better? Maybe because in your case Analyzers are used by
one thread at a time, and so you're not leaking memory?
Could you tell more about this to lucene-dev directly?

Regards,
Sanne

2010/1/6 Frederic Simon fr...@jfrog.org:
 Thanks Emmanuel,
 Yes the main issue is that the hardRef map in this class was forcing all the
 objects to go to the Old generation space in the JVM GC, instead of staying
 at a ThreadLocal level. So, all objects put in the CloseableThreadLocal were
 GC'ed only on full GC. On heavy lucene usage, it generated around 500Mb of heap
 for each 5 secs until full GC kicked in. Our problem is that we rely a lot
 on SoftReference for our cache and so this Lucene behavior is really bad for
 us (Customer feedback:
 http://old.nabble.com/What's-the-memory-requirements-for-2.1.3--to27026622.html#a27026622
 ).
 With my class all objects stay in young gen and so the performance boost for
 us was huge.

 The issues with the class:

 A ThreadLocal object should normally be a singleton used as key to the
 thread map. Here it is repeatedly created and destroyed!
 Setting t = null; is not affecting the garbage collection of the
 ThreadLocal map since t is the key (hard ref) of the thread map.
 There are no call to t.remove() which will really clean the Map entry.
 A ThreadLocal Map is already a WeakReference for the value.
 Leaving objects on a ThreadLocal after it is out of your control is bad
 practice. Another task may reuse the Thread and find dirty objects there.
 We found (in all our tests) the hardRef Map

Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts

2010-01-15 Thread Sanne Grinovero
A common error I see is that people assume the IndexWriter is not
thread-safe, and open several different instances.
You should use just one IndexWriter, keep it open and flush
periodically (don't commit on each add operation), and read the Lucene
wiki pages about the IndexWriter settings like ramBufferSize. That way
there's only one lock and no contention from different threads.
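The threading pattern can be sketched like this. Note this illustrates the pattern, not the real Lucene API: the writer below is a stub standing in for IndexWriter so the example is self-contained.

```java
import java.util.concurrent.atomic.AtomicLong;

// Every indexing thread shares ONE writer instance; commits are periodic, not per-add.
public class SharedWriterDemo {
    // Stand-in for a thread-safe IndexWriter, just counting operations.
    static class StubWriter {
        final AtomicLong added = new AtomicLong();
        final AtomicLong commits = new AtomicLong();
        void addDocument(String doc) { added.incrementAndGet(); }
        synchronized void commit() { commits.incrementAndGet(); }
    }

    static final StubWriter WRITER = new StubWriter();  // one shared instance
    static final int COMMIT_INTERVAL = 100;             // commit every N adds

    static void index(String doc) {
        WRITER.addDocument(doc);
        // Commit periodically, not on every add: commits are expensive and
        // internally synchronized, so per-add commits kill concurrency.
        if (WRITER.added.get() % COMMIT_INTERVAL == 0) {
            WRITER.commit();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int d = 0; d < 250; d++) index("doc");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        WRITER.commit();  // final commit to publish everything
        System.out.println(WRITER.added.get());   // 1000
        System.out.println(WRITER.commits.get()); // far fewer than 1000
    }
}
```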

There's an explanation of the fastest design I could get here:
http://in.relation.to/Bloggers/HibernateSearch32FastIndexRebuild
It's describing the procedure used by Hibernate Search for rebuilding
the Lucene index from an Hibernate mapped database.

While I recommend it as reading for newcomers, I'd also appreciate feedback
and comments from Lucene experts and developers :-)

Regards,
Sanne

2010/1/14 Michael McCandless luc...@mikemccandless.com:
 Calling commit after every addition will drastically slow down your
 indexing throughput, and concurrency (commit is internally
 synchronized), but should not create lock timeouts, unless you are
 also opening a new IndexWriter for every addition?

 Mike

 On Thu, Jan 14, 2010 at 12:15 PM, jchang jchangkihat...@gmail.com wrote:

 With only 10 concurrent consumers, I do get lock problems.  However, I am
 calling commit() at the end of each addition.  Could I expect better
 concurrency without timeouts if I did not commit as often?

 --
 View this message in context: 
 http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-lock-timeouts-tp27136743p27164797.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: update doc by query

2010-01-11 Thread Sanne Grinovero
Then I wouldn't need it, and I can still improve performance by using
periodic commits: nice!
thanks for explaining this,

Sanne

On Mon, Jan 11, 2010 at 10:57 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 On Sun, Jan 10, 2010 at 6:13 PM, Sanne Grinovero
 s.grinov...@sourcesense.com wrote:
 Even if it's not strictly needed anymore, could it improve performance?

 I think there should be no real performance gains/losses one way or another.

 The current updateDocument call basically boils down to delete then add.

 Right now I need to use commit() right after this dual operation to
 make sure no reader is ever going to miss it

 You don't need to use commit() right after -- you can use commit any
 time later and both the del  add will be present.

 but if it was atomic I
 could have avoided the commit and just trust that at some time later
 it will be auto-committed: exact moment would be out of my control,
 but even so the view on index wouldn't have a chance to miss some
 documents.

 Lucene no longer auto-commits -- your app completely controls when to
 commit, so, I think the atomic-ness is unecessary?

 Mike

 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org





-- 
Sanne Grinovero
http://in.relation.to/Bloggers/Sanne
Sourcesense - making sense of Open  Source: http://www.sourcesense.com

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: update doc by query

2010-01-10 Thread Sanne Grinovero
If demand is the problem:
I would really love that: in most scenarios a single term is not
enough to identify a Document; I need at least two, so I usually use
remove-by-query first and then add again.
This sometimes needs an application-level lock to make the changes consistent.
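A self-contained sketch of that remove-by-query-then-add pattern, with a plain list standing in for the index and a ReentrantLock as the application-level lock. All names here are illustrative, not Lucene API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class UpdateByQueryDemo {
    static final List<String> INDEX = new ArrayList<String>();  // stand-in for the index
    static final ReentrantLock UPDATE_LOCK = new ReentrantLock();

    // Delete every doc matching BOTH terms, then add the replacement.
    // The lock makes delete+add behave as one update with respect to
    // other updaters, which is what an application-level lock buys here.
    static void updateByQuery(String termA, String termB, String newDoc) {
        UPDATE_LOCK.lock();
        try {
            List<String> survivors = new ArrayList<String>();
            for (String doc : INDEX) {
                if (!(doc.contains(termA) && doc.contains(termB))) survivors.add(doc);
            }
            INDEX.clear();
            INDEX.addAll(survivors);
            INDEX.add(newDoc);  // add happens before any other updater can interleave
        } finally {
            UPDATE_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        INDEX.add("id:1 type:book title:old");
        INDEX.add("id:2 type:cd title:other");
        updateByQuery("id:1", "type:book", "id:1 type:book title:new");
        System.out.println(INDEX);  // the id:2 doc plus the updated id:1 doc
    }
}
```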

Sanne

2010/1/10 Mark Miller markrmil...@gmail.com:
 Any reason we don't offer update doc by query along with term?

 It's easy enough to implement in the same manner - is there some sort of
 gotcha with this, or is it just because there has been no demand yet?

 --
 - Mark

 http://www.lucidimagination.com




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: update doc by query

2010-01-10 Thread Sanne Grinovero
Even if it's not strictly needed anymore, could it improve performance?
Right now I need to call commit() right after this dual operation to
make sure no reader will ever miss it, but if it were atomic I
could avoid the commit and just trust that at some later time
it would be auto-committed: the exact moment would be out of my control,
but even so the view on the index wouldn't have a chance to miss some
documents.

Regards,
Sanne

On Sun, Jan 10, 2010 at 10:04 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 I think there no particular demand...

 But: why not just separately delete by query, then add?

 Back when IW had autoCommit=true, it was compelling to have an atomic
 update, but now with only autoCommit=false, the app has full control
 over visibility to readers, so do we even need update-by-term anymore?

 Mike

 On Sun, Jan 10, 2010 at 2:13 PM, Sanne Grinovero
 sanne.grinov...@gmail.com wrote:
 If the demand is the problem:
 I would really love that: in most scenarios a single term is not
  enough to identify a Document; I need at least two, so I usually use
  remove-by-query first and then add again.
 This sometimes needs some application level lock to make the changes 
 consistent.

 Sanne

 2010/1/10 Mark Miller markrmil...@gmail.com:
 Any reason we don't offer update doc by query along with term?

  It's easy enough to implement in the same manner - is there some sort of
  gotcha with this, or is it just because there has been no demand yet?

 --
 - Mark

 http://www.lucidimagination.com




 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org





-- 
Sanne Grinovero
http://in.relation.to/Bloggers/Sanne
Sourcesense - making sense of Open  Source: http://www.sourcesense.com

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: nightly build deploy to Maven repositories

2009-12-12 Thread Sanne Grinovero
I would be happy with 3.0.1-SNAPSHOT too; that would also fix my problem.
Will I have to wait for the next release before I can share my patches?

Best Regards,
Sanne Grinovero

2009/12/3 Sanne Grinovero sanne.grinov...@gmail.com:
 Hello,
 I need to depend on some recently committed bugfixes from Lucene's
 2.9 branch in other OSS projects, using Maven2 for dependency
 management.

 Are there snapshots uploaded somewhere regularly? Could Hudson do that?
 Looking into Hudson, it appears that it regularly builds trunk;
 wouldn't it be a good idea to have it also verify the 2.9 branch
 while it's actively updated?

 Regards,
 Sanne


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Lucene 2.4.1 src .zip issue

2009-12-10 Thread Sanne Grinovero
Hello Erik,
I just downloaded it from:
http://archive.apache.org/dist/lucene/java/lucene-2.4.1-src.zip
Size: 5.9 MB (6134777 bytes)

I'm getting no errors, using UnZip 6.00 of 20 April 2009, by Debian,
on Debian 64-bit.
If you're downloading from the same source, could you try again?

Best Regards,
Sanne Grinovero

2009/12/10 Erik Hatcher erik.hatc...@gmail.com:
 I was doing some research on past releases of Lucene and downloaded the
 archived 2.4.1 src .zip and got this:

 ~/Downloads: unzip lucene-2.4.1-src.zip
 Archive:  lucene-2.4.1-src.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
 unzip:  cannot find zipfile directory in one of lucene-2.4.1-src.zip or
        lucene-2.4.1-src.zip.zip, and cannot find lucene-2.4.1-src.zip.ZIP,
 period.

 Yikes!

 Anyone else have issues with it?   Or anomalous to my download?

        Erik


 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Upgrading Lucene jars

2009-12-10 Thread Sanne Grinovero
I'm not using Embedded Solr directly, but I've seen several projects
that depend on Lucene as a Maven artifact
and also include a dependency on some Solr module as a general
utility, for example to use some Solr analyzers.
Let's say you had Lucene 2.4.1: when adding solr-analyzers version
1.3.0 to the mix it appears to work well in testing, until the
classloading order changes in an application server and you find
out that Maven has also pulled in the solr-lucene-core artifact,
which looks fine unless you know what's in there.
The poor developer could have a hard time finding out that he has
two artifacts with different identifiers and different jar
names containing the same code at different versions, after noticing some
undefined field or method.

I've learnt the lesson, so I'm not speaking to help myself, but I think it
would be an improvement and make life easier for others; Maven should
take care of this but it's actually giving a false feeling of
confidence in this case.

Regards,
Sanne

2009/12/9 Shalin Shekhar Mangar shalinman...@gmail.com:
 On Wed, Dec 9, 2009 at 3:33 PM, Sanne Grinovero
 sanne.grinov...@gmail.comwrote:

 Why is Solr not depending directly on Lucene but repackaging the same
 classes?


 Solr does depend on Lucene jars. We strive to package officially released
 Lucene artifacts but sometimes the release schedule of Lucene and Solr are
 different enough to build and package Lucene jars ourselves. The CHANGES.txt
 in the Solr distribution has the version of Lucene used in that
 distribution. For example, Solr 1.4 released with Lucene 2.9.1

 Solr 1.4 has already released and we are free to upgrade Lucene jars in
 trunk to any version we desire for further development.


 Sorry I've probably missed some important discussion. Whatever the
 reason for this decision, is it still a good reason?

 This gets new users in a hell of trouble sometimes, as some
 applications introduce Solr after having Lucene already on the
 classpath and it's not immediately obvious that differently named jars
 contain same named classes.


 Are you using Embedded Solr? Otherwise the Lucene jars are in the solr.war's
 WEB-INF/lib directory and there is no chance of a conflict.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Upgrading Lucene jars

2009-12-09 Thread Sanne Grinovero
Why is Solr not depending directly on Lucene but repackaging the same classes?

Sorry I've probably missed some important discussion. Whatever the
reason for this decision, is it still a good reason?

This gets new users in a hell of trouble sometimes, as some
applications introduce Solr after having Lucene already on the
classpath and it's not immediately obvious that differently named jars
contain same named classes.
Could this be a good timeframe to change this?

Regards,
Sanne

2009/12/8 Koji Sekiguchi k...@r.email.ne.jp:
 Shalin Shekhar Mangar wrote:

 I need to upgrade contrib-spellcheck jar for SOLR-785. Should I go ahead
 and
 upgrade all Lucene jars to the latest 2.9 branch code?



 +1.

 Koji

 --
 http://www.rondhuit.com/en/




[jira] Commented: (LUCENE-2095) Document not guaranteed to be found after write and commit

2009-12-04 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785943#action_12785943
 ] 

Sanne Grinovero commented on LUCENE-2095:
-

Thanks a lot Michael, this makes my distributed testing reliable again :-)

I see you didn't apply my testcase; do you think such a test isn't needed?
If needed, I can change it as you wish.

 Document not guaranteed to be found after write and commit
 --

 Key: LUCENE-2095
 URL: https://issues.apache.org/jira/browse/LUCENE-2095
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.4.1, 2.9.1
 Environment: Linux 64bit
Reporter: Sanne Grinovero
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2095.patch, lucene-stresstest.patch


 after same email on developer list:
 I developed a stress test to assert that a new document containing a
 specific term X is always found after a commit on the IndexWriter.
 This works most of the time, but it fails under load in rare occasions.
 I'm testing with 40 Threads, both with a SerialMergeScheduler and a
 ConcurrentMergeScheduler, all sharing a common IndexWriter.
 Attached testcase is using a RAMDirectory only, but I verified a
 FSDirectory behaves in the same way so I don't believe it's the
 Directory implementation or the MergeScheduler.
 This test is slow, so I don't consider it a functional or unit test.
 It might give false positives: it doesn't always fail, sorry I
 couldn't find out how to make it more likely to happen, besides
 scheduling it to run for a longer time.
 I tested this to affect versions 2.4.1 and 2.9.1;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



nightly build deploy to Maven repositories

2009-12-03 Thread Sanne Grinovero
Hello,
I'm needing to depend on some recently committed bugfix from Lucene's
2.9 branch in other OSS projects, using Maven2 for dependency
management.

Are there snapshots uploaded somewhere regularly? Could Hudson do that?
Looking into Hudson it appears that it regularly builds trunk;
wouldn't it be a good idea to have it also verify the 2.9 branch
for as long as it's actively updated?

Regards,
Sanne

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Solr 1.5 - The Cloud Edition!

2009-12-03 Thread Sanne Grinovero
Hello Yonik,
that's very interesting, I'm working since some time on the Infinispan
based Lucene Directory,
have you seen my announcement on dev-lucene? I didn't dare to cross-post.
Again the link:
http://www.jboss.org/community/wiki/InfinispanasaDirectoryforLucene

It's an implementation to distribute the index to a dynamic cluster,
and Infinispan enables autodiscovery even on the cloud.
I'm only focusing on the Directory and LockManager; the Directory is
working but still needs some polishing and profiling, while
I can trust the LockFactory, as it has survived some good stress tests
and already performs well.
I didn't know about ZooKeeper, and in my plans the index sharding
would have been transparent to Lucene;
It shouldn't be hard to have non-transparent sharding on top of it if
you need that, and the low-level distribution is totally configurable.
It's also nice it can scale down to zero-nodes, persisting the in
memory distributed state to something else (some plugins provided,
like JDBC or S3 stores).

Regards,
Sanne

2009/12/4 Yonik Seeley yo...@lucidimagination.com:
 I hereby dub Solr 1.5 The Cloud Edition!
 (of course anyone else may also dub it anything else they so choose ;-)

 There's lots of prototypes and great work floating around that aim to
 increase the practical scalability and ease of cluster management of
 Solr.  I did some brainstorming myself of how we could use zookeeper
 on the flight to ApacheCon US last month, and had a number of
 discussions with various people while there.  I'm going over those
 notes and adding some stuff to a new wiki page:

 http://wiki.apache.org/solr/SolrCloud

 Of course the main issue is at https://issues.apache.org/jira/browse/SOLR-1277
 And there is already another wiki page
 http://wiki.apache.org/solr/ZooKeeperIntegration

 I started a new page for myself because I'm not sure we're all in sync
 yet and didn't want to get into competitive editing :-)
 Anyway, I think this is going to be a big enough issue with
 potentially a ton of discussion, and we should perhaps use the mailing
 lists for general design discussions rather than forcing everything
 into a single JIRA issue (which doesn't deal well with huge threads).

 -Yonik
 http://www.lucidimagination.com



Re: Socket and file locks

2009-11-29 Thread Sanne Grinovero
Hello,

I'm glad you appreciate it; I've added the Wiki page here:
http://wiki.apache.org/lucene-java/AvailableLockFactories

I avoided on purpose to copy-paste the full javadocs of each
implementation as that would be out-of-date or too specific to some
version, I limited myself to writing some words to highlight the
differences as a quick overview of what is available.
hope you like it, I'm open to suggestions.
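To make the trade-offs listed on that page concrete, here is a minimal sketch of what a native (OS-level) file lock does — the main idea behind native-lock implementations, as opposed to plain lock files. This is plain `java.nio` with no Lucene dependency; the class name and API are illustrative, not any actual LockFactory:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Minimal sketch of an OS-level write lock: the operating system releases
// the lock automatically if the owning process dies, which is the main
// advantage native locks have over "lock file exists" schemes.
public class NativeLockSketch {
    private RandomAccessFile raf;
    private FileLock lock;

    // Try to acquire the lock; false means another process holds it.
    public boolean obtain(File lockFile) throws IOException {
        raf = new RandomAccessFile(lockFile, "rw");
        lock = raf.getChannel().tryLock();
        return lock != null;
    }

    public void release() throws IOException {
        if (lock != null) lock.release();
        if (raf != null) raf.close();
    }
}
```

A simple-lock-file factory, by contrast, treats the mere existence of the file as the lock, which is why a stale write.lock can survive a crash and has to be cleaned up by hand.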

Regards,
Sanne


2009/11/29 Michael McCandless luc...@mikemccandless.com:
 This looks great!

 Maybe it makes most sense to create a wiki page
 (http://wiki.apache.org/lucene-java) for interesting LockFactory
 implementations/tradeoffs, and add this there?

 Mike

 On Sat, Nov 28, 2009 at 9:26 AM, Sanne Grinovero
 sanne.grinov...@gmail.com wrote:
 Hello,
 Together with the Infinispan Directory we developed such a
 LockFactory; I'd be more than happy if you wanted to add some pointers
 to it in the Lucene documentation/readme.
 This depends on Infinispan for multiple-machines communication
 (JGroups, indirectly) but
 it's not required to use an Infinispan Directory, you could combine it
 with a Directory impl of choice.
 This was tested with the LockVerifyServer mentioned by Michael
 McCandless and also
 with some other tests inspired from it (in-VM for lower delay
 coordination and verify, while the LockFactory was forced to
 use real network communication).

 While this is a technology preview and performance regarding the
 Directory code is still unknown, I believe the LockFactory was the
 most tested component.

 free to download and inspect (LGPL):
 http://anonsvn.jboss.org/repos/infinispan/trunk/lucene-directory/

 Regards,
 Sanne

 2009/11/27 Michael McCandless luc...@mikemccandless.com:
 I think a LockFactory for Lucene that implemented the ideas you &
 Marvin are discussing in LUCENE-1877, and/or the approach you
 implemented in the H2 DB, would be a useful addition to Lucene!

 For many apps, the simple LockFactory impls suffice, but for apps
 where multiple machines can become the writer, it gets hairy.  Having
 an always correct Lock impl for these apps would be great.

 Note that Lucene has some basic tools (in oal.store) for asserting
 that a LockFactory is correct (see LockVerifyServer), so it's a useful
 way to test that things are working from Lucene's standpoint.

 Mike

 On Fri, Nov 27, 2009 at 9:23 AM, Thomas Mueller
 thomas.tom.muel...@gmail.com wrote:
 Hi,

 I'm wondering if your are interested in automatically releasing the
 write lock. See also my comments on
 https://issues.apache.org/jira/browse/LUCENE-1877 - I thought it's a
 problem worth solving, because it's also in the Lucene FAQ list at
 http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_purpose_of_write.lock_file.2C_when_is_it_used.2C_and_by_which_classes.3F

 Unfortunately there seems to be no solution that 'always works', but
 delegating the task and responsibility to the application / to the
 user is problematic as well. For example, a user of the H2 database
 (that supports Lucene fulltext indexing) suggested to automatically
 remove the write.lock file whenever the file is there:
 http://code.google.com/p/h2database/issues/detail?id=141 - sounds a
 bit dangerous in my view.

 So, if you are interested to solve the problem, then maybe I can help.
 If not, then I will not bother you any longer :-)

 Regards,
 Thomas



   shouldn't active code like that live in the application layer?
  Why?
 You can all but guarantee that polling will work at the app layer

 The application layer may also run with low priority. In operating
 systems, it's usually the lower layer that have more 'rights'
 (priority), and not the higher levels (I'm not saying it should be
 like that in Java). I just think the application layer should not have
 to deal with write locks or removing write locks.

 by the time the original process realizes that it doesn't hold the lock 
 anymore, the damage could already have been done.

 Yes, I'm not sure how to best avoid that (with any design). Asking the
 application layer or the user whether the lock file can be removed is
 probably more dangerous than trying the best in Lucene.

 Standby / hibernate: the question is, if the machine process is
 currently not running, does the process still hold the lock? I think
 no, because the machine might as well be turned off. How to detect
 whether the machine is turned off versus in hibernate mode? I guess
 that's a problem for all mechanisms (socket / file lock / background
 thread).

 When a hibernated process wakes up again, he thinks he owns the lock.
 Even if the process checks before each write, it is unsafe:

 if (isStillLocked()) {
  write();
 }

 The process could wake up after isStillLocked() but before write().
 One protection is: The second process (the one that breaks the lock)
 would need to work on a copy of the data instead of the original file
 (it could delete / truncate the original file after creating a copy).
 On Windows, renaming

Re: Socket and file locks

2009-11-28 Thread Sanne Grinovero
Hello,
Together with the Infinispan Directory we developed such a
LockFactory; I'd be more than happy if you wanted to add some pointers
to it in the Lucene documentation/readme.
This depends on Infinispan for multiple-machines communication
(JGroups, indirectly) but
it's not required to use an Infinispan Directory, you could combine it
with a Directory impl of choice.
This was tested with the LockVerifyServer mentioned by Michael
McCandless and also
with some other tests inspired from it (in-VM for lower delay
coordination and verify, while the LockFactory was forced to
use real network communication).

While this is a technology preview and performance regarding the
Directory code is still unknown, I believe the LockFactory was the
most tested component.

free to download and inspect (LGPL):
http://anonsvn.jboss.org/repos/infinispan/trunk/lucene-directory/

Regards,
Sanne

2009/11/27 Michael McCandless luc...@mikemccandless.com:
 I think a LockFactory for Lucene that implemented the ideas you &
 Marvin are discussing in LUCENE-1877, and/or the approach you
 implemented in the H2 DB, would be a useful addition to Lucene!

 For many apps, the simple LockFactory impls suffice, but for apps
 where multiple machines can become the writer, it gets hairy.  Having
 an always correct Lock impl for these apps would be great.

 Note that Lucene has some basic tools (in oal.store) for asserting
 that a LockFactory is correct (see LockVerifyServer), so it's a useful
 way to test that things are working from Lucene's standpoint.

 Mike

 On Fri, Nov 27, 2009 at 9:23 AM, Thomas Mueller
 thomas.tom.muel...@gmail.com wrote:
 Hi,

 I'm wondering if your are interested in automatically releasing the
 write lock. See also my comments on
 https://issues.apache.org/jira/browse/LUCENE-1877 - I thought it's a
 problem worth solving, because it's also in the Lucene FAQ list at
 http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_purpose_of_write.lock_file.2C_when_is_it_used.2C_and_by_which_classes.3F

 Unfortunately there seems to be no solution that 'always works', but
 delegating the task and responsibility to the application / to the
 user is problematic as well. For example, a user of the H2 database
 (that supports Lucene fulltext indexing) suggested to automatically
 remove the write.lock file whenever the file is there:
 http://code.google.com/p/h2database/issues/detail?id=141 - sounds a
 bit dangerous in my view.

 So, if you are interested to solve the problem, then maybe I can help.
 If not, then I will not bother you any longer :-)

 Regards,
 Thomas



   shouldn't active code like that live in the application layer?
  Why?
 You can all but guarantee that polling will work at the app layer

 The application layer may also run with low priority. In operating
 systems, it's usually the lower layer that have more 'rights'
 (priority), and not the higher levels (I'm not saying it should be
 like that in Java). I just think the application layer should not have
 to deal with write locks or removing write locks.

 by the time the original process realizes that it doesn't hold the lock 
 anymore, the damage could already have been done.

 Yes, I'm not sure how to best avoid that (with any design). Asking the
 application layer or the user whether the lock file can be removed is
 probably more dangerous than trying the best in Lucene.

 Standby / hibernate: the question is, if the machine process is
 currently not running, does the process still hold the lock? I think
 no, because the machine might as well be turned off. How to detect
 whether the machine is turned off versus in hibernate mode? I guess
 that's a problem for all mechanisms (socket / file lock / background
 thread).

 When a hibernated process wakes up again, he thinks he owns the lock.
 Even if the process checks before each write, it is unsafe:

 if (isStillLocked()) {
  write();
 }

 The process could wake up after isStillLocked() but before write().
 One protection is: The second process (the one that breaks the lock)
 would need to work on a copy of the data instead of the original file
 (it could delete / truncate the original file after creating a copy).
 On Windows, renaming the file might work (not sure); on Linux you
 probably need to copy the content to a new file. Like that, the awoken
 process can only destroy inactive data.
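The copy-then-truncate protection described here can be sketched in a few lines of plain Java; the names are illustrative, and the point is only that a stale writer waking from hibernation can then damage only the truncated original, never the active copy:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the protection above: before breaking a stale lock, the
// second process copies the data and truncates the original, so a writer
// that wakes up later writes only into the inactive (empty) original.
public class CopyThenTruncate {
    public static Path takeOver(Path original) throws IOException {
        Path copy = original.resolveSibling(original.getFileName() + ".active");
        Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
        Files.write(original, new byte[0]);  // truncate: stale writers hit empty data
        return copy;
    }
}
```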

 The question is: do we need to solve this problem? How big is the
 risk? Instead of solving this problem completely, you could detect it
 after the fact without much overhead, and throw an exception saying:
 data may be corrupt now.

 PID: With the PID, you could check if the process still runs. Or it
 could be another process with the same PID (is that possible?), or the
 same PID but a different machine (when using a network share). It's
 probably more safe if you can communicate with the lock owner (using
 TCP/IP or over the file system by deleting/creating a file).

 Unique id: The easiest solution is to use a UUID
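The unique-id idea can be sketched like this in plain Java (hypothetical names; note that the ownership check itself can still race in exactly the way discussed above, so it narrows the window rather than closing it):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Sketch of the "unique id" scheme: each acquirer writes a random UUID
// into the lock file, and later verifies the file still holds its own id
// before assuming it still owns the lock.
public class UuidLockSketch {
    private final String id = UUID.randomUUID().toString();

    public void acquire(Path lockFile) throws IOException {
        Files.write(lockFile, id.getBytes(StandardCharsets.UTF_8));
    }

    // True only if no other process has overwritten the lock file since.
    public boolean stillOwns(Path lockFile) throws IOException {
        return Files.exists(lockFile)
            && new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8).equals(id);
    }
}
```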

StressTest: Document not guaranteed to be found after write and commit

2009-11-25 Thread Sanne Grinovero
Hello,
I developed a stress test to assert that a new document containing a
specific term X is always found after a commit on the IndexWriter.
This works most of the time, but it fails under load in rare occasions.

I'm testing with 40 Threads, both with a SerialMergeScheduler and a
ConcurrentMergeScheduler, all sharing a common IndexWriter.
Attached testcase is using a RAMDirectory only, but I verified a
FSDirectory behaves in the same way so I don't believe it's the
Directory implementation or the MergeScheduler.
This test is slow, so I don't consider it a functional or unit test.
It might give false positives: it doesn't always fail, sorry I
couldn't find out how to make it more likely to happen, besides
scheduling it to run for a longer time.
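For readers who want the shape of the test without the patch, here is a stripped-down, plain-Java skeleton of the same pattern. A ConcurrentMap stands in for the shared IndexWriter/IndexSearcher pair, so this only illustrates the structure (many writers, commit, then immediately search for the term) and not the Lucene behavior under test:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Skeleton of the stress test: each writer thread adds a document with a
// unique term, "commits", and immediately verifies the term is findable.
public class CommitVisibilitySketch {
    public static int run(int threads, int docsPerThread) throws InterruptedException {
        ConcurrentMap<String, Boolean> index = new ConcurrentHashMap<>();
        AtomicInteger missing = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    String term = id + ":" + d;
                    index.put(term, Boolean.TRUE);   // stands in for addDocument + commit
                    if (!index.containsKey(term)) {  // stands in for searching term X
                        missing.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return missing.get();  // 0 means every "commit" was immediately visible
    }
}
```

In the real test the put/containsKey pair is an IndexWriter.commit() followed by a fresh IndexReader searching for the term, which is where the rare failures show up.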

Could someone please try it, and suggest if my test is wrong or if I
should open a new issue?
The patch applies to 2.9.1, I've experienced same behavior on 2.4.1.

Best regards,
Sanne Grinovero

P.S. congratulations on the release of 3.0.0 :-)


lucene-stresstest.patch
Description: Binary data

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Created: (LUCENE-2095) Document not guaranteed to be found after write and commit

2009-11-25 Thread Sanne Grinovero (JIRA)
Document not guaranteed to be found after write and commit
--

 Key: LUCENE-2095
 URL: https://issues.apache.org/jira/browse/LUCENE-2095
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.1, 2.4.1
 Environment: Linux 64bit
Reporter: Sanne Grinovero


after same email on developer list:
I developed a stress test to assert that a new document containing a
specific term X is always found after a commit on the IndexWriter.
This works most of the time, but it fails under load in rare occasions.

I'm testing with 40 Threads, both with a SerialMergeScheduler and a
ConcurrentMergeScheduler, all sharing a common IndexWriter.
Attached testcase is using a RAMDirectory only, but I verified a
FSDirectory behaves in the same way so I don't believe it's the
Directory implementation or the MergeScheduler.
This test is slow, so I don't consider it a functional or unit test.
It might give false positives: it doesn't always fail, sorry I
couldn't find out how to make it more likely to happen, besides
scheduling it to run for a longer time.

I tested this to affect versions 2.4.1 and 2.9.1;


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2095) Document not guaranteed to be found after write and commit

2009-11-25 Thread Sanne Grinovero (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanne Grinovero updated LUCENE-2095:


Attachment: lucene-stresstest.patch

attaching the testcase, apply to version 2.9.1.
It's slow, please be patient.

 Document not guaranteed to be found after write and commit
 --

 Key: LUCENE-2095
 URL: https://issues.apache.org/jira/browse/LUCENE-2095
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.4.1, 2.9.1
 Environment: Linux 64bit
Reporter: Sanne Grinovero
 Attachments: lucene-stresstest.patch


 after same email on developer list:
 I developed a stress test to assert that a new document containing a
 specific term X is always found after a commit on the IndexWriter.
 This works most of the time, but it fails under load in rare occasions.
 I'm testing with 40 Threads, both with a SerialMergeScheduler and a
 ConcurrentMergeScheduler, all sharing a common IndexWriter.
 Attached testcase is using a RAMDirectory only, but I verified a
 FSDirectory behaves in the same way so I don't believe it's the
 Directory implementation or the MergeScheduler.
 This test is slow, so I don't consider it a functional or unit test.
 It might give false positives: it doesn't always fail, sorry I
 couldn't find out how to make it more likely to happen, besides
 scheduling it to run for a longer time.
 I tested this to affect versions 2.4.1 and 2.9.1;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: StressTest: Document not guaranteed to be found after write and commit

2009-11-25 Thread Sanne Grinovero
thanks a lot for looking into it.
It's opened: https://issues.apache.org/jira/browse/LUCENE-2095

Besides this being expected behavior after a commit(), I'm needing
this to be able to assert state consistency on the distributed
Directory under load: any suggestions for a temporary workaround?

I am thinking about a statistical assert, like considering it's fine
if (error ratio < some threshold), but that's my last resort.
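The statistical-assert fallback would look roughly like this (plain Java; the BooleanSupplier is a stand-in for the real post-commit index lookup, and the names are illustrative):

```java
import java.util.function.BooleanSupplier;

// Sketch of a statistical assert: instead of requiring every post-commit
// lookup to succeed, run many trials and tolerate a small failure ratio.
public class StatisticalAssert {
    // Returns the observed failure ratio over `trials` runs of `check`.
    public static double failureRatio(BooleanSupplier check, int trials) {
        int failures = 0;
        for (int i = 0; i < trials; i++) {
            if (!check.getAsBoolean()) failures++;
        }
        return failures / (double) trials;
    }
}
```

The caller would then assert something like `failureRatio(check, n) < threshold` — which, as said above, only masks the underlying visibility bug rather than fixing it.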

Regards,
Sanne


2009/11/25 Michael McCandless luc...@mikemccandless.com:
 Indeed I see this test failing too!  On first look the test seems correct.

 Can you open an issue  attach this as a patch?  Thanks.

 Mike

 On Wed, Nov 25, 2009 at 12:30 PM, Sanne Grinovero
 sanne.grinov...@gmail.com wrote:
 Hello,
 I developed a stress test to assert that a new document containing a
 specific term X is always found after a commit on the IndexWriter.
 This works most of the time, but it fails under load in rare occasions.

 I'm testing with 40 Threads, both with a SerialMergeScheduler and a
 ConcurrentMergeScheduler, all sharing a common IndexWriter.
 Attached testcase is using a RAMDirectory only, but I verified a
 FSDirectory behaves in the same way so I don't believe it's the
 Directory implementation or the MergeScheduler.
 This test is slow, so I don't consider it a functional or unit test.
 It might give false positives: it doesn't always fail, sorry I
 couldn't find out how to make it more likely to happen, besides
 scheduling it to run for a longer time.

 Could someone please try it, and suggest if my test is wrong or if I
 should open a new issue?
 The patch applies to 2.9.1, I've experienced same behavior on 2.4.1.

 Best regards,
 Sanne Grinovero

 P.S. congratulations on the release of 3.0.0 :-)


 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: A new Lucene Directory available

2009-11-15 Thread Sanne Grinovero
Hi Lukas,
Our reference during early design was Lucene 2.4.1, but we look
forward for compatibility and new tricks.
Current trunk is compatible with Lucene's trunk, but I won't close
ISPN-275 until it's confirmed against a released Lucene 3.0.0 :
hopefully this will come before Infinispan 4 release.

Regards,
Sanne

On Sun, Nov 15, 2009 at 8:50 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:
 Hi,

 this sounds very interesting. Do you know which versions of Lucene are
 supported?
 Do you know if it would work with upcoming Lucene 3.0.x?
 https://jira.jboss.org/jira/browse/ISPN-275

 Regards,
 Lukas

 http://blog.lukas-vlcek.com/


 On Sun, Nov 15, 2009 at 5:33 AM, Sanne Grinovero
 s.grinov...@sourcesense.com wrote:

 Hi John,
 I didn't run a long running reliable benchmark, so at the moment I
 can't really speak of numbers.
 Suggestions and help on performance testing are welcome: I guess it
 will shine in some situations, not necessarily all, so really choosing
 a correct ratio of concurrent writers/searches, number of nodes in the
 cluster and resources per node will never be fair enough to compare
 this Directory with others.

 On paper the premises are good: it's all in-memory as long as it fits; it
 will distribute data across nodes and overflow to disk is supported
 (called passivation). A permanent store can be configured, so you
 could set it to periodically flush incrementally to slower storages
 like a database, a filesystem, a cloud storage service. This makes it
 possible to avoid losing state even when all nodes are shut down.
 A RAMDirectory is AFAIK not recommended as you could hit memory limits
 and because it's basically a synchronized HashMap; Infinispan
 implements ConcurrentHashMap and doesn't need synchronization.
 Even if the data is replicated across nodes, each node has its own
 local cache, so when caches are warm and all segments fit in memory it
 should be, theoretically, the fastest Directory ever. The more it will
 read from disk, the more it will behave similarly to a FSDirectory
 with some buffers.

 As per Lucene's design, writes can happen only at one node at a time:
 one IndexWriter can own the lock, but IndexReaders and Searchers are
 not blocked, so when using this Directory it should behave exactly as
 if you had multiple processes sharing a local NIOFSdirectory:
 basically the situation is that you can't scale on writers, but you
 can scale near-linearly with readers adding in more power from more
 machines.

 Besides performance, the reasons to implement this was to be able to
 easily add or remove processing power to a service (clouds), make it
 easier to share indexes across nodes, and last but not least to remove
 single points of failure: all data is distributed and there is no such
 notion of Master: services will continue running fine when killing any
 node.

 I hope this piques your interest, sorry if I couldn't provide numbers.

 Regards,
 Sanne

 On Sat, Nov 14, 2009 at 11:15 PM, John Wang john.w...@gmail.com wrote:
  HI Sanne:
 
      Very interesting!
 
      What kinda performance should we expect with this, compared to
   regular
   FSDirectory on local HD?
  Thanks
  -John
 
  On Sat, Nov 14, 2009 at 11:44 AM, Sanne Grinovero
  s.grinov...@sourcesense.com wrote:
 
  Hello all,
  I'm a Lucene user and fan, I wanted to tell you that we just released
  a first technology preview of a distributed in memory Directory for
  Lucene.
 
  The release announcement:
 
 
  http://infinispan.blogspot.com/2009/11/second-release-candidate-for-400.html
 
  From there you'll find links to the Wiki, to the sources, to the issue
  tracker. A minimal demo is included with the sources.
 
  This was developed together with Google Summer of Code student Lukasz
  Moren and much support from the Infinispan and Hibernate Search teams,
   as we are storing the index segments on Infinispan and using its
  atomic distributed locks to implement a Lucene LockFactory.
 
  Initial idea was to contribute it directly to Lucene, but as
  Infinispan is a LGPL dependency we had to distribute it with
  Infinispan (as the other way around would have introduced some legal
  issues); still we hope you appreciate the effort and are interested in
  giving it a try.
  All kind of feedback is welcome, especially on benchmarking
  methodologies as I yet have to do some serious performance tests.
 
  Main code, build with Maven2:
  svn co
 
  http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/lucene-directory/
  infinispan-directory
 
  Demo, see the Readme:
  svn co
 
  http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/demos/lucene-directory/
  lucene-demo
 
  Best Regards,
  Sanne
 
  --
  Sanne Grinovero
   Sourcesense - making sense of Open Source: http://www.sourcesense.com
 
  -
  To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-dev-h...@lucene.apache.org

Re: A new Lucene Directory available

2009-11-15 Thread Sanne Grinovero
Hi Earwin,
thanks for the insight, as I mentioned I have no proper benchmarks to
back my statements but I can see how it behaves, so absolutely I could
be too optimistic.
They are currently profiling Infinispan and speeding up some
internals, so I'll wait for these tasks to finish to begin testing on
our part; while waiting I collect suggestions about how you think I
should test it properly? Which kind of comparisons would you like to
see?

I'm currently working on JIRA clustering (called Scarlet), so the
typical index usage pattern of that application is going to be my
favorite scenario.

I know about the Terracotta efforts, I agree with you and have
collected much feedback about which problems were arising directly
talking with the people maintaining such systems. I even got to hear
some success cases, but yes they are scarce and there are some
problems; be assured that we have analyzed them carefully before
deciding for this design. I'm not a Terracotta expert myself, but was
helped on this by specialists. My personal opinion resulting from
these talks is that Terracotta works, but is too tricky to setup and
not viable in case the indexes change frequently.

About the RAMDirectory comparison, as you said yourself the bytes
aren't read constantly but just at index reopen so I wouldn't be too
worried about the bunch of methods as they're executed once per
segment loading; I'll improve that if possible, thanks for looking!
I'm sure many parts can be improved, patches are welcome.

Instances of ChunkCacheKey are not created for each single byte read
but for each byte[] buffer, the size of these buffers being
configurable. This was decided after observing that chunking
segments into smaller pieces improved performance compared to keeping
huge arrays of bytes, but if you like you can configure it
to degenerate towards a one-key-per-segment ratio.
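The chunking scheme can be sketched as follows; ChunkCacheKey is modeled here as a plain string key, and all names are illustrative rather than Infinispan's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the chunking described above: a segment file is split into
// fixed-size byte[] buffers, each stored under a (fileName, chunkIndex)
// key. The string key stands in for a real ChunkCacheKey object.
public class ChunkedStoreSketch {
    // Ceiling division: how many chunks a file of this length needs.
    static int chunkCount(long fileLength, int chunkSize) {
        return (int) ((fileLength + chunkSize - 1) / chunkSize);
    }

    private final Map<String, byte[]> store = new HashMap<>();

    void writeFile(String name, byte[] data, int chunkSize) {
        for (int i = 0; i < chunkCount(data.length, chunkSize); i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, data.length);
            byte[] chunk = new byte[to - from];
            System.arraycopy(data, from, chunk, 0, chunk.length);
            store.put(name + "|" + i, chunk);  // one entry per chunk, not per byte
        }
    }

    byte[] readChunk(String name, int index) {
        return store.get(name + "|" + index);
    }
}
```

A larger chunk size means fewer keys and fewer lookups per segment; a chunk size at or above the segment size degenerates to the one-key-per-segment case.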
Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I
can scale :-) Still I take the point, I'll have some tests also in
single node mode to compare them, for fun as the use cases are a bit
different but I'm confident I could surprise you when I have the choice
of the scenario.

About JGroups I'm not technically prepared for a match, but I've heard
of different stories of much bigger than 20 nodes business critical
clusters working very well. Sure, it won't scale without a proper
configuration at all levels: os, jgroups and infrastructure.

Thank you very much for you considerations, it's very appreciated.
Regards,
Sanne

On Sun, Nov 15, 2009 at 12:39 PM, Earwin Burrfoot ear...@gmail.com wrote:
 Terracotta guys easy-clustered Lucene a few years ago. I'm yet to
 see at least one person saying it worked for them all right.

 This new directory ain't gonna be faster than RAMDirectory, as syncs
 on a map don't matter; they are taken once per opened file - once
 per reopen, which is not happening thousands of times a sec.
 Taking a glance at the code (svn trunk), it actually is much slower. I
 mean, compare IndexInput.readByte()s. A whole slew of code and method
 calls plus a ChunkCacheKey created per each byte read (violent GC
 abuse, ring the police!) VS if, incr, array access for RAMDir.

 I wouldn't be too optimistic in doesn't-fit-in-memory case VS
 FSDirectory either. OS' paging/file caching skills are hard to match,
 plus OS file cache resides outside of Java heap, which (as real-life
 experience dictates) is immensely good for your GC pauses.

 Now to the networking part. Infinispan is based on JGroups. Last time
 I saw it, it exploded under a moderate load on 20 nodes. I believe the
 library is still good, properly configured and for lesser loads, but
 not for distributing Lucene index that is frequently updated and
 merged on each node of the cluster.

 Please excuse me if I'm overboard in places, and correct me if I am wrong.

 On Sun, Nov 15, 2009 at 07:33, Sanne Grinovero
 s.grinov...@sourcesense.com wrote:
 Hi John,
 I didn't run a long running reliable benchmark, so at the moment I
 can't really speak of numbers.
 Suggestions and help on performance testing are welcome: I guess it
 will shine in some situations, not necessarily all, so really choosing
 a correct ratio of concurrent writers/searches, number of nodes in the
 cluster and resources per node will never be fair enough to compare
 this Directory with others.

 On paper the premises are good: it's all in-memory as long as it fits; it
 will distribute data across nodes and overflow to disk is supported
 (called passivation). A permanent store can be configured, so you
 could set it to periodically flush incrementally to slower storages
 like a database, a filesystem, a cloud storage service. This makes it
 possible to avoid losing state even when all nodes are shut down.
 A RAMDirectory is AFAIK not recommended as you could hit memory limits
 and because it's basically a synchronized HashMap; Infinispan
 implements ConcurrentHashMap and doesn't need synchronization.
 Even

Re: A new Lucene Directory available

2009-11-15 Thread Sanne Grinovero
Hi again Earwin,
thanks you very much for spotting the byte reading issue, it's
definitely not as I wanted it.
https://jira.jboss.org/jira/browse/ISPN-276
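The shape of the fix for that per-byte allocation issue can be sketched in plain Java (illustrative names, not the actual ISPN-276 patch): readByte() serves bytes from a local buffer and only performs a keyed chunk lookup when the buffer is exhausted, so key objects are created once per chunk instead of once per byte:

```java
// Sketch of a buffered readByte(): the ChunkFetcher stands in for one
// keyed cache lookup (the expensive part Earwin pointed out). Bytes are
// served from the current buffer; a fetch happens only at chunk boundaries.
public class BufferedReadSketch {
    interface ChunkFetcher { byte[] fetch(int chunkIndex); }

    private final ChunkFetcher fetcher;
    private byte[] buffer = new byte[0];
    private int pos = 0;
    private int chunkIndex = -1;
    int fetches = 0;  // exposed only to show how rarely lookups happen

    BufferedReadSketch(ChunkFetcher fetcher) { this.fetcher = fetcher; }

    byte readByte() {
        if (pos >= buffer.length) {  // crossed a chunk boundary
            buffer = fetcher.fetch(++chunkIndex);
            fetches++;
            pos = 0;
        }
        return buffer[pos++];  // hot path: bounds check, increment, array access
    }
}
```

With this structure the hot path is comparable to RAMDirectory's if/increment/array-access loop, with the cache lookup amortized over the whole chunk.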

I never tried to defend an improved updates/s ratio, just maybe
compared to scheduled rsyncs :-)
Our goal is to scale on queries/sec while usage semantics stay
unchanged, so you can open an IndexWriter as if it were local to make
updates clusterwide. Very useful to cluster the many products already
using Lucene which are currently implementing exotic index management
workarounds or shared filesystems, as they weren't designed for it
from the beginning as Solr was.
I mentioned JIRA; have you noticed how slow it can get on larger
deployments? That's because there's currently no way to deploy it
clustered (besides by using Terracotta), as it relies heavily on Lucene
and index changes need to be applied in real time.

About locking and jgroups.. please switch over to
infinispan-...@lists.jboss.org so you can get better answers and I
don't have to spam the Lucene developers.

Regards,
Sanne



On Sun, Nov 15, 2009 at 3:43 PM, Earwin Burrfoot ear...@gmail.com wrote:
 About the RAMDirectory comparison, as you said yourself the bytes
 aren't read constantly but just at index reopen so I wouldn't be too
 worried about the bunch of methods as they're executed once per
 segment loading;
 The bytes /are/ read constantly (readByte() method). I believe that is
 the most innermost loop you can hope to find in Lucene.

 A RAMDirectory is AFAIK not recommended as you could hit memory limits and 
 because it's basically a synchronized HashMap;
 On the other hand, just as I mentioned - the only access to said
 synchronized HashMap is done when you
 open InputStream on a file. That, unlike readByte(), happens rarely,
 as InputStreams are cloned after creation as needed.
 As for memory limits, your unbounded local cache hits them with same ease.

 Instances of ChunkCacheKey are not created for each single byte read
 but for each byte[] buffer, being the size of these buffers configurable.
 No, they are! :-)
 InfinispanIndexIO.java, rev. 1103:
 120           public byte readByte() throws IOException {
 ...
 132              buffer = getChunkFromPosition(cache, fileKey,
 filePosition, bufferSize);
 ...
 141           }
 getChunkFromPosition() is called each time readByte() is invoked. It
 creates 1-2 instances of ChunkCacheKey.
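The trade-off being debated can be sketched as follows. ChunkKey, ChunkedInput, and the method names are hypothetical stand-ins for illustration, not the real Infinispan API: the naive variant allocates a key and performs a map lookup on every single byte read, while buffering the current chunk amortizes that cost over the chunk size.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the pattern under discussion: a "file" stored as
// chunks in a map keyed by (file name, chunk number).
final class ChunkKey {
    final String file;
    final int chunk;
    ChunkKey(String file, int chunk) { this.file = file; this.chunk = chunk; }
    @Override public boolean equals(Object o) {
        return o instanceof ChunkKey && ((ChunkKey) o).file.equals(file)
                && ((ChunkKey) o).chunk == chunk;
    }
    @Override public int hashCode() { return file.hashCode() * 31 + chunk; }
}

final class ChunkedInput {
    private final Map<ChunkKey, byte[]> cache;
    private final String file;
    private final int chunkSize;
    private int position = 0;
    private byte[] current;        // buffered chunk, reused across reads
    private int currentChunk = -1;

    ChunkedInput(Map<ChunkKey, byte[]> cache, String file, int chunkSize) {
        this.cache = cache; this.file = file; this.chunkSize = chunkSize;
    }

    // Naive version: one ChunkKey allocation and one map lookup per byte.
    byte readByteNaive() {
        byte[] chunk = cache.get(new ChunkKey(file, position / chunkSize));
        return chunk[position++ % chunkSize];
    }

    // Buffered version: a lookup only when crossing a chunk boundary.
    byte readByte() {
        int wanted = position / chunkSize;
        if (wanted != currentChunk) {
            current = cache.get(new ChunkKey(file, wanted));
            currentChunk = wanted;
        }
        return current[position++ % chunkSize];
    }
}
```

Since readByte() really is the innermost loop of index reading, the buffered form is the one that matters once chunk sizes grow beyond a few bytes.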

 This was decided after observing that it improved performance to chunk
 segments into smaller pieces rather than have huge arrays of bytes, but
 if you like you can configure it to degenerate toward a
 one-key-per-segment ratio.
 Locally, it's better not to chunk segments (unless you hit 2Gb
 barrier). When shuffling them over network - I can't say.

 Comparing to a RAMDirectory is unfair, as with InfinispanDirectory I can 
 scale :-)
 I'm just following two of your initial comparisons. And the only
 characteristic that can be scaled with such
 approach is queries/s. Index size - definitely not, updates/s - questionable.

 About JGroups I'm not technically prepared for a match, but I've heard
 of different stories of much bigger than 20 nodes business critical
 clusters working very well. Sure, it won't scale without a proper
 configuration at all levels: os, jgroups and infrastructure.
 The volume of messages travelling around, length of GC delays VS
 cluster size and messaging mode matter.
 They used reliable synchronous multicasts, so - once one node starts
 collecting, all others wait (or worse - send retries).
 Another one starts collecting, then another, partially delivered
 messages hold threads - caboom!
 How is locking handled here? With central broker it probably can work.

 --
 Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
 Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
 ICQ: 104465785

 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org





-- 
Sanne Grinovero
Sourcesense - making sense of Open Source: http://www.sourcesense.com

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: A new Lucene Directory available

2009-11-14 Thread Sanne Grinovero
Hi John,
I didn't run a long running reliable benchmark, so at the moment I
can't really speak of numbers.
Suggestions and help on performance testing are welcome: I guess it
will shine in some situations, not necessarily all, so really choosing
a correct ratio of concurrent writers/searches, number of nodes in the
cluster and resources per node will never be fair enough to compare
this Directory with others.

On paper the premises are good: it's all in-memory, as long as it fits;
it will distribute data across nodes, and overflow to disk is supported
(called passivation). A permanent store can be configured, so you
could set it to periodically flush incrementally to slower storages
like a database, a filesystem, a cloud storage service. This makes it
possible to avoid losing state even when all nodes are shut down.
A RAMDirectory is AFAIK not recommended as you could hit memory limits
and because it's basically a synchronized HashMap; Infinispan
implements ConcurrentHashMap and doesn't need synchronization.
Even if the data is replicated across nodes, each node has its own
local cache, so when caches are warm and all segments fit in memory it
should be, theoretically, the fastest Directory ever. The more it will
read from disk, the more it will behave similarly to a FSDirectory
with some buffers.

As per Lucene's design, writes can happen only at one node at a time:
one IndexWriter can own the lock, but IndexReaders and Searchers are
not blocked, so when using this Directory it should behave exactly as
if you had multiple processes sharing a local NIOFSDirectory:
basically the situation is that you can't scale on writers, but you
can scale near-linearly with readers adding in more power from more
machines.
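The single-writer discipline described above can be sketched as a tiny write lock that only writers contend for, while readers never touch it. This is a simplified stand-in that mirrors org.apache.lucene.store.Lock in spirit only; the class and method names are illustrative, not the real API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of the write-lock discipline: exactly one writer may hold
// the lock at a time; readers and searchers never acquire it, so they are
// never blocked by an active IndexWriter.
final class WriteLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Returns true only for the first caller; a second writer is refused.
    boolean obtain() { return held.compareAndSet(false, true); }

    void release() { held.set(false); }

    boolean isLocked() { return held.get(); }
}
```

In the distributed case the AtomicBoolean would be replaced by an atomic operation on the shared cache, but the contract seen by writers stays the same.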

Besides performance, the reasons to implement this were to be able to
easily add or remove processing power to a service (clouds), make it
easier to share indexes across nodes, and last but not least to remove
single points of failure: all data is distributed and there is no such
notion of Master: services will continue running fine when killing any
node.

I hope this piques your interest; sorry I couldn't provide numbers.

Regards,
Sanne

On Sat, Nov 14, 2009 at 11:15 PM, John Wang john.w...@gmail.com wrote:
 HI Sanne:

     Very interesting!

     What kind of performance should we expect with this, compared to a regular
 FSDirectory on a local HD?
 Thanks
 -John

 On Sat, Nov 14, 2009 at 11:44 AM, Sanne Grinovero
 s.grinov...@sourcesense.com wrote:

 Hello all,
 I'm a Lucene user and fan, I wanted to tell you that we just released
 a first technology preview of a distributed in-memory Directory for
 Lucene.

 The release announcement:

 http://infinispan.blogspot.com/2009/11/second-release-candidate-for-400.html

 From there you'll find links to the Wiki, to the sources, to the issue
 tracker. A minimal demo is included with the sources.

 This was developed together with Google Summer of Code student Lukasz
 Moren and much support from the Infinispan and Hibernate Search teams,
 as we are storing the index segments on Infinispan and using its
 atomic distributed locks to implement a Lucene LockFactory.

 Initial idea was to contribute it directly to Lucene, but as
 Infinispan is a LGPL dependency we had to distribute it with
 Infinispan (as the other way around would have introduced some legal
 issues); still we hope you appreciate the effort and are interested in
 giving it a try.
 All kind of feedback is welcome, especially on benchmarking
 methodologies as I yet have to do some serious performance tests.

 Main code, build with Maven2:
 svn co
 http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/lucene-directory/
 infinispan-directory

 Demo, see the Readme:
 svn co
 http://anonsvn.jboss.org/repos/infinispan/tags/4.0.0.CR2/demos/lucene-directory/
 lucene-demo

 Best Regards,
 Sanne

 --
 Sanne Grinovero
 Sourcesense - making sense of Open Source: http://www.sourcesense.com

 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org






-- 
Sanne Grinovero
Sourcesense - making sense of Open Source: http://www.sourcesense.com

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Lock API, throwing IOException

2009-11-01 Thread Sanne Grinovero
Thanks a lot! This makes error management much simpler.

Sanne

2009/11/1 Michael McCandless luc...@mikemccandless.com:
 OK, this makes sense.  I'll add it.

 Mike

 On Sat, Oct 31, 2009 at 9:43 AM, Sanne Grinovero
 sanne.grinov...@gmail.com wrote:
 Hello,
 I'm implementing a distributed directory based on Infinispan
 (www.jboss.org/infinispan)

 currently implementing the org.apache.lucene.store.Lock,
 I was wondering why is

 /** Returns true if the resource is currently locked.  Note that one must
   * still call {@link #obtain()} before using the resource. */
 public abstract boolean isLocked();

 not throwing an IOException as other methods do?

 Could you please add it? It looks like it should be trivial, as all
 clients of this API are already declaring to throw the same Exception.

 Regards,
 Sanne Grinovero

 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Lock API, throwing IOException

2009-10-31 Thread Sanne Grinovero
Hello,
I'm implementing a distributed directory based on Infinispan
(www.jboss.org/infinispan)

currently implementing the org.apache.lucene.store.Lock,
I was wondering why is

/** Returns true if the resource is currently locked.  Note that one must
   * still call {@link #obtain()} before using the resource. */
public abstract boolean isLocked();

not throwing an IOException as other methods do?

Could you please add it? It looks like it should be trivial, as all
clients of this API are already declaring to throw the same Exception.
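A minimal sketch of why the signature matters: in a distributed LockFactory, even checking lock state can require I/O (a network round-trip to the shared store), so isLocked() has a genuine failure mode to report. RemoteLockStore and DistributedLock below are hypothetical names for illustration, not real Infinispan or Lucene types.

```java
import java.io.IOException;

// Hypothetical view of a lock registry that lives on remote nodes; merely
// asking whether a lock exists may involve the network and thus fail.
interface RemoteLockStore {
    boolean contains(String lockName) throws IOException;
}

class DistributedLock {
    private final RemoteLockStore store;
    private final String name;

    DistributedLock(RemoteLockStore store, String name) {
        this.store = store;
        this.name = name;
    }

    // Without IOException in the base class signature, a failure here would
    // have to be wrapped in an unchecked exception or silently swallowed.
    public boolean isLocked() throws IOException {
        return store.contains(name);
    }
}
```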

Regards,
Sanne Grinovero

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted

2008-07-08 Thread Sanne Grinovero (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611572#action_12611572
 ] 

Sanne Grinovero commented on LUCENE-1329:
-

Adding a readonly IndexReader would be really great, I'm contributing some code 
to Hibernate Search (integration of Lucene and Hibernate) and that
project could really benefit from that.

 Remove synchronization in SegmentReader.isDeleted
 -

 Key: LUCENE-1329
 URL: https://issues.apache.org/jira/browse/LUCENE-1329
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.3.1
Reporter: Jason Rutherglen
Priority: Trivial
 Attachments: lucene-1329.patch


 Removes SegmentReader.isDeleted synchronization by using a volatile 
 deletedDocs variable on Java 1.5 platforms.  On Java 1.4 platforms 
 synchronization is limited to obtaining the deletedDocs reference.
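The Java 1.5 approach the issue describes can be sketched with a volatile field published copy-on-write: readers pay one volatile read and no lock, while writers (already serialized elsewhere) swap in a fresh bit set. This is a simplified stand-in for the idea, not the actual SegmentReader patch.

```java
import java.util.BitSet;

// Sketch of the LUCENE-1329 idea: publish the deleted-docs bit set through a
// volatile reference so isDeleted() needs no synchronization.
final class SegmentDeletes {
    private volatile BitSet deletedDocs = new BitSet();

    // Hot path: a single volatile read, then a plain bit test.
    boolean isDeleted(int doc) {
        BitSet bits = deletedDocs;
        return bits.get(doc);
    }

    // Cold path: writers publish a fresh copy (copy-on-write) rather than
    // mutating the set that readers may currently be scanning.
    synchronized void delete(int doc) {
        BitSet copy = (BitSet) deletedDocs.clone();
        copy.set(doc);
        deletedDocs = copy;   // volatile write makes the copy visible
    }
}
```

A read-only IndexReader, as suggested in the comment above, takes the same idea further: with no writer at all, even the volatile publication becomes unnecessary.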

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]