Re: RFC: N-2 compatibility for file formats
Thanks for all the feedback, I opened https://issues.apache.org/jira/browse/LUCENE-9669 to address this further. On Wed, Jan 13, 2021 at 2:58 PM Adrien Grand wrote: > > +1 this strikes me as a good balance between increasing backward > compatibility guarantees and still keeping room for innovation. > > David, actually I would like to advocate in favor of still disallowing > opening N-2 indices by default, as they might not match Lucene's current > expectations (e.g. using a different encoding for norms due to LUCENE-7730), > and using Lucene's current analyzers/similarities/queries might trigger > surprising behavior. My preference would be to expose the ability to open N-2 > indices behind an expert API/flag that documents limitations with N-2 indices. > > Mike, I wondered about this question too. As you pointed out, I think that we > will generally be ok given that the N-2 compatibility layer will very likely > be the same as the N-1 compatibility layer that we need to develop anyway. I > tried to think of examples where that wouldn't work but couldn't find any > (which doesn't mean that there are none, but hopefully it would be rare). > > > > On Mon, Jan 11, 2021 at 4:57 PM Michael McCandless > wrote: >> >> +1, I like the idea in general. >> >> We will have to work out the details in practice as we come across "index >> breaking" changes, and where/how to draw the line of "best effort". But I >> think this is an improvement for our users over the hard check we now have >> for "only N-1", and likely not so much development effort? >> >> I think where it might get interesting is if we want to make a Codec API >> change, maybe to optimize an interesting use case, and then we must do some >> development to fix the N-2 BWC codec (as well as the N-1 BWC codec that we already >> must fix for such an example, today). >> >> Some users seem to keep their indices alive for a very long time! 
>> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> >> On Sat, Jan 9, 2021 at 6:13 AM Simon Willnauer >> wrote: >>> >>> I can provide some examples of BWC issues and what we would do if it >>> happened in the future: >>> >>> - negative offsets: in this case it would be best effort to add a >>> wrapper around the older formats to check if the offsets go backwards >>> on the read side and throw an exception to prevent consumers making >>> the assumption that offsets go forward only from failing or going OOM >>> etc. >>> - norms encoding: in this case it would be best effort in the older >>> norms formats to convert to the newer encodings. >>> - the removal of numeric fields queries would not fall under the >>> promises we make with compatibility of N-2 and it would be the >>> responsibility of the user to keep the code around that understands >>> the value of a field. >>> >>> I hope this clarifies some of the aspects? >>> >>> we would only do all this for the reading end, for writing we would >>> reject indices that are older than N-1 >>> >>> simon >>> >>> >>> On Thu, Jan 7, 2021 at 8:04 PM jim ferenczi wrote: >>> > >>> > The proposal is only about keeping the ability to read file-format up to >>> > N-2. Everything that is done on top of the file format is not guaranteed >>> > and should be supported on a best-effort basis. >>> > That's an important aspect if we don't want to block innovation. So in >>> > practice that means that queries that require some specific file format >>> > or analyzers that change behaviors in major versions would not be part of >>> > the extended guarantee. >>> > >>> > >>> > Le mer. 6 janv. 
2021 à 21:53, Yonik Seeley a écrit : >>> >> >>> >> On Wed, Jan 6, 2021 at 4:40 AM Simon Willnauer >>> >> wrote: >>> >>> >>> >>> You can open a reader on an index created by >>> >>> version N-2, but you cannot open an IndexWriter on it >>> >> >>> >> >>> >> +1 >>> >> There should definitely be more consideration given to back compat in >>> >> general... it's caused a ton of pain to users over time. >>> >> >>> >> -Yonik >>> >> >>> >> >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> > > > -- > Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
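The "expert API/flag" Adrien suggests for opening N-2 indices could be sketched as a simple version gate. Everything below is a made-up illustration (the class, method names, and the boolean flag are not actual Lucene APIs): reading N-2 is allowed only behind an explicit opt-in, while writing always requires N-1 or newer.

```java
// Hypothetical sketch, not a real Lucene API: a gate that only lets N-2
// indices be opened for reading behind an explicit expert flag, and never
// lets them be opened for writing.
class VersionGate {
    static final int CURRENT_MAJOR = 9; // major version of the running code (assumed)

    // Reading: N and N-1 by default; N-2 only when the expert flag is passed.
    static boolean canOpenForReading(int indexMajor, boolean expertAllowNMinus2) {
        int oldest = expertAllowNMinus2 ? CURRENT_MAJOR - 2 : CURRENT_MAJOR - 1;
        return indexMajor >= oldest && indexMajor <= CURRENT_MAJOR;
    }

    // Writing: stricter -- indices older than N-1 are always rejected.
    static boolean canOpenForWriting(int indexMajor) {
        return indexMajor >= CURRENT_MAJOR - 1 && indexMajor <= CURRENT_MAJOR;
    }
}
```

The asymmetry between the two methods is the whole proposal in miniature: the read path gets the extra, opt-in N-2 window, the write path does not.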
Re: Add maxFields Option to IndexWriter
I personally have a pretty positive experience with what I call soft limits. At Elastic we use them all over the place to catch issues when a user likely misconfigures something or if there is likely an issue on the user's end. I think we could have an option on the IW that allows limiting the number of fields. We can even extract a general limits object with total num docs etc. if we want. We can still set stuff to unlimited by default. WDYT Sent from a mobile device > On 14. Jan 2021, at 06:36, David Smiley wrote: > > > I don't like the idea of IndexWriter limiting field names, but I do like the > idea of un-deprecating that method, which appeared to have a trivial > implementation. Try commenting on the issue of its deprecation, which has > various watchers, to get their attention. > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > >> On Wed, Jan 13, 2021 at 5:02 PM Oren Ovadia >> wrote: >> Hi All, >> >> I work on Lucene at MongoDB. >> >> I would like to limit the number of fields in an index to prevent tenants >> from causing a mapping explosion. >> >> Since IndexWriter.getFieldNames has been deprecated, there is no way to do >> this without using a reader (which comes with a set of problems regarding >> flush/commit rates). >> >> Would love to add to Lucene the ability to have IndexWriters limiting the >> number of fields. Curious to hear your thoughts. >> >> Thanks, >> Oren >>
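The soft-limit idea from this thread can be sketched without touching IndexWriter internals at all. The class and method names below are hypothetical; this just shows the bookkeeping: track distinct field names and fail fast once a configurable cap is exceeded, with "unlimited" as the default.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only -- not a Lucene class: track the distinct field
// names seen so far and reject any field that would push the count over a
// configurable cap. Integer.MAX_VALUE stands in for "unlimited by default".
class FieldCountSoftLimit {
    private final int maxFields;
    private final Set<String> seenFields = new HashSet<>();

    FieldCountSoftLimit(int maxFields) {
        this.maxFields = maxFields;
    }

    // Call once per field of each incoming document, before indexing it.
    void checkField(String fieldName) {
        if (seenFields.add(fieldName) && seenFields.size() > maxFields) {
            throw new IllegalArgumentException(
                "number of fields [" + seenFields.size()
                    + "] exceeds the configured limit [" + maxFields + "]");
        }
    }

    int fieldCount() {
        return seenFields.size();
    }
}
```

In a real integration the check would live wherever fields are registered, so a misbehaving tenant gets a clear error instead of a silent mapping explosion.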
Re: RFC: N-2 compatibility for file formats
I can provide some examples of BWC issues and what we would do if it happened in the future: - negative offsets: in this case it would be best effort to add a wrapper around the older formats to check on the read side whether the offsets go backwards and throw an exception, to prevent consumers that assume offsets only go forward from failing, going OOM, etc. - norms encoding: in this case it would be best effort in the older norms formats to convert to the newer encodings. - the removal of numeric field queries would not fall under the promises we make with N-2 compatibility, and it would be the responsibility of the user to keep around the code that understands the value of a field. I hope this clarifies some of the aspects? We would only do all this for the reading end; for writing we would reject indices that are older than N-1. simon On Thu, Jan 7, 2021 at 8:04 PM jim ferenczi wrote: > > The proposal is only about keeping the ability to read the file format up to N-2. > Everything that is done on top of the file format is not guaranteed and > should be supported on a best-effort basis. > That's an important aspect if we don't want to block innovation. So in > practice that means that queries that require some specific file format or > analyzers that change behavior in major versions would not be part of the > extended guarantee. > > > On Wed, Jan 6, 2021 at 9:53 PM, Yonik Seeley wrote: >> >> On Wed, Jan 6, 2021 at 4:40 AM Simon Willnauer >> wrote: >>> >>> You can open a reader on an index created by >>> version N-2, but you cannot open an IndexWriter on it >> >> >> +1 >> There should definitely be more consideration given to back compat in >> general... it's caused a ton of pain to users over time. >> >> -Yonik >> >>
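The negative-offsets wrapper Simon describes boils down to a read-side sanity check. Here is a minimal, self-contained sketch of that check (OffsetGuard is a made-up name, not a Lucene class): validate each token's offsets before handing them to consumers that assume offsets only move forward.

```java
// Purely illustrative sketch of the "wrapper on the read side" idea for
// backward/negative offsets coming out of an old format: fail loudly with an
// exception instead of letting a forward-only consumer crash or go OOM later.
class OffsetGuard {
    // starts[i]/ends[i] are the start/end character offsets of token i,
    // in the order the tokens are read from the old format.
    static void checkOffsets(int[] starts, int[] ends) {
        int lastStart = -1;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] < 0 || ends[i] < starts[i]) {
                throw new IllegalStateException("malformed offsets at token " + i);
            }
            if (starts[i] < lastStart) {
                throw new IllegalStateException("offsets went backwards at token " + i
                    + ": " + starts[i] + " < " + lastStart);
            }
            lastStart = starts[i];
        }
    }
}
```

In the real thing this logic would sit inside a wrapper around the old postings/term-vectors reader, but the invariant being enforced is exactly this one.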
Re: additional term meta data
John, can you explain what the use case for such a new API is? I don't see a user of the API in your code. Is there a query you can optimize with this, or what is the reasoning behind this change? I personally think it's quite invasive to add this information, and there must be a good reason to add this to the TermsEnum. I also don't think we should have an option on the field for this if we add it, but if we don't do that it's quite a heavy change, so I am on the fence about whether we should even consider this. I wonder if you can use the TermsEnum#getAttributeSource() API instead and add this as a dedicated attribute which is present if the info is stored. That way you can build your own PostingsFormat that does store this information. simon On Wed, Jan 6, 2021 at 8:06 PM John Wang wrote: > > Thank you, Martin! > > You can apply the patch to the 8.7 build by just ignoring the changes to > Lucene90xxx. Appreciate the help and guidance! > > -John > > > On Wed, Jan 6, 2021 at 10:36 AM Martin Gainty wrote: >> >> appears you are targeting 9.0 for your code >> lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90FieldInfosFormat.java >> (Lucene90FieldInfosFormat.java is not contained in either the 8.4 or 8.7 distros) >> >> >> someone had the bright idea to nuke the ant 8.x build.xml without consulting >> anyone >> not a fan of ant, but the execution model of gradle is woefully inflexible in >> comparison to maven >> >> >> I will try with the 9.0 distro to get >> codecs/lucene90/Lucene90FieldInfosFormat and recompile, and hopefully your >> TestLucene84PostingsFormat will run w/o fail or error >> >> Thx >> martin- >> >> >> From: John Wang >> Sent: Wednesday, January 6, 2021 10:15 AM >> To: dev@lucene.apache.org >> Subject: Re: additional term meta data >> >> Hey Martin: >> >> There is a test case in the PR we created on our own fork: >> https://github.com/dashbase/lucene-solr/pull/1, which also contains some >> example code on how to access it in the PR description. 
>> >> Here is the link to the beginning of the tests: >> https://github.com/dashbase/lucene-solr/blob/posting-last-docid/lucene/core/src/test/org/apache/lucene/codecs/lucene84/TestLucene84PostingsFormat.java#L142 >> >> I am not sure which version this should be applied to; currently it was >> based on master as of a few days ago. We intend to patch 8.7 for our own >> environment. >> >> Any advice or feedback is much appreciated. >> >> Thank you! >> >> -John >> >> On Wed, Jan 6, 2021 at 3:28 AM Martin Gainty wrote: >> >> how to access first and last? >> which version will you be merging >> >> >> From: John Wang >> Sent: Tuesday, January 5, 2021 8:19 PM >> To: dev@lucene.apache.org >> Subject: additional term meta data >> >> Hi folks: >> >> We'd like to propose a feature to add additional per-term metadata to the term >> dictionary. >> >> Currently, the TermsEnum API returns docFreq as its only meta-data. We >> needed a way to quickly get the first and last doc id in the postings >> without having to scan through the entire postings list. >> >> We have created a PR on our own fork and we would like to contribute this >> back to the community. Please let us know if this is something that's useful >> and/or fits Lucene's roadmap, we would be happy to submit a patch. >> >> https://github.com/dashbase/lucene-solr/pull/1 >> >> Thank you >> >> -John
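The metadata John proposes boils down to remembering, per term, the first and last doc ID seen while postings are written, so a reader can return both without scanning the postings list. A toy, Lucene-free sketch of that bookkeeping (all names here are illustrative, not from the actual PR):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the proposed per-term metadata: record the first and last
// doc ID for each term as postings are built. In a real PostingsFormat this
// pair would be written next to docFreq in the term dictionary; here a map
// stands in for that storage.
class TermDocRanges {
    private final Map<String, int[]> ranges = new HashMap<>(); // term -> {firstDocId, lastDocId}

    // Called for every (term, docId) pair during indexing, in increasing docId order.
    void observe(String term, int docId) {
        int[] r = ranges.get(term);
        if (r == null) {
            ranges.put(term, new int[] {docId, docId});
        } else {
            r[1] = docId; // docIds arrive in order, so this is always the latest
        }
    }

    int firstDocId(String term) { return ranges.get(term)[0]; }
    int lastDocId(String term)  { return ranges.get(term)[1]; }
}
```

Simon's suggestion in this thread amounts to exposing exactly this pair through a custom attribute on the TermsEnum, backed by a custom PostingsFormat, rather than changing the core API.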
RFC: N-2 compatibility for file formats
Hello all, Currently Lucene supports reading and writing indices that have been created with the current or previous (N-1) version of Lucene. Lucene refuses to open an index created by N-2 or earlier versions. I would like to propose that Lucene adds support for opening indices created by version N-2 in read-only mode. Here's what I have in mind: - Read-only support. You can open a reader on an index created by version N-2, but you cannot open an IndexWriter on it, meaning that you cannot delete, update, or add documents to, or force-merge, N-2 indices. - File-format compatibility only. File-format compatibility enables reading the content of old indices, but not more. Everything that is done on top of file formats, like analysis or the encoding of length normalization factors, is not guaranteed and only supported on a best-effort basis. The reason I came up with these limitations is that I wanted to keep the scope minimal in order to retain Lucene's ability to move forward. If there is consensus to move forward with this, I would like to target Lucene 9.0 with this change. Simon
Re: Deterministic index construction
you can do something similar to this today by exploiting the add/updateDocuments(Iterable doc) API. All docs in this iterable will be sent to the same segment in order. If you have multiple threads you can feed a defined number of docs per iterable (stream them to be memory efficient) and then let them go at the same time. this way you have thread affinity (we had this in the early days of DWPT, I'd be reluctant to make it configurable again). then with a custom merge policy you should be able to get the exact same number of segments without remerging etc. some sync overhead on top but it's doable I think. simon On Wed, Dec 23, 2020 at 10:30 PM David Smiley wrote: > > I like Mike McCandless's suggestion of controlling which DWPT (and thus > segment) an incoming document goes to. I've thought of this before for a > different use case grouping documents into segments by the underlying "type" > of the document. This could make sense for a use-case that queries by > document type, and you don't want to create an index per document type (maybe > because the index is too small to warrant it). It could even be used in a > kind of soft / hint kind of way -- not an absolute strict separation. For > example, say if some subset of DWPTs are known to hold docs of a given type, > then add incoming docs of that type to any of those and not the others. But > if none exist then just add to any DWPT. I also thought of this sort of > thing at the MergePolicy level, but at that point, any mixing of doc types > has already occurred and MP can't separate them, it can only combine, though > it can try to reduce introducing too much mixing. It would be nice if it > were possible to atomically merge some documents in a segment but not the > whole segment, thus still leaving the segment in place but with the extracted > documents marked deleted. This is similar to "shard splitting" (index > splitting) but to do so atomically/transactionally. 
> > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Sun, Dec 20, 2020 at 10:24 AM Michael McCandless > wrote: >> >> I think the addIndexes approach could work as Haoyu describes! One >> IndexWriter per segment in the original source index, using >> FilterIndexReader to ... mark all documents NOT in the target segment as >> deleted? >> >> For the final step, you could use addIndexes(Directory[]) which more or less >> does a simple file copy of the incoming segment's files. >> >> But this is a whole extra and costly sounding step, that might undo the wall >> clock speedup from the concurrent indexing in the first pass. Maybe it is >> still faster net/net than what luceneutil benchmarks, which is >> single-threaded-everything (single indexing thread, SerialMergeScheduler, >> LogDocMergePolicy)? >> >> The first option Haoyu listed sounds interesting too! Could we somehow >> build a new index, concurrently, but force certain docs to go to certain >> in-memory segments (DWPT)? Today the routing of incoming indexing threads to >> DWPT is sort of random, but there is indeed a dedicated internal class that >> decides that: DocumentsWriterPerThreadPool. And, here is a fun PR that >> Adrien is working on to improve how threads are scheduled onto in-memory >> segments, to try to create larger initially flushed segments and less merge >> pressure as a result: https://github.com/apache/lucene-solr/pull/1912 >> >> If we could carefully guide threads to the right DWPT during indexing the >> 2nd time, and then use a custom MergePolicy that is also careful to only >> merge segments that "belong" together, and the index is sorted, I think you >> would get the same segment geometry in the end, and the exact same documents in >> each segment? This'd likely be nearly as fast as freely building an index >> concurrently! 
It'd be a nice addition to luceneutil benchmarks too, since >> now it takes crazy long to build the deterministic index. >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> >> On Sat, Dec 19, 2020 at 2:50 PM Haoyu Zhai wrote: >>> >>> Hi Adrien >>> I think Mike's comment is correct, we already have index sorted but we want >>> to reconstruct a index with exact same number of segments and each segment >>> contains exact same documents. >>> >>> Mike >>> AddIndexes could take CodecReader as input [1], which allows us to pass in >>> a customized FilteredIndexReader I think? Then it knows which docs to take. >>> And then suppose original index has N segments, we could open N IndexWriter >>> concurrently and rebuilt those N segments, and at last somehow merge them >>> back to a whole index. (I am not quite sure about whether we could achieve >>> the last step easily, but that sounds not so hard?) >>> >>> [1] >>> https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/index/IndexWriter.html#addIndexes-org.apache.lucene.index.CodecReader...- >>> >>> Michael Sokolov 于2020年12月1
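Simon's suggestion at the top of this thread rests on one deterministic step: split the doc stream into fixed batches up front, so each batch maps to one segment regardless of which thread happens to index it (in Lucene, one addDocuments(Iterable) call per batch). The partitioning itself can be sketched in plain Java, with no Lucene dependency:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the deterministic-construction idea from this thread: partition
// doc IDs into consecutive fixed-size batches before indexing starts. Each
// batch would then be fed to IndexWriter via one addDocuments(Iterable) call,
// so its docs land in the same segment in order, independent of thread timing.
class DeterministicBatches {
    // Partitions docIds [0, numDocs) into consecutive batches of batchSize.
    static List<List<Integer>> partition(int numDocs, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int start = 0; start < numDocs; start += batchSize) {
            List<Integer> batch = new ArrayList<>();
            for (int d = start; d < Math.min(start + batchSize, numDocs); d++) {
                batch.add(d);
            }
            batches.add(batch);
        }
        return batches; // same input always yields the same batches
    }
}
```

Combined with a merge policy that only merges "belonging" segments (or none at all), the same input should produce the same segment geometry on every run.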
Re: Old programmers do fade away
Eric, thanks so much for your open and true words! You will always be part of this community if you subscribed to the lists or not. (you can't escape :D) Thanks for your contributions, this is a team effort and you are a part of it. enjoy the welding!! simon On Wed, Dec 30, 2020 at 3:09 PM Erick Erickson wrote: > > 40 years is enough. OK, it's only been 39 1/2 years. Dear Lord, has it really > been that long? Programming's been fun, I've gotten to solve puzzles every > day. The art and science of programming has changed over that time. Let me > tell you about the joys of debugging with a Z80 stack emulator that required > that you to look on the stack for variables and trace function calls by > knowing how to follow frame pointers. Oh the tedium! Oh the (lack of) speed! > Not to mention that 64K of memory was all you had to work with. I had a > co-worker who could predict the number of bytes by which the program would > shrink based on extracting common code to functions. The "good old > days"...weren't... > > I'd been thinking that I'd treat Lucene/Solr as a hobby, doing occasional > work on it when I was bored over long winter nights. I've discovered, though, > that I've been increasingly reluctant to crack open the code. I guess that > after this much time, I'm ready to hang up my spurs. One major factor is the > realization that there's so much going on with Lucene/Solr that simply being > aware of the changes, much less trying to really understand them, isn't > something I can do casually. > > I bought a welder and find myself more interested in playing with that than > programming. Wait until you see the squirrel-proof garden enclosure I'm > building with it. If my initial plan doesn't work, next up is an electric > fence along the top. The laser-sighted automatic machine gun emplacement will > take more planning...Ahhh, probably won't be able to get a permit from the > township for that though. Do you think the police would notice? 
Perhaps I > should add that the local police station is two blocks away and in the line > of fire. But an infrared laser powerful enough to "pre-cook" them wouldn't be > as obvious would it? > > Why am I so fixated on squirrels? One of the joys of gardening is fresh > tomatoes rather than those red things they sell in the store. The squirrels > ATE EVERY ONE OF MY TOMATOES WHILE THEY WERE STILL GREEN LAST YEAR! And the > melons. In the words of B. Bunny: "Of course you realize this means war" > (https://www.youtube.com/watch?v=4XNr-BQgpd0)... > > Then there's working in the garden and landscaping, the desk I want to build > for my wife, travel as soon as I can, maybe seeing if some sailboats need > crew...you get the idea. > > It's been a privilege to work with this group, you're some of the best and > brightest. Many thanks to all who've generously given me their time and > guidance. It's been a constant source of amazement to me how willing people > are to take time out of their own life and work to help me when I've had > questions. I owe a lot of people beers ;) > > I'll be stopping my list subscriptions, Slack channels (dm me if you need > something), un-assigning any JIRAs and that kind of thing over the next > while. If anyone's interested in taking over the BadApple report, let me know > and I can put the code up somewhere. It takes about 10 minutes to do each > week. I won't disappear entirely, things like the code-reformatting effort > are nicely self-contained for instance and something I can do casually. > > My e-mail address if you need to get in touch with me is: > "erick.erick...@gmail.com". There's a correlation between gmail addresses > that are just a name with no numbers and a person's age... A co-worker came > over to my desk in pre-historical times and said "there's this new mail > service you might want to sign up for"... Like I said, 40 years is enough. 
> > Best to all, > Erick > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org >
Re: [VOTE] Lucene logo contest, third time's a charm
Thank you ryan for pushing on this, being persistent and getting the vote out. On Tue, Sep 8, 2020 at 5:55 PM Ryan Ernst wrote: > This vote is now closed. The results are as follows: > > Binding Results > A1: 12 (55%) > D: 6 (27%) > A2: 4 (18%) > > All Results > A1: 16 (55%) > D: 7 (24%) > A2: 5 (17%) > B5d: 1 (3%) > > A1 is our winner! I will make the necessary changes. > > Thank you to Dustin Haver, Stamatis Zampetakis, Baris Kazar and all who > voted! > > On Tue, Sep 1, 2020 at 1:21 PM Ryan Ernst wrote: > > > Dear Lucene and Solr developers! > > > > Sorry for the multiple threads. This should be the last one. > > > > In February a contest was started to design a new logo for Lucene > > [jira-issue]. The initial attempt [first-vote] to call a vote resulted in > > some confusion on the rules, as well the request for one additional > > submission. The second attempt [second-vote] yesterday had incorrect > links > > for one of the submissions. I would like to call a new vote, now with > more > > explicit instructions on how to vote, and corrected links. > > > > *Please read the following rules carefully* before submitting your vote. > > > > *Who can vote?* > > > > Anyone is welcome to cast a vote in support of their favorite > > submission(s). Note that only PMC member's votes are binding. If you are > a > > PMC member, please indicate with your vote that the vote is binding, to > > ease collection of votes. In tallying the votes, I will attempt to verify > > only those marked as binding. > > > > > > *How do I vote?* > > Votes can be cast simply by replying to this email. It is a ranked-choice > > vote [rank-choice-voting]. Multiple selections may be made, where the > order > > of preference must be specified. If an entry gets more than half the > votes, > > it is the winner. Otherwise, the entry with the lowest number of votes is > > removed, and the votes are retallied, taking into account the next > > preferred entry for those whose first entry was removed. 
This process > > repeats until there is a winner. > > > > The entries are broken up by variants, since some entries have multiple > > color or style variations. The entry identifiers are first a capital > > letter, followed by a variation id (described with each entry below), if > > applicable. As an example, if you prefer variant 1 of entry A, followed > by > > variant 2 of entry A, variant 3 of entry C, entry D, and lastly variant > 4e > > of entry B, the following should be in your reply: > > > > (binding) > > vote: A1, A2, C3, D, B4e > > > > *Entries* > > > > The entries are as follows: > > > > A*.* Submitted by Dustin Haver. This entry has two variants, A1 and A2. > > > > [A1] > > > https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png > > [A2] > > https://issues.apache.org/jira/secure/attachment/12997172/LuceneLogo.png > > > > B. Submitted by Stamatis Zampetakis. This has several variants. Within > the > > linked entry there are 7 patterns and 7 color palettes. Any vote for B > > should contain the pattern number followed by the lowercase letter of the > > color palette. For example, B3e or B1a. > > > > [B] > > > https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf > > > > C. Submitted by Baris Kazar. This entry has 8 variants. 
> > > > [C1] > > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo1_full.pdf > > [C2] > > > https://issues.apache.org/jira/secure/attachment/13006393/lucene_logo2_full.pdf > > [C3] > > > https://issues.apache.org/jira/secure/attachment/13006394/lucene_logo3_full.pdf > > [C4] > > > https://issues.apache.org/jira/secure/attachment/13006395/lucene_logo4_full.pdf > > [C5] > > > https://issues.apache.org/jira/secure/attachment/13006396/lucene_logo5_full.pdf > > [C6] > > > https://issues.apache.org/jira/secure/attachment/13006397/lucene_logo6_full.pdf > > [C7] > > > https://issues.apache.org/jira/secure/attachment/13006398/lucene_logo7_full.pdf > > [C8] > > > https://issues.apache.org/jira/secure/attachment/13006399/lucene_logo8_full.pdf > > > > D. The current Lucene logo. > > > > [D] > > https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png > > > > Please vote for one of the above choices. This vote will close about one > > week from today, Mon, Sept 7, 2020 at 11:59PM. > > > > Thanks! > > > > [jira-issue] https://issues.apache.org/jira/browse/LUCENE-9221 > > [first-vote] > > > http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e > > [second-vote] > > > http://mail-archives.apache.org/mod_mbox/lucene-dev/202009.mbox/%3cCA+DiXd7eBrQu5+aJQ3jKaUtUTJUqaG2U6o+kUZfNe-m=smn...@mail.gmail.com%3e > > [rank-choice-voting] https://en.wikipedia.org/wiki/Instant-runoff_voting > > >
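The ranked-choice rule Ryan describes (strict majority wins; otherwise drop the lowest entry and retally using each ballot's next surviving preference) is instant-runoff voting, and it is small enough to sketch directly. This is an illustrative implementation of the rule as stated in the email, not the script actually used to tally the vote:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the instant-runoff tally described in the vote email: count each
// ballot toward its highest-ranked surviving entry; if no entry holds a strict
// majority, eliminate the entry with the fewest votes and retally.
class InstantRunoff {
    static String winner(List<List<String>> ballots) {
        Set<String> eliminated = new HashSet<>();
        while (true) {
            Map<String, Integer> tally = new HashMap<>();
            int active = 0; // ballots that still name a surviving entry
            for (List<String> ballot : ballots) {
                for (String choice : ballot) {
                    if (!eliminated.contains(choice)) {
                        tally.merge(choice, 1, Integer::sum);
                        active++;
                        break; // only the top surviving preference counts
                    }
                }
            }
            String lowest = null;
            for (Map.Entry<String, Integer> e : tally.entrySet()) {
                if (e.getValue() * 2 > active) {
                    return e.getKey(); // strict majority -> winner
                }
                if (lowest == null || e.getValue() < tally.get(lowest)) {
                    lowest = e.getKey();
                }
            }
            eliminated.add(lowest); // no majority: drop the lowest and retally
        }
    }
}
```

In the actual vote no elimination round was needed: A1 held a strict majority of first preferences (12 of 22 binding, 16 of 29 overall) straight away.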
Re: [VOTE] Lucene logo contest, third time's a charm
A1, A2, D (binding) On Thu, Sep 3, 2020 at 7:09 AM Noble Paul wrote: > A1, A2, D binding > > On Thu, Sep 3, 2020 at 7:22 AM Jason Gerlowski > wrote: > > > > A1, A2, D (binding) > > > > On Wed, Sep 2, 2020 at 10:47 AM Michael McCandless > > wrote: > > > > > > A2, A1, C5, D (binding) > > > > > > Thank you to everyone for working so hard to make such cool looking > possible future Lucene logos! And to Ryan for the challenging job of > calling this VOTE :) > > > > > > Mike McCandless > > > > > > http://blog.mikemccandless.com > > > > > > > > > On Tue, Sep 1, 2020 at 4:21 PM Ryan Ernst wrote: > > >> > > >> Dear Lucene and Solr developers! > > >> > > >> Sorry for the multiple threads. This should be the last one. > > >> > > >> In February a contest was started to design a new logo for Lucene > [jira-issue]. The initial attempt [first-vote] to call a vote resulted in > some confusion on the rules, as well the request for one additional > submission. The second attempt [second-vote] yesterday had incorrect links > for one of the submissions. I would like to call a new vote, now with more > explicit instructions on how to vote, and corrected links. > > >> > > >> Please read the following rules carefully before submitting your vote. > > >> > > >> Who can vote? > > >> > > >> Anyone is welcome to cast a vote in support of their favorite > submission(s). Note that only PMC member's votes are binding. If you are a > PMC member, please indicate with your vote that the vote is binding, to > ease collection of votes. In tallying the votes, I will attempt to verify > only those marked as binding. > > >> > > >> How do I vote? > > >> > > >> Votes can be cast simply by replying to this email. It is a > ranked-choice vote [rank-choice-voting]. Multiple selections may be made, > where the order of preference must be specified. If an entry gets more than > half the votes, it is the winner. 
Otherwise, the entry with the lowest > number of votes is removed, and the votes are retallied, taking into > account the next preferred entry for those whose first entry was removed. > This process repeats until there is a winner. > > >> > > >> The entries are broken up by variants, since some entries have > multiple color or style variations. The entry identifiers are first a > capital letter, followed by a variation id (described with each entry > below), if applicable. As an example, if you prefer variant 1 of entry A, > followed by variant 2 of entry A, variant 3 of entry C, entry D, and lastly > variant 4e of entry B, the following should be in your reply: > > >> > > >> (binding) > > >> vote: A1, A2, C3, D, B4e > > >> > > >> Entries > > >> > > >> The entries are as follows: > > >> > > >> A. Submitted by Dustin Haver. This entry has two variants, A1 and A2. > > >> > > >> [A1] > https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png > > >> [A2] > https://issues.apache.org/jira/secure/attachment/12997172/LuceneLogo.png > > >> > > >> B. Submitted by Stamatis Zampetakis. This has several variants. > Within the linked entry there are 7 patterns and 7 color palettes. Any vote > for B should contain the pattern number followed by the lowercase letter of > the color palette. For example, B3e or B1a. > > >> > > >> [B] > https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf > > >> > > >> C. Submitted by Baris Kazar. This entry has 8 variants. 
> > >> > > >> [C1] > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo1_full.pdf > > >> [C2] > https://issues.apache.org/jira/secure/attachment/13006393/lucene_logo2_full.pdf > > >> [C3] > https://issues.apache.org/jira/secure/attachment/13006394/lucene_logo3_full.pdf > > >> [C4] > https://issues.apache.org/jira/secure/attachment/13006395/lucene_logo4_full.pdf > > >> [C5] > https://issues.apache.org/jira/secure/attachment/13006396/lucene_logo5_full.pdf > > >> [C6] > https://issues.apache.org/jira/secure/attachment/13006397/lucene_logo6_full.pdf > > >> [C7] > https://issues.apache.org/jira/secure/attachment/13006398/lucene_logo7_full.pdf > > >> [C8] > https://issues.apache.org/jira/secure/attachment/13006399/lucene_logo8_full.pdf > > >> > > >> D. The current Lucene logo. > > >> > > >> [D] > https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png > > >> > > >> Please vote for one of the above choices. This vote will close about > one week from today, Mon, Sept 7, 2020 at 11:59PM. > > >> > > >> Thanks! > > >> > > >> [jira-issue] https://issues.apache.org/jira/browse/LUCENE-9221 > > >> [first-vote] > http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e > > >> [second-vote] > http://mail-archives.apache.org/mod_mbox/lucene-dev/202009.mbox/%3cCA+DiXd7eBrQu5+aJQ3jKaUtUTJUqaG2U6o+kUZfNe-m=smn...@mail.gmail.com%3e > > >> [rank-choice-voting] > https://en.wikipedia.org/wiki/Instant-runoff_voting > > > > ---
Re: [VOTE] Release Lucene/Solr 8.6.2 RC1
+1 binding release looks good to me On Thu, Aug 27, 2020 at 3:58 PM Atri Sharma wrote: > > +1 (binding) > > SUCCESS! [1:14:17.24939] > > On Thu, 27 Aug 2020 at 18:41, Michael Sokolov wrote: >> >> SUCCESS! [0:56:28.589654] >> >> >> >> +1 >> >> >> >> On Wed, Aug 26, 2020 at 12:41 PM Nhat Nguyen >> >> wrote: >> >> > >> >> > +1 >> >> > >> >> > SUCCESS! [0:52:44.607871] >> >> > >> >> > On Wed, Aug 26, 2020 at 12:12 PM Tomoko Uchida >> > wrote: >> >> >> >> >> >> +1 (non-binding) >> >> >> SUCCESS! [0:51:55.207272] >> >> >> >> >> >> >> >> >> 2020年8月26日(水) 22:42 Ignacio Vera : >> >> >>> >> >> >>> Please vote for release candidate 1 for Lucene/Solr 8.6.2 >> >> >>> >> >> >>> >> >> >>> The artifacts can be downloaded from: >> >> >>> >> >> >>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.2-RC1-rev016993b65e393b58246d54e8ddda9f56a453eb0e >> >> >>> >> >> >>> >> >> >>> You can run the smoke tester directly with this command: >> >> >>> >> >> >>> >> >> >>> python3 -u dev-tools/scripts/smokeTestRelease.py \ >> >> >>> >> >> >>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.2-RC1-rev016993b65e393b58246d54e8ddda9f56a453eb0e >> >> >>> >> >> >>> >> >> >>> The vote will be open for at least 72 hours i.e. until 2020-08-29 15:00 >> >>> UTC. >> >> >>> >> >> >>> >> >> >>> [ ] +1 approve >> >> >>> >> >> >>> [ ] +0 no opinion >> >> >>> >> >> >>> [ ] -1 disapprove (and reason why) >> >> >>> >> >> >>> >> >> >>> Here is my +1 >> >> >>> >> >> >>> >> >> >>> SUCCESS! [1:14:00.656250] >> >> >> >> - >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> >> > > > -- > Regards, > > Atri > Apache Concerted - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene/Solr 8.6.2 bugfix release
I'd actually like to build the RC earlier than the end of the week. Unless somebody objects I'd like to build one tonight or tomorrow. simon On Tue, Aug 25, 2020 at 7:52 AM Ishan Chattopadhyaya wrote: > > Thanks Simon and Ignacio! > > On Tue, 25 Aug, 2020, 11:21 am Simon Willnauer, > wrote: >> >> +1 thank you! I was about to write the same email. Lets sync on the RM >> I can certainly help... I need to go and find my code signing key >> first :) >> >> simon >> >> On Tue, Aug 25, 2020 at 7:49 AM Ignacio Vera wrote: >> > >> > Hi, >> > >> > I propose a 8.6.2 bugfix release and I volunteer as RM. The motivation for >> > this release is LUCENE-9478 where Simon addressed a serious memory leak in >> > DWPTDeleteQueue. >> > >> > If there are no objections I am planning to build the first RC by the end >> > of this week. >> > >> > Ignacio >> > >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene/Solr 8.6.2 bugfix release
+1 thank you! I was about to write the same email. Let's sync on the RM; I can certainly help... I need to go and find my code signing key first :) simon On Tue, Aug 25, 2020 at 7:49 AM Ignacio Vera wrote: > > Hi, > > I propose a 8.6.2 bugfix release and I volunteer as RM. The motivation for > this release is LUCENE-9478 where Simon addressed a serious memory leak in > DWPTDeleteQueue. > > If there are no objections I am planning to build the first RC by the end of > this week. > > Ignacio > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lots of failures lately for lucene.index.TestBackwardsCompatibility.testAllVersionsTested
I think it’s fixed now. The 7.7.3 version was missing. Simon > On 16. May 2020, at 22:45, Erick Erickson wrote: > > Unfortunately the seed doesn’t reproduce, and I tried beasting it without > getting any fails in 700 iterations (and counting). > > Here’s one example, I see three others in the last couple of hours. > > I’ve done zero investigation into where these are coming from, but I did > notice there started being a lot of them starting 2-3 (?) days ago. > > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Windows/1134/ > Java: 64bit/jdk-11.0.6 -XX:+UseCompressedOops -XX:+UseParallelGC > > 6 tests failed. > FAILED: > org.apache.lucene.index.TestBackwardsCompatibility.testAllVersionsTested > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Solr to become a top-level Apache project (TLP)
I agree this is not a code-change-category vote. It’s a majority vote. -1s are not vetoes. Simon > On 12. May 2020, at 21:17, Atri Sharma wrote: > > > I would argue against that — this is more of a project level decision with no > changes to the core code base per se — more of restructuring of it. Sort of > how a sub project becomes a TLP. > >> On Wed, 13 May 2020 at 00:38, Ishan Chattopadhyaya >> wrote: >> This is in the code modification category, since code will be modified as >> result of this proposal. >> >>> On Wed, 13 May, 2020, 12:27 am Shawn Heisey, wrote: >>> On 5/12/2020 1:36 AM, Dawid Weiss wrote: >>> > According to an earlier [DISCUSS] thread on the dev list [2], I am >>> > calling for a vote on the proposal to make Solr a top-level Apache >>> > project (TLP) and separate Lucene and Solr development into two >>> > independent entities. >>> >>> +1 (pmc) >>> >>> We should clarify exactly what kind of vote this is. If it is in the >>> "code modification" category, then a single -1 vote would be enough to >>> defeat the proposal. There are already some -1 votes. >>> >>> Thanks, >>> Shawn >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> > -- > Regards, > > Atri > Apache Concerted
Re: [VOTE] Solr to become a top-level Apache project (TLP)
+1 binding Sent from a mobile device > On 12. May 2020, at 13:33, Jason Gerlowski wrote: > > -1 (binding) > >> On Tue, May 12, 2020 at 7:31 AM Alan Woodward wrote: >> >> +1 (binding) >> >> Alan Woodward >> On 12 May 2020, at 12:06, Jan Høydahl wrote: >>> >>> +1 (binding) >>> >>> Jan Høydahl >>> 12. mai 2020 kl. 09:36 skrev Dawid Weiss : Dear Lucene and Solr developers! According to an earlier [DISCUSS] thread on the dev list [2], I am calling for a vote on the proposal to make Solr a top-level Apache project (TLP) and separate Lucene and Solr development into two independent entities. To quickly recap the reasons and consequences of such a move: it seems like the reasons for the initial merge of Lucene and Solr, around 10 years ago, have been achieved. Both projects are in good shape and exhibit signs of independence already (mailing lists, committers, patch flow). There are many technical considerations that would make development much easier if we move Solr out into its own TLP. We discussed this issue [2] and both PMC members and committers had a chance to review all the pros and cons and express their views. The discussion showed that there are clearly different opinions on the matter - some people are in favor, some are neutral, others are against or not seeing the point of additional labor. Realistically, I don't think reaching 100% level consensus is going to be possible -- we are a diverse bunch with different opinions and personalities. I firmly believe this is the right direction hence the decision to put it under the voting process. Should something take a wrong turn in the future (as some folks worry it may), all blame is on me. Therefore, the proposal is to separate Solr from under Lucene TLP, and make it a TLP on its own. The initial structure of the new PMC, committer base, git repositories and other managerial aspects can be worked out during the process if the decision passes. 
Please indicate one of the following (see [1] for guidelines): [ ] +1 - yes, I vote for the proposal [ ] -1 - no, I vote against the proposal Please note that anyone in the Lucene+Solr community is invited to express their opinion, though only Lucene+Solr committers cast binding votes (indicate non-binding votes in your reply, please). The vote will be active for a week to give everyone a chance to read and cast a vote. Dawid [1] https://www.apache.org/foundation/voting.html [2] https://lists.apache.org/thread.html/rfae2440264f6f874e91545b2030c98e7b7e3854ddf090f7747d338df%40%3Cdev.lucene.apache.org%3E - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)
On Sun, May 10, 2020 at 3:41 PM Bram Van Dam wrote: > > On 10/05/2020 08:20, David Smiley wrote: > > An idea just occurred to me that may help make a split nicer for Solr > > than it is today. Solr could use a branch of the Lucene project that's > > used for the Solr project. > > Maybe I'm alone in this, but (better) Lucene compatibility is one of the > reasons why our company chose Solr over ElasticSearch. I thought about this for a while and I do wonder if you could elaborate on what makes Solr have better compatibility with Lucene. That's certainly something elasticsearch would want to catch up on since it sounds like a clear benefit for users. Maybe I just misunderstood what you meant and hence couldn't make much sense of it. simon > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Lucene-Solr split (Solr promoted to TLP)
I can speak from experience that working with a snapshot is much cleaner than working with submodules. We have been doing this in elasticsearch for a very long time now and our process here works just fine. It has a bunch of advantages over a direct / source dependency like solr has right now. I recall that someone else already mentioned some of them, like working on a somewhat more stable codebase, and doing refactorings and integration when there are people dedicated to it who have enough time to do it properly. Regarding the effort of a split, I think that not doing something because it's a lot of work will just cause a ton of issues down the road. Doing the right thing is a lot of work, that's for sure, but we can start working on this in baby steps and we can all help. We can do this gradually: start with the website and lists, then the build system, etc., or start with the build first and do the website last. It's ok to apply progress over perfection here. We all want this to be done properly and we are all here to help, at least I am. simon On Wed, May 6, 2020 at 10:51 AM Ishan Chattopadhyaya wrote: > > Except the logistics of enacting the split, I see no valid reason of keeping > the projects together. Git submodule is the magic that we have to ease any > potential discomfort. However, the effort needed to split feels absolutely > massive, so I'm not sure if it is worth the hassle. > > On Wed, 6 May, 2020, 1:31 pm Dawid Weiss, wrote: >> >> > If you go to lucene.apache.org, you'll see three things: Lucene Core >> > (Lucene with all it's modules), Solr and PyLucene. That's what I mean. >> >> Hmm... Maybe I'm dim but that's essentially what I want to do. Look: >> >> 1. Lucene Core (Lucene with all it's modules) >> 2. Solr >> 3. PyLucene >> >> The thing is: (1) is already a TLP - that's just Lucene. My call is to >> make (2) a TLP. (3) I can't tell much about because I don't know >> PyLucene as well as I do Solr and Lucene... 
But it seems to me that >> PyLucene fits much better under "Lucene" umbrella, even the name >> suggests that. >> >> >> >> Dawid >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906502#comment-16906502 ] Simon Willnauer commented on LUCENE-8369: - +1 for option 1 above as well. Thanks [~nknize] > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902133#comment-16902133 ] Simon Willnauer commented on LUCENE-8369: - I don't think we should sacrifice the existence of LatLong point searching in core for the sake of code visibility. I think we should keep it in core and open up visibility to enable code-reuse in the modules and use _@lucene.internal_ in order to mark classes as internal and prevent users from complaining when the API changes. It's not ideal but progress. Can we separate the discussion of getting rid of the spatial module from graduating the various shapes from sandbox to wherever? I think keeping a module for 2 classes doesn't make sense. We can move those two classes to core too or even get rid of them altogether. I don't think it should influence the discussion of whether something else should be graduated. One other option would be to move all non-core spatial classes from sandbox to spatial as long as they don't add any additional dependency. That would be an intermediate step. We can still graduate from there then. > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8887) CLONE - Add setting for moving FST offheap/onheap
[ https://issues.apache.org/jira/browse/LUCENE-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8887. - Resolution: Duplicate This seems to have been opened accidentally. > CLONE - Add setting for moving FST offheap/onheap > - > > Key: LUCENE-8887 > URL: https://issues.apache.org/jira/browse/LUCENE-8887 > Project: Lucene - Core > Issue Type: New Feature > Components: core/FSTs, core/store >Reporter: LuYunCheng > Assignee: Simon Willnauer >Priority: Minor > Fix For: master (9.0), 8.1 > > Attachments: offheap_generic_settings.patch, offheap_settings.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > While LUCENE-8635, adds support for loading FST offheap using mmap, users do > not have the flexibility to specify fields for which FST needs to be > offheap. This allows users to tune heap usage as per their workload. > Ideal way will be to add an attribute to FieldInfo, where we have > put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the > appropriate On/OffHeapStore when creating its FST. It can support special > keywords like ALL/NONE. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor
[ https://issues.apache.org/jira/browse/LUCENE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872690#comment-16872690 ] Simon Willnauer commented on LUCENE-8865: - [~hypothesisx86] I didn't run any benchmarks. Maybe [~mikemccand] can provide info on whether there are improvements. > Use incoming thread for execution if IndexSearcher has an executor > --- > > Key: LUCENE-8865 > URL: https://issues.apache.org/jira/browse/LUCENE-8865 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Today we don't utilize the incoming thread for a search when IndexSearcher > has an executor. This thread is only idling but can be used to execute a search > once all other collectors are dispatched. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868555#comment-16868555 ] Simon Willnauer commented on LUCENE-8857: - A couple of comments: * can you open a PR and associate it with this issue? Patches are so hard to review without context and the ability to comment * for the second case in IndexSearcher should we also tie-break by doc? * Can we replace the verbose comparators with _Comparator.comparingInt(d -> d.shardIndex);_ and _Comparator.comparingInt(d -> d.doc);_ respectively? * Any chance we can select the tie-breaker based on whether one of the TopDocs has a shardIndex != -1 and assert that all of them have it or not? Another option would be to have only one comparator and first tie-break on shardIndex and then on doc; since we don't set the shard index it should be fine, as they are all -1? WDYT? > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor
[ https://issues.apache.org/jira/browse/LUCENE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8865. - Resolution: Fixed Fix Version/s: 8.2 master (9.0) > Use incoming thread for execution if IndexSearcher has an executor > --- > > Key: LUCENE-8865 > URL: https://issues.apache.org/jira/browse/LUCENE-8865 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Today we don't utilize the incoming thread for a search when IndexSearcher > has an executor. This thread is only idling but can be used to execute a search > once all other collectors are dispatched. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866575#comment-16866575 ] Simon Willnauer commented on LUCENE-8857: - Why don't we just use the comparator and have a default and a doc one? like this: {code} Comparator<ScoreDoc> defaultComparator = Comparator.comparingInt(d -> d.shardIndex); Comparator<ScoreDoc> docComparator = Comparator.comparingInt(d -> d.doc); {code} > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
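The two comparators suggested in the comment above can also be composed so that a single tie-breaker covers both cases: when no shard index was set (all -1), the shard key compares equal and the doc id decides. A minimal sketch of that idea, using a stand-in class in place of Lucene's ScoreDoc (the field names `doc` and `shardIndex` match ScoreDoc; everything else here is illustrative, not Lucene's actual merge code):

```java
import java.util.Comparator;

public class TieBreakers {
    // Stand-in for Lucene's ScoreDoc: just the two fields used for tie-breaking.
    static final class Doc {
        final int doc;
        final int shardIndex;
        Doc(int doc, int shardIndex) { this.doc = doc; this.shardIndex = shardIndex; }
    }

    // The two comparators from the comment above.
    static final Comparator<Doc> BY_SHARD = Comparator.comparingInt(d -> d.shardIndex);
    static final Comparator<Doc> BY_DOC = Comparator.comparingInt(d -> d.doc);

    // Composite tie-breaker: shard index first, then doc id. If the shard
    // index was never set (all -1), the first key is always a tie and the
    // doc id decides, so one comparator handles both scenarios.
    static final Comparator<Doc> BY_SHARD_THEN_DOC = BY_SHARD.thenComparing(BY_DOC);

    public static void main(String[] args) {
        Doc a = new Doc(3, -1);
        Doc b = new Doc(7, -1);
        // Shard indices tie at -1, so the lower doc id sorts first.
        System.out.println(BY_SHARD_THEN_DOC.compare(a, b) < 0); // true
    }
}
```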
[jira] [Resolved] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used
[ https://issues.apache.org/jira/browse/LUCENE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8853. - Resolution: Fixed Fix Version/s: 8.2 master (9.0) > FileSwitchDirectory is broken if temp outputs are used > -- > > Key: LUCENE-8853 > URL: https://issues.apache.org/jira/browse/LUCENE-8853 > Project: Lucene - Core > Issue Type: Bug > Reporter: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > FileSwitchDirectory basically doesn't work if tmp output are used for files > that are explicitly mapped with extensions. here is a failing test: > {code} > 16:49:40[junit4] Suite: > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest > 16:49:40[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=BlendedInfixSuggesterTest > -Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1 > 16:49:40[junit4] ERROR 0.05s J1 | > BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > <<< > 16:49:40[junit4]> Throwable #1: > java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: > source and dest are in different directories > 16:49:40[junit4]> at > __randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0) > 16:49:40[junit4]> at > org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201) > 16:49:40[junit4]> at > org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) > 16:49:40[junit4]> at > org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56) > 16:49:40[junit4]> at > org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64) > 16:49:40[junit4]> 
at > org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) > 16:49:40[junit4]> at > org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56) > 16:49:40[junit4]> at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 16:49:40[junit4]> at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > 16:49:40[junit4]> 
at > java.base/java.lang.Thread.run(Thread.java:834) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
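For context on why the rename in the trace above fails: FileSwitchDirectory routes each file to one of two delegate directories keyed by file extension. A temp output such as `_0.fdx__0.tmp` has the extension `tmp` while its rename target `_0.fdx` has the extension `fdx`, so the two names can map to different delegates and the rename crosses directory boundaries. A simplified sketch of that routing key (an illustration of the failure mode, not Lucene's exact getExtension implementation):

```java
public class ExtensionRouting {
    // Simplified extension extraction, modeled loosely on the routing key an
    // extension-switching directory would use (illustrative, not Lucene's code).
    static String extension(String name) {
        int i = name.lastIndexOf('.');
        return i == -1 ? "" : name.substring(i + 1);
    }

    public static void main(String[] args) {
        // The temp output and its rename target yield different keys, so an
        // extension-keyed FileSwitchDirectory sends them to different
        // delegates and the atomic rename cannot succeed.
        System.out.println(extension("_0.fdx__0.tmp")); // tmp
        System.out.println(extension("_0.fdx"));        // fdx
    }
}
```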
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866253#comment-16866253 ] Simon Willnauer commented on LUCENE-8857: - From my perspective we should simplify this even more and remove _TieBreakingParameters_. TopDocs can use _Comparator_ and default to the shard index if it's not supplied. That should be sufficient? > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor
Simon Willnauer created LUCENE-8865: --- Summary: Use incoming thread for execution if IndexSearcher has an executor Key: LUCENE-8865 URL: https://issues.apache.org/jira/browse/LUCENE-8865 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer Today we don't utilize the incoming thread for a search when IndexSearcher has an executor. This thread is only idling but can be used to execute a search once all other collectors are dispatched. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
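The idea in this issue can be sketched outside Lucene with plain Callables: submit all slices but one to the executor, and run the final slice on the calling thread instead of letting it sit idle while it waits on the futures. This is a hypothetical illustration of the scheduling pattern, not Lucene's IndexSearcher code, and it assumes a non-empty slice list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallerThreadSearch {
    // Dispatch all slices except the last to the executor; the incoming
    // (calling) thread executes the last slice itself before collecting
    // the other results. Assumes `slices` is non-empty.
    static List<Integer> search(List<Callable<Integer>> slices, ExecutorService executor)
            throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < slices.size() - 1; i++) {
            futures.add(executor.submit(slices.get(i)));
        }
        List<Integer> results = new ArrayList<>();
        // The caller does real work here instead of only blocking on futures.
        results.add(slices.get(slices.size() - 1).call());
        for (Future<Integer> f : futures) {
            results.add(f.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Callable<Integer>> slices = List.of(() -> 1, () -> 2, () -> 3);
        // The last slice's result comes first because the caller ran it.
        System.out.println(search(slices, pool)); // [3, 1, 2]
        pool.shutdown();
    }
}
```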
[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863067#comment-16863067 ] Simon Willnauer commented on LUCENE-8829: - {quote} Simon Willnauer That is a fun idea, although it would still need a function to instruct TopDocs#merge whether to set the shard indices or not. {quote} I am not sure we have to. Can't a user initialize it ahead of time if necessary? I think if it's necessary to have this we can just iterate over it and set it from the outside. That should also be possible, no? > TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved > - > > Key: LUCENE-8829 > URL: https://issues.apache.org/jira/browse/LUCENE-8829 > Project: Lucene - Core > Issue Type: Bug >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, > LUCENE-8829.patch > > > While investigating LUCENE-8819, I understood that TopDocs#merge's order of > results are indirectly dependent on the number of collectors involved in the > merge. This is troubling because 1) The number of collectors involved in a > merge are cost based and directly dependent on the number of slices created > for the parallel searcher case. 2) TopN hits code path will invoke merge with > a single Collector, so essentially, doing the same TopN query with single > threaded and parallel threaded searcher will invoke different order of > results, which is a bad invariant that breaks. > > The reason why this happens is because of the subtle way TopDocs#merge sets > shardIndex in the ScoreDoc population during populating the priority queue > used for merging. ShardIndex is essentially set to the ordinal of the > collector which generates the hit. This means that the shardIndex is > dependent on the number of collectors, even for the same set of hits. > > In case of no sort order specified, shardIndex is used for tie breaking when > scores are equal. 
This translates to different orders for same hits with > different shardIndices. > > I propose that we remove shardIndex from the default tie breaking mechanism > and replace it with docID. DocID order is the de facto that is expected > during collection, so it might make sense to use the same factor during tie > breaking when scores are the same. > > CC: [~ivera] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861848#comment-16861848 ] Simon Willnauer edited comment on LUCENE-8829 at 6/12/19 8:56 AM: -- I'd remove the _setShardIndex_ parameter altogether and don't set it was (Author: simonw): I'd remove the _ setShardIndex_ parameter altogether and don't set it > TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved > - > > Key: LUCENE-8829 > URL: https://issues.apache.org/jira/browse/LUCENE-8829 > Project: Lucene - Core > Issue Type: Bug >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, > LUCENE-8829.patch > > > While investigating LUCENE-8819, I understood that TopDocs#merge's order of > results are indirectly dependent on the number of collectors involved in the > merge. This is troubling because 1) The number of collectors involved in a > merge are cost based and directly dependent on the number of slices created > for the parallel searcher case. 2) TopN hits code path will invoke merge with > a single Collector, so essentially, doing the same TopN query with single > threaded and parallel threaded searcher will invoke different order of > results, which is a bad invariant that breaks. > > The reason why this happens is because of the subtle way TopDocs#merge sets > shardIndex in the ScoreDoc population during populating the priority queue > used for merging. ShardIndex is essentially set to the ordinal of the > collector which generates the hit. This means that the shardIndex is > dependent on the number of collectors, even for the same set of hits. > > In case of no sort order specified, shardIndex is used for tie breaking when > scores are equal. This translates to different orders for same hits with > different shardIndices. > > I propose that we remove shardIndex from the default tie breaking mechanism > and replace it with docID. 
DocID order is the de facto that is expected > during collection, so it might make sense to use the same factor during tie > breaking when scores are the same. > > CC: [~ivera] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861848#comment-16861848 ] Simon Willnauer commented on LUCENE-8829: - I'd remove the _setShardIndex_ parameter altogether and don't set it
[jira] [Commented] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861821#comment-16861821 ] Simon Willnauer commented on LUCENE-8829: - I do wonder if we can simplify this API now that we have FunctionalInterfaces. If we change _TopDocs#merge_ to take a _ToIntFunction_ we should be able to have a default of _ScoreDoc::doc_, and users that want to use the shard index can use _ScoreDoc::shardIndex_. That should also simplify our code, I guess. Yet, I haven't checked if it works across the board; it's just an idea.
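Simon's FunctionalInterface idea can be sketched as follows. The `merge` signature and the `ScoreDoc` stand-in are hypothetical illustrations of the proposal, not Lucene's actual API:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToIntFunction;

public class MergeKeyDemo {
    // Minimal stand-in for org.apache.lucene.search.ScoreDoc.
    static final class ScoreDoc {
        final int doc;
        final float score;
        final int shardIndex;
        ScoreDoc(int doc, float score, int shardIndex) {
            this.doc = doc; this.score = score; this.shardIndex = shardIndex;
        }
        int doc() { return doc; }
        int shardIndex() { return shardIndex; }
    }

    // Hypothetical merge that lets the caller choose the tie-break key:
    // docID by default, shardIndex as an explicit opt-in for the sharded case.
    static ScoreDoc[] merge(ScoreDoc[] hits, ToIntFunction<ScoreDoc> tieBreak) {
        ScoreDoc[] out = hits.clone();
        Arrays.sort(out, Comparator.<ScoreDoc>comparingDouble(sd -> sd.score)
                                   .reversed()
                                   .thenComparingInt(tieBreak));
        return out;
    }

    public static void main(String[] args) {
        ScoreDoc[] hits = {
            new ScoreDoc(9, 1.0f, 0), new ScoreDoc(4, 1.0f, 2), new ScoreDoc(1, 1.0f, 1)};
        ScoreDoc[] byDoc = merge(hits, ScoreDoc::doc);          // default tie-break
        ScoreDoc[] byShard = merge(hits, ScoreDoc::shardIndex); // sharded opt-in
        System.out.println(byDoc[0].doc + "," + byDoc[1].doc + "," + byDoc[2].doc);     // 1,4,9
        System.out.println(byShard[0].doc + "," + byShard[1].doc + "," + byShard[2].doc); // 9,1,4
    }
}
```

The method-reference defaults make the common case trivial while keeping the cross-shard behavior available without a boolean flag.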
[jira] [Commented] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used
[ https://issues.apache.org/jira/browse/LUCENE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861785#comment-16861785 ] Simon Willnauer commented on LUCENE-8853: - I attached a PR, but I am not really happy with it; yet it's my best bet. I am wondering if we should start a discussion about removing FileSwitchDirectory. It's hard to get right and there are many situations where it can break. I do wonder what its use case is other than opening a file with NIO vs. MMAP, as Elasticsearch uses it. If that's the main purpose, we can build a better version of it. /cc [~rcmuir] > FileSwitchDirectory is broken if temp outputs are used > -- > > Key: LUCENE-8853 > URL: https://issues.apache.org/jira/browse/LUCENE-8853 > Project: Lucene - Core > Issue Type: Bug > Reporter: Simon Willnauer > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > FileSwitchDirectory basically doesn't work if temp outputs are used for files that are explicitly mapped with extensions.
here is a failing test: > {code} > 16:49:40[junit4] Suite: > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest > 16:49:40[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=BlendedInfixSuggesterTest > -Dtests.method=testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > -Dtests.seed=16D8C93DC8FE5192 -Dtests.slow=true -Dtests.badapples=true > -Dtests.locale=pt-LU -Dtests.timezone=US/Michigan -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1 > 16:49:40[junit4] ERROR 0.05s J1 | > BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch > <<< > 16:49:40[junit4]> Throwable #1: > java.nio.file.AtomicMoveNotSupportedException: _0.fdx__0.tmp -> _0.fdx: > source and dest are in different directories > 16:49:40[junit4]> at > __randomizedtesting.SeedInfo.seed([16D8C93DC8FE5192:20E180A9490374CE]:0) > 16:49:40[junit4]> at > org.apache.lucene.store.FileSwitchDirectory.rename(FileSwitchDirectory.java:201) > 16:49:40[junit4]> at > org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) > 16:49:40[junit4]> at > org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56) > 16:49:40[junit4]> at > org.apache.lucene.store.TrackingDirectoryWrapper.rename(TrackingDirectoryWrapper.java:64) > 16:49:40[junit4]> at > org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) > 16:49:40[junit4]> at > org.apache.lucene.index.SortingStoredFieldsConsumer.flush(SortingStoredFieldsConsumer.java:56) > 16:49:40[junit4]> at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:152) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:555) > 16:49:40[junit4]> at > org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:722) > 
16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3199) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444) > 16:49:40[junit4]> at > org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.commit(AnalyzingInfixSuggester.java:345) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:315) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.getBlendedInfixSuggester(BlendedInfixSuggesterTest.java:125) > 16:49:40[junit4]> at > org.apache.lucene.search.suggest.analyzing.BlendedInfixSuggesterTest.testBlendedSort_fieldWeightZero_shouldRankSuggestionsByPositionMatch(BlendedInfixSuggesterTest.java:79) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 16:49:40[junit4]> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 16:49:40[junit4]> at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > 16:49:40[junit4]> at > java.base/java.lang.Thread.run(Thread.java:834) > {code}
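The root cause visible in the trace above is that extension-based routing sends a temp output and its final name to different directories, so the rename is no longer a move within a single directory and the atomic move fails. A minimal sketch of that routing decision (the extension set and helper below are illustrative assumptions, not FileSwitchDirectory's exact code):

```java
import java.util.Set;

public class SwitchRoutingDemo {
    // Illustrative: extensions mapped to the "primary" delegate, roughly how
    // FileSwitchDirectory routes files by suffix.
    static final Set<String> PRIMARY_EXTENSIONS = Set.of("fdx", "fdt");

    // Everything after the last '.', mirroring suffix-based routing.
    static String extension(String fileName) {
        int i = fileName.lastIndexOf('.');
        return i == -1 ? "" : fileName.substring(i + 1);
    }

    static String route(String fileName) {
        return PRIMARY_EXTENSIONS.contains(extension(fileName)) ? "primary" : "secondary";
    }

    public static void main(String[] args) {
        // The temp output carries a ".tmp" suffix, so it lands in a different
        // delegate than its rename target -- rename(_0.fdx__0.tmp -> _0.fdx)
        // then crosses directories and cannot be atomic.
        System.out.println(route("_0.fdx__0.tmp")); // secondary
        System.out.println(route("_0.fdx"));        // primary
    }
}
```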
[jira] [Created] (LUCENE-8853) FileSwitchDirectory is broken if temp outputs are used
Simon Willnauer created LUCENE-8853: --- Summary: FileSwitchDirectory is broken if temp outputs are used Key: LUCENE-8853 URL: https://issues.apache.org/jira/browse/LUCENE-8853 Project: Lucene - Core Issue Type: Bug Reporter: Simon Willnauer FileSwitchDirectory basically doesn't work if temp outputs are used for files that are explicitly mapped with extensions.
[jira] [Resolved] (LUCENE-8835) Respect file extension when listing files form FileSwitchDirectory
[ https://issues.apache.org/jira/browse/LUCENE-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8835. - Resolution: Fixed Assignee: Simon Willnauer Fix Version/s: 8.2, master (9.0) > Respect file extension when listing files form FileSwitchDirectory > -- > > Key: LUCENE-8835 > URL: https://issues.apache.org/jira/browse/LUCENE-8835 > Project: Lucene - Core > Issue Type: Bug > Reporter: Simon Willnauer > Assignee: Simon Willnauer > Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 50m > Remaining Estimate: 0h > > FileSwitchDirectory splits file actions between 2 directories based on file extensions. The extensions are respected on write operations like delete or create but ignored when we list the contents of the directories. Until now we only deduplicated the contents on Directory#listAll, which can cause inconsistencies and hard-to-debug errors due to double deletions in IndexWriter if a file is pending delete in one of the directories but still shows up in the directory listing from the other directory. This case can happen if both directories point to the same underlying FS directory, which is a common use case to split between mmap and niofs.
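The fix described above can be sketched as filtering each delegate's listing by the extensions it owns, instead of merely deduplicating the union. The directory stand-ins below are illustrative assumptions, not Lucene's Directory API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ListAllDemo {
    // Illustrative extension set owned by the primary delegate.
    static final Set<String> PRIMARY_EXTENSIONS = Set.of("dvd", "tip");

    static String extension(String name) {
        int i = name.lastIndexOf('.');
        return i == -1 ? "" : name.substring(i + 1);
    }

    // Both delegates may point at the same FS directory, so each sees every
    // file. Filtering by owned extension yields exactly one authoritative
    // listing per file, which avoids double deletions in IndexWriter.
    static List<String> listAll(List<String> primaryFiles, List<String> secondaryFiles) {
        List<String> out = new ArrayList<>();
        for (String f : primaryFiles) {
            if (PRIMARY_EXTENSIONS.contains(extension(f))) out.add(f);
        }
        for (String f : secondaryFiles) {
            if (!PRIMARY_EXTENSIONS.contains(extension(f))) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same underlying FS directory: both delegates list the same files,
        // but each file is reported once, by its owning delegate only.
        List<String> files = List.of("_0.dvd", "_0.fdt");
        System.out.println(listAll(files, files)); // [_0.dvd, _0.fdt]
    }
}
```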
[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirecory to preload individual IndexInputs
[ https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858441#comment-16858441 ] Simon Willnauer commented on LUCENE-8833: - I do like the idea of #warm, but the footprint is much bigger since it's a public API. I mean, for my specific use case I'd subclass mmap anyway, and it would be easier that way. FileSwitchDirectory is quite heavy and isn't really built for what I want to do. I'd basically need an IndexInput factory that I can plug into a directory, that can alternate between NIOFS and mmap etc. and conditionally preload the mmap. Either way, I can work with both; I just think this change is the minimum viable change. Lemme know if you are ok moving forward. > Allow subclasses of MMapDirecory to preload individual IndexInputs > -- > > Key: LUCENE-8833 > URL: https://issues.apache.org/jira/browse/LUCENE-8833 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Simon Willnauer > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think it's useful for subclasses to select the preload flag on a per-index-input basis rather than all or nothing. Here is a patch that has an overloaded protected openInput method.
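The per-input hook described in the issue can be sketched as a protected decision method that subclasses override. The class and method names below are hypothetical illustrations of the proposal (the real patch is attached to the issue), not MMapDirectory's actual API:

```java
public class PreloadDemo {
    // Hypothetical sketch of a per-IndexInput preload hook: the base class
    // consults a protected method instead of one global preload flag.
    static class SketchMMapDirectory {
        // Subclasses override this to decide per file.
        protected boolean preload(String name) {
            return false; // base behavior: never preload
        }
        final String openInput(String name) {
            // Stand-in for the real openInput(name, context): we just report
            // which path the per-file decision took.
            return preload(name) ? "mmap+preload:" + name : "mmap:" + name;
        }
    }

    // Example policy: preload doc values, leave everything else lazy.
    static class PreloadDocValues extends SketchMMapDirectory {
        @Override
        protected boolean preload(String name) {
            return name.endsWith(".dvd");
        }
    }

    public static void main(String[] args) {
        SketchMMapDirectory dir = new PreloadDocValues();
        System.out.println(dir.openInput("_0.dvd")); // mmap+preload:_0.dvd
        System.out.println(dir.openInput("_0.tip")); // mmap:_0.tip
    }
}
```

A protected hook keeps the public surface unchanged, which is the "minimum viable change" argument made in the comment above.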
[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirecory to preload individual IndexInputs
[ https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857525#comment-16857525 ] Simon Willnauer commented on LUCENE-8833: - > what would the iocontext provide to base the preload decision on? just curious. Sure, the one I had in mind as an example is merge. I am not sure it makes a big difference; I was just wondering if there are other signals than the file extension. I opened LUCENE-8835 to fix the file-listing issue FileSwitchDirectory has.
[jira] [Created] (LUCENE-8835) Respect file extension when listing files form FileSwitchDirectory
Simon Willnauer created LUCENE-8835: --- Summary: Respect file extension when listing files form FileSwitchDirectory Key: LUCENE-8835 URL: https://issues.apache.org/jira/browse/LUCENE-8835 Project: Lucene - Core Issue Type: Bug Reporter: Simon Willnauer
[jira] [Commented] (LUCENE-8833) Allow subclasses of MMapDirecory to preload individual IndexInputs
[ https://issues.apache.org/jira/browse/LUCENE-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856781#comment-16856781 ] Simon Willnauer commented on LUCENE-8833: - You are correct, that's what Elasticsearch does. Yet, FileSwitchDirectory had many issues in the past and still has (I am working on one issue related to [this|https://github.com/elastic/elasticsearch/pull/37140] and will open another issue soon). Especially with the push of pending deletes down to FSDirectory, things became more tricky for FileSwitchDirectory. That said, I think these issues should be fixed and I will work on them; this was more of a trigger to look closer. I also wanted to make the preload decision based on the IOContext down the road, which FileSwitchDirectory would not be capable of doing in this context. I hope this makes sense?
[jira] [Created] (LUCENE-8833) Allow subclasses of MMapDirecory to preload individual IndexInputs
Simon Willnauer created LUCENE-8833: --- Summary: Allow subclasses of MMapDirecory to preload individual IndexInputs Key: LUCENE-8833 URL: https://issues.apache.org/jira/browse/LUCENE-8833 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer
[jira] [Commented] (LUCENE-8809) Refresh and rollback concurrently can leave segment states unclosed
[ https://issues.apache.org/jira/browse/LUCENE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856364#comment-16856364 ] Simon Willnauer commented on LUCENE-8809: - [~dnhatn] can we close this issue? > Refresh and rollback concurrently can leave segment states unclosed > --- > > Key: LUCENE-8809 > URL: https://issues.apache.org/jira/browse/LUCENE-8809 > Project: Lucene - Core > Issue Type: Bug > Components: core/index > Affects Versions: 7.7, 8.1, 8.2 > Reporter: Nhat Nguyen > Assignee: Nhat Nguyen > Priority: Major > Fix For: 7.7.2, master (9.0), 8.2, 8.1.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > A [failed test|https://github.com/elastic/elasticsearch/issues/30290] from Elasticsearch shows that refresh and rollback running concurrently can leave segment states unclosed, leaking the refCount of some SegmentReaders.
[jira] [Resolved] (LUCENE-8813) testIndexTooManyDocs fails
[ https://issues.apache.org/jira/browse/LUCENE-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8813. - Resolution: Fixed Fix Version/s: 8.2 master (9.0) > testIndexTooManyDocs fails > -- > > Key: LUCENE-8813 > URL: https://issues.apache.org/jira/browse/LUCENE-8813 > Project: Lucene - Core > Issue Type: Test > Components: core/index >Reporter: Nhat Nguyen >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > testIndexTooManyDocs fails on [Elastic > CI|https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+branch_8x/6402/console]. > This failure does not reproduce locally for me. > {noformat} > [junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs >[junit4] 2> KTN 23, 2019 4:09:37 PM > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException >[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-612,5,TGRP-TestIndexTooManyDocs] >[junit4] 2> java.lang.AssertionError: only modifications from the > current flushing queue are permitted while doing a full flush >[junit4] 2> at > __randomizedtesting.SeedInfo.seed([1F16B1DA7056AA52]:0) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.assertTicketQueueModification(DocumentsWriter.java:683) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:187) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:411) >[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:514) >[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) >[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) >[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) >[junit4] 2> at java.base/java.lang.Thread.run(Thread.java:834) 
>[junit4] 2> >[junit4] 2> KTN 23, 2019 6:09:36 PM > com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate >[junit4] 2> WARNING: Suite execution timed out: > org.apache.lucene.index.TestIndexTooManyDocs >[junit4] 2>1) Thread[id=669, > name=SUITE-TestIndexTooManyDocs-seed#[1F16B1DA7056AA52], state=RUNNABLE, > group=TGRP-TestIndexTooManyDocs] >[junit4] 2> at > java.base/java.lang.Thread.getStackTrace(Thread.java:1606) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693) >[junit4] 2> at > java.base/java.security.AccessController.doPrivileged(Native Method) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.getStackTrace(ThreadLeakControl.java:693) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.getThreadsWithTraces(ThreadLeakControl.java:709) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.formatThreadStacksFull(ThreadLeakControl.java:689) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl.access$1000(ThreadLeakControl.java:65) >[junit4] 2> at > com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:415) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:708) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:138) >[junit4] 2> at > com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:629) >[junit4] 2>2) Thread[id=671, name=Thread-606, state=BLOCKED, > group=TGRP-TestIndexTooManyDocs] >[junit4] 2> at > app//org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4945) >[junit4] 2> at > app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293) >[junit4] 2> at > 
app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:
[jira] [Commented] (LUCENE-8813) testIndexTooManyDocs fails
[ https://issues.apache.org/jira/browse/LUCENE-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849506#comment-16849506 ] Simon Willnauer commented on LUCENE-8813: - I looked at this and I think the issue here is that we are executing 2 flushes very quickly after one another while, at the same time, a single thread has already released its DWPT before the first flush but has not tried to apply deletes before the second flush is done. In this case the assertion doesn't hold anymore. The window is super small, which is likely why we never tripped this before. I don't think we have a correctness issue here, but I will still try to improve the way we assert/apply deletes. > testIndexTooManyDocs fails > -- > > Key: LUCENE-8813 > URL: https://issues.apache.org/jira/browse/LUCENE-8813 > Project: Lucene - Core > Issue Type: Test > Components: core/index > Reporter: Nhat Nguyen > Priority: Major > > testIndexTooManyDocs fails on [Elastic CI|https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+branch_8x/6402/console]. > This failure does not reproduce locally for me.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843726#comment-16843726 ] Simon Willnauer commented on LUCENE-8757: - [~atris] instead of asserting the order, can we just sort the slices in the LeafSlice ctor? This should prevent any issues down the road, and it's cheap enough IMO. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Atri Sharma > Assignee: Simon Willnauer > Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch > > > The current segment-to-thread allocation algorithm always allocates one thread per segment. This is detrimental to performance in case of skew in segment sizes, since small segments also get their dedicated thread. This can lead to performance degradation due to context-switching overheads. > > A better algorithm which is cognizant of size skew would have better performance for realistic scenarios.
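Sorting inside the LeafSlice constructor, as suggested in the comment above, can be sketched as ordering a slice's leaves by their docBase. The LeafContext stand-in is illustrative: Lucene's LeafReaderContext does carry a docBase, but the constructor shown here is a sketch of the idea, not Lucene's code:

```java
import java.util.Arrays;
import java.util.Comparator;

public class LeafSliceDemo {
    // Minimal stand-in for LeafReaderContext: only the docBase matters here.
    static final class LeafContext {
        final int docBase;
        LeafContext(int docBase) { this.docBase = docBase; }
    }

    // Sketch of a LeafSlice that sorts its leaves on construction, so the
    // collection order within a slice never depends on allocation order.
    static final class LeafSlice {
        final LeafContext[] leaves;
        LeafSlice(LeafContext... leaves) {
            this.leaves = leaves.clone();
            Arrays.sort(this.leaves, Comparator.comparingInt((LeafContext l) -> l.docBase));
        }
    }

    public static void main(String[] args) {
        // Leaves handed over in arbitrary order by the slicing algorithm:
        LeafSlice slice = new LeafSlice(
            new LeafContext(200), new LeafContext(0), new LeafContext(100));
        StringBuilder sb = new StringBuilder();
        for (LeafContext l : slice.leaves) sb.append(l.docBase).append(' ');
        System.out.println(sb.toString().trim()); // 0 100 200
    }
}
```

Sorting in the constructor is cheaper and safer than asserting: every caller gets the invariant for free instead of tripping an assertion later.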
Re: [jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
I think this should be done inside IndexSearcher. It's a general problem, no? > On 13. May 2019, at 10:25, Adrien Grand (JIRA) wrote: > > > [ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838363#comment-16838363 ] > > Adrien Grand commented on LUCENE-8757: > -- > > Yes. Top-docs collectors are expected to tie-break by doc ID in case documents compare equal. Things like TopDocs#merge compare doc IDs explicitly for that purpose, but Collector#collect implementations just rely on the fact that documents are collected in order to ignore documents that compare equal to the current k-th best hit. So we need to sort segments within a slice by docBase in order to get the same top hits regardless of how slices have been constructed.
[jira] [Assigned] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-8757:

Assignee: Simon Willnauer
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837615#comment-16837615 ]

Simon Willnauer commented on LUCENE-8757:

LGTM. I will try to commit this in the coming days.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837003#comment-16837003 ]

Simon Willnauer commented on LUCENE-8757:

{quote}
I think there is an important justification for the 2nd criterion (number of segments in each work unit / slice): if you have an index with some large segments and a long tail of small segments (which easily happens if your machine has substantial CPU concurrency and you use multiple threads), then, since there is a fixed cost for visiting each segment, putting too many small segments into one work unit multiplies those fixed costs, and that one work unit can become too slow even though it is not actually going to visit too many documents. I think we should keep it?
{quote}

Fair enough, let's add it back.
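The two criteria being discussed can be sketched as follows. This is an illustrative simplification, not the committed patch: the `Segment` record and the two threshold constants are hypothetical here, and the real code operates on `LeafReaderContext` instances.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a segment; only its doc count matters here.
record Segment(String name, int maxDoc) {}

class SliceGroupingSketch {
    // Illustrative thresholds; the actual defaults were still under
    // discussion in this thread.
    static final int MAX_DOCS_PER_SLICE = 250_000;
    static final int MAX_SEGMENTS_PER_SLICE = 5;

    // Group segments (largest first) into slices, closing a slice once it
    // holds MAX_DOCS_PER_SLICE documents or MAX_SEGMENTS_PER_SLICE segments.
    // The second limit is the point Mike raises: without it, a long tail of
    // tiny segments piles fixed per-segment costs into a single work unit.
    static List<List<Segment>> slices(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt(Segment::maxDoc).reversed());
        List<List<Segment>> groups = new ArrayList<>();
        List<Segment> group = null;
        long docs = 0;
        for (Segment s : sorted) {
            if (group == null
                    || docs + s.maxDoc() > MAX_DOCS_PER_SLICE
                    || group.size() >= MAX_SEGMENTS_PER_SLICE) {
                group = new ArrayList<>();
                groups.add(group);
                docs = 0;
            }
            group.add(s);
            docs += s.maxDoc();
        }
        return groups;
    }
}
```

A segment larger than the doc threshold naturally ends up alone in its slice, while small segments are packed together up to both limits.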
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835894#comment-16835894 ]

Simon Willnauer commented on LUCENE-8785:

{quote}
Please feel free to commit this to the release branch. In case of a re-spin, I'll pick this change up.
{quote}

[~ichattopadhyaya] done. Thanks.

> TestIndexWriterDelete.testDeleteAllNoDeadlock failure
>
> Key: LUCENE-8785
> URL: https://issues.apache.org/jira/browse/LUCENE-8785
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 7.6
> Environment: OpenJDK 1.8.0_202
> Reporter: Michael McCandless
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 7.7.2, master (9.0), 8.2, 8.1.1
> Time Spent: 40m
> Remaining Estimate: 0h
>
> I was running Lucene's core tests on an {{i3.16xlarge}} EC2 instance (64 cores), and hit this random yet spooky failure:
> {noformat}
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterDelete -Dtests.method=testDeleteAllNoDeadLock -Dtests.seed=952BE262BA547C1 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ar-YE -Dtests.timezone=Europe/Lisbon -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> [junit4] ERROR 0.16s J3 | TestIndexWriterDelete.testDeleteAllNoDeadLock <<<
> [junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=36, name=Thread-2, state=RUNNABLE, group=TGRP-TestIndexWriterDelete]
> [junit4] > at __randomizedtesting.SeedInfo.seed([952BE262BA547C1:3A4B5138AB66FD97]:0)
> [junit4] > Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: field number 0 is already mapped to field name "null", not "content"
> [junit4] > at __randomizedtesting.SeedInfo.seed([952BE262BA547C1]:0)
> [junit4] > at org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:332)
> [junit4] > Caused by: java.lang.IllegalArgumentException: field number 0 is already mapped to field name "null", not "content"
> [junit4] > at org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:310)
> [junit4] > at org.apache.lucene.index.FieldInfos$Builder.getOrAdd(FieldInfos.java:415)
> [junit4] > at org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:650)
> [junit4] > at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
> [junit4] > at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
> [junit4] > at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:297)
> [junit4] > at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:450)
> [junit4] > at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1291)
> [junit4] > at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1264)
> [junit4] > at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:159)
> [junit4] > at org.apache.lucene.index.TestIndexWriterDelete$1.run(TestIndexWriterDelete.java:326){noformat}
> It does *not* reproduce unfortunately ... but maybe there is some subtle thread safety issue in this code ... this is a hairy part of Lucene ;)
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835481#comment-16835481 ]

Simon Willnauer commented on LUCENE-8757:

Thanks for the additional iteration. Now that we have simplified this, can we remove the sorting? I don't necessarily see how the sort makes things simpler. If we see a segment > threshold we can just add it as a group? I thought you did that already, hence my comment about the assertion. WDYT?

I also want to suggest beefing up testing a bit with a randomized version like this:

{code}
diff --git a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
index 7c63a817adb..76ccca64ee7 100644
--- a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
+++ b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
@@ -1933,6 +1933,14 @@ public abstract class LuceneTestCase extends Assert {
       ret = random.nextBoolean()
           ? new AssertingIndexSearcher(random, r, ex)
           : new AssertingIndexSearcher(random, r.getContext(), ex);
+    } else if (random.nextBoolean()) {
+      int maxDocPerSlice = 1 + random.nextInt(10);
+      ret = new IndexSearcher(r, ex) {
+        @Override
+        protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+          return slices(leaves, maxDocPerSlice);
+        }
+      };
     } else {
       ret = random.nextBoolean()
           ? new IndexSearcher(r, ex)
{code}
[jira] [Commented] (LUCENE-7840) BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch
[ https://issues.apache.org/jira/browse/LUCENE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835473#comment-16835473 ]

Simon Willnauer commented on LUCENE-7840:

LGTM.

> BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch
>
> Key: LUCENE-7840
> URL: https://issues.apache.org/jira/browse/LUCENE-7840
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Hoss Man
> Priority: Major
> Attachments: LUCENE-7840.patch, LUCENE-7840.patch, LUCENE-7840.patch
>
> I haven't thought this through completely, let alone written up a patch / test case, but IIUC...
>
> We should be able to optimize {{BooleanQuery rewriteNoScoring()}} so that (after converting MUST clauses to FILTER clauses) we can check for the common case of {{0==getMinimumNumberShouldMatch()}} and throw away any SHOULD clauses as long as there is at least one FILTER clause.
[jira] [Resolved] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-8785.

Resolution: Fixed
[jira] [Updated] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8785:

Fix Version/s: (was: 8.0.1)
               (was: 8.1)
               (was: 7.7.1)
               8.2
               7.7.2
               8.1.1
[jira] [Commented] (LUCENE-7840) BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch
[ https://issues.apache.org/jira/browse/LUCENE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834778#comment-16834778 ]

Simon Willnauer commented on LUCENE-7840:

I think there are some style issues in this patch, like here, where _else_ should be on the previous line:

{code:java}
+      }
+    }
+    else {
+      newQuery.add(clause);
+    }
{code}

The other question is whether we should use a switch instead of if / else? Otherwise it's looking fine.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834767#comment-16834767 ]

Simon Willnauer commented on LUCENE-8757:

[~atris] I think the assertion in this part doesn't hold:

{code}
+    for (LeafReaderContext ctx : sortedLeaves) {
+      if (ctx.reader().maxDoc() > maxDocsPerSlice) {
+        assert group == null;
+        List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
{code}

If the previous segment was smallish then _group_ is non-null? I think you should test these cases; maybe add a random test and randomize the order of the segments. This:

{code}
+        List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
+        singleSegmentSlice.add(ctx);
+        groupedLeaves.add(singleSegmentSlice);
{code}

can and should be replaced by:

{code}
groupedLeaves.add(Collections.singletonList(ctx));
{code}

Otherwise it looks good.
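One way the loop could avoid the broken `assert group == null` is to flush any open group of small segments before giving an oversized segment its own singleton slice. The sketch below is a hypothetical simplification of the patch under review (plain `int` doc counts stand in for `LeafReaderContext.reader().maxDoc()`):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OversizedSegmentSketch {
    // Group segment doc counts into slices: an oversized segment must not
    // assume no group is open -- it first closes any open group of small
    // segments, then becomes a singleton slice of its own.
    static List<List<Integer>> group(List<Integer> maxDocs, int maxDocsPerSlice) {
        List<List<Integer>> grouped = new ArrayList<>();
        List<Integer> open = null;
        long docs = 0;
        for (int maxDoc : maxDocs) {
            if (maxDoc > maxDocsPerSlice) {
                if (open != null) {            // previous small segments: flush first
                    grouped.add(open);
                    open = null;
                    docs = 0;
                }
                grouped.add(Collections.singletonList(maxDoc));
            } else {
                if (open == null) {
                    open = new ArrayList<>();
                }
                open.add(maxDoc);
                docs += maxDoc;
                if (docs >= maxDocsPerSlice) { // slice is full: flush
                    grouped.add(open);
                    open = null;
                    docs = 0;
                }
            }
        }
        if (open != null) {                    // flush the trailing partial slice
            grouped.add(open);
        }
        return grouped;
    }
}
```

With this shape, a small segment followed by an oversized one produces two slices instead of tripping an assertion.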
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834525#comment-16834525 ]

Simon Willnauer commented on LUCENE-8757:

[~atris] actually I thought about these defaults again and I am starting to think it's an ok default. The reason is that we try to prevent having dedicated threads for smallish segments, so we group them together. I still wonder if we need two parameters: wouldn't it be enough to just say that we group things together until we have at least 250k docs per thread to be searched? Is it really necessary to have another parameter that limits the number of segments per slice? I think a single parameter would be great and simpler. WDYT?
[jira] [Updated] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-8785:

Fix Version/s: 7.7.1
               master (9.0)
               8.1
               8.0.1
[jira] [Assigned] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-8785:

Assignee: Simon Willnauer
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834467#comment-16834467 ]

Simon Willnauer commented on LUCENE-8785:

{quote}
If there is another thread coming in after we locked the existent threadstates we just issue a new one. Yuck
{quote}

I looked at the code again and we actually lock the threadstates for this purpose; I implemented this in LUCENE-8639. The issue here is in fact a race condition, since we request the number of active threadstates before we lock new ones. It's a classic one-line fix. I referenced a PR for this. [~mikemccand] would you take a look?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1683#comment-1683 ] Simon Willnauer commented on LUCENE-8757: - > Would it make sense to push this patch, and then let users consume it and > provide feedback while we iterate on the more sophisticated version? We could > even have both of the methods available as options to users, potentially I don't think we should push this if we already know we wanna do something different. That said, I am not convinced the numbers are good defaults. At the same time I don't have any numbers here. Do you have anything to back these defaults up? > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832343#comment-16832343 ] Simon Willnauer commented on LUCENE-8757: - Thanks [~atris], can you bring back the javadocs for {code:java} protected LeafSlice[] slices(List<LeafReaderContext> leaves){code} Please don't reassign an argument like here: {code:java} leaves = new ArrayList<>(leaves); {code} The rest of the patch looks OK to me, yet I am not so sure about the defaults. I do wonder if we should look at this from a different perspective. Rather than using hard numbers, can we try to evenly balance the total number of documents across N threads and make N the variable? [~mikemccand] WDYT?
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832336#comment-16832336 ] Simon Willnauer commented on LUCENE-8785: - {quote} I realize neither ES nor Solr expose deleteAll but I don't think that's a valid argument to remove it from Lucene. {quote} huh, I don't think that's a valid argument either, I just re-read my comments - sorry if you felt I was alluding to es or solr here. My argument is that if you want to do that you should construct a new IndexWriter instead of calling deleteAll(). Given this comment on the javadocs: {noformat} Essentially a call to {@link #deleteAll()} is equivalent to creating a new {@link IndexWriter} with {@link OpenMode#CREATE} {noformat} I want to understand why, in such a rather edgy case, a user can't do exactly this. There is no race, no confusion, it's very simple from a semantics perspective. Currently there are two ways and one is confusing. I think we should move towards removing the second way. {quote}And for some reason the index is reset once per week, but the devs want to allow searching of the old index while the new index is (slowly) built up. But if something goes badly wrong, they need to be able to rollback (the deleteAll and all subsequently added docs) to the last commit and try again later. If instead it succeeds, then a refresh/commit will switch to the new index atomically. {quote} Well, there are tons of ways to do that, no? I mean you can have 2 directories? Yes it causes some engineering effort but the semantics would be cleaner even for the app that does what you explain.
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831635#comment-16831635 ] Simon Willnauer commented on LUCENE-8785: - > But at the point we call clear() haven't we already blocked all indexing > threads? no, it might look like we do that but we don't. We block and lock all threads up to that point in time. If there is another thread coming in after we locked the existent threadstates we just issue a new one. > I also dislike deleteAll() and you're right a user could deleteByQuery using > MatchAllDocsQuery; can we make that close-ish as efficient as deleteAll() is > today? I think we can just do what deleteAll() does today except not dropping the schema on the floor? > Though indeed that would preserve the schema, while deleteAll() let's you > delete docs, delete schema, all under transaction (the change is not visible > until commit). I want to understand the use case for this. I can see how somebody wants to drop all docs but basically dropping all IW state on the floor is difficult in my eyes.
[jira] [Commented] (LUCENE-8785) TestIndexWriterDelete.testDeleteAllNoDeadlock failure
[ https://issues.apache.org/jira/browse/LUCENE-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831612#comment-16831612 ] Simon Willnauer commented on LUCENE-8785: - [~mikemccand] I think this is caused by the fact that we simply call _clear()_ during _IW#deleteAll()_. If this happens concurrently to a document being indexed, this assertion can trip. I personally always disliked the complexity of _IW#deleteAll_ and from my perspective we should remove this method entirely and ask users to open a new IW if they want to drop all the information including the _schema_. We can still fast-path a _MatchAllQuery_ through something like this as we do today (which is a problem IMO since it drops all fields map info, which it shouldn't?). IMO if you want a fresh index, start from scratch, but to delete all docs go and run a DeleteByQuery and keep the schema.
[jira] [Commented] (LUCENE-8776) Start offset going backwards has a legitimate purpose
[ https://issues.apache.org/jira/browse/LUCENE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831604#comment-16831604 ] Simon Willnauer commented on LUCENE-8776: - [~venkat11] I do understand your frustration. Believe me, we don't take changes like this easily. One person's bug is another person's feature, and as we grow and mature, strong guarantees are essential for a vast majority of users, for future developments, faster iterations and more performant code. There might not be a tradeoff from your perspective; from the maintainers' perspective there is. Now we can debate if a major version bump is _enough_ time to migrate or not; our policy is that we can make BWC and behavioral changes like this in a major release. In fact, we don't do it in minors, to provide you the time you need and to ease upgrades to minors. We will and have built features on top of this guarantee and in order to manage expectations I am pretty sure we won't go back and allow negative offsets. I think your best option, whether you like it or not, is to work towards a fix for your issue with either the tools you have now or improve Lucene, for instance with the suggestion from [~mgibney] regarding indexing more information. Please don't get mad at me, I am just trying to manage expectations. > Start offset going backwards has a legitimate purpose > - > > Key: LUCENE-8776 > URL: https://issues.apache.org/jira/browse/LUCENE-8776 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.6 >Reporter: Ram Venkat >Priority: Major > > Here is the use case where startOffset can go backwards: > Say there is a line "Organic light-emitting-diode glows", and I want to run > span queries and highlight them properly. > During index time, light-emitting-diode is split into three words, which > allows me to search for 'light', 'emitting' and 'diode' individually.
The > three words occupy adjacent positions in the index, as 'light' adjacent to > 'emitting' and 'light' at a distance of two words from 'diode' need to match > this word. So, the order of words after splitting are: Organic, light, > emitting, diode, glows. > But, I also want to search for 'organic' being adjacent to > 'light-emitting-diode' or 'light-emitting-diode' being adjacent to 'glows'. > The way I solved this was to also generate 'light-emitting-diode' at two > positions: (a) In the same position as 'light' and (b) in the same position > as 'glows', like below: > ||organic||light||emitting||diode||glows|| > | |light-emitting-diode| |light-emitting-diode| | > |0|1|2|3|4| > The positions of the two 'light-emitting-diode' are 1 and 3, but the offsets > are obviously the same. This works beautifully in Lucene 5.x in both > searching and highlighting with span queries. > But when I try this in Lucene 7.6, it hits the condition "Offsets must not go > backwards" at DefaultIndexingChain:818. This IllegalArgumentException is > being thrown without any comments on why this check is needed. As I explained > above, startOffset going backwards is perfectly valid, to deal with word > splitting and span operations on these specialized use cases. On the other > hand, it is not clear what value is added by this check and which highlighter > code is affected by offsets going backwards. This same check is done at > BaseTokenStreamTestCase:245. > I see others talk about how this check found bugs in WordDelimiter etc. but > it also prevents legitimate use cases. Can this check be removed? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
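The check the reporter is hitting is simply that startOffset must never decrease across the token stream. A stdlib-only sketch of that invariant over the token layout described above (positions and offsets are illustrative values, and this is a stand-in, not Lucene's DefaultIndexingChain code):

```java
import java.util.List;

public class OffsetCheck {
    // A minimal token: term text, position, and character offsets.
    record Token(String term, int position, int startOffset, int endOffset) {}

    // The invariant enforced at indexing time since Lucene 7: startOffset must
    // never go backwards. Returns the first offending term, or null if well-formed.
    static String firstBackwardsOffset(List<Token> tokens) {
        int last = -1;
        for (Token t : tokens) {
            if (t.startOffset() < last) {
                return t.term();
            }
            last = t.startOffset();
        }
        return null;
    }

    public static void main(String[] args) {
        // "Organic light-emitting-diode glows" with the compound token injected
        // at positions 1 and 3, as in the issue (offsets are illustrative).
        List<Token> tokens = List.of(
            new Token("organic", 0, 0, 7),
            new Token("light", 1, 8, 13),
            new Token("light-emitting-diode", 1, 8, 28),
            new Token("emitting", 2, 14, 22),
            new Token("diode", 3, 23, 28),
            new Token("light-emitting-diode", 3, 8, 28), // startOffset jumps back to 8
            new Token("glows", 4, 29, 34));
        System.out.println("backwards offset at: " + firstBackwardsOffset(tokens));
    }
}
```

The second copy of 'light-emitting-diode' jumps back from offset 23 to 8, which is exactly the pattern the IllegalArgumentException rejects.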
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831591#comment-16831591 ] Simon Willnauer commented on LUCENE-8757: - Hey Atri, thanks for putting up this patch, here is some additional feedback: - can we stick with a protected non-static method on IndexSearcher? Subclasses should be able to override your impl. I think it's ok to have a static method like this: {code:java} public static LeafSlice[] slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegPerSlice){code} that you can call from the protected method with your defaults? - you might want to change your sort to something like this: {code:java} Collections.sort(leaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc()))); {code} - I think the _Leaves_ class is unnecessary, we can just use _List<LeafReaderContext>_ instead?
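The budget-based slicing being discussed (pack the largest segments first, close a slice once it reaches a document or segment budget) can be sketched with plain collections. The names, the Segment record, and the values below are assumptions for illustration, not Lucene's actual LeafSlice code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SliceSketch {
    // Stand-in for a segment reader: just a name and its document count.
    record Segment(String name, int maxDoc) {}

    // Greedily pack segments, largest first, closing a slice once it would
    // exceed maxDocsPerSlice documents or already holds maxSegsPerSlice segments.
    static List<List<Segment>> slices(List<Segment> leaves,
                                      int maxDocsPerSlice, int maxSegsPerSlice) {
        List<Segment> sorted = new ArrayList<>(leaves); // copy instead of reassigning the argument
        sorted.sort(Comparator.comparingInt(Segment::maxDoc).reversed());
        List<List<Segment>> slices = new ArrayList<>();
        List<Segment> current = new ArrayList<>();
        int docs = 0;
        for (Segment leaf : sorted) {
            if (!current.isEmpty()
                    && (docs + leaf.maxDoc() > maxDocsPerSlice || current.size() >= maxSegsPerSlice)) {
                slices.add(current);
                current = new ArrayList<>();
                docs = 0;
            }
            current.add(leaf);
            docs += leaf.maxDoc();
        }
        if (!current.isEmpty()) {
            slices.add(current);
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Segment> leaves = List.of(
                new Segment("_a", 250_000), new Segment("_b", 1_000),
                new Segment("_c", 500), new Segment("_d", 240_000));
        List<List<Segment>> result = slices(leaves, 250_000, 5);
        System.out.println(result.size() + " slices"); // _a alone; _d, _b, _c share one
    }
}
```

Large segments fill slices on their own while many small segments share one, which avoids dedicating a search thread to every tiny segment.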
[jira] [Resolved] (LUCENE-8671) Add setting for moving FST offheap/onheap
[ https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8671. - Resolution: Fixed Assignee: Simon Willnauer Fix Version/s: master (9.0) 8.1 > Add setting for moving FST offheap/onheap > - > > Key: LUCENE-8671 > URL: https://issues.apache.org/jira/browse/LUCENE-8671 > Project: Lucene - Core > Issue Type: New Feature > Components: core/FSTs, core/store >Reporter: Ankit Jain > Assignee: Simon Willnauer >Priority: Minor > Fix For: 8.1, master (9.0) > > Attachments: offheap_generic_settings.patch, offheap_settings.patch > > Original Estimate: 24h > Time Spent: 5h > Remaining Estimate: 19h > > While LUCENE-8635 adds support for loading FST offheap using mmap, users do > not have the flexibility to specify fields for which FST needs to be > offheap. This allows users to tune heap usage as per their workload. > Ideal way will be to add an attribute to FieldInfo, where we have > put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the > appropriate On/OffHeapStore when creating its FST. It can support special > keywords like ALL/NONE.
[jira] [Resolved] (LUCENE-8754) SegmentInfo#toString can cause ConcurrentModificationException
[ https://issues.apache.org/jira/browse/LUCENE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8754. - Resolution: Fixed Fix Version/s: master (9.0) 8.1 > SegmentInfo#toString can cause ConcurrentModificationException > -- > > Key: LUCENE-8754 > URL: https://issues.apache.org/jira/browse/LUCENE-8754 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Simon Willnauer >Priority: Major > Fix For: 8.1, master (9.0) > > Time Spent: 3h > Remaining Estimate: 0h > > A recent change increased the likelihood for this issue to show up but it can > already happen before since we are using the attributes map in the > StoredFieldsFormat for quite some time. I found this issue due to a test > failure on our CI: > {noformat} > 13:11:56[junit4] Suite: org.apache.lucene.index.TestIndexSorting > 13:11:56[junit4] 2> apr 05, 2019 8:11:53 AM > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 13:11:56[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-507,5,TGRP-TestIndexSorting] > 13:11:56[junit4] 2> java.util.ConcurrentModificationException > 13:11:56[junit4] 2> at > __randomizedtesting.SeedInfo.seed([7C25B308F180203B]:0) > 13:11:56[junit4] 2> at > java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) > 13:11:56[junit4] 2> at > java.util.HashMap$EntryIterator.next(HashMap.java:1476) > 13:11:56[junit4] 2> at > java.util.HashMap$EntryIterator.next(HashMap.java:1474) > 13:11:56[junit4] 2> at > java.util.AbstractMap.toString(AbstractMap.java:554) > 13:11:56[junit4] 2> at > org.apache.lucene.index.SegmentInfo.toString(SegmentInfo.java:222) > 13:11:56[junit4] 2> at > org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:345) > 13:11:56[junit4] 2> at > org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:364) > 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) > 13:11:56[junit4] 2> at > 
java.lang.StringBuilder.append(StringBuilder.java:131) > 13:11:56[junit4] 2> at > java.util.AbstractMap.toString(AbstractMap.java:557) > 13:11:56[junit4] 2> at > java.util.Collections$UnmodifiableMap.toString(Collections.java:1493) > 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) > 13:11:56[junit4] 2> at > java.lang.StringBuilder.append(StringBuilder.java:131) > 13:11:56[junit4] 2> at > org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:628) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2181) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2154) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1988) > 13:11:56[junit4] 2> at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1939) > 13:11:56[junit4] 2> at > org.apache.lucene.index.TestIndexSorting$UpdateRunnable.run(TestIndexSorting.java:1851) > 13:11:56[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 13:11:56[junit4] 2> > 13:11:56[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestIndexSorting -Dtests.method=testConcurrentUpdates > -Dtests.seed=7C25B308F180203B -Dtests.slow=true -Dtest > {noformat} > The issue is that we update the attributes map (also we similarly do the same > for diagnostics but it's not necessarily causing the issue since the > diagnostics map is never modified) during the merge process but access it in > the merge policy when looking at running merges and there we call toString on > SegmentCommitInfo which happens without any synchronization. This is > technically unsafe publication but IW is a mess along those lines and real > fixes would require significant changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
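The stack trace shows HashMap's fail-fast iterator tripping while SegmentInfo#toString walks the attributes map as a merge thread updates it. The hazard can be reproduced deterministically and single-threaded with the stdlib (stand-in code, not Lucene's):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class CmeDemo {
    // Returns true if mutating a HashMap while an iterator is open trips the
    // fail-fast modCount check, the same failure class as SegmentInfo#toString
    // racing a merge thread that updates the attributes map.
    static boolean reproduceCme() {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("mode", "BEST_SPEED");
        Iterator<Map.Entry<String, String>> it = attributes.entrySet().iterator();
        it.next();                        // toString() starts walking the map...
        attributes.put("other", "value"); // ...another thread updates an attribute...
        try {
            it.next();                    // ...and the iterator fails fast
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME reproduced: " + reproduceCme());
    }
}
```

HashMap's iterator checks its expected modCount before anything else in next(), so any structural modification between next() calls is guaranteed to throw, regardless of bucket layout.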
Re: Welcome Tomoko Uchida as Lucene/Solr committer
Awesome! Welcome to this group as a committer! It’s always special to grow a committer base! Simon > On 9. Apr 2019, at 06:00, Tomás Fernández Löbbe wrote: > > Welcome! > >> On Mon, Apr 8, 2019 at 5:27 PM Christian Moen wrote: >> Congratulations, Tomoko-san! >> >>> On Tue, Apr 9, 2019 at 12:20 AM Uwe Schindler wrote: >>> Hi all, >>> >>> Please join me in welcoming Tomoko Uchida as the latest Lucene/Solr >>> committer! >>> >>> She has been working on https://issues.apache.org/jira/browse/LUCENE-2562 >>> for several years with awesome progress and finally we got the fantastic >>> Luke as a branch on ASF JIRA: >>> https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/jira/lucene-2562-luke-swing-3 >>> Looking forward to the first release of Apache Lucene 8.1 with Luke bundled >>> in the distribution. I will take care of merging it to master and 8.x >>> branches together with her once she got the ASF account. >>> >>> Tomoko also helped with the Japanese and Korean Analyzers. >>> >>> Congratulations and Welcome, Tomoko! Tomoko, it's traditional for you to >>> introduce yourself with a brief bio. >>> >>> Uwe & Robert (who nominated Tomoko) >>> >>> - >>> Uwe Schindler >>> Achterdiek 19, D-28357 Bremen >>> https://www.thetaphi.de >>> eMail: u...@thetaphi.de >>> >>> >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>>
[jira] [Created] (LUCENE-8754) SegmentInfo#toString can cause ConcurrentModificationException
Simon Willnauer created LUCENE-8754: --- Summary: SegmentInfo#toString can cause ConcurrentModificationException Key: LUCENE-8754 URL: https://issues.apache.org/jira/browse/LUCENE-8754 Project: Lucene - Core Issue Type: Improvement Reporter: Simon Willnauer A recent change increased the likelihood for this issue to show up but it can already happen before since we are using the attributes map in the StoredFieldsFormat for quite some time. I found this issue due to a test failure on our CI: {noformat} 13:11:56[junit4] Suite: org.apache.lucene.index.TestIndexSorting 13:11:56[junit4] 2> apr 05, 2019 8:11:53 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException 13:11:56[junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-507,5,TGRP-TestIndexSorting] 13:11:56[junit4] 2> java.util.ConcurrentModificationException 13:11:56[junit4] 2> at __randomizedtesting.SeedInfo.seed([7C25B308F180203B]:0) 13:11:56[junit4] 2> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) 13:11:56[junit4] 2> at java.util.HashMap$EntryIterator.next(HashMap.java:1476) 13:11:56[junit4] 2> at java.util.HashMap$EntryIterator.next(HashMap.java:1474) 13:11:56[junit4] 2> at java.util.AbstractMap.toString(AbstractMap.java:554) 13:11:56[junit4] 2> at org.apache.lucene.index.SegmentInfo.toString(SegmentInfo.java:222) 13:11:56[junit4] 2> at org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:345) 13:11:56[junit4] 2> at org.apache.lucene.index.SegmentCommitInfo.toString(SegmentCommitInfo.java:364) 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) 13:11:56[junit4] 2> at java.lang.StringBuilder.append(StringBuilder.java:131) 13:11:56[junit4] 2> at java.util.AbstractMap.toString(AbstractMap.java:557) 13:11:56[junit4] 2> at java.util.Collections$UnmodifiableMap.toString(Collections.java:1493) 13:11:56[junit4] 2> at java.lang.String.valueOf(String.java:2994) 13:11:56[junit4] 2> at 
java.lang.StringBuilder.append(StringBuilder.java:131) 13:11:56[junit4] 2> at org.apache.lucene.index.TieredMergePolicy.findForcedMerges(TieredMergePolicy.java:628) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2181) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2154) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1988) 13:11:56[junit4] 2> at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1939) 13:11:56[junit4] 2> at org.apache.lucene.index.TestIndexSorting$UpdateRunnable.run(TestIndexSorting.java:1851) 13:11:56[junit4] 2> at java.lang.Thread.run(Thread.java:748) 13:11:56[junit4] 2> 13:11:56[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexSorting -Dtests.method=testConcurrentUpdates -Dtests.seed=7C25B308F180203B -Dtests.slow=true -Dtest {noformat} The issue is that we update the attributes map (also we similarly do the same for diagnostics but it's not necessarily causing the issue since the diagnostics map is never modified) during the merge process but access it in the merge policy when looking at running merges and there we call toString on SegmentCommitInfo which happens without any synchronization. This is technically unsafe publication but IW is a mess along those lines and real fixes would require significant changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
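The failure mode described above — `AbstractMap.toString` iterating the attributes map while another thread mutates it mid-merge — can be reproduced with a plain `HashMap`. The sketch below uses stand-in names, not Lucene's actual classes: the iterator fails fast on a structural modification, and rendering a point-in-time copy avoids the exception (in truly concurrent code the copy itself would still need synchronization, so this only illustrates the mechanism, not necessarily Lucene's eventual fix).

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class AttributesCmeSketch {
    // Structurally modifying a HashMap while iterating it (as
    // AbstractMap.toString does under the hood) fails fast with
    // ConcurrentModificationException. Returns true if it was thrown.
    static boolean mutateDuringIteration() {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("k1", "v1");
        attributes.put("k2", "v2");
        try {
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                attributes.put("k3", "v3"); // new key: structural modification mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // One way out: render a point-in-time copy instead of the live map.
    static String safeToString(Map<String, String> attributes) {
        return new HashMap<>(attributes).toString();
    }

    public static void main(String[] args) {
        System.out.println(mutateDuringIteration()); // true
        System.out.println(safeToString(Map.of("k1", "v1"))); // {k1=v1}
    }
}
```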
[jira] [Commented] (LUCENE-8735) FileAlreadyExistsException after opening old commit
[ https://issues.apache.org/jira/browse/LUCENE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801820#comment-16801820 ] Simon Willnauer commented on LUCENE-8735: - thanks henning > FileAlreadyExistsException after opening old commit > --- > > Key: LUCENE-8735 > URL: https://issues.apache.org/jira/browse/LUCENE-8735 > Project: Lucene - Core > Issue Type: Bug > Components: core/store >Affects Versions: 8.0 >Reporter: Henning Andersen >Assignee: Simon Willnauer >Priority: Major > Fix For: 7.7.1, 7.7.2, 8.0.1, 8.1, master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > FilterDirectory.getPendingDeletes() does not delegate calls. This in turn > means that IndexFileDeleter does not consider those as relevant files. > When opening an IndexWriter for an older commit, excess files are attempted > deleted. If an IndexReader exists using one of the newer commits, the excess > files may fail to delete (at least on windows or when using the mocking > WindowsFS). > If then closing and opening the IndexWriter, the information on the pending > deletes are gone if a FilterDirectory derivate is used. At the same time, the > pending deletes are filtered out of listAll. This leads to a risk of hitting > an existing file name, causing a FileAlreadyExistsException. > This issue likely only exists on windows. > Will create pull request with fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8735) FileAlreadyExistsException after opening old commit
[ https://issues.apache.org/jira/browse/LUCENE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8735. - Resolution: Fixed Assignee: Simon Willnauer Fix Version/s: 7.7.1 8.1 8.0.1 7.7.2 > FileAlreadyExistsException after opening old commit > --- > > Key: LUCENE-8735 > URL: https://issues.apache.org/jira/browse/LUCENE-8735 > Project: Lucene - Core > Issue Type: Bug > Components: core/store >Affects Versions: 8.0 >Reporter: Henning Andersen >Assignee: Simon Willnauer >Priority: Major > Fix For: 7.7.2, 8.0.1, 8.1, master (9.0), 7.7.1 > > Time Spent: 40m > Remaining Estimate: 0h > > FilterDirectory.getPendingDeletes() does not delegate calls. This in turn > means that IndexFileDeleter does not consider those as relevant files. > When opening an IndexWriter for an older commit, excess files are attempted > deleted. If an IndexReader exists using one of the newer commits, the excess > files may fail to delete (at least on windows or when using the mocking > WindowsFS). > If then closing and opening the IndexWriter, the information on the pending > deletes are gone if a FilterDirectory derivate is used. At the same time, the > pending deletes are filtered out of listAll. This leads to a risk of hitting > an existing file name, causing a FileAlreadyExistsException. > This issue likely only exists on windows. > Will create pull request with fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
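The bug class behind LUCENE-8735 — a forwarding wrapper that forgets to delegate one method and silently falls back to the base-class default — can be sketched with simplified stand-ins (the class names below are hypothetical; the real fix was delegating getPendingDeletes() in Lucene's FilterDirectory):

```java
import java.util.Collections;
import java.util.Set;

public class FilterDelegationDemo {
    // Base class with a default implementation, like Directory's.
    static class Dir {
        Set<String> getPendingDeletes() { return Collections.emptySet(); }
    }

    // A directory that actually tracks pending deletes.
    static class TrackingDir extends Dir {
        @Override Set<String> getPendingDeletes() { return Set.of("_0.cfs"); }
    }

    // Bug: the filter does not override getPendingDeletes(), so it inherits
    // the empty default and silently hides the wrapped directory's state.
    static class BrokenFilterDir extends Dir {
        final Dir in;
        BrokenFilterDir(Dir in) { this.in = in; }
    }

    // Fix: delegate the call to the wrapped directory.
    static class FixedFilterDir extends BrokenFilterDir {
        FixedFilterDir(Dir in) { super(in); }
        @Override Set<String> getPendingDeletes() { return in.getPendingDeletes(); }
    }

    public static void main(String[] args) {
        Dir inner = new TrackingDir();
        System.out.println(new BrokenFilterDir(inner).getPendingDeletes()); // []
        System.out.println(new FixedFilterDir(inner).getPendingDeletes());  // [_0.cfs]
    }
}
```

This is exactly why a consumer like IndexFileDeleter, which only sees the wrapper, stopped considering pending deletes as relevant files.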
Re: [JENKINS] Lucene-Solr-master-Windows (32bit/jdk1.8.0_172) - Build # 7812 - Still Unstable!
I pushed a fix for this, sorry for the noise. test-bug On Thu, Mar 21, 2019 at 9:38 AM Dawid Weiss wrote: > Ping. Jenkins builds fail on an assertion related to the recent > changes in fst off-heap? > > D. > > On Thu, Mar 21, 2019 at 6:46 AM Policeman Jenkins Server > wrote: > > > > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/7812/ > > Java: 32bit/jdk1.8.0_172 -client -XX:+UseG1GC > > > > 5 tests failed. > > FAILED: > org.apache.lucene.codecs.lucene50.TestBlockPostingsFormat.testFstOffHeap > > > > Error Message: > > > > > > Stack Trace: > > java.lang.AssertionError > > at > __randomizedtesting.SeedInfo.seed([4086033C7FFFE0F2:5FC7DE43004D80CC]:0) > > at org.junit.Assert.fail(Assert.java:86) > > at org.junit.Assert.assertTrue(Assert.java:41) > > at org.junit.Assert.assertTrue(Assert.java:52) > > at > org.apache.lucene.codecs.lucene50.TestBlockPostingsFormat.testFstOffHeap(TestBlockPostingsFormat.java:90) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.lang.reflect.Method.invoke(Method.java:498) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > > at > 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894) > > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) > > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > > at > org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > > at java.lang.Thread.run(Thread.java:748) > > > > > > FAILED: > org.apache.lucene.codecs.lucene50.TestBlockPostingsFormat.testFstOffHeap > > > > Error Message: > > > > >
Re: [JENKINS] Lucene-Solr-8.x-Linux (32bit/jdk1.8.0_172) - Build # 288 - Still Unstable!
I pushed a fix for this, sorry for the noise On Thu, Mar 21, 2019 at 10:27 AM Policeman Jenkins Server < jenk...@thetaphi.de> wrote: > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/288/ > Java: 32bit/jdk1.8.0_172 -client -XX:+UseSerialGC > > 6 tests failed. > FAILED: > org.apache.lucene.codecs.lucene50.TestBlockPostingsFormat.testFstOffHeap > > Error Message: > > > Stack Trace: > java.lang.AssertionError > at > __randomizedtesting.SeedInfo.seed([418BE33A6217D2DD:5ECA3E451DA5B2E3]:0) > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.lucene.codecs.lucene50.TestBlockPostingsFormat.testFstOffHeap(TestBlockPostingsFormat.java:90) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at java.lang.Thread.run(Thread.java:748) > > > FAILED: > org.apache.lucene.codecs.lucene50.TestBlockPostingsFormat.testFstOffHeap > > Error Message: > > > Stack Trace: > java.lang.AssertionError > at > __randomizedtesting.SeedInfo.seed([418BE33A6217D2DD:5ECA3E451DA5B2E3]:0) > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.ass
Re: [VOTE] Master/9.0 to require Java 11
+1 - Java 8 EOLed last year - moving on in 2020 is reasonable and it's our responsibility to move with the platform we are running on. simon On Wed, Mar 20, 2019 at 9:27 AM Jan Høydahl wrote: > +1 > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > 19. mar. 2019 kl. 19:22 skrev Adrien Grand : > > Hello, > > Now that Lucene/Solr 8.0 has shipped I'd like us to consider requiring > Java 11 for 9.0, currently the master branch. We had 18 months between > 7.0 and 8.0, so if we assume a similar interval between 8.0 and 9.0 > that would mean releasing 9.0 about 2 years after Java 11, which > sounds like a conservative requirement to me. > > What do you think? > > Here is my +1. > > -- > Adrien > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > > For additional commands, e-mail: dev-h...@lucene.apache.org > > > >
[jira] [Resolved] (LUCENE-8700) Enable concurrent flushing when no indexing is in progress
[ https://issues.apache.org/jira/browse/LUCENE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8700. - Resolution: Invalid We settled on the PR that IndexWriter#flushNextBuffer is sufficient for this use case. I opened a new PR for the test improvements here: https://github.com/apache/lucene-solr/pull/607 > Enable concurrent flushing when no indexing is in progress > -- > > Key: LUCENE-8700 > URL: https://issues.apache.org/jira/browse/LUCENE-8700 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mike Sokolov >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > As discussed on mailing list, this is for adding a IndexWriter.yield() method > that callers can use to enable concurrent flushing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790895#comment-16790895 ] Simon Willnauer commented on LUCENE-8692: - > rollback gives you a way to close IndexWriter without doing a commit, which > seems useful. If you removed that, what would users do instead? Can't we extend close to close without a commit? I mean we can keep rollback but be more strict about exceptions during the commit and friends? > IndexWriter.getTragicException() may not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this solr node gives up it's leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - see > SOLR-13237) due to the leader not giving up it's leadership even after the > corruption causes an update/commit to fail. Solr's leadership code makes this > decision after encountering an exception from the IndexWriter based on wether > {{IndexWriter.getTragicException()}} is (non-)null. > > While investigating this, I created an isolated Lucene-Core equivilent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index untill (otherwise) valid execution > of IW.add() + IW.commit() calls throw an exception to the IW client. > * assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null.
> It's fairly easy to make my new test fail reproducibly – in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (ie: the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789320#comment-16789320 ] Simon Willnauer commented on LUCENE-8692: - {quote} It definitely seems like there should be something we can/should do to better recognize situations like this as "unrecoverable" and be more strict in dealing with low level exceptions during things like commit – but I'm out definitely out of my depth in understanding/suggesting what that might look like. {quote} I agree with you here. I personally question the purpose of rollback, since in all the cases I have seen, a missing rollback would simply mean data loss. If somebody continues after a failed commit / prepareCommit / reopen they will end up with inconsistency and/or data loss. I can't think of a reason why you would want to do it. I am curious what [~mikemccand] [~jpountz] [~rcmuir] think about that. If we deprecated and removed rollback() we could be more aggressive when it gets to tragic events and prevent users from continuing after such an exception by closing the writer automatically. > IndexWriter.getTragicException() may not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this solr node gives up it's leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - see > SOLR-13237) due to the leader not giving up it's leadership even after the > corruption causes an update/commit to fail.
Solr's leadership code makes this > decision after encountering an exception from the IndexWriter based on wether > {{IndexWriter.getTragicException()}} is (non-)null. > > While investigating this, I created an isolated Lucene-Core equivilent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index untill (otherwise) valid execution > of IW.add() + IW.commit() calls throw an exception to the IW client. > * assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null. > It's fairly easy to make my new test fail reproducibly – in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (ie: the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap
[ https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785703#comment-16785703 ] Simon Willnauer commented on LUCENE-8671: - I don't think we should add a setter to FieldInfo. This is a code-private thing and should be treated this way. This looks like we need to have a way to pass more info down when we open new SegmentReaders. I wonder if we can accept a simple Map on {noformat} public static DirectoryReader open(final IndexWriter writer, boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {noformat} We can then pass it down to the relevant parts and make it part of `SegmentReaderState`? This map can also be passed via IndexWriterConfig for the NRT case. That way we can pass stuff per DirectoryReader open which is what we want I guess. > Add setting for moving FST offheap/onheap > - > > Key: LUCENE-8671 > URL: https://issues.apache.org/jira/browse/LUCENE-8671 > Project: Lucene - Core > Issue Type: New Feature > Components: core/FSTs, core/store >Reporter: Ankit Jain >Priority: Minor > Attachments: offheap_generic_settings.patch, offheap_settings.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > While LUCENE-8635, adds support for loading FST offheap using mmap, users do > not have the flexibility to specify fields for which FST needs to be > offheap. This allows users to tune heap usage as per their workload. > Ideal way will be to add an attribute to FieldInfo, where we have > put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the > appropriate On/OffHeapStore when creating its FST. It can support special > keywords like ALL/NONE. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
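The suggestion above — threading a caller-supplied attributes map from the DirectoryReader.open(...) call down to the per-segment reader state — could look roughly like the sketch below. The names (OpenOptionsSketch, the readerAttributes field, the "fst.offheap" key) are illustrative assumptions, not Lucene's actual API; the point is only the shape of the plumbing.

```java
import java.util.Map;

public class OpenOptionsSketch {
    // Stand-in for the per-segment state the email calls `SegmentReaderState`:
    // it carries an immutable snapshot of the per-open attributes so codec
    // components (e.g. a postings format deciding on/off-heap FST loading)
    // can consult it.
    static class SegmentReaderState {
        final Map<String, String> readerAttributes;
        SegmentReaderState(Map<String, String> readerAttributes) {
            this.readerAttributes = Map.copyOf(readerAttributes); // defensive, read-only copy
        }
    }

    // Stand-in for an open(...) entry point that accepts the map and passes
    // it down; for the NRT case the same map could come from the writer config.
    static SegmentReaderState open(Map<String, String> readerAttributes) {
        return new SegmentReaderState(readerAttributes);
    }

    public static void main(String[] args) {
        SegmentReaderState state = open(Map.of("fst.offheap", "ALL"));
        System.out.println(state.readerAttributes.get("fst.offheap")); // ALL
    }
}
```

Because the map travels with each open call rather than living on FieldInfo, the choice stays per-DirectoryReader instead of becoming index metadata, which is what the comment argues for.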
[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785625#comment-16785625 ] Simon Willnauer commented on LUCENE-8692: - {quote} For now I've updated the patch to take the simplest possible approach to checking for MergeAbortedException {quote} +1 {quote} Well, to flip your question around: is there an example of a Throwable you can think of bubbling up out of IndexWriter.startCommit() that should NOT be considered fatal? {quote} I think we need to be careful here. From my perspective there are 3 types of exceptions here: * unrecoverable exceptions, aka VirtualMachineErrors * exceptions that happen during indexing and are not recoverable (these are handled in DocumentsWriter) * exceptions that cause data loss or inconsistencies (we didn't handle those as fatal yet, at least not consistently); we only catch VirtualMachineError. Those are in particular: * getReader() * deleteAll() * addIndexes() * flushNextBuffer() * prepareCommitInternal() * doFlush() * startCommit() Those methods might cause documents to go missing etc., but we did not treat them as fatal or tragic events since a user could always call rollback() to go back to the last known safe-point / previous commit. Now we can debate if we want to change this and we can; in fact I am all for making it even more strict, especially since it's inconsistent with what we do if addDocument fails with an aborting exception. If we do that we need to see if rollback still has a purpose and maybe remove it? Now, speaking of maybeMerge, I don't see why we need to close the index writer with a tragic event; there is no data loss nor an inconsistency. From that logic I don't think we need to handle these exceptions in such a drastic way? {quote} I don't use github for lucene development – I track all contributions as patches in the official issue tracker for the project as recommended by our official guidelines : ) ... 
but i'll go ahead and create a jira/LUCENE-8692 branch if that will help you review. {quote} Bummer, I am not sure branches help. Working like it's still 1999 is a pain we should fix our guidelines. > IndexWriter.getTragicException() nay not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this solr node gives up it's leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - > seeSOLR-13237) due to the leader not giving up it's leadership even after the > corruption causes an update/commit to fail. Solr's leadership code makes > this decision after encountering an exception from the IndexWriter based on > wether {{IndexWriter.getTragicException()}} is (non-)null. > > While investigating this, I created an isolated Lucene-Core equivilent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index untill (otherwise) valid execution > of IW.add() + IW.commit() calls throw an exception to the IW client. > * assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null. > It's fairly easy to make my new test fail reproducibly -- in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (ie: the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8692) IndexWriter.getTragicException() may not reflect all corrupting exceptions (notably: NoSuchFileException)
[ https://issues.apache.org/jira/browse/LUCENE-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784438#comment-16784438 ] Simon Willnauer commented on LUCENE-8692: - {noformat} I think there is an issue with the patch with MergeAbortedExeption indeed given that registerMerge might throw such an exception. Maybe we should move this try block to registerMerge instead where we know which OneMerge is being registered (and is also where the exception is thrown when estimating the size of the merge). {noformat} +1 {code:java} -} catch (VirtualMachineError tragedy) { +} catch (Throwable tragedy) { tragicEvent(tragedy, "startCommit"); {code} I am not sure why we need to treat every exception as fatal in this case? I also wonder if we could move this to a PR on github, iterations would be simpler and comments too. I can't tell which patch is relevant which one isn't. > IndexWriter.getTragicException() nay not reflect all corrupting exceptions > (notably: NoSuchFileException) > - > > Key: LUCENE-8692 > URL: https://issues.apache.org/jira/browse/LUCENE-8692 > Project: Lucene - Core > Issue Type: Bug >Reporter: Hoss Man >Priority: Major > Attachments: LUCENE-8692.patch, LUCENE-8692.patch, > LUCENE-8692_test.patch > > > Backstory... > Solr has a "LeaderTragicEventTest" which uses MockDirectoryWrapper's > {{corruptFiles}} to introduce corruption into the "leader" node's index and > then assert that this solr node gives up it's leadership of the shard and > another replica takes over. > This can currently fail sporadically (but usually reproducibly - > seeSOLR-13237) due to the leader not giving up it's leadership even after the > corruption causes an update/commit to fail. Solr's leadership code makes > this decision after encountering an exception from the IndexWriter based on > wether {{IndexWriter.getTragicException()}} is (non-)null. 
> > While investigating this, I created an isolated Lucene-Core equivilent test > that demonstrates the same basic situation: > * Gradually cause corruption on an index untill (otherwise) valid execution > of IW.add() + IW.commit() calls throw an exception to the IW client. > * assert that if an exception is thrown to the IW client, > {{getTragicException()}} is now non-null. > It's fairly easy to make my new test fail reproducibly -- in every situation > I've seen the underlying exception is a {{NoSuchFileException}} (ie: the > randomly introduced corruption was to delete some file). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773011#comment-16773011 ] Simon Willnauer commented on LUCENE-3041: - [~romseygeek] any chance you can open a PR for this. Patches are so hard to review and comment on > Support Query Visting / Walking > --- > > Key: LUCENE-3041 > URL: https://issues.apache.org/jira/browse/LUCENE-3041 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 4.0-ALPHA >Reporter: Chris Male >Assignee: Simon Willnauer >Priority: Minor > Fix For: 4.9, 6.0 > > Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, > LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch > > > Out of the discussion in LUCENE-2868, it could be useful to add a generic > Query Visitor / Walker that could be used for more advanced rewriting, > optimizations or anything that requires state to be stored as each Query is > visited. > We could keep the interface very simple: > {code} > public interface QueryVisitor { > Query visit(Query query); > } > {code} > and then use a reflection based visitor like Earwin suggested, which would > allow implementators to provide visit methods for just Querys that they are > interested in. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771013#comment-16771013 ] Simon Willnauer commented on LUCENE-8292: - [~dsmiley] I coordinated this with [~romseygeek] given that we had to respin for https://issues.apache.org/jira/browse/SOLR-13126 anyhow. > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk, 8.0, 8.x, master (9.0) > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible to the delegate > to override these methods to have specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
I spoke to Alan about this before pushing and we have an unresolved solr blocker too > On 15. Feb 2019, at 22:56, David Smiley (JIRA) wrote: > > >[ > https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769780#comment-16769780 > ] > > David Smiley commented on LUCENE-8292: > -- > > Thanks Simon. I didn't think this could get in to 8.x at the last second or > I would have volunteered. FYI [~romseygeek] so you're aware. > >> Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods >> -- >> >>Key: LUCENE-8292 >>URL: https://issues.apache.org/jira/browse/LUCENE-8292 >>Project: Lucene - Core >> Issue Type: Bug >> Components: core/index >> Affects Versions: 7.2.1 >> Reporter: Bruno Roustant >> Priority: Major >>Fix For: trunk, 8.0, 8.x, master (9.0) >> >>Attachments: >> 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, >> LUCENE-8292.patch >> >> Time Spent: 0.5h >> Remaining Estimate: 0h >> >> FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many >> methods. >> It misses some seekExact() methods, thus it is not possible to the delegate >> to override these methods to have specific behavior (unlike the TermsEnum >> API which allows that). >> The fix is straightforward: simply override these seekExact() methods and >> delegate. > > > > -- > This message was sent by Atlassian JIRA > (v7.6.3#76005) > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8292. - Resolution: Fixed Fix Version/s: master (9.0) 8.x 8.0 > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk, 8.0, 8.x, master (9.0) > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible to the delegate > to override these methods to have specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769324#comment-16769324 ] Simon Willnauer commented on LUCENE-8292: - I opened a PR here https://github.com/apache/lucene-solr/pull/574 > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > Time Spent: 10m > Remaining Estimate: 0h > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible to the delegate > to override these methods to have specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8292) Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods
[ https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767061#comment-16767061 ] Simon Willnauer commented on LUCENE-8292: - I do see both points here. [~dsmiley] I hate how trappy this is and [~jpountz] I completely agree with you. My suggestion here would be to have a TermsEnum class with all methods abstract and a BaseTermsEnum that adds the default impls. FilterTermsEnum then subclasses TermsEnum and does the right thing. Other classes that don't need to override stuff like seekExact and seek(BytesRef, TermState) / TermState termState() can simply subclass BaseTermsEnum and we don't have to duplicate code all over the place. I don't think we need to do this in other places where we have the same pattern, but in this case the traps are significant and we can fix it with a simple class in-between? > Fix FilterLeafReader.FilterTermsEnum to delegate all seekExact methods > -- > > Key: LUCENE-8292 > URL: https://issues.apache.org/jira/browse/LUCENE-8292 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: 7.2.1 >Reporter: Bruno Roustant >Priority: Major > Fix For: trunk > > Attachments: > 0001-Fix-FilterLeafReader.FilterTermsEnum-to-delegate-see.patch, > LUCENE-8292.patch > > > FilterLeafReader#FilterTermsEnum wraps another TermsEnum and delegates many > methods. > It misses some seekExact() methods, thus it is not possible for the delegate > to override these methods to have specific behavior (unlike the TermsEnum API > which allows that). > The fix is straightforward: simply override these seekExact() methods and > delegate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
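The class split suggested in the comment above can be sketched as follows. This is a hedged, self-contained illustration with simplified signatures, not actual Lucene code: the base class keeps seekExact abstract so a delegating filter cannot silently inherit a slow seekCeil-based default, while BaseTermsEnum is the opt-in home for that default:

```java
// Simplified stand-ins (String instead of BytesRef, no IOException).
abstract class TermsEnum {
  enum SeekStatus { FOUND, NOT_FOUND }
  abstract SeekStatus seekCeil(String term);
  // Abstract on purpose: no trappy default implementation here.
  abstract boolean seekExact(String term);
}

abstract class BaseTermsEnum extends TermsEnum {
  // Convenience default for implementations without a faster exact-seek path.
  @Override
  boolean seekExact(String term) {
    return seekCeil(term) == SeekStatus.FOUND;
  }
}

class FilterTermsEnum extends TermsEnum {
  final TermsEnum in;
  FilterTermsEnum(TermsEnum in) { this.in = in; }
  @Override SeekStatus seekCeil(String term) { return in.seekCeil(term); }
  // Extending TermsEnum (not BaseTermsEnum) forces the filter to delegate
  // explicitly; forgetting it is now a compile error rather than a perf trap.
  @Override boolean seekExact(String term) { return in.seekExact(term); }
}
```

With this split, a wrapper can no longer accidentally turn a fast delegate seekExact into a full seekCeil, which is exactly the trap LUCENE-8662 hit.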
[jira] [Commented] (LUCENE-8662) Change TermsEnum.seekExact(BytesRef) to abstract + delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763762#comment-16763762 ] Simon Willnauer commented on LUCENE-8662: - [~tomasflobbe] yes I think this should go into 8.0 - feel free to pull it in, I will do it next week once I am back at the keyboard. > Change TermsEnum.seekExact(BytesRef) to abstract + delegate > seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum > --- > > Key: LUCENE-8662 > URL: https://issues.apache.org/jira/browse/LUCENE-8662 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0 >Reporter: jefferyyuan >Priority: Major > Labels: query > Fix For: 8.0, 7.7 > > Attachments: output of test program.txt > > Time Spent: 50m > Remaining Estimate: 0h > > Recently in our production, we found that Solr uses a lot of memory(more than > 10g) during recovery or commit for a small index (3.5gb) > The stack trace is: > > {code:java} > Thread 0x4d4b115c0 > at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) > at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V > (SegmentTermsEnumFrame.java:157) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:786) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:538) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnum.java:757) > at > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (FilterLeafReader.java:185) > at > 
org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z > (TermsEnum.java:74) > at > org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J > (SolrIndexSearcher.java:823) > at > org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:204) > at > org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (UpdateLog.java:786) > at > org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:194) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z > (DistributedUpdateProcessor.java:1051) > {code} > We reproduced the problem locally with the following code using Lucene code. > {code:java} > public static void main(String[] args) throws IOException { > FSDirectory index = FSDirectory.open(Paths.get("the-index")); > try (IndexReader reader = new > ExitableDirectoryReader(DirectoryReader.open(index), > new QueryTimeoutImpl(1000 * 60 * 5))) { > String id = "the-id"; > BytesRef text = new BytesRef(id); > for (LeafReaderContext lf : reader.leaves()) { > TermsEnum te = lf.reader().terms("id").iterator(); > System.out.println(te.seekExact(text)); > } > } > } > {code} > > I added System.out.println("ord: " + ord); in > codecs.blocktree.SegmentTermsEnum.getFrame(int). > Please check the attached output of test program.txt. > > We found out the root cause: > we didn't implement seekExact(BytesRef) method in > FilterLeafReader.FilterTerms, so it uses the base class > TermsEnum.seekExact(BytesRef) implementation which is very inefficient in > this case. 
> {code:java} > public boolean seekExact(BytesRef text) throws IOException { > return seekCeil(text) == SeekStatus.FOUND; > } > {code} > The fix is simple, just override seekExact(BytesRef) method in > FilterLeafReader.FilterTerms > {code:java} > @Override > public boolean seekExact(BytesRef text) throws IOException { > return in.seekExact(text); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8664) Add equals/hashcode to TotalHits
[ https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8664. - Resolution: Fixed Fix Version/s: master (9.0) 8.0 > Add equals/hashcode to TotalHits > > > Key: LUCENE-8664 > URL: https://issues.apache.org/jira/browse/LUCENE-8664 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Luca Cavanna >Priority: Minor > Fix For: 8.0, master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > I think it would be convenient to add equals/hashcode methods to the > TotalHits class. I opened a PR here: > [https://github.com/apache/lucene-solr/pull/552] . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8664) Add equals/hashcode to TotalHits
[ https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756032#comment-16756032 ] Simon Willnauer commented on LUCENE-8664: - pushed - thanks [~lucacavanna] > Add equals/hashcode to TotalHits > > > Key: LUCENE-8664 > URL: https://issues.apache.org/jira/browse/LUCENE-8664 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Luca Cavanna >Priority: Minor > Fix For: 8.0, master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > I think it would be convenient to add equals/hashcode methods to the > TotalHits class. I opened a PR here: > [https://github.com/apache/lucene-solr/pull/552] . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Opening old indices for reading
thanks folks, these are all good points. I created a first cut of what I had in mind [1]. It's relatively simple and from a Java visibility perspective the only change that a user can take advantage of is this [2] and this [3] respectively. This would allow opening indices back to Lucene 7.0 given that the codecs and postings formats are available. From a documentation perspective I added [4]. This is a pure read-only change and doesn't allow opening these indices for writing. You can't merge them, nor would you be able to open an IndexWriter on top of them. I still need to add support for CheckIndex but that's basically it. lemme know what you think, simon [1] https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752 [2] https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752#diff-e0352098b027d6f41a17c068ad8d7ef0R689 [3] https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752#diff-e3ccf9ee90355b10f2dd22ce2da6c73cR306 [4] https://github.com/apache/lucene-solr/commit/0c4c885214ef30627a01e320f9c861dc2521b752#diff-1bedf4d0d52ff88ef8a16a6788ad7684R86 On Fri, Jan 25, 2019 at 3:14 PM Michael McCandless wrote: > > Another example is that long ago Lucene allowed pos=-1 to be indexed and it caused > all sorts of problems. We also stopped allowing positions close to > Integer.MAX_VALUE (https://issues.apache.org/jira/browse/LUCENE-6382). Yet > another is allowing negative vInts which are possible but horribly > inefficient (https://issues.apache.org/jira/browse/LUCENE-3738). > > We do need to be free to fix these problems and then know after N+2 releases > that no index can have the issue.
> > I like the idea of providing "expert" / best effort / limited way of carrying > forward such ancient indices, but I think the huge challenge for someone > using that tool on an important index will be enumerating the list of issues > that might "matter" (the 3 Adrien listed + the 3 I listed above is a start > for this list) and taking appropriate steps to "correct" the index if so. > E.g. on a norms encoding change, somehow these expert tools must decode norms > the old way, encode them the new way, and then rewrite the norms files. Or > if the index has pos=-1, changing that to pos=0. Or if it has negative > vInts, ... etc. > > Or maybe the "special" DirectoryReader only reads stored fields? And so you > would enumerate your _source and reindex into the latest format ... > > > Something like https://issues.apache.org/jira/browse/LUCENE-8277 would > > help make it harder to introduce corrupt data in an index. > > +1 > > Every time we catch something like "don't allow pos = -1 into the index" we > need somehow remember to go and add the check also in addIndices. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Fri, Jan 25, 2019 at 3:52 AM Adrien Grand wrote: >> >> Agreed with Michael that setting expectations is going to be >> important. The thing that I would like to make sure is that we would >> never refrain from moving Lucene forward because of this feature. In >> particular, lucene-core should be free to make assumptions that are >> valid for N and N-1 indices without worrying about the fact that we >> have this super-expert feature that allows opening older indices. 
Here >> are some assumptions that I have in mind which have not always been >> true: >> - norms might be encoded in a different way (this changed in 7) >> - all index files have a checksum (only true since Lucene 5) >> - offsets are always going forward (only enforced since Lucene 7) >> >> This means that carrying indices over by just merging them with the >> new version to move them to a new codec won't work all the time. For >> instance if your index has backward offsets and new codecs assume that >> offsets are going forward, then merging might fail or corrupt offsets >> - I'd like to make sure that we would not consider this a bug. >> >> Erick, I don't think this feature would be suitable for "robust index >> upgrades". To me it is really a best effort and shouldn't be trusted >> too much. >> >> I think some users will be tempted to wrap old readers to make them >> look good and then add them back to an index using addIndexes? >> Something like https://issues.apache.org/jira/browse/LUCENE-8277 would >> help make it harder to introduce corrupt data in an index. >> >> On Wed, Jan 23, 2019 at 3:11 PM Simon Willnauer >> wrote: >> > >> > Hey folks, >> > >> > tl;dr; I want to be able
[jira] [Commented] (LUCENE-8664) Add equals/hashcode to TotalHits
[ https://issues.apache.org/jira/browse/LUCENE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754987#comment-16754987 ] Simon Willnauer commented on LUCENE-8664: - [~lucacavanna] what's the usecase for this? Why are you trying to put this into a map or something? Can you explain this a bit further? > Add equals/hashcode to TotalHits > > > Key: LUCENE-8664 > URL: https://issues.apache.org/jira/browse/LUCENE-8664 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Luca Cavanna >Priority: Minor > > I think it would be convenient to add equals/hashcode methods to the > TotalHits class. I opened a PR here: > [https://github.com/apache/lucene-solr/pull/552] . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8662) Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum
[ https://issues.apache.org/jira/browse/LUCENE-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754984#comment-16754984 ] Simon Willnauer commented on LUCENE-8662: - {noformat} If we think that it's a trap, we should remove the default impl and make it abstract (in 8.0). {noformat} I agree with this. I think it can be trappy and such an expert API shouldn't be. Let's make it abstract? > Override seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum > > > Key: LUCENE-8662 > URL: https://issues.apache.org/jira/browse/LUCENE-8662 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: 5.5.5, 6.6.5, 7.6, 8.0 >Reporter: jefferyyuan >Priority: Major > Labels: query > Fix For: 8.0, 7.7 > > Attachments: output of test program.txt > > Time Spent: 10m > Remaining Estimate: 0h > > Recently in our production, we found that Solr uses a lot of memory (more than > 10g) during recovery or commit for a small index (3.5gb) > The stack trace is: > > {code:java} > Thread 0x4d4b115c0 > at org.apache.lucene.store.DataInput.readVInt()I (DataInput.java:125) > at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock()V > (SegmentTermsEnumFrame.java:157) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTermNonLeaf(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:786) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.scanToTerm(Lorg/apache/lucene/util/BytesRef;Z)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnumFrame.java:538) > at > org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (SegmentTermsEnum.java:757) > at > org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekCeil(Lorg/apache/lucene/util/BytesRef;)Lorg/apache/lucene/index/TermsEnum$SeekStatus; > (FilterLeafReader.java:185) > at 
> org.apache.lucene.index.TermsEnum.seekExact(Lorg/apache/lucene/util/BytesRef;)Z > (TermsEnum.java:74) > at > org.apache.solr.search.SolrIndexSearcher.lookupId(Lorg/apache/lucene/util/BytesRef;)J > (SolrIndexSearcher.java:823) > at > org.apache.solr.update.VersionInfo.getVersionFromIndex(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:204) > at > org.apache.solr.update.UpdateLog.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (UpdateLog.java:786) > at > org.apache.solr.update.VersionInfo.lookupVersion(Lorg/apache/lucene/util/BytesRef;)Ljava/lang/Long; > (VersionInfo.java:194) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(Lorg/apache/solr/update/AddUpdateCommand;)Z > (DistributedUpdateProcessor.java:1051) > {code} > We reproduced the problem locally with the following code using Lucene code. > {code:java} > public static void main(String[] args) throws IOException { > FSDirectory index = FSDirectory.open(Paths.get("the-index")); > try (IndexReader reader = new > ExitableDirectoryReader(DirectoryReader.open(index), > new QueryTimeoutImpl(1000 * 60 * 5))) { > String id = "the-id"; > BytesRef text = new BytesRef(id); > for (LeafReaderContext lf : reader.leaves()) { > TermsEnum te = lf.reader().terms("id").iterator(); > System.out.println(te.seekExact(text)); > } > } > } > {code} > > I added System.out.println("ord: " + ord); in > codecs.blocktree.SegmentTermsEnum.getFrame(int). > Please check the attached output of test program.txt. > > We found out the root cause: > we didn't implement seekExact(BytesRef) method in > FilterLeafReader.FilterTerms, so it uses the base class > TermsEnum.seekExact(BytesRef) implementation which is very inefficient in > this case. 
> {code:java} > public boolean seekExact(BytesRef text) throws IOException { > return seekCeil(text) == SeekStatus.FOUND; > } > {code} > The fix is simple, just override seekExact(BytesRef) method in > FilterLeafReader.FilterTerms > {code:java} > @Override > public boolean seekExact(BytesRef text) throws IOException { > return in.seekExact(text); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[DISCUSS] Opening old indices for reading
Hey folks, tl;dr; I want to be able to open an IndexReader on an old index if the SegmentInfo version is supported and all segment codecs are available. Today that's not possible even if I port old formats to current versions. Our BWC policy for quite a while has been N-1 major versions. That's good and I think we should keep it that way. Only recently, caused by changes to how we encode/decode norms, we also hard-enforce the index-version-created in several places as well as the version a segment was written with. These are great enforcements and I understand why. My request here is whether we can find consensus on somehow allowing (via a special DirectoryReader for instance) such an index to be opened for reading only, without the guarantee that our high-level APIs decode norms correctly for instance. This would be enough to, for instance, consume stored fields etc. for reindexing, or, if users are aware, to do the norms decoding in the codec. I am happy to work on a proposal for how this would work. It would still enforce no writing or anything like this. I am also all for putting such a reader into misc and marking it experimental. simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.
[ https://issues.apache.org/jira/browse/LUCENE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-8639. - Resolution: Fixed Fix Version/s: master (9.0) 7.7 8.0 > SeqNo accounting in IW is broken if many threads start indexing while we > flush. > --- > > Key: LUCENE-8639 > URL: https://issues.apache.org/jira/browse/LUCENE-8639 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Simon Willnauer >Priority: Major > Fix For: 8.0, 7.7, master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > While this is rare in the wild we have a test failure that shows that our > seqNo accounting is broken when we carry over seqNo to a new delete queue. > We had this test-failure: > {noformat} > 6:06:08[junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs > 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-8,5,TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6 > 16:06:08[junit4] 2> at > __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) > 16:06:08[junit4] 2> at > 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) > 16:06:08[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) > 16:06:08[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 16:06:08[junit4] 2> > 16:06:08[junit4] 2> ??? 14, 2019 9:05:46 ? > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 16:06:08[junit4] 2> WARNING: Uncaught exception in thread: > Thread[Thread-9,5,TGRP-TestIndexTooManyDocs] > 16:06:08[junit4] 2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6 > 16:06:08[junit4] 2> at > __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264) > 16:06:08[junit4] 2> at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594) > 16:06:08[junit4] 2> at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586) > 16:06:08[junit4] 2> at > org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70) > 16:06:08[junit4] 2> at java.lang.Thread.run(Thread.java:748) > 16:06:08[junit4] 2> > 16:06:08[junit4] 2> ??? 
14, 2019 11:05:45 ? > com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate > 16:06:08
[jira] [Commented] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.
[ https://issues.apache.org/jira/browse/LUCENE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743155#comment-16743155 ]

Simon Willnauer commented on LUCENE-8639:
-----------------------------------------

[~mikemccand] can you take a look at the PR?

> SeqNo accounting in IW is broken if many threads start indexing while we flush.
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-8639
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8639
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Simon Willnauer
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While this is rare in the wild, we have a test failure showing that our seqNo accounting is broken when we carry a seqNo over to a new delete queue. We had this test failure:
> {noformat}
> 16:06:08 [junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs
> 16:06:08 [junit4] 2> ??? 14, 2019 9:05:46 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
> 16:06:08 [junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-8,5,TGRP-TestIndexTooManyDocs]
> 16:06:08 [junit4] 2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6
> 16:06:08 [junit4] 2>     at __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
> 16:06:08 [junit4] 2>     at java.lang.Thread.run(Thread.java:748)
> 16:06:08 [junit4] 2>
> 16:06:08 [junit4] 2> ??? 14, 2019 9:05:46 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
> 16:06:08 [junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-9,5,TGRP-TestIndexTooManyDocs]
> 16:06:08 [junit4] 2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6
> 16:06:08 [junit4] 2>     at __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
> 16:06:08 [junit4] 2>     at org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
> 16:06:08 [junit4] 2>     at java.lang.Thread.run(Thread.java:748)
> 16:06:08 [junit4] 2>
> 16:06:08 [junit4] 2> ??? 14, 2019 11:05:45 ? com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
> 16:06:08 [junit4] 2> WARNING: Suite execution timed out:
[jira] [Created] (LUCENE-8639) SeqNo accounting in IW is broken if many threads start indexing while we flush.
Simon Willnauer created LUCENE-8639:
---------------------------------------

             Summary: SeqNo accounting in IW is broken if many threads start indexing while we flush.
                 Key: LUCENE-8639
                 URL: https://issues.apache.org/jira/browse/LUCENE-8639
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Simon Willnauer

While this is rare in the wild, we have a test failure showing that our seqNo accounting is broken when we carry a seqNo over to a new delete queue. We had this test failure:

{noformat}
16:06:08 [junit4] Suite: org.apache.lucene.index.TestIndexTooManyDocs
16:06:08 [junit4] 2> ??? 14, 2019 9:05:46 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
16:06:08 [junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-8,5,TGRP-TestIndexTooManyDocs]
16:06:08 [junit4] 2> java.lang.AssertionError: seqNo=7 vs maxSeqNo=6
16:06:08 [junit4] 2>     at __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
16:06:08 [junit4] 2>     at org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
16:06:08 [junit4] 2>     at java.lang.Thread.run(Thread.java:748)
16:06:08 [junit4] 2>
16:06:08 [junit4] 2> ??? 14, 2019 9:05:46 ? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
16:06:08 [junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-9,5,TGRP-TestIndexTooManyDocs]
16:06:08 [junit4] 2> java.lang.AssertionError: seqNo=6 vs maxSeqNo=6
16:06:08 [junit4] 2>     at __randomizedtesting.SeedInfo.seed([43B7C75B765AFEBD]:0)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.getNextSequenceNumber(DocumentsWriterDeleteQueue.java:482)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:168)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterDeleteQueue.add(DocumentsWriterDeleteQueue.java:146)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.finishDocument(DocumentsWriterPerThread.java:362)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:264)
16:06:08 [junit4] 2>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
16:06:08 [junit4] 2>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1586)
16:06:08 [junit4] 2>     at org.apache.lucene.index.TestIndexTooManyDocs.lambda$testIndexTooManyDocs$0(TestIndexTooManyDocs.java:70)
16:06:08 [junit4] 2>     at java.lang.Thread.run(Thread.java:748)
16:06:08 [junit4] 2>
16:06:08 [junit4] 2> ??? 14, 2019 11:05:45 ? com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
16:06:08 [junit4] 2> WARNING: Suite execution timed out: org.apache.lucene.index.TestIndexTooManyDocs
16:06:08 [junit4] 2>    1) Thread[id=20, name=SUITE-TestIndexTooManyDocs-seed#[43B7C75B765AFEBD], state=RUNNABLE, group=TGRP-TestIndexTooManyDocs]
16:06:08 [junit4] 2>     at java.lang.Thread.getStackTrace(Thread.java:1559)
16:06:08 [junit4] 2>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:696)
16:06:08 [junit4] 2>     at com.carrotsearch.randomizedtesting.ThreadLeakControl$4.run(ThreadLeakControl.java:693)
16:06:08 [junit4] 2>     at java.security.AccessController.doPrivileged(Native Method)
16:06:08 [junit4] 2>     at com.carrotsearch.
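The failing assertion above (seqNo=7 vs maxSeqNo=6) checks the invariant that a delete queue never hands out a sequence number greater than the maximum it advertised when it was sealed during flush. A minimal, hypothetical Java sketch of that invariant follows; this is not Lucene's actual DocumentsWriterDeleteQueue, and the class and method names are illustrative only:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the seqNo carry-over invariant. The bug class the
// test exposes: a queue is sealed with a maxSeqNo that does not leave room
// for indexing threads already past the gate, so a late thread draws a
// seqNo above the advertised maximum.
class SeqNoQueue {
    private final AtomicLong nextSeqNo;
    private volatile long maxSeqNo = Long.MAX_VALUE; // unsealed: no upper bound yet

    SeqNoQueue(long startSeqNo) {
        this.nextSeqNo = new AtomicLong(startSeqNo);
    }

    // Called by indexing threads for every document/delete.
    long getNextSequenceNumber() {
        long seqNo = nextSeqNo.getAndIncrement();
        // This is the assertion that fired in the test failure.
        assert seqNo <= maxSeqNo : "seqNo=" + seqNo + " vs maxSeqNo=" + maxSeqNo;
        return seqNo;
    }

    // Called once during flush: seal this queue and report the seqNo the
    // replacement queue must start above. Reserving one seqNo per possibly
    // in-flight thread keeps late arrivals at or below maxSeqNo.
    long sealForCarryOver(int maxInFlightThreads) {
        maxSeqNo = nextSeqNo.get() + maxInFlightThreads;
        return maxSeqNo;
    }
}
```

Under this sketch, sealing with `maxInFlightThreads` slack means every thread that was already inside `getNextSequenceNumber()` still receives a number at or below the carried-over maximum.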
[jira] [Commented] (LUCENE-8525) throw more specific exception on data corruption
[ https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740186#comment-16740186 ]

Simon Willnauer commented on LUCENE-8525:
-----------------------------------------

I agree with [~rcmuir] here. There is not much we can do to detect this particular problem in DataInput and friends. One improvement would certainly be the wording of the javadoc: we can clarify that detecting corruption and throwing _CorruptIndexException_ is best effort. Another idea is to checksum the entire file before we read the commit; we could do this either on the Elasticsearch end or by improving _SegmentInfos#readCommit_. Reading the file twice isn't a big deal, I guess.

> throw more specific exception on data corruption
> ------------------------------------------------
>
>                 Key: LUCENE-8525
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8525
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Vladimir Dolzhenko
>            Priority: Major
>
> DataInput throws a generic IOException if the data looks odd:
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> There are other examples, like
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
> and maybe
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81].
> This leads to some difficulties - see [elasticsearch #34322|https://github.com/elastic/elasticsearch/issues/34322]. It would be better if it threw a more specific exception.
> As a consequence,
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
> violates its own contract:
> {code:java}
> /**
>  * @throws CorruptIndexException if the index is corrupt
>  * @throws IOException if there is a low-level IO error
>  */
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
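The "checksum the entire file before we read the commit" idea from the comment above can be sketched in plain Java with java.util.zip.CRC32: verify the stored footer checksum first, and only parse bytes that passed verification, so corruption surfaces as a dedicated corruption exception instead of a generic IOException from a confused parser. This is a hypothetical sketch, not Lucene's actual codec footer format or SegmentInfos API; the names below (ChecksumGate, CorruptFileException, verifyThenRead) are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Sketch of "verify before parse": assume the last 8 bytes of the file
// store the CRC32 (big-endian) of everything before them.
class ChecksumGate {
    static class CorruptFileException extends IOException {
        CorruptFileException(String msg) { super(msg); }
    }

    // Checksum the whole file up front; throw a corruption-specific
    // exception on mismatch, and only then return the body for parsing.
    static byte[] verifyThenRead(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (all.length < 8) throw new CorruptFileException("file too short: " + file);
        int bodyLen = all.length - 8;
        long stored = 0;
        for (int i = 0; i < 8; i++) {
            stored = (stored << 8) | (all[bodyLen + i] & 0xFFL);
        }
        CRC32 crc = new CRC32();
        crc.update(all, 0, bodyLen);
        if (crc.getValue() != stored) {
            throw new CorruptFileException("checksum mismatch in " + file);
        }
        byte[] body = new byte[bodyLen];
        System.arraycopy(all, 0, body, 0, bodyLen);
        return body; // safe to decode: bytes match the stored checksum
    }

    // Writer counterpart for the sketch: append the CRC32 footer.
    static void writeWithChecksum(Path file, byte[] body) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        long v = crc.getValue();
        byte[] out = new byte[body.length + 8];
        System.arraycopy(body, 0, out, 0, body.length);
        for (int i = 0; i < 8; i++) {
            out[body.length + i] = (byte) (v >>> (8 * (7 - i)));
        }
        Files.write(file, out);
    }
}
```

The cost of this approach is reading the file twice (once to checksum, once to decode), which, as the comment notes, is acceptable for a small commit file.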
[jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722290#comment-16722290 ]

Simon Willnauer commented on LUCENE-8609:
-----------------------------------------

[~sokolov] I opened [https://github.com/mikemccand/luceneutil/pull/28/] /cc [~mikemccand]

> Allow getting consistent docstats from IndexWriter
> --------------------------------------------------
>
>                 Key: LUCENE-8609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8609
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (8.0), 7.7
>            Reporter: Simon Willnauer
>            Priority: Major
>             Fix For: master (8.0), 7.7
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Today we have #numDocs() and #maxDoc() on IndexWriter. This is enough to get all stats for the current index, but it's subject to concurrency and might return numbers that are not consistent, i.e. in some cases it can return maxDoc < numDocs, which is undesirable. This change adds a getDocStats() method to IndexWriter to allow fetching consistent numbers for these stats.
Re: [jira] [Commented] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
What benchmarks are you talking about? Can you link them?

> On 14. Dec 2018, at 23:47, Mike Sokolov (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721848#comment-16721848 ]
>
> Mike Sokolov commented on LUCENE-8609:
> --------------------------------------
>
> I think this will break the nightly benchmarks? Anyway, I'm currently getting compile errors there.
>
>> Allow getting consistent docstats from IndexWriter
>> --------------------------------------------------
[jira] [Resolved] (LUCENE-8609) Allow getting consistent docstats from IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-8609.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 7.7
                   master (8.0)

thanks everybody

> Allow getting consistent docstats from IndexWriter
> --------------------------------------------------
>
>                 Key: LUCENE-8609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8609
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (8.0), 7.7
>            Reporter: Simon Willnauer
>            Priority: Major
>             Fix For: master (8.0), 7.7
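The race the issue describes can be made concrete with a small sketch: calling numDocs() and maxDoc() as two separate operations lets concurrent adds, deletes, or flushes interleave between the two reads, which is how callers can observe impossible pairs such as maxDoc < numDocs. Taking both numbers under one lock as an immutable snapshot makes them mutually consistent. The class below is a hypothetical illustration of that idea, not IndexWriter's real implementation:

```java
// Hypothetical sketch of the consistency problem getDocStats() addresses.
class DocStatsDemo {
    // Immutable snapshot: both values come from the same locked state.
    static final class DocStats {
        final int maxDoc;   // all docs, including deleted ones
        final int numDocs;  // live docs only
        DocStats(int maxDoc, int numDocs) { this.maxDoc = maxDoc; this.numDocs = numDocs; }
    }

    private int maxDoc;
    private int numDocs;

    synchronized void addDoc()    { maxDoc++; numDocs++; }
    synchronized void deleteDoc() { numDocs--; }

    // Racy pattern: two lock acquisitions; another thread can add or delete
    // docs between the two calls, so the pair of results may be inconsistent.
    int maxDocRacy()  { synchronized (this) { return maxDoc; } }
    int numDocsRacy() { synchronized (this) { return numDocs; } }

    // Consistent pattern: one lock acquisition, one snapshot of both values,
    // so numDocs <= maxDoc always holds for the returned pair.
    synchronized DocStats getDocStats() { return new DocStats(maxDoc, numDocs); }
}
```

In the racy pattern, a thread that reads maxDocRacy() just before a merge drops deleted docs and numDocsRacy() just after can see maxDoc < numDocs; the snapshot rules that out by construction.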