Re: hybrid document routing

2020-08-10 Thread Gus Heck
Sounds like complex ACLs based on group memberships that use graph queries?
That would require local ACLs...
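
For context, a rough SolrJ sketch of the kind of graph-based ACL filtering being
alluded to here. The core URL, field names (user_id, member_of, group_id,
acl_groups) and values are illustrative assumptions, not from this thread; the
relevant point is that the {!graph} traversal is not distributed, so the ACL
records have to be co-located in the same core as the main documents.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GraphAclSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
      SolrQuery q = new SolrQuery("content:hubble");
      // Transitively expand group membership with the graph parser (the from/to
      // direction depends on how the membership edges are modeled), then join the
      // expanded group docs against the main docs' acl_groups field. Both steps
      // only work against ACL docs living in the same core as the main docs.
      q.addFilterQuery("{!join from=group_id to=acl_groups v=$groups}");
      q.set("groups", "{!graph from=member_of to=group_id}user_id:roman");
      QueryResponse rsp = client.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}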

On Mon, Aug 10, 2020 at 5:56 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> This seems like an XY problem. Would it be possible to describe the
> original problem that led you to this solution (in the prototype)? Also, do
> you think folks on the solr-users@ list would have more ideas related to this
> use case, and that cross-posting there would help?
>
> On Tue, 11 Aug, 2020, 1:43 am David Smiley,  wrote:
>
>> Are you sure you need the docs in the same shard when maybe you could
>> assume a core exists on each node and then do a query-time join?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein 
>> wrote:
>>
>>> I have a situation where I'd like to have the standard compositeId
>>> router in place for a collection. But, I'd like certain documents (ACL
>>> documents) to be duplicated on each shard in the collection. To achieve the
>>> level of access control performance and scalability I'm looking for I need
>>> the ACL records to be in the same core as the main documents.
>>>
>>> I put together a prototype where the compositeId router accepted
>>> implicit routing parameters and it worked in my testing. Before I open a
>>> ticket suggesting this approach I wonder what other people thought the best
>>> approach would be to accomplish this goal.
>>>
>>>
>>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: SOLR-13412 (Make the Lucene Luke module available from a Solr distribution)

2020-08-10 Thread Erick Erickson
See comments on the JIRA. Short form: let’s not do this.

> On Aug 10, 2020, at 7:47 PM, Tomoko Uchida  
> wrote:
> 
> Thanks, David, for the information. I agree that Luke, a GUI app which needs
> a window system, is not inherently suited to a server application.
> 
> > if Docker could run GUI apps 
> This reminds me that an Elasticsearch user once told us he/she had worked on
> a Dockerized Luke. I declined to merge it at the time (the integration was
> still ongoing), but we could revisit it.
> https://github.com/DmitryKey/luke/issues/162
> 
> There may be a few options for achieving the goal... the most natural
> direction is, I think, to improve LukeRequestHandler ;)
> Or a CUI (console) application might be more suitable for some situations
> (https://github.com/javasoze/clue)?
> Until we find a more sensible way, I am totally fine with the current
> approach: just download the Lucene package and use it.
> 
> Tomoko
> 
> 
> 2020年8月11日(火) 5:37 David Smiley :
> There's a decent tutorial here: https://sematext.com/blog/solr-plugins-system/
> But it's unclear if a standalone tool like Luke is really sensible as a Solr 
> "plug-in" because it does not "plug-in" to Solr; it does not live within Solr 
> in any way.
> 
> It'd be interesting if Docker could run GUI apps or if somehow Luke could run 
> as an Applet or something.  Or maybe "java web start" but I thought that 
> technology might be dead.
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
> 
> 
> On Sun, Aug 9, 2020 at 8:57 PM Tomoko Uchida  
> wrote:
> I don't know anything about Solr packages; is there any guide for plugin
> developers / maintainers? Also, maybe an official host server for the plugin
> is needed?
> In general Luke is just an ordinary JAR; all you need is to download the
> correct version of it and set the right classpath.
> If there is proper documentation and others think it's somewhat beneficial
> for Solr to have a "Luke plugin", I'd be happy to add it to my todo list (or
> it'd be a perfect fit for "newdevs", I think).
> 
> Tomoko
> 
> 
> 2020年8月10日(月) 4:39 Erick Erickson :
> Tomoko:
> 
> Indeed, this is what is behind my question about whether it should be a 
> package for Solr rather than something in the standard distro. The more I 
> think about this, the harder it is to justify it being part of the standard 
> distro rather than a package: some people find it _very_ useful, but I’d 
> bet that most Solr users don’t even know it exists...
> 
> Which means I’ll have to actually _understand_ the package infrastructure… 
> Something about an old dog and new tricks. Siiigggh…
> 
> Best,
> Erick
> 
> > On Aug 9, 2020, at 2:32 PM, Tomoko Uchida  
> > wrote:
> > 
> > LUCENE-9448
> 
> 





Re: SOLR-13412 (Make the Lucene Luke module available from a Solr distribution)

2020-08-10 Thread Tomoko Uchida
Thanks, David, for the information. I agree that Luke, a GUI app which
needs a window system, is not inherently suited to a server application.

> if Docker could run GUI apps
This reminds me that an Elasticsearch user once told us he/she had worked on
a Dockerized Luke. I declined to merge it at the time (the integration was
still ongoing), but we could revisit it.
https://github.com/DmitryKey/luke/issues/162

There may be a few options for achieving the goal... the most natural
direction is, I think, to improve LukeRequestHandler ;)
Or a CUI (console) application might be more suitable for some situations
(https://github.com/javasoze/clue)?
Until we find a more sensible way, I am totally fine with the current
approach: just download the Lucene package and use it.

Tomoko
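
For reference, a minimal SolrJ sketch of what the LukeRequestHandler route
already offers today; the core name ("techproducts") and URL are assumptions
for illustration only.

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class LukeHandlerSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      // Hits the core's /admin/luke endpoint (LukeRequestHandler) and prints
      // the raw index and per-field details it returns.
      LukeRequest luke = new LukeRequest();
      LukeResponse rsp = luke.process(client);
      System.out.println(rsp.getResponse());
    }
  }
}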


2020年8月11日(火) 5:37 David Smiley :

> There's a decent tutorial here:
> https://sematext.com/blog/solr-plugins-system/
> But it's unclear if a standalone tool like Luke is really sensible as a
> Solr "plug-in" because it does not "plug-in" to Solr; it does not live
> within Solr in any way.
>
> It'd be interesting if Docker could run GUI apps or if somehow Luke could
> run as an Applet or something.  Or maybe "java web start" but I thought
> that technology might be dead.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Aug 9, 2020 at 8:57 PM Tomoko Uchida 
> wrote:
>
>> I don't know anything about Solr packages; is there any guide for plugin
>> developers / maintainers? Also, maybe an official host server for the
>> plugin is needed?
>> In general Luke is just an ordinary JAR; all you need is to download the
>> correct version of it and set the right classpath.
>> If there is proper documentation and others think it's somewhat
>> beneficial for Solr to have a "Luke plugin", I'd be happy to add it to my
>> todo list (or it'd be a perfect fit for "newdevs", I think).
>>
>> Tomoko
>>
>>
>> 2020年8月10日(月) 4:39 Erick Erickson :
>>
>>> Tomoko:
>>>
>>> Indeed, this is what is behind my question about whether it should be a
>>> package for Solr rather than something in the standard distro. The more I
>>> think about this, the harder it is to justify it being part of the standard
>>> distro rather than a package: some people find it _very_ useful, but I’d
>>> bet that most Solr users don’t even know it exists...
>>>
>>> Which means I’ll have to actually _understand_ the package
>>> infrastructure… Something about an old dog and new tricks. Siiigggh…
>>>
>>> Best,
>>> Erick
>>>
>>> > On Aug 9, 2020, at 2:32 PM, Tomoko Uchida <
>>> tomoko.uchida.1...@gmail.com> wrote:
>>> >
>>> > LUCENE-9448
>>>
>>>


Performance in Solr 9 / Java 11

2020-08-10 Thread Marcus Eagan
In my IDE, I have a few profiling tools that I bounce between; I started
using them in my work at Lucidworks and continue to use them in my current
work today. I suspect there may be some performance improvements in Java 11
that we can exploit further. I'm curious whether there has been any
investigation, possibly by Mark Miller or @u...@thetaphi.de, into performance
improvements specific to the newer version of Java in master. There are some
obvious ones that we get for free, like a better GC, but I'm curious about
prior work in this area before publishing anything that might be redundant or
irrelevant.

Best,

-- 
Marcus Eagan


Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again

2020-08-10 Thread Roman Chyla
Oh, thanks! That saves everybody some time. I have commented there,
pleading to be allowed to do something. If that proposal sounds even a
little bit reasonable, please consider amplifying the signal.

On Mon, Aug 10, 2020 at 4:22 PM David Smiley  wrote:
>
> There already is one: https://issues.apache.org/jira/browse/LUCENE-8776
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 10, 2020 at 1:30 PM Roman Chyla  wrote:
>>
>> I'll have to find a solution for this situation somehow; giving up
>> offsets seems like too big a price to pay. I see that overriding
>> DefaultIndexingChain is not exactly easy -- the only thing I can think
>> of is to trick the classloader into giving it a different version
>> of the chain (praying this can be done without compromising security;
>> I have not followed JDK evolution for some time...), aside from
>> forking Lucene and editing that, which I decidedly don't want to do
>> (monkey-patching it, OK, I can live with that... :-))
>>
>> It *seems* to me that the original reason for the negative-offset checks
>> stemmed from the fact that a vint could have been written (and possibly
>> a vlong too) - https://issues.apache.org/jira/browse/LUCENE-3738
>>
>> The underlying issue and some of the patches seem to have been
>> addressing those problems, but a much shorter version of the patch was
>> committed -- despite the perf results not being conclusive (i.e. it
>> could have been just as good with the longer patch). To really
>> understand it, one would have to spend more than 10 minutes reading the
>> comments.
>>
>> Further to the point, I think negative offsets can be produced only on
>> the very first token, unless there is a bug in a filter (there was/is
>> a separate check for that in 6x and perhaps it is still there in 7x).
>> That would be much less restrictive than the current condition, which
>> disallows all backward offsets. We never ran into index corruption
>> in Lucene 4-6x, so I really wonder if the "forbid all backwards
>> offsets" approach might be too restrictive.
>>
>> Looks like I should create an issue...
>>
>> On Thu, Aug 6, 2020 at 11:28 AM Gus Heck  wrote:
>> >
>> > I've had a nearly identical experience to what Dave describes; I also 
>> > chafe under this restriction.
>> >
>> > On Thu, Aug 6, 2020 at 11:07 AM David Smiley  wrote:
>> >>
>> >> I sympathize with your pain, Roman.
>> >>
>> >> It appears we can't really do index-time multi-word synonyms because of 
>> >> the offset ordering rule.  But it's not just synonyms, it's other forms 
>> >> of multi-token expansion.  Where I work, I've seen an interesting 
>> >> approach to mixed language text analysis in which a sophisticated 
>> >> Tokenizer effectively re-tokenizes an input multiple ways by producing a 
>> >> token stream that is a concatenation of different interpretations of the 
>> >> input.  On a Lucene upgrade, we had to "coarsen" the offsets to the point 
>> >> of having highlights that point to a whole sentence instead of the words 
>> >> in that sentence :-(.  I need to do something to fix this; I'm trying 
>> >> hard to resist modifying our Lucene fork for this constraint.  Maybe 
>> >> instead of concatenating, it might be interleaved / overlapped but the 
>> >> interpretations aren't necessarily aligned to make this possible without 
>> >> risking breaking position-sensitive queries.
>> >>
>> >> So... I'm not a fan of this constraint on offsets.
>> >>
>> >> ~ David Smiley
>> >> Apache Lucene/Solr Search Developer
>> >> http://www.linkedin.com/in/davidwsmiley
>> >>
>> >>
>> >> On Thu, Aug 6, 2020 at 10:49 AM Roman Chyla  wrote:
>> >>>
>> >>> Hi Mike,
>> >>>
>> >>> Yes, they are not zero offsets - I was instinctively avoiding
>> >>> "negative offsets"; but they are indeed backward offsets.
>> >>>
>> >>> Here is the token stream as produced by the analyzer chain indexing
>> >>> "THE HUBBLE constant: a summary of the hubble space telescope program"
>> >>>
>> >>> term=hubble pos=2 type=word offsetStart=4 offsetEnd=10
>> >>> term=acr::hubble pos=0 type=ACRONYM offsetStart=4 offsetEnd=10
>> >>> term=constant pos=1 type=word offsetStart=11 offsetEnd=20
>> >>> term=summary pos=1 type=word offsetStart=23 offsetEnd=30
>> >>> term=hubble pos=1 type=word offsetStart=38 offsetEnd=44
>> >>> term=syn::hubble space telescope pos=0 type=SYNONYM offsetStart=38 
>> >>> offsetEnd=60
>> >>> term=syn::hst pos=0 type=SYNONYM offsetStart=38 offsetEnd=60
>> >>> term=acr::hst pos=0 type=ACRONYM offsetStart=38 offsetEnd=60
>> >>> term=space pos=1 type=word offsetStart=45 offsetEnd=50
>> >>> term=telescope pos=1 type=word offsetStart=51 offsetEnd=60
>> >>> term=program pos=1 type=word offsetStart=61 offsetEnd=68
>> >>>
>> >>> Sometimes, we'll even have a situation when synonyms overlap: for
>> >>> example "anti de sitter space time"
>> >>>
>> >>> "anti de sitter space time" -> "antidesitter space" (one token
>> >>> spanning offsets 0-26; it gets 

Re: hybrid document routing

2020-08-10 Thread Ishan Chattopadhyaya
This seems like an XY problem. Would it be possible to describe the
original problem that led you to this solution (in the prototype)? Also, do
you think folks on the solr-users@ list would have more ideas related to this
use case, and that cross-posting there would help?

On Tue, 11 Aug, 2020, 1:43 am David Smiley,  wrote:

> Are you sure you need the docs in the same shard when maybe you could
> assume a core exists on each node and then do a query-time join?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein  wrote:
>
>> I have a situation where I'd like to have the standard compositeId router
>> in place for a collection. But, I'd like certain documents (ACL documents)
>> to be duplicated on each shard in the collection. To achieve the level of
>> access control performance and scalability I'm looking for I need the ACL
>> records to be in the same core as the main documents.
>>
>> I put together a prototype where the compositeId router accepted implicit
>> routing parameters and it worked in my testing. Before I open a ticket
>> suggesting this approach I wonder what other people thought the best
>> approach would be to accomplish this goal.
>>
>>
>>


Re: SOLR-13412 (Make the Lucene Luke module available from a Solr distribution)

2020-08-10 Thread David Smiley
There's a decent tutorial here:
https://sematext.com/blog/solr-plugins-system/
But it's unclear if a standalone tool like Luke is really sensible as a
Solr "plug-in" because it does not "plug-in" to Solr; it does not live
within Solr in any way.

It'd be interesting if Docker could run GUI apps or if somehow Luke could
run as an Applet or something.  Or maybe "java web start" but I thought
that technology might be dead.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Aug 9, 2020 at 8:57 PM Tomoko Uchida 
wrote:

> I don't know anything about Solr packages; is there any guide for plugin
> developers / maintainers? Also, maybe an official host server for the
> plugin is needed?
> In general Luke is just an ordinary JAR; all you need is to download the
> correct version of it and set the right classpath.
> If there is proper documentation and others think it's somewhat beneficial
> for Solr to have a "Luke plugin", I'd be happy to add it to my todo list (or
> it'd be a perfect fit for "newdevs", I think).
>
> Tomoko
>
>
> 2020年8月10日(月) 4:39 Erick Erickson :
>
>> Tomoko:
>>
>> Indeed, this is what is behind my question about whether it should be a
>> package for Solr rather than something in the standard distro. The more I
>> think about this, the harder it is to justify it being part of the standard
>> distro rather than a package: some people find it _very_ useful, but I’d
>> bet that most Solr users don’t even know it exists...
>>
>> Which means I’ll have to actually _understand_ the package
>> infrastructure… Something about an old dog and new tricks. Siiigggh…
>>
>> Best,
>> Erick
>>
>> > On Aug 9, 2020, at 2:32 PM, Tomoko Uchida 
>> wrote:
>> >
>> > LUCENE-9448
>>
>>


Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again

2020-08-10 Thread David Smiley
There already is one: https://issues.apache.org/jira/browse/LUCENE-8776

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Aug 10, 2020 at 1:30 PM Roman Chyla  wrote:

> I'll have to find a solution for this situation somehow; giving up
> offsets seems like too big a price to pay. I see that overriding
> DefaultIndexingChain is not exactly easy -- the only thing I can think
> of is to trick the classloader into giving it a different version
> of the chain (praying this can be done without compromising security;
> I have not followed JDK evolution for some time...), aside from
> forking Lucene and editing that, which I decidedly don't want to do
> (monkey-patching it, OK, I can live with that... :-))
>
> It *seems* to me that the original reason for the negative-offset checks
> stemmed from the fact that a vint could have been written (and possibly
> a vlong too) - https://issues.apache.org/jira/browse/LUCENE-3738
>
> The underlying issue and some of the patches seem to have been
> addressing those problems, but a much shorter version of the patch was
> committed -- despite the perf results not being conclusive (i.e. it
> could have been just as good with the longer patch). To really
> understand it, one would have to spend more than 10 minutes reading the
> comments.
>
> Further to the point, I think negative offsets can be produced only on
> the very first token, unless there is a bug in a filter (there was/is
> a separate check for that in 6x and perhaps it is still there in 7x).
> That would be much less restrictive than the current condition, which
> disallows all backward offsets. We never ran into index corruption
> in Lucene 4-6x, so I really wonder if the "forbid all backwards
> offsets" approach might be too restrictive.
>
> Looks like I should create an issue...
>
> On Thu, Aug 6, 2020 at 11:28 AM Gus Heck  wrote:
> >
> > I've had a nearly identical experience to what Dave describes; I also
> > chafe under this restriction.
> >
> > On Thu, Aug 6, 2020 at 11:07 AM David Smiley  wrote:
> >>
> >> I sympathize with your pain, Roman.
> >>
> >> It appears we can't really do index-time multi-word synonyms because of
> the offset ordering rule.  But it's not just synonyms, it's other forms of
> multi-token expansion.  Where I work, I've seen an interesting approach to
> mixed language text analysis in which a sophisticated Tokenizer effectively
> re-tokenizes an input multiple ways by producing a token stream that is a
> concatenation of different interpretations of the input.  On a Lucene
> upgrade, we had to "coarsen" the offsets to the point of having highlights
> that point to a whole sentence instead of the words in that sentence :-(.
> I need to do something to fix this; I'm trying hard to resist modifying our
> Lucene fork for this constraint.  Maybe instead of concatenating, it might
> be interleaved / overlapped but the interpretations aren't necessarily
> aligned to make this possible without risking breaking position-sensitive
> queries.
> >>
> >> So... I'm not a fan of this constraint on offsets.
> >>
> >> ~ David Smiley
> >> Apache Lucene/Solr Search Developer
> >> http://www.linkedin.com/in/davidwsmiley
> >>
> >>
> >> On Thu, Aug 6, 2020 at 10:49 AM Roman Chyla 
> wrote:
> >>>
> >>> Hi Mike,
> >>>
> >>> Yes, they are not zero offsets - I was instinctively avoiding
> >>> "negative offsets"; but they are indeed backward offsets.
> >>>
> >>> Here is the token stream as produced by the analyzer chain indexing
> >>> "THE HUBBLE constant: a summary of the hubble space telescope program"
> >>>
> >>> term=hubble pos=2 type=word offsetStart=4 offsetEnd=10
> >>> term=acr::hubble pos=0 type=ACRONYM offsetStart=4 offsetEnd=10
> >>> term=constant pos=1 type=word offsetStart=11 offsetEnd=20
> >>> term=summary pos=1 type=word offsetStart=23 offsetEnd=30
> >>> term=hubble pos=1 type=word offsetStart=38 offsetEnd=44
> >>> term=syn::hubble space telescope pos=0 type=SYNONYM offsetStart=38
> offsetEnd=60
> >>> term=syn::hst pos=0 type=SYNONYM offsetStart=38 offsetEnd=60
> >>> term=acr::hst pos=0 type=ACRONYM offsetStart=38 offsetEnd=60
> >>> term=space pos=1 type=word offsetStart=45 offsetEnd=50
> >>> term=telescope pos=1 type=word offsetStart=51 offsetEnd=60
> >>> term=program pos=1 type=word offsetStart=61 offsetEnd=68
> >>>
> >>> Sometimes, we'll even have a situation when synonyms overlap: for
> >>> example "anti de sitter space time"
> >>>
> >>> "anti de sitter space time" -> "antidesitter space" (one token
> >>> spanning offsets 0-26; it gets emitted with the first token "anti"
> >>> right now)
> >>> "space time" -> "spacetime" (synonym 16-26)
> >>> "space" -> "universe" (25-26)
> >>>
> >>> Yes, weird, but useful if people want to search for `universe NEAR
> >>> anti` -- but another use case which would be prohibited by the "new"
> >>> rule.
> >>>
> >>> DefaultIndexingChain checks new token offset against the last emitted
> >>> token, so I don't see a way 

Re: hybrid document routing

2020-08-10 Thread David Smiley
Are you sure you need the docs in the same shard when maybe you could
assume a core exists on each node and then do a query-time join?
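
For illustration, a minimal SolrJ sketch of the query-time join being suggested,
assuming an ACL core named "acl" lives on the same node as the main core; the
core, field, and value names are assumptions, not something from this thread.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class AclJoinSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/main").build()) {
      SolrQuery q = new SolrQuery("content:report");
      // Cross-core join on the local node: keep only main docs whose acl_id
      // matches an ACL record (in the co-located "acl" core) that grants
      // access to this user.
      q.addFilterQuery("{!join fromIndex=acl from=acl_id to=acl_id}user_id:jsmith");
      System.out.println(client.query(q).getResults().getNumFound());
    }
  }
}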

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein  wrote:

> I have a situation where I'd like to have the standard compositeId router
> in place for a collection. But, I'd like certain documents (ACL documents)
> to be duplicated on each shard in the collection. To achieve the level of
> access control performance and scalability I'm looking for I need the ACL
> records to be in the same core as the main documents.
>
> I put together a prototype where the compositeId router accepted implicit
> routing parameters and it worked in my testing. Before I open a ticket
> suggesting this approach I wonder what other people thought the best
> approach would be to accomplish this goal.
>
>
>


[VOTE] Release Lucene/Solr 8.6.1 RC2

2020-08-10 Thread Houston Putman
Please vote for release candidate 2 for Lucene/Solr 8.6.1

The artifacts can be downloaded from:
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99

You can run the smoke tester directly with this command:

python3 -u dev-tools/scripts/smokeTestRelease.py \
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99

The vote will be open for at least 72 hours i.e. until 2020-08-13 20:00 UTC.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)

Here is my +1


hybrid document routing

2020-08-10 Thread Joel Bernstein
I have a situation where I'd like to have the standard compositeId router
in place for a collection. But, I'd like certain documents (ACL documents)
to be duplicated on each shard in the collection. To achieve the level of
access control performance and scalability I'm looking for I need the ACL
records to be in the same core as the main documents.

I put together a prototype where the compositeId router accepted implicit
routing parameters and it worked in my testing. Before I open a ticket
suggesting this approach I wonder what other people thought the best
approach would be to accomplish this goal.
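
To make the idea concrete, here is a rough SolrJ sketch of how such a prototype
might be used to copy one ACL document onto every shard. The per-shard `_route_`
usage is hypothetical (today's compositeId router does not accept shard names),
and the collection, field, and shard names are assumptions.

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Slice;

public class DuplicateAclDocSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
            Collections.singletonList("localhost:9983"), Optional.empty()).build()) {
      client.connect();

      SolrInputDocument acl = new SolrInputDocument();
      acl.addField("id", "acl-42");
      acl.addField("type_s", "acl");
      acl.addField("groups_ss", "engineering");

      DocCollection coll =
          client.getZkStateReader().getClusterState().getCollection("main");
      for (Slice shard : coll.getActiveSlices()) {
        UpdateRequest req = new UpdateRequest();
        req.add(acl);
        // Hypothetical: the prototype lets the compositeId router honor an
        // implicit-style _route_ naming a concrete shard, so the same ACL doc
        // lands on every shard.
        req.setParam("_route_", shard.getName());
        req.process(client, "main");
      }
      client.commit("main");
    }
  }
}

At query time the co-located ACL records could then be joined or graph-traversed
locally within each shard, which is the performance property being described here.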


Re: Badapple report

2020-08-10 Thread Erick Erickson
OK, thanks. I’m not really annotating things at this point, although 
occasionally removing some that haven’t failed in a long time.

> On Aug 10, 2020, at 1:44 PM, Tomás Fernández Löbbe  
> wrote:
> 
> Hi Erick,
> I've introduced and later fixed a bug in TestConfig. It hasn't failed since, 
> so please don't annotate it.
> 
> On Mon, Aug 10, 2020 at 7:47 AM Erick Erickson  
> wrote:
> We’re backsliding some. I encourage people to look at: 
> http://fucit.org/solr-jenkins-reports/failure-report.html, we have a number 
> of ill-behaved tests, particularly TestRequestRateLimiter, 
> TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and 
> TestIndexingSequenceNumbers…
> 
> 
> Raw fail count by week totals, most recent week first (corresponds to bits):
> Week: 0  had  100 failures
> Week: 1  had  82 failures
> Week: 2  had  94 failures
> Week: 3  had  502 failures
> 
> 
> Failures in Hoss' reports for the last 4 rollups.
> 
> There were 585 unannotated tests that failed in Hoss' rollups. Ordered by the 
> date I downloaded the rollup file, newest->oldest. See above for the dates 
> the files were collected 
> These tests were NOT BadApple'd or AwaitsFix'd
> 
> Failures in the last 4 reports..
>Report   Pct runsfails   test
>  0123   4.4 1583 37  BasicDistributedZkTest.test
>  0123   4.3 1727 77  CloudExitableDirectoryReaderTest.test
>  0123   2.5 8598248  
> CloudExitableDirectoryReaderTest.testCreepThenBite
>  0123   1.9 1712 36  
> CloudExitableDirectoryReaderTest.testWhitebox
>  0123   0.5 1587 11  
> DocValuesNotIndexedTest.testGroupingDVOnlySortLast
>  0123   2.2 1679 82  HttpPartitionOnCommitTest.test
>  0123   0.5 1592 16  HttpPartitionTest.test
>  0123   1.0 1578  9  HttpPartitionWithTlogReplicasTest.test
>  0123   1.3 1569 13  LeaderFailoverAfterPartitionTest.test
>  0123   7.4 1643 59  MultiThreadedOCPTest.test
>  0123   0.3 1567  8  ReplaceNodeTest.test
>  0123   0.2 1588  6  ShardSplitTest.testSplitShardWithRule
>  0123 100.0   38 33  SharedFSAutoReplicaFailoverTest.test
>  0123   2.1  818 19  
> TestCircuitBreaker.testBuildingMemoryPressure
>  0123   2.6  818 13  
> TestCircuitBreaker.testResponseWithCBTiming
>  0123   6.2 1848104  TestContainerPlugin.testApiFromPackage
>  0123   2.5 1662 33  TestDistributedGrouping.test
>  0123   0.4 1448  6  TestDynamicLoading.testDynamicLoading
>  0123   6.4 1614 74  TestExportWriter.testExpr
>  0123   8.6 1356 70  TestHdfsCloudBackupRestore.test
>  0123   9.1 1697136  TestLocalFSCloudBackupRestore.test
>  0123   0.5 1607 26  TestPackages.testPluginLoading
>  0123   0.7 1596 15  
> TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
>  0123   1.5 1610 59  TestReRankQParserPlugin.testMinExactCount
>  0123   0.3 1552  4  TestReplicaProperties.test
>  0123   0.3 1556  5  
> TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
>  0123   0.3 1565  9  TestSolrConfigHandlerCloud.test
> 
> 
> 





Re: Badapple report

2020-08-10 Thread Tomás Fernández Löbbe
Hi Erick,
I've introduced and later fixed a bug in TestConfig. It hasn't failed
since, so please don't annotate it.

On Mon, Aug 10, 2020 at 7:47 AM Erick Erickson 
wrote:

> We’re backsliding some. I encourage people to look at:
> http://fucit.org/solr-jenkins-reports/failure-report.html, we have a
> number of ill-behaved tests, particularly TestRequestRateLimiter,
> TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and
> TestIndexingSequenceNumbers…
>
>
> Raw fail count by week totals, most recent week first (corresponds to
> bits):
> Week: 0  had  100 failures
> Week: 1  had  82 failures
> Week: 2  had  94 failures
> Week: 3  had  502 failures
>
>
> Failures in Hoss' reports for the last 4 rollups.
>
> There were 585 unannotated tests that failed in Hoss' rollups. Ordered by
> the date I downloaded the rollup file, newest->oldest. See above for the
> dates the files were collected
> These tests were NOT BadApple'd or AwaitsFix'd
>
> Failures in the last 4 reports..
>Report   Pct runsfails   test
>  0123   4.4 1583 37  BasicDistributedZkTest.test
>  0123   4.3 1727 77  CloudExitableDirectoryReaderTest.test
>  0123   2.5 8598248
> CloudExitableDirectoryReaderTest.testCreepThenBite
>  0123   1.9 1712 36
> CloudExitableDirectoryReaderTest.testWhitebox
>  0123   0.5 1587 11
> DocValuesNotIndexedTest.testGroupingDVOnlySortLast
>  0123   2.2 1679 82  HttpPartitionOnCommitTest.test
>  0123   0.5 1592 16  HttpPartitionTest.test
>  0123   1.0 1578  9  HttpPartitionWithTlogReplicasTest.test
>  0123   1.3 1569 13  LeaderFailoverAfterPartitionTest.test
>  0123   7.4 1643 59  MultiThreadedOCPTest.test
>  0123   0.3 1567  8  ReplaceNodeTest.test
>  0123   0.2 1588  6  ShardSplitTest.testSplitShardWithRule
>  0123 100.0   38 33  SharedFSAutoReplicaFailoverTest.test
>  0123   2.1  818 19
> TestCircuitBreaker.testBuildingMemoryPressure
>  0123   2.6  818 13
> TestCircuitBreaker.testResponseWithCBTiming
>  0123   6.2 1848104  TestContainerPlugin.testApiFromPackage
>  0123   2.5 1662 33  TestDistributedGrouping.test
>  0123   0.4 1448  6  TestDynamicLoading.testDynamicLoading
>  0123   6.4 1614 74  TestExportWriter.testExpr
>  0123   8.6 1356 70  TestHdfsCloudBackupRestore.test
>  0123   9.1 1697136  TestLocalFSCloudBackupRestore.test
>  0123   0.5 1607 26  TestPackages.testPluginLoading
>  0123   0.7 1596 15
> TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
>  0123   1.5 1610 59
> TestReRankQParserPlugin.testMinExactCount
>  0123   0.3 1552  4  TestReplicaProperties.test
>  0123   0.3 1556  5
> TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
>  0123   0.3 1565  9  TestSolrConfigHandlerCloud.test
> 
>
>


Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again

2020-08-10 Thread Roman Chyla
I'll have to find a solution for this situation somehow; giving up
offsets seems like too big a price to pay. I see that overriding
DefaultIndexingChain is not exactly easy -- the only thing I can think
of is to trick the classloader into giving it a different version
of the chain (praying this can be done without compromising security;
I have not followed JDK evolution for some time...), aside from
forking Lucene and editing that, which I decidedly don't want to do
(monkey-patching it, OK, I can live with that... :-))

It *seems* to me that the original reason for the negative-offset checks
stemmed from the fact that a vint could have been written (and possibly
a vlong too) - https://issues.apache.org/jira/browse/LUCENE-3738

The underlying issue and some of the patches seem to have been
addressing those problems, but a much shorter version of the patch was
committed -- despite the perf results not being conclusive (i.e. it
could have been just as good with the longer patch). To really
understand it, one would have to spend more than 10 minutes reading the
comments.

Further to the point, I think negative offsets can be produced only on
the very first token, unless there is a bug in a filter (there was/is
a separate check for that in 6x and perhaps it is still there in 7x).
That would be much less restrictive than the current condition, which
disallows all backward offsets. We never ran into index corruption
in Lucene 4-6x, so I really wonder if the "forbid all backwards
offsets" approach might be too restrictive.

Looks like I should create an issue...
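
To make the failure mode concrete, here is a minimal, self-contained sketch
(assuming Lucene 8.x APIs) of a canned token stream like the one quoted below,
where a multi-token synonym is emitted after its constituent tokens; indexing it
trips the backward-offset check in DefaultIndexingChain. The terms and offsets
are illustrative, not the actual analyzer chain discussed in this thread.

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

public class BackwardOffsetDemo {

  /** "hubble space telescope" followed by a synonym token spanning all three
   *  words: the synonym's startOffset (38) is smaller than the last emitted
   *  startOffset (51), which the indexing chain rejects. */
  static final class CannedStream extends TokenStream {
    private final String[] terms = {"hubble", "space", "telescope", "syn::hst"};
    private final int[] incs     = {1,        1,       1,           0};
    private final int[] starts   = {38,       45,      51,          38};
    private final int[] ends     = {44,       50,      60,          60};
    private final CharTermAttribute term = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute inc =
        addAttribute(PositionIncrementAttribute.class);
    private final OffsetAttribute off = addAttribute(OffsetAttribute.class);
    private int i = 0;

    @Override
    public boolean incrementToken() {
      if (i >= terms.length) {
        return false;
      }
      clearAttributes();
      term.setEmpty().append(terms[i]);
      inc.setPositionIncrement(incs[i]);
      off.setOffset(starts[i], ends[i]);
      i++;
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    FieldType type = new FieldType();
    type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    type.setTokenized(true);
    type.freeze();

    try (ByteBuffersDirectory dir = new ByteBuffersDirectory();
         IndexWriter writer =
             new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new Field("body", new CannedStream(), type));
      try {
        writer.addDocument(doc);
      } catch (IllegalArgumentException e) {
        // Expected: the chain complains that offsets must not go backwards.
        System.out.println("Rejected: " + e.getMessage());
      }
    }
  }
}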

On Thu, Aug 6, 2020 at 11:28 AM Gus Heck  wrote:
>
> I've had a nearly identical experience to what Dave describes; I also chafe 
> under this restriction.
>
> On Thu, Aug 6, 2020 at 11:07 AM David Smiley  wrote:
>>
>> I sympathize with your pain, Roman.
>>
>> It appears we can't really do index-time multi-word synonyms because of the 
>> offset ordering rule.  But it's not just synonyms, it's other forms of 
>> multi-token expansion.  Where I work, I've seen an interesting approach to 
>> mixed language text analysis in which a sophisticated Tokenizer effectively 
>> re-tokenizes an input multiple ways by producing a token stream that is a 
>> concatenation of different interpretations of the input.  On a Lucene 
>> upgrade, we had to "coarsen" the offsets to the point of having highlights 
>> that point to a whole sentence instead of the words in that sentence :-(.  I 
>> need to do something to fix this; I'm trying hard to resist modifying our 
>> Lucene fork for this constraint.  Maybe instead of concatenating, it might 
>> be interleaved / overlapped but the interpretations aren't necessarily 
>> aligned to make this possible without risking breaking position-sensitive 
>> queries.
>>
>> So... I'm not a fan of this constraint on offsets.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Thu, Aug 6, 2020 at 10:49 AM Roman Chyla  wrote:
>>>
>>> Hi Mike,
>>>
>>> Yes, they are not zero offsets - I was instinctively avoiding
>>> "negative offsets"; but they are indeed backward offsets.
>>>
>>> Here is the token stream as produced by the analyzer chain indexing
>>> "THE HUBBLE constant: a summary of the hubble space telescope program"
>>>
>>> term=hubble pos=2 type=word offsetStart=4 offsetEnd=10
>>> term=acr::hubble pos=0 type=ACRONYM offsetStart=4 offsetEnd=10
>>> term=constant pos=1 type=word offsetStart=11 offsetEnd=20
>>> term=summary pos=1 type=word offsetStart=23 offsetEnd=30
>>> term=hubble pos=1 type=word offsetStart=38 offsetEnd=44
>>> term=syn::hubble space telescope pos=0 type=SYNONYM offsetStart=38 
>>> offsetEnd=60
>>> term=syn::hst pos=0 type=SYNONYM offsetStart=38 offsetEnd=60
>>> term=acr::hst pos=0 type=ACRONYM offsetStart=38 offsetEnd=60
>>> term=space pos=1 type=word offsetStart=45 offsetEnd=50
>>> term=telescope pos=1 type=word offsetStart=51 offsetEnd=60
>>> term=program pos=1 type=word offsetStart=61 offsetEnd=68
>>>
>>> Sometimes, we'll even have a situation when synonyms overlap: for
>>> example "anti de sitter space time"
>>>
>>> "anti de sitter space time" -> "antidesitter space" (one token
>>> spanning offsets 0-26; it gets emitted with the first token "anti"
>>> right now)
>>> "space time" -> "spacetime" (synonym 16-26)
>>> "space" -> "universe" (25-26)
>>>
>>> Yes, weird, but useful if people want to search for `universe NEAR
>>> anti` -- but another use case which would be prohibited by the "new"
>>> rule.
>>>
>>> DefaultIndexingChain checks new token offset against the last emitted
>>> token, so I don't see a way to emit the multi-token synonym with
>>> offsetts spanning multiple tokens if even one of these tokens was
>>> already emitted. And the complement is equally true: if multi-token is
>>> emitted as last of the group - it trips over `startOffset <
>>> invertState.lastStartOffset`
>>>
>>> 

Badapple report

2020-08-10 Thread Erick Erickson
We’re backsliding some. I encourage people to look at: 
http://fucit.org/solr-jenkins-reports/failure-report.html, we have a number of 
ill-behaved tests, particularly TestRequestRateLimiter, 
TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and 
TestIndexingSequenceNumbers…


Raw fail count by week totals, most recent week first (corresponds to bits):
Week: 0  had  100 failures
Week: 1  had  82 failures
Week: 2  had  94 failures
Week: 3  had  502 failures


Failures in Hoss' reports for the last 4 rollups.

There were 585 unannotated tests that failed in Hoss' rollups. Ordered by the 
date I downloaded the rollup file, newest->oldest. See above for the dates the 
files were collected 
These tests were NOT BadApple'd or AwaitsFix'd

Failures in the last 4 reports..
   Report   Pct runsfails   test
 0123   4.4 1583 37  BasicDistributedZkTest.test
 0123   4.3 1727 77  CloudExitableDirectoryReaderTest.test
 0123   2.5 8598248  
CloudExitableDirectoryReaderTest.testCreepThenBite
 0123   1.9 1712 36  
CloudExitableDirectoryReaderTest.testWhitebox
 0123   0.5 1587 11  
DocValuesNotIndexedTest.testGroupingDVOnlySortLast
 0123   2.2 1679 82  HttpPartitionOnCommitTest.test
 0123   0.5 1592 16  HttpPartitionTest.test
 0123   1.0 1578  9  HttpPartitionWithTlogReplicasTest.test
 0123   1.3 1569 13  LeaderFailoverAfterPartitionTest.test
 0123   7.4 1643 59  MultiThreadedOCPTest.test
 0123   0.3 1567  8  ReplaceNodeTest.test
 0123   0.2 1588  6  ShardSplitTest.testSplitShardWithRule
 0123 100.0   38 33  SharedFSAutoReplicaFailoverTest.test
 0123   2.1  818 19  
TestCircuitBreaker.testBuildingMemoryPressure
 0123   2.6  818 13  TestCircuitBreaker.testResponseWithCBTiming
 0123   6.2 1848104  TestContainerPlugin.testApiFromPackage
 0123   2.5 1662 33  TestDistributedGrouping.test
 0123   0.4 1448  6  TestDynamicLoading.testDynamicLoading
 0123   6.4 1614 74  TestExportWriter.testExpr
 0123   8.6 1356 70  TestHdfsCloudBackupRestore.test
 0123   9.1 1697136  TestLocalFSCloudBackupRestore.test
 0123   0.5 1607 26  TestPackages.testPluginLoading
 0123   0.7 1596 15  
TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
 0123   1.5 1610 59  TestReRankQParserPlugin.testMinExactCount
 0123   0.3 1552  4  TestReplicaProperties.test
 0123   0.3 1556  5  
TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
 0123   0.3 1565  9  TestSolrConfigHandlerCloud.test


DO NOT ENABLE LIST:
MoveReplicaHDFSTest.testFailedMove
MoveReplicaHDFSTest.testNormalFailedMove
TestControlledRealTimeReopenThread.testCRTReopen
TestICUNormalizer2CharFilter.testRandomStrings
TestICUTokenizerCJK
TestImpersonationWithHadoopAuth.testForwarding
TestLTRReRankingPipeline.testDifferentTopN
TestRandomChains


DO NOT ANNOTATE LIST
CdcrBidirectionalTest.testBiDir
IndexSizeTriggerTest.testMergeIntegration
IndexSizeTriggerTest.testMixedBounds
IndexSizeTriggerTest.testSplitIntegration
IndexSizeTriggerTest.testTrigger
InfixSuggestersTest.testShutdownDuringBuild
ShardSplitTest.test
ShardSplitTest.testSplitMixedReplicaTypes
ShardSplitTest.testSplitWithChaosMonkey
Test2BPostings.test
TestLatLonShapeQueries.testRandomBig
TestPackedInts.testPackedLongValues
TestRandomChains.testRandomChainsWithLargeStrings
TestTriggerIntegration.testSearchRate

SuppressWarnings count: last week: 4,825, this week: 4,819, delta -6


*** Files with increased @SuppressWarnings annotations:

Suppress count increase in: 
solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java. Was: 13, 
now: 15
Suppress count increase in: 
solr/core/src/java/org/apache/solr/packagemanager/PackageManager.java. Was: 7, 
now: 8
Suppress count increase in: 
solr/core/src/test/org/apache/solr/core/TestSolrConfigHandler.java. Was: 14, 
now: 17
Suppress count increase in: 
solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java. Was: 
12, now: 13

*** Files with decreased @SuppressWarnings annotations:

Suppress count decrease in: 
solr/core/src/java/org/apache/solr/core/PluginBag.java. Was: 6, now: 5

Processing file (History bit 3): HOSS-2020-08-10.csv
Processing file (History bit 2): HOSS-2020-08-03.csv
Processing file (History bit 1): HOSS-2020-07-27.csv
Processing file (History bit 0): HOSS-2020-07-20.csv


Number of AwaitsFix: 33 Number of BadApples: 4


**Annotated tests that didn't fail in the last 4 weeks.

  **Tests removed from the next two lists 

SOLR-14714 (Solr.cmd in windows loads the incorrect jetty module when using java>=9)

2020-08-10 Thread Erick Erickson
Could someone with a Windows machine try the patch at SOLR-14714? I looked it 
over and LGTM, with one nit: I would move the following up so the variables are 
initialized before they’re actually used:

set JAVA_MAJOR_VERSION=0
set JAVA_VERSION_INFO=
set JAVA_BUILD=0

I don’t think it matters functionally, just a style thing.

Seems like a straightforward fix; I just can’t try it, even without SSL, ’cause I 
don’t have a Windows machine.

I’ll push it if someone can double check...