Index analyzer concatenate tokens

2021-01-29 Thread Florin Babes
Hello,
I'm trying to index the following token with payload, "winter tires|1.4", as
an exact match, but I also want to apply a hunspell lemmatizer to it and
keep both the original and the lemma. So after all that I want to end up
with the following tokens:
"winter tires" with payload 1.4
"winter tire" with payload 1.4

I thought of doing it this way:
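
Something along these lines (a sketch only, not the exact schema; the
tokenizer choice, hunspell dictionary names and token separator are
illustrative):

<fieldType name="text_concat_payload" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- keep the original token alongside the hunspell lemma -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_US.dic" affix="en_US.aff"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <!-- concatenate each analysis path back into a single token -->
    <filter class="solr.ConcatenateGraphFilterFactory" tokenSeparator=" "/>
    <!-- intended to split off the |1.4 payload, but it never runs:
         filters placed after ConcatenateGraphFilterFactory are not applied -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="float"/>
  </analyzer>
</fieldType>
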
But what happens here is that the indexed tokens are "winter tires|1.4" and
"winter tire|1.4", because no filter placed after
solr.ConcatenateGraphFilterFactory is applied.

Do you have any idea how I can concatenate the tokens from a stream without
using solr.ConcatenateGraphFilterFactory? Or how I can achieve the above?

Thanks.


Re: Possible bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-06 Thread Florin Babes
Hello Christine, and thank you for your help!

So, we've investigated further based on your suggestions and have the
following things to note:

Reproducibility: We can re-run the same queries multiple times and they fail
with the same error every time.
Data as a factor: Our setup is single-sharded, so we can't investigate this
angle further.
Feature vs. Model: We've also tried a dummy LinearModel with only two
features and the problem still occurs.
Identification of the troublesome feature(s): We've narrowed our model down
to only two features, and the problem always occurs (for some queries, not
all) when we have a feature with mm=1 and a feature with mm>=3. The problem
also occurs when we only do feature extraction, and it always seems to hit
the feature with the larger mm. The errors seem to be related to the size of
the head DisiPriorityQueue created here:
https://github.com/apache/lucene-solr/blob/branch_8_6/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L107
as the error changes when we change the mm of the second feature:

1 feature with mm=1 and one with mm=3 -> Index 4 out of bounds for length 4
1 feature with mm=1 and one with mm=5 -> Index 2 out of bounds for length 2
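
(If the head queue is sized as numScorers - mm + 1, as the linked constructor
suggests, both observations would be consistent with the disjunction having
6 sub-scorers at that point: 6 - 3 + 1 = 4 and 6 - 5 + 1 = 2.)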

You can find the dummy feature store below.

[
  {
    "store": "dummystore",
    "name": "similarity_name_mm_1",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!dismax qf=name mm=1}${term}"
    }
  },
  {
    "store": "dummystore",
    "name": "similarity_names_mm_3",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!dismax qf=name mm=3}${term}"
    }
  }
]
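
(For reference, we trigger feature extraction against this store through the
LTR [features] transformer in the field list, along these lines, with an
illustrative term value:)

fl=id,score,[features store=dummystore efi.term=tires]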

The problem starts occurring in Solr 8.6.0: we tried multiple versions below
8.6 and from 8.6 onwards, and the problem first appears in 8.6.0. We tend to
believe it's caused by the following changes:
https://issues.apache.org/jira/browse/SOLR-14364, as they're the only major
LTR-related changes introduced in Solr 8.6.0.

I've created a Solr JIRA bug/issue ticket here:
https://issues.apache.org/jira/browse/SOLR-15071

Thank you for your help!

On Tue, Jan 5, 2021 at 19:40, Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hello Florin Babes,
>
> Thanks for this detailed report! I agree that the
> ArrayIndexOutOfBoundsException you're experiencing during SolrFeature
> computation sounds like a bug; would you like to open a SOLR JIRA issue
> for it?
>
> Here are some investigative ideas I would have, in no particular order:
>
> Reproducibility: if a failed query is run again, does it also fail the
> second time around (when some caches may be used)?
>
> Data as a factor: is your setup single-sharded or multi-sharded? in a
> multi-sharded setup if the same query fails on some shards but succeeds on
> others (and all shards have some documents that match the query) then this
> could support a theory that a certain combination of data and features
> leads to the exception.
>
> Feature vs. Model: you mention use of a MultipleAdditiveTrees model; if
> the same features are used in a LinearModel instead, do the same errors
> happen? or if no model is used and only feature extraction is done, does
> that give errors?
>
> Identification of the troublesome feature(s): narrowing down to a single
> feature or a small combination of features could make it easier to figure
> out the problem. Assuming the existing logging doesn't identify the
> features, replacing the org.apache.solr.ltr.feature.SolrFeature with a
> com.mycompany.solr.ltr.feature.MySolrFeature containing instrumentation
> could provide insights, e.g. the existing code [2] logs feature names for
> UnsupportedOperationException, and if it also caught
> ArrayIndexOutOfBoundsException it could log the feature name before
> rethrowing the exception.
>
> Based on your details below and this [3] conditional in the code, probably
> at least two features will be necessary to hit the issue, but for
> investigative purposes the two features could still be simplified,
> potentially to effectively one feature, e.g. if one feature is a SolrFeature
> and the other is a ValueFeature, or if featureA and featureB are both
> SolrFeature features with _identical_ parameters but different names.
>
> Hope that helps.
>
> Regards,
>
> Christine
>
> [1]
> https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#extracting-features
> [2]
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.3/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L243
> [3]
> https://github.com/apache/lucene-solr/blob/releases/lu
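
For the instrumentation idea above, a minimal sketch of the catch-log-rethrow
wrapping (the class and interface here are hypothetical, not the actual
org.apache.solr.ltr.feature.SolrFeature internals):

import java.io.IOException;

// Sketch: catch the ArrayIndexOutOfBoundsException around a feature's
// score() call, report which feature was being scored, and rethrow so
// the behaviour is otherwise unchanged. Names are illustrative.
public class LoggingFeatureScorer {

  public interface ScoreFunction {
    float score() throws IOException;
  }

  private final String featureName;
  private final ScoreFunction delegate;

  public LoggingFeatureScorer(String featureName, ScoreFunction delegate) {
    this.featureName = featureName;
    this.delegate = delegate;
  }

  public float score() throws IOException {
    try {
      return delegate.score();
    } catch (ArrayIndexOutOfBoundsException e) {
      System.err.println("AIOOBE while scoring feature " + featureName);
      throw e;
    }
  }
}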

Possible bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-04 Thread Florin Babes
stractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
at org.apache.lucene.search.DisiPriorityQueue.add(DisiPriorityQueue.java:102)
at org.apache.lucene.search.MinShouldMatchSumScorer.advanceTail(MinShouldMatchSumScorer.java:246)
at org.apache.lucene.search.MinShouldMatchSumScorer.updateFreq(MinShouldMatchSumScorer.java:312)
at org.apache.lucene.search.MinShouldMatchSumScorer.score(MinShouldMatchSumScorer.java:320)
at org.apache.solr.ltr.feature.SolrFeature$SolrFeatureWeight$SolrFeatureScorer.score(SolrFeature.java:242)
at org.apache.solr.ltr.LTRScoringQuery$ModelWeight$ModelScorer$SparseModelScorer.score(LTRScoringQuery.java:595)
at org.apache.solr.ltr.LTRScoringQuery$ModelWeight$ModelScorer.score(LTRScoringQuery.java:540)
at org.apache.solr.ltr.LTRRescorer.scoreFeatures(LTRRescorer.java:183)
at org.apache.solr.ltr.LTRRescorer.rescore(LTRRescorer.java:122)
at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:119)


We've searched the mailing lists and the issue tracker and didn't find any
existing bug report for this.
Could you please give us a hint of what we can do to fix this?

Thanks,
Florin Babes


Re: Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Florin Babes
Hello,
The patch is to fix the display. It doesn't configure or limit the speed :)


On Tue, Jun 16, 2020 at 14:26, Shawn Heisey wrote:

> On 6/14/2020 12:06 AM, Florin Babes wrote:
> > While checking ways to optimize the speed of replication I've noticed that
> > the index download speed is fixed at 5.1 in replication.html. Is there a
> > reason for that? If not, I would like to submit a patch with the fix.
> > We are using Solr 8.3.1.
>
> Looking at the replication.html file, the part that says "5.1 MB/s"
> appears to be purely display.  As far as I can tell, it's not
> configuring anything, and it's not gathering information from anywhere.
>
> So unless your solrconfig.xml is configuring a speed limit in the
> replication handler, I don't think there is one.
>
> I'm curious about exactly what you have in mind for a patch.
>
> Thanks,
> Shawn
>


Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-14 Thread Florin Babes
Hello,
While checking ways to optimize the speed of replication, I've noticed that
the index download speed is fixed at 5.1 in replication.html. Is there a
reason for that? If not, I would like to submit a patch with the fix.
We are using Solr 8.3.1.
Thanks,
Florin Babes