Fyi, I am finally getting back to this. I apologize for the delay.
I am going to try using the ‘method=topLevelDV’ option to see if that makes
a difference. I will run the same tests used below, and follow up with results.
As far as more details about this scenario:
- Regarding the ‘user query’: some of them are quite simple edismax
queries, for example q=Maricopa county ethel
- from a content point of view, updates are not happening very
frequently. We typically get batches of updates spread out over the course
of the day.
- not quite sure what you are asking for per the 'collection
definitions'. The main collection is about 27 million docs, across 96
shards, 2 replicas. The fromIndex 'join' collection is quite small...about
80k docs, single shard, but replicated across the 96 shards.
- in the table below are the qtimes and response times, run both
with and without the ‘join’. resultCount is also included, for reference.
- it is a small test sample of 12 queries, run single-threaded.
- Note, the qtimes, on average for this small query set, increase
about 40% with the join
search_qtime  responseTime  search_qtime  responseTime  resultCount
(no join)     (no join)     (with join)   (with join)
1748          3179          2834          4292          471894
1557          2865          1794          3108          332
929           2278          1261          2654          541282
813           2107          1036          2322          15347
413           1730          539           1838          42
388           1725          678           2027          313
1095          2481          1453          2821          435627
829           2263          1310          2739          299
838           2103          1081          2358          86049
1236          2610          1911          3283          77881
950           2274          1313          2661          15160
763           2066          885           2184          738
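As a sanity check on the "about 40%" figure, here is a small Python sketch that recomputes the qtime increase from the numbers in the table. The variable and function names are just for illustration; only the qtime pairs come from the test run.

```python
# (no-join qtime, with-join qtime) pairs, copied from the table above.
qtimes = [
    (1748, 2834), (1557, 1794), (929, 1261), (813, 1036),
    (413, 539), (388, 678), (1095, 1453), (829, 1310),
    (838, 1081), (1236, 1911), (950, 1313), (763, 885),
]


def pct_increase(before, after):
    """Percentage increase going from `before` to `after`."""
    return 100.0 * (after - before) / before


# Increase of the total qtime across all 12 queries.
total_no_join = sum(no_join for no_join, _ in qtimes)
total_with_join = sum(with_join for _, with_join in qtimes)
overall = pct_increase(total_no_join, total_with_join)

# Average of the per-query increases (weights every query equally).
per_query = [pct_increase(no_join, with_join) for no_join, with_join in qtimes]
mean_per_query = sum(per_query) / len(per_query)

print(f"overall increase:        {overall:.1f}%")
print(f"mean per-query increase: {mean_per_query:.1f}%")
print(f"min / max per query:     {min(per_query):.1f}% / {max(per_query):.1f}%")
```

Both ways of averaging land close to 40%, but the per-query spread is wide, so a 12-query sample should be read as a rough indication only.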
What is most concerning is the cpu increase that we see in Solr. Here is
a more ‘concurrent' test, at about 12 qps, but it is not at a 'full'
load...maybe 50%. This test 'held up', meaning we did not get into any
trouble.
Hope these images come through...but, here is a cpu profile for a 1 hour
test with no 'join' being used,
[image: image.png]
And, here is the same 1 hour test, using the 'join', run twice. Note the
difference in 'scale' of cpu of these 2 tests vs. the one above, from a
'cores' point of view:
[image: image.png]
Like I said, I'll run these same tests with the ‘method=topLevelDV’, and
see if it changes behavior.
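
For reference, a rough sketch of how the filter would look with the method added, built in Python. It takes the negative fq quoted further down in this thread and only adds the method local param. The helper name join_exclusion_fq is made up for illustration. Assumptions I have not verified on our setup: that score=none can stay alongside method, and that both group_id_mv and group_member_id have docValues, which topLevelDV needs.

```python
from urllib.parse import urlencode


def join_exclusion_fq(method=None):
    """Build the negative join filter, optionally with an explicit join method.

    Hypothetical helper: the collection and field names are the ones from the
    fq quoted in this thread; nothing else about the setup is assumed.
    """
    local_params = [
        "fromIndex=primary_rollup",
        "from=group_id_mv",
        "to=group_member_id",
        "score=none",  # assumption: left as-is next to `method`
    ]
    if method:
        local_params.insert(0, f"method={method}")
    # ${q} is left literal so Solr substitutes the user query at request time.
    return "-{!join " + " ".join(local_params) + "}${q}"


params = {
    "defType": "edismax",
    "q": "Maricopa county ethel",
    "fq": join_exclusion_fq("topLevelDV"),
}

print(params["fq"])
print(urlencode(params))  # URL-encoded query string for the request
```

Calling join_exclusion_fq() with no argument gives back the current filter, so the same test harness can switch between the two variants.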
Thx
Ron Haines
On Thu, May 25, 2023 at 4:29 PM Mikhail Khludnev <[email protected]> wrote:
> Ron, how often are both indices updated? Presumably if they are static,
> filter cache may help.
> It's worth making sure that the app gives a chance to filter cache.
> To better understand the problem it is worth taking a few thread dumps under
> load: a deep stack gives a clue for hotspot (or just take a sampling
> profile). Once we know the hot spot we can think about a workaround.
> https://issues.apache.org/jira/browse/SOLR-16717 about sharding
> "fromIndex"
> https://issues.apache.org/jira/browse/SOLR-16242 about keeping "local/to"
> index cache when fromIndex is updated.
>
> On Thu, May 25, 2023 at 5:01 PM Andy Lester <[email protected]> wrote:
>
> >
> >
> > > On May 25, 2023, at 7:51 AM, Ron Haines <[email protected]> wrote:
> > >
> > > So, when this feature is enabled, this negative &fq gets added:
> > > -{!join fromIndex=primary_rollup from=group_id_mv to=group_member_id
> > > score=none}${q}
> >
> >
> > Can we see collection definitions of both the source collection and the
> > join? Also, a sample query, not just the one parameter? Also, how often
> > are either of these collections updated? One thing that killed off an
> > entire
> > project that we were doing was that the join table was getting updated
> > about once a minute, and this destroyed all our caching, and made the
> > queries we wanted to do unusable.
> >
> >
> > Thanks,
> > Andy
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>