Optimal size for queries?

2020-04-10 Thread Mark H. Wood
I need to pull a *lot* of records out of a core, to be statistically
analyzed and the statistics presented to the user, who is sitting at a
browser waiting.  So far I haven't seen a way to calculate the
statistics I need in Solr itself.  It's difficult to know the size of
the total result, so I'm running the query repeatedly and windowing
the results with 'start' and 'rows'.  I just guessed that a window of
1000 documents would be reasonable.  We currently have about 48GB in
the core.
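
For reference, the windowing loop described above can be sketched
roughly like this (a sketch only, not the actual code; `select_url`
and the filter value are illustrative assumptions, and the real query
also filters on type, bundleName, statistics_type, and a time range):

```python
# Minimal sketch of a start/rows windowing loop against a Solr select handler.
import json
import urllib.parse
import urllib.request

def window_params(start, rows=1000):
    """Build the query string for one window of results."""
    return urllib.parse.urlencode({
        "q": "*:*",
        "fq": "+isBot:false",   # illustrative filter; the real query has more
        "sort": "time asc",
        "start": start,
        "rows": rows,
        "wt": "json",
    })

def fetch_all(select_url, rows=1000):
    """Yield every matching document, one window at a time.

    Note: with plain start/rows, each request makes Solr collect and
    sort all start+rows matches over again, so deep windows get slower
    and slower (roughly quadratic total work over the full result set).
    """
    start = 0
    while True:
        url = select_url + "?" + window_params(start, rows)
        with urllib.request.urlopen(url) as resp:
            docs = json.load(resp)["response"]["docs"]
        if not docs:
            return
        yield from docs
        start += rows
```

One design note: Solr's cursorMark deep paging (available since 4.7,
so usable on 4.10) avoids that re-collection cost — pass cursorMark=*
on the first request, include the uniqueKey field in the sort, and
feed each response's nextCursorMark into the following request.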

The product uses Solr 4.10.  Yes, I know that's very old.

What I'm seeing is that every three seconds or so I get another 1000
documents, totalling around 500KB per response.  For a user request
covering a large range, this takes far longer than the user's browser
is willing to wait.  The single CPU on my test box is at 99%
continuously, and Solr's memory use is around 90% of 8GB.  The test
hardware is a VMware guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
2.70GHz'.

A sample query:

0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET 
/solr/statistics/select?q=*%3A*=1000=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview=%2BisBot%3Afalse=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D=time+asc=867000=javabin=2
 HTTP/1.1" 200 497475 "-" 
"Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"

As you can see, my test was getting close to 1000 windows.  It's still
going; I don't know how far along it is.

So I'm wondering:

o  how can I do better than guessing that 1000 is a good window size?
   How big a response is too big?

o  what else should I be thinking about?

o  given that my test on a full-sized copy of the live data has been
   running for an hour and is still going, is it totally impractical
   to expect that I can improve the process enough to give a response
   to an ad-hoc query while-you-wait?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: entity in DIH for partial update?

2020-04-10 Thread matthew sporleder
Do you mean something along the lines of this (hackish?) method?
https://stackoverflow.com/questions/21006045/can-solr-dih-do-atomic-updates

On Fri, Apr 10, 2020 at 10:19 AM Jörn Franke  wrote:
>
> You could use atomic updates in DIH. However, there is a bug in current 
> (and potentially also older) Solr versions where this leaks a searcher 
> (which means the index data grows without bound until you restart the server).
> You can also export from the database to JSON Lines and post it to the JSON 
> update handler together with the atomic update processor.
>
> > Am 10.04.2020 um 16:02 schrieb matthew sporleder :
> >
> > I have a field I would like to add to my schema which is stored in a
> > different database from my primary data.  Can I use a separate entity
> > in my DIH to update a single field of my documents?
> >
> > Thanks,
> > Matt


Re: entity in DIH for partial update?

2020-04-10 Thread Jörn Franke
You could use atomic updates in DIH. However, there is a bug in 
current (and potentially also older) Solr versions where this leaks a 
searcher (which means the index data grows without bound until you 
restart the server). You can also export from the database to JSON 
Lines and post it to the JSON update handler together with the atomic 
update processor.

> Am 10.04.2020 um 16:02 schrieb matthew sporleder :
> 
> I have a field I would like to add to my schema which is stored in a
> different database from my primary data.  Can I use a separate entity
> in my DIH to update a single field of my documents?
> 
> Thanks,
> Matt
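
The export-and-post route suggested above can be sketched as follows
(a minimal sketch, assuming an `id` uniqueKey and the standard /update
JSON handler; the core URL and field name are illustrative):

```python
# Build Solr atomic-update documents from exported rows and POST them
# to the JSON update handler.  With the {"set": ...} syntax, only the
# named field changes; the document's other stored fields are preserved.
import json
import urllib.request

def atomic_update_line(doc_id, field, value):
    """One document in Solr's atomic-update syntax, as a JSON line."""
    return json.dumps({"id": doc_id, field: {"set": value}})

def post_updates(core_url, lines):
    """POST a batch of atomic updates.

    core_url is e.g. 'http://localhost:8983/solr/mycore' (illustrative).
    `lines` is an iterable of JSON-line strings from atomic_update_line.
    """
    body = ("[" + ",".join(lines) + "]").encode("utf-8")
    req = urllib.request.Request(
        core_url + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```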


entity in DIH for partial update?

2020-04-10 Thread matthew sporleder
I have a field I would like to add to my schema which is stored in a
different database from my primary data.  Can I use a separate entity
in my DIH to update a single field of my documents?

Thanks,
Matt