Hello Utkarsh,
This may or may not be relevant for your use case, but the way we deal with
this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time
(user selectable). We can then page the results, changing the start
parameter to return the next set. This allows us to 'retrieve' millions of
documents - we just do it at the user's leisure, rather than make them wait
for the whole lot in one go.
This works well because users very rarely want to see ALL 2000 (or however
many) documents at one time - it's simply too much to take in at once.
If your use case involves an automated or offline procedure (e.g. running a
report or some data-mining op), then presumably it doesn't matter so much
if it takes a bit longer (as long as it returns in some reasonable time).
Have you looked at doing paging on the client side? That should hugely
speed up your search time.
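As a rough illustration of the pattern (the helper names and the localhost
URL below are mine, not from your setup - only the "prodinfo" collection
and the stock start/rows/wt parameters come from this thread):

```python
# Minimal sketch of paging Solr results with start/rows, keeping the
# paging loop separate from the HTTP call so it can be tested on its own.
import json
import urllib.parse
import urllib.request


def fetch_solr_page(query, start, rows,
                    base="http://localhost:8983/solr/prodinfo/select"):
    """Fetch one page of results from Solr; returns the list of docs."""
    params = urllib.parse.urlencode(
        {"q": query, "start": start, "rows": rows, "wt": "json"})
    with urllib.request.urlopen(f"{base}?{params}") as resp:
        return json.load(resp)["response"]["docs"]


def iter_docs(fetch_page, query, rows=100, limit=2000):
    """Yield up to `limit` docs, requesting `rows` at a time.

    `fetch_page(query, start, rows)` returns one page of docs, e.g.
    fetch_solr_page above.
    """
    start = 0
    while start < limit:
        docs = fetch_page(query, start, min(rows, limit - start))
        if not docs:
            break
        yield from docs
        start += rows
```

With rows=100 the full 2,000 becomes 20 small requests instead of one
large one, and the user sees the first page almost immediately.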
HTH
Peter



On Sat, Jun 29, 2013 at 6:17 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, depending on how many docs get served
> from the cache the time will vary. But this is
> just ugly, if you can avoid this use-case it would
> be a Good Thing.
>
> Problem here is that each and every shard must
> assemble the list of 2,000 documents (just ID and
> sort criteria, usually score).
>
> Then the node serving the original request merges
> the sub-lists to pick the top 2,000. Then the node
> sends another request to each shard to get
> the full document. Then the node merges this
> into the full list to return to the user.
>
> Solr really isn't built for this use case; is it actually
> a compelling situation?
>
> And having your document cache set at 1M is kinda
> high if you have very big documents.
>
> FWIW,
> Erick
>
>
> On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar <utkarsh2...@gmail.com
> >wrote:
>
> > Also, I don't see consistent response times from Solr. I ran ab again
> > and I get this:
> >
> > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> >
> >
> > http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > "
> >
> >
> > Benchmarking x.amazonaws.com (be patient)
> > Completed 100 requests
> > Completed 200 requests
> > Completed 300 requests
> > Completed 400 requests
> > Completed 500 requests
> > Finished 500 requests
> >
> >
> > Server Software:
> > Server Hostname:       x.amazonaws.com
> > Server Port:            8983
> >
> > Document Path:
> >
> >
> /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > Document Length:        1538537 bytes
> >
> > Concurrency Level:      10
> > Time taken for tests:   10.858 seconds
> > Complete requests:      500
> > Failed requests:        8
> >    (Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
> > Write errors:           0
> > Total transferred:      769297992 bytes
> > HTML transferred:       769268492 bytes
> > Requests per second:    46.05 [#/sec] (mean)
> > Time per request:       217.167 [ms] (mean)
> > Time per request:       21.717 [ms] (mean, across all concurrent requests)
> > Transfer rate:          69187.90 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        0    0   0.3      0       2
> > Processing:   110  215  72.0    190     497
> > Waiting:       91  180  70.5    152     473
> > Total:        112  216  72.0    191     497
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%    191
> >   66%    225
> >   75%    252
> >   80%    272
> >   90%    319
> >   95%    364
> >   98%    420
> >   99%    453
> >  100%    497 (longest request)
> >
> >
> > Sometimes it takes a lot of time, sometimes it's pretty quick.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> > On Fri, Jun 28, 2013 at 5:39 PM, Utkarsh Sengar <utkarsh2...@gmail.com
> > >wrote:
> >
> > > Hello,
> > >
> > > I have a use case where I need to retrieve the top 2000 documents
> > > matching a query.
> > > What are the parameters (in query, solrconfig, schema) I should look
> > > at to improve this?
> > >
> > > I have 45M documents in a 3-node SolrCloud 4.3.1 cluster with 3 shards,
> > > with 30GB RAM, 8 vCPUs and a 7GB JVM heap per node.
> > >
> > > I have documentCache:
> > >   <documentCache class="solr.LRUCache"  size="1000000"
> > > initialSize="1000000"   autowarmCount="0"/>
> > >
> > > allText is a copyField.
> > >
> > > This is the result I get:
> > > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> > >
> >
> http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > "
> > >
> > > Benchmarking x.amazonaws.com (be patient)
> > > Completed 100 requests
> > > Completed 200 requests
> > > Completed 300 requests
> > > Completed 400 requests
> > > Completed 500 requests
> > > Finished 500 requests
> > >
> > >
> > > Server Software:
> > > Server Hostname:        x.amazonaws.com
> > > Server Port:            8983
> > >
> > > Document Path:
> > >
> >
> /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > Document Length:        1538537 bytes
> > >
> > > Concurrency Level:      10
> > > Time taken for tests:   35.999 seconds
> > > Complete requests:      500
> > > Failed requests:        21
> > >    (Connect: 0, Receive: 0, Length: 21, Exceptions: 0)
> > > Write errors:           0
> > > Non-2xx responses:      2
> > > Total transferred:      766221660 bytes
> > > HTML transferred:       766191806 bytes
> > > Requests per second:    13.89 [#/sec] (mean)
> > > Time per request:       719.981 [ms] (mean)
> > > Time per request:       71.998 [ms] (mean, across all concurrent requests)
> > > Transfer rate:          20785.65 [Kbytes/sec] received
> > >
> > > Connection Times (ms)
> > >               min  mean[+/-sd] median   max
> > > Connect:        0    0   0.6      0       8
> > > Processing:     9  717 2339.6    199   12611
> > > Waiting:        9  635 2233.6    164   12580
> > > Total:          9  718 2339.6    199   12611
> > >
> > > Percentage of the requests served within a certain time (ms)
> > >   50%    199
> > >   66%    236
> > >   75%    263
> > >   80%    281
> > >   90%    548
> > >   95%    838
> > >   98%  12475
> > >   99%  12545
> > >  100%  12611 (longest request)
> > >
> > > --
> > > Thanks,
> > > -Utkarsh
> > >
> >
> >
> >
> > --
> > Thanks,
> > -Utkarsh
> >
>
