So my guess is you're spending by far the largest portion of your time on
the DB query (or queries), which makes sense.
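One way to confirm that guess is to time the two halves of the ingestion loop separately. The following is only a rough sketch: `fetch_rows` and `index_batch` are hypothetical stand-ins for whatever the custom ingestion process uses to read from the DB and to send a batch to Solr, not anything from your actual code.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List


def batched(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group rows into lists of at most batch_size items."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def profile_ingestion(
    fetch_rows: Callable[[], Iterable[dict]],    # hypothetical: runs the DB query
    index_batch: Callable[[List[dict]], None],   # hypothetical: sends one batch to Solr
    batch_size: int,
) -> Dict[str, float]:
    """Run the ingestion once and report time spent fetching vs. indexing."""
    timings: Dict[str, float] = {"fetch_s": 0.0, "index_s": 0.0, "docs": 0, "batches": 0}
    batches = batched(fetch_rows(), batch_size)
    while True:
        # Time spent waiting on the data source to produce the next batch.
        start = time.perf_counter()
        batch = next(batches, None)
        timings["fetch_s"] += time.perf_counter() - start
        if batch is None:
            break
        # Time spent handing that batch to the indexer.
        start = time.perf_counter()
        index_batch(batch)
        timings["index_s"] += time.perf_counter() - start
        timings["docs"] += len(batch)
        timings["batches"] += 1
    return timings
```

If `fetch_s` dwarfs `index_s`, the time is going to data acquisition rather than Solr, and tuning the batch size won't change much.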


On Tue, Feb 11, 2014 at 11:50 AM, Joel Cohen <joel.co...@bluefly.com> wrote:

> It's a custom ingestion process. It does a big DB query and then inserts
> stuff in batches. The batch size is tuneable.
>
>
> On Tue, Feb 11, 2014 at 11:23 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Hmmm, I'm still a little puzzled BTW. 300K documents, unless they're
> > huge, shouldn't be taking 100 minutes. I can index 11M documents on
> > my laptop (Wikipedia dump) in 45 minutes for instance.... Of course
> > that's a single core, not cloud and not replicas...
> >
> > So possibly it's on the data acquisition side? Is your Solr CPU pegged?
> >
> > YMMV of course.
> >
> > Erick
> >
> >
> > On Tue, Feb 11, 2014 at 6:40 AM, Joel Cohen <joel.co...@bluefly.com>
> > wrote:
> >
> > > I'd like to thank you for lending a hand on my query time problem with
> > > SolrCloud. By switching to a single shard with replicas setup, I've
> > > reduced my query time to 18 msec. My full ingestion of 300k+ documents
> > > went down from 2 hours 50 minutes to 1 hour 40 minutes. There are some
> > > code changes that are going in that should help a bit as well. Big
> > > thanks to everyone that had suggestions.
> > >
> > >
> > > On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch
> > > <arafa...@gmail.com> wrote:
> > >
> > > > I suspect faceting is the issue here. The actual query you have shown
> > > > seems to bring back a single document (or a single set of documents
> > > > for a product):
> > > > fq=id:(320403401)
> > > >
> > > > On the other hand, you are asking for 4 field facets:
> > > > facet.field=q_virtualCategory_ss
> > > > facet.field=q_brand_s
> > > > facet.field=q_color_s
> > > > facet.field=q_category_ss
> > > > AND 2 range facets, both clustered/grouped:
> > > > facet.range=daysSinceStart_i
> > > > facet.range=activePrice_l (e.g. f.activePrice_l.facet.range.gap=5000)
> > > >
> > > > And for all facets you have asked to bring back ALL of the results:
> > > > facet.limit=-1
> > > >
> > > > Plus, you are doing a complex sort:
> > > > sort=popularity_i desc,popularity_i desc
> > > >
> > > > So, you are probably spending quite a bit of time counting (especially
> > > > in a sharded setup) and then quite a bit more sending the response
> > > > back.
> > > >
> > > > I would check the size of the result document (HTTP result) and see
> > > > how large it is. Maybe you don't need all of the stuff that's coming
> > > > back. I assume you are not actually querying Solr from the client's
> > > > machine (that is I hope it is inside your data centre close to your
> > > > web server), otherwise I would say to look at automatic content
> > > > compression as well to minimize on-wire document size.
> > > >
> > > > Finally, if your documents have many stored fields (stored="true" in
> > > > schema.xml) but you only return small subsets of them during search,
> > > > you could look into using the enableLazyFieldLoading flag in
> > > > solrconfig.xml.
> > > >
> > > > Regards,
> > > >    Alex.
> > > > P.s. As others said, you don't seem to have too many documents.
> > > > Perhaps you want replication instead of sharding for improved
> > > > performance.
> > > > Personal website: http://www.outerthoughts.com/
> > > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > > - Time is the quality of nature that keeps events from happening all
> > > > at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> > > > book)
> > > >
> > > >
> > > > On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin
> > > > <alexey_kozhemia...@epam.com> wrote:
> > > > > Btw, "timing" for distributed requests is broken at the moment: it
> > > > > doesn't combine values from requests to shards. I'm working on a
> > > > > patch.
> > > > >
> > > > > https://issues.apache.org/jira/browse/SOLR-3644
> > > > >
> > > > > -----Original Message-----
> > > > > From: Jack Krupansky [mailto:j...@basetechnology.com]
> > > > > Sent: Tuesday, February 04, 2014 22:00
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Lowering query time
> > > > >
> > > > > Add the debug=true parameter to some test queries and look at the
> > > > > "timing" section to see which search components are taking the time.
> > > > > Traditionally, highlighting for large documents was a top culprit.
> > > > >
> > > > > Are you returning a lot of data or field values? Sometimes reducing
> > > > > the amount of data processed can help. Any multivalued fields with
> > > > > lots of values?
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > -----Original Message-----
> > > > > From: Joel Cohen
> > > > > Sent: Tuesday, February 4, 2014 1:43 PM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Lowering query time
> > > > >
> > > > > 1. We are faceting. I'm not a developer so I'm not quite sure how
> > > > > we're doing it. How can I measure?
> > > > > 2. I'm not sure how we'd force this kind of document partitioning. I
> > > > > can see how my shards are partitioned by looking at the
> > > > > clusterstate.json from Zookeeper, but I don't have a clue on how to
> > > > > get documents into specific shards.
> > > > >
> > > > > Would I be better off with fewer shards given the small size of my
> > > > > indexes?
> > > > >
> > > > >
> > > > > On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley <yo...@heliosearch.com>
> > > > > wrote:
> > > > >
> > > > >> On Tue, Feb 4, 2014 at 12:12 PM, Joel Cohen <joel.co...@bluefly.com>
> > > > >> wrote:
> > > > >> > I'm trying to get the query time down to ~15 msec. Anyone have any
> > > > >> > tuning recommendations?
> > > > >>
> > > > >> I guess it depends on what the slowest part of the query currently
> > > > >> is. If you are faceting, it's often that.
> > > > >> Also, it's often a big win if you can somehow partition documents
> > > > >> such that requests can normally be serviced from a single shard.
> > > > >>
> > > > >> -Yonik
> > > > >> http://heliosearch.org - native off-heap filters and fieldcache for
> > > > >> solr
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > joel cohen, senior system engineer
> > > > >
> > > > > e joel.co...@bluefly.com p 212.944.8000 x276
> > > > > bluefly, inc. 42 w. 39th st. new york, ny 10018
> > > > > www.bluefly.com <http://www.bluefly.com/?referer=autosig> | *fly since
> > > > > 2013...*
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
>
>
>
>
