Does the query time _stay_ low? Once the data is read from HDFS it
should pretty much stay in memory. So my question is whether you still
see this kind of query response time once Solr has warmed up.
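
One way to check this is to send the same query several times and
compare the QTime of the first run against the later ones. Below is a
minimal sketch, not a definitive benchmark. The host, port and
collection name in the usage comment are placeholders, and the helper
names are made up for illustration. It only relies on the QTime value
Solr reports in responseHeader when wt=json is requested.

```python
import json
import statistics
import urllib.parse
import urllib.request


def extract_qtime(body: str) -> int:
    """Return responseHeader.QTime (milliseconds) from a Solr JSON response."""
    return int(json.loads(body)["responseHeader"]["QTime"])


def summarize(qtimes):
    """Separate the first (cold) run from the later (warm) runs."""
    first, rest = qtimes[0], qtimes[1:]
    return {
        "first_ms": first,
        "warm_median_ms": statistics.median(rest) if rest else None,
    }


def time_query(select_url: str, params: dict, runs: int = 10):
    """Send the same query `runs` times and collect the reported QTime values."""
    query = urllib.parse.urlencode({**params, "wt": "json"}, doseq=True)
    qtimes = []
    for _ in range(runs):
        with urllib.request.urlopen(f"{select_url}?{query}") as resp:
            qtimes.append(extract_qtime(resp.read().decode("utf-8")))
    return summarize(qtimes)


# Usage (hypothetical host and collection name, substitute your own):
#   url = "http://localhost:8983/solr/collection4/select"
#   print(time_query(url, {"q": "hybrid", "defType": "edismax",
#                          "qf": ["title", "url"], "rows": 600}))
```

Keep in mind that repeating an identical query will probably be served
from Solr's queryResultCache after the first run, so varying the query
terms gives a more honest picture of warm performance.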

Have you tried this on a non-HDFS system? That would help narrow down
where to look.

And given the sizes of your collections, unless you expect them to get
much larger, there's no reason to shard any of them. Sharding should
only really be used when the collections are too big for a single
shard as distributed searches inevitably have increased overhead. I
expect _at least_ 20M documents/shard, and have seen 200M docs/shard.
YMMV of course.
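
To make that rule of thumb concrete, here is a rough sketch of the
arithmetic. The function name is made up, and the 20M default is just
the ballpark figure above, not a Solr limit. The 205,458 used in the
example is the numFound from your query, which is only a lower bound on
the collection's document count.

```python
import math


def suggested_shards(doc_count: int, docs_per_shard: int = 20_000_000) -> int:
    """Estimate a shard count from a docs-per-shard rule of thumb.

    docs_per_shard is an assumed ballpark figure, not a hard limit;
    the right value depends on the documents, queries and hardware.
    """
    return max(1, math.ceil(doc_count / docs_per_shard))


# 205,458 matching documents fit comfortably in a single shard.
print(suggested_shards(205_458))
```

Even at 100x that document count the estimate would still be two
shards, well short of the five currently configured per collection.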

Best,
Erick

On Tue, Sep 26, 2017 at 7:43 AM, sasarun <sasa...@gmail.com> wrote:
> Hi All,
> I have been using Solr for some time now but mostly in standalone mode. Now
> my current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml
> has the following configuration. In the prod environment the performance on
> querying seems to be really slow. Can anyone help me with a few pointers on
> how to improve it?
>
> <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>     <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
>     <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
>     <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
>     <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
>     <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
>     <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
>     <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
>     <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
>     <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
>     <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
> </directoryFactory>
> <lockType>hdfs</lockType>
> It has 6 collections of the following sizes:
> Collection 1 --> 6.41 MB
> Collection 2 --> 634.51 KB
> Collection 3 --> 4.59 MB
> Collection 4 --> 1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 --> 102.4 KB
> Each collection has 5 shards. The allocated heap size for the young
> generation is about 8 GB and for the old generation about 24 GB, and GC
> analysis showed that peak utilisation is really low compared to these values.
> But querying Collection 4 and Collection 5 gives really slow responses
> even though we are not using any complex queries. The output of debug queries
> run with debug=timing is given below for reference. Can anyone suggest a way
> to improve the performance?
>
> Response to query
> <response>
> <lst name="responseHeader">
> <bool name="zkConnected">true</bool>
> <int name="status">0</int>
> <int name="QTime">3962</int>
> <lst name="params">
> <str name="q">
> ("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
> "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
> Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
> "hybrid electric" "electric powerplant")
> </str>
> <str name="defType">edismax</str>
> <str name="debug">true</str>
> <str name="indent">on</str>
> <arr name="qf">
> <str>host</str>
> <str>title</str>
> <str>url</str>
> <str>customContent</str>
> <str>contentSpecificSearch</str>
> </arr>
> <arr name="fl">
> <str>id</str>
> <str>contentTagsCount</str>
> </arr>
> <str name="start">0</str>
> <str name="bq.op">OR</str>
> <str name="q.op">OR</str>
> <str name="correlationID">3985d7e2-3e54-48d8-8336-229e85f5d9de</str>
> <str name="rows">600</str>
> <str name="bq">
> ("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
> "Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
> economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
> "Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
> "hybrid electric"^15.0 "electric powerplant"^15.0)
> </str>
> </lst>
> </lst>
> <result name="response" numFound="205458" start="0" maxScore="1836.806">
> <lst name="timing">
> <double name="time">15374.0</double>
> <lst name="prepare">
> <double name="time">2.0</double>
> <lst name="query">
> <double name="time">2.0</double>
> </lst>
> <lst name="facet">
> <double name="time">0.0</double>
> </lst>
> <lst name="facet_module">
> <double name="time">0.0</double>
> </lst>
> <lst name="mlt">
> <double name="time">0.0</double>
> </lst>
> <lst name="highlight">
> <double name="time">0.0</double>
> </lst>
> <lst name="stats">
> <double name="time">0.0</double>
> </lst>
> <lst name="expand">
> <double name="time">0.0</double>
> </lst>
> <lst name="terms">
> <double name="time">0.0</double>
> </lst>
> <lst name="debug">
> <double name="time">0.0</double>
> </lst>
> </lst>
> <lst name="process">
> <double name="time">15363.0</double>
> <lst name="query">
> <double name="time">1313.0</double>
> </lst>
> <lst name="facet">
> <double name="time">0.0</double>
> </lst>
> <lst name="facet_module">
> <double name="time">0.0</double>
> </lst>
> <lst name="mlt">
> <double name="time">0.0</double>
> </lst>
> <lst name="highlight">
> <double name="time">0.0</double>
> </lst>
> <lst name="stats">
> <double name="time">0.0</double>
> </lst>
> <lst name="expand">
> <double name="time">0.0</double>
> </lst>
> <lst name="terms">
> <double name="time">0.0</double>
> </lst>
> <lst name="debug">
> <double name="time">14048.0</double>
> </lst>
> </lst>
> </lst>
>
>
> Thanks,
> Arun
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
