solr server heap out of memory
Hi, I am frequently getting a Solr heap out-of-memory error once or twice a day. What could be the possible reasons for this, and is there any way to log the memory used by a query in solr.log? Thanks, Abhishek Tiwari
Re: Searching Home's, Homes and Home
Hi Surender, Please go through the stemmer documentation, which will give you an idea of how stemmers work. I see the following issues in the configured field types: 1. You have added the Porter stemmer as well as the English minimal stemmer. You can remove one of those based on your requirement; the minimal stemmer is conservative and removes mainly plural endings. 2. KeywordMarkerFilterFactory protects words from being modified by stemmers: any word in the protected-words list will not be modified by any stemmer in Solr, so it should be added before the stemmer. You can try an analyzer along the lines of the sketch below. -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286902.html Sent from the Solr - User mailing list archive at Nabble.com.
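A minimal sketch of such a chain, assuming an English text field (the field type name and protwords.txt are illustrative, not taken from your schema):

    <fieldType name="text_en_stem" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strips trailing 's, so Home's -> Home -->
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <!-- protects listed words from stemming; must come before the stemmer -->
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- keep exactly one stemmer; the minimal stemmer mostly strips plurals, so Homes -> Home -->
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
      </analyzer>
    </fieldType>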
Re: Searching Home's, Homes and Home
Hi, I do not want to use synonyms.txt, as this would require building a big library and that would be time consuming. Thanks, Surender Singh -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286897.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching Home's, Homes and Home
Hi, The following is the analyzer information; please let me know what I am missing. Thanks, Surender Singh -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286896.html Sent from the Solr - User mailing list archive at Nabble.com.
Zookeeper overseer queue clogging
There are 16 Solr nodes (Solr 5.2.1) and 5 Zookeeper nodes (Zookeeper 3.4.6) in our production cluster. We had to restart the Solr nodes for some reason, for the first time in 3 months. To our surprise, none of the Solr nodes came up. We can see the Solr process running on the machine, but the Solr Admin console is not reachable. We even tried restarting the Zookeeper cluster and the Solr node cluster; still, the issue remained. On debugging I found:

1. The following exception in solr.log:

ERROR - 2016-07-12 07:43:48.988; org.apache.solr.servlet.SolrDispatchFilter; Could not start Solr. Check solr/home property and the logs
ERROR - 2016-07-12 07:43:49.012; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Could not find collection : cont_coll_2_frat
    at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:164)

2. Connected to the Zookeeper quorum using Zookeeper's zkCli.sh and found that a few collections (which were deleted using the Solr Collections Delete API) still exist in Zookeeper (ls /collections). The same collections do not exist on the Solr node disks.
3. There are entries related to these deleted collections in Zookeeper's clusterstate.json as well.
4. There are many entries in the overseer queue (/overseer/queue) and queue-work (/overseer/queue-work).

I have tried the following, based on some existing suggestions on the net (the zkCli session is sketched below):

1. Stopped all the Solr nodes and removed the unwanted collections (the ones deleted via the Collections Delete API) using the rmr command in Zookeeper (/collections).
2. Removed all the entries from the overseer queue (/overseer/queue) and queue-work (/overseer/queue-work) as well.
3. Restarted Zookeeper and then Solr.

Even after doing this the issue still remains. Can someone help me resolve this? - Thanks
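For reference, a sketch of the kind of zkCli.sh session described above (host names are illustrative; the paths and collection name are the ones from the post):

    # connect to the ZooKeeper quorum
    ./zkCli.sh -server zk1:2181,zk2:2181,zk3:2181

    # inspect the state left behind by the deleted collections
    ls /collections
    get /clusterstate.json

    # remove a stale collection node and drain the overseer queues
    rmr /collections/cont_coll_2_frat
    rmr /overseer/queue
    rmr /overseer/queue-work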
Re: sorlcloud connection issue
Dear Mr. Heisey, it seems that we cannot send pictures or attachments to solr-user, so I have sent the screenshot to your personal email; sorry to disturb you! Thanks! Kent

2016-07-13 8:13 GMT+08:00 Shawn Heisey:
> On 7/12/2016 8:30 AM, Kent Mu wrote:
> > We have configured maxThreads in JBoss, and the good news is that solrcloud is now running OK. But another issue came up: we find the number of HTTP connections is very high, around 3300, and solrcloud does not release the connections.
> > I understand that solrcloud needs to connect to zookeeper and that communication between leader and replica needs connections, but I think the number should not be so huge. Besides, we use the singleton pattern to connect to solrcloud in Java.
>
> Are you referring to the number of http connections in your SolrJ app, or the number of http connections in Solr itself? Hopefully these are being run by completely separate JVMs. Where exactly are you looking when you see 3300 connections?
>
> The connection to Zookeeper does not use HTTP. It is a TCP connection but the protocol is custom. Both Solr and SolrJ will maintain a connection to each of the zookeeper hosts that are in the zkHost string used when they start.
>
> Thanks, Shawn
Re: sorlcloud connection issue
We have 5 shards, each shard with one leader and one replica. The "3300" connections are for one JVM only; please see the following analysis in Zabbix. And the SolrJ code is as follows:

    public synchronized static CloudSolrServer getSolrCloudReadServer() {
        if (reviewSolrCloudReadServer == null) {
            // raise the HttpClient connection-pool limits
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 1000);
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 100);
            HttpClient client = HttpClientUtil.createClient(params);

            // load-balancing client shared by the CloudSolrServer
            LBHttpSolrServer lbServer = new LBHttpSolrServer(client);
            lbServer.setConnectionTimeout(ReviewProperties.getCloudConnectionTimeOut());
            lbServer.setSoTimeout(ReviewProperties.getCloudSoTimeOut());

            reviewSolrCloudReadServer = new CloudSolrServer(ReviewProperties.getZkHost(), lbServer);
            reviewSolrCloudReadServer.setDefaultCollection(ReviewProperties.getZkReviewConnection());
            reviewSolrCloudReadServer.setZkClientTimeout(ReviewProperties.getZkClientTimeout());
            reviewSolrCloudReadServer.setZkConnectTimeout(ReviewProperties.getZkConnectTimeout());
        }
        return reviewSolrCloudReadServer;
    }

2016-07-13 8:13 GMT+08:00 Shawn Heisey:
> On 7/12/2016 8:30 AM, Kent Mu wrote:
> > We have configured maxThreads in JBoss, and the good news is that solrcloud is now running OK. But another issue came up: we find the number of HTTP connections is very high, around 3300, and solrcloud does not release the connections.
> > I understand that solrcloud needs to connect to zookeeper and that communication between leader and replica needs connections, but I think the number should not be so huge. Besides, we use the singleton pattern to connect to solrcloud in Java.
>
> Are you referring to the number of http connections in your SolrJ app, or the number of http connections in Solr itself? Hopefully these are being run by completely separate JVMs. Where exactly are you looking when you see 3300 connections?
>
> The connection to Zookeeper does not use HTTP. It is a TCP connection but the protocol is custom. Both Solr and SolrJ will maintain a connection to each of the zookeeper hosts that are in the zkHost string used when they start.
>
> Thanks, Shawn
Re: solrcloud consumes more time than solr when write index
Dear Mr. Wartes, thanks for your reply. Well, I see. For solr we do have replicas, and for solrcloud we have 5 shards, each shard with one leader and one replica. The number of documents is nearly 100 million. You mean we do not need to optimize the index data? Thanks! Kent

2016-07-12 23:02 GMT+08:00 Jeff Wartes:
> Well, two thoughts:
>
> 1. If you’re not using solrcloud, presumably you don’t have any replicas. If you are, presumably you do. This makes for a biased comparison, because SolrCloud won’t acknowledge a write until it’s been safely written to all replicas. In short, solrcloud write time is max(per-replica write time). The more replicas you add, the bigger the chance that some replica randomly takes longer (gc pause, perhaps?), and the longer your overall write time, assuming a fixed number of indexing threads.
> 2. The parallelism of the optimize operation across replicas has gone back and forth a bit, and I’m not sure what it was doing in 4.9. However, at one point the optimize happened per-replica, serially. So it’d do shard1_replica1, then when that was done, shard1_replica2, then shard2_replica1, etc. Other versions of Solr would do those at the same time. Again, I don’t know if you’re comparing to a non-replicated solr index, but that could explain some of the difference.
>
> There’s a sort of an obligatory comment at this point that optimize doesn’t necessarily save you a lot. There are certainly cases where it does, but if you haven’t already, you’ll want to validate that you have one of them and that you’re not just doing unnecessary work.
>
> On 7/12/16, 7:41 AM, "Kent Mu" wrote:
> > hello, has anybody else come across this issue? can anybody help me?
> > 2016-07-11 23:17 GMT+08:00 Kent Mu:
> >> Hi friends!
> >> solr version: 4.9.0.
> >> We use solr and solrcloud in our project; that means we use solr and solrcloud at the same time. But we find that solrcloud consumes more time than solr when writing the index; it takes nearly 5 or more times longer. I wonder why that is?
> >> In our project, we have a scheduled job to add index data, and we then execute the method "optimize(false, true, 2)" to optimize the added index. I wonder if it is caused by solrcloud internals: when writing the index, solrcloud needs to judge which shard the document should be stored in, and when optimizing, the replica needs some time to synchronize the data from the leader?
> >> And what about queries? Will solrcloud also take more time than solr when querying data?
Re: Upgrading solr 4.10.4 to solr 6.1.0
Thank you very much for your prompt response. I really appreciate it! Rachid

On Jul 12, 2016 17:13, "Shawn Heisey" wrote:
> On 7/12/2016 5:54 PM, Rachid Bouacheria wrote:
> > I am running solr 4.10.4 and I would like to upgrade to the latest version, 6.1.0. The documentation I found provides steps to upgrade from 4.10.4 to 5.x, and it seems like going from 4.x to 5.x is quite substantial. Going from 5.x to 6.1.0 seems to be less effort, but still non-negligible.
> > I am wondering if anyone has had to do a similar upgrade? If so, how did you do it: upgrade to 5.x and then to 6, or straight from 4.x to 6? Any tips or advice are welcome.
>
> The 6.1.0 version cannot read your 4.x indexes. It can read 5.x and later indexes.
>
> If you can "upgrade" by setting up a new Solr install and reindexing everything, that will always achieve the best results. This is how I do upgrades. There's no need to worry about the old index format at all.
>
> If that's not possible, then you will need to convert your index to 5.x format before upgrading to 6.x. You can do this by upgrading to a 5.x version first and optimizing all your indexes, or you can use the IndexUpgrader tool from Lucene, first from 5.x and then from 6.x, to upgrade your index in stages.
>
> https://cwiki.apache.org/confluence/display/solr/IndexUpgrader+Tool
>
> Thanks, Shawn
Re: High cpu and gc time when performing optimization.
On 7/12/2016 9:45 AM, Jason wrote:
> I'm using optimize because it's an option for fast search. Our index is updated once or more weekly. If I don't use optimize, many index files will be kept. Any performance issues in that case? And I'm wondering about the relation between index file size and heap size. In the case of running as a master server that only updates the index, is there any guide for heap size, including Xmx, NewSize, MaxNewSize, etc.?

In older (2.x and 3.x) versions of Lucene, optimizing an index would make a huge difference in performance. In modern versions, the performance increase from an optimize is much less dramatic. Lucene (and by extension, Solr) has gotten very good at dealing with an index comprised of many segments. The recommendation for the last few years has been to AVOID doing an optimize unless it can be done during times of very low query traffic, when the I/O load will not cause issues.

About the only good reason left for frequent optimizes is when the index has many updates to existing documents, resulting in a very large percentage of deleted documents in the index. In that case, the optimize will shrink the overall index size, which will make it faster and make relevancy more accurate.

There is no general information available for setting the heap size. There is also no general information available on "acceptable" index size. The following wiki page touches a little bit on the heap size topic: https://wiki.apache.org/solr/SolrPerformanceProblems

The reason that there is no generic information available is covered here: https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks, Shawn
Re: sorlcloud connection issue
On 7/12/2016 8:30 AM, Kent Mu wrote:
> We have configured maxThreads in JBoss, and the good news is that solrcloud is now running OK. But another issue came up: we find the number of HTTP connections is very high, around 3300, and solrcloud does not release the connections.
> I understand that solrcloud needs to connect to zookeeper and that communication between leader and replica needs connections, but I think the number should not be so huge. Besides, we use the singleton pattern to connect to solrcloud in Java.

Are you referring to the number of http connections in your SolrJ app, or the number of http connections in Solr itself? Hopefully these are being run by completely separate JVMs. Where exactly are you looking when you see 3300 connections?

The connection to Zookeeper does not use HTTP. It is a TCP connection but the protocol is custom. Both Solr and SolrJ will maintain a connection to each of the zookeeper hosts that are in the zkHost string used when they start.

Thanks, Shawn
Re: Upgrading solr 4.10.4 to solr 6.1.0
On 7/12/2016 5:54 PM, Rachid Bouacheria wrote:
> I am running solr 4.10.4 and I would like to upgrade to the latest version, 6.1.0. The documentation I found provides steps to upgrade from 4.10.4 to 5.x, and it seems like going from 4.x to 5.x is quite substantial. Going from 5.x to 6.1.0 seems to be less effort, but still non-negligible.
> I am wondering if anyone has had to do a similar upgrade? If so, how did you do it: upgrade to 5.x and then to 6, or straight from 4.x to 6? Any tips or advice are welcome.

The 6.1.0 version cannot read your 4.x indexes. It can read 5.x and later indexes.

If you can "upgrade" by setting up a new Solr install and reindexing everything, that will always achieve the best results. This is how I do upgrades. There's no need to worry about the old index format at all.

If that's not possible, then you will need to convert your index to 5.x format before upgrading to 6.x. You can do this by upgrading to a 5.x version first and optimizing all your indexes, or you can use the IndexUpgrader tool from Lucene, first from 5.x and then from 6.x, to upgrade your index in stages (see the sketch below).

https://cwiki.apache.org/confluence/display/solr/IndexUpgrader+Tool

Thanks, Shawn
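As a rough sketch of the staged IndexUpgrader approach Shawn describes (untested; jar versions and the index path are illustrative, and the backward-codecs jar is needed to read the previous major version's format):

    # stage 1: rewrite the 4.x index in 5.x format, using Lucene 5.x jars
    java -cp lucene-core-5.5.0.jar:lucene-backward-codecs-5.5.0.jar \
      org.apache.lucene.index.IndexUpgrader -verbose /var/solr/data/mycore/data/index

    # stage 2: rewrite the 5.x index in 6.x format, using Lucene 6.x jars
    java -cp lucene-core-6.1.0.jar:lucene-backward-codecs-6.1.0.jar \
      org.apache.lucene.index.IndexUpgrader -verbose /var/solr/data/mycore/data/index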
Upgrading solr 4.10.4 to solr 6.1.0
Hi All, I am running solr 4.10.4 and I would like to upgrade to the latest version, 6.1.0. The documentation I found provides steps to upgrade from 4.10.4 to 5.x, and it seems like going from 4.x to 5.x is quite substantial. Going from 5.x to 6.1.0 seems to be less effort, but still non-negligible. I am wondering if anyone has had to do a similar upgrade? If so, how did you do it: upgrade to 5.x and then to 6, or straight from 4.x to 6? Any tips or advice are welcome. Thank you all very much!
Update QParserPlugin containing FilteredQuery to Solr 5.5 - HowTo?
Hi, we developed a custom QParserPlugin for Solr 4.3. This QParser compares numeric values in the documents with numeric values in the search query. The first step was to reduce the number of documents by pre-parsing the request and creating a Lucene query:

    final String queryString = "myField:" + preParse(searchString);
    final QParser parser = getParser(queryString, "lucene", getReq());

Then we used a FilteredQuery to select only the matching records (which then get scored):

    this.innerQuery = new MyQuery(new FilteredQuery(parser.parse(), new MyFilter(searchString)));

This worked really well. But now we want to upgrade to Solr 5.5, where FilteredQuery is marked as deprecated. The documentation says: "FilteredQuery will be removed in Lucene 6.0. It should be replaced with a BooleanQuery with one BooleanClause.Occur.MUST clause for the query and one BooleanClause.Occur.FILTER clause for the filter." But I don't know how to do this. Is there any tutorial for this? I would start with:

    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    builder.add(parser.parse(), BooleanClause.Occur.MUST);
    builder.add(new MyFilterQuery(searchString), BooleanClause.Occur.FILTER);
    this.innerQuery = builder.build();

But how do I get MyFilterQuery to filter my results? Thank you for your time and your help! -Oliver
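One interim possibility, assuming MyFilter extends org.apache.lucene.search.Filter: in Lucene 5.x, Filter itself extends Query (deprecated, removed in 6.0), so the existing filter can be added directly as a FILTER clause while MyFilter is being rewritten as a proper Query. An untested sketch:

    // relies on Filter extending Query in Lucene 5.x (gone in 6.0)
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    builder.add(parser.parse(), BooleanClause.Occur.MUST);               // scored clause
    builder.add(new MyFilter(searchString), BooleanClause.Occur.FILTER); // restricts matches, does not score
    this.innerQuery = new MyQuery(builder.build());

The longer-term fix would be to rewrite MyFilter as a Query whose Weight/Scorer performs the per-document numeric check; used in a FILTER clause, it then restricts matches without affecting scores.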
Re: High cpu and gc time when performing optimization.
Heap: start small and increase as necessary. Leave as much RAM as possible for the FS cache; don't give it to the JVM until the JVM starts crying. SPM for Solr will help you see when Solr and the JVM are starting to hurt. Otis

> On Jul 12, 2016, at 11:45, Jason wrote:
>
> I'm using optimize because it's an option for fast search. Our index is updated once or more weekly. If I don't use optimize, many index files will be kept. Any performance issues in that case?
> And I'm wondering about the relation between index file size and heap size. In the case of running as a master server that only updates the index, is there any guide for heap size, including Xmx, NewSize, MaxNewSize, etc.?
>
> Yonik Seeley wrote
>> Optimize is a very expensive operation. It involves reading the entire index and merging and rewriting it as a single segment. If you find it too expensive, do it less often, or don't do it at all. It's an optional operation.
>> -Yonik
>>
>> On Mon, Jul 11, 2016 at 10:19 PM, Jason wrote:
>>> hi, all.
>>> I'm running a solr instance with two cores and a JVM max heap of 32G. Each core index size is 68G and 61G respectively. I always run an optimization after updating the index. BTW, last week the document update completed but the optimize-phase cpu was very high. I think that is because of long gc time. How should I solve this problem? Any idea is welcome. thanks,
>>> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286796.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: High cpu and gc time when performing optimization.
It's more a matter of "is unoptimized fast enough"? If so, why bother? The background merging will keep segment counts relatively reasonable. If you're updating your index only once a week, it's reasonable to optimize. Anecdotal reports are on the order of a 10% speedup _at best_.

As Yonik says, optimizing is expensive. You'll have to evaluate whether that expense is worth it in your case; there's no universal answer.

Best, Erick

On Tue, Jul 12, 2016 at 8:45 AM, Jason wrote:
> I'm using optimize because it's an option for fast search. Our index is updated once or more weekly. If I don't use optimize, many index files will be kept. Any performance issues in that case?
> And I'm wondering about the relation between index file size and heap size. In the case of running as a master server that only updates the index, is there any guide for heap size, including Xmx, NewSize, MaxNewSize, etc.?
>
> Yonik Seeley wrote
>> Optimize is a very expensive operation. It involves reading the entire index and merging and rewriting it as a single segment. If you find it too expensive, do it less often, or don't do it at all. It's an optional operation.
>> -Yonik
>>
>> On Mon, Jul 11, 2016 at 10:19 PM, Jason wrote:
>>> hi, all.
>>> I'm running a solr instance with two cores and a JVM max heap of 32G. Each core index size is 68G and 61G respectively. I always run an optimization after updating the index. BTW, last week the document update completed but the optimize-phase cpu was very high. I think that is because of long gc time. How should I solve this problem? Any idea is welcome. thanks,
>>> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286796.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching Home's, Homes and Home
Copy in your analyzer from your schema.xml.
--
*John Blythe*
Product Manager & Lead Developer
251.605.3071 | j...@curvolabs.com
www.curvolabs.com
58 Adams Ave Evansville, IN 47713

On Tue, Jul 12, 2016 at 8:10 AM, Surender wrote:
> Hi,
> I have checked the results and I am not getting the desired results. Please suggest.
> Thanks, Surender Singh
> -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286757.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching Home's, Homes and Home
Hi, I have checked the results and I am not getting the desired results. Please suggest. Thanks, Surender Singh -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286757.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: High cpu and gc time when performing optimization.
I'm using optimize because it's an option for fast search. Our index is updated once or more weekly. If I don't use optimize, many index files will be kept. Any performance issues in that case?

And I'm wondering about the relation between index file size and heap size. In the case of running as a master server that only updates the index, is there any guide for heap size, including Xmx, NewSize, MaxNewSize, etc.?

Yonik Seeley wrote
> Optimize is a very expensive operation. It involves reading the entire index and merging and rewriting it as a single segment. If you find it too expensive, do it less often, or don't do it at all. It's an optional operation.
> -Yonik
>
> On Mon, Jul 11, 2016 at 10:19 PM, Jason wrote:
>> hi, all.
>> I'm running a solr instance with two cores and a JVM max heap of 32G. Each core index size is 68G and 61G respectively. I always run an optimization after updating the index. BTW, last week the document update completed but the optimize-phase cpu was very high. I think that is because of long gc time. How should I solve this problem? Any idea is welcome. thanks,
>> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286796.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: High cpu and gc time when performing optimization.
Please let me know the address of the guide/reference that says a reasonable index size is around 15G. -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286790.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrcloud consumes more time than solr when write index
Well, two thoughts:

1. If you’re not using solrcloud, presumably you don’t have any replicas. If you are, presumably you do. This makes for a biased comparison, because SolrCloud won’t acknowledge a write until it’s been safely written to all replicas. In short, solrcloud write time is max(per-replica write time). The more replicas you add, the bigger the chance that some replica randomly takes longer (gc pause, perhaps?), and the longer your overall write time, assuming a fixed number of indexing threads.
2. The parallelism of the optimize operation across replicas has gone back and forth a bit, and I’m not sure what it was doing in 4.9. However, at one point the optimize happened per-replica, serially. So it’d do shard1_replica1, then when that was done, shard1_replica2, then shard2_replica1, etc. Other versions of Solr would do those at the same time. Again, I don’t know if you’re comparing to a non-replicated solr index, but that could explain some of the difference.

There’s a sort of an obligatory comment at this point that optimize doesn’t necessarily save you a lot. There are certainly cases where it does, but if you haven’t already, you’ll want to validate that you have one of them and that you’re not just doing unnecessary work.

On 7/12/16, 7:41 AM, "Kent Mu" wrote:
> hello, has anybody else come across this issue? can anybody help me?
>
> 2016-07-11 23:17 GMT+08:00 Kent Mu:
>> Hi friends!
>> solr version: 4.9.0.
>> We use solr and solrcloud in our project; that means we use solr and solrcloud at the same time. But we find that solrcloud consumes more time than solr when writing the index; it takes nearly 5 or more times longer. I wonder why that is?
>> In our project, we have a scheduled job to add index data, and we then execute the method "optimize(false, true, 2)" to optimize the added index. I wonder if it is caused by solrcloud internals: when writing the index, solrcloud needs to judge which shard the document should be stored in, and when optimizing, the replica needs some time to synchronize the data from the leader?
>> And what about queries? Will solrcloud also take more time than solr when querying data?
Re: Return docs with only the matched fields for a query
I’m not sure you need a custom component. Try using the standard highlighter. Configure hl.simple.pre and hl.simple.post to be empty strings. Configure it to return one maximum-length snippet. That should return the entire matching fields, though I haven’t tested it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jul 12, 2016, at 1:24 AM, Prasanna Josium wrote:
>
> Hi all,
> My requirement is in line with https://issues.apache.org/jira/browse/SOLR-3955. I'm working on a project that has very low network bandwidth for the clients. I'm using Solr 4.10.
>
> The problem: I have ~1M documents with multiple fields (~50); many of them are indexed and stored, and some of them are multivalued. Queries are searched across all these fields, and often only a few of the fields have matching terms in them. When I search for a term like "Hobbit", I want to return documents with only the matching fields where "Hobbit" is found. All other unmatched fields in the result doc shall be dropped from the result set.
>
> Naïve solution: The obvious solution I could think of was to implement a custom search component based on the "Highlighter" component to filter out the unwanted fields. But I'm not sure of the performance penalty for a large number of fields or many multivalued fields per document.
>
> Question: Is there a better way to solve this problem? Apparently I'm not the first person facing such an issue.
>
> Thanks, Cheers, Prasanna
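A sketch of the parameters Walter describes, equally untested (hl.fragsize=0 asks the standard highlighter to return the whole field value as a single snippet; hl.requireFieldMatch=true additionally limits highlighting to fields that actually matched the query):

    q=Hobbit
    hl=true
    hl.fl=*
    hl.snippets=1
    hl.fragsize=0
    hl.requireFieldMatch=true
    hl.simple.pre=
    hl.simple.post=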
Re: solrcloud consumes more time than solr when write index
Hello, has anybody else come across this issue? Can anybody help me?

2016-07-11 23:17 GMT+08:00 Kent Mu:
> Hi friends!
> solr version: 4.9.0.
> We use solr and solrcloud in our project; that means we use solr and solrcloud at the same time. But we find that solrcloud consumes more time than solr when writing the index; it takes nearly 5 or more times longer. I wonder why that is?
> In our project, we have a scheduled job to add index data, and we then execute the method "optimize(false, true, 2)" to optimize the added index. I wonder if it is caused by solrcloud internals: when writing the index, solrcloud needs to judge which shard the document should be stored in, and when optimizing, the replica needs some time to synchronize the data from the leader?
> And what about queries? Will solrcloud also take more time than solr when querying data?
Re: sorlcloud connection issue
Dear Mr. Heisey, we have configured maxThreads in JBoss, and the good news is that solrcloud is now running OK. But another issue came up: we find the number of HTTP connections is very high, around 3300, and solrcloud does not release the connections. I understand that solrcloud needs to connect to zookeeper and that communication between leader and replica needs connections, but I think the number should not be so huge. Besides, we use the singleton pattern to connect to solrcloud in Java. Looking forward to your reply. Thanks!

2016-07-07 22:24 GMT+08:00 Shawn Heisey:
> On 7/6/2016 5:26 AM, Kent Mu wrote:
> > Hi friends! *solr version: 4.9.0*
> > I came across a problem when using solrcloud; it becomes deadlocked. We got the java core log, and it looks like the http connection pool is exhausted and most threads are waiting to get a free connection.
> > I posted the problem in JIRA; the link is https://issues.apache.org/jira/browse/SOLR-9253
> > I have increased the http connection defaults for the SolrJ client, and also configured the connection defaults in solr.xml for all shard servers as below.
> >
> > <shardHandlerFactory class="HttpShardHandlerFactory">
> >   6
> >   3
> >   1
> >   500
> > </shardHandlerFactory>
>
> I can see JBoss classes in the thread dump that was added to SOLR-9253. That thread dump shows 213 threads in the RUNNABLE state, and 507 in the WAITING state. I do not think you are running into the configured shard handler limits. I think your container is not allowing enough Solr threads to run.
>
> Just like Tomcat and Jetty, JBoss has a "maxThreads" setting that defaults to 200. Increasing this setting is critical for scalability when using a third-party container. I recommend 10000 -- which is the setting you'll find in the Jetty that's included with Solr.
>
> Note that if you upgrade Solr to 5.x or 6.x, running in JBoss will no longer be a supported configuration.
> https://wiki.apache.org/solr/WhyNoWar
>
> Thanks, Shawn
Re: High cpu and gc time when performing optimization.
Optimize is a very expensive operation. It involves reading the entire index and merging and rewriting it as a single segment. If you find it too expensive, do it less often, or don't do it at all. It's an optional operation.

-Yonik

On Mon, Jul 11, 2016 at 10:19 PM, Jason wrote:
> hi, all.
> I'm running a solr instance with two cores and a JVM max heap of 32G. Each core index size is 68G and 61G respectively. I always run an optimization after updating the index. BTW, last week the document update completed but the optimize-phase cpu was very high. I think that is because of long gc time. How should I solve this problem? Any idea is welcome. thanks,
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Post Filter length & performance
You should be able to send a POST to Solr, which would work with larger requests.

Post filter performance is driven by three things:
1) How much overhead is involved in handling the fq parameter, turning it into data structures, etc.
2) How many documents the post filter needs to look at.
3) How fast the filter is for each document.

If you have large result sets, you'll need to optimize #3 (see the sketch below).

Joel Bernstein http://joelsolr.blogspot.com/

On Tue, Jul 12, 2016 at 8:06 AM, Vasu Y wrote:
> Hi,
> I am implementing a custom post filter for permission checks along the lines described by Erik at https://lucidworks.com/blog/2012/02/22/custom-security-filtering-in-solr/
>
> Is there a limit to the length (number of characters) of the custom post filter? In our case, the length of this "fq" could be up to a maximum of 15000 characters.
> Also, if the post filter does not access any external system (no DB access and no REST/web-service calls) and only looks up about 4 field values (for each document) against the passed "fq" values (stored in a couple of HashSets), would the performance degrade significantly (I do understand there will be some cost) compared to not applying the security filter?
>
> Thanks, Vasu
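For reference, a minimal skeleton in the style of that Lucidworks post (the class name, the "acl" field, and the stored-field lookup are illustrative only; a production version would read per-segment docValues instead of stored fields and would also implement equals/hashCode):

    import java.io.IOException;
    import java.util.Set;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.solr.search.DelegatingCollector;
    import org.apache.solr.search.ExtendedQueryBase;
    import org.apache.solr.search.PostFilter;

    public class PermissionPostFilter extends ExtendedQueryBase implements PostFilter {
      private final Set<String> allowedValues; // parsed from the fq parameter

      public PermissionPostFilter(Set<String> allowedValues) {
        this.allowedValues = allowedValues;
        setCache(false); // post filters must not be cached
        setCost(100);    // cost >= 100 marks this as a post filter
      }

      @Override
      public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
          @Override
          public void collect(int doc) throws IOException {
            // context is the current leaf reader context, set by Solr
            String value = context.reader().document(doc).get("acl");
            if (value != null && allowedValues.contains(value)) {
              super.collect(doc); // only matching docs reach the next collector
            }
          }
        };
      }
    }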
Re: High cpu and gc time when performing optimization.
As I said before, we also came across this issue, and I was just guessing at the possible reason; let's wait for an expert to explain it for us. On the other hand, I see that your index data is 68G, which is too large; I recommend you use solrcloud, since as per the guide/reference a reasonable size is around 15G. Our project now uses solr and solrcloud together, so that if either one goes down or has other issues, we can switch to the well-running one.

2016-07-12 17:02 GMT+08:00 Jason:
> hi, Kent, thanks for your reply.
> I think I need to explain my server status in more detail. I'm using solr 4.2.1 and the master-slave replication model. On the master server many solr (tomcat) instances are running (the server has 64 cores and 128G of RAM). Currently 4 solr (tomcat) instances are running, allocated 32, 16, 16, and 8G max heap respectively.
> When cpu is high during the optimize phase, the load average is almost over 100, and the high cpu time lasts very long (over 5 hours). Besides, the processes of the other solr (tomcat) instances also use high cpu, even though I was not operating on those instances. So I tried stopping the other instances and running just one instance, but cpu is still high. I don't know what I should do.
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286733.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: group.facet=true and facet on field of type int -> org.apache.solr.common.SolrException: Exception during facet.field
Hi all, Tested on Solr 6.1.0 (as well as 5.4.0 and 5.5.0) using the "techproducts" example, the following query throws the same exception as in my original question. To reproduce:

1) Set up the techproducts example: solr start -e techproducts -noprompt
2) Go to Solr Admin: http://localhost:8983/solr/#/techproducts/query
3) In "Raw Query Parameters" enter: group=true&group.facet=true&facet=true&group.field=manu_id_s&facet.field=popularity
4) Hit "Execute Query"

[..]
"error":{
  "metadata":[
    "error-class","org.apache.solr.common.SolrException",
    "root-error-class","java.lang.IllegalStateException"],
  "msg":"Exception during facet.field: popularity",
  "trace":"org.apache.solr.common.SolrException: Exception during facet.field: popularity
    at org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$50(SimpleFacets.java:739)
    at org.apache.solr.request.SimpleFacets$$Lambda$37/2022187546.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:672)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:748)
    at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:321)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:265)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:293)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
    ... (Jetty request-handling frames elided) ...
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'popularity' (expected=SORTED). Use UninvertingReader or index with docvalues.
    at org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
    at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
    at org.apache.lucene.search.grouping.term.TermGroupFacetCollector$SV.doSetNextReader(TermGroupFacetCollector.java:129)
    at org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:660)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)
    at org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:638)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:443)
    at ...
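Judging from the root cause ("unexpected docvalues type NUMERIC ... expected=SORTED"), the grouped facet collector appears to want string-like (SORTED) doc values, so one untested workaround might be to facet on a string copy of the numeric field, e.g. in schema.xml:

    <field name="popularity_str" type="string" indexed="true" stored="false" docValues="true"/>
    <copyField source="popularity" dest="popularity_str"/>

and then facet with facet.field=popularity_str instead of the int field.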
Re: Searching Home's, Homes and Home
Hi Surender, Can you share your current field configuration so that we can debug from there? Please share your field and fieldType definitions from schema.xml. -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286768.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching Home's, Homes and Home
Or you can build a file called synonym.txt in the conf directory of your core.

On 11 Jul 2016 at 17:06, "Surender" wrote:
> Thanks...
> I am applying these filters and will share an update on this issue. It will take a couple of days.
> Thanks, Surender Singh
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341p4286579.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Custom Post Filter length & performance
Hi, I am implementing a custom post filter for permission checks along the lines described by Erik at https://lucidworks.com/blog/2012/02/22/custom-security-filtering-in-solr/

Is there a limit to the length (number of characters) of the custom post filter? In our case, the length of this "fq" could be up to a maximum of 15000 characters.

Also, if the post filter does not access any external system (no DB access and no REST/web-service calls) and only looks up about 4 field values (for each document) against the passed "fq" values (stored in a couple of HashSets), would the performance degrade significantly (I do understand there will be some cost) compared to not applying the security filter?

Thanks, Vasu
Re: Multilevel grouping?
I started this a while ago, but haven't found the time to finish it: https://issues.apache.org/jira/browse/SOLR-7830

-Yonik

On Tue, Jul 12, 2016 at 7:29 AM, Aditya Sundaram wrote:
> Does solr support multilevel grouping? I want to group up to 2 or 3 levels based on different fields, i.e. first group on field one, then within each group, group by field two, etc.
> I am aware of facet.pivot, which does the same but retrieves only the count. Is there any way to get the documents as well as the count in facet.pivot?
>
> --
> Aditya Sundaram
Multilevel grouping?
Does solr support multilevel grouping? I want to group up to 2 or 3 levels based on different fields, i.e. first group on field one, then within each group, group by field two, etc. I am aware of facet.pivot, which does the same but retrieves only the count. Is there any way to get the documents as well as the count in facet.pivot? -- Aditya Sundaram
Re: Return docs with only the matched fields for a query
Hi Josium, You could try something like this:

http://localhost:8983/solr/mycollection/select?fq=Hobbit:*&indent=on&q=*:*&wt=json

This will return only documents that contain the field Hobbit. Well, I'm not quite sure I understand what you are seeking; excuse me if my answer is off topic. Best regards, Cole.

On Tue, Jul 12, 2016 at 9:24 AM, Prasanna Josium <prasanna.jos...@clustr.co.in> wrote:
> Hi all,
> My requirement is in line with https://issues.apache.org/jira/browse/SOLR-3955. I'm working on a project that has very low network bandwidth for the clients. I'm using Solr 4.10.
>
> The problem: I have ~1M documents with multiple fields (~50); many of them are indexed and stored, and some of them are multivalued. Queries are searched across all these fields, and often only a few of the fields have matching terms in them. When I search for a term like "Hobbit", I want to return documents with only the matching fields where "Hobbit" is found. All other unmatched fields in the result doc shall be dropped from the result set.
>
> Naïve solution: The obvious solution I could think of was to implement a custom search component based on the "Highlighter" component to filter out the unwanted fields. But I'm not sure of the performance penalty for a large number of fields or many multivalued fields per document.
>
> Question: Is there a better way to solve this problem? Apparently I'm not the first person facing such an issue.
>
> Thanks, Cheers, Prasanna
Re: High cpu and gc time when performing optimization.
hi, Kent, thanks for your reply.

I think I need to explain my server status in more detail. I'm using solr 4.2.1 and the master-slave replication model. On the master server many solr (tomcat) instances are running (the server has 64 cores and 128G of RAM). Currently 4 solr (tomcat) instances are running, allocated 32, 16, 16, and 8G max heap respectively.

When cpu is high during the optimize phase, the load average is almost over 100, and the high cpu time lasts very long (over 5 hours). Besides, the processes of the other solr (tomcat) instances also use high cpu, even though I was not operating on those instances. So I tried stopping the other instances and running just one instance, but cpu is still high. I don't know what I should do.

-- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704p4286733.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: High cpu and gc time when performing optimization.
We also came across this issue. I think it is not caused by gc time but by the optimize action itself. Though I have not read the source code, I think that when the master optimizes the index, internally it produces the replication files, and the replicas then synchronize those files from the master, just like DB master/slave theory; this consumes a lot of CPU and the IO will be very high. But it is OK, and it will just take some time.

2016-07-12 10:19 GMT+08:00 Jason:
> hi, all.
> I'm running a solr instance with two cores and a JVM max heap of 32G. Each core index size is 68G and 61G respectively. I always run an optimization after updating the index. BTW, last week the document update completed but the optimize-phase cpu was very high. I think that is because of long gc time. How should I solve this problem? Any idea is welcome. thanks,
>
> -- View this message in context: http://lucene.472066.n3.nabble.com/High-cpu-and-gc-time-when-performing-optimization-tp4286704.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Return docs with only the matched fields for a query
Hi all, My requirement is in line with https://issues.apache.org/jira/browse/SOLR-3955. I'm working on a project that has very low network bandwidth for the clients. I'm using Solr 4.10.

The problem: I have ~1M documents with multiple fields (~50); many of them are indexed and stored, and some of them are multivalued. Queries are searched across all these fields, and often only a few of the fields have matching terms in them. When I search for a term like "Hobbit", I want to return documents with only the matching fields where "Hobbit" is found. All other unmatched fields in the result doc shall be dropped from the result set.

Naïve solution: The obvious solution I could think of was to implement a custom search component based on the "Highlighter" component to filter out the unwanted fields. But I'm not sure of the performance penalty for a large number of fields or many multivalued fields per document.

Question: Is there a better way to solve this problem? Apparently I'm not the first person facing such an issue.

Thanks, Cheers, Prasanna