Re: DIH - incorrect datasource being picked up by XPathEntityProcessor
Thanks Gora, I tried that, but it didn't help. Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-incorrect-datasource-being-picked-up-by-XPathEntityProcessor-tp3994802p3995211.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: are stopwords indexed?
Look at the index with the Schema Browser in the Solr UI. This pulls the terms for each field.

On Sun, Jul 15, 2012 at 8:38 PM, Giovanni Gherdovich g.gherdov...@gmail.com wrote:

Hi all, are stopwords from the stopwords.txt config file supposed to be indexed? I would say no, but this is the situation I am observing on my Solr instance:

* I have a bunch of stopwords in stopwords.txt
* my fields are of fieldType "text" from the example schema.xml, i.e. I have

--8<-- --8<-- --8<-- --8<--
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_FR.txt" enablePositionIncrements="true"/>
    [...]
  </analyzer>
  <analyzer type="query">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_FR.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
--8<-- --8<-- --8<-- --8<--

* searching for a stopword through Solr always gives zero results
* inspecting the index with LuCLI http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html shows that all stopwords are in my index. Note that I query LuCLI specifying the field, i.e. with myFieldName:and, and not just with the stopword and.

Is this normal? Are stopwords indexed?

Cheers, Giovanni

--
Lance Norskog
goks...@gmail.com
Re: Solr facet multiple constraint
OK, I added the debug parameter; here is the query and the response after executing it:

facet=true,sort=publishingdate desc,debugQuery=true,facet.mincount=1,q=service:1 AND publicationstatus:LIVE,facet.field=pillar,wt=javabin,fq=(((pillar:10))),version=2}},
response={numFound=2,start=0,docs=[SolrDocument[{uniquenumber=UniqueNumber1, name=Doc 1, publicationstatus=LIVE, service=1, servicename=service_1, pillar=[10], region=EU, regionname=Europe, documenttype=TRACKER, publishingdate=Sun Jul 15 09:03:32 CEST 2012, publishingyear=2012, teasersummary=Seo_Description, content=answer, creator=chandan, version=1, documentinstanceid=1}], SolrDocument[{uniquenumber=UniqueNumber2, name=Doc 2, publicationstatus=LIVE, service=1, servicename=service_1, pillar=[10], region=EU, regionname=Europe, documenttype=TRACKER, publishingdate=Sat Jul 14 09:03:32 CEST 2012, publishingyear=2012, teasersummary=Seo_Description, content=answer, creator=chandan, version=1, documentinstanceid=1}]]},
facet_counts={facet_queries={},facet_fields={pillar={10=2}},facet_dates={},facet_ranges={}},
debug={rawquerystring=service:1 AND publicationstatus:LIVE,querystring=service:1 AND publicationstatus:LIVE,parsedquery=+service:1 +publicationstatus:LIVE,parsedquery_toString=+service:1 +publicationstatus:LIVE,
explain={UniqueNumber1= 1.2917422 = (MATCH) sum of: 0.7741482 = (MATCH) weight(service:1 in 0), product of: 0.7741482 = queryWeight(service:1), product of: 1.0 = idf(docFreq=4, maxDocs=5) 0.7741482 = queryNorm 1.0 = (MATCH) fieldWeight(service:1 in 0), product of: 1.0 = tf(termFreq(service:1)=1) 1.0 = idf(docFreq=4, maxDocs=5) 1.0 = fieldNorm(field=service, doc=0) 0.517594 = (MATCH) weight(publicationstatus:LIVE in 0), product of: 0.6330043 = queryWeight(publicationstatus:LIVE), product of: 0.81767845 = idf(docFreq=5, maxDocs=5) 0.7741482 = queryNorm 0.81767845 = (MATCH) fieldWeight(publicationstatus:LIVE in 0), product of: 1.0 = tf(termFreq(publicationstatus:LIVE)=1) 0.81767845 = idf(docFreq=5, maxDocs=5) 1.0
= fieldNorm(field=publicationstatus, doc=0) ,UniqueNumber2= 1.2917422 = (MATCH) sum of: 0.7741482 = (MATCH) weight(service:1 in 0), product of: 0.7741482 = queryWeight(service:1), product of: 1.0 = idf(docFreq=4, maxDocs=5) 0.7741482 = queryNorm 1.0 = (MATCH) fieldWeight(service:1 in 0), product of: 1.0 = tf(termFreq(service:1)=1) 1.0 = idf(docFreq=4, maxDocs=5) 1.0 = fieldNorm(field=service, doc=0) 0.517594 = (MATCH) weight(publicationstatus:LIVE in 0), product of: 0.6330043 = queryWeight(publicationstatus:LIVE), product of: 0.81767845 = idf(docFreq=5, maxDocs=5) 0.7741482 = queryNorm 0.81767845 = (MATCH) fieldWeight(publicationstatus:LIVE in 0), product of: 1.0 = tf(termFreq(publicationstatus:LIVE)=1) 0.81767845 = idf(docFreq=5, maxDocs=5) 1.0 = fieldNorm(field=publicationstatus, doc=0) },QParser=LuceneQParser,filter_queries=[(((pillar:10)))

As you can see, in this request I'm filtering on pillar, not on user. Thanks for all, David. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974p3995215.html
Re: Solr - Spatial Search for Specif Areas on Map
David, thanks for such a detailed response. The data volume I mentioned is the total set of records we have, but we would never need to search the entire base in one query; we would divide the data by region or zip code. So, in that case, I assume that for a single region we would not have more than 200M records (this is real; we have a region with that many records). So, can I assume that I can create shards based on regions and the requests would get distributed among these region servers? You also mentioned ~20 concurrent queries per shard - do you have links to some benchmarks? I am very interested in the hardware sizing details for such a setup. About setting up Solr for a single shard, I think I will go by your advice. Will see how much a single shard can handle on a decent machine :) The reason I came up with that figure was: I have a user base of 500k, and there's a lot of activity which would happen on the map - every time someone moves the tiles, zooms in/out, or scrolls, we are going to send a server-side request to fetch some data (I agree we can benefit a lot from caching, but I believe Solr itself has its own local cache). I might be a bit unrealistic with my 10K rps projections, but I have read about 9K rps to map servers from some sources on the internet. And, no, I don't work for Google :) But who knows, we might be building something that can get that much traffic in a while. :D BTW, my question still remains - can we do search on polygonal areas on the map? If so, do you have any link where I can get more details? The bounding-box approach won't work for me, I guess :( Sam -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995209.html
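On the polygon question above: whatever layer ends up doing the filtering, the per-point membership test it ultimately needs is the standard even-odd ray-casting check. A minimal sketch follows (illustrative only, not Solr code; the polygon and points are made up):

```python
# Even-odd ray casting: cast a ray to the right of the point and count
# how many polygon edges it crosses; an odd count means "inside".
def point_in_polygon(pt, poly):
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

A bounding box is just the cheap special case of this test, which is why engines often pre-filter by box and only then run the exact polygon check.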
Re: 4.0-ALPHA for general development use?
OK: that is helpful, thanks! On 13 July 2012 15:44, Mark Miller markrmil...@gmail.com wrote: It really comes down to you. Many people run a trunk version of Solr in production. Some never would. Generally, bugs are fixed quickly, and trunk is pretty stable. The main issue is index format changes and upgrades. If you use trunk you generally have to be willing to reindex to upgrade. That's one nice thing about this Alpha - we are saying that unless there is a really bad bug, you will be able to upgrade to future versions without reindexing. Most of the code itself has been in dev and use for years - so it's not so risky in my opinion. It's almost more about Java APIs and what not than code stability when we say Alpha. In fact, just read this http://www.lucidimagination.com/blog/2012/07/03/4-0-alpha-whats-in-a-name/ That should help clarify what this release is. On Fri, Jul 13, 2012 at 6:51 AM, John Field jfi...@astreetpress.com wrote: Hi, we are considering a long-term project (likely lifecycle of several years) with an initial production release in approximately three months. We're intending to use Solr 3.6.0, with a view for upgrading to 4.0 upon stable release. However, http://lucene.apache.org/solr/ now has 4.0-ALPHA as the main download, implying this version is for general use. But on the other hand, the release notes state This is an alpha release for early adopters. and http://wiki.apache.org/solr/Solr4.0 gives a timescale of 60 days minimum before final release. We'd like to use 4.0 features such as near real-time updates, but haven't identified these as must-haves for the initial release. Given that our first production release is likely to occur a month after that 60 days, is 4.0-ALPHA suitable for general product development, or is it recommended to stick with 3.6.0 and accept an upgrade cost when 4.0 is stable? (Perhaps this hinges on understanding why 4.0-ALPHA is now the main download option). Thanks. 
-- - Mark http://www.lucidimagination.com -- John Field, Software Architect http://www.alexanderstreet.com - Alexander Street Press, world-leading digital humanities publisher.
Re: Computed fields - can I put a function in fl?
Yes, sorry, just a typo. I meant q=*:*&fq=&start=0&rows=10&qt=&wt=&explainOther=&fl=product:(if(show_product:true, product, ) thanks

On Sat, Jul 14, 2012 at 11:27 PM, Erick Erickson [via Lucene] wrote:

I think in 4.0 you can, but not 3.x as I remember. Your example has the fl as part of the highlight though, is that a typo? Best Erick

On Fri, Jul 13, 2012 at 5:21 AM, maurizio1976 wrote:

Hi, I have 2 fields, one containing a string (product) and another containing a boolean (show_product). Is there a way of returning the product field with a value of null when the show_product field is false? I can make another field (product_computed) and index that with null where I need it, but I would like to understand if there is a better approach, like putting a function query in the fl to make a computed field. Something like: q=*:*&fq=&start=0&rows=10&fl=&qt=&wt=&explainOther=&hl.fl=/*product:(if(show_product:true, product, )*/ which obviously doesn't work. Thanks for any help, Maurizio -- View this message in context: http://lucene.472066.n3.nabble.com/Computed-fields-can-I-put-a-function-in-fl-tp3994799.html

-- View this message in context: http://lucene.472066.n3.nabble.com/Computed-fields-can-I-put-a-function-in-fl-tp3994799p3995218.html
Re: Computed fields - can I put a function in fl?
On Mon, Jul 16, 2012 at 4:43 AM, maurizio1976 maurizio.picc...@gmail.com wrote: Yes, sorry, just a typo. I meant q=*:*&fq=&start=0&rows=10&qt=&wt=&explainOther=&fl=product:(if(show_product:true, product, ) thanks

Functions normally derive their values from the FieldCache... there isn't currently a function to load stored fields (e.g. your product field), but it's not a bad idea (given this use case). Here's an example with the exampledocs that shows IN_STOCK_PRICE only if the item is in stock, and otherwise shows 0. This works because price is a single-valued indexed field that the FieldCache works on.

http://localhost:8983/solr/query?q=*:*&fl=id,inStock,IN_STOCK_PRICE:if(inStock,price,0)

-Yonik http://lucidimagination.com
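As a rough sketch of what that pseudo-field evaluates per document (a client-side Python stand-in with made-up documents, not Solr's implementation):

```python
# Sketch: emulate Solr's if(inStock,price,0) pseudo-field client-side.
# Field names follow the exampledocs schema; the documents themselves
# are made up for illustration.
docs = [
    {"id": "A", "inStock": True, "price": 19.95},
    {"id": "B", "inStock": False, "price": 11.50},
]

def in_stock_price(doc):
    # if(inStock, price, 0): return the price only when in stock
    return doc["price"] if doc.get("inStock") else 0

for doc in docs:
    doc["IN_STOCK_PRICE"] = in_stock_price(doc)
```

The same shape would apply to the show_product/product case, except that product is a stored string rather than a FieldCache-backed value, which is exactly the gap Yonik points out.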
Re: DIH include Fieldset in query
So you want to re-use the same SQL sentence in many entities? Yes. Is it necessary to deploy complete Solr and Lucene for this? -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-include-Fieldset-in-query-tp3994798p3995228.html
Re: are stopwords indexed?
Hi Giovanni, you have entered the stopwords into the stopwords.txt file, right? But in the definition of the field type you are referencing stopwords_FR.txt.

best regards, Michael

On Mon, 16 Jul 2012 05:38:04 +0200, Giovanni Gherdovich g.gherdov...@gmail.com wrote:

Hi all, are stopwords from the stopwords.txt config file supposed to be indexed? I would say no, but this is the situation I am observing on my Solr instance:

* I have a bunch of stopwords in stopwords.txt
* my fields are of fieldType "text" from the example schema.xml, i.e. I have

--8<-- --8<-- --8<-- --8<--
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_FR.txt" enablePositionIncrements="true"/>
    [...]
  </analyzer>
  <analyzer type="query">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_FR.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
--8<-- --8<-- --8<-- --8<--

* searching for a stopword through Solr always gives zero results
* inspecting the index with LuCLI http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html shows that all stopwords are in my index. Note that I query LuCLI specifying the field, i.e. with myFieldName:and, and not just with the stopword and.

Is this normal? Are stopwords indexed?

Cheers, Giovanni
Re: are stopwords indexed?
Hi all, thank you for your replies.

Lance: Look at the index with the Schema Browser in the Solr UI. This pulls the terms for each field.

I did it, and it was the first alarm I got. After the indexing, I went to the schema browser hoping not to see any stopword among the top terms, but... they were all there.

Michael: Hi Giovanni, you have entered the stopwords into the stopwords.txt file, right? But in the definition of the field type you are referencing stopwords_FR.txt.

Good catch Michael, but that's not the problem. In my message I referred to stopwords.txt, but actually my stopwords file is named stopwords_FR.txt, consistent with what I put in my schema.xml.

By the way, your answers make me think that yes, I have a problem: stopwords should not appear in the index. What a weird situation:

* querying Solr for a stopword (say "and") gives me zero results (so, somewhere in the indexing/searching pipeline my stopwords file *is* taken into account)
* checking the index files with LuCLI for the same stopword gives me tons of hits.

cheers, GGhh
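For reference, the asymmetry described above is what you would see if the stop filter runs at query time but the terms got into the index without it (e.g. documents indexed before the filter was configured): the filter drops the stopword tokens before lookup, so a stopword-only query matches nothing even though the terms physically remain in the index. A minimal sketch of query-time stop filtering, with a made-up stopword list standing in for stopwords_FR.txt:

```python
# Sketch of query-time stop filtering: stopword tokens are dropped
# before the index is consulted, so a stopword-only query has nothing
# left to match.
STOPWORDS = {"and", "or", "the"}  # illustrative stand-in list

def stop_filter(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

# An index whose index-time chain did NOT apply the filter would still
# physically contain "and", even though querying for it returns nothing.
query_tokens = stop_filter(["and"])  # empty: nothing to look up
```

If that is the cause, re-indexing after the filter configuration was in place should make the terms disappear from the index as well.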
Grouping performance problem
Hi, is there any way to make grouping searches more efficient? My queries look like:

/select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

For an index with 3 million documents, a query for all docs with group=true takes almost 4000ms. Because the queryResultCache is not used, subsequent queries take a long time as well. When I remove group=true and leave only faceting, the query for all docs takes much less time: ~700ms the first time and only 200ms on later runs, because the queryResultCache is used. So with group=true the query is about 20 times slower than without it. Is there any way to improve performance with grouping? My application needs the grouping feature and all of the queries use it, but their performance is too low for production use. I use Solr 4.x from trunk.

Agnieszka Kukalowicz
Re: DIH - incorrect datasource being picked up by XPathEntityProcessor
Okay... found the problem after some more debugging. I was using a wrong dataSource tag in the data-config.xml; maybe Solr should validate the XML against a schema so these kinds of issues are caught upfront.

wrong: <datasource name="fieldSource" type="FieldReaderDataSource" />
correct: <dataSource name="fieldSource" type="FieldReaderDataSource" />

This resolved the issue. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-incorrect-datasource-being-picked-up-by-XPathEntityProcessor-tp3994802p3995246.html
Re: Facet on all the dynamic fields with *_s feature
Yes, this feature would solve the below problem very neatly. All, is there any approach to achieve this for now? --Rajani

On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky j...@basetechnology.com wrote: The answer appears to be no, but it's good to hear people express an interest in proposed features. -- Jack Krupansky

-Original Message- From: Rajani Maski Sent: Sunday, July 15, 2012 12:02 AM To: solr-user@lucene.apache.org Subject: Facet on all the dynamic fields with *_s feature

Hi All, is this issue fixed in Solr 3.6 or 4.0: faceting on all dynamic fields with facet.field=*_s? Link: https://issues.apache.org/jira/browse/SOLR-247 If it is not fixed, any suggestion on how I can achieve this? My requirement is just the same as this one: http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none Regards Rajani
Re: Facet on all the dynamic fields with *_s feature
You'll have to query the index for its fields, sift out the _s ones, and cache them or something.

On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote: Yes, this feature would solve the below problem very neatly. All, is there any approach to achieve this for now? --Rajani

On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky j...@basetechnology.com wrote: The answer appears to be no, but it's good to hear people express an interest in proposed features. -- Jack Krupansky

-Original Message- From: Rajani Maski Sent: Sunday, July 15, 2012 12:02 AM To: solr-user@lucene.apache.org Subject: Facet on all the dynamic fields with *_s feature

Hi All, is this issue fixed in Solr 3.6 or 4.0: faceting on all dynamic fields with facet.field=*_s? Link: https://issues.apache.org/jira/browse/SOLR-247 If it is not fixed, any suggestion on how I can achieve this? My requirement is just the same as this one: http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none Regards Rajani
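The workaround above (fetch the index's field names, keep the *_s ones, pass them explicitly) can be sketched like this; the field list would in practice come from the index (e.g. via the Luke request handler) and is a hard-coded stand-in here:

```python
# Sketch: collect the dynamic *_s field names and turn them into
# explicit facet.field parameters. The field list is a stand-in for
# whatever the index actually reports.
all_fields = ["id", "title_s", "author_s", "price", "category_s"]

def dynamic_string_fields(fields, suffix="_s"):
    return [f for f in fields if f.endswith(suffix)]

def facet_params(fields):
    return "&".join("facet.field=" + f for f in fields)

params = facet_params(dynamic_string_fields(all_fields))
```

Caching the resulting list, as suggested, avoids re-querying the field names on every request; it would need refreshing whenever new dynamic fields appear.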
Re: Index version on slave incrementing to higher than master
Andrew: I'm not entirely sure that's your problem, but it's the first thing I'd try. As for your config files, see the section Replicating solrconfig.xml here: http://wiki.apache.org/solr/SolrReplication. That at least allows you to centralize separate solrconfigs for master and slave, making promoting a slave to a master a bit easier Best Erick On Sun, Jul 15, 2012 at 2:00 PM, Andrew Davidoff david...@qedmf.net wrote: Erick, Thank you. I think originally my thought was that if I had my slave configuration really close to my master config, it would be very easy to promote a slave to a master (and vice versa) if necessary. But I think you are correct that ripping out from the slave config anything that would modify an index in any way makes sense. I will give this a try very soon. Thanks again. Andy On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson erickerick...@gmail.comwrote: Gotta admit it's a bit puzzling, and surely you want to move to the 3x versions G.. But at a guess, things might be getting confused on the slaves given you have a merge policy on them. There's no reason to have any policies on the slaves; slaves should just be about copying the files from the master, all the policies,commits,optimizes should be done on the master. About all the slave does is copy the current state of the index from the master. So I'd try removing everything but the replication from the slaves, including any autocommit stuff and just let replication do it's thing. And I'd replicate after the optimize if you keep the optimize going. You should end up with one segment in the index after that, on both the master and slave. You can't get any more merged than that. Of course you'll also copy the _entire_ index every time after you've optimized... Best Erick On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff david...@qedmf.net wrote: Hi, I am running solr 1.4.0+ds1-1ubuntu1. 
I have a master server that has a number of solr instances running on it (150 or so), and nightly most of them have documents written to them. The script that does these writes (adds) does a commit and an optimize on the indexes when it's entirely finished updating them, then initiates replication on the slave per instance. In this configuration, the index versions between master and slave remain in sync. The optimize portion, which, again, happens nightly, is taking a lot of time and I think it's unnecessary. I was hoping to stop doing this explicit optimize, and to let my merge policy handle that. However, if I don't do an optimize, and only do a commit before initiating slave replication, some hours later the slave is, for reasons that are unclear to me, incrementing its index version to 1 higher than the master. I am not really sure I understand the logs, but it looks like the incremented index version is the result of an optimize on the slave, but I am never issuing any commands against the slave aside from initiating replication, and I don't think there's anything in my solr configuration that would be initiating this. I do have autoCommit on with maxDocs of 1000, but since I am initiating slave replication after doing a commit on the master, I don't think there would ever be any uncommitted documents on the slave. I do have a merge policy configured, but it's not clear to me that it has anything to do with this. And if it did, I'd expect to see similar behavior on the master (right?). I have included a snippet from my slave logs that shows this issue. In this snippet, index version 1286065171264 is what the master has, and 1286065171265 is what the slave increments itself to, which is then out of sync with the master in terms of version numbers. Nothing that I know of is issuing any commands to the slave at this time. If I understand these logs (I might not), it looks like something issued an optimize that took 1023720ms? Any ideas? Thanks in advance.
Andy

Jul 12, 2012 12:21:14 PM org.apache.solr.update.SolrIndexWriter close
FINE: Closing Writer DirectUpdateHandler2
Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h8,version=1286065171264,generation=620,filenames=[_h6.fnm, _h5.nrm, segments_h8, _h4.nrm, _h5.tii, _h4.tii, _h5.tis, _h4.tis, _h4.fdx, _h5.fnm, _h6.tii, _h4.fdt, _h5.fdt, _h5.fdx, _h5.frq, _h4.fnm, _h6.frq, _h6.tis, _h4.prx, _h4.frq, _h6.nrm, _h5.prx, _h6.prx, _h6.fdt, _h6.fdx]
commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h9,version=1286065171265,generation=621,filenames=[_h7.tis, _h7.fdx, _h7.fnm, _h7.fdt, _h7.prx, segments_h9, _h7.nrm, _h7.tii, _h7.frq]
Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1286065171265
Jul 12, 2012 12:21:14 PM
Re: Query results vs. facets results
Ahhh, you need to look down another few lines. When you specify fq, there should be a section of the debug output like <arr name="filter_queries"> ... </arr> where the array is the parsed form of the filter queries. I was thinking about comparing that with the parsed form of the q parameter in the non-filter case to see what insight one could gain from that. But there's already one difference: when you use *, you get <str name="parsedquery">ID:*</str> Is it possible that you have some documents that do NOT have an ID field? Try *:* rather than just *. I'm guessing that your default search field is ID and you have some documents without an ID field. Not a good guess if ID is your uniqueKey, though. Try q=*:* -ID:* and see if you get 31 docs. Also note that if you _have_ specified ID as your uniqueKey _but_ you didn't re-index afterwards (actually, I'd blow away the entire solrhome/data directory and restart), you may have stale data in there that allowed documents to exist that do not have uniqueKey fields. Best Erick

On Sun, Jul 15, 2012 at 4:49 PM, tudor tudor.zaha...@gmail.com wrote:

Hi Erick, thanks for the reply. The query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

yields this in the debug section:

<lst name="debug">
  <str name="rawquerystring">CITY:MILTON</str>
  <str name="querystring">CITY:MILTON</str>
  <str name="parsedquery">CITY:MILTON</str>
  <str name="parsedquery_toString">CITY:MILTON</str>
  <str name="QParser">LuceneQParser</str>
</lst>

There is no information about grouping.
Second query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields this in the debug section:

<lst name="debug">
  <str name="rawquerystring">*</str>
  <str name="querystring">*</str>
  <str name="parsedquery">ID:*</str>
  <str name="parsedquery_toString">ID:*</str>
  <str name="QParser">LuceneQParser</str>
</lst>

To be honest, these do not tell me too much. I would like to see some information about the grouping, since I believe this is where I am missing something. In the meantime, I have combined the two queries above, hoping to make some sense of the results. The following query filters all the entries with the city name MILTON and groups together the ones with the same ID. Also, the query facets the entries on city, grouping the ones with the same ID, so the result numbers refer to the number of groups.

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields the same (for me perplexing) results:

<lst name="grouped">
  <lst name="ID">
    <int name="matches">284</int>
    <int name="ngroups">134</int>  (i.e., fq says: 134 groups with CITY:MILTON)
...
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
...
    <int name="MILTON">103</int>  (i.e., faceted search says: 103 groups with CITY:MILTON)

I really believe that these different results have something to do with the grouping that Solr does, but I do not know how to dig into this. Thank you and best regards, Tudor -- View this message in context: http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995156.html
Re: Grouping performance problem
Hi Agnieszka, if you don't need the number of groups, you can try leaving out the group.ngroups=true param. In that case Solr apparently skips calculating all groups and delivers results much faster. At least for our application the difference in performance with/without group.ngroups=true is significant (I have to say, we use Solr 3.6). WBR, Pavel

On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl wrote: Hi, is there any way to make grouping searches more efficient? My queries look like: /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1 For an index with 3 million documents, a query for all docs with group=true takes almost 4000ms. Because the queryResultCache is not used, subsequent queries take a long time as well. When I remove group=true and leave only faceting, the query for all docs takes much less time: ~700ms the first time and only 200ms on later runs, because the queryResultCache is used. So with group=true the query is about 20 times slower than without it. Is there any way to improve performance with grouping? My application needs the grouping feature and all of the queries use it, but their performance is too low for production use. I use Solr 4.x from trunk. Agnieszka Kukalowicz
Re: Facet on all the dynamic fields with *_s feature
In this URL - https://issues.apache.org/jira/browse/SOLR-247 - there are patches, and one patch named SOLR-247-FacetAllFields. Will that help me fix this problem? If yes, how do I add it to Solr as a plugin? Thanks, Regards, Rajani

On Mon, Jul 16, 2012 at 5:04 PM, Darren Govoni dar...@ontrenet.com wrote: You'll have to query the index for the fields and sift out the _s ones and cache them or something.

On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote: Yes, this feature would solve the below problem very neatly. All, is there any approach to achieve this for now? --Rajani

On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky j...@basetechnology.com wrote: The answer appears to be no, but it's good to hear people express an interest in proposed features. -- Jack Krupansky

-Original Message- From: Rajani Maski Sent: Sunday, July 15, 2012 12:02 AM To: solr-user@lucene.apache.org Subject: Facet on all the dynamic fields with *_s feature

Hi All, is this issue fixed in Solr 3.6 or 4.0: faceting on all dynamic fields with facet.field=*_s? Link: https://issues.apache.org/jira/browse/SOLR-247 If it is not fixed, any suggestion on how I can achieve this? My requirement is just the same as this one: http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none Regards Rajani
Re: Grouping performance problem
Hi Pavel, I tried with group.ngroups=false but didn't notice a big improvement. The times were still about 4000 ms. It doesn't solve my problem. Maybe this is because of my index type. I have millions of documents but only about 20 000 groups. Cheers Agnieszka 2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com Hi Agnieszka , if you don't need number of groups, you can try leaving out group.ngroups=true param. In this case Solr apparently skips calculating all groups and delivers results much faster. At least for our application the difference in performance with/without group.ngroups=true is significant (have to say, we use Solr 3.6). WBR, Pavel On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl wrote: Hi, Is the any way to make grouping searches more efficient? My queries look like: /select?q=querygroup=truegroup.field=idgroup.facet=truegroup.ngroups=truefacet.field=category1facet.missing=falsefacet.mincount=1 For index with 3 mln documents query for all docs with group=true takes almost 4000ms. Because queryResultCache is not used next queries take a long time also. When I remove group=true and leave only faceting the query for all docs takes much more less time: for first time ~ 700ms and next runs only 200ms because of queryResultCache being used. So with group=true the query is about 20 time slower than without it. Is it possible or is there any way to improve performance with grouping? My application needs grouping feature and all of the queries use it but the performance of them is to low for production use. I use Solr 4.x from trunk Agnieszka Kukalowicz
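For what it's worth, the cost pattern described above is consistent with how a distinct-group count has to work: it needs one full pass over every matching document, so it cannot be cut short by rows=10 the way plain result fetching can. A sketch of that kind of pass (made-up documents and group values, not Solr's collector):

```python
from collections import OrderedDict

# Counting distinct group values requires visiting ALL matches once,
# regardless of how few rows the client asks for.
matches = [{"id": g} for g in ["a", "b", "a", "c", "b", "a"]]

def ngroups(docs, field="id"):
    seen = OrderedDict()
    for doc in docs:  # full pass over the matching docs
        seen[doc[field]] = True
    return len(seen)
```

With millions of matches and only ~20,000 groups, almost all of that pass is spent re-seeing groups that are already known, which may be why dropping ngroups alone changes little here.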
Re: JRockit with SOLR3.4/3.5
Michael, thanks for the response. Below is the stack trace. Note: our environment is 64-bit, the initial pool size is set to 4GB, and the max pool size is 12GB, so it doesn't make sense why it tries to allocate 24GB (even though that much is available, as the total RAM is 64GB). This issue doesn't occur with Solr 1.4.

SEVERE: Error waiting for multi-thread deployment of directories to completehostConfig.deployWar=Deploying web application archive {0} java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: classblock allocation, 1535880 loaded, 1536K footprint, in check_alloc (src/jvm/model/classload/classalloc.c:215). Attempting to allocate 24000M bytes There is insufficient native memory for the Java Runtime Environment to continue. Possible reasons: The system is out of physical RAM or swap space In 32 bit mode, the process size limit was hit Possible solutions: Reduce memory load on the system Increase physical memory or swap space Check if swap backing store is full Use 64 bit Java on a 64 bit OS Decrease Java heap size (-Xmx/-Xms) Decrease number of Java threads Decrease Java thread stack sizes (-Xss) Disable compressed references (-XXcompressedRefs=false) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.catalina.startup.HostConfig.deployDirectories( HostConfig.java:1018) at org.apache.catalina.startup.HostConfig.deployApps( HostConfig.java:475) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1412) at org.apache.catalina.startup.HostConfig.lifecycleEvent( HostConfig.java:312) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent( LifecycleSupport.java:119) at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent( LifecycleBase.java:91) at org.apache.catalina.util.LifecycleBase.setStateInternal( LifecycleBase.java:401) at org.apache.catalina.util.LifecycleBase.setState( LifecycleBase.java:346) at
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1117)
at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:782)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1526)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1515)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:139)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:909)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: classblock allocation, 1535880 loaded, 1536K footprint, in check_alloc (src/jvm/model/classload/classalloc.c:215). Attempting to allocate 24000M bytes
There is insufficient native memory for the Java Runtime Environment to continue.
Possible reasons: The system is out of physical RAM or swap space; In 32 bit mode, the process size limit was hit.
Possible solutions: Reduce memory load on the system; Increase physical memory or swap space; Check if swap backing store is full; Use 64 bit Java on a 64 bit OS; Decrease Java heap size (-Xmx/-Xms); Decrease number of Java threads; Decrease Java thread stack sizes (-Xss); Disable compressed references (-XXcompressedRefs=false)
at sun.misc.Unsafe.defineClass(Native Method)
at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:45)
at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:381)
at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:377)
at sun.reflect.MethodAccessorGenerator.generateConstructor(MethodAccessorGenerator.java:76)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:30)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at javax.xml.parsers.FactoryFinder.newInstance(FactoryFinder.java:147)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:233)
at javax.xml.parsers.SAXParserFactory.newInstance(SAXParserFactory.java:128)
at org.apache.tomcat.util.digester.Digester.getFactory(Digester.java:470)
at org.apache.tomcat.util.digester.Digester.getParser(Digester.java:677)
at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:780)
at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:320)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at
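Several of the JVM's suggested remedies above (decrease the number of Java threads, decrease -Xss) work because thread stacks are allocated from native memory, outside the -Xmx heap. A rough sketch of that arithmetic; the 5000-thread and 512 KB figures are purely illustrative, not taken from this environment:

```python
def thread_stack_bytes(threads: int, xss_kb: int) -> int:
    """Approximate native memory consumed by Java thread stacks alone:
    one -Xss-sized stack per live thread, allocated outside the -Xmx heap."""
    return threads * xss_kb * 1024

# Illustrative: 5000 live threads with a hypothetical 512 KB stack size
footprint = thread_stack_bytes(5000, 512)  # roughly 2.4 GB of native memory
```

This is why an OutOfMemoryError can appear even when the heap itself has headroom: the native side runs out first as thread counts spike.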
Re: Lost answers?
Hello, Bruno, No, 4 simultaneous requests should not be a problem. Have you checked the Tomcat logs or logged the data in the query response object to see if there are any clues to what the problem might be? Michael Della Bitta Appinions, Inc. -- Where Influence Isn't a Game. http://www.appinions.com On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina bmann...@free.fr wrote: I forgot: I do the request on the uniqueKey field, so each request gets one document On 15/07/2012 14:11, Bruno Mannina wrote: Dear Solr Users, I have a solr3.6 + Tomcat setup and I have a program that issues 4 HTTP requests at the same time. I must do 1902 requests. I did several tests but each time it loses some requests: sometimes I get 1856 docs, 1895 docs, 1900 docs, but never 1902 docs. With Jetty, I always get 1902 docs. As it's a dev' environment, I'm alone testing it. Is it a problem to do 4 requests at the same time for Tomcat 6? thanks for your info, Bruno
Re: Solr - Spatial Search for Specif Areas on Map
samabhiK wrote David, Thanks for such a detailed response. The data volume I mentioned is the total set of records we have - but we would never ever need to search the entire base in one query; we would divide the data by region or zip code. So, in that case I assume that for a single region, we would not have more than 200M records (this is real, we have a region with that many records). So, I can assume that I can create shards based on regions and the requests would get distributed among these region servers, right? The fact that your searches are always per region (or almost always) helps things a lot. Instead of doing a distributed search to all shards, you would search the specific shard, or worst case 2 shards, and not burden the other shards with queries you know won't be satisfied. This new information suggests that the total 10k queries per second volume would be divided amongst your shards, so 10k / 40 shards = 250 queries per second. Now we are approaching something reasonable. If any of your regions need to scale up (more query volume) or out (big region) then you can do that on a case by case basis. I can think of ways to optimize that for spatial. Thinking in terms of pure queries per second on a machine, say one with 16 CPU cores, then 250/16 is ~16 queries per second per CPU core of a shard. I think that's plausible but you would really need to determine how many exactly you could do. I assume the spatial index is going to fit in RAM. If successful, this means ~40 machines (one per region). You also mentioned about ~20 concurrent queries per shard - do you have links to some benchmarks? I am very interested to know about the hardware sizing details for such a setup. The best I can offer is on the geospatial side: https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=12988316&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12988316 But this was an index of only 2M distinct points.
It may be that these figures still hold if the overhead of the spatial query with data is so low that other constant elements comprise the times, but I really don't know. To be clear, this is older code that is not the same as the latest, but they are algorithmically the same. The current code has an error epsilon to the query shape which helps scale further. There is plenty more optimization that could be done, like a more efficient binary grid scheme, using Hilbert Curves, and using an optimizer to find the hotspots and try to optimize them. About setting up Solr for a single shard, I think I will go by your advice. Will see how much a single shard can handle on a decent machine :) The reason why I came up with that figure was, I have a user base of 500k and there's a lot of activity which would happen on the map - every time someone moves the tiles, zooms in/out, or scrolls, we are going to send a server side request to fetch some data (I agree we can benefit much from caching, but I believe Solr itself has its own local cache). I might be a bit unrealistic with my 10K rps projections but I have read about 9K rps to map servers from some sources on the internet. And, NO, I don't work for Google :) But who knows, we might be building something that can get so much traffic to us in a while. :D BTW, my question still remains - can we do search on polygonal areas on the map? If so, do you have any link where I can get more details? The Bounding Box thing won't work for me I guess :( Sam Polygons are supported; I've been doing them for years now. But it requires some extensions. Today, you need the latest Solr trunk, and you need to apply the Solr adapters to Lucene 4 spatial (SOLR-3304), and you need to have the JTS jar on your classpath, something you download separately.
BTW here are some basic docs: http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995333.html Sent from the Solr - User mailing list archive at Nabble.com.
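David's sizing arithmetic from this thread can be restated as a two-line calculation. The numbers (10k qps total, 40 region shards, 16-core machines) are the thread's own; the helper functions are only illustrative:

```python
def per_shard_qps(total_qps: float, shards: int) -> float:
    """Queries per second each shard sees if traffic is routed per region
    instead of being broadcast to every shard."""
    return total_qps / shards

def per_core_qps(shard_qps: float, cores: int) -> float:
    """Queries per second each CPU core must sustain on one shard."""
    return shard_qps / cores

shard_load = per_shard_qps(10_000, 40)    # 250 qps per shard
core_load = per_core_qps(shard_load, 16)  # ~16 qps per CPU core
```

The point of the exercise: per-region routing turns an infeasible aggregate number into a per-core figure small enough to benchmark directly.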
Re: SOLR 4 Alpha Out Of Mem Err
On Jul 15, 2012, at 2:45 PM, Nick Koton wrote: I converted my program to use the SolrServer::add(Collection<SolrInputDocument> docs) method with 100 documents in each add batch. Unfortunately, the out of memory errors still occur without client side commits. This won't change much unfortunately - currently, each host has 10 adds and 10 deletes buffered for it before it will flush. There are some recovery implications that have kept that buffer size low so far - but what it ends up meaning is that when you stream docs, every 10 docs is sent off on a thread. Generally, you might be able to keep up with this - but the commit cost appears to perhaps cause a small resource drop that backs things up a bit - and some of those threads take a little longer to finish while new threads fire off to keep servicing the constantly arriving new documents. What appears to happen is large momentary spikes in the number of threads. Each thread needs a bit of space on the heap, and it would seem that with a high enough spike you could get an OOM. In my testing, I have not triggered that yet, but I have seen large thread count spikes. Raising the add doc buffer to 100 docs makes those thread bursts much, much less severe. I can't remember all of the implications of that buffer size though - need to talk to Yonik about it. We could limit the number of threads for that executor, but I think that comes with some negatives as well. You could try lowering -Xss so that each thread uses less RAM (if possible) as a shorter term (possible) workaround. You could also use multiple threads with the std HttpSolrServer - it won't be quite as fast probably, but it can get close(ish). My guess is that your client commits help because a commit will cause a wait on all outstanding requests - so that the commit is in logical order - this probably is like releasing a pressure valve - the system has a chance to catch up and reclaim lots of threads. We will keep looking into what the best improvement is.
- Mark Miller lucidimagination.com
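The client-side batching Nick describes (100 documents per add call) amounts to chunking the document stream before sending. A language-neutral sketch, with a hypothetical `send` callback standing in for a real client call such as SolrJ's `SolrServer.add(Collection<SolrInputDocument>)`:

```python
from typing import Callable, Iterable, List

def send_in_batches(docs: Iterable[dict], batch_size: int,
                    send: Callable[[List[dict]], None]) -> int:
    """Group an incoming document stream into fixed-size batches and hand
    each full batch to `send`; flush any final partial batch at the end.
    Returns the number of batches sent."""
    batch: List[dict] = []
    batches = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            send(batch)
            batches += 1
            batch = []
    if batch:  # flush the trailing partial batch
        send(batch)
        batches += 1
    return batches
```

Fewer, larger requests mean fewer server-side flush events, which is the same pressure-reducing effect Mark attributes to a bigger add-doc buffer.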
Re: Solr - Spatial Search for Specif Areas on Map
Thinking more about this, the way to get a Lucene based system to scale to the maximum extent possible for geospatial queries would be to get a geospatial query to be satisfied by just one (usually) Lucene index segment. It would take quite a bit of customization and work to make this happen. I suppose you could always optimize a Solr index and thus get one Lucene segment, but deploy 10-20x the number of Solr shards (aka Solr cores) that one would normally do, and that wouldn't be that hard. There would be some work in determining which Solr core (== Lucene segment) a given document should belong to and which ones to query. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995357.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Grouping performance problem
I have a server with 24GB RAM. I have 4 shards on it, each of them with 4GB RAM for Java: JAVA_OPTIONS=-server -Xms4096M -Xmx4096M The size is about 15GB for one shard (I use an SSD disk for index data). Agnieszka 2012/7/16 alx...@aim.com What is the RAM of your server and the size of the data folder? -Original Message- From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl To: solr-user solr-user@lucene.apache.org Sent: Mon, Jul 16, 2012 6:16 am Subject: Re: Grouping performance problem Hi Pavel, I tried with group.ngroups=false but didn't notice a big improvement. The times were still about 4000 ms. It doesn't solve my problem. Maybe this is because of my index type. I have millions of documents but only about 20 000 groups. Cheers Agnieszka 2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com Hi Agnieszka, if you don't need the number of groups, you can try leaving out the group.ngroups=true param. In this case Solr apparently skips calculating all groups and delivers results much faster. At least for our application the difference in performance with/without group.ngroups=true is significant (have to say, we use Solr 3.6). WBR, Pavel On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl wrote: Hi, Is there any way to make grouping searches more efficient? My queries look like: /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1 For an index with 3 million documents, a query for all docs with group=true takes almost 4000ms. Because queryResultCache is not used, subsequent queries take a long time as well. When I remove group=true and leave only faceting, the query for all docs takes much less time: ~700ms the first time and only 200ms on later runs because the queryResultCache is used. So with group=true the query is about 20 times slower than without it. Is it possible or is there any way to improve performance with grouping?
My application needs the grouping feature and all of the queries use it, but their performance is too low for production use. I use Solr 4.x from trunk. Agnieszka Kukalowicz
Re: Index version on slave incrementing to higher than master
Thanks Erick, I will look harder at our current configuration and how we're handling config replication, but I just realized that a backup script was doing a commit and an optimize on the slave prior to taking the backup. This happens daily, after updates and replication from the master. This is something I put in place ages ago and didn't think to look at until now :/ Based on the times in the logs and the conditions under which my problem was occurring (when I wasn't optimizing on the master before initiating replication), it seems clear that this backup script is my problem. Sorry for taking your time with something that was clearly my own dang fault. I appreciate your suggestions and responses regardless! Andy On Mon, Jul 16, 2012 at 7:35 AM, Erick Erickson erickerick...@gmail.com wrote: Andrew: I'm not entirely sure that's your problem, but it's the first thing I'd try. As for your config files, see the section Replicating solrconfig.xml here: http://wiki.apache.org/solr/SolrReplication. That at least allows you to centralize separate solrconfigs for master and slave, making promoting a slave to a master a bit easier Best Erick On Sun, Jul 15, 2012 at 2:00 PM, Andrew Davidoff david...@qedmf.net wrote: Erick, Thank you. I think originally my thought was that if I had my slave configuration really close to my master config, it would be very easy to promote a slave to a master (and vice versa) if necessary. But I think you are correct that ripping out from the slave config anything that would modify an index in any way makes sense. I will give this a try very soon. Thanks again. Andy On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson erickerick...@gmail.com wrote: Gotta admit it's a bit puzzling, and surely you want to move to the 3x versions G.. But at a guess, things might be getting confused on the slaves given you have a merge policy on them.
There's no reason to have any policies on the slaves; slaves should just be about copying the files from the master. All the policies, commits, and optimizes should be done on the master. About all the slave does is copy the current state of the index from the master. So I'd try removing everything but the replication from the slaves, including any autocommit stuff, and just let replication do its thing. And I'd replicate after the optimize if you keep the optimize going. You should end up with one segment in the index after that, on both the master and slave. You can't get any more merged than that. Of course you'll also copy the _entire_ index every time after you've optimized... Best Erick On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff david...@qedmf.net wrote: Hi, I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has a number of solr instances running on it (150 or so), and nightly most of them have documents written to them. The script that does these writes (adds) does a commit and an optimize on the indexes when it's entirely finished updating them, then initiates replication on the slave per instance. In this configuration, the index versions between master and slave remain in sync. The optimize portion, which, again, happens nightly, is taking a lot of time and I think it's unnecessary. I was hoping to stop doing this explicit optimize, and to let my merge policy handle that. However, if I don't do an optimize, and only do a commit before initiating slave replication, some hours later the slave is, for reasons that are unclear to me, incrementing its index version to 1 higher than the master. I am not really sure I understand the logs, but it looks like the incremented index version is the result of an optimize on the slave, but I am never issuing any commands against the slave aside from initiating replication, and I don't think there's anything in my solr configuration that would be initiating this.
I do have autoCommit on with maxDocs of 1000, but since I am initiating slave replication after doing a commit on the master, I don't think there would ever be any uncommitted documents on the slave. I do have a merge policy configured, but it's not clear to me that it has anything to do with this. And if it did, I'd expect to see similar behavior on the master (right?). I have included a snippet from my slave logs that shows this issue. In this snippet, index version 1286065171264 is what the master has, and 1286065171265 is what the slave increments itself to, which is then out of sync with the master in terms of version numbers. Nothing that I know of is issuing any commands to the slave at this time. If I understand these logs (I might not), it looks like something issued an optimize that took 1023720ms? Any ideas? Thanks in advance. Andy Jul 12, 2012 12:21:14 PM
Re: Grouping performance problem
This is strange. We have a data folder size of 24GB and 2GB RAM for Java. We query with grouping, ngroups and highlighting, do not query all fields, and query time is mostly less than 1 sec; it rarely goes up to 2 sec. We use Solr 3.6 and turned off all kinds of caching. Maybe your problem is with caching and displaying all fields? Hope this may help. Alex. -Original Message- From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl To: solr-user solr-user@lucene.apache.org Sent: Mon, Jul 16, 2012 10:04 am Subject: Re: Grouping performance problem I have a server with 24GB RAM. I have 4 shards on it, each of them with 4GB RAM for Java: JAVA_OPTIONS=-server -Xms4096M -Xmx4096M The size is about 15GB for one shard (I use an SSD disk for index data). Agnieszka 2012/7/16 alx...@aim.com What is the RAM of your server and the size of the data folder? -Original Message- From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl To: solr-user solr-user@lucene.apache.org Sent: Mon, Jul 16, 2012 6:16 am Subject: Re: Grouping performance problem Hi Pavel, I tried with group.ngroups=false but didn't notice a big improvement. The times were still about 4000 ms. It doesn't solve my problem. Maybe this is because of my index type. I have millions of documents but only about 20 000 groups. Cheers Agnieszka 2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com Hi Agnieszka, if you don't need the number of groups, you can try leaving out the group.ngroups=true param. In this case Solr apparently skips calculating all groups and delivers results much faster. At least for our application the difference in performance with/without group.ngroups=true is significant (have to say, we use Solr 3.6). WBR, Pavel On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl wrote: Hi, Is there any way to make grouping searches more efficient?
My queries look like: /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1 For an index with 3 million documents, a query for all docs with group=true takes almost 4000ms. Because queryResultCache is not used, subsequent queries take a long time as well. When I remove group=true and leave only faceting, the query for all docs takes much less time: ~700ms the first time and only 200ms on later runs because the queryResultCache is used. So with group=true the query is about 20 times slower than without it. Is it possible or is there any way to improve performance with grouping? My application needs the grouping feature and all of the queries use it, but their performance is too low for production use. I use Solr 4.x from trunk. Agnieszka Kukalowicz
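Pavel's suggestion in this thread amounts to issuing the same query minus one parameter. A small sketch of building the two variants; the parameter names are Solr's, the helper function is purely illustrative:

```python
from urllib.parse import urlencode

def grouping_query(base_params: dict, ngroups: bool) -> str:
    """Build a /select query string with grouping enabled, optionally also
    asking Solr to count the total number of groups (the expensive part,
    per this thread)."""
    params = dict(base_params, group="true", **{"group.field": "id"})
    if ngroups:
        params["group.ngroups"] = "true"
    return urlencode(params)

fast = grouping_query({"q": "*:*"}, ngroups=False)  # grouped, no group count
slow = grouping_query({"q": "*:*"}, ngroups=True)   # grouped + ngroups
```

If the UI does not display the total group count, dropping `group.ngroups` costs nothing functionally and removes a full pass over all groups.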
Re: Wildcard query vs facet.prefix for autocomplete?
Maybe try EdgeNgramFilterFactory http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/#solr.EdgeNGramFilterFactory On Mon, Jul 16, 2012 at 6:57 AM, santamaria2 aravinda@contify.com wrote: I'm about to implement an autocomplete mechanism for my search box. I've read about some of the common approaches, but I have a question about wildcard query vs facet.prefix. Say I want autocomplete for a title: 'Shadows of the Damned'. I want this to appear as a suggestion if I type 'sha' or 'dam' or 'the'. I don't care that it won't appear if I type 'hadows'. While indexing, I'd use a whitespace tokenizer and a lowercase filter to store that title in the index. Now I'm thinking of two approaches for 'dam' typed in the search box: 1) q=title:dam* 2) q=*:*&facet=on&facet.field=title&facet.prefix=dam So is there any reason that I should favour one over the other? Is speed a factor? The index has around 200,000 items. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199.html Sent from the Solr - User mailing list archive at Nabble.com.
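For intuition on the suggestion above: solr.EdgeNGramFilterFactory expands each token into its leading prefixes at index time, so a typed prefix becomes a cheap exact term lookup instead of a wildcard scan. A rough stand-alone model of that expansion (minGramSize/maxGramSize named after the filter's parameters; this is a sketch, not the filter's actual code):

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> list:
    """Leading-edge n-grams of one token, roughly what
    solr.EdgeNGramFilterFactory emits at index time."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

# Index-time expansion of each whitespace-tokenized, lowercased word:
title = "shadows of the damned"
grams = {tok: edge_ngrams(tok) for tok in title.split()}
# A typed prefix like "dam" now matches the indexed gram "dam" directly.
```

The trade-off is a larger index in exchange for prefix queries that behave like ordinary term queries, which is why this usually beats both `title:dam*` and `facet.prefix` for autocomplete.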
Re: Metadata and FullText, indexed at different times - looking for best approach
Thank you, I am already on 4alpha. The patch feels a little too unstable for my needs/familiarity with the code. What about something around multiple cores? Could I have full-text fields stored in separate cores and somehow (again, with minimum hand-coding) search against all those cores and get back a combined list of document IDs? Or would it make comparative ranking/sorting impossible? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Sun, Jul 15, 2012 at 12:08 PM, Erick Erickson erickerick...@gmail.com wrote: You've got a couple of choices. There's a new patch in town https://issues.apache.org/jira/browse/SOLR-139 that allows you to update individual fields in a doc if (and only if) all the fields in the original document were stored (actually, all the non-copy fields). So if you're storing (stored=true) all your metadata information, you can just update the document when the text becomes available, assuming you know the uniqueKey when you update. Under the covers, this will find the old document, get all the fields, add the new fields to it, and re-index the whole thing. Otherwise, your fallback idea is a good one. Best Erick On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, I have a database of metadata and I can inject it into SOLR with DIH just fine. But then, I also have the documents to extract full text from that I want to add to the same records as additional fields. I think DIH allows running Tika at ingestion time, but I may not have the full-text files at that point (they could arrive days later). I can match the file to the metadata by a file name matching a field name. What is the best approach to do that staggered indexing with minimum custom code?
I guess my fallback position is a custom full-text indexer agent that re-adds the metadata fields when the file is being indexed. Is there anything better? I am a newbie using v4.0alpha of SOLR (and loving it). Thank you, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
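The behavior Erick describes (read the stored fields of the old document, overlay the late-arriving fields, re-add the whole thing) reduces to a plain field merge. A sketch of just that merge step; the fetch and re-add calls are left out, and the field names are made up for illustration:

```python
def merged_document(stored: dict, updates: dict) -> dict:
    """Combine a document's previously stored fields with newly arrived
    fields (e.g. Tika-extracted full text), mimicking what a stored-fields
    update does internally before re-indexing the whole document."""
    doc = dict(stored)   # keep all existing metadata fields
    doc.update(updates)  # add or overwrite with the late-arriving fields
    return doc

# Hypothetical example: metadata indexed first, full text arrives days later.
meta = {"id": "doc-42", "title": "Quarterly report", "author": "A. Smith"}
late = {"fulltext": "extracted body text"}
# merged_document(meta, late) is the document you would re-add to Solr.
```

This also shows why the technique requires stored="true" on all non-copy fields: anything not stored cannot be read back for the merge and would be silently lost on re-index.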
Solr 3.5 DIH delta-import replicating full index or Admin UI problem?
Hello. We are running Solr 3.5 multicore in master-slave mode. Our delta-import looks like: /solr/core01/dataimport?command=delta-import&optimize=false The size of the index is 1.18GB. When the delta-import is going on, on the slave admin UI (8983/solr/core01/admin/replication/index.jsp) I can see the following output:
Master http://solrmaster01.somedomain.com:8983/solr/core01/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1342183977587, Generation: 33
Poll Interval 00:00:60
Local Index Index Version: 1342183977585, Generation: 32
Location: /var/somedomain/solr/solrhome/core01/data/index
Size: 1.18 GB
Times Replicated Since Startup: 32
Previous Replication Done At: Mon Jul 16 17:08:58 GMT 2012
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Mon Jul 16 17:09:58 GMT 2012
Current Replication Status Start Time: Mon Jul 16 17:08:58 GMT 2012
Files Downloaded: 12 / 95
Downloaded: 4.33 KB / 1.18 GB [0.0%]
Downloading File: _1o.fdt, Downloaded: 510 bytes / 510 bytes [100.0%]
Time Elapsed: 22s, Estimated Time Remaining: 6266208s, Speed: 201 bytes/s
- Does "Downloaded: 4.33 KB / 1.18 GB [0.0%]" mean that the Solr slave is going to download the whole 1.18GB?
- I have been monitoring this and the replication takes less than a minute. And checking the files in the index directory on the slave, the timestamps are quite different, so apparently the slave is not downloading the full index all the time.
- Please, has anyone else seen the whole index size being shown as the denominator of the Downloaded fraction?
- Anything I may be doing wrong?
- Also notice the "Files Downloaded: 12 / 95".
That bit never increases to 95 / 95. Our solrconfig looks like this:
--
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">solrconfig.xml,synonyms.txt,schema.xml,stopwords.txt,data-config.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">some-master-full-url</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
--
Thanks. Arcadius.
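For what it's worth, the alarming "Estimated Time Remaining: 6266208s" figure is consistent with the ETA simply being remaining-bytes-over-current-speed, with the whole 1.18 GB as the denominator. This is an assumption about the UI's formula, not something confirmed from Solr's code, but the arithmetic lines up with the displayed numbers:

```python
def eta_seconds(total_bytes: float, downloaded_bytes: float,
                bytes_per_sec: float) -> int:
    """Naive remaining-time estimate: bytes still to copy / current speed."""
    return int((total_bytes - downloaded_bytes) / bytes_per_sec)

GB = 1024 ** 3
KB = 1024
# Figures from the admin page: 1.18 GB total, 4.33 KB done, 201 bytes/s.
est = eta_seconds(1.18 * GB, 4.33 * KB, 201)  # on the order of 6.3 million s
```

In other words, the huge ETA and the 1.18 GB denominator may just be how the progress display is computed, independent of whether the slave actually fetches every file.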
Re: Wildcard query vs facet.prefix for autocomplete?
The terms component will be faster, like below: http://host:port/solr/terms?terms.fl=content&terms.prefix=sol -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995378.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lost answers?
Hello Michael, I will check the log, but today I am thinking of another thing: maybe it's my program that loses some requests. It's the first time the download is so fast. With Jetty, it's a little bit slower, so maybe for this reason my program works fine. Do you think I can use Jetty for my prod' environment? I will have around 500 users / year with 10 000 requests per day max. On 16/07/2012 16:40, Michael Della Bitta wrote: Hello, Bruno, No, 4 simultaneous requests should not be a problem. Have you checked the Tomcat logs or logged the data in the query response object to see if there are any clues to what the problem might be? Michael Della Bitta Appinions, Inc. -- Where Influence Isn't a Game. http://www.appinions.com On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina bmann...@free.fr wrote: I forgot: I do the request on the uniqueKey field, so each request gets one document On 15/07/2012 14:11, Bruno Mannina wrote: Dear Solr Users, I have a solr3.6 + Tomcat setup and I have a program that issues 4 HTTP requests at the same time. I must do 1902 requests. I did several tests but each time it loses some requests: sometimes I get 1856 docs, 1895 docs, 1900 docs, but never 1902 docs. With Jetty, I always get 1902 docs. As it's a dev' environment, I'm alone testing it. Is it a problem to do 4 requests at the same time for Tomcat 6? thanks for your info, Bruno
Re: Lost answers?
Hello Bruno, Jetty is a legitimate choice. I do, however, worry that you might be masking an underlying problem by making that choice, without a guarantee that it won't someday hurt you even if you use Jetty. A question: are you using a client to connect to Solr and issue your queries? Something like SolrJ, solr-php-client, rsolr, etc.? If not, you might find that someone has already done the work for you of making a durable client-side API for Solr, and achieve better results. Michael Della Bitta Appinions, Inc. -- Where Influence Isn't a Game. http://www.appinions.com On Mon, Jul 16, 2012 at 3:16 PM, Bruno Mannina bmann...@free.fr wrote: Hello Michael, I will check the log, but today I am thinking of another thing: maybe it's my program that loses some requests. It's the first time the download is so fast. With Jetty, it's a little bit slower, so maybe for this reason my program works fine. Do you think I can use Jetty for my prod' environment? I will have around 500 users / year with 10 000 requests per day max. On 16/07/2012 16:40, Michael Della Bitta wrote: Hello, Bruno, No, 4 simultaneous requests should not be a problem. Have you checked the Tomcat logs or logged the data in the query response object to see if there are any clues to what the problem might be? Michael Della Bitta Appinions, Inc. -- Where Influence Isn't a Game. http://www.appinions.com On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina bmann...@free.fr wrote: I forgot: I do the request on the uniqueKey field, so each request gets one document On 15/07/2012 14:11, Bruno Mannina wrote: Dear Solr Users, I have a solr3.6 + Tomcat setup and I have a program that issues 4 HTTP requests at the same time. I must do 1902 requests. I did several tests but each time it loses some requests: sometimes I get 1856 docs, 1895 docs, 1900 docs, but never 1902 docs. With Jetty, I always get 1902 docs. As it's a dev' environment, I'm alone testing it. Is it a problem to do 4 requests at the same time for Tomcat 6?
thanks for your info, Bruno
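Independent of which container Bruno picks, a client that occasionally drops responses can be hardened with a small retry wrapper. This sketch uses a stand-in `fetch` callable rather than any real Solr client, so the pattern can be adapted to whatever HTTP library is in use:

```python
import time
from typing import Any, Callable

def fetch_with_retry(fetch: Callable[[], Any], attempts: int = 3,
                     delay: float = 0.0) -> Any:
    """Call `fetch` (e.g. one HTTP query against Solr) up to `attempts`
    times, sleeping `delay` seconds between tries, and re-raise the last
    error if every attempt fails."""
    last_err: Exception = RuntimeError("attempts must be >= 1")
    for _ in range(attempts):
        try:
            return fetch()
        except Exception as err:  # in real code, catch the client's IO error
            last_err = err
            if delay:
                time.sleep(delay)
    raise last_err
```

Counting successes against the expected 1902 requests, as Bruno already does, then tells you whether the losses were transient (retries recover them) or systematic (they still fail after retries).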
Re: Query results vs. facets results
Erick Erickson wrote Ahhh, you need to look down another few lines. When you specify fq, there should be a section of the debug output like <arr name="filter_queries"> ... </arr> where the array is the parsed form of the filter queries. I was thinking about comparing that with the parsed form of the q parameter in the non-filter case to see what insight one could gain from that. There is no filter_queries section because I do not use an fq in the first two queries. I use one in the combined query, for which you can see the output further below. Erick Erickson wrote But there's already one difference, when you use *, you get <str name="parsedquery">ID:*</str> Is it possible that you have some documents that do NOT have an ID field? try *:* rather than just *. I'm guessing that your default search field is ID and you have some documents without an ID field. Not a good guess if ID is your uniqueKey though... Try q=*:* -ID:* and see if you get 31 docs. All the entries have an ID, so q=*:* -ID:* yielded 0 results. The ID can appear multiple times; that is the reason behind grouping of results. Indeed, ID is the default search field. Erick Erickson wrote Also note that if you _have_ specified ID as your uniqueKey _but_ you didn't re-index afterwards (actually, I'd blow away the entire solrhome/data directory and restart) you may have stale data in there that allowed documents to exist that do not have uniqueKey fields. For Solr's unique id I use a <fieldType name="uuid" class="solr.UUIDField" indexed="true"/> field (which, of course, has a different name than the default search ID), so it should not be a problem. I have re-indexed the data, and I get a somewhat different result.
This is the query: http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*:*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=STR_ENTERPRISE_ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

And the results as well as the debug information:

<lst name="grouped">
  <lst name="ID">
    <int name="matches">284</int>
    <int name="ngroups">134</int>
    <arr name="groups">...</arr>
  </lst>
</lst>
...
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="CITY">
      ...
      <int name="MILTON">89</int>
      ...
    </lst>
  </lst>
</lst>
...
<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
  <str name="parsedquery_toString">*:*</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <arr name="filter_queries">
    <str>{!tag=dt}CITY:MILTON</str>
  </arr>
  <arr name="parsed_filter_queries">
    <str>CITY:MILTON</str>
  </arr>
  <lst name="timing"/>
</lst>

So now the fq says 134 groups with CITY:MILTON, while the faceted search says 89 groups with CITY:MILTON. How can I see some information about the grouping in Solr? Thanks Erick! Regards, Tudor -- View this message in context: http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995388.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR 4 Alpha Out Of Mem Err
That suggests you're running out of threads

Michael, Thanks for this useful observation. What I found just prior to the problem situation was literally thousands of threads in the server JVM. I have pasted a few samples below, obtained from the admin GUI. I spent some time today using this barometer, but I don't have enough to share right now. I'm looking at the difference between ConcurrentUpdateSolrServer and HttpSolrServer and how my client may be misusing them. I'll assume my client is misbehaving and driving the server crazy for now. If I figure out how, I will share it so perhaps a safeguard can be put in place. Nick

Server threads - very roughly 0.1%:

cmdDistribExecutor-9-thread-7161 (10096)
java.util.concurrent.SynchronousQueue$TransferStack@17b90c55
  sun.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
  java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
  java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
  java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
  java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:945)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  java.lang.Thread.run(Thread.java:662)
-0.ms -0.ms

cmdDistribExecutor-9-thread-7160 (10086)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5509b56
  sun.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
  org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158)
  org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:403)
  org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
  org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
  org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
  org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
  org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
  org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
  org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:351)
  org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
  org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
  org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  java.util.concurrent.FutureTask.run(FutureTask.java:138)
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  java.util.concurrent.FutureTask.run(FutureTask.java:138)
  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  java.lang.Thread.run(Thread.java:662)
20.ms 20.ms

cmdDistribExecutor-9-thread-7159 (10085)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6f062dd3
  sun.misc.Unsafe.park(Native Method)
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
  org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158)
  org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:403)
  org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
  org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
  org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
  org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
  org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
  org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
  org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:351)
  org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
  org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
  org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  java.util.concurrent.FutureTask.run(FutureTask.java:138)
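The thread names above (cmdDistribExecutor-9-thread-7161 and friends) hint at an executor that keeps spawning new threads because its hand-off queue (a SynchronousQueue, as in the first trace) never holds tasks, so each new submission under load forces a new thread. Purely as an illustration of the general failure mode and its fix, and not of Solr's actual internals, a pool with a bounded thread count and a real work queue caps thread growth instead:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: a fixed-size pool where excess tasks wait in a
// LinkedBlockingQueue rather than triggering new thread creation, so the
// thread count can never exceed 4 no matter how many tasks are submitted.
class BoundedPoolDemo {
    static int runTasks(int taskCount) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4,                      // core and max pool size: 4 threads, never more
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        final AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        // pool.getLargestPoolSize() would report at most 4 here, in contrast
        // to the thousands of threads observed on the server
        return done.get();
    }
}
```

With a SynchronousQueue and an unbounded maximum pool size, the same submission loop would instead grow the pool toward one thread per in-flight task, which matches the thread counts in the samples above.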
Re: Mmap
Any thought on this? Is the default Mmap?

Sent from my mobile device 720-256-8076

On Feb 14, 2012, at 7:16 AM, Bill Bell billnb...@gmail.com wrote: Does someone have an example of using unmap in 3.5, and chunksize? I am using Solr 3.5. I noticed in solrconfig.xml: <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/> I don't see this parameter taking effect when I set -Dsolr.directoryFactory=solr.MMapDirectoryFactory. How do I see the setting in the log or in stats.jsp? I cannot find a place that indicates whether it is set or not. I would assume StandardDirectoryFactory is being used, but I see the same output whether I set it or not. Bill Bell Sent from mobile
Using Solr 3.4 running on tomcat7 - very slow search
Hi, Our index is divided into two shards, each with 120M docs and a total size of 75G per core. The server is a pretty good one; the JVM is given 70G of memory and about the same is left for the OS (SLES 11). We use all dynamic fields except the unique id, and our queries are long, but almost all of them are filter queries; each query may have 10-30 fq parameters. When I tested the index (same size) but with a max heap size of 40G, queries were blazing fast. I used solrmeter to load test, and it was happily serving 12000 queries or more per minute with an average qtime of 65 ms. We had an excellent filterCache hit ratio. This index is only used for searching and is replicated every 7 sec from the master. But now the production server is horribly slow, taking 5 minutes (qtime) to return a query (same query). What could go wrong? Really appreciate your suggestions on debugging this thing. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html Sent from the Solr - User mailing list archive at Nabble.com.
How to setup SimpleFSDirectoryFactory
We all know that MMapDirectory is the fastest, but we cannot always use it since you might run out of memory (address space) on large indexes, right? Here is how I got SimpleFSDirectoryFactory to work: just set -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory, with this in your solrconfig.xml:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

You can check it with http://localhost:8983/solr/admin/stats.jsp Notice that the default for 64-bit Windows is MMapDirectory, SimpleFSDirectory for other Windows, and NIOFSDirectory everywhere else. It would be nicer if we could just set it all up with a helper in solrconfig.xml...

if (Constants.WINDOWS) {
  if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
    return new MMapDirectory(path, lockFactory);
  else
    return new SimpleFSDirectory(path, lockFactory);
} else {
  return new NIOFSDirectory(path, lockFactory);
}

-- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Mmap
Yep. -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory or -Dsolr.directoryFactory=solr.MMapDirectoryFactory works great.

On Mon, Jul 16, 2012 at 7:55 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi Bill, Standard picks one for you. Otherwise, you can hardcode the DirectoryFactory in your config file, or I believe if you specify -Dsolr.directoryFactory=solr.MMapDirectoryFactory that will get you what you want. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com

On Mon, Jul 16, 2012 at 9:32 PM, Bill Bell billnb...@gmail.com wrote: Any thought on this? Is the default Mmap? Sent from my mobile device 720-256-8076

On Feb 14, 2012, at 7:16 AM, Bill Bell billnb...@gmail.com wrote: Does someone have an example of using unmap in 3.5, and chunksize? I am using Solr 3.5. I noticed in solrconfig.xml: <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/> I don't see this parameter taking effect when I set -Dsolr.directoryFactory=solr.MMapDirectoryFactory. How do I see the setting in the log or in stats.jsp? I cannot find a place that indicates whether it is set or not. I would assume StandardDirectoryFactory is being used, but I see the same output whether I set it or not. Bill Bell Sent from mobile

-- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Using Solr 3.4 running on tomcat7 - very slow search
Thanks, Bryan. Excellent suggestion. I haven't used VisualVM before, but I am going to use it to see where the CPU is going. I saw that the CPU is heavily used; I hadn't seen so much CPU use in testing. Although I think GC is not a problem, splitting the JVM per shard would be a good idea.

On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] ml-node+s472066n3995446...@n3.nabble.com wrote: 5 min is ridiculously long for a query that used to take 65 ms. That ought to be a great clue. The only two things I've seen that could cause that are thrashing or GC. It's hard to see how it could be thrashing, given your hardware, so I'd initially suspect GC. Aim VisualVM at the JVM: it shows how much CPU goes to GC over time, in a nice blue line. And if it's not GC, try out its Sampler tab and see where the CPU is spending its time. FWIW, when asked at what point one would want to split JVMs and shard on the same machine, Grant Ingersoll mentioned 16GB, precisely for GC cost reasons. You're way above that. Maybe multiple JVMs and sharding, even on the same machine, would serve you better than a monster 70GB JVM. -- Bryan

-----Original Message----- From: Mou [mailto:[hidden email]] Sent: Monday, July 16, 2012 7:43 PM To: [hidden email] Subject: Using Solr 3.4 running on tomcat7 - very slow search

Hi, Our index is divided into two shards, each with 120M docs and a total size of 75G per core. The server is a pretty good one; the JVM is given 70G of memory and about the same is left for the OS (SLES 11). We use all dynamic fields except the unique id, and our queries are long, but almost all of them are filter queries; each query may have 10-30 fq parameters. When I tested the index (same size) but with a max heap size of 40G, queries were blazing fast. I used solrmeter to load test, and it was happily serving 12000 queries or more per minute with an average qtime of 65 ms. We had an excellent filterCache hit ratio. This index is only used for searching and is replicated every 7 sec from the master. But now the production server is horribly slow, taking 5 minutes (qtime) to return a query (same query). What could go wrong? Really appreciate your suggestions on debugging this thing.

-- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995449.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Using Solr 3.4 running on tomcat7 - very slow search
Another thing you may wish to ponder is this blog entry from Mike McCandless: http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html In it, he discusses the poor interaction between OS swapping and long-neglected allocations in a JVM. You're on Linux, which has decent control over swapping decisions, so you may find that a tweak is in order, especially if you can discover evidence that the hard drive is being worked hard during GC. If the problem exists, it might be especially pronounced in your large JVM. I have no direct evidence of thrashing during GC (I am not sure how to go about gathering such evidence), but I have seen, on a Windows machine, a Tomcat running Solr refuse to shut down for many minutes while Resource Monitor reported that the same Tomcat process was frantically reading from the page file the whole time. So there is something besides plausibility to the idea. -- Bryan

-----Original Message----- From: Mou [mailto:mouna...@gmail.com] Sent: Monday, July 16, 2012 9:09 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr 3.4 running on tomcat7 - very slow search

Thanks, Bryan. Excellent suggestion. I haven't used VisualVM before, but I am going to use it to see where the CPU is going. I saw that the CPU is heavily used; I hadn't seen so much CPU use in testing. Although I think GC is not a problem, splitting the JVM per shard would be a good idea.

On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] ml-node+s472066n3995446...@n3.nabble.com wrote: 5 min is ridiculously long for a query that used to take 65 ms. That ought to be a great clue. The only two things I've seen that could cause that are thrashing or GC. It's hard to see how it could be thrashing, given your hardware, so I'd initially suspect GC. Aim VisualVM at the JVM: it shows how much CPU goes to GC over time, in a nice blue line. And if it's not GC, try out its Sampler tab and see where the CPU is spending its time. FWIW, when asked at what point one would want to split JVMs and shard on the same machine, Grant Ingersoll mentioned 16GB, precisely for GC cost reasons. You're way above that. Maybe multiple JVMs and sharding, even on the same machine, would serve you better than a monster 70GB JVM. -- Bryan

-----Original Message----- From: Mou [mailto:[hidden email]] Sent: Monday, July 16, 2012 7:43 PM To: [hidden email] Subject: Using Solr 3.4 running on tomcat7 - very slow search

Hi, Our index is divided into two shards, each with 120M docs and a total size of 75G per core. The server is a pretty good one; the JVM is given 70G of memory and about the same is left for the OS (SLES 11). We use all dynamic fields except the unique id, and our queries are long, but almost all of them are filter queries; each query may have 10-30 fq parameters. When I tested the index (same size) but with a max heap size of 40G, queries were blazing fast. I used solrmeter to load test, and it was happily serving 12000 queries or more per minute with an average qtime of 65 ms. We had an excellent filterCache hit ratio. This index is only used for searching and is replicated every 7 sec from the master. But now the production server is horribly slow, taking 5 minutes (qtime) to return a query (same query). What could go wrong? Really appreciate your suggestions on debugging this thing.

-- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995449.html Sent from the Solr - User mailing list archive at Nabble.com.
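For anyone wanting to gather the evidence Bryan describes (GC pause times, and whether the box swaps while the JVM collects), the flags below are the standard HotSpot GC-logging options of that era plus the Linux knob that discourages swapping. The log path and placement in JAVA_OPTS are examples, not recommendations; adjust for your Tomcat setup.

```shell
# Log every collection, with date stamps and pause details, to a file you
# can correlate against slow queries
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/tomcat/gc.log"

# Tell the kernel to avoid paging out the JVM heap unless it must;
# add vm.swappiness=0 to /etc/sysctl.conf to make the setting permanent
sysctl -w vm.swappiness=0
```

If the GC log shows long full collections at the same moments the disk is busy with page-file or swap traffic, that is exactly the swapping-during-GC interaction the McCandless post warns about.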