Re: how to make sure a particular query is ALWAYS cached
separating requests over 2 ports is a nice solution when having multiple user-types. I like that, although I don't think I need it for this case. I'm just going to go the 'normal' caching route and see where that takes me, instead of assuming it can't be done upfront :-) Thanks!

hossman wrote:

: Although I haven't tried yet, I can't imagine that this request returns in
: sub-second time, which is what I want (having an index of about 1M docs with
: 6000 fields/doc and about 10 complex facet queries per request).

i wouldn't necessarily assume that :) If you have a request handler which does a query with a facet.field, and then does a followup query for the top N constraints in that facet.field, the time needed to execute that handler on a cold index should primarily depend on the faceting aspect and how many unique terms there are in that field. try it and see.

: The navigation-pages are pretty important for, eh well, navigation ;-) and
: although I can rely on frequent access of these pages most of the time, it
: is not guaranteed (so neither is the caching)

if i were in your shoes: i wouldn't worry about it. i would set up cold cache warming of the important queries using a firstSearcher event listener, i would set up autowarming on the caches, i would set up explicit warming of queries using sort fields i care about in a newSearcher event listener, and i would make sure to tune my caches so that they were big enough to contain a much larger number of entries than are used by my custom request handler for the queries i care about (especially if my index only changes a few times a day; the caches become a huge win in that case, so throw everything you've got at them). and for the record: i've been in your shoes.

From a purely theoretical standpoint: if enough other requests are coming in fast enough to expunge the objects used by your important navigation pages from the caches ... then those pages aren't that important (at least not to your end users as an aggregate).

on the other hand: if you've got discrete pools of users (like, say: customers who do searches, vs your boss who thinks navigation pages are really important) then another approach is to have two ports serving queries -- one that you send your navigation-type queries to (with the caches tuned appropriately) and one that you send other traffic to (with caches tuned appropriately) ... i do that for one major index; it makes a lot of sense when you have very distinct usage profiles and you want to get the most bang for your buck cache-wise.

: #1 wouldn't really accomplish what you want without #2 as well.
: regarding #1.
: Wouldn't making a user-cache for the sole purpose of storing these queries
: be enough? I could then reference this user-cache by name, and extract the

only if you also write a custom request handler ... that was my point before it was clear that you were already doing that no matter what (you had custom request handler listed in #2). you could definitely make sure to explicitly put all of your DocLists in your own usercache; that will certainly work.

but frankly, based on what you've described about your use case, and how often your data changes, it would probably be easier to set up a layer of caching in front of Solr (since you are concerned with ensuring *all* of the data for these important pages gets cached) ... something like an HTTP reverse proxy cache (aka: accelerator proxy) would help you ensure that these whole pages were getting cached.
i've never tried it, but in theory: you could even set up a newSearcher event listener to trigger a little script to ping your proxy with a request that forced it to revalidate the query when your index changes. -Hoss
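For reference, a minimal sketch of the firstSearcher/newSearcher warming Hoss describes, as it would look in solrconfig.xml (the queries themselves are placeholders, not from this thread):

  <!-- fires once when a cold core starts: warm the important navigation queries -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">media_type:blog</str><str name="facet">true</str><str name="facet.field">location</str></lst>
    </arr>
  </listener>

  <!-- fires on every commit, before the new searcher is exposed: re-warm sorts -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="sort">created_date desc</str></lst>
    </arr>
  </listener>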
Re: Solr deployment in tomcat
Hi, here's what I've got (multiple solr instances within the same tomcat server). In /var/tomcat/conf/Catalina/localhost/, for an instance 'foo', foo.xml:

  <Context path="/foo" docBase="/var/tomcat/solrapp/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="/var/solr/foo/" override="true" />
  </Context>

/var/tomcat/solrapp/solr.war is the path to the solr war file. It can be anywhere on the disk. /var/solr/foo/ is the solr home for this instance (where you'll put your schema.xml, solrconfig.xml etc.). Restart tomcat and you should see your foo app appear in your deployed apps. Jerome.

On 10/9/07, Chris Laux [EMAIL PROTECTED] wrote: Hello Group, Has anyone been able to deploy solr.war on tomcat? I just tried to deploy it as per the wiki and it gives a bunch of exceptions, and I don't think those exceptions have any relevance to the actual cause. I was wondering if there is any special configuration needed? I had that very same problem while trying to set solr up with tomcat (and multiple instances). I have given up for now and am working with Jetty instead. Chris Laux -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: Solr deployment in tomcat
Jérôme Etévé wrote: [...] /var/solr/foo/ is the solr home for this instance (where you'll put your schema.xml, solrconfig.xml etc.). Thanks for the input Jérôme, I gave it another try and discovered that what I was doing wrong was copying the solr/example/ directory to what you call /var/solr/foo/, while copying solr/example/solr/ is what works now. Maybe I should add a note to the Wiki... Chris
Re: Solr deployment in tomcat
On 10/9/07, Chris Laux [EMAIL PROTECTED] wrote: Jérôme Etévé wrote: [...] /var/solr/foo/ is the solr home for this instance (where you'll put your schema.xml, solrconfig.xml etc.). Thanks for the input Jérôme, I gave it another try and discovered that what I was doing wrong was copying the solr/example/ directory to what you call /var/solr/foo/, while copying solr/example/solr/ is what works now. Maybe I should add a note to the Wiki... Sounds like a good idea! Actually I remember struggling a bit to have multiple instances of solr in tomcat. -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
Re: High-Availability deployment
Hi Hoss, Yes I know that, but I want to have a proper dummy backup (something that could be kept in a very controlled environment). I thought about using this approach (a slave just for this purpose), but if I'm using it just as a backup node there is no reason not to use a proper backup structure (as I have all the needed infrastructure in place for that). It's just an extra redundancy level, as I'm going to have a Master/Slaves structure and the index is replicated amongst them anyway. Yes, I got it. I have implemented ways to re-index stuff in an incremental way so I can just re-index a slice of my content (based on dates or id's), which should be enough to keep my index up-to-date quickly after a possible disaster. Thank you for your considerations, Daniel

On 8/10/07 18:29, Chris Hostetter [EMAIL PROTECTED] wrote:

: I'm setting up a backup task to keep a copy of my master index, just to
: avoid having to re-build my index from scratch.

Another important issue is that every slave is a backup of the master, so you don't usually need a separate backup mechanism. re-building the index is more about peace of mind when asking why did it crash? what did/didn't get written to the index before it crashed? -Hoss
problems with arabic search
Hello, I'm a newbie to solr and I need your help in developing an Arabic search engine using solr. I succeeded in building the index but failed searching it. I get this error when I submit a query like “محمد”:

  XML Parsing Error: mismatched tag. Expected: </HR>.
  Location: http://localhost:8080/solrServlet/searchServlet?query=%D9%85%D8%AD%D9%85%D8%AFcmdSearch=Search%21
  Line Number 1, Column 1260

The response body is a Tomcat error page saying:

  HTTP Status 400 - Query parsing error: Cannot parse '': '*' or '?' not allowed as first character in WildcardQuery
  description: The request sent by the client was syntactically incorrect (Query parsing error: Cannot parse '': '*' or '?' not allowed as first character in WildcardQuery).
  Apache Tomcat/6.0.13

The apache server URIEncoding, jsp and servlet encodings are all set to UTF-8, but no luck. Thanks in advance. Best regards, Heba Farouk, Software Engineer, Bibliotheca Alexandrina
RE: Availability Issues
Chris: We're using Jetty also, so I get the sense I'm looking at the wrong log file. On that note -- I've read that Jetty isn't the best servlet container to use in these situations; is that your experience? Dave

-Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Monday, October 08, 2007 11:20 PM To: solr-user Subject: RE: Availability Issues

: My logs don't look anything like that. They look like HTTP
: requests. Am I looking in the wrong place?

what servlet container are you using? every servlet container handles application logs differently -- it's especially tricky because even the format can be changed. the examples i gave before are in the default format you get if you use the jetty setup in the solr example (which logs to stdout), but many servlet containers won't include that much detail by default (they typically leave out the classname and method name). there's also typically a setting that controls the verbosity -- so in some configurations only the SEVERE messages are logged and in others the INFO messages are logged ... you're going to want at least the INFO level to debug stuff. grep all the log files you can find for Solr home set to ... that's one of the first messages Solr logs. if you can find that, you'll find the other messages i was talking about. -Hoss
RE: Availability Issues
All: How can I break up my install onto more than one box? We've hit a learning curve here and we don't understand how best to proceed. Right now we have everything crammed onto one box because we don't know any better. So, how would you build it if you could? Here are the specs: a) the index needs to hold at least 25 million articles b) the index is constantly updated at a rate of 10,000 articles per minute c) we need to have faceted queries Again, real-world experience is preferred here over book knowledge. We've tried to read the docs and it's only made us more confused. TIA Dave W -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Monday, October 08, 2007 3:42 PM To: solr-user@lucene.apache.org Subject: Re: Availability Issues On 10/8/07, David Whalen [EMAIL PROTECTED] wrote: Do you see any requests that took a really long time to finish? The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1% queries. We are constantly adding documents to the index at a rate of 10,000 per minute, so the logs show mostly that. Oh, so you are using the same boxes for updating and querying? When you insert, are you using multiple threads? If so, how many? What is the full URL of those slow query requests? Do the slow requests start after a commit? Start with the thread dump. I bet it's multiple queries piling up around some synchronization points in lucene (sometimes caused by multiple threads generating the same big filter that isn't yet cached). What would be my next steps after that? I'm not sure I'd understand enough from the dump to make heads-or-tails of it. Can I share that here? Yes, post it here. Most likely a majority of the threads will be blocked somewhere deep in lucene code, and you will probably need help from people here to figure it out. -Yonik
Re: extending StandardRequestHandler gives ClassCastException
Are you compiling your custom request handler against the same version of Solr that you are deploying with? My hunch is that you're compiling against an older version. Erik

On Oct 9, 2007, at 9:04 AM, Britske wrote:

I'm trying to add a new requestHandler plugin to Solr by extending StandardRequestHandler. However, when starting the solr server after configuration I get a ClassCastException:

  SEVERE: java.lang.ClassCastException:
  wrappt.solr.requesthandler.TopListRequestHandler cannot be cast to
  org.apache.solr.request.SolrRequestHandler
      at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:149)

I can't get my head around what might be wrong, as I am extending org.apache.solr.handler.StandardRequestHandler, which already implements org.apache.solr.request.SolrRequestHandler, so it must be able to cast, I figure. Anyone any ideas? Below is the code / setup I used.

My handler:
---
  package wrappt.solr.requesthandler;

  import org.apache.solr.handler.StandardRequestHandler;
  import org.apache.solr.request.SolrRequestHandler;

  public class TopListRequestHandler extends StandardRequestHandler
      implements SolrRequestHandler {
    // no code here (so it mimics StandardRequestHandler)
  }
---

configured in solrconfig as:

  <requestHandler name="toplist" class="wrappt.solr.requesthandler.TopListRequestHandler"/>

added this handler to a jar called solrRequestHandler1.jar and added this jar along with apache-solr-nightly.jar to the \lib directory of my server. (It needs the last jar for resolving the StandardRequestHandler. Isn't this strange btw, because I thought that it would be resolved from solr.war automatically.)

general solr-info of the server:
Solr Specification Version: 1.2.2007.10.07.08.05.52
Solr Implementation Version: nightly ${svnversion} - yonik - 2007-10-07 08:05:52

I double-checked that the included apache-solr-nightly.jar is the same version as the deployed server by getting the latest nightly build and taking the .jars and .war from it. Furthermore, I noticed that org.apache.solr.request.StandardRequestHandler is deprecated. Note that I'm extending org.apache.solr.handler.StandardRequestHandler. Is it possible that this has anything to do with it? with regards, Geert-Jan
Re: Solr deployment in tomcat
It worked. Thanks a lot. I just updated the value attribute of the Environment tag of solr.xml. Maybe the wiki should show Windows as well as Unix examples:

  <Context path="/solr" docBase="C:/apache-solr-1.2.0/example/webapps/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="C:/apache-solr-1.2.0/example/solr" override="true" />
  </Context>

- Original Message - From: Jérôme Etévé [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, October 9, 2007 6:49:38 AM Subject: Re: Solr deployment in tomcat [...]
indexing problem
Hi All, I'm trying to index my data using post.jar and I get the following error:

  HTTP ERROR: 500
  name and value cannot both be empty
  java.lang.IllegalArgumentException: name and value cannot both be empty
      at org.apache.lucene.document.Field.<init>(Field.java:197)

The only required field in my schema is identifier (I started with the default schema.xml and made my changes on that). How do I debug this? Is there a better way to index data? Best regards, Urvashi
Re: extending StandardRequestHandler gives ClassCastException
Yeah, I'm compiling with a reference to apache-solr-nightly.jar, which is from the same nightly build (7 October 2007) as the apache-solr-nightly.war I'm deploying against. I include this same apache-solr-nightly.jar in the lib folder of my deployed server. It still seems odd that I have to include the jar, since the StandardRequestHandler should be picked up in the war, right? Is this also a sign that there must be something wrong with the deployment?

btw: I deployed by copying a directory which contains the example deployment, and swapped in the apache-solr-nightly.war in the 'webapps' dir after renaming it to solr.war. This enables me to start the new server using: java -jar start.jar. I don't know if this is common practice or considered 'exotic', but it might just be causing the problem.. Anyway, after deploying, the server picks up the correct war, as solr/admin shows the correct Solr Specification Version: 1.2.2007.10.07.08.05.52. other options? Geert-Jan

Erik Hatcher wrote: Are you compiling your custom request handler against the same version of Solr that you are deploying with? My hunch is that you're compiling against an older version. Erik [...]
Re: Availability Issues
The way I'd do it would be to buy more servers, set up Tomcat on each, and get SOLR replicating from your current machine to the others. Then, throw them all behind a load balancer, and there you go. You could also post your updates to every machine. Then you don't need to worry about getting replication running. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Oct 9, 2007, at 7:12 AM, David Whalen wrote: All: How can I break up my install onto more than one box? We've hit a learning curve here and we don't understand how best to proceed. Right now we have everything crammed onto one box because we don't know any better. So, how would you build it if you could? Here are the specs: a) the index needs to hold at least 25 million articles b) the index is constantly updated at a rate of 10,000 articles per minute c) we need to have faceted queries Again, real-world experience is preferred here over book knowledge. We've tried to read the docs and it's only made us more confused. TIA Dave W -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Monday, October 08, 2007 3:42 PM To: solr-user@lucene.apache.org Subject: Re: Availability Issues On 10/8/07, David Whalen [EMAIL PROTECTED] wrote: Do you see any requests that took a really long time to finish? The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1% queries. We are constantly adding documents to the index at a rate of 10,000 per minute, so the logs show mostly that. Oh, so you are using the same boxes for updating and querying? When you insert, are you using multiple threads? If so, how many? What is the full URL of those slow query requests? Do the slow requests start after a commit? Start with the thread dump. I bet it's multiple queries piling up around some synchronization points in lucene (sometimes caused by multiple threads generating the same big filter that isn't yet cached). What would be my next steps after that? I'm not sure I'd understand enough from the dump to make heads-or-tails of it. Can I share that here? Yes, post it here. Most likely a majority of the threads will be blocked somewhere deep in lucene code, and you will probably need help from people here to figure it out. -Yonik
Re: indexing problem
What is the XML you POSTed into Solr? It looks like somehow you've sent in a field with no name or value, though this is an error that probably should be caught higher up in Solr. Erik

On Oct 9, 2007, at 11:06 AM, Urvashi Gadi wrote: Hi All, I'm trying to index my data using post.jar and I get the following error:

  HTTP ERROR: 500
  name and value cannot both be empty
  java.lang.IllegalArgumentException: name and value cannot both be empty
      at org.apache.lucene.document.Field.<init>(Field.java:197)

the only required field in my schema is identifier (I started with the default schema.xml and made my changes on that). How do I debug this? Is there a better way to index data? Best regards, Urvashi
Facets and running out of Heap Space
Hi All. I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the errors would happen again. We've increased the initial heap size to 2.5GB and it's still happening. Is there anything we can do about this? Thanks in advance, Dave W
Re: indexing problem
is there a way to find out the line number in the xml file? the xml file I'm using is quite large.

On 10/9/07, Erik Hatcher [EMAIL PROTECTED] wrote: What is the XML you POSTed into Solr? It looks like somehow you've sent in a field with no name or value, though this is an error that probably should be caught higher up in Solr. Erik [...]
Re: Facets and running out of Heap Space
On 10/9/07, David Whalen [EMAIL PROTECTED] wrote: I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the errors would happen again. We've increased the initial heap size to 2.5GB and it's still happening. Is there anything we can do about this? Try facet.enum.cache.minDf param: http://wiki.apache.org/solr/SimpleFacetParameters -Yonik
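For example (a hypothetical request -- the field name is made up), the parameter just rides along on the query string:

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=journalist_id&facet.enum.cache.minDf=25

With a setting like that, terms matching fewer than 25 docs are counted by stepping through their postings instead of building and caching a filter for each one, which trims filterCache memory use during term-enumeration faceting.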
RE: Facets and running out of Heap Space
Hi Yonik. According to the doc: This is only used during the term enumeration method of faceting (facet.field type faceting on multi-valued or full-text fields). What if I'm faceting on just a plain String field? It's not full-text, and I don't have multiValued set for it Dave -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 09, 2007 12:47 PM To: solr-user@lucene.apache.org Subject: Re: Facets and running out of Heap Space On 10/9/07, David Whalen [EMAIL PROTECTED] wrote: I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the errors would happen again. We've increased the initial heap size to 2.5GB and it's still happening. Is there anything we can do about this? Try facet.enum.cache.minDf param: http://wiki.apache.org/solr/SimpleFacetParameters -Yonik
Re: Availability Issues
I'm about to do a prototype deployment of Solr for a pretty high-volume site, and I've been following this thread with some interest. One thing I want to confirm: It's really possible for Solr to handle a constant stream of 10K updates/min (150 updates/sec) to a 25M-document index? I knew Solr and Lucene were good, but that seems like a pretty tall order. From the responses I'm seeing to David Whalen's inquiries, it seems like people think that's possible. Thanks, Charlie

On 10/9/07, Matthew Runo [EMAIL PROTECTED] wrote: The way I'd do it would be to buy more servers, set up Tomcat on each, and get SOLR replicating from your current machine to the others. Then, throw them all behind a load balancer, and there you go. You could also post your updates to every machine. Then you don't need to worry about getting replication running. | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 [...]
Re: Availability Issues
When we are doing a reindex (1x a day), we post around 150-200 documents per second, on average. Our index is not as large though, about 200k docs. During this import, the search service (with faceted page navigation) remains available for front-end searches and performance does not noticeably change. You can see this install running at http://www.6pm.com, where SOLR is in use for every part of the navigation and search. I believe that a sustained load of 150+ posts per second is very possible. At that load though, it does make sense to consider multiple machines. | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833

On Oct 9, 2007, at 10:16 AM, Charles Hornberger wrote: I'm about to do a prototype deployment of Solr for a pretty high-volume site, and I've been following this thread with some interest. One thing I want to confirm: It's really possible for Solr to handle a constant stream of 10K updates/min (150 updates/sec) to a 25M-document index? I knew Solr and Lucene were good, but that seems like a pretty tall order. From the responses I'm seeing to David Whalen's inquiries, it seems like people think that's possible. Thanks, Charlie [...]
Re: extending StandardRequestHandler gives ClassCastException
It still seems odd that I have to include the jar, since the StandardRequestHandler should be picked up in the war right? Is this also a sign that there must be something wrong with the deployment? Note that in 1.3, the StandardRequestHandler was moved from o.a.s.request to o.a.s.handler: http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/request/StandardRequestHandler.java http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/StandardRequestHandler.java If you are subclassing StandardRequestHandler, make sure you are using a consistent version. ryan
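In other words, the subclass's import has to match the Solr version being deployed -- roughly:

  // when compiling against 1.3-dev (trunk):
  import org.apache.solr.handler.StandardRequestHandler;

  // when compiling against 1.2 and earlier (deprecated on trunk):
  // import org.apache.solr.request.StandardRequestHandler;

Mixing the two between compile time and deploy time is exactly the kind of thing that produces a ClassCastException like the one above.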
Re: indexing problem
Does all your XML look like this sample here - http://wiki.apache.org/solr/UpdateXmlMessages ? Are you sending in any field elements without a name attribute or with a blank value? Erik

On Oct 9, 2007, at 12:45 PM, Urvashi Gadi wrote: is there a way to find out the line number in the xml file? the xml file I'm using is quite large. [...]
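For reference, a minimal well-formed update message has a name attribute and a non-empty value on every field element (the field names here are just examples):

  <add>
    <doc>
      <field name="identifier">doc-001</field>
      <field name="title">An example title</field>
    </doc>
  </add>

A <field/> with no name and an empty body is the kind of input that triggers the name and value cannot both be empty error above.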
Re: extending StandardRequestHandler gives ClassCastException
: SEVERE: java.lang.ClassCastException:
: wrappt.solr.requesthandler.TopListRequestHandler cannot be cast to
: org.apache.solr.request.SolrRequestHandler at
: org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:149)

: added this handler to a jar called: solrRequestHandler1.jar and added this
: jar along with apache-solr-nightly.jar to the \lib directory of my server.
: (It needs the last jar for resolving the StandardRequestHandler. Isn't this
: strange btw, because I thought that it would be resolved from solr.war
: automatically.)

classpaths are very very very tricky and annoying. i believe the problem you are seeing is that the SolrCore knows about the copy of StandardRequestHandler in the classloader for your war, but because of where you put your custom request handler, the war's classloader is delegating up to its parent (the container's classloader) to find it, at which point the container's classloader also needs to resolve StandardRequestHandler (hence you put apache-solr-nightly.jar in that lib so that classloader can find it).

now the container classloader has resolved all of the classes it needs for Solr to finish constructing your handler -- except that your handler doesn't extend the copy of StandardRequestHandler Solr knows about -- it extends one up in the parent classloader.

try creating a lib directory in your solrhome and putting your jar there ... make sure you get rid of your jar (and the solr-nightly jar) that you put in the container's main lib directory. they will cause you nothing but problems.

if that *still* doesn't work, try unpacking the Solr war, and adding your class directly to it ... that *completely* eliminates any possibility of classpath issues and will help identify if it's some other random problem (but it's a last resort since it makes upgrading later hard).

http://wiki.apache.org/solr/SolrPlugins

-Hoss
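Concretely, the layout Hoss suggests looks like this (the jar name is from the thread; the rest of the tree is the standard example layout):

  solr-home/
    conf/
      solrconfig.xml
      schema.xml
    lib/
      solrRequestHandler1.jar   <-- custom plugin jars go here, and nowhere else

with the servlet container's own lib directory left untouched, so there is only one copy of the Solr classes in play.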
Re: Facets and running out of Heap Space
On 10/9/07, David Whalen [EMAIL PROTECTED] wrote:

: This is only used during the term enumeration method of faceting
: (facet.field type faceting on multi-valued or full-text fields).
:
: What if I'm faceting on just a plain String field? It's not
: full-text, and I don't have multiValued set for it

Then you will be using the FieldCache counting method, and this param is not applicable :-)

Are all your fields that you facet on like this? The FieldCache entry might be taking up too much room, especially if the number of entries is high and the entries are big. The requests themselves can take up a good chunk of memory temporarily (4 bytes * nValuesInField). You could try a memory profiling tool and see where all the memory is being taken up too. -Yonik
Re: extending StandardRequestHandler gives ClassCastException
Thanks, but I'm using the updated o.a.s.handler.StandardRequestHandler. I'm going to try on 1.2 instead to see if it changes things. Geert-Jan

ryantxu wrote: It still seems odd that I have to include the jar, since the StandardRequestHandler should be picked up in the war right? Is this also a sign that there must be something wrong with the deployment? Note that in 1.3, the StandardRequestHandler was moved from o.a.s.request to o.a.s.handler: http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/request/StandardRequestHandler.java http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/StandardRequestHandler.java If you are subclassing StandardRequestHandler, make sure you are using a consistent version. ryan
RE: Availability Issues
: We're using Jetty also, so I get the sense I'm looking at the
: wrong log file.

if you are using the jetty configs that come in the solr downloads, it writes all of the solr log messages to stdout (ie: when you run it on the command line, the messages come to your terminal). i don't know off the top of my head how to configure Jetty to log application log messages to a specific file ... there may be jetty-specific config options for controlling this, or jetty may expect you to explicitly set the system properties that tell the JVM default log manager what you want it to do... http://java.sun.com/j2se/1.5.0/docs/guide/logging/overview.html

: On that note -- I've read that Jetty isn't the best servlet
: container to use in these situations, is that your experience?

i can't make any specific recommendations ... i use Resin because someone else at my work did some research and decided it's worth paying for. From what i've seen, tomcat seems easier to configure than jetty and i had an easier time understanding its docs, but i've never done any performance tests. -Hoss
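As a sketch of the JDK-logging route Hoss alludes to (the file name is arbitrary; this is standard java.util.logging configuration, not anything Solr-specific):

  # logging.properties -- pass via -Djava.util.logging.config.file=logging.properties
  handlers=java.util.logging.FileHandler
  .level=INFO
  java.util.logging.FileHandler.pattern=solr.log
  java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter

That captures the INFO-level Solr messages (including the Solr home set to line) in solr.log regardless of what the container does with stdout.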
Re: Facets and running out of Heap Space
: So, naturally we increased the heap size and things worked
: well for a while and then the errors would happen again.
: We've increased the initial heap size to 2.5GB and it's
: still happening.

is this the same 25,000,000 document index you mentioned before? 2.5GB of heap doesn't seem like much if you are also doing faceting ... even if you are faceting on an int field, there's going to be 95MB of FieldCache for that field. you said this was a string field, so it's going to be 95MB + however much space is needed for all the terms (presumably if you are faceting on this field every doc doesn't have a unique value, but even assuming a conservative 10% unique values of 10 characters each, that's another ~50MB). so we're up to about 150MB of FieldCache to facet that field -- and we haven't even started talking about how big the index is itself (or how big the filterCache gets, or how many other fields you are faceting on).

how big is your index on disk? are you faceting or sorting on other fields as well? what does the LukeRequestHandler tell you about the # of distinct terms in each field that you facet on?

-Hoss
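Spelling out the arithmetic behind those numbers (assuming the 25M-doc figure from earlier in the thread):

  25,000,000 docs x 4 bytes (one int ord per doc)    = 100,000,000 bytes ~ 95 MB
   2,500,000 unique terms x 10 chars x 2 bytes/char  =  50,000,000 bytes ~ 50 MB
                         total per faceted string field               ~ 145 MB

And that is per faceted field, before counting the index itself, the filterCache, and normal per-request working memory.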
solr tuple/tag store
Hello- I am running into some scaling performance problems with SQL that I hope a clever solr solution could fix. I've already gone through a bunch of loops, so I figure I should solicit advice before continuing to chase my tail.

I have a bunch of things (100K-500K+) that are defined by a set of user tags. ryan says: (name=xxx, location=yyy, foo=[aaa,bbb,ccc]), and alison says (name:zzz, location=bbb) - this list is constantly updating; it is fed from automated crawlers and user generated content. The 'names' can be arbitrary, but 99% of them will be ~25 distinct names.

My approach has been to build a repository of all the 'tags' and then, as things come into that repository, I merge all the tags for that entry into a single 'flat' document and index it with solr. When my thing+tag count was small, a simple SQL table with a row for each tag worked great:

  CREATE TABLE `my_tags` (
    entryID varchar(40) NOT NULL,
    source  varchar(40) NOT NULL,
    name    varchar(40) NOT NULL,
    value   TEXT NOT NULL,
    KEY( entryID ),
    KEY( source )
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

but as the row count gets big (2M+) this gets to be unusable. To make it tractable, I am now splitting the tags across a bunch of tables and pushing the per-user name/value pairs into a single text field (stored with JSON):

  CREATE TABLE `my_tags_000` (
    entryID varchar(40) NOT NULL,
    source  varchar(40) NOT NULL,
    tags    LONGTEXT NOT NULL,
    PRIMARY KEY( entryID, source )
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Then I pick what table that goes into using: Math.abs( id.hashCode() )%10

This works OK, but it is still slower than I would like. DB access is slow, and it also needs to search across the updating solr index, and that gets slow since it keeps reopening the searcher (autowarming is off!)

S... I see a few paths and would love external feedback before banging my head on this longer.

1. Get help from someone who knows more SQL than me and try to make a pure SQL approach work. This would need to work with 10M+ tags. Solr indexing is then a direct SQL - solr dump.

2. Figure out how to keep the base Tuple store in solr. I think this will require finishing up SOLR-139. This would keep the core data in solr - so there is no good way to 'rebuild' the index.

3. something else? store input on disk?

Any thoughts / pointers / nay-saying would be really helpful! thanks ryan
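A small sketch of the table-picking logic described above (names taken from the message; one caveat is worth a comment):

  // pick one of the 10 shard tables for a given entry id
  int bucket = Math.abs(id.hashCode()) % 10;  // caveat: Math.abs(Integer.MIN_VALUE) is still negative
  String table = "my_tags_" + String.format("%03d", bucket);

A safer variant of the bucket computation is ((id.hashCode() % 10) + 10) % 10, which stays non-negative for every input.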
RE: Facets and running out of Heap Space
Make sure you have:

  <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

defined in solrconfig.xml

What's the consequence of me changing the solrconfig.xml file? Doesn't that cause a restart of solr?

for a large index, this can be very slow but the results are valuable.

In what way? I'm still not clear on what this does for me

-Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 09, 2007 4:01 PM To: solr-user@lucene.apache.org Subject: Re: Facets and running out of Heap Space

what does the LukeRequestHandler tell you about the # of distinct terms in each field that you facet on?

Where would I find that?

check: http://wiki.apache.org/solr/LukeRequestHandler and make sure you have the handler above defined in solrconfig.xml. for a large index, this can be very slow but the results are valuable. ryan
Re: Facets and running out of Heap Space
David Whalen wrote:

: Make sure you have:
:   <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
: defined in solrconfig.xml
:
: What's the consequence of me changing the solrconfig.xml file?
: Doesn't that cause a restart of solr?

editing solrconfig.xml does *not* restart solr. But you need to restart solr to see any changes to solrconfig.

: for a large index, this can be very slow but the results are valuable.
:
: In what way? I'm still not clear on what this does for me

It gives you all kinds of index statistics - that may or may not be useful in figuring out how big field caches will need to be. It is just a diagnostics tool, not a fix. ryan
Re: index size
Late reply on this but I just wanted to say thanks for the suggestions. I went through my whole schema and was storing things that didn't need to be stored and indexing a lot of things that didn't need to be indexed. Just completed a full reindex and it's a much more reasonable size now. Kevin

On 8/20/07, Mike Klaas [EMAIL PROTECTED] wrote: On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote: Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb and there are about 30 indexed fields. An ls -sh will tell you roughly where the space is being occupied. There is something strange going on: 2.5kB * 2.7m is only 6GB, and I have trouble imagining where the 30-fold index size expansion is coming from. -Mike
Re: solr tuple/tag store
On Oct 9, 2007, at 3:14 PM, Ryan McKinley wrote: 2. Figure out how to keep the base Tuple store in solr. I think this will require finishing up SOLR-139. This would keep the the core data in solr - so there is no good way to 'rebuild' the index. With SOLR-139, cool stuff can be done to 'rebuild' an index actually. Obviously if your store is Solr you'll be using stored fields. So store the most basic stuff, and copyField things around. With SOLR-139, to rebuild an index you simply reconfigure the copyField settings and basically `touch` each document to reindex it. I did this with Collex recently as I refactored all of my old Collex tag architecture into SOLR-139. My tag design is nowhere near as scalable as the one you're after, I don't think. Yonik has some pretty prescient design ideas here: http://wiki.apache.org/solr/UserTagDesign Particularly interesting are the parts about leveraging intra Lucene Field matching capability (Phrase/SpanQuery possibilities are pretty neat) to reduce the number of fields. 3. something else? store input on disk? *gasp* Inconceivable! :) Erik
Re: solr tuple/tag store
Given that the tables are of type InnoDB, I think it's safe to assume that you're not planning to use MySQL full-text search (only supported on MyISAM tables). If you are not concerned about transactional integrity provided by InnoDB, perhaps you could try using MyISAM tables (although most people report speed improvements for insert operations (on relatively small data sets) rather than selects). Without seeing the actual queries that are slow, it's difficult to determine what the problem is. Have you tried using EXPLAIN ( http://dev.mysql.com/doc/refman/5.0/en/explain.html) to check if your query is using the table indexes effectively? Pieter On 10/10/2007, Lance Norskog [EMAIL PROTECTED] wrote: You did not give your queries. I assume that you are searching against the 'entryID' and updating the tag list. MySQL has a fulltext index. I assume this is a KWIC index but do not know. A fulltext index on entryID should be very very fast since single-record results are what Lucene does best. Lance
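For example, run against the sharded table from earlier in the thread (the literal values are made up):

  EXPLAIN SELECT tags FROM my_tags_000 WHERE entryID = 'abc123' AND source = 'ryan';

If the key column in the EXPLAIN output shows NULL rather than PRIMARY, the composite primary key on (entryID, source) is not being used and the query is falling back to a scan.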
Re: solr tuple/tag store
You could just make a separate Lucene index with the document ID unique and with multiple tag values. Your schema would have the entryID as the unique field and multiple tag values per entryID. I just made a phrase-suggesting clone of the Spellchecker class that is almost exactly the same. It indexes multiple second words for each single first word. It was my first Lucene project and was very easy to code. Lance On 10/9/07, Pieter Berkel [EMAIL PROTECTED] wrote: Given that the tables are of type InnoDB, I think it's safe to assume that you're not planning to use MySQL full-text search (only supported on MyISAM tables). If you are not concerned about transactional integrity provided by InnoDB, perhaps you could try using MyISAM tables (although most people report speed improvements for insert operations (on relatively small data sets) rather than selects). Without seeing the actual queries that are slow, it's difficult to determine what the problem is. Have you tried using EXPLAIN ( http://dev.mysql.com/doc/refman/5.0/en/explain.html) to check if your query is using the table indexes effectively? Pieter On 10/10/2007, Lance Norskog [EMAIL PROTECTED] wrote: You did not give your queries. I assume that you are searching against the 'entryID' and updating the tag list. MySQL has a fulltext index. I assume this is a KWIC index but do not know. A fulltext index on entryID should be very very fast since single-record results are what Lucene does best. Lance
Re: Facets and running out of Heap Space
On 9-Oct-07, at 12:36 PM, David Whalen wrote:

:  <field name="id" type="string" indexed="true" stored="true" />
:  <field name="content_date" type="date" indexed="true" stored="true" />
:  <field name="media_type" type="string" indexed="true" stored="true" />
:  <field name="location" type="string" indexed="true" stored="true" />
:  <field name="country_code" type="string" indexed="true" stored="true" />
:  <field name="text" type="text" indexed="true" stored="true" multiValued="true" />
:  <field name="content_source" type="string" indexed="true" stored="true" />
:  <field name="title" type="string" indexed="true" stored="true" />
:  <field name="site_id" type="string" indexed="true" stored="true" />
:  <field name="journalist_id" type="string" indexed="true" stored="true" />
:  <field name="blog_url" type="string" indexed="true" stored="true" />
:  <field name="created_date" type="date" indexed="true" stored="true" />
:
: I'm sure we could stop storing many of these columns, especially if
: someone told me that would make a big difference.

I don't think that it would make a difference in memory consumption, but storage is certainly not necessary for faceting. Extra stored fields can slow down search if they are large (in terms of bytes), but don't really occupy extra memory, unless they are polluting the doc cache. Does 'text' need to be stored?

: what does the LukeRequestHandler tell you about the # of distinct
: terms in each field that you facet on?
:
: Where would I find that? I could probably estimate that myself on a
: per-column basis. it ranges from 4 distinct values for media_type, to
: 30-ish for location, to 200-ish for country_code, to almost 10,000 for
: site_id, to almost 100,000 for journalist_id.

Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_, so it should be a net win for those (although quite close in space requirements for a 30-ary field on your index size). -Mike
Re: Index files not being deleted
So, this problem came up again. Now it only happens in a linux environment when searches are being conducted while an index is running. Does anything need to be closed on the searching side?

AgentHubcap wrote: As it turns out I was modifying code that wasn't being run. Running an optimize after deleting did solve my problem. =)

AgentHubcap wrote: I'm running 1.2. Actually, I am doing an optimize after I delete the indexes. (twice, as I read there was an issue with the optimize). Do I need to close something manually? Here's my optimize code:

  private void optimize() throws IOException {
    UpdateHandler updateHandler = SolrCore.getSolrCore().getUpdateHandler();
    CommitUpdateCommand commitcmd = new CommitUpdateCommand(false);
    commitcmd.optimize = true;
    updateHandler.commit(commitcmd);
    updateHandler.close();
  }

ryantxu wrote: - Delete all index files via a delete command make sure to optimize after deleting the docs -- optimize has lucene get rid of deleted files rather than appending them to the end of the index. what version of solr are you running? if you are running 1.3-dev, deleting *:* is fast -- if you aren't using 1.3, i don't suggest moving there just for that though. ryan
Re: Facets and running out of Heap Space
Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_ Mike, how did you calculate that value? I'm trying to tune my caches, and any equations that could be used to determine some balanced settings would be extremely helpful. I'm in a memory limited environment, so I can't afford to throw a ton of cache at the problem. (I don't want to thread-jack, but I'm also wondering whether anyone has any notes on how to tune cache sizes for the filterCache, queryResultCache and documentCache). Thanks, Stu -Original Message- From: Mike Klaas [EMAIL PROTECTED] Sent: Tuesday, October 9, 2007 9:30pm To: solr-user@lucene.apache.org Subject: Re: Facets and running out of Heap Space On 9-Oct-07, at 12:36 PM, David Whalen wrote: (snip) I'm sure we could stop storing many of these columns, especially if someone told me that would make a big difference. I don't think that it would make a difference in memory consumption, but storage is certainly not necessary for faceting. Extra stored fields can slow down search if they are large (in terms of bytes), but don't really occupy extra memory, unless they are polluting the doc cache. Does 'text' need to be stored? what does the LukeReqeust Handler tell you about the # of distinct terms in each field that you facet on? Where would I find that? I could probably estimate that myself on a per-column basis. it ranges from 4 distinct values for media_type to 30-ish for location to 200-ish for country_code to almost 10,000 for site_id to almost 100,000 for journalist_id. Using the filter cache method on the things like media type and location; this will occupy ~2.3MB of memory _per unique value_, so it should be a net win for those (although quite close in space requirements for a 30-ary field on your index size). -Mike
Re: Facets and running out of Heap Space
On 9-Oct-07, at 7:53 PM, Stu Hood wrote:

: Using the filter cache method on the things like media type and
: location; this will occupy ~2.3MB of memory _per unique value_
:
: Mike, how did you calculate that value? I'm trying to tune my caches,
: and any equations that could be used to determine some balanced
: settings would be extremely helpful. I'm in a memory limited
: environment, so I can't afford to throw a ton of cache at the problem.

A cached filter is a bitset: one bit per doc in the index, over 25m docs. Note that HashSet filters will be smaller (cardinality < 3000).

: (I don't want to thread-jack, but I'm also wondering whether anyone
: has any notes on how to tune cache sizes for the filterCache,
: queryResultCache and documentCache).

I'll give the usual Solr answer: it depends <g>. For me:

The filterCache is the most important. I want my faceting filters to be there at all times, as well as the common fq's I throw at Solr. I have this bumped up to 4096 or so.

The queryResultCache isn't too important. I'm mostly interested in keeping around a few recent queries since they tend to be reexecuted. There is generally not a whole lot of overlap, though, and I never page very far into the results (10 results over 100 slaves is more than I typically would ever need). Memory usage is quite low, though, so you might have success going nuts with this cache.

docCache? Make sure this is set to at least maxResults * max concurrent queries, since the query processing sometimes assumes fetching a document earlier in the request will let us retrieve it for free later in the request from the cache. Other than that, it depends on your document usage overlap. If you have a set of documents needed for meta-data storage, it behooves you to make sure these are always cached.

cheers, -Mike
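As a concrete reference, the caches discussed above are sized in solrconfig.xml; the numbers below are illustrative, not recommendations:

  <filterCache      class="solr.LRUCache" size="4096" initialSize="512" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache" size="512"  initialSize="128" autowarmCount="32"/>
  <documentCache    class="solr.LRUCache" size="512"  initialSize="512" autowarmCount="0"/>

(documentCache entries are tied to internal Lucene doc ids and can't be autowarmed across searchers, which is why its autowarmCount stays 0.)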
Cache Memory Usage (was: Facets and running out of Heap Space)
Sorry... where do the unique values come into the equation? Also, you say that the queryResultCache memory usage is very low... how could this be when it is storing the same information as the filterCache, but with the addition of sorting? Your answers are very helpful, thanks! Stu Hood Webmail.us You manage your business. We'll manage your email.®
Re: proximity search not working in solr lucene
: I have installed solr lucene for my website: clickindia.com, but I am
: unable to apply proximity search for the same over there.
:
: Please help me with how I should set up solrconfig.xml and schema.xml
: to provide a proximity search option.

in order for us to help you, you're going to have to elaborate on what you've tried, and what results you get. there's nothing special you need to do in either file to get proximity queries ... just use quotes. what do your query URLs look like? -Hoss
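For example, a quoted phrase with an optional slop value in a standard query URL (the host, field and terms here are made up):

  http://localhost:8983/solr/select?q=text:%22apple+pie%22~10

The %22...%22 is just a URL-encoded quoted phrase, and the ~10 slop makes it match documents where the two terms occur within 10 positions of each other.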
Re: problems with arabic search
FYI: you don't need to resend your question just because you didn't get a reply within a day; either people haven't had a chance to reply, or they don't know the answer.

: XML Parsing Error: mismatched tag. Expected: </HR>.
:
: Location: http://localhost:8080/solrServlet/searchServlet?query=%D9%85%D8%AD%D9%85%D8%AFcmdSearch=Search%21

this doesn't look like a query error .. and that doesn't look like a solr URL; this looks like something you have in front of Solr.

: HTTP Status 400 - Query parsing error: Cannot parse
: '': '*' or '?' not allowed as first character in WildcardQuery

that looks like a Solr error. i'm guessing that your app isn't dealing with the UTF8 correctly; something is substituting ? characters in place of any character it doesn't understand - and Solr thinks you are trying to do a wildcard query. have you tried querying solr directly (in your browser or using curl) for your arabic word? -Hoss
Re: index become bigger and the only way seems to add hardware, another way?
Here are some ways: Index less data, store fewer fields and less data, compress fields, change Lucene's term index interval (default 128; increasing it will make your index a little bit smaller, but will slow down queries)... But in general, the more you index, the more hardware you'll need. I saw 1TB disks for ~$300 USD the other day. You are in China and this stuff is even cheaper there. Otis -- Lucene - Solr - Nutch - Consulting -- http://sematext.com/

- Original Message - From: James liu [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, October 9, 2007 11:15:56 PM Subject: index become bigger and the only way seems to add hardware, another way?

i just wanna know if there is anything that can decrease index size, other than adding hardware or optimizing lucene params. -- regards jl
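For reference, the knob Otis mentions is exposed on Lucene's IndexWriter (this is raw Lucene, not something a stock Solr 1.2 solrconfig.xml lets you set):

  // trades term-dictionary size for slower term lookups:
  // a larger interval means fewer entries in the in-memory .tii term index
  IndexWriter writer = new IndexWriter(dir, analyzer, false);
  writer.setTermIndexInterval(256); // default is 128
  // ... add documents, then writer.close();

Roughly, doubling the interval halves the .tii term index at the cost of a longer sequential scan per term lookup.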
Re: Cache Memory Usage (was: Facets and running out of Heap Space)
On 9-Oct-07, at 8:28 PM, Stu Hood wrote:

: Sorry... where do the unique values come into the equation?

Faceting. You should have a filterCache >= # unique values in all fields faceted-on (using the fieldCache method).

: Also, you say that the queryResultCache memory usage is very low...
: how could this be when it is storing the same information as the
: filterCache, but with the addition of sorting?

Solr caches only the top N documents in the queryResultCache (boosted by queryResultWindowSize), which amounts to 40-odd ints, 40-odd floats, and change. -Mike
Re: solr tuple/tag store
: the most basic stuff, and copyField things around. With SOLR-139, to
: rebuild an index you simply reconfigure the copyField settings and
: basically `touch` each document to reindex it.

had not thought of that... yes, that would work

: Yonik has some pretty prescient design ideas here:
: http://wiki.apache.org/solr/UserTagDesign

Yonik is quite clever! this does not even involve bit operations. In the example:

  add to A10, field utag=erik#lucene   // or erik lucene, single token
  add to A10, field user=erik          // via copyField
  add to A10, field tag=lucene         // via copyField

I take it 'user' needs a fieldType that would only keep the first part of what is passed in, and 'tag' would be a different type (or params) that only keeps the latter. To add a 'name', I guess the best approach is to use a dynamic field: utag_name=erik#lucene

I'll give this a try. thanks, ryan
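A sketch of what that schema wiring might look like (the user_part/tag_part field types are hypothetical -- their analyzers would have to keep only the part before/after the '#'):

  <field name="utag" type="string"    indexed="true" stored="true"  multiValued="true"/>
  <field name="user" type="user_part" indexed="true" stored="false" multiValued="true"/>
  <field name="tag"  type="tag_part"  indexed="true" stored="false" multiValued="true"/>

  <copyField source="utag" dest="user"/>
  <copyField source="utag" dest="tag"/>

plus something like <dynamicField name="utag_*" type="string" indexed="true" stored="true"/> for the per-name variant, so utag_name=erik#lucene needs no schema change.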